Re: validator.nu

Re: validator.nu

Frank Ellermann-3

Henri Sivonen wrote:
 
> http://ln.hixie.ch/?start=1137799947&count=1

Whatever that is, apparently the test works with FF2,
and apparently it's about the same SGML comment issue
as in my res.htm and res.html HTML test cases.

> HTML5 parsing has no such thing as a valid DTD subset.

<sigh />  If it cannot parse valid XHTML 1 it's fine,
just don't offer the option, or give up with a clear
error message when you "see" a DTD subset or anything
else that won't fit into your model, valid or not.

> The error conditions follow the HTML5 parsing spec
> without ascribing SGML meaning to syntax errors.

By definition XHTML 1 doesn't care what "HTML5" might
be; it has its own specification and syntax.  Some of
it is odd, but not really worse than "HTML5" or HTML -
from my POV decisively better than any HTML cum SGML.

> The impression that I get is that the TAG and the
> HTTP WG aren't part of "everyone and his dog".

A recent proposal in the HTTP WG apparently says that
iso-8859-1 text means "unknown charset", while the
default text charset is still iso-8859-1, go figure.

Whatever this means, it promises lots of fun when
testing HTTP, but it has nothing to do with testing an
isolated HTML / HTML5 / XHTML document for validity.

> I might be persuaded to ignore Content-Type if you
> can get the TAG to repeal mime-respect and the IETF
> HTTP WG to endorse content sniffing

I'll try to convince the HTTP WG that any "sniffing"
is no job for HTTP.  But this doesn't affect you or
other validators; what they should do is answer the
simple question:

 Is document X valid HTML / HTML5 / XHTML ?

For any given X, independent of how you get it, HTTP,
upload, FTP, pigeon carrier, gopher, form input, ...

>> As noted above, just ignore what HTTP servers say,
>> all you get are mad lies, resulting in hopelessly
>> confusing error messages about issues not under the
>> control of the tester.
 
> That's like people on www-validator complaining that
> their invalid ad serving boilerplate is not under
> their control.

NAK, for a given X you can't say that X is actually X',
because a validator cannot know what X' was supposed
to be before inserted ads (etc.) mutilated it into X.

OTOH what you got as X, however you got it, *is* X,
the valid or invalid input for validation.  What HTTP
servers claim is at best *optional* additional info
for the task of validating X.

If folks actually want to check X'' = X + HTTP headers,
or X''' = X + charset or doctype overrides, offer this
as an option - as you already do for X''' but not X''.
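
To make the distinction concrete, here is a minimal Python
sketch of X versus X'' (illustrative only - the function
names and the fallback charset are my assumptions, not
validator.nu's actual API):

  # Illustrative sketch only: "check_markup" stands in for any
  # validator backend, it is not validator.nu's API.
  import urllib.request

  def check_markup(body, declared_charset=None):
      """X is just the bytes; a declared charset is optional
      extra input for the validation task, never required."""
      charset = declared_charset or "utf-8"   # fallback assumption
      text = body.decode(charset, errors="replace")
      # ... actual validation of `text` would happen here ...
      return []                               # list of error messages

  def check_x(url):
      """Check X: the entity body alone, however it was fetched."""
      with urllib.request.urlopen(url) as resp:
          return check_markup(resp.read())

  def check_x_double_prime(url):
      """Check X'': the body *plus* what the HTTP server claims."""
      with urllib.request.urlopen(url) as resp:
          charset = resp.headers.get_content_charset()
          return check_markup(resp.read(), declared_charset=charset)

Checking X''' would simply mean letting the caller pass an
explicit charset or doctype override into the same entry point.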

> Making the references to a misconfigured server is
> under your control.

Yeah, I could use form input or upload instead of an
HTTP URL, or maybe set up a decent gopher server and
let your validator tackle that.

If that is your idea of usability we are wasting time,
as I can simply use validators doing what I want, i.e.
check X, neither X' nor X'', and typically not X'''.

 Frank

Re: validator.nu

Henri Sivonen

Disclaimer: Still not a WG response.

On Feb 16, 2008, at 14:25, Frank Ellermann wrote:

> Henri Sivonen wrote:
>
>> HTML5 parsing has no such thing as a valid DTD subset.
>
> <sigh />  If it cannot parse valid XHTML 1 it's fine,
> just don't offer the option, or give up with a clear
> error message when you "see" a DTD subset or anything
> else that won't fit into your model, valid or not.

Like I said before, that was how Validator.nu used to work and a  
change to the old behavior was requested. I cannot comply with  
everyone's suggestions at the same time when mutually exclusive  
behaviors are suggested. I have chosen not to comply with yours on  
this point.

> But this doesn't affect you or
> other validators, what they should do is answer the
> simple question:
>
> Is document X valid HTML / HTML5 / XHTML ?
>
> For any given X, independent of how you get it, HTTP,
> upload, FTP, pigeon carrier, gopher, form input, ...

Validator.nu checks the combination of the protocol entity body and  
the Content-Type header. Pretending that Content-Type didn't matter  
wouldn't make sense when it does make a difference in terms of  
processing in a browser.
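
To illustrate (a rough sketch, not Validator.nu internals; the
parser choices below are the usual browser behaviour, simplified):

  # Rough illustration: the same bytes are handled by a forgiving
  # HTML parser or a draconian XML parser depending on the declared
  # MIME type.  Simplified sketch, not Validator.nu's actual code.
  from html.parser import HTMLParser
  from xml.etree import ElementTree

  def parse_like_a_browser(body, content_type):
      mime = content_type.split(";")[0].strip().lower()
      if mime in ("application/xhtml+xml", "application/xml", "text/xml"):
          # Any well-formedness error is fatal here.
          return ElementTree.fromstring(body)
      # text/html (and, in this sketch, anything else) gets tag soup.
      parser = HTMLParser()
      parser.feed(body.decode("utf-8", errors="replace"))
      return parser

  doc = b"<p>unclosed paragraph"
  parse_like_a_browser(doc, "text/html")               # recovers fine
  # parse_like_a_browser(doc, "application/xhtml+xml") # ParseError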

> OTOH what you got as X, however you got it, *is* X,
> the valid or invalid input for validation.  What HTTP
> servers claim is at best *optional* additional info
> for the task to validate X.

Content-Type is acted on by browsers when provided, so even if  
supplying it were optional, looking at it once supplied isn't.

> If folks actually want to check X'' = X + HTTP header
> or X''' = X + charset or doctype overrides offer this
> as option.  As you already do it for X''' but not X''.

I also provide the lax type option to override the MIME type (albeit  
in a limited way to prevent Validator.nu loading images, movies,  
etc.). Respecting Content-Type is the default, though.

The main reason for adding the character encoding override was  
supporting the form-based file upload case, but I opted not to hide  
the UI in other cases.

>> Making the references to a misconfigured server is
>> under your control.
>
> Yeah, I could use form input or upload instead of a
> HTTP URL, or maybe set up a decent gopher server and
> let your validator tackle this.

What are you trying to achieve? Are you trying to check that your Web  
content doesn't have obvious technical problems? If you are, surely it  
would be less useful if the validator pretended that Content-Type  
didn't matter to parser choice when it does matter in browsers. Or are  
you just trying to game a tool to say that your page is valid while  
insisting on doing stuff that is practically problematic? If so,  
what's the point?

> If that is your idea of usability we are wasting time,
> as I can simply use validators doing what I want, i.e.
> check X, neither X' nor X'', and typically not X'''.

In order to assess whether doing what you want is a waste of time, I'd  
like to know what objective you have in mind in the use case sense.  
Why are you validating pages?

--
Henri Sivonen
[hidden email]
http://hsivonen.iki.fi/



Re: validator.nu

Frank Ellermann

Henri Sivonen wrote:
 
> Validator.nu checks the combination of the protocol
> entity body and the Content-Type header. Pretending
> that Content-Type didn't matter wouldn't make sense
> when it does make a difference in terms of processing
> in a browser.

I checked if the W3C validator servers still claim that
application/xml-external-parsed-entity is chemical/x-pdb.

This was either fixed, or it is an intermittent problem,
so I can continue my I18N tests today.  XHTML 1, like
HTML 4, wants URIs in links.  For experiments with IRIs
I created a homebrewed XHTML 1 i18n document type.

It is actually the same syntax, renaming URI to IRI
everywhere, updating RFC 2396 + 3066 to RFC 3987 + 4646
in the DTD comments, with absolute links to some entity
files hosted by the W3C validator - those links caused
the chemical/x-pdb trouble.

To get some results related to the *content* of my test
files I have to set three options explicitly:

* Be "lax" about HTTP content - whatever that is, XHTML 1
  does not really say "anything goes", but validator.nu
  apparently considers obscure "advocacy" pages instead
  of the official XHTML 1 specification as "normative".

* Parser "XML; load external entities" - whatever it is,
  validator.nu cannot handle the <?xml etc. intro for
  XHTML 1 otherwise.  But that is required depending on
  the charset, and certainly always allowed for XHTML 1.

* Preset "XHTML 1 transitional" - actually the test is
  not realy XHTML 1 transitional, but a uses a homebrewn
  XHTML 1 i18n DTD, but maybe that's beside the point for
  a validator not supporting DTDs to start with.

With those three explicitly set options it could finally
report that my test page is "valid" XHTML 1 transitional.

But it's *not*; it uses real IRIs in places where only URIs
are allowed - a major security flaw in DTD-based validators:
<http://omniplex.blogspot.com/2007/11/broken-validators.html>

I know why DTD validators have trouble checking URI syntax;
it's beyond me why schema validators don't get this right.
IMO "get something better than CDATA for attribute types"
is the point of not using DTDs, and "can do STD 66 syntax
for URIs", a full Internet Standard, is the very minimum
I'd expect from something claiming to be better than DTDs.
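
For clarity, the minimal check I mean is roughly this Python
sketch - it only tests the allowed character repertoire of
RFC 3986, not the full URI grammar, and the regex is my own
simplification:

  # Reject attribute values that use characters outside the
  # RFC 3986 (STD 66) URI character set, i.e. raw IRIs.
  # Simplified sketch: repertoire only, not the full grammar.
  import re

  # unreserved / gen-delims / sub-delims / "%"
  _URI_CHARS = re.compile(r"^[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*$")

  def is_uri_not_iri(ref):
      return bool(_URI_CHARS.match(ref))

  assert is_uri_not_iri("http://example.org/r%C3%A9sum%C3%A9")  # URI
  assert not is_uri_not_iri("http://example.org/résumé")        # raw IRI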

The broken URIs starting with "calc" (on XP with IE7
installed) from various applications were a hot topic for
some months in 2007, until Adobe, Mozilla, MS, etc. finally
arrived at the conclusion that the question of whose fault
it was isn't relevant.  If all parties simply follow STD 66
it is okay.

Four more related XHTML 1 I18N tests likely can't fly with
validator.nu not supporting the (very) basic idea of DTDs;
out of curiosity I tried them anyway:

| Warning: XML processors are required to support the UTF-8
| and UTF-16 character encodings. The encoding was KOI8-R
| instead, which is an incompatibility risk.

Untested, I hope US-ASCII wouldn't trigger this warning, as
a mobile-ok prototype did some months ago (and maybe still
does).  
 
Validator.nu accepts U-labels (UTF-8) in system identifiers;
the W3C validator doesn't, and I also think they aren't
allowed in XML 1.0 (all editions).  Martin suggested they are
okay, see <http://www.w3.org/Bugs/Public/show_bug.cgi?id=5279>.

Validator.nu rejects percent-encoded UTF-8 labels in system
identifiers, like the W3C validator.  I think that is okay,
*unless* you believe in a non-DNS STD 66 <reg-name>, where
it might be syntactically okay.  Hard to decide, potentially
a bug <http://www.w3.org/Bugs/Public/show_bug.cgi?id=5280>.
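
For reference, the three spellings at issue in these two bugs
look like this (a Python sketch; the built-in "idna" codec is
IDNA 2003, which is close enough to show the shape of the
labels):

  # The same host label in its three spellings (bugs 5279/5280).
  from urllib.parse import quote

  u_label = "bücher"                                # raw UTF-8 (bug 5279)
  a_label = u_label.encode("idna").decode("ascii")  # "xn--bcher-kva"
  pct_label = quote(u_label)                        # "b%C3%BCcher" (bug 5280)

  # Only the A-label form is uncontroversially legal in a URI host;
  # the percent-encoded form is arguable under a non-DNS <reg-name>.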

 [back to the general "HTML5 considered hostile to users"]
> What are you trying to achieve?

As mentioned about ten times in this thread, I typically try
to validate content as the author of the relevant document,
or in a position to edit (in)valid documents.

The total number of HTTP servers under my control at this
second (counting servers where I can edit dot-files used as
configuration files by a popular server) is *zero*.  That is
a perfectly normal scenario for many authors and editors.

Of course I'm not happy if files are served as chemical/x-pdb
or similar crap, but it is outside my sphere of influence, and
not what I'm interested in when I want to know what *I* did to
make the overall picture worse *within* documents edited by me.

Of course MediaWiki *could* translate IRIs to equivalent URIs
when it claims to produce XHTML 1 transitional, etc., just to
mention another example.  They are IMO in an ideal position to
do this on the fly, for compatibility with almost all browsers,
and IRIs are designed to have equivalent URIs.
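
That mapping (RFC 3987 section 3.1) is simple enough to sketch
in a few lines of Python - a rough simplification of my own,
only handling the host and UTF-8 percent-encoding, not bidi or
normalization:

  # Rough IRI -> URI mapping: punycode the host, UTF-8
  # percent-encode the rest.  Sketch only, not production code.
  from urllib.parse import urlsplit, urlunsplit, quote

  def iri_to_uri(iri):
      parts = urlsplit(iri)
      netloc = parts.netloc
      if parts.hostname:
          host = parts.hostname.encode("idna").decode("ascii")
          netloc = host + (":%d" % parts.port if parts.port else "")
      keep = "/:@!$&'()*+,;=~-._%"          # already-legal characters
      return urlunsplit((parts.scheme, netloc,
                         quote(parts.path, safe=keep),
                         quote(parts.query, safe=keep + "?="),
                         quote(parts.fragment, safe=keep)))

  print(iri_to_uri("http://bücher.example/wiki/Überseequell?käse=ja"))
  # http://xn--bcher-kva.example/wiki/%C3%9Cberseequell?k%C3%A4se=ja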

Where "outside my sphere of influence" is negotiable, e.g. I'd
have reported chemical/x-pdb as bug today, but it was already
fixed.  My "plan B" was to use the "official" absolute URIs on
a W3C server instead of the validator's SGML library, "plan C"
would be to copy these files and put them on the same server as
the homebrewn DTD.  While googlepages won't try chemical/x-pdb
I fear they'll never support the correct type for *.ent files,
that is rather obscure.

> Are you trying to check that your Web content doesn't have
> obvious technical problems?

Normally, yes.  Of course we are discussing mainly my validator
torture test pages, intentionally *abnormal* pages.  I don't use
HTML 2 strict or HTML i18n elsewhere, and I don't use "raw" IRIs
on "normal" XHTML 1 transitional pages because I know that's
invalid.  I use obscure colour names in legacy markup, working
more or less with any browser, only on a single test page.  And
when you find *hundreds* of "&" instead of "&amp;" on my blogger
page, that is no test but a blogger bug, which I reported months
ago.  Maybe they don't care, or are busy with other stuff like
"open-id", or - the most likely case - for products with
thousands of users such bug reports NEVER reach the developers,
because they are filtered by folks drilled to
suppress^H^H^H^Hort technically clueless users.

> Or are you just trying to game a tool to say that your page is
> valid

Rarely.  I use image links hidden by a span within a pre on
one page; at some point validators will tell me that this is
a hack, no matter that it works with all browsers I've ever
tested.  Sanity check with validator.nu: your tool says that
this is an error.

Maybe HTML5 could permit it; I'm not hot about it unless
somebody produces a browser where this fails horribly.

> Why are you validating pages?

To find bugs.  And for some years I also used the W3C validator
and its mailing list as a way to learn XHTML 1 beyond the level
offered by an O'Reilly book, until I could read DTDs, had read
the XML spec often enough for a vague impression, and had
figured out the relevant parts of HTML history.  Using a legacy
"3.2" browser also helped.

 Frank


Re: validator.nu

olivier Thereaux
In reply to this post by Frank Ellermann-3

Frank wrote:
> I checked if the W3C validator servers still claim that
> application/xml-external-parsed-entity is chemical/x-pdb.
> This was either fixed, or it is an intermittent problem.
It was indeed aptly reported by Henri, and fixed.
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5446

regards,
--
olivier