Validator error


Roman Grinyov
Hello.
When I validate this page, http://websnippets.ru/article.php?id=30, an error is reported on line 33. However, there is no apparent reason for it.

Re: Validator error

Jukka K. Korpela
2014-10-26 22:38, Roman Grinyov wrote:

> Validation of this page: http://websnippets.ru/article.php?id=30; an
> error occurs in line 33. However apparent reason at no.

This seems to be an odd bug in the validator. The data contains the
correct character reference &gt; with no hidden control characters. I
tried to isolate the problem and noticed that deleting everything after
the <ul> element on lines 30–36 except the end tag </article> makes the
page validate. It is a mystery how any content there can make the
validator reject the correct character reference on line 33.

I even tried deleting the &gt; reference. Then the validator issues an
error message about &lt;. OK, let’s delete that too. Now it complains
about &amp;. After removing that as well I get
“Error: & did not start a character reference. (& probably should have
been escaped as &amp;.)”
with no reference to any line number.

OK, one more test: starting from the original document and deleting just
the <textarea> element containing some sample HTML code as data (with
“<” properly encoded as “&lt;”), the document passes.

The culprit appears to be on line 48:

                &lt;p&gt;10 строка | &w3 &lt;/p&gt;

Validating this line in isolation, with a minimal document around it,
results in a correct message that points to the “&w3” construct.
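
For illustration only, here is a minimal Python sketch of the kind of check that should flag “&w3” and nothing else on that line. It is not the validator's algorithm: the function name bare_ampersands is hypothetical, the list of known names comes from Python's html.entities.html5 table, and named references without a terminating semicolon are treated as bare, which is stricter than the HTML parsing rules.

```python
import re
from html.entities import html5  # maps names such as "gt;" to characters

# An ampersand, optionally followed by something shaped like a reference.
_REF = re.compile(r"&(#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)?")


def bare_ampersands(source):
    """Return (line, column) pairs, 1-based, for each "&" that does not
    start a numeric reference or a known named reference."""
    found = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in _REF.finditer(line):
            ref = match.group(1)
            is_numeric = ref is not None and ref.startswith("#")
            is_known_name = ref is not None and ref in html5
            if not (is_numeric or is_known_name):
                found.append((line_no, match.start() + 1))
    return found


# The culprit line from the page, with its leading whitespace removed.
culprit = "&lt;p&gt;10 строка | &w3 &lt;/p&gt;"
print(bare_ampersands(culprit))
```

On the culprit line with its leading whitespace removed, this should report only column 22, the position of “&w3”.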

The bug in the validator is that it does not report this properly at all
in the given context but instead flags completely correct character
references *before* it as erroneous.

The bug is reproducible at http://validator.nu too.

Yucca




Re: Validator error

Michael[tm] Smith
Hi Jukka,

> Date: Mon, 27 Oct 2014 00:29:57 +0200
> From: "Jukka K. Korpela" <[hidden email]>
> Archived-At: <http://www.w3.org/mid/544D75E5.9030602@...>
...

> The culprit appears to be on line 48:
>
> &lt;p&gt;10 строка | &w3 &lt;/p&gt;
>
> Validating this line in isolation, with a minimal document around it,
> results in a correct message that points to the “&w3” construct.
>
> The bug in the validator is that it does not report this properly at
> all in the given context but instead flags completely correct
> character references *before* it as erroneous.
>
> The bug is reproducible at http://validator.nu too.

Thanks for examining this, and thanks to Roman for reporting it. It's
definitely a bug.

The message in this case is coming from the HTML parser, but I can't
reproduce it in "View source" in Firefox (which uses the same HTML parser):

  view-source:http://websnippets.ru/article.php?id=30
  (mouse over the "&w3")

...so it seems to be a problem specific to the validator's usage of the HTML parser.

This is minimally reproducible with the following document:

  <!doctype html><title>test</title>&gt;<textarea>&w3</textarea>

  http://validator.w3.org/nu/?showsource=yes&doc=data%3Atext%2Fhtml%3Bcharset%3Dutf-8%2C%3C%2521doctype%2520html%3E%3Ctitle%3Etest%3C%252Ftitle%3E%2526gt%253B%3Ctextarea%3E%2526w3%3C%252Ftextarea%3E

If you replace the `textarea` with a `span` or whatever, you can't reproduce
it. That makes some sense because `textarea` elements have a special code
path in the parser, along with `title` elements.

So I kinda expect the core problem here is, the validator code isn't
passing on line-number info correctly to the parser when processing
`textarea` and `title` elements. Here's an even more minimal case:

  <!doctype html><title>&w3</title>

  http://validator.w3.org/nu/?showsource=yes&doc=data%3Atext%2Fhtml%3Bcharset%3Dutf-8%2C%3C%2521doctype%2520html%3E%3Ctitle%3E%2526w3%3C%252Ftitle%3E

For that case, the validator just reports "Error: & did not start a
character reference. (& probably should have been escaped as &amp;.)",
without reporting line+col numbers at all or flagging the position.
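
Both validator links above appear to be built in two steps: the test markup is percent-encoded into a data: URL, and that URL is percent-encoded again as the value of the `doc` query parameter. A minimal Python sketch of that construction follows. The two-step scheme and the choice to leave `<` and `>` literal in the first step are inferred from the links themselves, not from validator documentation, and the helper name validator_url is hypothetical.

```python
from urllib.parse import quote

VALIDATOR = "http://validator.w3.org/nu/"


def validator_url(html):
    """Build a validator link that checks `html` via a data: URL."""
    # Step 1: percent-encode the markup for the data: URL, leaving
    # "<" and ">" literal, as the links in this message do.
    inner = quote(html, safe="<>")
    data_url = "data:text/html;charset=utf-8," + inner
    # Step 2: percent-encode the whole data: URL for the query string,
    # which turns "%" into "%25" and "<" into "%3C".
    return VALIDATOR + "?showsource=yes&doc=" + quote(data_url, safe="")


print(validator_url("<!doctype html><title>&w3</title>"))
```

Under those assumptions, the helper reproduces both links above from their test documents, which makes it easy to try variations such as swapping `textarea` for `span`.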

So I think the root cause of the problem Roman ran into is that the
validator doesn't have any line-number info to report in this case, and
then the parser's character-reference reporting isn't getting
re-initialized correctly, so it reports the position of the last character
reference it checked that did have line+col numbers.
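
To make that hypothesis concrete, here is a toy Python model of a reporter that remembers the last known position and never clears it. This is purely illustrative and is not the htmlparser code: the function report_errors, its arguments, and the sample positions are all hypothetical.

```python
def report_errors(refs, reset_position):
    """Toy model of position tracking for character-reference errors.

    `refs` is a list of (text, position, is_valid) tuples, where
    `position` is a (line, column) pair or None when no position
    information is available for that reference.
    """
    last_position = None
    errors = []
    for text, position, is_valid in refs:
        if reset_position:
            # Suspected fix: forget the previous reference's position.
            last_position = None
        if position is not None:
            last_position = position
        if not is_valid:
            errors.append((text, last_position))
    return errors


# A valid "&gt;" with a known position (illustrative values), followed
# by a bad "&w3" inside a textarea for which no position is passed on.
refs = [("&gt;", (1, 35), True), ("&w3", None, False)]

print(report_errors(refs, reset_position=False))  # stale position reused
print(report_errors(refs, reset_position=True))   # no position reported
```

In this model, the variant without the reset blames the position of the earlier, valid `&gt;`, which matches the symptom Roman saw. With only the bad `&w3` as input, no position is reported at all, which matches the `title`-only test case.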

Anyway, I've filed a bug http://bugzilla.validator.nu/show_bug.cgi?id=1010
and I'll try to make some time soon to investigate the code around this.

  --Mike

--
Michael[tm] Smith https://people.w3.org/mike


Re: Validator error

Michael[tm] Smith
> Date: Mon, 10 Nov 2014 12:07:24 +0900
> From: "Michael[tm] Smith" <[hidden email]>
> Archived-At: <http://www.w3.org/mid/20141110030724.GQ4173@...>
>
> > Date: Mon, 27 Oct 2014 00:29:57 +0200
> > From: "Jukka K. Korpela" <[hidden email]>
> > Archived-At: <http://www.w3.org/mid/544D75E5.9030602@...>
> ...
> > The culprit appears to be on line 48:
> >
> > &lt;p&gt;10 строка | &w3 &lt;/p&gt;
> > ...
> > The bug in the validator is that it does not report this properly at
> > all in the given context but instead flags completely correct
> > character references *before* it as erroneous.
> > ..
> So I think the root cause of the problem Roman ran into is that the
> validator doesn't have any line-number info to report in this case, and
> then the parser's character-reference reporting isn't getting
> re-initialized correctly, so it reports the position of the last character
> reference it checked that did have a line+col numbers.
>
> Anyway, I've filed a bug http://bugzilla.validator.nu/show_bug.cgi?id=1010
> and I'll try to make some time soon to investigate the code around this.

The fix for this turned out to be simple, but I can't land it myself since
it's in the htmlparser code, which isn't part of the validator code proper.

So I've opened another bug for it with a patch -

  https://bugzilla.mozilla.org/show_bug.cgi?id=1096172

As soon as that patch lands, I'll push it out to the W3C validator and then
post a follow-up on the thread here.

  --Mike

--
Michael[tm] Smith https://people.w3.org/mike
