Semicolon after entities


Semicolon after entities

mmiikkee13 (Bugzilla)
(I hope this is the right place to send suggestions for the next HTML spec... it seems to be judging from some of the other messages here.)

The W3C validator (using HTML 4.01 Transitional) says that a & in a URL should be encoded as &amp;. I don't think that this should be required. For one thing, I like to keep my code neat with as few entities as possible, and having to encode &'s all the time doesn't really help that. (Maybe I'm just obsessive about clean code or something :-)

Another (more important) reason is that an entity is not recognized as an entity unless it starts with & and ends with a semicolon. A URL such as the one in <a href="somepage.php?foo=1&copy=2"> has the string '&copy' in it, but it has no trailing semicolon and therefore should not be recognized as an entity in a browser. (I just tested this in Firefox, and it does indeed convert &copy to a copyright symbol, but I see this as incorrect behavior as the HTML spec itself states that "In SGML, it is possible to eliminate the final ";" after a character reference in some cases (e.g., at a line break or immediately before a tag).", and inside an attribute value is not a line break or before a tag.)

(If I'm wrong, and it is actually legal to omit the ; after an entity, then perhaps it should be required to stop confusion like this?)

Re: Semicolon after entities

David Dorward-3

Mike S wrote:
> The W3C validator (using HTML 4.01 Transitional) says that a & in a URL
> should be encoded as &amp;. I don't think that this should be required.

& should be encoded as &amp; except in attribute values which represent
URLs?

Please, no! Simplicity is a virtue, and exceptions are the enemy of
simplicity.

> For one thing, I like to keep my code neat with as few entities as
> possible, and having to encode &'s all the time doesn't really help
> that.

Your options include using an authoring tool that does it for you, or
using semi-colons instead (most form data parsing libraries I've
encountered respect the advice of HTML 4.01):
http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2
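
As an illustration of the "let a tool do it for you" option, here is a minimal sketch that uses only Python's standard library. The helper name `href_attribute` is made up for this example; `urlencode` and `html.escape` are the real standard-library calls. It is one way to do the escaping, not the only one.

```python
from html import escape, unescape
from urllib.parse import urlencode


def href_attribute(path, params):
    """Build a URL and escape it for use inside an HTML attribute value.

    Hypothetical helper for illustration: the query string is assembled
    first, then every "&" in it is written as "&amp;".
    """
    query = urlencode(params)  # e.g. "foo=1&copy=2"
    return escape(f"{path}?{query}", quote=True)


url_in_markup = href_attribute("somepage.php", {"foo": 1, "copy": 2})
print(f'<a href="{url_in_markup}">link</a>')
# prints: <a href="somepage.php?foo=1&amp;copy=2">link</a>

# Decoding the attribute value gives back the URL that was intended.
print(unescape(url_in_markup))
# prints: somepage.php?foo=1&copy=2
```

The author never has to type an entity by hand; the escaping happens once, at the point where the URL is written into markup.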

> Another (more important) reason is that an entity is not recognized as
> an entity unless it starts with &, and ends with a semicolon.

If I remember correctly, that is not true. The semi-colon is optional
where a non-name character is present. So ?foo=bar&amp=12 is an HTML
representation of ?foo=bar&=12.
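
The rule can be tried out with Python's `html.unescape`, which decodes character references in a string. This is only a sketch of the behaviour being discussed: `html.unescape` applies the rules for ordinary text, and a browser or other parser may treat the same characters differently inside an attribute value.

```python
from html import unescape

# "&amp" followed by "=" is still read as a reference to "&".
print(unescape("?foo=bar&amp=12"))
# prints: ?foo=bar&=12

# "&copy" followed by "=" is read as the copyright sign (U+00A9).
print(unescape("somepage.php?foo=1&copy=2"))
# prints: somepage.php?foo=1©=2

# Writing the ampersand as "&amp;" keeps the query string intact.
print(unescape("somepage.php?foo=1&amp;copy=2"))
# prints: somepage.php?foo=1&copy=2
```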

I'm not a big fan of this and would rather the semi-colon is required
(as it is in XML based languages) for the reasons mentioned above
(simplicity).

> A URL such
> as the one in <a href="somepage.php?foo=1&copy=2"> has the string
> '&copy' in it, but it has no trailing semicolon and therefore should not
> be recognized as an entity in a browser. (I just tested this in Firefox,
> and it does indeed convert &copy to a copyright symbol, but I see this
> as incorrect behavior as the HTML spec itself states that "In SGML, it
> is possible to eliminate the final ";" after a character reference in
> some cases (e.g., at a line break or immediately before a tag).", and
> inside an attribute value is not a line break or before a tag.)

Those "some cases" include, I believe, "if the next character is a
non-name character such as an equals sign". The example was just that,
not a complete list of circumstances.

> (If I'm wrong, and it is actually legal to omit the ; after an entity,
> then perhaps it should be required to stop confusion like this?)

It is in XHTML.

--
David Dorward                               <http://dorward.me.uk/>


Re: Semicolon after entities

David Woolley (E.L)

David Dorward wrote:
[ ; mandatory at end of entity reference ]

> It is in XHTML.

Moreover bare & is allowed in neither normal character data nor
attribute values, so &copy= is a well formedness violation and will
cause a conforming user agent to abort the document on seeing the
non-name character.

Not using &amp; for & in href attributes is an error in HTML.  You get
away with it because of browser error recovery, but you cannot rely on
this because the number of defined entities may increase, or backwards
compatibility may result in old entities being recognized.
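
The XML side of this can be checked with any XML parser. The sketch below uses Python's `xml.etree.ElementTree` as a stand-in for an XHTML user agent, which is an assumption made for illustration; the helper name `is_well_formed` is likewise invented here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


bare = '<a href="somepage.php?foo=1&copy=2">link</a>'
escaped = '<a href="somepage.php?foo=1&amp;copy=2">link</a>'

print(is_well_formed(bare))     # prints: False
print(is_well_formed(escaped))  # prints: True

# With "&amp;" the parser hands back the URL that was intended.
print(ET.fromstring(escaped).get("href"))
# prints: somepage.php?foo=1&copy=2
```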



Re: Semicolon after entities

Lachlan Hunt
In reply to this post by mmiikkee13 (Bugzilla)

Mike S wrote:
> (I hope this is the right place to send suggestions for the next HTML
>  spec... it seems to be judging from some of the other messages
> here.)

The preferred mailing list is the whatwg mailing list or the new
public-html mailing list, though feedback sent here will be taken into
account anyway.

> The W3C validator (using HTML 4.01 Transitional) says that a & in a
> URL should be encoded as &amp;. I don't think that this should be
> required.  For one thing, I like to keep my code neat with as few
> entities as possible, and having to encode &'s all the time doesn't
> really help that.

Why do you consider the use of entity references to be messy?

> Another (more important) reason is that an entity is not recognized
> as an entity unless it starts with &, and ends with a semicolon. A
> URL such as the one in <a href="somepage.php?foo=1&copy=2"> has the
> string '&copy' in it, but it has no trailing semicolon and therefore
> should not be recognized as an entity in a browser.

Actually, according to SGML rules for HTML4, the semi-colon is optional
in that case.  That is, in fact, one of the entity references for which
browsers use the correct behaviour and expand it to a copyright symbol.

HTML5 has simplified the document conformance rules to require a
semi-colon in all cases.

> (I just tested this in Firefox, and it does indeed convert &copy to a
> copyright symbol, but I see this as incorrect behavior as the HTML
> spec itself states that "In SGML, it is possible to eliminate the final
> ";" after a character reference in some cases (e.g., at a line break or
> immediately before a tag).", and inside an attribute value is not a
> line break or before a tag.)

Those are just example situations when the semi-colon may be omitted,
not an exhaustive list of all situations.  The '=' character is another
case where it can be omitted per SGML rules, along with several others.

That behaviour needs to be retained for backwards compatibility so that
sites using entity refs without semi-colons won't break.

--
Lachlan Hunt
http://lachy.id.au/


Re: Semicolon after entities

Jukka K. Korpela

On Wed, 25 Apr 2007, Lachlan Hunt wrote:

> HTML5 has simplified the document conformance rules to require a semi-colon
> in all cases.
- -
> Those are just example situations when the semi-colon may be omitted, not an
> exhaustive list of all situations.  The '=' character is another case where
> it can be omitted per SGML rules, along with several others.
>
> That behaviour needs to be retained for backwards compatibility so that sites
> using entity refs without semi-colons won't break.

Thus, it seems that HTML5 effectively retains the old HTML policy: the
semicolon is not required before a name character, but it is recommended.

If you first specify a requirement on documents (always use ";") and then
specify mandatory error processing related to it (browsers must recognize
entity references without ";"), then you have effectively defined the
error as a feature, though a deprecated one. But you can proclaim that you
have now defined a stricter version of the language. :-)

--
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/



Re: Semicolon after entities

Henri Sivonen

On Apr 25, 2007, at 08:28, Jukka K. Korpela wrote:

> If you first specify a requirement on documents (always use ";")  
> and then specify mandatory error processing related to it (browsers  
> must recognize entity references without ";"), then you have  
> effectively defined the error as a feature, though a deprecated  
> one. But you can proclaim that you have now defined a stricter  
> version of the language. :-)

The difference is when a conformance checker alerts an author about  
an error. Since the omission of the semicolon is potentially  
confusing, it makes sense to make the omission non-conforming so that  
conformance checkers alert authors who have omitted the semicolon  
inadvertently (e.g. by pasting a URL that contains a query string  
part that looks like an entity reference). This way, unintentional  
omissions are caught. After all, deliberate omissions are probably  
only done by a small group of language lawyers.
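
A rough sketch of the kind of check described above, assuming Python and its `html.entities.html5` table (in which the names that are recognized without a trailing semicolon appear as keys without ";"). The function name and the regular expression are invented for this illustration and are far simpler than a real conformance checker.

```python
import re
from html.entities import html5

# "&" followed by a name, and optionally the terminating semicolon.
_REFERENCE = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")


def find_unterminated_refs(markup):
    """Return (offset, text) pairs for entity references lacking ";".

    Only names that would actually be decoded without the semicolon
    are reported, so "&foo=1" is ignored but "&copy=2" is not.
    """
    findings = []
    for match in _REFERENCE.finditer(markup):
        name, semicolon = match.group(1), match.group(2)
        if semicolon:
            continue  # properly terminated, nothing to report
        # Look for the longest prefix that is a known semicolon-less name.
        for end in range(len(name), 1, -1):
            if name[:end] in html5:
                findings.append((match.start(), "&" + name[:end]))
                break
    return findings


pasted = '<a href="somepage.php?foo=1&copy=2">'
print(find_unterminated_refs(pasted))
# prints: [(27, '&copy')]

fixed = '<a href="somepage.php?foo=1&amp;copy=2">'
print(find_unterminated_refs(fixed))
# prints: []
```

This catches the pasted-URL case without complaining about every query parameter that happens to follow an ampersand.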

--
Henri Sivonen
[hidden email]
http://hsivonen.iki.fi/




Re: Semicolon after entities

David Woolley (E.L)
In reply to this post by Jukka K. Korpela

Jukka K. Korpela wrote:
> recognize entity references without ";"), then you have effectively
> defined the error as a feature, though a deprecated one. But you can
> proclaim that you have now defined a stricter version of the language. :-)

That is why the HTML5 parsing rules are so large.  They do this for many
things, and, yes, I agree that the effect of this is not to define error
recovery, but rather to define official parts of the language.

It is, however, what the market seems to want, i.e. the ability to
ignore standards when writing and have all browsers attempt to divine
the author intent in the same way.  Only a very small proportion of users
even read the standards and, judging by some of the books I've sampled,
neither do many authors of books on, or including material on, how to use HTML.




Re: Semicolon after entities

David Woolley (E.L)
In reply to this post by Henri Sivonen

Henri Sivonen wrote:

> The difference is when a conformance checker alerts an author about an

Very few web sites have ever been conformance checked by their authors!
I haven't done this recently, but it used to be amusing to check the
corporate web sites of W3C members.  It was usual to find conformance
errors and not unusual to find ones that were major structural errors,
rather than just the use of proprietary extensions.


Re: Semicolon after entities

Ian Hickson
In reply to this post by Jukka K. Korpela

On Wed, 25 Apr 2007, Jukka K. Korpela wrote:
>
> If you first specify a requirement on documents (always use ";") and
> then specify mandatory error processing related to it (browsers must
> recognize entity references without ";"), then you have effectively
> defined the error as a feature, though a deprecated one. But you can
> proclaim that you have now defined a stricter version of the language.

No, if you say something is non-conforming, it's non-conforming. Whether
the error handling is defined recovery, reverse-engineered undefined
recovery, or a fatal error has no effect on how strict the language is.
The language's strictness is up to its conformance criteria.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: Semicolon after entities

Bjoern Hoehrmann

* Ian Hickson wrote:

>On Wed, 25 Apr 2007, Jukka K. Korpela wrote:
>>
>> If you first specify a requirement on documents (always use ";") and
>> then specify mandatory error processing related to it (browsers must
>> recognize entity references without ";"), then you have effectively
>> defined the error as a feature, though a deprecated one. But you can
>> proclaim that you have now defined a stricter version of the language.
>
>No, if you say something is non-conforming, it's non-conforming. Whether
>the error handling is defined recovery, reverse-engineered undefined
>recovery, or a fatal error has no effect on how strict the language is.
>The language's strictness is up to its conformance criteria.

Jukka compares two things, saying one can be seen as stricter than the
other. You on the other hand measure the strictness of only one thing,
which makes your argument unfit to refute Jukka's point, much like not
quoting Jukka's smiley and ignoring qualifiers like "effectively", both
of which admit some laxness in the argument, while you attempt to show
an absolute truth.

In the study of formal languages "strictness" is used to describe the
relationship between expressive power and syntactic freedom in the
context of comparable languages or expectations (Canonical XML 1.0 is
stricter than XML 1.0; a format that insists on LF line endings is
regarded as strict as we expect to be able to use CR LF endings as well).

Jukka is simply playing on the concept of having two notions of
syntactic freedom: what is accepted as proper, and what is accepted as
equivalent; no one would argue <p>&ouml</p> and <p>&ouml;</p> have
different textual content; if your only concern is correct
interpretation of your document, you are free to use either form.

In the context of the "WHATWG" proposal, WHATWG member David Baron
recently pointed out that using non-conformant tag soup is a reasonable
transition strategy for authors trying to gradually migrate from
conformant HTML4 to conformant WHATWG HTML; on that ground alone I think
it is unfair to claim error handling requirements do not affect
syntactic freedom and therefore do not affect strictness.

It is either acceptable to use non-conforming markup in which case we
may study error handling when deciding about strictness, or it is not
acceptable to use non-conforming markup, in which case the WHATWG can
not point to error handling to dismiss transition strategy concerns.

In conclusion, the disagreement seems to be about the meaning of the
term "language" where Jukka's definition appears broader than yours,
since he includes processing details while you reject that idea. In
that case it is not useful to point out what follows from your
definition, but rather to state what your definition is, or to make your
point using different terminology. By your definition WHATWG HTML would be about
as strict a language as XML 1.0; I don't think that view is widely
held on this list; if that is indeed so, your definition is not useful.
--
Björn Höhrmann · mailto:[hidden email] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Semicolon after entities

Jukka K. Korpela
In reply to this post by Henri Sivonen

On Wed, 25 Apr 2007, Henri Sivonen wrote:

> On Apr 25, 2007, at 08:28, Jukka K. Korpela wrote:
>
>> If you first specify a requirement on documents (always use ";") and then
>> specify mandatory error processing related to it (browsers must recognize
>> entity references without ";"), then you have effectively defined the error
>> as a feature, though a deprecated one. But you can proclaim that you have
>> now defined a stricter version of the language. :-)
>
> The difference is when a conformance checker alerts an author about an error.

In that case, the difference is in the name of a tool: conformance checker
or just checker.

> Since the omission of the semicolon is potentially confusing, it makes sense
> to make the omission non-conforming so that conformance checkers alert
> authors who have omitted the semicolon inadvertently

Conformance checkers may issue warnings if they like, as they like, though
strictly speaking they won't be pure conformance checkers anyway. But
even the W3C markup validator isn't a pure conformance checker (still less
a pure validator).

--
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/



Re: Semicolon after entities

Jukka K. Korpela
In reply to this post by Ian Hickson

On Wed, 25 Apr 2007, Ian Hickson wrote:

> On Wed, 25 Apr 2007, Jukka K. Korpela wrote:
>>
>> If you first specify a requirement on documents (always use ";") and
>> then specify mandatory error processing related to it (browsers must
>> recognize entity references without ";"), then you have effectively
>> defined the error as a feature, though a deprecated one. But you can
>> proclaim that you have now defined a stricter version of the language.
>
> No, if you say something is non-conforming, it's non-conforming.

So what? You still have defined an error as a feature. Who cares about
document non-conformance in a particular issue when software processing
the documents is required to process non-conforming documents in a
specific way and it actually does that? It's like saying that the use of
the word black as color value is non-conforming but if it is used,
programs must interpret it as #000.

> Whether
> the error handling is defined recovery, reverse-engineered undefined
> recovery, or a fatal error has no effect on how strict the language is.
> The language's strictness is up to its conformance criteria.

Conformance as such is relevant only in situations where conformance is
required by law or enforceable instructions. I don't think that's a common
situation on Earth. Besides, if the conformance criteria are pointlessly
strong (e.g., prohibiting something that still has well-defined and
widely implemented meaning), people who make the laws or instructions
could (and probably should) tune them accordingly: thou shalt conform,
except for...

I wonder whether this discussion actually relates to the IE 7 madness of
disallowing some of the valid entity and character references without a
semicolon. It achieves nothing but breaks many existing pages.

--
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/



Re: Semicolon after entities

Nicholas Shanks
In reply to this post by David Woolley (E.L)
On 25 Apr 2007, at 07:32, David Woolley wrote:

> A very small proportion of users even read the standards and,  
> judging by some of the books I've sampled, nor do many authors of  
> books on or including material on, how to use HTML.

Can't someone propose a bill to make publishing such books illegal?

- Nicholas.




Re: Semicolon after entities

Lachlan Hunt
In reply to this post by David Woolley (E.L)

David Woolley wrote:
> Not using &amp; for & in href attributes is an error in HTML.  You get
> away with it because of browser error recovery, but you cannot rely on
> this because the number of defined entities may increase, or backwards
> compatibility may result in old entities being recognized.

In reality, that's not going to happen.  One of the design principles
for the new HTMLWG is Don't Break the Web.  New entity references, or
any new features for that matter, that have a significant chance of
breaking pages and aren't backwards compatible won't be introduced.

The only new entities that have been introduced into HTML5 are &apos;,
and the uppercase variants of &AMP;, &LT;, &GT;, &QUOT;, &COPY; and
&REG;, which are already widely supported.

--
Lachlan Hunt
http://lachy.id.au/


Re: Semicolon after entities

mmiikkee13 (Bugzilla)

How can making a new version of a standard break existing pages? If an
existing page uses, for example, HTML 4.01 as its DOCTYPE, and HTML
5.0 is published, won't that page continue to be read by browsers as
HTML 4.01?

(I used to think XHTML was insane. I'm on the verge of changing my
mind because of this :-)


Re: Semicolon after entities

Lachlan Hunt

Mike S wrote:
> How can making a new version of a standard break existing pages? If an
> existing page uses, for example, HTML 4.01 as its DOCTYPE, and HTML
> 5.0 is published, won't that page continue to be read by browsers as
> HTML 4.01?

No, ideally browsers would handle all revisions of HTML the same.
Although browsers have quirks/standards mode, that is based upon the
DOCTYPE used, if any, not the version of the language.

The HTML5 spec is attempting to define how to handle all HTML now and in
the future.  With the unfortunate exception of IE, browsers will not be
adding additional DOCTYPE sniffing to distinguish between HTML5 and
other revisions.

--
Lachlan Hunt
http://lachy.id.au/


Re: Semicolon after entities

Philip TAYLOR-3






Lachlan Hunt wrote:

> The HTML5 spec is attempting to define how to handle all HTML now and in
> the future.  With the unfortunate exception of IE, browsers will not be
> adding additional DOCTYPE sniffing to distinguish between HTML5 and other
> revisions.

That is, I think, at the very centre of this debate/argument/w-h-y, although
this is the first explicit mention that I have seen.  Web Apps 1 (I avoid
calling it HTML5, since there is by no means universal agreement that
Web Apps 1 should become HTML5) appears to be defining (amongst other
things) a processing model that will allow all HTML pages to be
processed in the same way (including an attempt to define the behaviour
if a document is ill-formed).  What I believe is really needed is
about as diametrically opposed to this as can be imagined : a processing
model which varies with the DOCTYPE.  I have little objection to it
defining a processing model which treats HTML 3.2 and earlier as tag
soup.  HTML 4.0 was a mistake, HTML 4.01 corrected the error and -- if
it had been properly used in the wild -- could have been parsed and
processed more rigorously : as it is, there is such a corpus of
ill-formed legacy documents that one has little choice but to once
again allow the tag-soup model.

But HTML5 should be different.  This is surely the time at which to
say "enough is enough" : either a document is well-formed (in which
case its processing is well-defined) or it is not, in which case
the browser can process it as it will.  There is <shout>no need</>
for all browsers to handle something that /alleges/ to be HTML5
consistently if the document is defective (poorly formed).  Indeed,
if browsers /do/ vary wildly in their treatment of ill-formed
HTML5 documents, there will be far greater pressure on /hoi polloi/
to write good, well-formed, HTML5 if they wish their offerings to
be seen consistently.  Thus, IMHO, HTML5 can be processed quite
differently to earlier, legacy, DTDs and it should be quite correct
for a conforming browser to switch processing models (from "lax"
to "strict") when an HTML5 DTD is detected.

To summarise, I think the following statement, taken from Web Apps 1,
is fundamentally flawed and requires radical re-thinking if sanity
is to prevail :

        8.1.1. The DOCTYPE

        A DOCTYPE is a mostly useless, but required, header.

        DOCTYPEs are required for legacy reasons. When omitted,
        browsers tend to use a different rendering mode that is
        incompatible with some specifications. Including the
        DOCTYPE in a document ensures that the browser makes
        a best-effort attempt at following the relevant specifications.

I would re-cast this along the lines of the following :

        8.1.1. The DOCTYPE

        A DOCTYPE is a much abused, but required, header.

        Until the introduction of HTML5, DOCTYPEs have -- in the
        main -- been mere eye-candy at the start of a putative
        HTML document.  With the introduction of HTML5, the
        DOCTYPE plays a vital role in determining the processing
        model for HTML documents.  If a well-formed HTML5 DOCTYPE
        is found (in the syntactically correct position), a
        conforming browser is REQUIRED to adopt the strict processing
        model described elsewhere in this specification.  If such
        a DOCTYPE is NOT found (or is found but in a position where
        its semantics are undefined), then a conforming browser is
        entitled to adopt any processing model that it deems fit.
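
To make the proposed switch concrete, here is a small sketch in Python. Everything in it is an assumption made for illustration: the function name, the "strict" and "lax" labels taken from the wording above, and the use of `<!DOCTYPE html>` at the very start of the document as the test for "a well-formed HTML5 DOCTYPE ... in the syntactically correct position". It is not taken from any specification or browser.

```python
import re

# Assumption: "<!DOCTYPE html>" (case-insensitive, optionally preceded by
# whitespace) stands in for "a well-formed HTML5 DOCTYPE".
_HTML5_DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)


def choose_processing_model(document):
    """Pick a processing model from the start of the document text."""
    if _HTML5_DOCTYPE.match(document):
        return "strict"  # well-defined processing required
    return "lax"         # browser may process the document as it sees fit


print(choose_processing_model("<!DOCTYPE html>\n<p>Hello"))
# prints: strict

html4 = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">\n<p>Hello'
print(choose_processing_model(html4))
# prints: lax

print(choose_processing_model("<p>No DOCTYPE at all"))
# prints: lax
```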

Philip Taylor


Re: Semicolon after entities

Patrick H. Lauke

Quoting "Philip Taylor (Webmaster)" <[hidden email]>:

> Until the introduction of HTML5, DOCTYPEs have -- in the
> main -- been mere eye-candy at the start of a putative
> HTML document. With the introduction of HTML5, the
> DOCTYPE plays a vital role in determining the processing
> model for HTML documents.

Mere eye-candy for most cases, perhaps, but doctype switching has  
certainly been one of the main factors that have allowed modern css  
layouts etc to be used in an almost sane way in modern browsers,  
without breaking support for the display of legacy (usually  
doctype-less) sites.

P
--
Patrick H. Lauke
______________________________________________________________
re·dux (adj.): brought back; returned. used postpositively
[latin : re-, re- + dux, leader; see duke.]
www.splintered.co.uk | www.photographia.co.uk
http://redux.deviantart.com
______________________________________________________________
Co-lead, Web Standards Project (WaSP) Accessibility Task Force
http://webstandards.org/
______________________________________________________________
Take it to the streets ... join the WaSP Street Team
http://streetteam.webstandards.org/
______________________________________________________________


Re: Semicolon after entities

Nicholas Shanks
In reply to this post by Philip TAYLOR-3
We wouldn't be having any of these kinds of arguments if the market  
leading browser enforced document conformance (not just well-
formedness) upon web developers.

"Sorry, the website at web.example.com could not be displayed because  
the developers are incompetent.

Error on line 30:  invalid type value ‘combobox’ for input element."

If website owners saw that message every time they opened their site  
in the browser that 80% of their customers are using, they'd sure as  
hell fix it sharpish.

Microsoft need only do this for HTML 5 documents and leave HTML 4 and  
earlier in not-very-strict or quirks mode as appropriate. I do not  
believe it will hinder their market share at all, since developers  
who cannot cope with obeying rules will still have tag soup mode to  
retreat to. So whilst it may impair HTML 5 take-up in the short term,  
it will improve both the health and interoperability of the web and  
also the job of those creating HTML 6 in a few years' time. HTML 5 in  
this scenario would still prevail through brute force, especially as  
more new features are added that sites simply must have to stay  
'cool', and will not go the way of XHTML 2.

I believe versioning of documents is superior to non-versioned  
document formats, as does the overwhelming majority of the computer  
industry. Mark-up languages should be no different. Whilst forwards  
compatibility (and thus lack of requirement for versioning) is a  
laudable goal, I offer the history of HTML as proof that it cannot
always be achieved and, moreover, that it cannot be retroactively
applied to an extant format.

If anyone can present a case for not explicitly versioning documents  
when different versions of the format for those documents are  
INCOMPATIBLE (as is the case with HTML), please present it. I have  
yet to see any such argument that has merit.


On 26 Apr 2007, at 09:41, Philip & Le Khanh wrote:

> Lachlan Hunt wrote:
>
>> The HTML5 spec is attempting to define how to handle all HTML now
>> and in the future.  With the unfortunate exception of IE, browsers
>> will not be adding additional DOCTYPE sniffing to distinguish
>> between HTML5 and other revisions.
>
> That is, I think at the very centre of this debate/argument/w-h-y, although
> this is the first explicit mention that I have seen.  Web Apps 1 (I avoid
> calling it HTML5, since there is by no means universal agreement that
> Web Apps 1 should become HTML5) appears to be defining (amongst other
> things) a processing model that will allow all HTML pages to be
> processed in the same way (including an attempt to define the behaviour
> if a document is ill-formed).  What I believe is really needed is
> about as diametrically opposed to this as can be imagined : a processing
> model which varies with the DOCTYPE.  I have little objection to it
> defining a processing model which treats HTML 3.2 and earlier as tag
> soup.  HTML 4.0 was a mistake, HTML 4.01 corrected the error and -- if
> it had been properly used in the wild -- could have been parsed and
> processed more rigorously : as it is, there is such a corpus of
> ill-formed legacy documents that one has little choice but to once
> again allow the tag-soup model.
>
> But HTML5 should be different.  This is surely the time at which to
> say "enough is enough" : either a document is well-formed (in which
> case its processing is well-defined) or it is not, in which case
> the browser can process it as it will.  There is <shout>no need</>
> for all browsers to handle something that /alleges/ to be HTML5
> consistently if the document is defective (poorly formed).  Indeed,
> if browsers /do/ vary wildly in their treatment of ill-formed
> HTML5 documents, there will be far greater pressure on /hoi polloi/
> to write good, well-formed, HTML5 if they wish their offerings to
> be seen consistently.  Thus, IMHO, HTML5 can be processed quite
> differently to earlier, legacy, DTDs and it should be quite correct
> for a conforming browser to switch processing models (from "lax"
> to "strict") when an HTML5 DTD is detected.
>
> To summarise, I think the following statement, taken from Web Apps 1,
> is fundamentally flawed and requires radical thinking if sanity
> is to prevail :
>
> 8.1.1. The DOCTYPE
>
> A DOCTYPE is a mostly useless, but required, header.
>
> DOCTYPEs are required for legacy reasons. When omitted,
> browsers tend to use a different rendering mode that is
> incompatible with some specifications. Including the
> DOCTYPE in a document ensures that the browser makes
> a best-effort attempt at following the relevant specifications.
>
> I would re-cast this along the lines of the following :
>
> 8.1.1. The DOCTYPE
>
> A DOCTYPE is a much abused, but required, header.
>
> Until the introduction of HTML5, DOCTYPEs have -- in the
> main -- been mere eye-candy at the start of a putative
> HTML document.  With the introduction of HTML5, the
> DOCTYPE plays a vital role in determining the processing
> model for HTML documents.  If a well-formed HTML5 DOCTYPE
> is found (in the syntactically correct position), a
> conforming browser is REQUIRED to adopt the strict processing
> model described elsewhere in this specification.  If such
> a DOCTYPE is NOT found (or is found but in a position where
> its semantics are undefined), then a conforming browser is
> entitled to adopt any processing model that it deems fit.
>
> Philip Taylor
>
>
- Nicholas.


