markup spec [was: Re: Should we Publish a Language Specification?]

Jim Jewett

Julian wrote:

> I agree that the language HTML5 should have a singular normative
> definition. I'd prefer it not to be the same document that describes all
> the rest.

I'll go farther and say that is
such a good start that I'm ready to start commenting on it.  Some of
these comments would apply to the original spec as well, but I kept
getting sort of lost there, because of the size.

Section 2, Terminology:

Should "case-insensitive" be "ASCII-case-insensitive", which is the
term used in Section 3.6, Attributes?
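
To illustrate why the more specific term matters, here is a minimal Python sketch. The helper names are hypothetical and not taken from the spec; the point is only that a comparison which folds A-Z alone can disagree with one that uses full Unicode case mapping.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only A-Z; leave every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def ascii_case_insensitive_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring case for ASCII letters only."""
    return ascii_lower(a) == ascii_lower(b)


# U+212A KELVIN SIGN lowercases to "k" under full Unicode case mapping,
# but it is not an ASCII letter, so an ASCII-only comparison keeps it distinct.
KELVIN_SIGN = "\u212a"

print(ascii_case_insensitive_equal("CHARSET", "charset"))  # True
print(KELVIN_SIGN.lower() == "k")                           # True
print(ascii_case_insensitive_equal(KELVIN_SIGN, "k"))       # False
```

If the terminology section says only "case-insensitive", a reader cannot tell which of these two behaviours is meant.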

Should "space characters" be called "spacing characters", to
distinguish them from the specific character named SPACE?  Should it
be called out explicitly that these are only a subset of the Unicode
characters having the White_Space property?
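
A rough sketch of the subset relationship, in Python. Two assumptions: the set below is the five-character list (SPACE, TAB, LF, FF, CR) as I understand the draft to define "space characters", and Python's `str.isspace()` is used as a close but not exact stand-in for the Unicode White_Space property.

```python
# Assumed definition of the spec's "space characters":
# U+0020 SPACE, U+0009 TAB, U+000A LF, U+000C FF, U+000D CR.
HTML_SPACE_CHARACTERS = frozenset(" \t\n\f\r")


def is_html_space(ch: str) -> bool:
    """True if ch is one of the assumed HTML 'space characters'."""
    return ch in HTML_SPACE_CHARACTERS


candidates = {
    "SPACE": " ",
    "TAB": "\t",
    "LINE TABULATION": "\x0b",
    "NO-BREAK SPACE": "\u00a0",
    "EM SPACE": "\u2003",
}

for name, ch in candidates.items():
    # str.isspace() only approximates the White_Space property.
    print(f"{name:16} html={is_html_space(ch)!s:5} isspace={ch.isspace()}")
```

Under those assumptions, NO-BREAK SPACE and EM SPACE count as whitespace for Unicode purposes but are not "space characters", which is the distinction worth calling out.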

Section 3, Syntax:

"an XML parser" should probably be in a dfn tag, if an HTML parser is.
(Unless you are intentionally delegating that definition ... but then
be explicit.  And the HTML parser should probably also be delegated to
the error-correction document.)

General, but first noticed in Section 3.1:

Should "MUST", "MUST NOT", "SHOULD", etc. be capitalized, as in other
recent specs?

Section 3.4, Character Encoding

This should not always assume HTTP, so change

        "... and if its encoding is not explicitly given by Content-Type metadata,"

to

        "... and if its encoding is not explicitly given by a higher-level
        protocol, such as the HTTP Content-Type header,"

I couldn't quite make sense of the "or ... " clause for the meta
element.  My suggestion is to change

        then the encoding must be specified using a meta element with a
        charset attribute or a meta element in the Encoding declaration state.

to

        then the encoding must be specified using a meta element with a
        charset attribute.
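
The precedence suggested above (a higher-level protocol first, then a meta element with a charset attribute) could be sketched roughly as follows. This is only an illustration under stated assumptions: `declared_encoding` and `_MetaCharsetFinder` are hypothetical names, the markup is assumed to be already decoded to a string, and Python's standard `html.parser` stands in for a real HTML parser.

```python
from html.parser import HTMLParser
from typing import Optional


class _MetaCharsetFinder(HTMLParser):
    """Records the first charset attribute seen on a meta element."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "meta" and self.charset is None:
            value = dict(attrs).get("charset")
            if value:
                self.charset = value.strip().lower()


def declared_encoding(protocol_charset: Optional[str], markup: str) -> Optional[str]:
    """Prefer the higher-level protocol's charset; otherwise use <meta charset>."""
    if protocol_charset:
        return protocol_charset.strip().lower()
    finder = _MetaCharsetFinder()
    finder.feed(markup)
    finder.close()
    return finder.charset


print(declared_encoding("ISO-8859-1", '<meta charset="utf-8">'))
print(declared_encoding(None, '<!DOCTYPE html><meta charset="UTF-8"><title>x</title>'))
print(declared_encoding(None, "<p>no declaration</p>"))
```

Nothing here depends on HTTP specifically: the first argument could come from any transport that carries encoding metadata.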

Section 3.5, Elements

Change

        Attributes may be separated from each other

to

        Attributes MUST be separated from each other

Section 3.5, Rule 6 implies that elements which *could* have content
cannot be self-closing.  Therefore, <div /> is illegal.  That is OK
with me, but it is worth being explicit.
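
For what it's worth, the consequence is easy to check mechanically. The sketch below is an assumption-laden illustration, not the spec's algorithm: the list of void (always-empty) elements is taken from later HTML drafts and may not match this draft exactly, and `illegal_self_closing` is a hypothetical helper built on Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Assumed list of elements that can never have content (from later HTML drafts).
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "param", "source", "track", "wbr",
})


class SelfClosingChecker(HTMLParser):
    """Collects elements written as <x /> that are not void elements."""

    def __init__(self) -> None:
        super().__init__()
        self.offenders = []

    def handle_startendtag(self, tag, attrs):
        # Called only for tags written with the trailing "/>" syntax.
        if tag not in VOID_ELEMENTS:
            self.offenders.append(tag)


def illegal_self_closing(markup: str) -> list:
    """Return the names of non-void elements that use self-closing syntax."""
    checker = SelfClosingChecker()
    checker.feed(markup)
    checker.close()
    return checker.offenders


print(illegal_self_closing("<div /><br /><p>text</p>"))  # ['div']
```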

Section 3.6, Attributes

Is the Attribute Names rule correct?  It seems to imply that a
semi-colon is a legitimate attribute name.  If so, should there at
least be a SHOULD on using XML-compatible names?
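
A small Python sketch of the gap I mean. Both rules below are my own approximations and should be read as assumptions: the first forbids only space characters, NULL, quotes, ">", "/", and "=" (my reading of the Attribute Names rule), and the second is an ASCII-only simplification of the XML Name production.

```python
import re

# Assumed reading of the Attribute Names rule: anything goes except these.
FORBIDDEN_IN_HTML_ATTR_NAME = frozenset(" \t\n\f\r\x00\"'>/=")

# ASCII-only simplification of the XML Name production (an approximation).
XML_NAME_ASCII = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def html_attr_name_ok(name: str) -> bool:
    """True if name is non-empty and contains no forbidden character."""
    return bool(name) and not (set(name) & FORBIDDEN_IN_HTML_ATTR_NAME)


def xml_compatible_name(name: str) -> bool:
    """True if name matches the simplified XML Name pattern."""
    return XML_NAME_ASCII.fullmatch(name) is not None


for name in ["title", "data-x", ";", "a=b"]:
    print(f"{name!r:10} html={html_attr_name_ok(name)} xml={xml_compatible_name(name)}")
```

Under these assumptions ";" passes the first check and fails the second, which is the case a SHOULD would cover.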

Section 3.7, Text.

Why the extra work to ensure that <!--> is a valid escaping text span?
(Similar question on comments.)  I understand that it is an edge case
which the parser needs to handle, but is there a reason to have such
an empty text span be valid?

Section 3.8, character references.

Why are non-ambiguous ampersands allowed?  Are they useful enough to
justify the extra complexity?  (Maybe... but I'm not sure.  To me, the
fact that &< is OK just makes the rules seem arbitrary.)
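
To show what the rule looks like in practice, here is a sketch in Python. It assumes my reading of the draft: a bare ampersand is ambiguous when it is followed by anything other than a space character, "<", or another "&". The function name is hypothetical, the end-of-input case is my own guess, and the caller is assumed to have already ruled out real character references such as `&amp;`.

```python
HTML_SPACE_CHARACTERS = " \t\n\f\r"


def is_ambiguous_ampersand(text: str, index: int) -> bool:
    """Classify the bare ampersand at text[index] under the assumed rule."""
    if text[index] != "&":
        raise ValueError("index does not point at an ampersand")
    following = text[index + 1 : index + 2]
    if following == "":
        # Assumption: an ampersand at the very end of input is not ambiguous.
        return False
    return following not in HTML_SPACE_CHARACTERS and following not in "<&"


print(is_ambiguous_ampersand("AT & T", 3))   # False: followed by a space
print(is_ambiguous_ampersand("a &< b", 2))   # False: followed by "<"
print(is_ambiguous_ampersand("AT&T", 2))     # True: followed by a letter
```

Three special cases for one character is the extra complexity I am asking about.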

Section 4, the HTML elements

The assertions sections are very useful.

Element a:

I think a.elem.phrase is a strict subset of a.elem.prose, so it might
be worth adding a short note explaining the difference.  (Even if that
does violate the separation of concerns... but I think it doesn't.  I
think the difference is that using prose makes the tag itself a
block-level element instead of a phrase-level element.)

It is probably worth adding a classification subsection to each
element.  For example, a is interactive, and can be either phrasing or

Element abbr:

There is a stray ` character after the name Philip in the example --
this seems to be copied straight from a similar typo in the full spec.

Element acronym:

Should this just be dropped from the valid markup spec, and included
only in the parsing-and-error-correction spec?  At the very least, it
should say "Use the abbr element instead."

Element address:

Should there be an invalid example that is still an address, but just
not a contact address, such as

My Dad lives at <address>123 Memory Lane</address>?

Element area:

Needs some cleanup from the conversion, regarding the various states
that coords would represent.

Element canvas:

I think most of this could be left in the processing document.  Just
list the two attributes, and their default values.  Maybe specify that
the coordinate space is in abstract units, which may not correspond to
pixels, picas, or ex.  Say it is typically used with scripting, but
maybe specify the default/initial appearance when no script is run.

Element col:

"If a col element has a parent ..."

What does it represent otherwise?  (The current spec doesn't say either.)

Maybe for the valid markup, just reword it to show proper usage.

"A col element represents one (or more) columns within its parent
colgroup element."

Element colgroup:

Similar issue to col.  Just drop the ", if it has a parent and that is
a table element."

(I think I'll stop with the "c" elements for today.)