The advent of HTML5/XHTML5 has made documents with a DOCTYPE but without a DTD popular. However, some XML tools report validation constraint errors for documents with the HTML5 doctype. This happens because the very HTML5 DOCTYPEs apparently cause some tools to dip into DTD validation mode - and subsequently report all elements and attributes as an error, since none of them are defined in the (non-existent) DTD. This may happen even if the tool supports more useful means of conformance checking, such as XSD schemas. Thus, it happens even though it would have been more fruitful to go into e.g. XSD conformance checking mode (or simply just check well-formedness).

When trying to discuss this behavior with XML tool developers, it would be helpful to have an authoritative statement to point to.

Therefore, my proposal is to extract rules or guidance for what to do when the DOCTYPE declaration points to no markup declaration and place this into the 6th edition of XML. (Or to put it differently: define what to do when the DOCTYPE lacks an internal or external DTD.)

XML 1.0 fifth edition says:

“[Definition: An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.]”

Question: But which constraints does a document type declaration without an internal or external DTD express?

Answer: "no restriction", because document type declarations are defined to contain markup declarations, something which neither of the two HTML5 doctypes (<!DOCTYPE html SYSTEM "about:legacy-compat"> and <!DOCTYPE html>) contains.

Simply put, since the HTML5 doctypes contain no “element type declaration, an attribute-list declaration, an entity declaration, or a notation declaration”, they should not be seen as markup declarations, from a validating XML processor’s point of view:

“[Definition: The XML document type declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as a document type definition, or DTD. The document type declaration can point to an external subset (a special kind of external entity) containing markup declarations, or can contain the markup declarations directly in an internal subset, or can do both. The DTD for a document consists of both subsets taken together.]”

“[Definition: A markup declaration is an element type declaration, an attribute-list declaration, an entity declaration, or a notation declaration.] These declarations may be contained in whole or in part within parameter entities, as described in the well-formedness and validity constraints below. For further information, see 4 Physical Structures.”

By the way: the spec contains several examples of simple documents to which validity applies. It would be good to include examples of documents where the doctype does not reference a markup declaration.

I may provide verbatim spec text change proposals, if this would be useful.

--
leif halvard silli
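The distinction drawn above (a DOCTYPE that contains or points to markup declarations versus one that does neither) can be observed from the syntax alone. The following is a minimal sketch, assuming Python's standard expat binding; the helper name `describe_doctype` is hypothetical and not from any message in this thread. It only reports what the declaration syntactically carries; whether a system identifier such as "about:legacy-compat" resolves to anything is a separate question.

```python
import xml.parsers.expat


def describe_doctype(xml_text):
    """Return (name, system_id, has_internal_subset), or None if no DOCTYPE.

    Hypothetical helper for illustration: it inspects the syntax of the
    document type declaration and performs no validation at all.
    """
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat calls this once when it meets <!DOCTYPE ...>
        found.append((name, system_id, bool(has_internal_subset)))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found[0] if found else None


for text in (
    '<html/>',
    '<!DOCTYPE html><html/>',
    '<!DOCTYPE html SYSTEM "about:legacy-compat"><html/>',
    '<!DOCTYPE html []><html/>',
):
    print(text, '->', describe_doctype(text))
```

A tool could use information of this kind to decide whether there is any DTD to validate against before entering DTD validation mode; that policy is an assumption of this sketch, not something the XML Recommendation prescribes.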
On 19.1.2014 21:29, Leif Halvard Silli wrote:
> Therefore, my proposal is to extract rules or guidance for what to do
> when the DOCTYPE declaration points to no markup declaration and place
> this into the 6th edition of XML. (Or to put it differently: define
> what to do when the DOCTYPE lacks an internal or external DTD.)

I don't think this makes sense. Whether validation is done is decided not by the document itself, but by the processor you use -- in terms of the XML 1.0 spec you can use a validating or a non-validating processor. (http://www.w3.org/TR/REC-xml/#proc-types)

If some tool triggers validating mode on encountering <!DOCTYPE> then I suggest approaching the developers of that tool and asking for an option that allows control of this behaviour.

I don't think that the behaviour you describe is generic or implied by statements in the XML 1.0 spec.

Jirka

--
Jirka Kosek    e-mail: [hidden email]    http://xmlguru.cz
Professional XML consulting and training services
DocBook customization, custom XSLT/XSL-FO document processing
OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 rep.
Bringing you XML Prague conference    http://xmlprague.cz
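The point that validation is a property of the processor rather than of the document can be illustrated with a non-validating parser. This is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` (which is built on expat, a non-validating parser); the names `DOCUMENTS` and `root_names` are hypothetical. All three documents from this thread are checked for well-formedness only, and the DOCTYPE triggers no validity errors.

```python
import xml.etree.ElementTree as ET

# The three variants discussed in this thread.
DOCUMENTS = [
    '<html/>',
    '<!DOCTYPE html><html/>',
    '<!DOCTYPE html SYSTEM "about:legacy-compat"><html/>',
]


def root_names(documents):
    """Parse each document without validation and return its root element name."""
    return [ET.fromstring(text).tag for text in documents]


print(root_names(DOCUMENTS))
```

A document that is not well-formed would still raise `ET.ParseError` here; only DTD validity goes unchecked.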
Jirka Kosek, Tue, 21 Jan 2014 15:17:08 +0100:
> On 19.1.2014 21:29, Leif Halvard Silli wrote:
>> Therefore, my proposal is to extract rules or guidance for what to do
>> when the DOCTYPE declaration points to no markup declaration and place
>> this into the 6th edition of XML. (Or to put it differently: define
>> what to do when the DOCTYPE lacks an internal or external DTD.)
>
> I don't think this makes sense. Whether validation is done is decided
> not by the document itself, but by the processor you use -- in terms of
> the XML 1.0 spec you can use a validating or a non-validating processor.
> (http://www.w3.org/TR/REC-xml/#proc-types)
>
> If some tool triggers validating mode on encountering <!DOCTYPE> then I
> suggest approaching the developers of that tool and asking for an option
> that allows control of this behaviour.
>
> I don't think that the behaviour you describe is generic or implied by
> statements in the XML 1.0 spec.

But they cannot report validity errors when they lack anything to validate against.

The behavior of xmllint is OK: When it fails to find a DTD, it reports that the *process* known as validation failed: “validity error : Validation failed: no DTD found !” (even if I think it could delete the phrase "validity error").

However, I have another XML tool which, in the face of the HTML5 doctype, reports an error for every single element or attribute the document contains. And btw, that same tool shows a behavior similar to that of xmllint if I use the SYSTEM variant of the HTML5 doctype - <!DOCTYPE html SYSTEM "about:legacy-compat">.

A document that lacks a DTD is simply “not valid” <http://www.w3.org/TR/REC-xml/#sec-prolog-dtd>. And, as not valid, whether it has validation errors is a question that is out of the question.

Leif Halvard Silli
Regarding http://www.w3.org/TR/REC-xml/#proc-types: it explains well on what basis to report errors:

]]
[Definition: [ … snip … ] To accomplish this, validating XML processors MUST read and process the entire DTD and all external parsed entities referenced in the document.
[[

Leif Halvard Silli

Leif Halvard Silli, Tue, 21 Jan 2014 18:30:39 +0100:
> Jirka Kosek, Tue, 21 Jan 2014 15:17:08 +0100:
>> On 19.1.2014 21:29, Leif Halvard Silli wrote:
>>> Therefore, my proposal is to extract rules or guidance for what to do
>>> when the DOCTYPE declaration points to no markup declaration and place
>>> this into the 6th edition of XML. (Or to put it differently: define
>>> what to do when the DOCTYPE lacks an internal or external DTD.)
>>
>> I don't think this makes sense. Whether validation is done is decided
>> not by the document itself, but by the processor you use -- in terms of
>> the XML 1.0 spec you can use a validating or a non-validating processor.
>> (http://www.w3.org/TR/REC-xml/#proc-types)
>>
>> If some tool triggers validating mode on encountering <!DOCTYPE> then I
>> suggest approaching the developers of that tool and asking for an option
>> that allows control of this behaviour.
>>
>> I don't think that the behaviour you describe is generic or implied by
>> statements in the XML 1.0 spec.
>
> But they cannot report validity errors when they lack anything to
> validate against.
>
> The behavior of xmllint is OK: When it fails to find a DTD, it reports
> that the *process* known as validation failed: “validity error :
> Validation failed: no DTD found !” (even if I think it could delete the
> phrase "validity error").
>
> However, I have another XML tool which, in the face of the HTML5 doctype,
> reports an error for every single element or attribute the document
> contains. And btw, that same tool shows a behavior similar to that of
> xmllint if I use the SYSTEM variant of the HTML5 doctype - <!DOCTYPE
> html SYSTEM "about:legacy-compat">.
>
> A document that lacks a DTD is simply “not valid”
> <http://www.w3.org/TR/REC-xml/#sec-prolog-dtd>. And, as not valid,
> whether it has validation errors is a question that is out of the
> question.
>
> Leif Halvard Silli
In reply to this post by Leif Halvard Silli-4
Leif Halvard Silli writes:
> A document that lacks a DTD is simply “not valid”
> <http://www.w3.org/TR/REC-xml/#sec-prolog-dtd>. And, as not valid,
> whether it has validation errors is a question that is out of the
> question.

I presume you're referring here to these lines near the beginning:

  [Definition: XML documents SHOULD begin with an XML declaration
  which specifies the version of XML being used.] For example, the
  following is a complete XML document, _well-formed_ but not _valid_:

    <?xml version="1.0"?>
    <greeting>Hello, world!</greeting>

  and so is this:

    <greeting>Hello, world!</greeting>

  [emphasis in original]

It's not *valid*, but it's not *invalid* either:

  XML provides a mechanism, the document type declaration, to define
  constraints on the logical structure and to support the use of
  predefined storage units. [Definition: An XML document is *valid* if
  it has an associated document type declaration and if the document
  complies with the constraints expressed in it.]

Each of your examples, i.e.

  <!DOCTYPE html>
  <html/>

and

  <!DOCTYPE html SYSTEM "about:legacy-compat">
  <html/>

clearly does have an "associated document type declaration", and equally clearly contains "failures to fulfill the validity constraints given in this specification" [1], so I conclude they are not only not valid, but invalid (although that, interestingly, is not a term defined in the spec. What we find at [1] is an obligation on *validating processors* to _report_ "failures to fulfill the validity constraints given in this specification".)

The validity constraint they both fail to fulfill is VC: Element Valid [2], which requires a declaration for every element in a document.

It's unfortunate that the definition of *valid* is less explicit than the definition of conforming validating processor, but my guess is that the way the Core WG is most likely to fix that is by making the definition of *valid* stronger, not by making the Conformance section weaker.

It would be possible to expand the definition of *validating processors* to be clearer about their responsibilities in the absence of a document type declaration, and that might be a good idea.

It would also probably be a good idea to clarify that as things stand

  <!DOCTYPE html>
  <html/>

is, using the usual convention, _invalid_, where

  <html/>

is neither valid _nor_ invalid, and to provide a definition of 'invalid' as "given a document type declaration, violating one or more of the constraints expressed by the declarations in the DTD, and failing to fulfill one or more of the validity constraints given in this specification".

But to take account of the behaviour you cite of xmllint, likewise of rxp, (which treat the two cases above, and the even simpler

  <html/>

case, all as instances of an idiosyncratic validity error w/o precedent in the XML spec.), we would have to define what it meant to have an _empty_ document type declaration, which would be rather more difficult, and potentially backward incompatible.

Consider, for example

  <!DOCTYPE html []>
  <html/>

which causes both to report the 'ordinary' undeclared element error, but xmllint to complain of a missing DTD.

Note also that

  <!DOCTYPE html>
  <hmtl/>

_is_ invalid, and we wouldn't want to lose that. . .

ht

[1] http://www.w3.org/TR/REC-xml/#sec-conformance
[2] http://www.w3.org/TR/REC-xml/#elementvalid

--
Henry S. Thompson, School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: [hidden email]
URL: http://www.ltg.ed.ac.uk/~ht/
[mail from me _always_ has a .sig like this -- mail without it is forged spam]
In reply to this post by Leif Halvard Silli-4
[Some content of the original comment has been elided
and/or rearranged below.]

On 2014-01-19 14:29, Leif Halvard Silli wrote:
> Clarify that documents with DOCTYPE but without markup
> declaration are not subject to validation
>
> . . .
> XML 1.0 fifth edition says:
>
> “[Definition: An XML document is valid if it has an associated
> document type declaration and if the document complies with
> the constraints expressed in it.]”
>
> Question: But which constraints does a document type declaration
> without an internal or external DTD express?
>
> . . .
>
> [S]ome XML tools report validation constraint errors for
> documents with the HTML5 doctype. This happens because the very
> HTML5 DOCTYPEs apparently cause some tools to dip into DTD
> validation mode - and subsequently report all elements and
> attributes as an error, since none of them are defined in
> the (non-existent) DTD.
>
> When trying to discuss this behavior with XML tool developers, it
> would be helpful to have an authoritative statement to point to.
>
> Therefore, my proposal is to extract rules or guidance for what
> to do when the DOCTYPE declaration points to no markup declaration
> and place this into the 6th edition of XML. (Or to put it differently:
> define what to do when the DOCTYPE lacks an internal or external DTD.)
>

At [1] we have:

  Definition: An XML document is valid if it has an associated
  document type declaration and if the document complies with
  the constraints expressed in it.

At [2] we have:

  validity constraint

  [Definition: A rule which applies to all valid XML documents.
  Violations of validity constraints are errors; they MUST, at
  user option, be reported by validating XML processors.]

As indicated above, a document is not valid if it violates a validity constraint. Perhaps that could be made clearer in the definition of "valid" at [1]. But given that fact, and given the "Element Valid" validity constraint at [3], and the "Attribute Value Type" validity constraint at [4], a document containing any element or attribute for which there is no declaration in the associated DTD is not valid.

Put another way, one of the constraints a DTD puts on a document (for the document to be considered valid) is that the document must not contain any element or attribute that is not declared in the DTD. So a DTD that declares no elements or attributes constrains the document to have no elements or attributes to be considered valid (and such a document would not have a root element and would therefore not be valid).

As far as "documents with DOCTYPE but without markup declaration are not subject to validation", the XML spec has no concept of "subject to validation". That is a tool issue. Per section 5.1 Validating and Non-Validating Processors [5]:

  Conforming XML processors fall into two classes: validating
  and non-validating.

Nowhere does the spec say that anything in the document (e.g., a doctype declaration) forces use of a validating processor.

HTML5 can make its own rules about how a tool should process documents. Admittedly, if a tool is using an XML processor to process an HTML5 document, it should probably not use validation mode, but that is not something for the XML spec to address.

The XML Core WG will consider issuing an erratum that augments the definition of valid at [1] to read something like:

  Definition: An XML document is valid if it has an associated
  document type declaration and if the document complies with
  the constraints expressed in it and the document violates no
  validity constraints.

We might also add a sentence to the first paragraph of the Conformance section at [5] so that that paragraph would then read something like:

  Conforming XML processors fall into two classes: validating
  and non-validating. The determination of which kind of
  processor to use for a given document is outside the scope
  of this Recommendation.

We realize this still leaves unanswered the issue of how to decide if a document should be "subject to validation". At the present time at least, that issue is not addressed by the XML Recommendation.

Paul Grosso
for the XML Core WG

[1] http://www.w3.org/TR/REC-xml/#dt-valid
[2] http://www.w3.org/TR/REC-xml/#dt-vc
[3] http://www.w3.org/TR/REC-xml/#elementvalid
[4] http://www.w3.org/TR/REC-xml/#ValueType
[5] http://www.w3.org/TR/REC-xml/#proc-types
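The reasoning that a DTD declaring no elements leaves every element undeclared can be sketched in a few lines. This is a toy illustration only, assuming Python's standard `xml.etree.ElementTree`; the function `undeclared_elements` and its `declared_names` parameter are hypothetical stand-ins for the element declarations a real DTD would supply, and the sketch covers only the "every element needs a declaration" part of the Element Valid constraint, not content models or attributes.

```python
import xml.etree.ElementTree as ET


def undeclared_elements(xml_text, declared_names):
    """List, in document order, the names of elements with no declaration.

    declared_names is a hypothetical stand-in for the set of element
    types declared in the DTD; an empty set models an empty DTD.
    """
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter() if el.tag not in declared_names]


doc = '<!DOCTYPE html><html><body/></html>'
print(undeclared_elements(doc, set()))             # nothing is declared
print(undeclared_elements(doc, {'html', 'body'}))  # everything is declared
```

With an empty set of declarations every element is reported, which matches the tool behaviour described at the start of the thread.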
In reply to this post by Henry S. Thompson
Sorry for the delayed answer. See below.
Henry S. Thompson, Tue, 21 Jan 2014 18:27:29 +0000:
> Leif Halvard Silli writes:
>
>> A document that lacks a DTD is simply “not valid”
>> <http://www.w3.org/TR/REC-xml/#sec-prolog-dtd>. And, as not valid,
>> whether it has validation errors is a question that is out of the
>> question.
>
> I presume you're referring here to these lines near the beginning:
>
>   [Definition: XML documents SHOULD begin with an XML declaration
>   which specifies the version of XML being used.] For example, the
>   following is a complete XML document, _well-formed_ but not _valid_:
>
>     <?xml version="1.0"?>
>     <greeting>Hello, world!</greeting>
>
>   and so is this:
>
>     <greeting>Hello, world!</greeting>
>
>   [emphasis in original]

But it is pretty obvious - to me - that what that section wants to point out is that the *XML* declaration has nothing to do with “valid” or “not valid”. Nor has it anything to do with well-formed or not well-formed.

> It's not *valid*, but it's not *invalid* either:

What is your point here? Is there a third category, you say? What should a validating XML processor say if it parses the above document? It has always been pretty obvious - to me - that XML avoids “invalid” simply because “invalid” has so many negative - and wrong - connotations. “Valid” simply means “conforming to a spec [expressed via a DTD grammar]”. And “not valid” thus simply means that it does not conform to a spec expressed via a DTD grammar. Thus the document above *is* invalid, because invalid is just XML’s unspeakable synonym for “not valid”.

>   XML provides a mechanism, the document type declaration, to define
>   constraints on the logical structure and to support the use of
>   predefined storage units. [Definition: An XML document is *valid* if
>   it has an associated document type declaration and if the document
>   complies with the constraints expressed in it.]
>
> Each of your examples, i.e.
>
>   <!DOCTYPE html>
>   <html/>
> and
>   <!DOCTYPE html SYSTEM "about:legacy-compat">
>   <html/>
>
> clearly does have an "associated document type declaration", and equally
> clearly contains "failures to fulfill the validity constraints given in
> this specification" [1], so I conclude they are not only not valid,
> but invalid (although that, interestingly, is not a term defined in
> the spec.

The first validity constraint expressed in XML is that the document has a *DTD*. A grammar. A DOCTYPE without a grammar has no grammar. It is just an empty shell.

> What we find at [1] is an obligation on *validating
> processors* to _report_ "failures to fulfill the validity constraints
> given in this specification".)

What we also find is a stressing of the fact that, quote: “it is possible to construct a well-formed document containing a doctypedecl that neither points to an external subset nor contains an internal subset”. Clearly, such a document would be “well-formed” but also “not valid”.

Note how the spec here says “doctypedecl” - it refers to the formal grammar. I interpret this as if it *avoids* the term “document type declaration”. Which is logical, when we consider that the spec, a little before that quote, says (my emphasis): “The XML document type declaration **contains** or **points** to markup declarations that provide a grammar for a class of documents”. Something which none of my examples contains. (No, about:legacy-compat is a URL that points to nowhere, thus there is not any empty DTD file anywhere.)

> The validity constraint they both fail to fulfill is VC: Element Valid [2],
> which requires a declaration for every element in a document.

That requirement is not met by “<greeting>Hello, world!</greeting>” either.

> It's unfortunate that the definition of *valid* is less explicit than
> the definition of conforming validating processor, but my guess is
> that the way the Core WG is most likely to fix that is by making the
> definition of *valid* stronger, not by making the Conformance section
> weaker.

I have not suggested making the conformance section weaker. My understanding is that you seek to insert a third category, while the XML spec has always had only two categories.

> It would be possible to expand the definition of *validating
> processors* to be clearer about their responsibilities in the absence
> of a document type declaration, and that might be a good idea.
>
> It would also probably be a good idea to clarify that as things stand
>
>   <!DOCTYPE html>
>   <html/>
>
> is, using the usual convention, _invalid_, where
>
>   <html/>
>
> is neither valid _nor_ invalid, and to provide a definition of
> 'invalid' as "given a document type declaration, violating one or more
> of the constraints expressed by the declarations in the DTD, and
> failing to fulfill one or more of the validity constraints given in
> this specification".

If so, then my message would seem to have resulted in the opposite of my intention. What is the benefit of this proposal of yours? I see none. It would only seem to strengthen the belief that it is correct to use an empty DOCTYPE declaration as a trigger to start XML 1.0 validating processor mode.

Because, in my case, I have a tool which supports both XSD and DTD. XSD mode can be triggered by the very presence of an XHTML namespace declaration. However, as soon as my tool notices the HTML5 doctype (the short variant), it disables its XSD feature and starts its validation mode.

> But to take account of the behaviour you cite of xmllint,
> likewise of rxp,
> (which treat the two cases above, and the even simpler
>   <html/>
> case, all as instances of an idiosyncratic validity error w/o
> precedent in the XML spec.), we would have to define what it meant to
> have an _empty_ document type declaration, which would be rather more
> difficult, and potentially backward incompatible.
>
> Consider, for example
>
>   <!DOCTYPE html []>
>   <html/>
>
> which causes both to report the 'ordinary' undeclared element error, but
> xmllint to complain of a missing DTD.

Which is an OK complaint provided the user/author knows that xmllint runs in XML 1.0 validation mode!

> Note also that
>
>   <!DOCTYPE html>
>   <hmtl/>
>
> _is_ invalid, and we wouldn't want to lose that. . .

It is “not valid”. If it is invalid then it is only in the “not valid” sense. I believe I have not proposed anything that could make us lose that it is not valid.

> [1] http://www.w3.org/TR/REC-xml/#sec-conformance
> [2] http://www.w3.org/TR/REC-xml/#elementvalid

--
leif halvard silli
In reply to this post by Paul Grosso
Paul Grosso, Wed, 05 Feb 2014 11:19:49 -0600:
> [Some content of the original comment has been elided
> and/or rearranged below.]
>
> On 2014-01-19 14:29, Leif Halvard Silli wrote:

[ I deleted some text, for contraction ]

>> Question: But which constraints does a document type declaration
>> without an internal or external DTD express?

>> Therefore, my proposal is to extract rules or guidance for what
>> to do when the DOCTYPE declaration points to no markup declaration
>> and place this into the 6th edition of XML. (Or to put it differently:
>> define what to do when the DOCTYPE lacks an internal or external DTD.)

> At [1] we have:
>
>   Definition: An XML document is valid if it has an associated
>   document type declaration and if the document complies with
>   the constraints expressed in it.
>
> At [2] we have:
>
>   validity constraint
>
>   [Definition: A rule which applies to all valid XML documents.
>   Violations of validity constraints are errors; they MUST, at
>   user option, be reported by validating XML processors.]
>
> As indicated above, a document is not valid if it violates a
> validity constraint. Perhaps that could be made clearer in
> the definition of "valid" at [1]. But given that fact, and
> given the "Element Valid" validity constraint at [3], and the
> "Attribute Value Type" validity constraint at [4], a document
> containing any element or attribute for which there is no
> declaration in the associated DTD is not valid.

It sounds like you treat the DTD and the doctype declaration as one and the same thing. They are related. But a doctypedecl is not the DTD. The DTD is just a part of the doctypedecl production.

What if I send a document without a doctypedecl construct to a *validating* processor? MUST the validating processor then, at user option, report that the validity constraints are broken? When I started this reply, I meant to say that it must report broken validity constraints even then. But my answer now is that validation has two parts: 1) Check whether the particular rules regarding element content etc. defined in the DTD are fulfilled; 2) Check whether the validity constraints are fulfilled as well. Hence, if there is no DTD, there is nothing to report except “not valid”.

My claim remains, though, that *also* for documents *with* a construct that matches the doctypedecl production, the processor must locate a DTD before it can check for fulfillment of the validity constraints.

> Put another way, one of the constraints a DTD puts on a
> document

The meaning of “all valid XML documents” is crucial. In what way is that a reference to a class of documents? Does it mean “all documents with a match for the doctypedecl production”? The answer is no: “[Definition: An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.]”

So whether a document is valid per its DTD is one thing. And the additional validity constraints of XML 1.0 are another thing. However, the latter only apply if the document fulfills the former. So says XML 1.0. So it is not the DTD that places XML 1.0’s validity constraints on the document. It is the *conformance* with the DTD that adds the requirement to *also* fulfill the validity constraints.

That XML 1.0 says that a “validity constraint” applies to “all valid XML documents” may sound a little bit like a tautology. But I read this as follows: With “all valid XML documents” XML 1.0 no doubt means every document that has been *successfully* subjected to a validating XML processor (which applies the rules about what contents particular elements and attributes can have etc.). For *that* class of documents, there is one set of *additional* things the documents must fulfill, namely the validity constraints.

So there are two parts of *valid*: There are those documents that are just valid. And there are those that are valid *and* fulfill the validity constraints.

> (for the document to be considered valid) is that
> the document must not contain any element or attribute that
> is not declared in the DTD. So a DTD that declares no
> elements or attributes constrains the document to have
> no elements or attributes to be considered valid (and
> such a document would not have a root element and would
> therefore not be valid).

To me, this turns things upside down. Even documents without a doctypedecl are “constrained” to not have a DOCTYPE, a DTD or valid elements/attributes.

A doctypedecl that does not point to or contain a DTD places no restrictions on the document. Such a document fails to have “an associated document type declaration” and it can thus not comply “with the constraints expressed in it” and therefore is *not* subject to “validity constraints” any more than a document without a doctypedecl.

> As far as "documents with DOCTYPE but without markup
> declaration are not subject to validation", the XML spec has
> no concept of "subject to validation". That is a tool issue.
> Per section 5.1 Validating and Non-Validating Processors [5]:
>
>   Conforming XML processors fall into two classes: validating
>   and non-validating.
>
> Nowhere does the spec say that anything in the document (e.g.,
> a doctype declaration) forces use of a validating processor.

Right. Nevertheless, the presence of a construct that matches the “doctypedecl” production is often used as a validation trigger - something that “turns on” the validation mode. More below.

> HTML5 can make its own rules about how a tool should process
> documents. Admittedly, if a tool is using an XML processor
> to process an HTML5 document, it should probably not use
> validation mode, but that is not something for the XML spec
> to address.
>
> The XML Core WG will consider issuing an erratum that augments
> the definition of valid at [1] to read something like:
>
>   Definition: An XML document is valid if it has an associated
>   document type declaration and if the document complies with
>   the constraints expressed in it and the document violates no
>   validity constraints.
>
> We might also add a sentence to the first paragraph of the
> Conformance section at [5] so that that paragraph would
> then read something like:
>
>   Conforming XML processors fall into two classes: validating
>   and non-validating. The determination of which kind of
>   processor to use for a given document is outside the scope
>   of this Recommendation.

May I suggest that you expand that to say that the presence of a construct that matches the ‘doctypedecl’ production does not count as a “trigger” that requires XML parsers to enable validation mode?

Please consider that there can be two meanings of “trigger”/“subject to”. One is that the presence of the DOCTYPE causes the XML processor to jump into validator mode. We are in firm agreement, it seems, that the DOCTYPE is not such a trigger. And I welcome the proposed emphasis that it isn’t such a trigger.

The other meaning of trigger/subject to is where we disagree, presently. You have upheld the view that the very presence of a construct that matches the doctypedecl production allows a validating processor to check for and report broken validity constraints. If I understood correctly, your justification is that a doctypedecl without a DTD constrains the document from containing valid elements/attributes/etc. But how can a document that is clearly “not valid” be subject to validity constraints? And what “class of documents” does such a document make up? Let us keep in mind the *purpose* of issuing document type declarations, namely to contain or point “to markup declarations that provide a grammar for a class of documents”!

I insist that what should trigger a validating processor to check the validity constraints is that the doctypedecl points to or contains a non-empty DTD and that the document matches that non-empty DTD.

For URLs, we have the concept of an empty URL. For DTDs, we do not have the concept of an empty DTD. And I fail to see how an empty grammar is different from no grammar. Perhaps we can best compare it with true/false in programming languages: No grammar should always evaluate to false, and should thus prevent the validating processor from reporting validity constraint errors, as it is impossible to comply with a grammar that always evaluates to false.

> We realize this still leaves unanswered the issue of how
> to decide if a document should be "subject to validation".
> At the present time at least, that issue is not addressed
> by the XML Recommendation.
>
> Paul Grosso
> for the XML Core WG
>
>
> [1] http://www.w3.org/TR/REC-xml/#dt-valid
> [2] http://www.w3.org/TR/REC-xml/#dt-vc
> [3] http://www.w3.org/TR/REC-xml/#elementvalid
> [4] http://www.w3.org/TR/REC-xml/#ValueType
> [5] http://www.w3.org/TR/REC-xml/#proc-types

leif halvard silli
Leif Halvard Silli, Thu, 6 Feb 2014 07:56:11 +0100:
> validation has two parts: 1) Check whether the particular rules
> regarding element content etc. defined in the DTD are fulfilled;
> 2) Check whether the validity constraints are fulfilled as well.

The two parts of validation, with reference to the spec:

1) At the top level, validity constraint: [*]

]]
validity constraint

[Definition: A rule which applies to all valid XML documents.
Violations of validity constraints are errors; they MUST, at user
option, be reported by validating XML processors.]
[[

[*] http://www.w3.org/TR/REC-xml/#dt-vc

2) The word ”valid” above points to the bottom level, valid document: [*]

]]
[Definition: An XML document is valid if it has an associated document
type declaration and if the document complies with the constraints
expressed in it.]
[[

[*] http://www.w3.org/TR/REC-xml/#dt-valid

This two-layered validation makes sense to me. For example, there is
the validity constraint that the name part of the document type
declaration (e.g. ”html” in <!DOCTYPE html>) matches the name of the
root element: ”The Name in the document type declaration MUST match the
element type of the root element.”

On the surface, this sounds like a rule that is always possible to
verify. After all, what is simpler than comparing the root element’s
name and the DOCTYPE name? However, note the direction of the rule: It
is the DOCTYPE name that must be made to match the root element type.
The element type name is not required to match the DOCTYPE name.

And so, when there is no DTD, it makes no sense to report that the name
in the document type declaration does not match the name of the root
element. Because, after all, in such a situation - where there is no
document grammar available - it is impossible to say that either of
them conforms to a DTD. (And, btw, this also illustrates that
constructs that match the doctypedecl production, but neither point to
nor include a DTD, constitute no DTD!)

So, for a document like this one, what is there to report?

   <!DOCTYPE foo>
   <bar/>

It would please me if we could agree that validating processors may
only report that this document is ”not valid”. And that’s it. Because
the condition for performing a check of the validity constraints - a
valid document - is not present.

Note, btw, for XHTML1 and HTML4, that, while text/html browsers only
support <html> as the root element, the root element mechanism allows
you to validate parts of a complete document. E.g. try pasting this
into the validator at http://validator.w3.org - note the name in the
DOCTYPE - and yeah, it also validates in my XML editor:

   <!DOCTYPE body PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   <body xml:lang="en" lang="en">
   <div>
   <p/>
   </div>
   </body>

--
leif halvard silli
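
The reporting behavior argued for above can be sketched in a few lines of Python. This is only an illustration, not how any existing validator works: the names `DoctypeInfo`, `inspect_doctype`, `carries_dtd` and `validity_report` are hypothetical, and treating `about:legacy-compat` as "points to no DTD" is an assumption taken from this thread. The sketch uses the standard library's non-validating expat bindings merely to read the DOCTYPE name, the identifiers, and the root element name.

```python
from dataclasses import dataclass
from typing import List, Optional
import xml.parsers.expat


@dataclass
class DoctypeInfo:
    doctype_name: Optional[str] = None   # Name in the doctypedecl, if any
    system_id: Optional[str] = None      # SYSTEM identifier, if any
    public_id: Optional[str] = None      # PUBLIC identifier, if any
    has_internal_subset: bool = False    # True if "[ ... ]" is present
    root_name: Optional[str] = None      # element type of the root element


def inspect_doctype(xml_text: str) -> DoctypeInfo:
    """Collect DOCTYPE facts with expat (a non-validating parser)."""
    info = DoctypeInfo()
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info.doctype_name = name
        info.system_id = system_id
        info.public_id = public_id
        info.has_internal_subset = bool(has_internal_subset)

    def on_start_element(name, attrs):
        if info.root_name is None:       # only the first element is the root
            info.root_name = name

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start_element
    parser.Parse(xml_text, True)
    return info


def carries_dtd(info: DoctypeInfo) -> bool:
    """Assumption: a doctypedecl carries a DTD only if it has an internal
    subset or a system identifier other than about:legacy-compat."""
    if info.has_internal_subset:
        return True
    return info.system_id not in (None, "about:legacy-compat")


def validity_report(info: DoctypeInfo) -> List[str]:
    """Hypothetical reporting policy following the argument in this post."""
    if not carries_dtd(info):
        return ["not valid: no DTD"]     # and nothing more
    messages = []
    if info.doctype_name != info.root_name:
        messages.append(
            f"root element '{info.root_name}' does not match "
            f"DOCTYPE name '{info.doctype_name}'"
        )
    return messages                      # a real validator checks far more


print(validity_report(inspect_doctype("<!DOCTYPE foo> <bar/>")))
```

For the `<!DOCTYPE foo> <bar/>` document, the sketch yields the single "not valid" message and never gets as far as the name comparison.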
Paul Grosso, Wed, 05 Feb 2014 11:19:49 -0600:
> HTML5 can make its own rules about how a tool should process
> documents. Admittedly, if a tool is using an XML processor
> to process an HTML5 document, it should probably not use
> validation mode, but that is not something for the XML spec
> to address.

My motivation is clearly related to HTML5. But I make no special plea
for HTML5.

For instance, if one were to develop an official - or unofficial - DTD
for HTML5 documents, it would make sense for XML tools to default to
handling such documents the same way they handle other documents that
associate a DTD via a document type declaration. Asking for some other
default behavior for such documents would certainly be possible, but
counterproductive.

However, today, when a document is "not valid", it typically triggers
DTD-free forms of conformance checking, such as XSD-based and other
non-DTD-based conformance services. For such documents, ”not valid” is
often viewed as synonymous with ”without DOCTYPE”. (Btw, ”DOCTYPE”, as
a shorthand for ”document type declaration”, is not found in XML 1.0!)

For that reason it is quite important to maintain that it is *no hack*
to recognize that even documents *with* a doctypedecl construct are
simply “not valid”, and nothing more, if the doctypedecl construct of
the document neither contains nor points to a DTD.

Furthermore, because HTML5 has become so important and because I would
like to use XML tools on HTML5 documents problem-free, it is also
important to stress that notifying the user about broken validity
constraints for documents that are simply ”not valid” is not in line
with how validation is prescribed to happen.

It is not the first time a document class has been defined without
reference to a DTD. But it might be the first time the (empty)
mechanism for offering a DTD - the doctypedecl - has been prescribed by
such a document class. And this is why it has become somewhat important
not to change anything, but to point out the facts that I outlined
above.

Thank you for your attention.
--
leif halvard silli
Thank you for your continued interest in this matter.
The XML Core WG has discussed this issue in some detail, and the WG
members do have varying thoughts on the matter. As I indicated in my
earlier message, it did not answer all questions.

Speaking just for myself, not the WG, my major opinions are:

1. The XML spec, which defines two processing modes (validating
   and non-validating), does not and should not say anything about
   when to use which mode.

2. It is certainly the case that nothing in the document
   (e.g., the existence or non-existence of any form of
   doctype declaration) has been defined by the XML spec
   to indicate which processing mode to use.

3. Therefore, a tool that does not allow itself to be used as a
   non-validating processor in the presence of a doctype declaration
   is not a tool to be used to process such things as HTML5.
   (Short of a tool uniquely designed to be a validator, I would expect
   any well-designed tool to have a "non-validating mode" and a way to
   put that tool into that mode regardless of anything in the document.)

4. The XML spec could be augmented to clarify that a document
   is valid only if it has an associated document type declaration
   and if the document complies with the constraints expressed
   in it and the document violates no validity constraints.
   It should not be amended to make any distinction between
   document type declaration constraints and validity constraints,
   and it should not be amended to make a special case out of any
   particular document type declaration (e.g., an "empty DTD").

The XML Core WG will consider your latest postings further (it may take
a while, as we meet every other week), and we will probably eventually
have some further WG response.

paul
Hi Paul! Some comments on things in your points 3 and 4.

But first: Much of what you say is good. But I also sense the attitude,
which I have seen elsewhere, that we can somehow save ourselves from
various XML dilemmas by making (or appearing to make) the validation
mode stricter and stricter. I think, instead, we need some analysis of
what is going on and of whether we - any more - understand XML the way
it was intended.

(In reply to Paul Grosso, Thu, 06 Feb 2014 11:09:03 -0600.)

Regarding this, from your point 3:

> (Short of a tool uniquely designed to be a validator, I would expect
> any well-designed tool to have a "non-validating mode" and a way to
> put that tool into that mode regardless of anything in the document.)

Do you also expect the user to be able, ”at user option”, to *decide*
the mode?

Why do you exempt validators from your ’well-designed tool’
expectation? After all, we have validating[1] and non-validating[2]
conformance checkers. Why not both kinds in one product? The issue at
hand - namely, auto-magic shifts from one parser mode to the other -
might then have been clarified earlier!

As I make clear below, XML presupposes that the user of a validating
processor knows that the tool runs a validating processor. This is not
as simple as it might sound, because we seem today to have forgotten
that XML requires validating tools to have *two* modes: a mode with
validity violation reporting, and a mode where validity violation
reporting is disabled. The choice of mode is at user option. But when
reporting is disabled, then validating mode and non-validating mode, to
the user, become more or less identical.

So we should be able to expect from tools that they tell us, before
parsing, whether they are going to use validation mode or
non-validation mode!

Another reason to have both in one product is the parsing differences
between validating and non-validating processing.[3] These differences
persist whether or not the validating software ”at user option” has
been set to run with or without reporting of validity violations.[4]

Validator.w3.org has no option to disable validity violation reporting.
This is thus a violation of the XML 1.0 requirement that validity
violation reporting in validating processors should be ”at user
option”. Another tool that fails that test is Xmllint. Try this:

   $ xmllint --nowarning --validate validity-violating-doc

A validating processor should be able to process this document with
validity violation reporting disabled:

   <foo/>

(Not having that option is a disservice to validating processors.)

In order to be able to discern “no validity violation reporting” from
“non-validation mode”, the user needs to know whether or not (s)he is
running a validating processor. This might often be simpler to know if
the tool at hand has only a *single* processing mode.

I therefore don’t think that XML shares the expectation that
well-designed software is able to operate in both processing modes.
That ”validation” (in the broad sense) today often happens *without* a
DTD supports that view.

Relating this to my issue: I clearly had in mind validating processors
as such, regardless of whether the user has configured them to report
validity violations or not. Because, after all, disabling DTD-based
validity violation reporting should of course not cause the tool to
switch to XSD - doing that would be to *deprive* the user of the choice
to turn validity violation reporting on and off.

To this, from your point 4:

> It should not be amended to
> make any distinction between document type declaration
> constraints and validity constraints, and it should not
> be amended to make a special case out of any particular
> document type declaration (e.g., an "empty DTD").

A rush to tighten a rabbit hole?
It is XML - not I - that distinguishes certain sub-features of the
validity feature, that discerns between valid per the DTD and some
validity constraints on top of that. I have not said, however, that
there should be more than a single validity violation reporting mode!

But we could ask: What about this document:

   <foo/>

Or what about this document:

   <!DOCTYPE f><oo/>

For both, Xmllint only says ”no DTD found”. A single error message. Why
does it not say that the validity constraint that the element type has
to be declared has been broken? If all validity constraints applied
(for the validity violation reporter part of the software), then there
would be many more messages! And it would then also be non-conformance
with XML not to report them! (Since XML requires reporting of validity
constraint violations whenever the document fulfills the DTD.)

So today’s validating processors do seem to think that some documents
need no more than a single error message when there is no DTD. And this
is clearly in line with XML. Tightening that hole might be to *change*
XML.

At the same time, tool makers today know that there might *still* be
more to be said than simply ”there is no DTD”. And it is *then* they -
typically silently! - make the tool shift from validating mode to
non-validating mode.

The shift in a tool from validating processor mode to non-validating
processor mode is clearly one that happens when the tool at hand comes
to the conclusion that validating mode is no longer of any use.

What does *that* tell us?

It tells us that, actually, the tool (and the users) perceive this as a
shift not from validation mode to non-validation mode, but as a shift
from *one* validation mode to *another*, more useful, validation mode!

It also tells us that *something* inside the tool has at the very least
performed a pre-validation of the document.

[1] http://validator.w3.org/
[2] http://validator.w3.org/nu/
[3] http://www.w3.org/TR/REC-xml/#dt-validating
[4] http://www.w3.org/TR/REC-xml/#dt-atuseroption
--
leif halvard silli
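
The distinction drawn in this post, between the processing mode and the ”at user option” reporting switch, can be made concrete with a small Python sketch. Everything here is hypothetical (`ProcessorSettings`, `announce` and `effective_settings` are illustrative names, not part of any real tool); it only shows the three states a user would need to be able to tell apart, and that nothing in the document is allowed to alter them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorSettings:
    # Both fields are chosen by the user, never derived from the document.
    validating: bool                 # which kind of XML processor is run
    report_violations: bool = True   # the "at user option" reporting switch


def announce(settings: ProcessorSettings) -> str:
    """Tell the user, before parsing, what the tool is about to do."""
    if not settings.validating:
        return "non-validating processor"
    if settings.report_violations:
        return "validating processor, validity violations reported"
    return "validating processor, validity violation reporting disabled"


def effective_settings(settings: ProcessorSettings,
                       document_has_doctype: bool,
                       dtd_found: bool) -> ProcessorSettings:
    """Hypothetical policy: the document never changes the settings.

    In particular, a missing DTD or disabled reporting does not cause a
    silent switch to some other kind of checking.
    """
    return settings


chosen = ProcessorSettings(validating=True, report_violations=False)
print(announce(effective_settings(chosen, document_has_doctype=True,
                                  dtd_found=False)))
```

With reporting disabled, the output of a validating run may look much like that of a non-validating run, which is why the sketch announces the mode up front.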
Paul,
we might agree about your point 2:

> 2. It is certainly the case that nothing in the document
> (e.g., the existence or non-existence of any form of
> doctype declaration) has been defined by the XML spec
> to indicate which processing mode to use.

I could live well with point 2 going into the spec. But I would suggest
clarifying, in the spec, that your statement means that:

A) 1) If they fail to find DTDs to validate against, validating
      processors are not permitted to slip into non-validating
      processing mode, and they must, unless reporting is disabled,
      report such violations for e.g. HTML5 documents.

B) Non-validating processors are not permitted to slip into
   validating processing mode based on the presence of a doctype.

However, as a kind of compromise, I wonder what you think about, as a
part 2) of A), allowing ”double validation”:

A) 2) As long as it is *clear* to the user that the processor is a
      validating one, validating processors *could* issue validity
      violations *and* the results of ”conformance checking” based on
      XSD or some other non-validating schema/option.

This way, a validating processor could report e.g. an HTML5 document as
violating validity, but as conforming per (e.g.) XSD. Thus two parallel
reports.

Leif Halvard Silli
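
Rules A) 1), A) 2) and B) as proposed in this message can be sketched as one small Python function. This is a hedged illustration only: `Reports` and `run_checks` are hypothetical names, and `secondary_check` stands in for whatever XSD-based or other non-DTD conformance checker a tool might offer; no real checker is invoked.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Reports:
    mode: str                                   # never changed by the document
    validity: List[str] = field(default_factory=list)
    conformance: Optional[List[str]] = None     # None: no secondary check ran


def run_checks(validating: bool,
               dtd_found: bool,
               reporting_enabled: bool = True,
               secondary_check: Optional[Callable[[], List[str]]] = None
               ) -> Reports:
    # Rule B and rule A 1): the mode is fixed up front and is not switched
    # because a doctype is present or because a DTD cannot be found.
    reports = Reports(mode="validating" if validating else "non-validating")

    # Rule A 1): a validating processor reports the missing DTD,
    # unless the user has disabled reporting.
    if validating and reporting_enabled and not dtd_found:
        reports.validity.append("not valid: no DTD found")

    # Rule A 2): "double validation" - a second, separately labeled report.
    if secondary_check is not None:
        reports.conformance = list(secondary_check())

    return reports


# Hypothetical HTML5 case: no DTD, but the secondary checker finds no errors.
result = run_checks(validating=True, dtd_found=False,
                    secondary_check=lambda: [])
print(result.mode, result.validity, result.conformance)
```

Keeping the two result lists separate is the point of the design: the user sees both the DTD validity verdict and the secondary conformance verdict, and neither replaces the other.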