Clarify that documents with DOCTYPE but without markup declaration are not subject to validation


Leif Halvard Silli-4
The advent of HTML5/XHTML5 has made documents with a DOCTYPE without a
DTD popular.

However, some XML tools report validity constraint errors for
documents with the HTML5 doctype. This happens because the HTML5
DOCTYPEs apparently cause some tools to enter DTD validation mode
and subsequently report every element and attribute as an error, since
none of them are defined in the (non-existent) DTD. This may happen
even if the tool supports more useful means of conformance checking,
such as XSD schemas. In other words, the tool validates against a DTD
that is not there, even though it would have been more fruitful to use
e.g. XSD conformance checking (or simply to check well-formedness).

When trying to discuss this behavior with XML tool developers, it
would be helpful to have an authoritative statement to point to.

Therefore, my proposal is to extract rules or guidance for what to do
when the DOCTYPE declaration points to no markup declaration and place
this into the 6th edition of XML. (Or to put it differently: define
what to do when the DOCTYPE lacks an internal or external DTD.)

XML 1.0 fifth edition says:

“[Definition: An XML document is valid if it has an associated document
type declaration and if the document complies with the constraints
expressed in it.]”

Question: But which constraints does a document type declaration
without an internal or external DTD express?  

Answer: "no restriction", because document type declarations are
defined to contain markup declarations, something which neither of the
two HTML5 doctypes (<!DOCTYPE html SYSTEM "about:legacy-compat"> and
<!DOCTYPE html>) contains. Simply put, since the HTML5 doctypes contain
no ”element type declaration, an attribute-list declaration, an entity
declaration, or a notation declaration”, they should not be seen as
providing markup declarations, from a validating XML processor’s point
of view:

“[Definition: The XML document type declaration contains or points to
markup declarations that provide a grammar for a class of documents.
This grammar is known as a document type definition, or DTD. The
document type declaration can point to an external subset (a special
kind of external entity) containing markup declarations, or can contain
the markup declarations directly in an internal subset, or can do both.
The DTD for a document consists of both subsets taken together.]”

“[Definition: A markup declaration is an element type declaration, an
attribute-list declaration, an entity declaration, or a notation
declaration.] These declarations may be contained in whole or in part
within parameter entities, as described in the well-formedness and
validity constraints below. For further information, see 4 Physical
Structures.”
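
The distinction drawn in these definitions can be observed mechanically. What follows is only a minimal sketch, assuming Python's standard xml.parsers.expat module; the function name doctype_summary is invented for illustration. It reports whether a document has a DOCTYPE, what system identifier it carries, and how many markup declarations the internal subset contributes. An external subset is never fetched, so a count of 0 only says that nothing was declared inline.

```python
import xml.parsers.expat


def doctype_summary(xml_text):
    """Return (has_doctype, system_id, declaration_count) for a document.

    Only the internal subset is inspected; expat does not fetch an
    external subset unless asked to, and this sketch never asks.
    """
    state = {"has_doctype": False, "system_id": None, "declarations": 0}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        state["has_doctype"] = True
        state["system_id"] = system_id

    def on_declaration(*args):
        # expat calls these handlers for each element, attribute,
        # entity or notation that the internal subset declares.
        state["declarations"] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_declaration
    parser.AttlistDeclHandler = on_declaration
    parser.EntityDeclHandler = on_declaration
    parser.NotationDeclHandler = on_declaration
    parser.Parse(xml_text, True)
    return state["has_doctype"], state["system_id"], state["declarations"]
```

Under these assumptions, both HTML5 doctypes yield a DOCTYPE with zero declarations, whereas a document with an inline <!ELEMENT ...> declaration yields a non-zero count.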

By the way: the spec contains several examples of simple documents to
which validity applies. It would be good to also include examples of
documents where the doctype does not reference any markup declaration.

I may provide verbatim spec text change proposals, if this would be
useful.
--
leif halvard silli

Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

Jirka Kosek
On 19.1.2014 21:29, Leif Halvard Silli wrote:
> Therefore, my proposal is to extract rules or guidance for what to do
> when the DOCTYPE declaration points to no markup declaration and place
> this into the 6th edition of XML. (Or to put it differently: define
> what to do when the DOCTYPE lacks an internal or external DTD.)

I don't think this makes sense. Whether validation is done is decided
not by the document itself, but by the processor you use -- in terms of
the XML 1.0 spec you can use a validating or a non-validating processor.
(http://www.w3.org/TR/REC-xml/#proc-types)

If some tool triggers validating mode on encountering <!DOCTYPE> then I
suggest approaching the developers of that tool and asking for an option
that allows control of this behaviour.

I don't think the behaviour you describe is generic or implied by
statements in the XML 1.0 spec.

                                Jirka
--
------------------------------------------------------------------
  Jirka Kosek      e-mail: [hidden email]      http://xmlguru.cz
------------------------------------------------------------------
       Professional XML consulting and training services
  DocBook customization, custom XSLT/XSL-FO document processing
------------------------------------------------------------------
 OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 rep.
------------------------------------------------------------------
    Bringing you XML Prague conference    http://xmlprague.cz
------------------------------------------------------------------



Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

Leif Halvard Silli-4
Jirka Kosek, Tue, 21 Jan 2014 15:17:08 +0100:

> On 19.1.2014 21:29, Leif Halvard Silli wrote:
>> Therefore, my proposal is to extract rules or guidance for what to do
>> when the DOCTYPE declaration points to no markup declaration and place
>> this into the 6th edition of XML. (Or to put it differently: define
>> what to do when the DOCTYPE lacks an internal or external DTD.)
>
> I don't think this makes sense. Whether validation is done is decided
> not by document itself, but by processor you use -- in terms of XML 1.0
> spec you can use validating or non-validating processor.
> (http://www.w3.org/TR/REC-xml/#proc-types)
>
> If some tool triggers validating mode on encountering <!DOCTYPE> then I
> suggest appraoching developers of such tool and ask for some option that
> will allow control of such behaviour.
>
> I don't think that behaviour you describe is generic and is implied by
> statements in XML 1.0 spec.

But they cannot report validity errors when they lack anything to
validate against.

The behavior of xmllint is OK: when it fails to find a DTD, it reports
that the *process* known as validation failed: “validity error :
Validation failed: no DTD found !” (even if I think it could drop the
phrase "validity error").

However, I have another XML tool which, in the face of the HTML5 doctype,
reports an error for every single element or attribute the document
contains. And btw, that same tool shows a behavior similar to that of
xmllint if I use the SYSTEM variant of the HTML5 doctype - <!DOCTYPE
html SYSTEM "about:legacy-compat">.

A document that lacks a DTD is simply ”not valid”
<http://www.w3.org/TR/REC-xml/#sec-prolog-dtd>. And since it is not
valid, whether it has validation errors is not a meaningful question.

Leif Halvard Silli

Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

Leif Halvard Silli-4
Regarding http://www.w3.org/TR/REC-xml/#proc-types, it explains
well on what basis to report errors:

[[
[Definition: [ … snip … ] To accomplish this, validating XML processors
MUST read and process the entire DTD and all external parsed entities
referenced in the document.
]]

Leif Halvard Silli

Leif Halvard Silli, Tue, 21 Jan 2014 18:30:39 +0100:

> Jirka Kosek, Tue, 21 Jan 2014 15:17:08 +0100:
>> On 19.1.2014 21:29, Leif Halvard Silli wrote:
>>> Therefore, my proposal is to extract rules or guidance for what to do
>>> when the DOCTYPE declaration points to no markup declaration and place
>>> this into the 6th edition of XML. (Or to put it differently: define
>>> what to do when the DOCTYPE lacks an internal or external DTD.)
>>
>> I don't think this makes sense. Whether validation is done is decided
>> not by document itself, but by processor you use -- in terms of XML 1.0
>> spec you can use validating or non-validating processor.
>> (http://www.w3.org/TR/REC-xml/#proc-types)
>>
>> If some tool triggers validating mode on encountering <!DOCTYPE> then I
>> suggest appraoching developers of such tool and ask for some option that
>> will allow control of such behaviour.
>>
>> I don't think that behaviour you describe is generic and is implied by
>> statements in XML 1.0 spec.
>
> But they cannot report validity errors when the lack anything to
> validate it against.
>
> The behavior of xmllint is OK: When it fails to find a DTD, it reports
> that the *process* known as validation failed: “validity error :
> Validation failed: no DTD found !“ (even if I think it could delete the
> phrase "validity error").
>
> However, I have another XML tool which, in face of the HTML5 doctype,
> reports an error for every single element or attribute the document
> contains. And btw, that same tool shows a behavior similar to that of
> xmllint if I use the SYSTEM variant of the HTML5 doctype - <!DOCTYPE
> html SYSTEM "about:legacy-compat">.
>
> A document that lacks DTD is simply ”not valid”
> <http://www.w3.org/TR/REC-xml/#sec-prolog-dtd>. And, as not valid,
> whether it has validation errors is a question that is out of the
> question.
>
> Leif Halvard Silli

Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

Henry S. Thompson
In reply to this post by Leif Halvard Silli-4
Leif Halvard Silli writes:

> A document that lacks DTD is simply ”not valid”
> <http://www.w3.org/TR/REC-xml/#sec-prolog-dtd>. And, as not valid,
> whether it has validation errors is a question that is out of the
> question.

I presume you're referring here to these lines near the beginning:

  [Definition: XML documents SHOULD begin with an XML declaration
  which specifies the version of XML being used.] For example, the
  following is a complete XML document, _well-formed_ but not _valid_:

  <?xml version="1.0"?>
  <greeting>Hello, world!</greeting>

  and so is this:

  <greeting>Hello, world!</greeting>

  [emphasis in original]

It's not *valid*, but it's not *invalid* either:

  XML provides a mechanism, the document type declaration, to define
  constraints on the logical structure and to support the use of
  predefined storage units. [Definition: An XML document is *valid* if
  it has an associated document type declaration and if the document
  complies with the constraints expressed in it.]

Each of your examples, i.e.

  <!DOCTYPE html>
  <html/>
and
  <!DOCTYPE html SYSTEM "about:legacy-compat">
  <html/>

clearly does have an "associated document type declaration", and equally
clearly contains "failures to fulfill the validity constraints given in
this specification" [1], so I conclude they are not only not valid,
but invalid (although that, interestingly, is not a term defined in
the spec.  What we find at [1] is an obligation on *validating
processors* to _report_ "failures to fulfill the validity constraints
given in this specification".)

The validity constraint they both fail to fulfill is VC: Element Valid [2],
which requires a declaration for every element in a document.

It's unfortunate that the definition of *valid* is less explicit than
the definition of conforming validating processor, but my guess is
that the way the Core WG is most likely to fix that is by making the
definition of *valid* stronger, not by making the Conformance section
weaker.

It would be possible to expand the definition of *validating
processors* to be clearer about their responsibilities in the absence
of a document type declaration, and that might be a good idea.

It would also probably be a good idea to clarify that as things stand

  <!DOCTYPE html>
  <html/>

is, using the usual convention, _invalid_, where

  <html/>

is neither valid _nor_ invalid, and to provide a definition of
'invalid' as "given a document type declaration, violating one or more
of the constraints expressed by the declarations in the DTD, and
failing to fulfill one or more of the validity constraints given in
this specification".
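
The three-way distinction proposed here (valid, invalid, neither) can be sketched in code. This is only an illustration, assuming Python's standard xml.parsers.expat module; the function name classify and its return labels are invented. It reads declarations from the internal subset only, never fetches an external subset, and checks nothing beyond whether each element has a declaration, so it is far from a validating processor.

```python
import xml.parsers.expat


def classify(xml_text):
    """Return 'neither', 'invalid' or 'no undeclared elements'."""
    has_doctype = False
    declared = set()   # element names declared in the internal subset
    used = []          # element names that occur in the document

    def on_doctype(name, system_id, public_id, has_internal_subset):
        nonlocal has_doctype
        has_doctype = True

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = lambda name, model: declared.add(name)
    parser.StartElementHandler = lambda name, attrs: used.append(name)
    parser.Parse(xml_text, True)

    if not has_doctype:
        # No document type declaration: neither valid nor invalid
        # under the convention described above.
        return "neither"
    undeclared = [name for name in used if name not in declared]
    return "invalid" if undeclared else "no undeclared elements"
```

With this reduced check, <!DOCTYPE html><html/> comes out as "invalid" because html has no declaration, while a bare <html/> comes out as "neither".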

But to take account of the behaviour you cite of xmllint,
likewise of rxp,
(which treat the two cases above, and the even simpler
 <html/>
case, all as instances of an idiosyncratic validity error w/o
precedent in the XML spec.), we would have to define what it meant to
have an _empty_ document type declaration, which would be rather more
difficult, and potentially backward incompatible.

Consider, for example

  <!DOCTYPE html []>
  <html/>

which causes both report the 'ordinary' undeclared element error, but
xmllint to complain of a missing DTD.

Note also that

  <!DOCTYPE html>
  <hmtl/>

_is_ invalid, and we wouldn't want to lose that. . .
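
One constraint that separates this last case from the others is presumably VC: Root Element Type, which requires the name in the document type declaration to match the root element. A minimal sketch of that single check, assuming Python's standard xml.parsers.expat module (the function name root_matches_doctype is invented for illustration):

```python
import xml.parsers.expat


def root_matches_doctype(xml_text):
    """Return True or False for the root-name check, or None without a DOCTYPE."""
    names = {"doctype": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        names["doctype"] = name

    def on_start(name, attrs):
        # Only the first start tag, i.e. the root element, is recorded.
        if names["root"] is None:
            names["root"] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)

    if names["doctype"] is None:
        return None
    return names["doctype"] == names["root"]
```

This check needs no declarations at all, which is why a DOCTYPE without a DTD can still catch the misspelled root element.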

ht

[1] http://www.w3.org/TR/REC-xml/#sec-conformance
[2] http://www.w3.org/TR/REC-xml/#elementvalid
--
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: [hidden email]
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]




Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

Paul Grosso
In reply to this post by Leif Halvard Silli-4
[Some content of the original comment has been elided
and/or rearranged below.]

On 2014-01-19 14:29, Leif Halvard Silli wrote:
 > Clarify that documents with DOCTYPE but without markup
 > declaration are not subject to validation
 >
 > . . .
 > XML 1.0 fifth edition says:
 >
 > “[Definition: An XML document is valid if it has an associated
 > document type declaration and if the document complies with
 > the constraints expressed in it.]”
 >
 > Question: But which constraints does a document type declaration
 > without an internal or external DTD express?
 >
 > . . .
 >
 > [S]ome XML tools reports validation constraint errors for
 > documents with the HTML5 doctype. This happens because the very
 > HTML5 DOCTYPES apparently causes some tools to dip into DTD
 > validation mode - and subsequently report all elements and
 > attributes as an error, since none of them are defined in
 > the (non-existing) DTD.
 >
 > When trying to discuss this behavior when XML tools developers, it
 > would be helpful to have an authoritative statement to point to.
 >
 > Therefore, my proposal is to extract rules or guidance for what
 > to do when the DOCTYPE declaration points to no markup declaration
 > and place this into the 6th edition of XML. (Or to put it differently:
 > define what to do when the DOCTYPE lacks an internal or external DTD.)
 >


At [1] we have:

  Definition: An XML document is valid if it has an associated
  document type declaration and if the document complies with
  the constraints expressed in it.

At [2] we have:

  validity constraint

  [Definition: A rule which applies to all valid XML documents.
  Violations of validity constraints are errors; they MUST, at
  user option, be reported by validating XML processors.]

As indicated above, a document is not valid if it violates a
validity constraint. Perhaps that could be made clearer in
the definition of "valid" at [1]. But given that fact, and
given the "Element Valid" validity constraint at [3], and the
"Attribute Value Type" validity constraint at [4], a document
containing any element or attribute for which there is no
declaration in the associated DTD is not valid.

Put another way, one of the constraints a DTD puts on a
document (for the document to be considered valid) is that
the document must not contain any element or attribute that
is not declared in the DTD. So a DTD that declares no
elements or attributes constrains the document to have
no elements or attributes to be considered valid (and
such a document would not have a root element and would
therefore not be valid).

As far as "documents with DOCTYPE but without markup
declaration are not subject to validation", the XML spec has
no concept of "subject to validation". That is a tool issue.
Per section 5.1 Validating and Non-Validating Processors [5]:

  Conforming XML processors fall into two classes: validating
  and non-validating.

Nowhere does the spec say that anything in the document (e.g.,
a doctype declaration) forces use of a validating processor.

HTML5 can make its own rules about how a tool should process
documents. Admittedly, if a tool is using an XML processor
to process an HTML5 document, it should probably not use
validation mode, but that is not something for the XML spec
to address.

The XML Core WG will consider issuing an erratum that augments
the definition of valid at [1] to read something like:

  Definition: An XML document is valid if it has an associated
  document type declaration and if the document complies with
  the constraints expressed in it and the document violates no
  validity constraints.

We might also add a sentence to the first paragraph of the
Conformance section at [5] so that that paragraph would
then read something like:

  Conforming XML processors fall into two classes: validating
  and non-validating.  The determination of which kind of
  processor to use for a given document is outside the scope
  of this Recommendation.

We realize this still leaves unanswered the issue of how
to decide if a document should be "subject to validation".
At the present time at least, that issue is not addressed
by the XML Recommendation.

Paul Grosso
for the XML Core WG


[1] http://www.w3.org/TR/REC-xml/#dt-valid
[2] http://www.w3.org/TR/REC-xml/#dt-vc
[3] http://www.w3.org/TR/REC-xml/#elementvalid
[4] http://www.w3.org/TR/REC-xml/#ValueType
[5] http://www.w3.org/TR/REC-xml/#proc-types




Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

Leif Halvard Silli-4
In reply to this post by Henry S. Thompson
Sorry for the delayed answer. See below.

Henry S. Thompson, Tue, 21 Jan 2014 18:27:29 +0000:

> Leif Halvard Silli writes:
>
>> A document that lacks DTD is simply ”not valid”
>> <http://www.w3.org/TR/REC-xml/#sec-prolog-dtd>. And, as not valid,
>> whether it has validation errors is a question that is out of the
>> question.
>
> I presume you're referring here to these lines near the beginning:
>
>   [Definition: XML documents SHOULD begin with an XML declaration
>   which specifies the version of XML being used.] For example, the
>   following is a complete XML document, _well-formed_ but not _valid_:
>
>   <?xml version="1.0"?>
>   <greeting>Hello, world!</greeting>
>
>   and so is this:
>
>   <greeting>Hello, world!</greeting>
>
>   [emphasis in original]

But, it is pretty obvious - to me - that what that section wants to
point out is that the *XML* declaration has nothing to do with ”valid”
or ”not valid”. Nor has it anything to do with well-formed or not
well-formed.

> It's not *valid*, but it's not *invalid* either:

What is your point here? Is there a third category, you say? What should
a validating XML processor say if it parses the above document?

It has always been pretty obvious - to me - that XML avoids ”invalid”
simply because “invalid” has so many negative - and wrong -
connotations. ”Valid” simply means ”conforming to a spec [expressed
via DTD grammar]”. And ”not valid” thus simply means that it does not
conform to a spec expressed via a DTD grammar.

Thus the document above *is* invalid because invalid is just XML’s
unspoken synonym for ”not valid”.

>   XML provides a mechanism, the document type declaration, to define
>   constraints on the logical structure and to support the use of
>   predefined storage units. [Definition: An XML document is *valid* if
>   it has an associated document type declaration and if the document
>   complies with the constraints expressed in it.]
>
> Each of your examples, i.e.
>
>   <!DOCTYPE html>
>   <html/>
> and
>   <!DOCTYPE html SYSTEM "about:legacy-compat">
>   <html/>
>
> clearly does have an "associated document type declaration", and equally
> clearly contain "failures to fulfill the validity constraints given in
> this specification" [1], so I conclude they are not only not valid,
> but invalid (although that, interestingly, is not a term defined in
> the spec.

The first validity constraint expressed in XML is that the document has
a *DTD*. A grammar. A DOCTYPE without a grammar has no grammar. It is
just an empty shell.

>  What we find at [1] is an obligation on *validating
> processors* to _report_ "failures to fulfill the validity constraints
> given in this specification".)

What we also find is a stressing of the fact that, quote: ”it is
possible to construct a well-formed document containing a doctypedecl
that neither points to an external subset nor contains an internal
subset”. Clearly, such a document would be ”well-formed” but also
”not valid”.

Note how the spec here says ”doctypedecl” - it refers to the formal
grammar. I interpret this as deliberately *avoiding* the term ”document
type declaration”.

Which is logical, when we consider that the spec, a little before that
quote, says (my emphasis): ”The XML document type declaration
**contains** or **points** to markup declarations that provide a
grammar for a class of documents”. Markup declarations are something
which neither of my examples contains or points to. (No,
about:legacy-compat is a URL that points nowhere, thus there is no
empty DTD file anywhere.)

> The validity constraint they both fail to fulfill is VC: Element Valid [2],
> which requires a declaration for every element in a document.

That requirement is not met by ”<greeting>Hello,
world!</greeting>” either.

> It's unfortunate that the definition of *valid* is less explicit than
> the definition of conforming validating processor, but my guess is
> that the way the Core WG is most likely to fix that is by making the
> definition of *valid* stronger, not by making the Conformance section
> weaker.

I have not suggested making the conformance section weaker. My
understanding is that you seek to insert a third category, while the
XML spec has always had only two categories.

> It would be possible to expand the definition of *validating
> processors* to be clearer about their responsibilities in the absence
> of a document type declaration, and that might be a good idea.
>
> It would also probably be a good idea to clarify that as things stand
>
>   <!DOCTYPE html>
>   <html/>
>
> is, using the usual convention, _invalid_, where
>
>   <html/>
>
> is neither valid _nor_ invalid, and to provide a definition of
> 'invalid' as "given a document type declaration, violating one or more
> of the constraints expressed by the declarations in the DTD, and
> failing to fulfill one or more of the validity constraints given in
> this specification".

If so, then my message would seem to have resulted in the opposite of
my intention.

What is the benefit of this proposal of yours? I see none. It would only
seem to strengthen the belief that it is correct to use an empty
DOCTYPE declaration as a trigger to start XML 1.0 validating processor
mode.

Because, in my case, I have a tool which supports both XSD and DTD. XSD
mode can be triggered by the very presence of an XHTML namespace
declaration. However, as soon as my tool notices the HTML5 doctype (the
short variant), it disables its XSD feature and starts its DTD
validation mode.
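
The mode selection argued for here could look roughly like the sketch below. It is a hypothetical policy, not the behavior of any existing tool: the function name choose_mode, the mode labels, and the special-casing of about:legacy-compat are assumptions made for illustration, and only a default xmlns attribute on the root element is considered. It relies on Python's standard xml.parsers.expat module.

```python
import xml.parsers.expat

XHTML_NS = "http://www.w3.org/1999/xhtml"


def choose_mode(xml_text):
    """Pick 'dtd', 'xsd' or 'well-formedness' for a document (hypothetical policy)."""
    info = {"system_id": None, "declarations": 0,
            "seen_root": False, "root_ns": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["system_id"] = system_id

    def on_declaration(*args):
        info["declarations"] += 1

    def on_start(name, attrs):
        # Without namespace processing, xmlns shows up as a plain attribute.
        if not info["seen_root"]:
            info["seen_root"] = True
            info["root_ns"] = attrs.get("xmlns")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_declaration
    parser.AttlistDeclHandler = on_declaration
    parser.EntityDeclHandler = on_declaration
    parser.NotationDeclHandler = on_declaration
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)

    # Assumption: a DOCTYPE only counts as a DTD trigger if it declares
    # something inline or names an external subset that can really exist.
    has_grammar = (info["declarations"] > 0
                   or info["system_id"] not in (None, "about:legacy-compat"))
    if has_grammar:
        return "dtd"
    if info["root_ns"] == XHTML_NS:
        return "xsd"
    return "well-formedness"
```

Under this policy neither HTML5 doctype switches off XSD checking for a document in the XHTML namespace, while a DOCTYPE that actually carries declarations still selects DTD validation.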

> But to take account of the behaviour you cite of xmllint,
> likewise of rxp,
> (which treat the two cases above, and the even simpler
>  <html/>
> case, all as instances of an idiosyncratic validity error w/o
> precedent in the XML spec.), we would have to define what it meant to
> have an _empty_ document type declaration, which would be rather more
> difficult, and potentially backward incompatible.
>
> Consider, for example
>
>   <!DOCTYPE html []>
>   <html/>
>
> which causes both report the 'ordinary' undeclared element error, but
> xmllint to cmplain of a missing DTD.

Which is an OK complaint provided the user/author knows that
xmllint/cmplain runs in XML 1.0 validation mode!

> Note also that
>
>   <!DOCTYPE html>
>   <hmtl/>
>
> _is_ invalid, and we wouldn't want to lose that. . .

It is “not valid”. If it is invalid then it is only in the ”not valid”
sense. I believe I have not proposed anything that could make us lose
the fact that it is not valid.

> [1] http://www.w3.org/TR/REC-xml/#sec-conformance
> [2] http://www.w3.org/TR/REC-xml/#elementvalid
--
leif halvard silli

Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

Leif Halvard Silli-4
In reply to this post by Paul Grosso
Paul Grosso, Wed, 05 Feb 2014 11:19:49 -0600:
> [Some content of the original comment has been elided
> and/or rearranged below.]
>
> On 2014-01-19 14:29, Leif Halvard Silli wrote:

  [ I deleted some text, for brevity ]

>> Question: But which constraints does a document type declaration
>> without an internal or external DTD express?

>> Therefore, my proposal is to extract rules or guidance for what
>> to do when the DOCTYPE declaration points to no markup declaration
>> and place this into the 6th edition of XML. (Or to put it differently:
>> define what to do when the DOCTYPE lacks an internal or external DTD.)

> At [1] we have:
>
>  Definition: An XML document is valid if it has an associated
>  document type declaration and if the document complies with
>  the constraints expressed in it.
>
> At [2] we have:
>
>  validity constraint
>
>  [Definition: A rule which applies to all valid XML documents.
>  Violations of validity constraints are errors; they MUST, at
>  user option, be reported by validating XML processors.]
>
> As indicated above, a document is not valid if it violates a
> validity constraint. Perhaps that could be made clearer in
> the definition of "valid" at [1]. But given that fact, and
> given the "Element Valid" validity constraint at [3], and the
> "Attribute Value Type" validity constraint at [4], a document
> containing any element or attribute for which there is no
> declaration in the associated DTD is not valid.

It sounds like you treat DTD and doctype declaration as one and the
same thing. They are related. But a doctypedecl is not the DTD. The DTD
is just a part of the doctypedecl production.

What if I send a document without a doctypedecl construct to a
*validating* processor? MUST the validating processor then, at user
option, report that the validity constraints are broken? When I started
this reply, I meant to say that it must report validity constraints
even then. But my answer now is that validation has two parts: 1) Check
whether the particular rules regarding element content etc defined in
the DTD are fulfilled; 2) Check whether the validity constraints are
fulfilled as well. Hence, if there is no DTD, there is nothing to
report except ”not valid”. My claim remains, though, that *also* for
documents *with* a construct that matches the doctypedecl production,
the processor must locate a DTD before it can check for fulfillment of
the validity constraints.

> Put another way, one of the constraints a DTD puts on a
> document

The meaning of ”all valid XML documents” is crucial. In what way is
that a reference to a class of documents? Does it mean ”all documents
with a match for the doctypedecl production? The answer is no:
”[Definition: An XML document is valid if it has an associated document
type declaration and if the document complies with the constraints
expressed in it.]” So whether a document is valid per its DTD is one
thing. And the additional validity constraints of XML 1.0 are another
thing. However, the latter only apply if the document fulfills the
former. So says XML 1.0.

So it is not the DTD that places XML 1.0’s validity constraints on the
document. It is the *conformance* with the DTD that adds the
requirement to *also* fulfill the validity constraints.

That XML 1.0 says that a ”validity constraint” applies to ”all valid
XML documents” may sound a little bit like a tautology. But I read this
as follows: With ”all valid XML documents” XML 1.0 no doubt means every
document that has been *successfully* subjected to a validating XML
processor (which applies the rules about what contents particular
elements and attributes can have etc). For *that* class of documents,
there is one set of *additional* things the documents must fulfill,
namely the validity constraints.

So there are two parts of *valid*: There are those documents that are
just valid. And there are those that are valid *and* fulfill the
validity constraints.

> (for the document to be considered valid) is that
> the document must not contain any element or attribute that
> is not declared in the DTD. So a DTD that declares no
> elements or attributes constrains the document to have
> no elements or attributes to be considered valid (and
> such a document would not have a root element and would
> therefore not be valid).

This, to me, is upside down. Even documents without a doctypedecl
are ”constrained” to not have a DOCTYPE, a DTD or valid
elements/attributes.  A doctypedecl that does not point to or contain a
DTD places no restrictions on the document. Such a document fails to
have ”an associated document type declaration” and it can thus not
comply ”with the constraints expressed in it” and therefore is *not*
subject to ”validity constraint” any more than a document without a
doctypedecl.

> As far as "documents with DOCTYPE but without markup
> declaration are not subject to validation", the XML spec has
> no concept of "subject to validation". That is a tool issue.
> Per section 5.1 Validating and Non-Validating Processors [5]:
>
>  Conforming XML processors fall into two classes: validating
>  and non-validating.
>
> No where does the spec say that anything in the document (e.g.,
> a doctype declaration) forces use of a validating processor.

Right. Nevertheless, the presence of a construct that matches the
”doctypedecl” production is often used as a validation trigger -
something that ”turns on” the validation mode. More below.

> HTML5 can make its own rules about how a tool should process
> documents. Admittedly, if a tool is using an XML processor
> to process an HTML5 document, it should probably not use
> validation mode, but that is not something for the XML spec
> to address.
>
> The XML Core WG will consider issuing an erratum that augments
> the definition of valid at [1] to read something like:
>
>  Definition: An XML document is valid if it has an associated
>  document type declaration and if the document complies with
>  the constraints expressed in it and the document violates no
>  validity constraints.
>
> We might also add a sentence to the first paragraph of the
> Conformance section at [5] so that that paragraph would
> then read something like:
>
>  Conforming XML processors fall into two classes: validating
>  and non-validating.  The determination of which kind of
>  processor to use for a given document is outside the scope
>  of this Recommendation.

May I suggest that you expand that to say that the presence of a
construct that matches the ’doctypedecl’ production does not count as a
”trigger” that requires XML parsers to enable validation mode?

Please consider that there can be two meanings of
”trigger”/”subject to”. One is that the presence of the DOCTYPE causes
the XML processor to jump into validator mode. We are in firm
agreement, it seems, that the DOCTYPE is not such a trigger. And I
welcome the proposed emphasis that it isn’t such a trigger.

The other meaning of trigger/subject to is where we presently
disagree. You have upheld the view that the very presence of a
construct that matches the doctypedecl production allows a validating
processor to check for and report validity constraint violations. If I
understood correctly, your justification is that a doctypedecl without
a DTD constrains the document from containing valid
elements/attributes/etc. But how can a document that is clearly ”not
valid” be subject to validity constraints? And what ”class of
documents” does such a document make up? Let us keep in mind the
*purpose* of issuing document type declarations, namely to contain or
point ”to markup declarations that provide a grammar for a class of
documents”!

I insist that what should trigger a validating processor to check for
the validity constraints is that the doctypedecl points to or contains
a non-empty DTD and that the document matches that non-empty DTD.

For URLs, we have the concept of empty URL. For DTDs, we do not have
the concept of an empty DTD. And I fail to see how an empty grammar is
different from no grammar. Perhaps we can best compare it with
true/false in programming languages: No grammar should always evaluate
to false, and should thus prevent the validating processor from
reporting validity constraint errors as it is impossible to comply with
a grammar that always evaluates to false.

> We realize this still leaves unanswered the issue of how
> to decide if a document should be "subject to validation".
> At the present time at least, that issue is not addressed
> by the XML Recommendation.
>
> Paul Grosso
> for the XML Core WG
>
>
> [1] http://www.w3.org/TR/REC-xml/#dt-valid
> [2] http://www.w3.org/TR/REC-xml/#dt-vc
> [3] http://www.w3.org/TR/REC-xml/#elementvalid
> [4] http://www.w3.org/TR/REC-xml/#ValueType
> [5] http://www.w3.org/TR/REC-xml/#proc-types
--
leif halvard silli

Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

Leif Halvard Silli-4
Leif Halvard Silli, Thu, 6 Feb 2014 07:56:11 +0100:

> validation has two parts: 1) Check
> whether the particular rules regarding element content etc defined in
> the DTD are fulfilled; 2) Check whether the validity constraints are
> fulfilled as well.

The two parts of validation with reference to the spec:

1) At the top level, validity constraint: [*]

  ]] validity constraint
       [Definition: A rule which applies to all [valid](#dt-valid)
       XML documents. Violations of validity constraints are errors;
       they MUST, at user option, be reported by validating XML
       processors.] [[
       [*] http://www.w3.org/TR/REC-xml/#dt-vc

2) The word ”valid” above points to the bottom level, valid document: [*]

  ]]   [Definition: An XML document is valid if it has an
       associated document type declaration and if the document
       complies with the constraints expressed in it.] [[
       [*] http://www.w3.org/TR/REC-xml/#dt-valid

This two-layered validation makes sense to me.

For example, there is the validity constraint that the name part of the
document type declaration (e.g. ”html” in <!DOCTYPE html>) matches the
name of the root element: ”The Name in the document type declaration
MUST match the element type of the root element.” On the surface, this
sounds like a rule that is always possible to verify. After all, what
is simpler than comparing root element’s name and the DOCTYPE name?

However, note the direction of the rule: It is the DOCTYPE name that
must be made to match the root element type. The element type name is
not required to match the DOCTYPE name. And so, when there is no DTD, it
makes no sense to report that the name in the document type declaration
does not match the name of the root element. After all, in such a
situation, where there is no document grammar available, it is
impossible to say that either of them conforms to a DTD. (And, btw,
this also illustrates that constructs that match the doctypedecl
production, but without pointing to or including a DTD, constitute no
DTD!)

So, for a document like this one, what is there to report?

   <!DOCTYPE foo>
   <bar/>

It would please me if we can agree that validating processors may only
report that this document is ”not valid”. And that’s it. Because, the
condition for performing a check of the validity constraints - a valid
document, is not present.
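
To make the example concrete, here is a minimal sketch in Python of what
a non-validating parse can tell us about such a document. It uses the
standard library's expat binding; the function name inspect_doctype and
the "has_dtd" heuristic (any external identifier or internal subset) are
my own assumptions for illustration, not anything defined by XML 1.0.
Note that the heuristic treats <!DOCTYPE html SYSTEM
"about:legacy-compat"> as pointing somewhere, since it cannot know that
nothing is behind that system identifier.

```python
import xml.parsers.expat


def inspect_doctype(xml_text):
    """Return (doctype_name, has_dtd, root_name) from a non-validating parse.

    has_dtd is a rough heuristic: True if the doctypedecl carries an
    external identifier or an internal subset, False otherwise.
    """
    info = {"doctype": None, "has_dtd": False, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["doctype"] = name
        info["has_dtd"] = bool(system_id or public_id or has_internal_subset)

    def on_start_element(name, attrs):
        # Only the first start tag is the root element.
        if info["root"] is None:
            info["root"] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start_element
    parser.Parse(xml_text, True)  # raises ExpatError if not well-formed
    return info["doctype"], info["has_dtd"], info["root"]


print(inspect_doctype("<!DOCTYPE foo>\n<bar/>"))
```

The parse succeeds because the document is well-formed; the name
mismatch between "foo" and "bar" is only visible if the tool chooses to
compare the two values itself.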

Note, btw, for XHTML1 and HTML4, that, while text/html browsers only
support <html> as the root element, the root element mechanism allows
you to validate parts of a complete document. E.g. try pasting this
into the validator at http://validator.w3.org - note the name in the
DOCTYPE - and yeah, it also validates in my XML editor:

<!DOCTYPE body PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 <body xml:lang="en" lang="en">
   <div>
        <p/>
   </div>
 </body>
--
leif halvard silli

Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

Leif Halvard Silli-4
In reply to this post by Paul Grosso
Paul Grosso, Wed, 05 Feb 2014 11:19:49 -0600:

> HTML5 can make its own rules about how a tool should process
> documents. Admittedly, if a tool is using an XML processor
> to process an HTML5 document, it should probably not use
> validation mode, but that is not something for the XML spec
> to address.

My motivation is clearly related to HTML5. But I make no special plea
for HTML5.

For instance, if one were to develop an official - or unofficial - DTD
for HTML5 documents, it would make sense for XML tools to default to
handling such documents the same way they handle other documents that
associate a DTD via a document type declaration. Some other default
behavior for such documents would certainly be possible, but
counterproductive, to ask for.

However, today, when a document is "not valid", it typically triggers
DTD-free forms of conformance check, such as XSD-based and other
non-DTD-based conformance services. For such documents, ”not valid” is
often viewed as synonymous with ”without DOCTYPE”. (Btw, ”DOCTYPE”, as
a shorthand for ”document type declaration”, is not found in XML 1.0!)

For that reason it is quite important to maintain that it is *no hack*
to realize that even documents *with* a doctypedecl construct are
simply “not valid” and nothing more if the doctypedecl construct of the
document contains or points to no DTD.

Furthermore, because HTML5 has become so important and because I would
like to use XML tools on HTML5 documents without problems, it is also
important to stress that notifying the user about broken validity
constraints for documents that are simply ”not valid”, is not in line
with how validation is prescribed to happen.

It is not the first time a document class has been defined without
reference to DTD. But it might be the first time the (empty) mechanism
for offering a DTD - the doctypedecl - has been prescribed by such a
document class. And this is why it has become somewhat important not to
change anything, but to point out the facts that I outlined above.

Thank you for your attention.
--
leif halvard silli

Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

Paul Grosso
Thank you for your continued interest in this matter.
The XML Core WG has discussed this issue in some detail,
and the WG members do have varying thoughts on the matter.
As I indicated in my earlier message, it did not answer
all questions.

Speaking just for myself, not the WG, my major opinions are:

1.  The XML spec, which defines two processing modes
     (validating and non-validating) does not and should not
     say anything about when to use which mode.

2.  It is certainly the case that nothing in the document
     (e.g., the existence or non-existence of any form of
     doctype declaration) has been defined by the XML spec
     to indicate which processing mode to use.

3.  Therefore, a tool that does not allow itself to be used
     as a non-validating processor in the presence of a
     doctype declaration is not a tool to be used to process
     such things as HTML5.  (Short of a tool uniquely designed
     to be a validator, I would expect any well-designed tool
     to have a "non-validating mode" and a way to put that tool
     into that mode regardless of anything in the document.)

4.  The XML spec could be augmented to clarify that a document
     is valid only if it has an associated document type
     declaration and if the document complies with the
     constraints expressed in it and the document violates
     no validity constraints.  It should not be amended to
     make any distinction between document type declaration
     constraints and validity constraints, and it should not
     be amended to make a special case out of any particular
     document type declaration (e.g., an "empty DTD").

The XML Core WG will consider your latest postings further
(it may take a while, as we meet every other week), and we
will probably eventually have some further WG response.

paul


Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

Leif Halvard Silli-4
Hi Paul! Some comments to things in your point 3 and 4.

But first: Much of what you say is good. But I also sense the attitude,
which I have seen elsewhere, that we can somehow save ourselves from
various XML dilemmas by making (or appearing to make) the validation
mode stricter and stricter. I think, instead, we need some analysis of
what is going on and of whether we - any more - understand XML the way
it was intended.

(In reply to Paul Grosso, Thu, 06 Feb 2014 11:09:03 -0600.)

Regarding this, from your point 3:

> (Short of a tool uniquely designed to be a validator, I would expect
> any well-designed tool to have a "non-validating mode" and a way to
> put that tool into that mode regardless of anything in the document.)

Do you also expect, ”at user option”, to *decide* the mode?

Why do you exempt validators from your ’well-designed tool’
expectation? After all, we have validating[1], and non-validating[2]
conformance checkers. Why not both kinds in one product?  The issue at
hand  - namely, auto-magic shifts from one parser mode to the other -
might then have been clarified earlier!

As I make clear below, XML presupposes that the user of a validating
processor knows that the tool runs a validating processor. This is not
as simple as it might sound, because we seem today to have forgotten
that XML requires validating tools to have *two* modes: a mode that
reports validity violations and a mode where validity violation
reporting is disabled. The choice of mode is at user option. But when
reporting is disabled, validating mode and non-validating mode become,
to the user, more or less identical.

So we should be able to expect tools to tell us, before
parsing, whether they are going to use validation mode or
non-validation mode!
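
As a rough illustration of what such an announcement could look like,
here is a small Python sketch. The class ProcessorConfig and its two
flags are hypothetical names of my own, not taken from any existing
tool; the point is only that the processor class and the ”at user
option” reporting switch are two separate settings, and that a tool can
state both before it starts parsing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorConfig:
    """Hypothetical settings a tool could expose and announce up front."""

    validating: bool              # which class of XML processor is used
    report_validity_errors: bool  # the "at user option" reporting switch

    def describe(self) -> str:
        """Return a one-line statement of the mode the tool will run in."""
        if not self.validating:
            # The reporting switch has no meaning without a validating processor.
            return "non-validating processor"
        if self.report_validity_errors:
            return "validating processor, validity violations reported"
        return "validating processor, validity violation reporting disabled"


for config in (
    ProcessorConfig(validating=True, report_validity_errors=True),
    ProcessorConfig(validating=True, report_validity_errors=False),
    ProcessorConfig(validating=False, report_validity_errors=False),
):
    print(config.describe())
```

The second and third configurations may look the same to the user in
terms of messages shown, which is exactly why the tool should say which
one it is using.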

Another reason to have both in one product is the parsing differences
between validating and non-validating processing.[3] These differences
prevail whether or not the validating software ”at user option” has
been set to run with or without reporting of validity violations.[4]

Validator.w3.org has no option to disable validity violation reporting.
This is thus a violation of the XML 1.0 requirement that validity
violation reporting in validating processors should be ”at user
option”. Another tool that fails that test is Xmllint. Try this:
  $ xmllint --nowarning --validate validity-violating-doc

A validating processor should be able to process this document with
validity violation reporting disabled:

   <foo/>

(Not having that option is a disservice to validating processors.)

In order to be able to discern “no validity violation reporting” from
“non-validation mode”, the user needs to know whether or not (s)he is
running a validating processor. This might often be simpler to know if
the tool at hand has only a *single* processing mode.

I therefore don’t think that XML shares the expectation that
well-designed software should be able to operate in both processing
modes. The fact that ”validation” (in the broad sense) today often
happens *without* a DTD supports that view.

Relating this to my issue: I did clearly have in mind validating
processors as such, regardless of whether the user has configured it to
report validity violations or not. Because, after all, disabling
DTD-based validity violation reporting should of course not cause the
tool to switch to XSD - doing that would be to *deprive* the user of
the choice to turn validity violation reporting on and off.

To this, from your point 4:

>     It should not be amended to
>     make any distinction between document type declaration
>     constraints and validity constraints, and it should not
>     be amended to make a special case out of any particular
>     document type declaration (e.g., an "empty DTD").

A rush to tighten a rabbit hole? It is XML, not I, that distinguishes
certain sub-features of the validity feature, that discerns between
valid per DTD and some validity constraints on top of that. I have
not said, however, that there should be more than a single validity
violation reporting mode!

But we could ask: What about this document: <foo/>
Or what about this document: <!DOCTYPE f><oo/>

For both, Xmllint only says ”no DTD found”. A single error message. Why
does it not say that the validity constraint that the element type has
to be declared has been broken? If all validity constraints applied
(for the validity violation reporter part of the software), then there
would be many more messages! And it would then also be non-conformance
with XML not to report them! (Since XML requires reporting of validity
constraint violations whenever the document fulfills the DTD.)

So today’s validating processors do seem to think that some documents
need no more than a single error message when there is no DTD. And
this is clearly in line with XML. Tightening that hole might be to
*change* XML.

At the same time, tool makers today know that there might *still* be
more to be said than simply ”there is no DTD”. And it is *then* they -
typically silently! - make the tool shift from validating mode to
non-validating mode.

The shift in a tool from validating processor mode to non-validating
processor mode is clearly one that happens when the tool at hand comes
to the conclusion that validating mode is no longer of any use.

What does *that* tell us?

It tells us that, actually, the tool (and the users) perceive this as
a shift not from validation mode to non-validation mode, but as a shift
from *one* validation mode, to *another*, more useful, validation mode!

It also tells us that *something* inside the tool has at the very least
performed a pre-validation of the document.

[1] http://validator.w3.org/
[2] http://validator.w3.org/nu/
[3] http://www.w3.org/TR/REC-xml/#dt-validating
[4] http://www.w3.org/TR/REC-xml/#dt-atuseroption
--
leif halvard silli

Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

Leif Halvard Silli-4
Paul,

we might agree about your point 2:

> 2.  It is certainly the case that nothing in the document
>     (e.g., the existence or non-existence of any form of
>     doctype declaration) has been defined by the XML spec
>     to indicate which processing mode to use.

I could live well with point 2 going into the spec. But I would
suggest to clarify, in the spec, that your statement means that:

A)
   1) If failing to find DTDs to validate against, validating
      processors are not permitted to slip into non-validating
      processing mode and they must, unless reporting is disabled,
      report such violations for e.g. HTML5 documents.
B)    Non-validating processors are not permitted to slip into
      validating processing mode based on presence of a doctype.

However, as a kind of compromise, I wonder what you think about, as a
part 2) of A), allowing ”double validation”:

A)
   2) As long as it is *clear* to the user that the processor is a
      validating one, validating processors *could* issue validity
      violations *and* the results of ”conformance checking” based
      on XSD or some other non-validating schema/option.

This way, a validating processor could report e.g. an HTML5 document as
violating validity, but as conforming per (e.g.) XSD. Thus two parallel
reports.

Leif Halvard Silli
