XPath 1.0 change proposal


James Clark-8
Michael kindly pointed me to his change proposal for XPath 1.0, which he tells me the XSLT WG is planning to consider at its next meeting, and invited me to send my comments to this list.

Although I appreciate Michael's work on formalizing the XPath 1.0 data model, I do not think that at this stage a major rewrite of the XPath 1.0 data model is a good idea.  I would suggest that, after nearly 14 years, an extremely conservative policy should be adopted towards changes: changes should be made only when there is a genuine error that is manifested in discrepancies between implementations or inconsistencies between implementations and the spec.

The change proposal claims that it was a goal of XPath 1.0 for the data model to be defined without dependencies on XML 1.0.  I find this claim bizarre given that XML 1.0 is referenced normatively and the data model definition is full of references to XML 1.0. The change proposal seems to be claiming that XPath 1.0 is full of bugs in need of correction because it does not meet a goal that it never had.

The change proposal also claims that it is a goal of XPath 1.0 that the data model be defined formally.  This is clearly not the case.  XPath 1.0 does not make the slightest attempt to be formal.  Rather it aims to be succinct and readily understandable.  The level of formality in the data model definition is similar to that of the rest of the spec and of companion specs (XML 1.0, XML Namespaces, XSLT 1.0).  It is also virtually impossible to be really rigorous about the construction of the data model from the XML document, without specifying this in the XML spec itself: for each syntax production the XML spec would need to explain how the corresponding data model was constructed.

I am also not convinced that in many cases the proposed wording changes are in fact improvements.  If the WG does decide to go ahead with this change, I can make some more detailed comments.  But for the moment, I would just mention a couple of points.

XPath 1.0 does not constrain the root node to have exactly one element child. In the case where the data model is constructed from an XML document, there will of course be exactly one child.  But in other cases (eg querying into a DOM DocumentFragment) it would be unhelpful to impose such a restriction.  (XPath 1.0 is generally fairly loose -- for example, it does not define conformance -- so as to provide maximum flexibility to referencing specs.)

The reason why the spec uses terminology like "There is an element node for every element" instead of referencing particular productions is because of entity expansion.  For example, given

<!DOCTYPE doc [
<!ENTITY e "<x>foo</x>">
]>
<doc>&e;&e;</doc>

I am comfortable with saying (somewhat vaguely) that there are three elements.  I am much less comfortable saying that there are three occurrences of the "element" production (in fact, I would say it is clear that there are only two occurrences of the "element" production).
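
To make the counting concrete, here is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree as the parser (it expands internal general entities while parsing). It only shows what one parser builds from the document above; it does not settle what the XML grammar says about occurrences of the "element" production.

```python
import xml.etree.ElementTree as ET

# The example document: one entity declaration, referenced twice.
DOC = """<!DOCTYPE doc [
<!ENTITY e "<x>foo</x>">
]>
<doc>&e;&e;</doc>"""

root = ET.fromstring(DOC)

# Every element object in the parsed tree, in the order the parser built them.
tags = [el.tag for el in root.iter()]

# The two <x> children come from the same entity but are separate objects.
x_nodes = root.findall("x")
distinct = x_nodes[0] is not x_nodes[1]

print(tags, distinct)
```

In this tree both x elements have the same parent, so they are siblings produced by two expansions of the same replacement text.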

James




Re: XPath 1.0 change proposal

C. M. Sperberg-McQueen-2
Thank you, James.  I agree with some of what you say, and disagree with some.

On Mar 14, 2013, at 7:52 AM, James Clark wrote:

>
>
> Although I appreciate Michael's work on formalizing the XPath 1.0 data model, I do not think that at this stage a major rewrite of the XPath 1.0 data model is a good idea.  

I agree.  

That's why I proposed small fixes to repair errors in the definition of the data model, and
not a major rewrite.  

> I would suggest that, after nearly 14 years, an extremely conservative policy should be adopted towards changes: changes should be made only when there is a genuine error that is manifested in discrepancies between implementations or inconsistencies between implementations and the spec.

The nature of the errors in the definition of the data model is that they amount
to discrepancies within the spec.  The rest of XPath 1.0 assumes that every
instance of the data model, as defined in section 5, will have certain properties.
It is the job of section 5 to ensure that that is so, and in the spec as written
the properties in question are not in fact guaranteed.  

These discrepancies are unlikely to show up in implementations of XPath 1.0
as a whole, since implementors are likely to be guided by the assumptions
manifest elsewhere in the spec more than by the details of the data model
definition.  They will, however, show up in any attempt to implement, or formalize,
the XPath 1.0 data model by itself.  That is how I became aware of the errors
in the first place.

> The change proposal claims that it was a goal of XPath 1.0 for the data model to be defined without dependencies on XML 1.0.  I find this claim bizarre given that XML 1.0 is referenced normatively and the data model definition is full of references to XML 1.0.

If there was no intent to define the data model without dependencies on XML 1.0, then at least half the
text in section 5 is pointless, unnecessary repetition of things that are obvious from the XML spec.
The choice seems to be between reading the spec as having a coherent goal which in some important
details it failed to achieve, and reading the spec as given to garrulous irrelevancies.

> The change proposal seems to be claiming that XPath 1.0 is full of bugs in need of correction because it does not meet a goal that it never had.
>
> The change proposal also claims that it is a goal of XPath 1.0 that the data model be defined formally.  This is clearly not the case.  

By any mathematical standard, the prose of XPath 1.0 would count as informal.  But that is also
true of the prose in the change proposal.

Compared with other specs, I think the data model section of XPath 1.0 is more explicit and
formal than most.

> XPath 1.0 does not make the slightest attempt to be formal.  Rather it aims to be succinct and readily understandable.  The level of formality in the data model definition is similar to that of the rest of the spec and of companion specs (XML 1.0, XML Namespaces, XSLT 1.0).  It is also virtually impossible to be really rigorous about the construction of the data model from the XML document, without specifying this in the XML spec itself: for each syntax production the XML spec would need to explain how the corresponding data model was constructed.

I see no connection between the formality of an exposition and the title of the document
in which it appears.  The XML spec is not formal about (for example) identity criteria for
elements, because nothing in the XML spec appeals to element identity (at least, not
for the cases it leaves indeterminate).  The XPath 1.0 spec does need to be determinate
about element node identity, and it seems bizarre to me to suggest that it could not be
more precise or careful without changes to the text of the XML 1.0 spec.

>
> I am also not convinced that in many cases the proposed wording changes are in fact improvements.  If the WG does decide to go ahead with this change, I can make some more detailed comments.  But for the moment, I would just mention a couple of points.
>
> XPath 1.0 does not constrain the root node to have exactly one element child. In the case where the data model is constructed from an XML document, there will of course be exactly one child.  But in other cases (eg querying into a DOM DocumentFragment) it would be unhelpful to impose such a restriction.  (XPath 1.0 is generally fairly loose -- for example, it does not define conformance -- so as to provide maximum flexibility to referencing specs.)

Thank you for this clarification.

The XML spec seems, then, to be normative for the description of the data model, except for the
parts of it that don't apply.  On this view, the XPath 1.0 spec is readily understandable only for
readers gifted with a certain degree of clairvoyance.

>
> The reason why the spec uses terminology like "There is an element node for every element" instead of referencing particular productions is because of entity expansion.  For example, given
>
> <!DOCTYPE doc [
> <!ENTITY e "<x>foo</x>">
> ]>
> <doc>&e;&e;</doc>
>
> I am comfortable with saying (somewhat vaguely) that there are three elements.  

The problem is that there is nothing in the XML spec or the Infoset spec that could be
used to argue that there are three elements here, instead of two.  Many people
are comfortable saying that there are three elements here, but a count of two elements
is equally compatible with the XML specification.

> I am much less comfortable saying that there are three occurrences of the "element" production (in fact, I would say it is clear that there are only two occurrences of the "element" production).

On the contrary; after entity expansion we have a sequence of character types
matching the document production of the XML spec, but we do not necessarily
have a sequence of character tokens matching the document production.  In
the sequence of character types, there are clearly three occurrences of strings
(sequences of character types) matching the element production, even though there
are only two such string-types.  That is the difference between a string type and
an occurrence of a string type.

--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************






Re: XPath 1.0 change proposal

James Clark-8
On Thu, Mar 14, 2013 at 10:34 PM, C. M. Sperberg-McQueen <[hidden email]> wrote:

> If there was no intent to define the data model without dependencies on XML 1.0, then at least half the
> text in section 5 is pointless, unnecessary repetition of things that are obvious from the XML spec.

Please identify this half.  I just reread section 5, and I'm not seeing it.

All section 5 is doing is saying how to construct an instance of the data model from an XML/XML Namespaces document. Period.

James

Re: XPath 1.0 change proposal

C. M. Sperberg-McQueen-2

On Mar 14, 2013, at 9:51 AM, James Clark wrote:

> On Thu, Mar 14, 2013 at 10:34 PM, C. M. Sperberg-McQueen <[hidden email]> wrote:
>
> If there was no intent to define the data model without dependencies on XML 1.0, then at least half the
> text in section 5 is pointless, unnecessary repetition of things that are obvious from the XML spec.
>
> Please identify this half.  I just reread section 5, and I'm not seeing it.

If we assume (as some readers of the XPath 1.0 spec are inclined to
claim)

  (a) that every occurrence of a string type that matches the element
      production of the XML 1.0 grammar (as modified by XML Names) is
      a distinct element;

  (b) that the data model is not intended to be independent of the
      serialized XML form;

  (c) that the statement "There is an ordering, document order,
      defined on all the nodes in the document corresponding to the
      order in which the first character of the XML representation of
      each node occurs in the XML representation of the document after
      expansion of general entities" is a normative statement about
      document order, and not a non-normative observation about
      document order as specified elsewhere;

then the following sentences seem to be either unnecessary repetitions
of simple facts that follow from the 1:1 relation between nodes in the
data model instance and constructs in the XML spec, or contradictions
of things normatively stated either in the XML spec, the namespaces
spec, or elsewhere in XPath 1.0 (especially assumption (c)).

These probably (I haven't actually counted) constitute fewer than 50%
of the clauses in section 5, so I was probably guilty of exaggeration
when I said "half" the text of section 5 is pointless.

1 "Some types of nodes also have an expanded-name." Follows from XML
1.0 + Namespaces 1.0.

2 "an expanded-name ... is a pair consisting of a local part and a
namespace URI." Follows from Namespaces 1.0.

3 "The namespace URI specified in the XML document can be a URI
reference as defined in [RFC2396];" (original text), or "A namespace
name specified in a namespace declaration in an XML document is a URI
reference as defined in [RFC2396];" (erratum). Follows from Namespaces
1.0.

4 "this means it can have a fragment identifier and can be relative."
(original text), or "this implies it can have a fragment identifier
and can be relative." (erratum). Follows from RFC 2396 (but clearly
labeled as such, so it really doesn't count in this enumeration).

5 "Element nodes occur before their children."  Follows from XML 1.0
(together with the immediately preceding normative definition of
document order).

6 "The attribute nodes and namespace nodes of an element occur before
the children of the element."  Follows from XML 1.0 (together with the
normative definition of document grammar).

7 "The namespace nodes are defined to occur before the attribute
nodes."  Contradicts the normative statement of document order.

8 "The relative order of namespace nodes is implementation-dependent."
Contradicts the normative statement of document order.

9 "The relative order of attribute nodes is implementation-dependent."
Contradicts the normative statement of document order.

10 "Nodes never share children: if one node is not the same node as
another node, then none of the children of the one node will be the
same node as any of the children of another node."  Follows from
assumption (a).  

11 "Every node other than the root node has exactly one parent, which
is either an element node or the root node."  Follows from XML 1.0
(assuming the usual usage of the word "parent" in XML contexts).

12 "A root node or an element node is the parent of each of its child
nodes."  (Ditto.)

13 "The element node for the document element is a child of the root
node."  Follows from XML 1.0.

14 "The root node also has as children processing instruction and
comment nodes for processing instructions and comments that occur in
the prolog and after the end of the document element."  Follows from
XML 1.0.

15 "The root node does not have an expanded-name."  Follows from XML
1.0 + Namespaces 1.0.

At this point, this exercise is costing me more tedium than I have
patience for, so I am going to stop.  I will leave the rest of section 5
as an exercise for the reader.

I am sorry that we do not agree as to the quality of work in the XPath
1.0 spec.  I believe it is good work that includes a few simple errors
that are easily fixed, and that the basic structure of the work is
good enough to be worth fixing; you seem to be arguing that it's
shoddy work never intended to be correct or to make good on the
implications of the term "data model", and that any fix would
constitute a major renovation.

Some members of the XSLT WG, I regret to say, also seem to fear that
saying explicitly that the parent axis and sibling axes are acyclic
and that document order is total might introduce contradictions with
the rest of the specification.

Oh, well.


--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************






Re: XPath 1.0 change proposal

James Clark-8
On Fri, Mar 15, 2013 at 12:55 AM, C. M. Sperberg-McQueen <[hidden email]> wrote:

> the following sentences seem to be either unnecessary repetitions
> of simple facts that follow from the 1:1 relation between nodes in the
> data model instance and constructs in the XML spec, or contradictions
> of things normatively stated either in the XML spec, the namespaces
> spec, or elsewhere in XPath 1.0 (especially assumption (c)).
> ...
>
> 1 "Some types of nodes also have an expanded-name." Follows from XML
> 1.0 + Namespaces 1.0.

XML Namespaces says nothing about the concept of "node".  The fact that XML Namespaces says that elements have expanded-names doesn't necessarily imply that XPath 1.0 element nodes have expanded-names.  A key aspect of the data model is the selection of available information that it chooses to expose.

> 2 "an expanded-name ... is a pair consisting of a local part and a
> namespace URI." Follows from Namespaces 1.0.

XML Namespaces uses the term namespace name. XPath chooses to use the term namespace-uri.

> 3 "The namespace URI specified in the XML document can be a URI
> reference as defined in [RFC2396];" (original text), or "A namespace
> name specified in a namespace declaration in an XML document is a URI
> reference as defined in [RFC2396];" (erratum). Follows from Namespaces
> 1.0.

Ditto.

> 4 "this means it can have a fragment identifier and can be relative."
> (original text), or "this implies it can have a fragment identifier
> and can be relative." (erratum). Follows from RFC 2396 (but clearly
> labeled as such, so it really doesn't count in this enumeration).
>
> 5 "Element nodes occur before their children."  Follows from XML 1.0
> (together with the immediately preceding normative definition of
> document order)

Let's look at this in context: 
 
There is an ordering, document order, defined on all the nodes in the document corresponding to the order in which the first character of the XML representation of each node occurs in the XML representation of the document after expansion of general entities. Thus, the root node will be the first node. Element nodes occur before their children. Thus, document order orders element nodes in order of the occurrence of their start-tag in the XML (after expansion of entities). The attribute nodes and namespace nodes of an element occur before the children of the element. The namespace nodes are defined to occur before the attribute nodes. The relative order of namespace nodes is implementation-dependent. The relative order of attribute nodes is implementation-dependent. 

The sentences following the first "Thus" are fleshing out the definition of document order given in the first sentence.
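
The paragraph can be read as a traversal rule. A minimal sketch, assuming a toy Node class (hypothetical, not taken from any XPath library) and assuming list order for namespace and attribute nodes, which the spec leaves implementation-dependent:

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Toy node; only the pieces needed to show document order."""
    kind: str  # "root", "element", "text", "attribute", "namespace"
    name: str = ""
    namespaces: List["Node"] = field(default_factory=list)
    attributes: List["Node"] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def document_order(node: Node) -> Iterator[Node]:
    """Yield nodes in the order the quoted paragraph describes."""
    yield node                      # a node comes before everything it owns
    yield from node.namespaces      # namespace nodes before attribute nodes
    yield from node.attributes      # attribute nodes before children
    for child in node.children:     # children in order, each followed by its subtree
        yield from document_order(child)


# Simplified tree: namespace nodes are shown on <doc> only, to keep it short.
tree = Node("root", children=[
    Node("element", "doc",
         namespaces=[Node("namespace", "xml")],
         attributes=[Node("attribute", "id")],
         children=[Node("element", "x", children=[Node("text", "foo")])]),
])

order = [(n.kind, n.name) for n in document_order(tree)]
print(order)
```

Only the first sentence of the paragraph is derivable from character positions in the serialized XML; the placement of namespace nodes before attribute nodes is a separate choice, which the sketch has to encode explicitly.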


> 6 "The attribute nodes and namespace nodes of an element occur before
> the children of the element."  Follows from XML 1.0 (together with the
> normative definition of document grammar).

Ditto.

> 7 "The namespace nodes are defined to occur before the attribute
> nodes."  Contradicts the normative statement of document order.

This is giving you the definition of document order for attribute nodes.

> 8 "The relative order of namespace nodes is implementation-dependent."
> Contradicts the normative statement of document order.

As is this.

> 9 "The relative order of attribute nodes is implementation-dependent."
> Contradicts the normative statement of document order.

As is this.

If you genuinely find this confusing, I suggest adding the words "as further explained in the rest of this paragraph"  at the end of the first sentence.

> 10 "Nodes never share children: if one node is not the same node as
> another node, then none of the children of the one node will be the
> same node as any of the children of another node."  Follows from
> assumption (a).

That is addressing a misinterpretation that could arise because of general entity expansions.

> 11 "Every node other than the root node has exactly one parent, which
> is either an element node or the root node."  Follows from XML 1.0
> (assuming the usual usage of the word "parent" in XML contexts).


This is giving a precise definition of the term parent, which is crucial for XPath.

 
> 12 "A root node or an element node is the parent of each of its child
> nodes."  (Ditto.)

Ditto.

> 13 "The element node for the document element is a child of the root
> node."  Follows from XML 1.0.

Ditto: definition of child.  XML 1.0 defines parent/child only for elements.

> 14 "The root node also has as children processing instruction and
> comment nodes for processing instructions and comments that occur in
> the prolog and after the end of the document element."  Follows from
> XML 1.0.

Ditto: definition of child.

> 15 "The root node does not have an expanded-name."  Follows from XML
> 1.0 + Namespaces 1.0.

XML Namespaces says nothing about nodes.

> At this point, this exercise is costing me more tedium than I have
> patience for, so I am going to stop.

Good, I don't think it's advancing your case.

> I will leave the rest of section 5
> as an exercise for the reader.
>
> you seem to be arguing that it's
> shoddy work never intended to be correct or to make good on the
> implications of the term "data model", and that any fix would
> constitute a major renovation.

I hope I have convinced you that the data model section is intended to do nothing more than

- explain how to construct the instance of the data model from an XML document
- define for such instances various key terms (parent, child, document order, expanded-name, string-value etc) which are used in the rest of the spec

I do not accept that the fact that it does no more than this makes it "sloppy".  In any case, it has been approved as a W3C Recommendation in this form.

As I understand it, you are looking for something more than this: a self-contained definition of the data model that completely specifies all the constraints that the data model must satisfy in order to be useable with XPath.  I do not deny that this would be a nice thing to have and would be more satisfying as a data model definition.  However, I think it's way beyond what is appropriate for an erratum (especially after this period of time), and would require a major rewrite (going beyond even what you are now proposing) to be completely satisfactory.  For example, in your current draft I think the separation between the definition of the data model itself and the mapping from XML to the data model is not nearly as clean as it could be.

I also do not think the absence of what you are looking for is a practical problem with XPath.  XPath is designed to be a component that is referenced by other standards.  The referencing standard has to define a whole bunch of stuff to be able to use XPath, including conformance and how the context is set up.  If a standard wants to apply XPath to something other than XML documents, it needs to define how an instance of the XPath data model is constructed from the structures that the standard deals with.  It's up to that standard to do so in a way that ensures XPath does not break.  The exact constraints that such XPath data model instances satisfy is dependent on that referencing standard.  For example, if the DOM standard were to reference XPath and allow XPath to be used to query a DocumentFragment in the obvious way, then in that case the root node of the constructed XPath data model would not necessarily satisfy the constraint of the root node having a single element child.
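
A minimal sketch of the DocumentFragment case, assuming Python's standard-library xml.dom.minidom as the DOM implementation. minidom does not evaluate XPath; the sketch only shows that a fragment, if mapped to an XPath root node in the obvious way, can have more than one element child.

```python
from xml.dom import minidom

# A DocumentFragment is a lightweight container; unlike a Document,
# it may hold any number of element children.
doc = minidom.Document()
fragment = doc.createDocumentFragment()
for name in ("a", "b"):
    fragment.appendChild(doc.createElement(name))

# Collect the element children that a hypothetical XPath root node
# built from this fragment would have.
element_children = [
    node for node in fragment.childNodes
    if node.nodeType == node.ELEMENT_NODE
]
names = [node.tagName for node in element_children]
print(names)
```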

In summary, although XPath 1.0 could have been written in many different ways, and some of those ways might well be superior in some respects to how it was in fact written, I do not believe that this change proposal has identified a defect in XPath 1.0 that is in need of fixing at this stage.

James

Re: XPath 1.0 change proposal

Michael Kay
James Clark wrote:

>
> As I understand it, you are looking for something more than this: a
> self-contained definition of the data model that completely specifies
> all the constraints that the data model must satisfy in order to be
> useable with XPath.  I do not deny that this would be a nice thing to
> have and would be more satisfying as a data model definition.
>  However, I think it's way beyond what is appropriate for an erratum
> (especially after this period of time), and would require a major
> rewrite (going beyond even what you are now proposing) to be
> completely satisfactory.  For example, in your current draft I think
> the separation between the definition of the data model itself and the
> mapping from XML to the data model is not nearly as clean as it could be.
I would add: the work has already been done, and can be found in the XDM
specification for XPath 2.0. It's not to everyone's liking, because it
takes about 30 pages to say little more than XPath 1.0 says in 3; but it
does exactly what is described here - it defines the data model
independently of the XML (or infoset) specs, with all the necessary
constraints, and then quite separately it describes one possible way to
construct an XDM instance from an XML infoset.

Now it's been pointed out we have a charter (and therefore some kind of
duty) to maintain XPath 1.0. If you're maintaining an old car then you
have to replace components that are no longer working. But you don't
have to fit a catalytic converter to improve the fuel consumption. Just
because we know how to design catalytic converters doesn't mean we are
obliged to fit them to old cars. All the evidence is that the old car is
motoring along quite happily...

>
>
>
> In summary, although XPath 1.0 could have been written in many
> different ways, and some of those ways might well be superior in some
> respects to how it was in fact written, I do not believe that this
> change proposal has identified a defect in XPath 1.0 that is in need
> of fixing at this stage.
>
>
I couldn't have put it better.

Michael Kay




Re: XPath 1.0 change proposal

C. M. Sperberg-McQueen-2
In reply to this post by James Clark-8

On Mar 14, 2013, at 11:31 PM, James Clark wrote:

> On Fri, Mar 15, 2013 at 12:55 AM, C. M. Sperberg-McQueen <[hidden email]> wrote:
>
> the following sentences seem to be either unnecessary repetitions
> of simple facts that follow from the 1:1 relation between nodes in the
> data model instance and constructs in the XML spec, or contradictions
> of things normatively stated either in the XML spec, the namespaces
> spec, or elsewhere in XPath 1.0 (especially assumption (c)).
> ...
>
> 1 "Some types of nodes also have an expanded-name." Follows from XML
> 1.0 + Namespaces 1.0.
>
> XML Namespaces says nothing about the concept of "node".  The fact that XML Namespaces says that elements have expanded-names doesn't necessarily imply that XPath 1.0 element nodes have expanded-names.  A key aspect of the data model is the selection of available information that it chooses to expose.

You seem to be establishing the principle that nodes share properties with
the corresponding constructs in the XML document (however one might
choose to define them) if and only if the definition of the data model
explicitly mentions those properties.  

On this reading, the normative reference to the XML spec seems to have
no function.

And this principle cuts away the ground underneath every argument thus
far brought forward for the claim that the parent and sibling relations
should be thought of as acyclic, even though the text does not say so.

On this reading, the current text is not a good effort with a few slips,
but fundamentally broken.
>
> 7 "The namespace nodes are defined to occur before the attribute
> nodes."  Contradicts the normative statement of document order.
>
> This is giving you the definition of document order for attribute nodes.

So - no textual demarcation between the sentences that are (on your
reading) merely fleshing out / repeating the normative statement of
document order, and this one, which modifies it by contradicting it?

Well, bad drafting is not a criminal offense.  It can happen to the best.

If you don't see the contradiction, then I do not see how to help you.


> 10 "Nodes never share children: if one node is not the same node as
> another node, then none of the children of the one node will be the
> same node as any of the children of another node."  Follows from
> assumption (a).
>
> That is addressing a misinterpretation that could arise because of general entity expansions.

But it fails to address the problem of distinctiveness adequately -- it
only addresses the case where the entity references occur directly
within different parents.  

>
>  
> 11 "Every node other than the root node has exactly one parent, which
> is either an element node or the root node."  Follows from XML 1.0
> (assuming the usual usage of the word "parent" in XML contexts).
>
>
> This is giving a precise definition of the term parent, which is crucial for XPath.

No, not precise at all.  It is (on the usual reading of the spec) crucial for
XPath that the parent relation be acyclic.  Nothing here says so, implies
it, or even entails it.
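
The gap can be shown with a toy model. In this sketch the parent dictionaries and both helper functions are hypothetical illustrations, not anything defined by XPath 1.0: the first helper checks only what the quoted sentence states, the second checks acyclicity separately.

```python
from typing import Dict, Optional

ParentMap = Dict[str, Optional[str]]


def satisfies_parent_sentence(parent: ParentMap, root: str) -> bool:
    """Root has no parent; every other node has exactly one parent, a known node."""
    if root not in parent or parent[root] is not None:
        return False
    return all(p in parent for n, p in parent.items() if n != root)


def parent_relation_is_acyclic(parent: ParentMap, root: str) -> bool:
    """Following parent links from any node must end at the root."""
    for start in parent:
        seen = set()
        node: Optional[str] = start
        while node != root:
            if node is None or node in seen:
                return False  # fell off the map, or walked in a circle
            seen.add(node)
            node = parent[node]
    return True


# An ordinary tree: / -> doc -> x
tree: ParentMap = {"/": None, "doc": "/", "x": "doc"}

# Each non-root node has exactly one parent, yet a and b form a loop.
loop: ParentMap = {"/": None, "a": "b", "b": "a"}

print(satisfies_parent_sentence(tree, "/"), parent_relation_is_acyclic(tree, "/"))
print(satisfies_parent_sentence(loop, "/"), parent_relation_is_acyclic(loop, "/"))
```

The second structure passes the one-parent check and fails the acyclicity check, which is the sense in which the sentence alone does not rule out cycles. Whether such a structure can arise from a well-formed XML document is a separate question.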

>
> I hope I have convinced you that the data model section is intended to do nothing more than
>
> - explain how to construct the instance of the data model from an XML document
> - define for such instances various key terms (parent, child, document order, expanded-name, string-value etc) which are used in the rest of the spec

There are several problems here.

First, the intent of the WG or the original editors can be reconstructed (when and
to the extent that it can be reconstructed) by appeal to contemporary documents
or other historical evidence.  Nothing in your mail speaks directly to the question of
intent, and any statement about the intent of the text is a non sequitur.

Second, you seem to be falling victim to the intentional fallacy, an elementary
error in textual interpretation which is common enough.  The conscious or
unconscious intent of the authors of a text can be of historical interest, but it
does not determine what the text means, if for no other reason than that humans
do not always succeed in doing what they intend to do.  

It's passably clear that this discussion is not going to persuade either of us to
change our minds and that it's unlikely to provide any illumination to any third
parties.   I believe that the XPath 1.0 spec has a number of simple errors, which are
easily fixed; you deny that there are errors and assert at the same time that
fixing them would involve a much more extensive revision.

Perhaps I am wrong to say neither of us will change the other's mind; you have
made me consider seriously for the first time that if it's impossible to persuade the
responsible WGs to fix the problems, then it may be better to ask W3C to
withdraw the XPath 1.0 spec and deprecate its use in favor of the XDM 2.0 and
3.0 specs, which do a better job and which the responsible WGs are more
willing to maintain.

--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************






Re: XPath 1.0 change proposal

James Clark-8
On Fri, Mar 15, 2013 at 10:26 PM, C. M. Sperberg-McQueen <[hidden email]> wrote:
...
> You seem to be establishing the principle that nodes share properties with
> the corresponding constructs in the XML document (however one might
> choose to define them) if and only if the definition of the data model
> explicitly mentions those properties.

Nodes (a concept defined in XPath) have precisely the properties that XPath says they do. XPath specifies these properties in many cases by referencing the XML and XML Namespaces Recommendations.

> On this reading, the normative reference to the XML spec seems to have
> no function.

You've lost me. The normative reference is fundamental: the data model section is specifying how to construct a data model instance from a well-formed XML document, which is defined in the XML spec.  It also relies upon it for, amongst other things, the definition of document order.  
 
And this principle cuts away the ground underneath every argument thus
far brought forward for the claim that the parent and sibling relations
should be thought of as acyclic, even though the text does not say so.

XPath tells you, when you construct the node tree from a well-formed XML document, which nodes are parents/siblings of which other nodes.  That is a completely sufficient specification.  It will in fact be the case that, when you do so, the parent/sibling relation so defined will be acyclic, but there is absolutely no need for XPath to say so.  If it would ease your concerns to add a sentence saying a node will never be a descendant of itself, I would have no problems with that.

>
> 7 "The namespace nodes are defined to occur before the attribute
> nodes."  Contradicts the normative statement of document order.
>
> This is giving you the definition of document order for attribute nodes.

So - no textual demarcation between the sentences that are (on your
reading) merely fleshing out / repeating the normative statement of
document order, and this one, which modifies it by contradicting it?

Well, bad drafting is not a criminal offense.  It can happen to the best.

The drafting here could definitely be improved.   If the WG thinks this drafting is so bad that it will cause real confusion, I would suggest the following minimal impact change to deal with it.

There is an ordering, document order, defined on all the nodes in the document. For nodes other than attribute and namespace nodes, this order corresponds to the order in which the first character of the XML representation of the node occurs in the XML representation of the document after expansion of general entities.
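
As a rough illustration of the wording above (not part of the proposed spec text), the following Python sketch compares a depth-first traversal with the source offsets of each element's start-tag. It assumes Python's standard xml.etree.ElementTree, which exposes only element nodes here, and a made-up sample document with unique element names so that a plain string search can locate each start-tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are unique on purpose so that
# a simple string search finds each element's start-tag.
SRC = "<a><b><c/></b><d/></a>"

root = ET.fromstring(SRC)

# Element.iter() visits the element itself first, then its descendants,
# depth-first: parents come before their children.
traversal = [el.tag for el in root.iter()]

# Order the same elements by the offset of the first character of their
# XML representation (the "<" of the start-tag) in the source text.
by_offset = sorted(traversal, key=lambda tag: SRC.index("<" + tag))

print(traversal == by_offset)
```

Attribute and namespace nodes are deliberately left out, since the wording above excludes them from the first-character rule.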
 
> 10 "Nodes never share children: if one node is not the same node as
> another node, then none of the children of the one node will be the
> same node as any of the children of another node."  Follows from
> assumption (a).
>
> That is addressing a misinterpretation that could arise because of general entity expansions.

But it fails to address the problem of distinctiveness adequately -- it
only addresses the case where the entity references occur directly
within different parents.

I think I agree with you here (yeah!).  If I have understood you correctly, given the following XML document:

<!DOCTYPE doc [
<!ENTITY e "<p/>">
]>
<doc>&e;&e;</doc>

your point is that it is unclear whether the first and second child nodes of the "doc" element are distinct.  I would address this by adding a sentence (following the quoted sentence above) along the lines of:

"The identity of a child node is determined by its position amongst its siblings: the i-th child node is the same node as the j-th child node if and only if i is equal to j."
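
For what it is worth, one way to see how a particular implementation treats this document is sketched below. It assumes Python's standard xml.etree.ElementTree (which expands internal general entities through expat) and uses Python object identity as a stand-in for node identity; other processors may of course behave differently.

```python
import xml.etree.ElementTree as ET

# The document from the message above: entity "e" is referenced twice.
SRC = """<!DOCTYPE doc [
<!ENTITY e "<p/>">
]>
<doc>&e;&e;</doc>"""

root = ET.fromstring(SRC)
children = list(root)  # child elements of <doc>, in order

# Position-based identity: child i is the same node as child j
# if and only if i == j.
identity_matches_position = all(
    (children[i] is children[j]) == (i == j)
    for i in range(len(children))
    for j in range(len(children))
)

print(len(children), identity_matches_position)
```

This only shows what one library does; it says nothing about what the spec text itself requires, which is the point under discussion.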

> 11 "Every node other than the root node has exactly one parent, which
> is either an element node or the root node."  Follows from XML 1.0
> (assuming the usual usage of the word "parent" in XML contexts).
>
>
> This is giving a precise definition of the term parent, which is crucial for XPath.

No, not precise at all.  It is (on the usual reading of the spec) crucial for
XPath that the parent relation be acyclic.  Nothing here says so, implies
it, or even entails it.

The data model section tells you, when you construct the node tree from a well-formed XML document, which nodes are parents of which other nodes.  When you so construct the node tree, the parent relation will always be acyclic.  It is an extensional, not an intensional, definition.  If you disagree, please give an example of a well-formed XML document for which it is unclear in the constructed node tree which nodes are parents of which other nodes, or in which the relationship is cyclic (i.e. an element is an ancestor of itself).
 
> I hope I have convinced you that the data model section is intended to do nothing more than
>
> - explain how to construct the instance of the data model from an XML document
> - define for such instances various key terms (parent, child, document order, expanded-name, string-value etc) which are used in the rest of the spec

There are several problems here.

First, the intent of the WG or the original editors can be reconstructed (when and
to the extent that it can be reconstructed) by appeal to contemporary documents
or other historical evidence.  Nothing in your mail speaks directly to the question of
intent, and any statement about the intent of the text is a non sequitur.

Second, you seem to be falling victim to the intentional fallacy, an elementary
error in textual interpretation which is common enough.  The conscious or
unconscious intent of the authors of a text can be of historical interest, but it
does not determine what the text means, if for no other reason than that humans
do not always succeed in doing what they intend to do.

OK, so I should have said "does" instead of "intended to do".  I believe I have demonstrated that my reading is completely consistent with all the text in the data model section.  You have demonstrated that your reading is not.  Normal principles of textual interpretation should therefore favour my reading.

It's passably clear that this discussion is not going to persuade either of us to
change our minds and that it's unlikely to provide any illumination to any third
parties.

Regrettably I do not seem to have been able to convince you of anything. However, you have convinced me that there is a defect related to node identity which should be corrected.  You have also identified a couple of places where it would be perfectly reasonable for the WG to decide to add a clarifying phrase or sentence.

  I believe that the XPath 1.0 spec has a number of simple errors, which are
easily fixed;

The word count of the new text you are proposing to add to Section 5 of XPath is approximately 50% of the word count of the current text of Section 5. It is a substantive piece of original work adding to XPath a formalization of the data model that makes it independent of the XML 1.0 and XML Namespaces Recommendations.  This is an intricate, complex piece of work, which you have developed over a number of years.  It is not simple, and it goes way beyond anything that could be described as fixing an error.  Furthermore, your formalization is not the only possible one: there are other completely different ways to do this formalization (for example, by viewing the tree as a map from arrays of node positions to node properties).  In my view it would be a major abuse of the W3C process to get this new, substantial, original work into a Recommendation status document by treating it as an erratum, thereby bypassing the extensive review and consideration that the W3C process would normally apply to such a piece of work.
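
To make the alternative mentioned in passing above a little more concrete, here is a minimal sketch of a tree viewed as a map from arrays of node positions to node properties. Everything in it (the tuple encoding, the property names, the helper functions) is an assumption made for illustration; it is not taken from the change proposal or from the XPath spec.

```python
from typing import Dict, Iterator, Optional, Tuple

Path = Tuple[int, ...]

# Hypothetical encoding: () is the root node, (0,) its first child,
# (0, 1) the second child of that child, and so on.
tree: Dict[Path, dict] = {
    (): {"kind": "root"},
    (0,): {"kind": "element", "name": "doc"},
    (0, 0): {"kind": "element", "name": "p"},
    (0, 1): {"kind": "element", "name": "p"},
}


def parent(path: Path) -> Optional[Path]:
    """The parent is the path with its last position dropped; the root has none."""
    return path[:-1] if path else None


def ancestors(path: Path) -> Iterator[Path]:
    """Yield successive parents; each is strictly shorter, so this terminates."""
    while path:
        path = path[:-1]
        yield path


def document_order(nodes: Dict[Path, dict]) -> list:
    """Lexicographic order on position tuples puts parents before children."""
    return sorted(nodes)


print(parent((0, 1)), list(ancestors((0, 1))), document_order(tree))
```

In this encoding two nodes are the same node exactly when their position paths are equal, and no node can be its own ancestor because every ancestor has a strictly shorter path. Attribute and namespace nodes would need some extra convention, which this sketch does not attempt.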

you deny that there are errors and assert at the same time that
fixing them would involve a much more extensive revision.

I do not deny that there are errors.  I do say that your proposed revision is a much more invasive, risky one than is necessary to fix those errors.

Recommendations are not academic papers.  They serve practical goals: to allow implementors to independently create interoperable implementations, and to allow users to predict how those implementations will behave.  XPath 1.0 is a very mature Recommendation with an extensive implementation track record.  I therefore believe a cautious, conservative approach to changes should be adopted.  Proposed changes should be separated out into small (changing a sentence or a phrase or two) changes, where each is precisely targeted to fix a specific point where there is something genuinely unclear to implementors or users. Each such change should ideally have a concrete test case associated with it.  I have given several examples above of the kinds of changes that I think might be appropriate at this stage.

James

 

Re: XPath 1.0 change proposal

C. M. Sperberg-McQueen-2

On Mar 16, 2013, at 12:23 AM, James Clark wrote:

>
>
> Nodes (a concept defined in XPath) have precisely the properties that XPath says they do. XPath specifies these properties in many cases by referencing the XML and XML Namespaces Recommendations.

On your reading, do the normative clauses of XML and Namespaces
govern all instances of the data model or only those which are
created by parsing a namespace-well-formed XML document?

If the former, how do you reconcile that with your claim that an
instance of the data model can have more than one outermost
element dominated by the same document node?

If the latter, then are there any rules that in your view prevent
an instance of the data model from having cycles in the parent
or next-sibling relations?

> On this reading, the normative reference to the XML spec seems to have
> no function.
>
> You've lost me. The normative reference is fundamental: the data model section is specifying how to construct a data model instance from a well-formed XML document, which is defined in the XML spec.  It also relies upon it for, amongst other things, the definition of document order.

Thank you; you have at least clarified that you read the statement about
document order matching that of the first character of the representation
of a node as a normative statement and not as a restatement of things
said normatively elsewhere.

>  
> XPath tells you, when you construct the node tree from a well-formed XML document, which nodes are parents/siblings of which other nodes.  

Where? How?

XPath tells the reader that the data model instance has one node for every
element in the XML document, one comment node for every comment in the
XML document, one text node for every sequence of adjacent data characters,
etc.  

Where does it tell the reader how to identify the parent of a node?  There
is, to be sure, a definition of parenthood in the XML spec, and perhaps the
use of the term 'parent' is assumed to be sufficient to constitute a reference
to that definition, but that definition applies only to elements; nothing in the
XML spec defines the notion of parent for attributes, comments, or character
data.

Moreover, in telling the reader that the set of element nodes has the same
cardinality as the set of elements in the XML document (and similarly for
other node types), the XPath spec does not in fact tell the reader how many
nodes there are to be in the XPath data model instance.  Instead, it assumes
-- wrongly -- that that information follows from the XML spec.  But the XML
spec has no normative statements that rely on elements being able or unable
to appear more than once in the document, and so no need for a general
account of element, comment, or data character identity or distinctness,
or of element sets, comment sets, sets of data characters, or of the
cardinality of those undefined sets.  Having no need for such an account,
it does not provide one.  XPath on the other hand does have normative
statements that rely on node identity and distinctness, so it needs to
provide some well grounded account of the matter.  It can do so by
defining the data model without logical dependencies on XML; on your reading
it does not do so, but relies on the notion of element identity as defined in the
XML spec to determine element-node identity (and similarly for the other
node types), thus botching its task.


> That is a completely sufficient specification.  It will in fact be the case that, when you do so, the parent/sibling relation so defined will be acyclic, but there is absolutely no need for XPath to say so.

First, that is true for the parent relation as defined in the XML spec.
It is not guaranteed by the XML spec for the sibling relation.

Second, you seem here to want to require that XPath explicitly restrict its
operation to XML documents in the serialized form defined in the XML
specification, but I don't think anyone associated with the creation of the
spec, let alone any reader, has ever thought that XPath imposed such
a restriction. Instead, the XSL working group immediately developed and
promulgated the view that XSLT operates on trees as defined in the
XPath spec, and not necessarily on trees created by parsing an XML
document.

For XPath data model instances not created from a parsed serialization
of an XML document, where is the acyclic nature of the parent and
sibling relations specified?

XPath nodes have only those properties assigned by the XPath
spec, you know.  On your reading, it assigns certain properties to
data model instances created from XML documents -- but where
does it describe the properties of data model instances created in
other ways?

If I have understood your recent emails correctly, you hold that when
a data model instance is created from an XML document, then the
root node is guaranteed to have exactly one element-node child, but
when a data model instance is created by other means, there is no
such guarantee.

By analogy, I suppose one can infer that when a data model instance
is created from an XML document, then the parent relation (assuming
that the parent relation of XPath mirrors the parent relation of XML
for elements) will always be acyclic, but when the data model instance
is created by other means, there is no such guarantee.  

So logically speaking, you seem to be taking the position that the
parent relation in XPath data model instances is not guaranteed to
be acyclic.  (The statements in 2.2 that assume acyclicity are thus
perhaps to be taken as errors.)
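
To illustrate what is at stake for instances built by other means, here is a small sketch of the check that would be needed if nothing guarantees acyclicity. The dictionary representation (node name mapped to parent name, with None for the root) and the function name are assumptions made purely for illustration.

```python
from typing import Dict, Optional


def has_parent_cycle(parent_of: Dict[str, Optional[str]]) -> bool:
    """Return True if following parent links from some node revisits a node."""
    for start in parent_of:
        seen = set()
        node: Optional[str] = start
        while node is not None:
            if node in seen:
                return True
            seen.add(node)
            node = parent_of.get(node)
    return False


# Shaped like a tree built from a parsed document: every chain ends at the root.
parsed_like = {"root": None, "doc": "root", "p": "doc"}

# Built by hand with no rule forbidding cycles: "a" and "b" are each other's parent.
hand_built = {"a": "b", "b": "a"}

print(has_parent_cycle(parsed_like), has_parent_cycle(hand_built))
```

The second map satisfies "every node has exactly one parent" for its two nodes and is still cyclic, so that sentence alone does not rule such an instance out.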

This logical problem goes away, of course, if XPath requires that
data model instances be such that they could in principle have been
created from an XML document, though I haven't seen such a statement
in the spec.  But that can't be your view, given that you don't believe
data model instances are required to have a single outermost element.
(That in turn suggests that the sentence in section 2 reading
"/ selects the document root (which is always the parent of the
document element)" is wrong to use the singular, and should read
"... (which is always the parent of the document element, or
elements)".)


> If it would ease your concerns to add a sentence saying a node will never be a descendant of itself, I would have no problems with that.

That would help, which is why my change proposal includes such a statement.
But I think "will" is the wrong modal verb.  Since it does not follow from any
normative statement elsewhere in the specification, this sentence needs
to be formulated in a clearly normative way, not in a way that suggests it
is a redundant restatement of normative statements elsewhere.

>
>
...

>
> > 11 "Every node other than the root node has exactly one parent, which
> > is either an element node or the root node."  Follows from XML 1.0
> > (assuming the usual usage of the word "parent" in XML contexts).
> >
> >
> > This is giving a precise definition of the term parent, which is crucial for XPath.
>
> No, not precise at all.  It is (on the usual reading of the spec) crucial for
> XPath that the parent relation be acyclic.  Nothing here says so, implies
> it, or even entails it.
>
> The data model section tells you, when you construct the node tree from a well-formed XML document, which nodes are parents of which other nodes.

Can you point to the sentence you believe tells the reader which nodes
are parents of which other nodes?

> ... I do not seem to have been able to convince you of anything.

Quite correct.


--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************






Re: XPath 1.0 change proposal

Daniel Veillard
In reply to this post by C. M. Sperberg-McQueen-2
On Fri, Mar 15, 2013 at 09:26:46AM -0600, C. M. Sperberg-McQueen wrote:
[...]
> Perhaps I am wrong to say neither of us will change the other's mind; you have
> made me consider seriously for the first time that if it's impossible to persuade the
> responsible WGs to fix the problems, then it may be better to ask W3C to
> withdraw the XPath 1.0 spec and deprecate its use in favor of the XDM 2.0 and
> 3.0 specs, which do a better job and which the responsible WGs are more
> willing to maintain.

  That would be a very serious mistake in my book. A lot of
implementations, tools and software are based on XPath 1.0 (including
my own code and a lot of software using it), and marking it as deprecated
or withdrawn will leave that code base without a reference. I would
actually call this completely unreasonable: there is absolutely no
good reason for it, and it would in no way improve the situation.

Daniel

--
Daniel Veillard      | Open Source and Standards, Red Hat
[hidden email]  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/


Re: XPath 1.0 change proposal

Liam R. E. Quin
On Tue, 2013-03-19 at 15:37 +0800, Daniel Veillard wrote:
> [...]

> A lot of
> implementations, tools and software are based on XPath 1.0 (including
> my own code and a lot of software using it) and marking it as deprecated
> or withdrawn will leave that code base without a reference.

There are no plans at this time to mark XPath 1 as deprecated, nor to
withdraw it.

It's my intent to open up the discussion of the future of XPath 1 and
XSLT 1 overall.  I don't want to see that happen until the primary
technical work of XSLT 3 is done, but I do want to see it happen before
XPath 3.1 is made final, so probably next year.  But even then there's
no point in talking about deprecating or withdrawing a widely-used
specification.

XPath 1.0 was, is, and remains, a success.

Liam

--
Liam Quin
XML Activity Lead, W3C,
http://www.w3.org/People/Quin/