Comment on Path / PName clash and Turtle impact

David Robillard-4
Hello,

Apologies for sending this past the Last Call, but I have a comment
about the decision to combine PNames and Property Paths in SPARQL and
escaping PNames to resolve the problems this causes.

My perspective is mainly that of a Turtle user/implementer.  I
discovered this issue while updating my Turtle implementation[1] for the
latest spec, where an odd new rule has been added to the grammar:

[163s] PN_LOCAL_ESC ::= '\\' ( '_' | '~' | '.' | '-' | '!' | '$' | '&' |
"'" | '(' | ')' | '*' | '+' | ',' | ';' | '=' | ':' | '/' | '?' | '#' |
'@' | '%' )

Unhappy with how ugly this is, and puzzled why such a specific seemingly
arbitrary set of characters has been introduced as escapes in PNames, I
investigated.  It turns out this is from SPARQL, and the escapes are to
avoid clashing with Property Paths (hereafter just "paths").
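
To make the rule concrete, here is a minimal Python sketch of what it
asks an implementation to do.  The character set is copied from the rule
quoted above; the function names are hypothetical and not taken from any
spec or library, and escape_local() deliberately escapes every listed
character, which is more than a careful writer would need.

```python
import re

# Characters that may follow a backslash, copied from the PN_LOCAL_ESC
# rule as quoted in this message.
PN_LOCAL_ESC_CHARS = "_~.-!$&'()*+,;=:/?#@%"

# A backslash followed by exactly one character from that set.
PN_LOCAL_ESC = re.compile(r"\\([" + re.escape(PN_LOCAL_ESC_CHARS) + r"])")


def unescape_local(local: str) -> str:
    """Drop the backslash from every PN_LOCAL_ESC sequence (hypothetical helper)."""
    return PN_LOCAL_ESC.sub(r"\1", local)


def escape_local(local: str) -> str:
    """Backslash-escape every listed character (hypothetical helper, over-escapes)."""
    return "".join("\\" + ch if ch in PN_LOCAL_ESC_CHARS else ch for ch in local)
```

For instance, unescape_local(r"foo\/bar\/baz") gives "foo/bar/baz", and
escape_local("terms/a+b") gives the escaped form with a backslash before
the '/' and the '+'.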

This seems like a problem to me: the Turtle specification now has a
strange and unpleasant grammar rule from a different specification, to
mesh with a concept that is meaningless in the context of a Turtle
document.  I do agree, though, that copy/paste compatibility between
statements in both languages is highly desirable.

My main point is about the method: I think escaping is a very poor way
of achieving this, and quotation is more appropriate.  Either Paths, or
PNames, should be quoted, or have a special leading character, to remove
this ambiguity.

Some cons of the current escaping scheme:

* Escaping is ugly, and difficult to work with.  Paths that include
pnames with special characters are difficult to read.

* Copying from other data sources that use these characters is
difficult.  Expecting a user to do this manually (i.e. escape every
character in the above list) is unrealistic, and the process is
error-prone.

* This effectively prevents future revisions of SPARQL from adding
anything to the path syntax.  If both of these specs become
recommendations, then Turtle (and the corresponding rules in SPARQL
itself) will have baked-in escapes specifically to work around path
syntax.  None can be added, because this will break the rules for
PNames, in both SPARQL and Turtle.

* The very existence of escaping implies there is a need to express
these characters in PNames.  However, this has been made tedious and
ugly to accommodate paths.  In my opinion, this is somewhat backwards.
Both languages should have a clean PName syntax.  Paths are a different
thing, and should be clearly designated as such.  Put another way,
property paths are not pnames, and crippling the pname syntax for paths
is a poor design when there are very simple alternative ways of
differentiating the two.

Some pros of quoting, rather than escaping:

* Much easier to read.  Even in a purely SPARQL context, ignoring
Turtle, having a path be very clearly delineated is much simpler to read
than navigating a mess of escapes and trying to mentally parse what is
going on.

* Turtle is not 'infected' by this SPARQL specific grammar
consideration, and both can use a simpler, more expressive, and more
friendly PName grammar.  SPARQL is not 'locked in' forevermore and is
free to update the path syntax in the future.

* Copy/paste compatibility with other data sources is much simpler,
since quoting is easy, unlike escaping.  It is also less error prone,
since only the quote character needs special consideration.

* The grammars become cleaner, since Path rules and PName rules are
clearly distinct (though the former would refer to the latter).  The
PName rules do not need to take into consideration every character used
in the Path syntax, which is crucial since the PName rules must be in
Turtle as well.  The current PName rule is a symptom that different
types of tokens have not been properly distinguished.

* The PName rules would be far more (possibly entirely) compatible with
CURIES, rather than extremely SPARQL specific.

I am not sure exactly what to suggest in terms of syntax.  It seems most
in-line with existing practice to not quote 'top-level' PNames, but
rather quote paths somehow.  This resolves the Turtle problems, but does
not resolve issues with PNames inside paths.  Here, it seems quoting is
best.  One proposal: paths always have a leading '/', and PNames within
paths are quoted with '[' and ']' (as in the CURIE spec).  Thus, the
example:

?x foaf:knows/foaf:name ?name .

Would become:

?x /[foaf:knows]/[foaf:name] ?name .

The quoting means the PNames are free to contain extended characters,
e.g. rather than the unwieldy:

?x eg:foo\/bar\/baz/eg:terms\/a\+b ?b .

You would have:

?x /[eg:foo/bar/baz]/[eg:terms/a+b] ?b .

Importantly, no quoting of PNames in any other context is necessary, and
no escaping of PNames is necessary at all, which is a significant win
for "copy-paste compatibility" (quoting could also be optional in
paths).
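
As a rough illustration of how little machinery the bracket-quoted form
would need, here is a minimal Python sketch that splits such a path into
its PNames.  It assumes only the sequence form shown in the examples
above (a '/' followed by a PName in '[' and ']', repeated), with quoting
mandatory; the function name is hypothetical, and this is not a parser
for the real SPARQL path grammar.

```python
import re
from typing import List

# One step of the proposed form: '/' followed by a PName quoted in '[' ... ']'.
STEP = re.compile(r"/\[([^\[\]]+)\]")


def split_quoted_path(path: str) -> List[str]:
    """Return the PNames of a path written as /[a]/[b]/... (hypothetical helper)."""
    pnames = []
    pos = 0
    while pos < len(path):
        match = STEP.match(path, pos)
        if match is None:
            raise ValueError(f"not a quoted path step at offset {pos}: {path!r}")
        # The quoted text is taken verbatim: no unescaping is needed.
        pnames.append(match.group(1))
        pos = match.end()
    if not pnames:
        raise ValueError("empty path")
    return pnames
```

Calling split_quoted_path("/[eg:foo/bar/baz]/[eg:terms/a+b]") returns
the two PNames exactly as written between the brackets.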

The prefix character is analogous to the '?' used for variables.  This
works well, and is very simple, since a token that starts with a '?' is
clearly a variable, and there is no clashing.  Paths (indeed, any new
kind of token) should be similarly simple to distinguish.  A token that
starts with a '?' is a variable.  A token that starts with a '/' is a
property path.  Simple, consistent, extensible.
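
The leading-character idea can be sketched in a few lines of Python.
This is only an illustration of the proposal in this message, not of any
existing tokenizer; the function name and the category labels are
hypothetical.

```python
def classify_token(token: str) -> str:
    """Classify a token by its first character, as proposed above (hypothetical helper)."""
    if token.startswith("?"):
        return "variable"
    if token.startswith("/"):
        return "path"
    # Everything else (PNames, literals, ...) is left to the existing rules.
    return "other"
```

Under this sketch "?name" is a variable, "/[foaf:knows]/[foaf:name]" is
a path, and "foaf:name" falls through to the ordinary PName rules.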

Note these are just off-the-cuff examples, I have not thought much about
the best syntax.  Leading slash for paths and [] quoting as above may
not be the best choices for whatever reason; I am more interested in
highlighting the problem first.  If quoting in paths is not popular, I
wouldn't mind escaping *only in paths* - at least that doesn't wreck
Turtle.

In my opinion, this is a very serious issue.  I have a strong aversion
to implementing these PName escapes in Turtle, and consider it an
outright error.  Again, apologies for being late, but a more palatable
resolution to this problem would be a significant improvement, and
prevent future problems.

Thanks,

-dr

[1] http://drobilla.net/software/serd/



Re: Comment on Path / PName clash and Turtle impact

Andy Seaborne-3
David,

The SPARQL and RDF working groups have been working to align SPARQL and
Turtle syntax. The area of prefixed names is the main area of alignment.

The SPARQL Working Group has decided not to take the approach you
propose of adopting a different syntax for prefixed names specifically
for property paths. The syntax of prefix names in SPARQL and in Turtle
(at last call [1]) is the same.

The set of characters requiring escapes is the RFC 3986 'gen-delims' and
'sub-delims' except that there has been a change to allow the ":"
character to be included unescaped into a prefix name in line with
Turtle because the Open Graph Protocol uses this style and also it is
the approach taken in URN schemes.
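
For reference, the rule as quoted at the start of this thread can be
compared with the RFC 3986 sets using a minimal Python sketch.  The two
delimiter sets are transcribed from RFC 3986 section 2.2 and the escape
set from the quoted rule (which still lists ':'); the variable names are
illustrative only.

```python
# RFC 3986 section 2.2 reserved characters.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")

# Escapable characters from the PN_LOCAL_ESC rule quoted earlier in the thread.
PN_LOCAL_ESC_CHARS = set("_~.-!$&'()*+,;=:/?#@%")

delims = GEN_DELIMS | SUB_DELIMS
shared = PN_LOCAL_ESC_CHARS & delims        # characters in both
only_in_rule = PN_LOCAL_ESC_CHARS - delims  # escapable but not RFC delimiters
only_in_rfc = delims - PN_LOCAL_ESC_CHARS   # RFC delimiters with no escape

print(sorted(shared), sorted(only_in_rule), sorted(only_in_rfc))
```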

We hope that this reply responds to your comment and would be grateful if
you would acknowledge the response by sending a reply to this mailing list.

Andy, on behalf of the SPARQL-WG

[1] Turtle last call: http://www.w3.org/TR/turtle/


Re: Comment on Path / PName clash and Turtle impact

David Robillard-4
On Fri, 2012-07-13 at 15:14 +0100, Andy Seaborne wrote:

> David,
>
> The SPARQL and RDF working groups have been working to align SPARQL and
> Turtle syntax. The area of prefixed names is the main area of alignment.
>
> The SPARQL Working Group has decided not to take the approach you
> propose of adopting a different syntax for prefixed names specifically
> for property paths. The syntax of prefix names in SPARQL and in Turtle
> (at last call [1]) is the same.
>
> The set of characters requiring escapes is the RFC 3986 'gen-delims' and
> 'sub-delims' except that there has been a change to allow the ":"
> character to be included unescaped into a prefix name in line with
> Turtle because the Open Graph Protocol uses this style and also it is
> the approach taken in URN schemes.
>
> We hope that this reply responds to your comment and would be grateful if
> you would acknowledge the response by sending a reply to this mailing list.

Apologies for the very late reply, I do not actively follow this list.

Fair enough; as mentioned I see triple syntax alignment as a good thing,
but not an overriding concern.  In my opinion the WG has lost its way
and is ruining Turtle without any regard for *Turtle* implementers, so I
will likely never be implementing the new spec as-is.  I will adopt
appropriate changes, such as more allowable prefix characters,
piece-wise as needed.  Most of them are fine.

I should have mentioned in my formal comment that PREFIX and BASE are
also inappropriate - even more so, actually.  It is ridiculous to have
two completely different syntaxes for the same directives in Turtle.
The triple compatibility argument is a half decent one, but a Turtle
document is not a SPARQL document anyway, they have different top level
term syntaxes.  Ramming SPARQL term syntax into Turtle just messes up
what, once upon a time, was an elegant little RDF syntax, for no good
reason.  Is it really necessary to turn the Turtle spec into a
design-by-committee eyesore?  If implementations want to implement this,
they can, but it clearly does not belong in the Turtle spec.
Essentially we have fragments of SQL in Turtle now, which is crazy.  The
only thing having these rules in the Turtle grammar does is encourage
implementations to produce documents that are not backwards compatible.
This violates the "permissive reading, strict writing" principle, so I
likely won't be implementing that either.

For what it's worth, every single Turtle implementer I have discussed
this with on IRC and private email feels the same way, understandably.
Turtle is not SPARQL, and there are plenty of implementations of the
former that aren't also implementations of the latter.  I think it is
inappropriate for the WG to dismiss the former so flippantly.

Thanks,

-dr



Re: Comment on Path / PName clash and Turtle impact

Sandro Hawke
On 02/03/2013 01:52 PM, David Robillard wrote:

> On Fri, 2012-07-13 at 15:14 +0100, Andy Seaborne wrote:
>> David,
>>
>> The SPARQL and RDF working groups have been working to align SPARQL and
>> Turtle syntax. The area of prefixed names is the main area of alignment.
>>
>> The SPARQL Working Group has decided not to take the approach you
>> propose of adopting a different syntax for prefixed names specifically
>> for property paths. The syntax of prefix names in SPARQL and in Turtle
>> (at last call [1]) is the same.
>>
>> The set of characters requiring escapes is the RFC 3986 'gen-delims' and
>> 'sub-delims' except that there has been a change to allow the ":"
>> character to be included unescaped into a prefix name in line with
>> Turtle because the Open Graph Protocol uses this style and also it is
>> the approach taken in URN schemes.
>>
>> We hope that this reply responds to your comment and would be grateful if
>> you would acknowledge the response by sending a reply to this mailing list.
> Apologies for the very late reply, I do not actively follow this list.
>
> Fair enough; as mentioned I see triple syntax alignment as a good thing,
> but not an overriding concern.  In my opinion the WG has lost its way
> and is ruining Turtle without any regard for *Turtle* implementers, so I
> will likely never be implementing the new spec as-is.  I will adopt
> appropriate changes, such as more allowable prefix characters,
> piece-wise as needed.  Most of them are fine.
>
> I should have mentioned in my formal comment that PREFIX and BASE are
> also inappropriate - even more so, actually.  It is ridiculous to have
> two completely different syntaxes for the same directives in Turtle.
> The triple compatibility argument is a half decent one, but a Turtle
> document is not a SPARQL document anyway, they have different top level
> term syntaxes.  Ramming SPARQL term syntax into Turtle just messes up
> what, once upon a time, was an elegant little RDF syntax, for no good
> reason.  Is it really necessary to turn the Turtle spec into a
> design-by-committee eyesore?  If implementations want to implement this,
> they can, but it clearly does not belong in the Turtle spec.
> Essentially we have fragments of SQL in Turtle now, which is crazy.  The
> only thing having these rules in the Turtle grammar does is encourage
> implementations to produce documents that are not backwards compatible.
> This violates the "permissive reading, strict writing" principle, so I
> likely won't be implementing that either.
>
> For what it's worth, every single Turtle implementer I have discussed
> this with on IRC and private email feels the same way, understandably.
> Turtle is not SPARQL, and there are plenty of implementations of the
> former that aren't also implementations of the latter.  I think it is
> inappropriate for the WG to dismiss the former so flippantly.

I'm coming into this discussion late, but it looks to me like your
comments here are about Turtle, not SPARQL, and are best addressed by
the RDF Working Group, which is standardizing Turtle (and actively
seeking input on the @prefix/@base vs PREFIX/BASE question), not the
SPARQL WG.  Are you okay with the SPARQL WG's response to your comments?
Thanks.

         -- Sandro

