Possible grammar problem with decimal numbers


Possible grammar problem with decimal numbers

Thomas Visel

Gentlemen,

While verifying our implementation of Sparql, I have encountered several suspected problems in the grammar.

 

1st Case:  The following production (in http://www.w3.org/TR/sparql11-query/#rUnaryExpression) permits decimal numbers of the forms ddd.d, ddd.ddd, and .ddd. Decimals of the form ddd. are not permitted.

 

[147] DECIMAL ::= [0-9]* '.' [0-9]+

 

If this was not the intended result, the right-hand side of the production might better read as

 

[0-9]* '.' [0-9]+ | [0-9] '.'

 

The original motivation might be the conflicting use of a floating number in the object position of a triple, where a fraction-free float’s decimal point would conflict with the triple’s closing AND ‘.’ mark.

 

The implication of leaving [147] as-is is that FILTER [?data > 29.] is not legal, a visible annoyance.
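
The claim in the first case can be checked mechanically. The sketch below transcribes production [147] into a Python regular expression and tests the forms mentioned above. The helper name is_decimal is illustrative and not part of any SPARQL tooling.

```python
import re

# Production [147] as quoted above: [0-9]* '.' [0-9]+
DECIMAL = re.compile(r"[0-9]*\.[0-9]+")


def is_decimal(text: str) -> bool:
    """Return True if the whole string matches production [147]."""
    return DECIMAL.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["29.0", "29.125", ".5", "29."]:
        print(f"{sample!r}: {is_decimal(sample)}")
```

Only the last sample, with its trailing '.', fails to match.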

 

 

2nd Case:  There is a slight inconsistency in treatment of DECIMAL (per above) and DOUBLEs:

 

[148] DOUBLE ::= [0-9]+ '.' [0-9]* EXPONENT | '.' ([0-9])+ EXPONENT | ([0-9])+ EXPONENT

 

For single precision DECIMAL, the leading digit is optional.  For DOUBLE, a leading digit is mandatory.  Is there a motivation for this difference?

 

 

My regards to the WG for taking 1.1 through to closure.


Thomas A. Visel

AlgebraixData, Inc.

(512) 651-5834

 


Re: Possible grammar problem with decimal numbers

Andy Seaborne-3

On 09/04/13 18:21, Thomas Visel wrote:

> Gentlemen,
>
> While verifying our implementation of Sparql, I have encountered several suspected problems in the grammar.
>
> 1st Case:  The following production (in http://www.w3.org/TR/sparql11-query/#rUnaryExpression) permits decimal numbers of the forms ddd.d, ddd.ddd, and .ddd. Decimals of the form ddd. are not permitted.
>
> [147] DECIMAL ::= [0-9]* '.' [0-9]+
>
> If this was not the intended result, the right-hand side of the production might better read as
>
> [0-9]* '.' [0-9]+ | [0-9] '.'
>
> The original motivation might be the conflicting use of a floating number in the object position of a triple, where a fraction-free float’s decimal point would conflict with the triple’s closing AND ‘.’ mark.
>
> The implication of leaving [147] as-is is that FILTER [?data > 29.] is not legal, a visible annoyance.


This is intentional.

Consider:

<subject> <predicate> 123.

It was decided that this is an integer followed by a dot marking the end of the triple. This aligns with Turtle.  It also aligns with prefixed names, where a trailing DOT was never legal.

<subject> <predicate> ns:x.y.

has DOT as end of triple, not part of the local name.  The local part is "x.y"

Given that DOT is end-of-triple as well, we have to have things one way or the other. 

SPARQL-WG decided, along with RDF-WG, that the end-of-triple case was more important and that the practice is seen in use (Freebase is an example where there is an assumption that DOT is end-of-triple when immediately after an integer). ".0" on decimals is sufficiently common and understood usage.
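
The tokenization described above can be sketched with a small lexer. This is only an illustration under stated assumptions: the ordered regex alternation stands in for a longest-match rule, the IRI pattern is simplified, the EXPONENT production ([eE] [+-]? [0-9]+) is taken from the SPARQL 1.1 grammar rather than from this thread, and the names TOKEN and tokenize are hypothetical.

```python
import re

# EXPONENT as defined in the SPARQL 1.1 grammar (not quoted in this thread).
EXPONENT = r"[eE][+-]?[0-9]+"

# Ordered alternation: DOUBLE before DECIMAL before INTEGER before DOT.
TOKEN = re.compile(
    r"(?P<DOUBLE>[0-9]+\.[0-9]*" + EXPONENT
    + r"|\.[0-9]+" + EXPONENT
    + r"|[0-9]+" + EXPONENT + r")"
    r"|(?P<DECIMAL>[0-9]*\.[0-9]+)"
    r"|(?P<INTEGER>[0-9]+)"
    r"|(?P<DOT>\.)"
    r"|(?P<IRI><[^<>\s]*>)"   # simplified IRI reference, for illustration only
    r"|(?P<WS>\s+)"
)


def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("<subject> <predicate> 123."))
    print(tokenize("<subject> <predicate> 29.0 ."))
```

With these patterns, "123." yields an INTEGER token followed by a DOT token, while "29.0" is a single DECIMAL token.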

It was flagged in last call.
http://www.w3.org/TR/2012/WD-sparql11-query-20120724/#grammar

[] is not legal for filters; the expression must be in parentheses:

FILTER ( ?data > 29.0 )

This also aligns with the canonical representation in XSD:
http://www.w3.org/TR/xmlschema11-2/#f-fracDigitsMap

 

 

> 2nd Case:  There is a slight inconsistency in treatment of DECIMAL (per above) and DOUBLEs:
>
> [148] DOUBLE ::= [0-9]+ '.' [0-9]* EXPONENT | '.' ([0-9])+ EXPONENT | ([0-9])+ EXPONENT
>
> For single precision DECIMAL, the leading digit is optional.  For DOUBLE, a leading digit is mandatory.  Is there a motivation for this difference?


For double the leading digit is not mandatory - see the second clause of [148]

'.' ([0-9])+ EXPONENT


1e5
1.1e5
.1e5

are legal doubles.

(Decimal is xsd:decimal, which is arbitrary length - no limit or implication of precision)
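
To see which clause of [148] accepts each of the three examples, the production can be split into its alternatives. This is a sketch: EXPONENT is assumed to be [eE] [+-]? [0-9]+ as in the SPARQL 1.1 grammar, and matching_clause is an illustrative name.

```python
import re

# EXPONENT as defined in the SPARQL 1.1 grammar (assumed, not quoted above).
EXPONENT = r"[eE][+-]?[0-9]+"

# The three alternatives of production [148], in order.
DOUBLE_CLAUSES = [
    r"[0-9]+\.[0-9]*" + EXPONENT,  # digits, '.', optional digits, exponent
    r"\.[0-9]+" + EXPONENT,        # '.', digits, exponent
    r"[0-9]+" + EXPONENT,          # digits, exponent
]


def matching_clause(text):
    """Return the 1-based index of the clause matching the whole string, or None."""
    for index, pattern in enumerate(DOUBLE_CLAUSES, start=1):
        if re.fullmatch(pattern, text):
            return index
    return None


if __name__ == "__main__":
    for sample in ["1e5", "1.1e5", ".1e5", "1.1"]:
        print(f"{sample!r}: clause {matching_clause(sample)}")
```

Here ".1e5" is accepted by the second clause, which has no leading digit, and "1.1" matches no clause because it lacks an exponent.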


You may find
http://www.sparql.org/query-validator.html

useful.  It is not endorsed by W3C but it is running the parser used to check and produce the HTML grammar in the final document.

    Andy

 

 


 



Re: Possible grammar problem with decimal numbers

Eric Prud'hommeaux
In reply to this post by Thomas Visel
* Thomas Visel <[hidden email]> [2013-04-09 17:21+0000]

> Gentlemen,
> While verifying our implementation of Sparql, I have encountered several suspected problems in the grammar.
>
> 1st Case:  The following production (in http://www.w3.org/TR/sparql11-query/#rUnaryExpression) permits decimal numbers of the following exemplary forms:  ddd.d    ddd.ddd   .ddd  .  Decimals of the form  ddd.  are not permitted.
>
> [147] DECIMAL ::= [0-9]* '.' [0-9]+
>
>
> If this was not the intended result, the right-hand side of the production might better read as
>
> [0-9]* '.' [0-9]+ | [0-9] '.'
>
> The original motivation might be the conflicting use of a floating number in the object position of a triple, where a fraction-free float's decimal point would conflict with the triple's closing AND '.' mark.
>
> The implication of leaving [147] as-is is that FILTER [?data > 29.] is not legal, a visible annoyance.

This was considered to be less of an annoyance than requiring whitespace between a number and the '.' at the end of a triple, e.g. the Turtle/SPARQL triple "<s> <p> 7. .".
You'll note that none of the numerics permit a trailing '.':

[146] INTEGER ::= [0-9]+
[147] DECIMAL ::= [0-9]* '.' [0-9]+
[148] DOUBLE  ::= [0-9]+ '.' [0-9]* EXPONENT | '.' ([0-9])+ EXPONENT | ([0-9])+ EXPONENT


> 2nd Case:  There is a slight inconsistency in treatment of DECIMAL (per above) and DOUBLEs:
>
> [148] DOUBLE ::= [0-9]+ '.' [0-9]* EXPONENT | '.' ([0-9])+ EXPONENT | ([0-9])+ EXPONENT
>
>
> For single precision DECIMAL, the leading digit is optional.  For DOUBLE, a leading digit is mandatory.  Is there a motivation for this difference?

DOUBLE is the most permissive numeric variant. For the mantissa, it permits a leading '.', a trailing '.', or no '.' at all. Without an exponent, the latter two variants can't serve as DECIMALs: the one with no '.' is an INTEGER, and the one with a trailing '.' ends a triple with a ".".

valid DOUBLEs:  ".0E0" "0.0E0" "0E0" "0.E0"
valid DECIMALs: ".0"   "0.0"
valid INTEGERs:                "0"
illegal:                             "0."
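
The table above can be reproduced by matching each lexeme against productions [146]-[148]. This is a sketch, not the WG's parser: EXPONENT is assumed to be [eE] [+-]? [0-9]+ as in the SPARQL 1.1 grammar, and classify is an illustrative name.

```python
import re

# EXPONENT as defined in the SPARQL 1.1 grammar (assumed, not quoted above).
EXPONENT = r"[eE][+-]?[0-9]+"

# Productions [146], [147] and [148] transcribed to Python regexes.
PRODUCTIONS = [
    ("INTEGER", r"[0-9]+"),
    ("DECIMAL", r"[0-9]*\.[0-9]+"),
    ("DOUBLE", rf"[0-9]+\.[0-9]*{EXPONENT}|\.[0-9]+{EXPONENT}|[0-9]+{EXPONENT}"),
]


def classify(lexeme):
    """Return the name of the production matching the whole lexeme, or None."""
    for name, pattern in PRODUCTIONS:
        if re.fullmatch(pattern, lexeme):
            return name
    return None


if __name__ == "__main__":
    for lexeme in [".0E0", "0.0E0", "0E0", "0.E0", ".0", "0.0", "0", "0."]:
        print(f"{lexeme!r}: {classify(lexeme)}")
```

The only lexeme in the list that no production accepts is "0.".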


> My regards to the WG for taking 1.1 through to closure.
>
> Thomas A. Visel
> AlgebraixData, Inc.
> (512) 651-5834
>

--
-ericP


Re: Possible grammar problem with decimal numbers

Thouraya Bouabana Tebibel-2
In reply to this post by Thomas Visel
Not speaking officially...

On Tue, Apr 9, 2013 at 1:21 PM, Thomas Visel <[hidden email]> wrote:

> Gentlemen,
>
> While verifying our implementation of Sparql, I have encountered several suspected problems in the grammar.
>
> 1st Case:  The following production (in http://www.w3.org/TR/sparql11-query/#rUnaryExpression) permits decimal numbers of the forms ddd.d, ddd.ddd, and .ddd. Decimals of the form ddd. are not permitted.
>
> [147] DECIMAL ::= [0-9]* '.' [0-9]+
>
> If this was not the intended result, the right-hand side of the production might better read as
>
> [0-9]* '.' [0-9]+ | [0-9] '.'
>
> The original motivation might be the conflicting use of a floating number in the object position of a triple, where a fraction-free float’s decimal point would conflict with the triple’s closing AND ‘.’ mark.


It was due to a related reason. The decimal point would have been conflated with the DOT ending the triple if the triple ended in an integer. So:

{?a :p 2.}
{?a :p 2 .}

Would have two different meanings (the first would be a floating point object, and the second would be an integer).
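
The ambiguity can be made concrete by lexing both spellings twice: once with production [147] as specified, and once with a hypothetical looser DECIMAL that also accepts a trailing '.'. The looser pattern and the function name lex are illustrative assumptions, not a proposal from the grammar.

```python
import re

SPEC_DECIMAL = r"[0-9]*\.[0-9]+"            # production [147]
LOOSE_DECIMAL = r"[0-9]+\.[0-9]*|\.[0-9]+"  # hypothetical: also accepts "2."


def lex(text, decimal_pattern):
    """Lex numbers and dots using the given DECIMAL pattern, dropping whitespace."""
    token = re.compile(
        rf"(?P<DECIMAL>{decimal_pattern})|(?P<INTEGER>[0-9]+)|(?P<DOT>\.)|(?P<WS>\s+)"
    )
    tokens = []
    pos = 0
    while pos < len(text):
        m = token.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for source in ["2.", "2 ."]:
        print(f"{source!r} spec:  {lex(source, SPEC_DECIMAL)}")
        print(f"{source!r} loose: {lex(source, LOOSE_DECIMAL)}")
```

With the specified production both spellings give an integer followed by a DOT. With the looser pattern, "2." becomes a single decimal token and the triple loses its terminator, so the space changes the meaning.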

I don't know what the implications of requiring a space before a triple terminator would be (as this could be a disambiguator), but it would certainly create incompatibilities with SPARQL 1.0 as well as Turtle and N3, and that alone is enough to mean the WG had to continue to keep the space optional.

> The implication of leaving [147] as-is is that FILTER [?data > 29.] is not legal, a visible annoyance.


Perhaps it's annoying, but that's better than inconsistency between numbers in expressions and those in triples.
 

> 2nd Case:  There is a slight inconsistency in treatment of DECIMAL (per above) and DOUBLEs:
>
> [148] DOUBLE ::= [0-9]+ '.' [0-9]* EXPONENT | '.' ([0-9])+ EXPONENT | ([0-9])+ EXPONENT
>
> For single precision DECIMAL, the leading digit is optional.  For DOUBLE, a leading digit is mandatory.  Is there a motivation for this difference?


The leading digit is optional in the second form, but not in the first. Requiring it in the first form prevents the following from being legal:  .E2

Unlike in decimals, a trailing '.' in a double's mantissa won't be mistaken for a triple terminator, since it is always followed by the exponent, which is why the first form in [148] allows a trailing DOT.
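
A short check of the point about .E2, comparing the first form of [148] with a hypothetical relaxed version in which the leading digits are optional. EXPONENT is assumed to be [eE] [+-]? [0-9]+ as in the SPARQL 1.1 grammar, and the names below are illustrative.

```python
import re

# EXPONENT as defined in the SPARQL 1.1 grammar (assumed, not quoted above).
EXPONENT = r"[eE][+-]?[0-9]+"

FIRST_FORM = r"[0-9]+\.[0-9]*" + EXPONENT          # first form of [148]
RELAXED_FIRST_FORM = r"[0-9]*\.[0-9]*" + EXPONENT  # hypothetical: leading digits optional


def accepts(pattern, text):
    """Return True if the pattern matches the whole string."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    for sample in [".E2", "0.E2"]:
        print(f"{sample!r}: spec={accepts(FIRST_FORM, sample)}, "
              f"relaxed={accepts(RELAXED_FIRST_FORM, sample)}")
```

The relaxed pattern accepts ".E2", a mantissa with no digits at all, which the form in the specification rejects.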

Regards,
Paul Gearon