SPARQL 1.1 Query erratum

SPARQL 1.1 Query erratum

Steve Harris-11
Spotted by one of the guys in Experian.

    "The IN operator tests whether the RDF term on the left-hand side is found in the values of list of expressions on the right-hand side. The test is done with "=" operator, which tests for the same value, as determined by the operator mapping."

    "The IN operator is equivalent to the SPARQL expression:

    (lhs = expression1) || (lhs = expression2) || …"

    But that's not true given a non-deterministic lhs, e.g.:

    (RAND() < 0.5) IN (true, false)

    Which is always true by my reading of the text, and only true 75% of the time by the equivalence.

Suggest the following text for the errata:

    <div class="entry">
      <p><span style="font-weight: bold;"><a id="errata-query-3">errata-query-3</a></span></p>
      <p>Report: <a href="">this mail...</a></p>
      <p>In the <a href="http://www.w3.org/TR/sparql11-query/#func-in">definition of the IN operator</a>, the <tt>(lhs = expressionN)</tt> equivalence should be downgraded to an illustration, as the equivalence doesn't hold for a non-deterministic lhs.</p>
    </div>

Cheers,
   Steve

-- 
Steve Harris
Experian
+44 20 3042 4132
Registered in England and Wales 653331 VAT # 887 1335 93
80 Victoria Street, London, SW1E 5JL

Re: SPARQL 1.1 Query erratum

Rob Vesse-2
Hey Steve

Maybe I'm being dense because it is too early in the morning here but I don't see this (with this specific example at least)

Translating to the || form we get:

((RAND() < 0.5) = true) || ((RAND() < 0.5) = false)

Surely that is still always true?

The (RAND() < 0.5) portion non-deterministically returns true/false, so this means it always returns true for one/both sides of the || (since technically the two invocations of RAND() could produce values that make both clauses evaluate to true). Thus in this example the two forms are equivalent even with the non-deterministic LHS.

I can believe that there may be cases where the || transformation does produce different results but can't come up with one off the top of my head.  Do you have an actual worked example of a non-deterministic LHS that demonstrates that the || transformation does not hold?

Rob

From: Steve Harris <[hidden email]>
Date: Tuesday, May 7, 2013 6:32 AM
To: "[hidden email]" <[hidden email]>, Ivan Herman <[hidden email]>
Subject: SPARQL 1.1 Query erratum
Resent-From: <[hidden email]>
Resent-Date: Tue, 07 May 2013 13:33:30 +0000


Re: SPARQL 1.1 Query erratum

Andy Seaborne-3


On 07/05/13 17:46, Rob Vesse wrote:

> Hey Steve
>
> Maybe I'm being dense because it is too early in the morning here but I
> don't see this (with this specific example at least)
>
> Translating to the || form we get:
>
> ((RAND() < 0.5) = true) || ((RAND() < 0.5) = false)
>
> Surely that is still always true?

That expression is true about 75% of the time. There are two calls to
RAND() and they can produce different values. Suppose the first call
returns 0.7 and the second returns 0.3: then both sides of the || are
false, so the expression is false.
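
As a rough check of the 75% figure, here is a small Python sketch (not SPARQL, and not part of the spec). It assumes the two RAND() calls are independent and uniform on [0, 1); the names `rewritten_form` and `estimate_true_rate` are made up for illustration. The expression is false only when the first draw is >= 0.5 and the second is < 0.5, which should happen about a quarter of the time.

```python
import random


def rewritten_form(first_draw: float, second_draw: float) -> bool:
    """Mirror ((RAND() < 0.5) = true) || ((RAND() < 0.5) = false),
    where each RAND() call has produced its own value."""
    left = (first_draw < 0.5) == True     # first disjunct
    right = (second_draw < 0.5) == False  # second disjunct
    return left or right


def estimate_true_rate(trials: int = 100_000, seed: int = 0) -> float:
    """Estimate how often the rewritten form is true (assumes uniform draws)."""
    rng = random.Random(seed)
    hits = sum(rewritten_form(rng.random(), rng.random()) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # The counterexample from the message: first call 0.7, second call 0.3.
    print(rewritten_form(0.7, 0.3))
    # Should land close to 0.75 under the stated assumptions.
    print(estimate_true_rate())
```

The estimate is only a sanity check; the exact value under these assumptions is 1 - (0.5 * 0.5) = 0.75.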

The detail is how evaluation interacts with

"""
(lhs = expression1) || (lhs = expression2) || ...
"""

If you read the spec as "evaluate the LHS once to get a value, marked
'lhs'", you're right.

i.e. like:

BIND((RAND() < 0.5) AS ?X)
FILTER( (?X = true) || (?X = false) )

If you read it as a rewrite before evaluation, then one call of RAND()
becomes 2 calls.

FILTER( ((RAND() < 0.5) = true) || ((RAND() < 0.5) = false) )
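
To make the two readings concrete, here is a minimal Python sketch of both (an illustration only, not how any SPARQL engine is implemented). The helper names `in_evaluate_once`, `in_rewrite` and `scripted_rand` are hypothetical, and `any()` stands in for `||` without modelling SPARQL's error handling. The left-hand side is passed as a callable so that each reading can decide how many times to evaluate it.

```python
from typing import Callable, Iterable, Sequence


def in_evaluate_once(lhs: Callable[[], bool], candidates: Sequence[bool]) -> bool:
    """First reading: evaluate the LHS once, then compare that one value."""
    value = lhs()  # single evaluation, like binding it to a variable first
    return any(value == candidate for candidate in candidates)


def in_rewrite(lhs: Callable[[], bool], candidates: Sequence[bool]) -> bool:
    """Second reading: textual rewrite, so the LHS is re-evaluated per comparison."""
    return any(lhs() == candidate for candidate in candidates)


def scripted_rand(values: Iterable[float]) -> Callable[[], float]:
    """Stand-in for RAND() that replays fixed values, to keep the demo repeatable."""
    iterator = iter(values)
    return lambda: next(iterator)


if __name__ == "__main__":
    rand = scripted_rand([0.7])
    print(in_evaluate_once(lambda: rand() < 0.5, [True, False]))

    rand = scripted_rand([0.7, 0.3])
    print(in_rewrite(lambda: rand() < 0.5, [True, False]))
```

With draws of 0.7 and then 0.3, the evaluate-once reading finds a match while the rewrite reading does not, which is the discrepancy under discussion.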

The definition of IN says "rdfTerm IN (expression, ...)" so the first
reading is stronger, but it's worth noting in an erratum as someone has
raised it.

> The (RAND() < 0.5) portion non-deterministically returns true/false, so
> this means it always returns true for one/both sides of the || (since
> technically the two invocations of RAND() could produce values that make
> both clauses evaluate to true). Thus in this example the two forms are
> equivalent even with the non-deterministic LHS.
>
> I can believe that there may be cases where the || transformation does
> produce different results but can't come up with one off the top of my
> head.

Let's try and capture any you come up with.

The other non-deterministic functions (not functions in the mathematical
sense) are BNODE() and UUID()/STRUUID().

> Do you have an actual worked example of a non-deterministic LHS
> that demonstrates that the || transformation does not hold?
>
> Rob

        Andy
