Hello,
I have a comment about the semantics of the Minus operator from the SPARQL specification. I stumbled upon this while trying to implement SPARQL 1.1 in the RDFox system that we’re developing at Oxford University. In Section 18.5 (http://www.w3.org/TR/sparql11-query/#sparqlAlgebra) of the SPARQL 1.1 recommendation, I see that the Minus operator is defined as follows: Minus(Ω1, Ω2) = { μ | μ in Ω1 . ∀ μ' in Ω2, either μ and μ' are not compatible or dom(μ) and dom(μ') are disjoint } I see an explanation below the operator’s definition for the second part of the condition (i.e., "or dom(μ) and dom(μ') are disjoint”), but this seems to me as a suboptimal choice for at least two reasons. 1. The addition of this condition breaks means that the following two patterns don’t have the same result: (1) { ?X rdf:type :A . OPTIONAL { ?Y rdf:type :B } } (2) { { ?X rdf:type :A . ?Y rdf:type :B } UNION { ?X rdf:type :A . MINUS { ?Y rdf:type :B } } } The equivalence (A OPT B) = (A JOIN B) UNION (A MINUS B) seems natural and it holds already in SQL, so it may be unexpected for SPARQL to break this. Furthermore, query planning often depends on this equivalence, so the modified semantics of MINUS prevents the application of some well-known optimisation. 2. The present definition prevents an implementation approach that uses sideways information passing and iterators. To clarify this, consider the following pattern P, where SP1 and SP2 are some (unspecified) patterns: (3) P = { SP1 MINUS { SP2 } } Now assume that we’ve got a function eval() that takes as input a pattern and that returns a set of mappings that constitute answers to the pattern. One way to implement eval(SP1 MINUS { SP2 }) is as follows: R := empty set For each mapping μ in eval(SP1) Let SP2’ be the result of applying μ to SP2 (i.e., we replace in SP2 each variable x with μ(x)) If eval(SP2’) is empty then Add μ to R return R Evaluating the query in such a way might be better than using the usual bottom-up strategy. For example, if the size of eval(SP1) is small, whereas the size of eval(SP2) is very large but SP2 can be evaluated using indexes, then the above algorithm is much more efficient than the bottom-up one as it iterates over a small set and doesn’t require materialisation of a potentially very large set. Another benefit of this implementation strategy is that, depending on the structure of SP1 and SP2, we may not need to materialise the answers to either SP1 or SP2: if we use an iterator-based implementation (as is common in database systems), we may be able to evaluate SP1 on-the-fly, and we may also be able to check whether eval(SP2’) is empty on the fly as well. Unfortunately, the algorithm I sketched is incorrect with respect to the semantics given in the document, and I see no way to make it correct: after applying μ to SP2, there is no way of checking the constraint "or dom(μ) and dom(μ') are disjoint” from the definition of Minus. Thus, if one is to fully obey the semantics from the specification, it seems to me that one is forced to fully evaluate SP2 so that one can later check the domain disjointness condition. I understand that dropping the disjointness condition means that pattern { SP1 MINUS { } } has no answers regardless of the form of SP1. However, given the expected identities between JOIN, MINUS, and OPTIONAL, one might actually expect this; furthermore, I doubt that there would be many cases in practice where this effect would be observable; and finally I believe that the resulting specification would lend itself to better implementation. I understand that it might be too late to change the specification now, and I am writing this e-mail for the following reasons: - I would like to log a technical comment that might be taken into account in a future edition of the specification. - I am wondering whether WG members have anything to add to my comment. - I am wondering whether it might be possible to add somewhere (e.g., in the errata document) a comment that would allow SPARQL vendors to use the modified semantics of MINUS, but with a lesser degree of compatibility with the specification. Regards, Boris |
Dear Boris,
(note that this is not an official group answer)... just quick: > (1) { ?X rdf:type :A . OPTIONAL { ?Y rdf:type :B } } > (2) { { ?X rdf:type :A . ?Y rdf:type :B } UNION { ?X rdf:type :A . MINUS { ?Y rdf:type :B } } } Indeed, the semantics of OPTIONAL is defined by the LeftJoin algebra operator, cf. http://www.w3.org/TR/sparql11-query/#defn_algLeftJoin rather than by the combination of UNION and MINUS... in this context, see also the section on the relation between NOT EXISTS and MINUS http://www.w3.org/TR/sparql11-query/#neg-notexists-minus ... they are not always equivalent either. HTH, Axel -- Prof. Dr. Axel Polleres Institute for Information Business, WU Vienna url: http://www.polleres.net/ twitter: @AxelPolleres On 18 Jul 2014, at 17:07, Boris Motik <[hidden email]> wrote: > Hello, > > I have a comment about the semantics of the Minus operator from the SPARQL specification. I stumbled upon this while trying to implement SPARQL 1.1 in the RDFox system that we’re developing at Oxford University. In Section 18.5 (http://www.w3.org/TR/sparql11-query/#sparqlAlgebra) of the SPARQL 1.1 recommendation, I see that the Minus operator is defined as follows: > > Minus(Ω1, Ω2) = { μ | μ in Ω1 . ∀ μ' in Ω2, either μ and μ' are not compatible or dom(μ) and dom(μ') are disjoint } > > I see an explanation below the operator’s definition for the second part of the condition (i.e., "or dom(μ) and dom(μ') are disjoint”), but this seems to me as a suboptimal choice for at least two reasons. > > > > 1. The addition of this condition breaks means that the following two patterns don’t have the same result: > > (1) { ?X rdf:type :A . OPTIONAL { ?Y rdf:type :B } } > (2) { { ?X rdf:type :A . ?Y rdf:type :B } UNION { ?X rdf:type :A . MINUS { ?Y rdf:type :B } } } > > The equivalence (A OPT B) = (A JOIN B) UNION (A MINUS B) seems natural and it holds already in SQL, so it may be unexpected for SPARQL to break this. Furthermore, query planning often depends on this equivalence, so the modified semantics of MINUS prevents the application of some well-known optimisation. > > > > 2. The present definition prevents an implementation approach that uses sideways information passing and iterators. To clarify this, consider the following pattern P, where SP1 and SP2 are some (unspecified) patterns: > > (3) P = { SP1 MINUS { SP2 } } > > Now assume that we’ve got a function eval() that takes as input a pattern and that returns a set of mappings that constitute answers to the pattern. One way to implement eval(SP1 MINUS { SP2 }) is as follows: > > R := empty set > For each mapping μ in eval(SP1) > Let SP2’ be the result of applying μ to SP2 (i.e., we replace in SP2 each variable x with μ(x)) > If eval(SP2’) is empty then > Add μ to R > return R > > Evaluating the query in such a way might be better than using the usual bottom-up strategy. For example, if the size of eval(SP1) is small, whereas the size of eval(SP2) is very large but SP2 can be evaluated using indexes, then the above algorithm is much more efficient than the bottom-up one as it iterates over a small set and doesn’t require materialisation of a potentially very large set. Another benefit of this implementation strategy is that, depending on the structure of SP1 and SP2, we may not need to materialise the answers to either SP1 or SP2: if we use an iterator-based implementation (as is common in database systems), we may be able to evaluate SP1 on-the-fly, and we may also be able to check whether eval(SP2’) is empty on the fly as well. > > Unfortunately, the algorithm I sketched is incorrect with respect to the semantics given in the document, and I see no way to make it correct: after applying μ to SP2, there is no way of checking the constraint "or dom(μ) and dom(μ') are disjoint” from the definition of Minus. Thus, if one is to fully obey the semantics from the specification, it seems to me that one is forced to fully evaluate SP2 so that one can later check the domain disjointness condition. > > > > I understand that dropping the disjointness condition means that pattern { SP1 MINUS { } } has no answers regardless of the form of SP1. However, given the expected identities between JOIN, MINUS, and OPTIONAL, one might actually expect this; furthermore, I doubt that there would be many cases in practice where this effect would be observable; and finally I believe that the resulting specification would lend itself to better implementation. I understand that it might be too late to change the specification now, and I am writing this e-mail for the following reasons: > > - I would like to log a technical comment that might be taken into account in a future edition of the specification. > - I am wondering whether WG members have anything to add to my comment. > - I am wondering whether it might be possible to add somewhere (e.g., in the errata document) a comment that would allow SPARQL vendors to use the modified semantics of MINUS, but with a lesser degree of compatibility with the specification. > > Regards, > > Boris > > |
Hello,
Thanks for this answer. I understand the point you make; my point is that the choice of the semantics for Minus is suboptimal and that it may hinder efficient implementation.
Regards,
Boris
-------- Original message -------- From: Axel Polleres <[hidden email]> Date: To: Boris Motik <[hidden email]> Cc: [hidden email] Subject: Re: A comment about the semantics of the Minus operator from the SPARQL specification Dear Boris,
(note that this is not an official group answer)... just quick: > (1) { ?X rdf:type :A . OPTIONAL { ?Y rdf:type :B } } > (2) { { ?X rdf:type :A . ?Y rdf:type :B } UNION { ?X rdf:type :A . MINUS { ?Y rdf:type :B } } } Indeed, the semantics of OPTIONAL is defined by the LeftJoin algebra operator, cf. http://www.w3.org/TR/sparql11-query/#defn_algLeftJoin rather than by the combination of UNION and MINUS... in this context, see also the section on the relation between NOT EXISTS and MINUS http://www.w3.org/TR/sparql11-query/#neg-notexists-minus ... they are not always equivalent either. HTH, Axel -- Prof. Dr. Axel Polleres Institute for Information Business, WU Vienna url: http://www.polleres.net/ twitter: @AxelPolleres On 18 Jul 2014, at 17:07, Boris Motik <[hidden email]> wrote: > Hello, > > I have a comment about the semantics of the Minus operator from the SPARQL specification. I stumbled upon this while trying to implement SPARQL 1.1 in the RDFox system that we’re developing at Oxford University. In Section 18.5 (http://www.w3.org/TR/sparql11-query/#sparqlAlgebra) of the SPARQL 1.1 recommendation, I see that the Minus operator is defined as follows: > > Minus(Ω1, Ω2) = { μ | μ in Ω1 . ∀ μ' in Ω2, either μ and μ' are not compatible or dom(μ) and dom(μ') are disjoint } > > I see an explanation below the operator’s definition for the second part of the condition (i.e., "or dom(μ) and dom(μ') are disjoint”), but this seems to me as a suboptimal choice for at least two reasons. > > > > 1. The addition of this condition breaks means that the following two patterns don’t have the same result: > > (1) { ?X rdf:type :A . OPTIONAL { ?Y rdf:type :B } } > (2) { { ?X rdf:type :A . ?Y rdf:type :B } UNION { ?X rdf:type :A . MINUS { ?Y rdf:type :B } } } > > The equivalence (A OPT B) = (A JOIN B) UNION (A MINUS B) seems natural and it holds already in SQL, so it may be unexpected for SPARQL to break this. Furthermore, query planning often depends on this equivalence, so the modified semantics of MINUS prevents the application of some well-known optimisation. > > > > 2. The present definition prevents an implementation approach that uses sideways information passing and iterators. To clarify this, consider the following pattern P, where SP1 and SP2 are some (unspecified) patterns: > > (3) P = { SP1 MINUS { SP2 } } > > Now assume that we’ve got a function eval() that takes as input a pattern and that returns a set of mappings that constitute answers to the pattern. One way to implement eval(SP1 MINUS { SP2 }) is as follows: > > R := empty set > For each mapping μ in eval(SP1) > Let SP2’ be the result of applying μ to SP2 (i.e., we replace in SP2 each variable x with μ(x)) > If eval(SP2’) is empty then > Add μ to R > return R > > Evaluating the query in such a way might be better than using the usual bottom-up strategy. For example, if the size of eval(SP1) is small, whereas the size of eval(SP2) is very large but SP2 can be evaluated using indexes, then the above algorithm is much more efficient than the bottom-up one as it iterates over a small set and doesn’t require materialisation of a potentially very large set. Another benefit of this implementation strategy is that, depending on the structure of SP1 and SP2, we may not need to materialise the answers to either SP1 or SP2: if we use an iterator-based implementation (as is common in database systems), we may be able to evaluate SP1 on-the-fly, and we may also be able to check whether eval(SP2’) is empty on the fly as well. > > Unfortunately, the algorithm I sketched is incorrect with respect to the semantics given in the document, and I see no way to make it correct: after applying μ to SP2, there is no way of checking the constraint "or dom(μ) and dom(μ') are disjoint” from the definition of Minus. Thus, if one is to fully obey the semantics from the specification, it seems to me that one is forced to fully evaluate SP2 so that one can later check the domain disjointness condition. > > > > I understand that dropping the disjointness condition means that pattern { SP1 MINUS { } } has no answers regardless of the form of SP1. However, given the expected identities between JOIN, MINUS, and OPTIONAL, one might actually expect this; furthermore, I doubt that there would be many cases in practice where this effect would be observable; and finally I believe that the resulting specification would lend itself to better implementation. I understand that it might be too late to change the specification now, and I am writing this e-mail for the following reasons: > > - I would like to log a technical comment that might be taken into account in a future edition of the specification. > - I am wondering whether WG members have anything to add to my comment. > - I am wondering whether it might be possible to add somewhere (e.g., in the errata document) a comment that would allow SPARQL vendors to use the modified semantics of MINUS, but with a lesser degree of compatibility with the specification. > > Regards, > > Boris > > |
((Another unofficial reply))
Hi Boris, Thank you for the comment and the clear details. Did you look at the "diff" operator in the algebra (without the expression part)? (it was in SPARQL 1.0) (A OPT B) = (A JOIN B) UNION (A DIFF B) MINUS was designed by considering use cases (it's a small world - at a face-to-face with one end of the video link in cs.ox.ac.uk!) and is a user written operation. DIFF is more internal and I think has the feature you are looking for. It does not have the domain codition. For MINUS, if you remove the "dom(μ) and dom(μ') are disjoint" condition then it's not just "pattern MINUS' {}" that is empty. { ?X rdf:type :A . MINUS' { ?Y rdf:type :B } } is empty because any solution of {?X rdf:type :A} is compatible with { ?Y rdf:type :B } (no common variables to compare so it's compatible). In fact, it would take only one solution mapping in the RHS to have no variables in common with LHS and the MINUS result is the empty multiset regardless of the rest of the RHS. The analogy to SQL needs to treated with one element of care - in SQL, there are column compatibility rules where MINUS and UNION operate on column compatible results from inner SELECTs. Andy PS see also: http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2014May/0000.html On 19/07/14 11:37, Boris Motik wrote: > Hello, > > Thanks for this answer. I understand the point you make; my point is > that the choice of the semantics for Minus is suboptimal and that it may > hinder efficient implementation. > > Regards, > > Boris > > > > -------- Original message -------- > From: Axel Polleres <[hidden email]> > Date: > To: Boris Motik <[hidden email]> > Cc: [hidden email] > Subject: Re: A comment about the semantics of the Minus operator from > the SPARQL specification > > > Dear Boris, > > (note that this is not an official group answer)... just quick: > >> (1) { ?X rdf:type :A . OPTIONAL { ?Y rdf:type :B } } >> (2) { { ?X rdf:type :A . ?Y rdf:type :B } UNION { ?X rdf:type :A . MINUS { ?Y rdf:type :B } } } > > Indeed, the semantics of OPTIONAL is defined by the LeftJoin algebra > operator, cf. > http://www.w3.org/TR/sparql11-query/#defn_algLeftJoin > rather than by the combination of UNION and MINUS... in this context, > see also the section on the relation between NOT EXISTS and MINUS > http://www.w3.org/TR/sparql11-query/#neg-notexists-minus > ... they are not always equivalent either. > > HTH, > Axel > > > -- > Prof. Dr. Axel Polleres > Institute for Information Business, WU Vienna > url: http://www.polleres.net/ twitter: @AxelPolleres > > On 18 Jul 2014, at 17:07, Boris Motik <[hidden email]> wrote: > >> Hello, >> >> I have a comment about the semantics of the Minus operator from the SPARQL specification. I stumbled upon this while trying to implement SPARQL 1.1 in the RDFox system that we’re developing at Oxford University. In Section 18.5 (http://www.w3.org/TR/sparql11-query/#sparqlAlgebra) of the SPARQL 1.1 > recommendation, I see that the Minus operator is defined as follows: >> >> Minus(Ω1, Ω2) = { μ | μ in Ω1 . ∀ μ' in Ω2, either μ and μ' are not compatible or dom(μ) and dom(μ') are disjoint } >> >> I see an explanation below the operator’s definition for the second part of the condition (i.e., "or dom(μ) and dom(μ') are disjoint”), but this seems to me as a suboptimal choice for at least two reasons. >> >> >> >> 1. The addition of this condition breaks means that the following two patterns don’t have the same result: >> >> (1) { ?X rdf:type :A . OPTIONAL { ?Y rdf:type :B } } >> (2) { { ?X rdf:type :A . ?Y rdf:type :B } UNION { ?X rdf:type :A . MINUS { ?Y rdf:type :B } } } >> >> The equivalence (A OPT B) = (A JOIN B) UNION (A MINUS B) seems natural and it holds already in SQL, so it may be unexpected for SPARQL to break this. Furthermore, query planning often depends on this equivalence, so the modified semantics of MINUS prevents the application of some well-known optimisation. >> >> >> >> 2. The present definition prevents an implementation approach that uses sideways information passing and iterators. To clarify this, consider the following pattern P, where SP1 and SP2 are some (unspecified) patterns: >> >> (3) P = { SP1 MINUS { SP2 } } >> >> Now assume that we’ve got a function eval() that takes as input a pattern and that returns a set of mappings that constitute answers to the pattern. One way to implement eval(SP1 MINUS { SP2 }) is as follows: >> >> R := empty set >> For each mapping μ in eval(SP1) >> Let SP2’ be the result of applying μ to SP2 (i.e., we replace in SP2 each variable x with μ(x)) >> If eval(SP2’) is empty then >> Add μ to R >> return R >> >> Evaluating the query in such a way might be better than using the usual bottom-up strategy. For example, if the size of eval(SP1) is small, whereas the size of eval(SP2) is very large but SP2 can be evaluated using indexes, then the above algorithm is much more efficient than the bottom-up one as it iterates over a small set > and doesn’t require materialisation of a potentially very large set. > Another benefit of this implementation strategy is that, depending on > the structure of SP1 and SP2, we may not need to materialise the answers > to either SP1 or SP2: if we use an iterator-based implementation (as is > common in database systems), we may be able to evaluate SP1 on-the-fly, > and we may also be able to check whether eval(SP2’) is empty on the fly > as well. >> >> Unfortunately, the algorithm I sketched is incorrect with respect to the semantics given in the document, and I see no way to make it correct: after applying μ to SP2, there is no way of checking the constraint "or dom(μ) and dom(μ') are disjoint” from the definition of Minus. Thus, if one is to fully obey the semantics from > the specification, it seems to me that one is forced to fully evaluate > SP2 so that one can later check the domain disjointness condition. >> >> >> >> I understand that dropping the disjointness condition means that pattern { SP1 MINUS { } } has no answers regardless of the form of SP1. However, given the expected identities between JOIN, MINUS, and OPTIONAL, one might actually expect this; furthermore, I doubt that there would be many cases in practice where this effect > would be observable; and finally I believe that the resulting > specification would lend itself to better implementation. I understand > that it might be too late to change the specification now, and I am > writing this e-mail for the following reasons: >> >> - I would like to log a technical comment that might be taken into account in a future edition of the specification. >> - I am wondering whether WG members have anything to add to my comment. >> - I am wondering whether it might be possible to add somewhere (e.g., in the errata document) a comment that would allow SPARQL vendors to use the modified semantics of MINUS, but with a lesser degree of compatibility with the specification. >> >> Regards, >> >> Boris >> >> > |
Free forum by Nabble | Edit this page |