Possible Bug in SPARQL 1.1 Protocol Validator


Rob Vesse
Hi All

First off, thanks for the excellent job you've all done in putting together
the SPARQL 1.1 specifications and for the comprehensive and substantial
test suite you've provided.  I look forward to SPARQL 1.1 becoming an
official W3C Recommendation in the very near future.

However, I have encountered what may prove to be a bug in the SPARQL 1.1
Protocol Validator.  When I previously reported my results to Gregory
Williams for inclusion in the implementation report, I had 4 tests
failing.  As my implementation has a very Windows-centric environment, it
was difficult for me to debug with the test runner as is, so I ported the
problem tests to Java:
https://bitbucket.org/dotnetrdf/sparql11-protocol-validator/overview

Once I did this I found that there were some bugs in my SPARQL engine but
that my protocol implementation appeared to be fine.  With the bugs in my
engine fixed, the four failing tests now pass in my Java port.  Yet when I
run the official validator the tests still fail, specifically:

update_dataset_default_graph
update_dataset_default_graphs
update_dataset_named_graphs
update_dataset_full

After some digging in the Perl code I have identified what might be the
root cause of the problem.  In the relevant tests the URIs are created like
so:

POST("${uurl}?using-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2Fsparql%2Fdata%2Fdata1.rdf&using-named-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2Fsparql%2Fdata%2Fdata2.rdf", [
    'update' => $sparql,
]);

This appears to result in double encoding of the using-graph-uri and
using-named-graph-uri parameters.  Since .Net decodes the parameters only
once for me (and I am clearly not going to decode them multiple times), the
SPARQL Updates end up not creating the expected data because the graph
URIs are incorrect.
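
To make the suspected failure mode concrete, here is a minimal Python sketch
(not taken from the validator or from either harness) of what single versus
double percent-encoding does to the first graph URI when the server decodes
exactly once.  The variable names are illustrative only.

```python
from urllib.parse import quote, unquote

# Graph URI used by the failing tests.
GRAPH_URI = "http://kasei.us/2009/09/sparql/data/data1.rdf"

# Encode once, as a client should when placing the URI in a query string.
once = quote(GRAPH_URI, safe="")
# Encode again, as would happen if an already-encoded value were re-encoded.
twice = quote(once, safe="")

# A server that decodes exactly once recovers the URI only in the first case.
decoded_once = unquote(once)
decoded_twice = unquote(twice)

print(once)
print(twice)
print(decoded_once == GRAPH_URI)   # single encoding round-trips
print(decoded_twice == GRAPH_URI)  # double encoding does not
```

If double encoding were happening, the server would be left holding the
percent-encoded string as the graph URI, which would match no graph.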

My Java harness passes the unencoded form through Apache HTTP Client,
which encodes the URI only once, so I get the correct URI on the server
side and the tests pass.

I verified that double encoding does indeed appear to be the root cause of
the problem by replacing the unencoded form with the encoded form in my
Java test harness, at which point the tests start failing.

So it looks like the Perl code should be as follows:

POST("${uurl}?using-graph-uri=http://kasei.us/2009/09/sparql/data/data1.rdf&using-named-graph-uri=http://kasei.us/2009/09/sparql/data/data2.rdf", [
    'update' => $sparql,
]);

I.e. the URLs should not be encoded, as LWP should take care of this
automatically AFAICT.

However, I am not 100% certain that double encoding is the issue, because
other implementations like Fuseki seem to be totally fine.

I have spent some time trying to get the protocol validator running in an
Apache instance on my OS X laptop but have had little luck.  There is an
apparent undeclared dependency on TryCatch, which won't install properly
under OS X for reasons unknown to me, and after forcing the install the
script just fails to run with a vague and unhelpful compilation error in
the Apache logs.  Knowing next to nothing about Apache and Perl, I'd rather
that someone with a good environment to start with tried making my
suggested changes and running against my implementation to see if
everything then passes.

Also, if someone can look at my Java ports of the tests in question and
check that I haven't made an error in porting them, that would be
appreciated.

For reference the live endpoints for my installation are as follows:

Query - http://www.dotnetrdf.org/demos/server/query
Update - http://www.dotnetrdf.org/demos/server/update

Best Regards,

Rob Vesse






Re: Possible Bug in SPARQL 1.1 Protocol Validator

Gregory Williams
On Dec 17, 2012, at 6:03 PM, Rob Vesse wrote:

> After some digging in the Perl code I have identified what might be the
> root cause of the problem, in the relevant tests the URIs are created like
> so:
>
> POST("${uurl}?using-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2Fsparql%2F
> data%2Fdata1.rdf&using-named-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2F
> sparql%2Fdata%2Fdata2.rdf", [
> 'update' => $sparql,
> ]);
>
> This appears to result in double encoding of the using-graph-uri and
> using-named-graph parameters and since .Net only decodes the parameters
> once for me (and I am clearly not going to decode them multiple times) the
> SPARQL Updates end up not creating the expected data because the graph
> URIs are incorrect.


Hi Rob,

Where are you seeing the double encoding? I'm able to take that POST line, run it, and see this on the server side:

------------
POST /?using-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2Fsparql%2F%20data%2Fdata1.rdf&using-named-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2F%20sparql%2Fdata%2Fdata2.rdf HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: localhost:8881
User-Agent: libwww-perl/5.834
Content-Length: 27
Content-Type: application/x-www-form-urlencoded

update=sparql+update+string
------------

Do you believe this is wrongly encoded? Given that there are several implementations passing the protocol tests using this validator (I know of ones in Perl, Java, and C++), I believe the problem may lie elsewhere.
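
As a rough cross-check of the capture above, the sketch below decodes that request target once using Python's standard library, which is an assumption standing in for whatever the server framework does. Each value comes back with no percent escapes left, which is what a singly encoded parameter looks like, but with a space inside the URI. The %20 presumably comes from the line breaks in the quoted snippet rather than from the validator's source, though nothing in the thread confirms that.

```python
from urllib.parse import parse_qs, urlsplit

# Request target copied from the capture above (split only for readability).
REQUEST_TARGET = (
    "/?using-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2Fsparql%2F%20data%2Fdata1.rdf"
    "&using-named-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2F%20sparql%2Fdata%2Fdata2.rdf"
)

# parse_qs percent-decodes each parameter value exactly once.
params = parse_qs(urlsplit(REQUEST_TARGET).query)

for name, values in params.items():
    # repr() makes the embedded space visible.
    print(name, repr(values[0]))
```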

> In my Java harness it is passing the unencoded form through Apache HTTP
> Client and this encodes the URI only once so I get the
> correct URI on the server side and the tests pass.
>
> I verified that double encoding does indeed appear to be the root cause of
> the problem by replacing the unencoded form with the encoded form in my
> Java test harness and then the tests start failing.

This sounds like it might be a difference between the Perl and Java HTTP library APIs...?
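
One way to picture such an API difference, sketched in Python rather than Perl or Java: a client that sends the query string verbatim needs pre-encoded values, while a client that encodes parameters itself needs raw values. The function names and the localhost endpoint (borrowed from the capture earlier in this message) are hypothetical, and this does not claim to describe how LWP or Apache HTTP Client actually behave.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://localhost:8881/"  # illustrative endpoint only
RAW = "http://kasei.us/2009/09/sparql/data/data1.rdf"
ENCODED = "http%3A%2F%2Fkasei.us%2F2009%2F09%2Fsparql%2Fdata%2Fdata1.rdf"


def url_verbatim(value):
    """Style A: the caller's string goes into the query string unchanged."""
    return f"{ENDPOINT}?using-graph-uri={value}"


def url_encoding(value):
    """Style B: the client percent-encodes the parameter value itself."""
    return f"{ENDPOINT}?{urlencode({'using-graph-uri': value})}"


def server_sees(url):
    """Decode the query string once, as a server would be expected to."""
    return parse_qs(urlsplit(url).query)["using-graph-uri"][0]


print(server_sees(url_verbatim(ENCODED)) == RAW)  # pre-encoded + verbatim: fine
print(server_sees(url_encoding(RAW)) == RAW)      # raw + encoding client: fine
print(server_sees(url_encoding(ENCODED)) == RAW)  # pre-encoded + encoding client: broken
```

Under this reading, the same pre-encoded string can be right for one client library and wrong for another, which would fit both sets of observations in the thread.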

> So it looks like the Perl code should be as follows:
>
> POST("${uurl}?using-graph-uri=http://kasei.us/2009/09/sparql/
> data/data1.rdf&using-named-graph-uri=http://kasei.us/2009/09/sparql/data/da
> ta2.rdf", [
> 'update' => $sparql,
> ]);
>
> I.e. the URLs should not be encoded as LWP should take care of this
> automatically AFAICT

That does not seem to be the case with the version of LWP I am using.


> However I am not 100% certain that double encoding is the issue because
> other implementations like Fuseki seem to be totally fine.
>
> I have spent some time trying to get the protocol validator running in an
> Apache instance on my OS X laptop but have had little luck.  There is an
> apparent undeclared dependency on TryCatch which won't install properly
> under OS X for reasons unbeknownst to me and after forcing install the
> script just fails to run with a vague and unhelpful compilation error in
> the Apache logs.

Yes, TryCatch is required, as is Plack. I can update the documentation, but I'm not sure how else to debug the problem, as it works locally for me and is working on my server where the validator is being hosted.

.greg



Re: Possible Bug in SPARQL 1.1 Protocol Validator

Rob Vesse
Hi Gregory

Comments inline:

On 12/18/12 1:56 PM, "Gregory Williams" <[hidden email]> wrote:

>On Dec 17, 2012, at 6:03 PM, Rob Vesse wrote:
>
>> After some digging in the Perl code I have identified what might be the
>> root cause of the problem, in the relevant tests the URIs are created
>> like so:
>>
>> POST("${uurl}?using-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2Fsparql%2F
>> data%2Fdata1.rdf&using-named-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2F
>> sparql%2Fdata%2Fdata2.rdf", [
>> 'update' => $sparql,
>> ]);
>>
>> This appears to result in double encoding of the using-graph-uri and
>> using-named-graph parameters and since .Net only decodes the parameters
>> once for me (and I am clearly not going to decode them multiple times)
>> the SPARQL Updates end up not creating the expected data because the
>> graph URIs are incorrect.
>
>
>Hi Rob,
>
>Where are you seeing the double encoding? I'm able to take that POST
>line, run it, and see this on the server side:
>
>------------
>POST /?using-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2Fsparql%2F%20data%2Fdata1.rdf&using-named-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2F%20sparql%2Fdata%2Fdata2.rdf HTTP/1.1
>TE: deflate,gzip;q=0.3
>Connection: TE, close
>Host: localhost:8881
>User-Agent: libwww-perl/5.834
>Content-Length: 27
>Content-Type: application/x-www-form-urlencoded
>
>update=sparql+update+string
>------------
>
>Do you believe this is wrongly encoded? Given that there are several
>implementations passing the protocol tests using this validator (I know
>of ones in perl, java, and c++), I believe the problem may lie elsewhere.

It's my best guess at what the problem might be given that I have
eliminated all other obvious explanations to the best of my ability.  To
clarify I have done the following:

1 - Running the command sequences manually through my web UI - All Pass
2 - Running the command sequences in those tests using CURL - All Pass
(See https://bitbucket.org/dotnetrdf/sparql11-protocol-validator/src/tip/protocol.sh?at=default)
3 - Running my Java ports of those tests - All Pass
4 - Running unit test versions of the command sequences, i.e. eliminating
any protocol interaction and adjusting the commands to add the USING/USING
NAMED statements that the protocol should be adding - All Pass
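
For readers who want to reproduce a check like number 2 without the shell
script, here is a hedged Python sketch that builds (but does not send) a
request of the same shape as the one in the quoted test: graph URIs in the
query string, the update in a form-encoded body.  The helper name
build_update_request is made up for illustration; the endpoint is the Update
URL given in the first message.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Update endpoint as given in the first message of this thread.
UPDATE_ENDPOINT = "http://www.dotnetrdf.org/demos/server/update"


def build_update_request(update, default_graphs=(), named_graphs=()):
    """Build (without sending) a POST with graph URIs in the query string."""
    # urlencode percent-encodes each raw value exactly once.
    query = urlencode(
        [("using-graph-uri", g) for g in default_graphs]
        + [("using-named-graph-uri", g) for g in named_graphs]
    )
    url = f"{UPDATE_ENDPOINT}?{query}" if query else UPDATE_ENDPOINT
    body = urlencode({"update": update}).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_update_request(
    "sparql update string",  # placeholder body, as in the capture quoted above
    default_graphs=["http://kasei.us/2009/09/sparql/data/data1.rdf"],
    named_graphs=["http://kasei.us/2009/09/sparql/data/data2.rdf"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the request to urllib.request.urlopen would send it; that step is
left out so the sketch has no side effects on the live endpoint.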

While I could have ported the tests incorrectly once, four times starts to
seem a little unlikely.  Once I could do something dumb, but believe me I've
spent a lot of time staring at these tests already.  So either the test
harness is bad or my implementation is bad (or I really suck at copy and
paste).  Given that I can get the tests to run successfully in four other
ways, I tend to lean towards some oddity in the test harness.

It may be double encoding or perhaps the tests that the harness runs
aren't exactly the same as the tests as documented in the ReadMe (which as
far as I can see is not the case)?

Debugging this with the official harness is a PITA for me because I can't
debug my live implementation using the public instance of the test harness,
and since I can't get the test harness to install and run locally yet, I am
rather stuck.

I am not ruling out a bug in my implementation but it's hard to know where
to look given all my ported versions of the tests pass and the difficulty
of quickly running the tests in a usable debugging environment for my
implementation.

Rob







Re: Possible Bug in SPARQL 1.1 Protocol Validator

Gregory Williams
On Dec 19, 2012, at 2:28 PM, Rob Vesse wrote:

>> Where are you seeing the double encoding? I'm able to take that POST
>> line, run it, and see this on the server side:
>>
>> ------------
>> POST /?using-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2Fsparql%2F%20data%2Fdata1.rdf&using-named-graph-uri=http%3A%2F%2Fkasei.us%2F2009%2F09%2F%20sparql%2Fdata%2Fdata2.rdf HTTP/1.1
>> TE: deflate,gzip;q=0.3
>> Connection: TE, close
>> Host: localhost:8881
>> User-Agent: libwww-perl/5.834
>> Content-Length: 27
>> Content-Type: application/x-www-form-urlencoded
>>
>> update=sparql+update+string
>> ------------
>>
>> Do you believe this is wrongly encoded? Given that there are several
>> implementations passing the protocol tests using this validator (I know
>> of ones in perl, java, and c++), I believe the problem may lie elsewhere.
>
> It's my best guess at what the problem might be given that I have
> eliminated all other obvious explanations to the best of my ability.  To
> clarify I have done the following:
>
> 1 - Running the command sequences manually through my web UI - All Pass
> 2 - Running the command sequences in those tests using CURL - All Pass
> (See https://bitbucket.org/dotnetrdf/sparql11-protocol-validator/src/tip/protocol.sh?at=default)
> 3 - Running my Java ports of those tests - All Pass
> 4 - Running unit test versions of the command sequences I.e. eliminating
> any protocol interaction and adjusting the commands to add the USING/USING
> NAMED statements that the  protocol should be adding - All Pass
>
> While I could have ported the tests incorrectly once, four times starts to
> seem a little unlikely, once I could do something dumb but believe me I've
> spent a lot of time staring at these tests already.  So either the test
> harness is bad or my implementation is bad (or I really suck at copy and
> paste), given that I can get the tests to run successfully in four other
> ways I tend to lean towards some oddity in the test harness.

I agree that seems strange, but that leaves us with several ways in which your system is passing, and several other implementations that all work just fine with the harness.

> It may be double encoding or perhaps the tests that the harness runs
> aren't exactly the same as the tests as documented in the ReadMe (which as
> far as I can see is not the case)?
>
> Debugging this with the official harness is a PITA for me because I can't
> debug my live implementation using the public instance of the test harness
> and since I can't get the test harness to install and run locally yet I am
> rather stuck.
>
> I am not ruling out a bug in my implementation but it's hard to know where
> to look given all my ported versions of the tests pass and the difficulty
> of quickly running the tests in a usable debugging environment for my
> implementation.

Well, the implementation report is based entirely on self-reported results. It sounds to me like you've done due diligence on ensuring that your implementation is in conformance with the spec and the tests as written, and works with your client code. At this point, I think I'd suggest simply submitting new EARL results indicating all passes and marking the tests at issue with [ earl:mode earl:manual ].
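
A minimal sketch of what such EARL output could look like, generated here with Python. The assertor, subject, and test URIs are placeholders under example.org, not the real identifiers used by the implementation report, and the exact shape the report tooling expects is an assumption; only the earl: terms themselves come from the EARL vocabulary.

```python
# Test names from the first message in this thread.
TEST_NAMES = [
    "update_dataset_default_graph",
    "update_dataset_default_graphs",
    "update_dataset_named_graphs",
    "update_dataset_full",
]

# Placeholder URIs (assumptions, not the report's real identifiers).
ASSERTOR = "http://example.org/people#me"
SUBJECT = "http://example.org/software#my-endpoint"
TEST_BASE = "http://example.org/protocol-tests#"


def earl_assertion(test_name):
    """Return one Turtle EARL assertion recording a manually checked pass."""
    return (
        "[ a earl:Assertion ;\n"
        f"  earl:assertedBy <{ASSERTOR}> ;\n"
        f"  earl:subject <{SUBJECT}> ;\n"
        f"  earl:test <{TEST_BASE}{test_name}> ;\n"
        "  earl:mode earl:manual ;\n"
        "  earl:result [ a earl:TestResult ; earl:outcome earl:passed ]\n"
        "] ."
    )


def earl_report():
    """Join the prefix declaration and one assertion per test."""
    header = "@prefix earl: <http://www.w3.org/ns/earl#> ."
    return "\n\n".join([header] + [earl_assertion(name) for name in TEST_NAMES])


print(earl_report())
```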

thanks,
.greg



Re: Possible Bug in SPARQL 1.1 Protocol Validator

Rob Vesse
Ok, I will go ahead and do that.

I will still try to get the harness running in my environment to see if I
can track down what the issue is, and will let you know if and when I find
out whether the cause was the test harness or some bug in my implementation.

Rob
