Hello Andrew, Carlos, everyone,
I am working towards a submission of about 4000 test cases.
This mail is not the submission itself, but discusses various issues related to it.
First, some background on the test cases.
I have written these tests from scratch (I am the sole copyright holder) during
the development of an open source (GNU LGPL) XQuery/XSL-T implementation for
KDE. Development has been (and is) test driven, and I think this has positively
affected the creation of this test suite, especially when contrasted with how
the XQTS has been created. Although the approach has sometimes been similar to
how I suspect the XQTS team has worked (a systematic reading of the
specifications and creation of tests corresponding to their assertions), many
of the tests were created in response to bugs, either discovered sporadically
or whose existence I deduced from how KDE's implementation is written. That is,
tests focusing on what an implementation has trouble with.
I find it difficult to compare KDE's test suite to the XQTS (because I don't
know the exact coverage of the XQTS), but I know that I've found 10-15 bugs
in Saxon even though it has more or less passed the XQTS (although I have not
systematically run the tests with Saxon), and that I've sporadically seen
areas where I have tests while the XQTS doesn't. For example, testing that
certain operators don't exist, or that invalid lexical representations of
date/time types are trapped. The majority of the tests are XPath-only, with
very few node tests, name tests, and path expressions, and a large focus on
casting.
My tests are not in the XQTS format, but a 1.0 XSL-T stylesheet with EXSL-T
extensions can convert them into it in a fully automated way. An "export" to
the XQTS format is not publicly available right now (although it's very easy
to do), but all of the related files are here:
or via web interface at:
On a Linux system with xsltproc and make installed (or on Windows with those
tools available), running `make xqts` should create a fully functional XQTS
suite. `make xqts-package` zips it up.
The reason I do not yet work on or publish it in the XQTS format is that as
soon as one does, one potentially must revert to manually editing (4000
tests!) instead of adjusting the conversion process. Therefore, this mail
discusses the issues (below) so that we all save work and achieve the best
result.
On to the details. It might be appropriate for some points to be discussed in
the Bugzilla database. If so, say so, and I'll open reports there and we can
continue the discussion.
* The testing of KDE's implementation is essential for it. If I lose a
test, it is potentially a regression. How do you decide whether a test is
accepted or not? For example, is it possible that a test is not accepted even
though it does not duplicate a test in the XQTS and is valid? I almost must
have a very clear view of how this works; otherwise, contributing to the
XQTS can get very costly for KDE's implementation. It can simply be a
question of close cooperation: if you decide to discard tests in a way that
conflicts with my needs, it can just be a question of my being informed of
which tests are discarded.
* This is how the conversion works: no test cases use input files, and the
query is merged with the catalog. Example:
<test-case description="Test function fn:true().">true()</test-case>
Multiple tests are then put in files, where each file corresponds to an XQTS
test group. An XQTS catalog file contains XInclude statements inside the
test groups (XQTSCatalogSubmission.in.xml), and an identity transform produces
the query files, expected outputs, and the "final" catalog.
Not all of the tests are converted: tests concerning functions (about 1000)
have not been split into groups and are therefore not included from the
catalog. In other words, about 3000 tests are ready to go, modulo these
excluded function tests.
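As a rough illustration of that split, here is a Python sketch. The attribute names mirror the example above, but the query file naming is hypothetical; the real conversion is done by the XSL-T stylesheet, this only mirrors its idea: the query text is stored inline in the custom catalog and gets written out as a separate query file for the XQTS layout.

```python
import xml.etree.ElementTree as ET

# One inline test case in the custom format; "name" is an illustrative
# attribute, the real catalog may identify tests differently.
CUSTOM = ('<test-case name="K-TrueFunc-1" '
          'description="Test function fn:true().">true()</test-case>')

def split_test_case(xml_text):
    """Return (catalog_entry, query_source) for one inline test case."""
    elem = ET.fromstring(xml_text)
    entry = {
        "name": elem.get("name"),
        "description": elem.get("description"),
        # In the XQTS layout the query lives in its own file.
        "query_file": elem.get("name") + ".xq",
    }
    return entry, elem.text

entry, query = split_test_case(CUSTOM)
print(entry["query_file"], "<-", query)  # K-TrueFunc-1.xq <- true()
```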
* The produced test suite is based on XQTSCatalog.xsd version 0.8.6, including
the changes in bug #3090.
* New test-groups were added: OptionDeclarationProlog, CopyNamespacesProlog,
ValCompTypeChecking, AnyURI, StringComp. These have been marked in the
catalog with an XML comment saying "NOTE: This is a new group ..."
* Currently there are tests (not many, 10 or so) for how a Basic
implementation should treat schema imports and the like. How should those be
organized? Currently they are in the group "Optional Features", but
schema-aware implementations will fail them.
* test-case/@is-XPath20 is currently incorrectly set, and I have no automated
way of fixing it. I've gotten the impression that you have scripts to do this
automatically? If so, you can simply fix that when I "officially" submit.
* I think the submission guidelines are followed.
* http://www.w3.org/XML/Query/test-suite/Guidelines for Test Submission.html
reads: "Variable names, function names, etc., should not contain any
copyrighted information or any company name or any other text identifying a
company." However, the XQTS catalog contains test-group/@featureOwner listing
organizations such as NIST, Oracle, Microsoft, and so forth. I find these two
points contradictory. As I see it, when KDE's tests are merged into the XQTS,
they will be listed under a certain company as the feature owner. I could use
a clarification in this area. Perhaps the role of test-group/@featureOwner
should be clarified.
* Some of the tests are generated from the table in "XQuery 1.0 and XPath 2.0
Functions and Operators, 17.1 Casting from primitive types to primitive
types", with fromCastingTable.xsl, into casting-generated.xml (see the
comments in the XSL-T file). If some of those tests duplicate the XQTS or are
in some other way inappropriate, the generation step should be adjusted
rather than its output.
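For illustration, table-driven generation of this kind can be sketched in Python. The two-entry table and the sample values below are made up, not the real F&O 17.1 matrix, and the real work is done by fromCastingTable.xsl; the point is only that each "N" cell becomes a negative test expecting a type error.

```python
# Made-up excerpt of a casting table: "Y" = cast always allowed,
# "N" = cast never allowed (the spec's full matrix also has value-
# dependent cells, which this sketch ignores).
CASTING_TABLE = {
    ("xs:integer", "xs:string"): "Y",
    ("xs:date", "xs:integer"): "N",
}

# One sample source value per type, used to build a concrete query.
SAMPLE_VALUE = {
    "xs:integer": "1",
    "xs:date": 'xs:date("2006-04-17")',
}

def generate_cast_tests(table):
    """Turn table cells into test descriptors: a query plus whether a
    type error is the expected outcome."""
    tests = []
    for (source, target), allowed in table.items():
        query = "%s cast as %s" % (SAMPLE_VALUE[source], target)
        tests.append({"query": query, "expect_error": allowed == "N"})
    return tests

for test in generate_cast_tests(CASTING_TABLE):
    print(test)
```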
* The majority of the tests do not have descriptions. Those that don't get a
generic one generated, for example: "A test whose essence is: `1 to 1 eq 1`".
The generation knows which test group a test belongs to, so it would be
possible to generate a description based on the test group. In general, I
find the descriptions in the XQTS very generic, often the same for all tests
in one group, so I see the generated descriptions as being on the same level
of quality. If one wants to manually add descriptions, one should consider
editing the custom format directly, since it's very simple and has the query
right next to the 'description' attribute.
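The fallback generation can be sketched like this; the wording matches the generated form quoted above, while the optional group prefix is a hypothetical extension, not something the current conversion does:

```python
def description_for(query, explicit=None, group=None):
    """Prefer a hand-written description; otherwise generate a generic
    one from the query text, optionally prefixed with the test group."""
    if explicit:
        return explicit
    generated = "A test whose essence is: `%s`." % query
    if group:
        # Hypothetical refinement: mention the test group as well.
        generated = "%s: %s" % (group, generated)
    return generated

print(description_for("1 to 1 eq 1"))
# A test whose essence is: `1 to 1 eq 1`.
```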
* None of the tests have spec citations; a dummy one is added in order to
conform to XQTSCatalog.xsd. Again, one could add spec citations based on the
test group, for the sake of having them. Is there another approach? Are such
broad spec citations better than none?
* I have run the tests in the XQTS format against my implementation, and all
tests pass (error codes tested too). I think the test driver reports correct
results; I have regression tested it against an "XQTS driver test suite".
However, statistically, I think the suite nevertheless contains errors. I may
have forgotten cases where different outcomes are valid, written tests which
are simply wrong (and my implementation along with them, since it passes), or
failed to update for specification changes (although I think I've been
relatively thorough on that). I would personally not include the tests in the
XQTS before having run them with at least one other implementation.
* The "XQTS driver test suite" is available in the XQTS format here:
It tests that the driver really marks cases as failed when they should be,
and so forth. I'll gladly share it in any way, if of interest: clarify the
license, submit it, accept improvements/comments, etc.
* The tests can be said to be aligned with the November drafts (the Candidate
Recommendations), and in some cases aligned with the resolution of reports
since then (for example, string/anyURI promotion).
* There is a risk that some tests duplicate tests in XQTS. It is a very
large job (and error-prone in several senses) to check this manually. Perhaps
one could write a tool which opens all queries, removes the initial comment
and then compares the tests for finding duplicates. Creating such a tool
would hopefully be useful with other submissions as well.
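Such a comparison tool could be sketched along these lines in Python. The file contents, the test names, and the assumption that each query starts with at most one non-nested XQuery comment are all illustrative:

```python
import re
from collections import defaultdict

def normalize(query):
    """Drop one leading (non-nested) XQuery comment "(: ... :)", then
    collapse all whitespace runs, so only the query proper is compared."""
    query = re.sub(r"^\s*\(:.*?:\)", "", query, flags=re.DOTALL)
    return " ".join(query.split())

def find_duplicates(queries):
    """queries: dict of test name -> query source.
    Returns lists of test names whose normalized queries are identical."""
    groups = defaultdict(list)
    for name, source in queries.items():
        groups[normalize(source)].append(name)
    return [names for names in groups.values() if len(names) > 1]

# Illustrative inputs: two tests differ only by a leading comment.
queries = {
    "K-TrueFunc-1": "(: Test fn:true. :)\ntrue()",
    "fn-true-1": "true()",
    "K-FalseFunc-1": "false()",
}
print(find_duplicates(queries))  # [['K-TrueFunc-1', 'fn-true-1']]
```

Comparing at the byte level after this normalization only finds exact duplicates; it cannot catch semantically equivalent queries, which is arguably a feature, since it cannot produce false positives either.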
* If it is of interest, it's possible to work on this in KDE's SVN repository.
Getting an SVN account is a non-issue.
That's it. I will read comments and suggestions with interest.
This is a personal reply, not expressing the opinion of any group.
However, my opinion is open-sourced and others are welcome to share
this opinion. :)
Regarding this point:
>There is a risk that some tests duplicate tests in XQTS. It is a
>very large job (and error-prone in several senses) to check this
>manually. Perhaps one could write a tool which opens all queries,
>removes the initial comment and then compares the tests for finding
>duplicates. Creating such a tool would hopefully be useful with other
>submissions as well.
Duplicate tests are not really a risk. By ignoring the possibility of
duplicates, the task force reduces the risk that some edge case will
go untested. There is no requirement to count individual cases as
single points, equal in weight or otherwise, in some sort of scoring
system. If a processor-under-test fails some test cases, you care about
which ones but not how many. If a processor-under-test passes all test
cases, you still can't declare it "100% conformant" because other tests
that didn't get written might have uncovered non-conformant results.
Duplicate test cases might waste someone's time, but they don't cause
a conformance-assessment problem. The time and effort of test case
writers is a scarce resource; right now it's better to have them
thinking about new cases rather than trying to identify duplicates.
On Monday 17 April 2006 15:37, [hidden email] wrote:
> This is a personal reply, not expressing the opinion of any group.
> However, my opinion is open-sourced and others are welcome to share
> this opinion. :)
> Regarding this point:
> >There is a risk that some tests duplicate tests in XQTS. It is a
> >very large job (and error-prone in several senses) to check this
> >manually. Perhaps one could write a tool which opens all queries,
> >removes the initial comment and then compares the tests for finding
> >duplicates. Creating such a tool would hopefully be useful with other
> >submissions as well.
> Duplicate tests are not really a risk. By ignoring the possibility of
> duplicates, the task force reduces the risk that some edge case will
> go untested. There is no requirement to count individual cases as
> single points, equal in weight or otherwise, in some sort of scoring
> system. If a processor-under-test fails some test cases, you care about
> which ones but not how many. If a processor-under-test passes all test
> cases, you still can't declare it "100% conformant" because other tests
> that didn't get written might have uncovered non-conformant results.
> Duplicate test cases might waste someone's time, but they don't cause
> a conformance-assessment problem. The time and effort of test case
> writers is a scarce resource; right now it's better to have them
> thinking about new cases rather than trying to identify duplicates.
Sensible words (but hey, I'm just jumping on the open source wagon here).
I also think that not trying to sort out duplicates is the way to go, for
several reasons. My only concern would be a manual attempt to sort out
duplicates, because it could potentially introduce regressions. However, I am
positive towards an automated comparison at the byte level, since I don't see
how that could go wrong.
And yes, a little modesty cannot hurt. I don't represent or speak for KDE; I
am only one of the developers.