Simplifying metadata


Simplifying metadata

Geoffrey Sneddon-4
Yo!

Can we revisit all the metadata we have in the tests? The metadata we *need* is what is sufficient to be able to run the tests, probably within CI systems.

I'm going to go by the assumption (which I think has shown itself to be true on numerous occasions) that the more metadata we require in tests the more hoops people have to jump through to release tests, which discourages submitting tests. And this WG has a real problem with getting good, high-quality test suites such that we're able to advance tests beyond CR.

Disclaimer: this is all based on <http://testthewebforward.org/docs/css-metadata.html>; I'm not totally sure this actually reflects the status-quo of the tests.

a) "CSS 2.1 Reference" as a <title> for potentially hundreds of references is utterly useless—I'd rather do something more descriptive as to what it is. Presto-testo has titles like:

 * Reference rendering - this should be green (green text)
 * Reference rendering - There should be no red below
 * Reference rendering - pass if F in Filler Text is upper-case

This isn't perfect either, but it's more useful than "CSS 2.1 Reference", IMO.

b) We don't need author metadata on any new tests, because that metadata is stored in git/hg. (It's essentially been entirely redundant since we moved away from SVN, as git/hg can store arbitrary authorship data regardless of whether the author has source-tree access.)

c) We haven't actively been adding reviewer metadata for quite a while. I suggest if we *really* want reviewer metadata (which I'm not at all sure we do—a single file may be reviewed by multiple people, especially in the testharness.js case), we do it in the commit description (along the lines of Signed-Off-By in the Linux repo). On the whole, I suggest we just go by the assumption that anything in the repo has been reviewed (at the current time outwith work-in-progress and vendor-imports), and don't bother storing the metadata. It doesn't really matter—when do we need to know who reviewed the test? The current model can be misleading, when the test is changed there's still a "reviewed" link, but that person hasn't necessarily reviewed the edited test.

d) Specification links I'm kinda unconvinced by, but I don't really care enough to argue over. I know Shepherd uses it.

e) Requirement flags I feel we should really revisit. We want to have enough flags to be able to run the tests, especially in CI. I'm not so interested in running tests through Shepherd's UI, because I simply think it's not valuable—it's almost never done, because it's pointlessly slow. For browsers, we should aim at getting the tests run in CI systems (probably with some way to upload results to Shepherd so we can have the CR-exit-criteria views there), and minor UAs can also likely run the tests in a more efficient way (as you want to determine pass/fail by unique screenshot, not by looking at a thousand tests, all of which say "This text should be green" identically).

So:

* ahem — we should simply just state "the CSS test suite requires the Ahem font to be available" and get rid of this flag
* animated — this is good because it has a real use (excluding tests from automated testing)
* asis — this makes sense with the current build system
* combo — do we actually care? is anyone doing anything with this? In CI systems you likely want to run all the files, combo and not.
* dom — sure, I suppose. not very useful for actual browsers, to be fair, so just extra overhead to release tests.
* font — we should simply just state "the CSS test suite requires these fonts to be installed" and get rid of this flag
* history — is there any UA that *doesn't* support session history? Yes, in *theory* one could exist, but if we don't know of one, we shouldn't optimise for it. The cost of metadata is too high (yes, even a flag!).
* HTMLonly — why do we have this rather than just using asis and HTML source files?
* http — we should just move to using the same mechanism as web-platform-tests for HTTP headers (rather than .htaccess), and then this can statically be determined by whether test.html.headers exists for a given test.html, leaving less metadata.
* image — like history, is this not just for a hypothetical UA?
* interact — sure.
* invalid — do we need an actual flag for this? I presume we want this for lint tools, in which case we should probably have a better (generic) way to silence bogus lint rules for a given test.
* may — on the whole reasonable
* namespace — is this not just for a hypothetical UA?
* nonHTML — can we not just use asis?
* paged — sure
* scroll — sure
* should — same as may
* speech — um, I guess
* svg — do we still want to treat SVG as something not universal?
* userstyle — sure
* 32bit — is this not just for a hypothetical UA at this point?
* 96dpi — is this not required by CSS 2.1 now, and hence redundant? (96px = 1in per CSS 2.1)
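As a sketch of the http point above: with wpt-style sidecar files (one `Header-Name: value` per line), the flag becomes a static check rather than metadata. The file names and function below are hypothetical examples, not anything from the existing tooling:

```python
from pathlib import Path

def needs_http(test_path: str) -> bool:
    """True iff a wpt-style headers sidecar exists next to the test
    (e.g. test.html.headers beside test.html); its presence would
    replace the 'http' requirement flag entirely."""
    return Path(test_path + ".headers").exists()
```

A CI tool could use this to decide whether a test must be served with custom HTTP response headers, with no in-file metadata at all.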

I feel like we shouldn't add metadata for hypothetical UAs that may or may not exist in the future. It adds overhead to contributing to the test suite, and we're likely to end up with the flags missing all over the place. (We'll end up with the flags needed for CI (animated, interact, userstyle, paged, speech) missing all over the place, mixed in among flags only needed for existing browsers—which already support all the optional features we have flags for!)

Also, I wonder if we should just merge "animated", "interact", and "userstyle" into one? "userstyle" can probably be justified as CI tools can run those tests in an automated manner by having some metadata within the tool to set the stylesheet (do we want to add a <link rel="user-stylesheet"> so that we can have required user stylesheets in testinfo.data?). "animated" only makes sense to split out from "interact" if anyone is ever going to verify animated content automatically.

For the sake of CI, we essentially have a few categories of tests:

 * reftests
 * visual tests that can be verified by screenshots
 * testharness.js tests
 * any of the three above that need special setup in the CI tool (a non-standard browser window size, a user stylesheet needing set, potentially paged media though I'm not sure if anyone actually runs paged media tests in CI tools?)
 * manual tests that cannot be run in CI

Really that gives us seven types of tests, six of which can run in CI. The first two above (so four of the six) can be distinguished by the presence of link[@rel='match' or @rel='mismatch']. We could distinguish testharness.js tests (two more of the six) by the presence of script[@src='/resources/testharness.js']. This means the only things we actually need to be able to filter out by explicit metadata are "needs special CI setup" and "entirely manual". We probably want enough granularity in the metadata such that people can trivially get a list of tests that need special CI setup based upon what their CI supports (e.g., if it can set a user stylesheet but can't run paged media).
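The detection described above can be sketched like this (a regex-based approximation using the selectors from the paragraph above; a real tool would use a proper HTML parser):

```python
import re

def classify(markup: str) -> str:
    """Rough test-type detection per the categories above:
    reftests carry <link rel="match"> or <link rel="mismatch">,
    testharness.js tests include /resources/testharness.js;
    everything else is a visual (screenshot) or manual test."""
    if re.search(r'<link\b[^>]*\brel=["\'](?:mis)?match["\']', markup):
        return "reftest"
    if re.search(r'<script\b[^>]*\bsrc=["\']/resources/testharness\.js["\']',
                 markup):
        return "testharness"
    return "visual-or-manual"
```

The remaining distinctions ("needs special CI setup", "entirely manual") are exactly the ones that cannot be derived from the source and so still need explicit metadata.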

We should discuss what we actually need as a result of this and get rid of the rest.

f) Test assertions… are we doing anything with this, beyond what we use specification links for? If not, we should stop requiring them.

Now, with all that done—I think I've forgotten some of what I meant to say, but there are always future emails for that! I hope that gives some idea of what I want to do!

/gsnedders

Re: Simplifying metadata

Peter Linss
The only metadata that Shepherd complains about if missing in tests is: title, at least one spec link, and the author. In reference files it only complains about missing author metadata (and will complain about spec links or assertions if present, because they shouldn’t be there).

(and for the record, Shepherd is the test suite manager tool, the test harness is a separate tool, as is the build system, though they do interact)

More comments inline.

On Oct 27, 2015, at 12:31 AM, Geoffrey Sneddon <[hidden email]> wrote:

Yo!

Can we revisit all the metadata we have in the tests? The metadata we *need* is what is sufficient to be able to run the tests, probably within CI systems.

I'm going to go by the assumption (which I think has shown itself to be true on numerous occasions) that the more metadata we require in tests the more hoops people have to jump through to release tests, which discourages submitting tests. And this WG has a real problem with getting good, high-quality test suites such that we're able to advance tests beyond CR.

Disclaimer: this is all based on <http://testthewebforward.org/docs/css-metadata.html>; I'm not totally sure this actually reflects the status-quo of the tests.

a) "CSS 2.1 Reference" as a <title> for potentially hundreds of references is utterly useless—I'd rather do something more descriptive as to what it is. Presto-testo has titles like:

 * Reference rendering - this should be green (green text)
 * Reference rendering - There should be no red below
 * Reference rendering - pass if F in Filler Text is upper-case

This isn't perfect either, but it's more useful than "CSS 2.1 Reference", IMO.

b) We don't need author metadata on any new tests, because that metadata is stored in git/hg. (It's essentially been entirely redundant since we moved away from SVN, as git/hg can store arbitrary authorship data regardless of whether the author has source-tree access.)

The author metadata was originally provided because authors didn’t necessarily have access to the repo and sometimes submitted tests via email or other means. So the first committer was often not the author. I’m fine with only requiring author metadata in that case.


c) We haven't actively been adding reviewer metadata for quite a while. I suggest if we *really* want reviewer metadata (which I'm not at all sure we do—a single file may be reviewed by multiple people, especially in the testharness.js case), we do it in the commit description (along the lines of Signed-Off-By in the Linux repo). On the whole, I suggest we just go by the assumption that anything in the repo has been reviewed (at the current time outwith work-in-progress and vendor-imports), and don't bother storing the metadata. It doesn't really matter—when do we need to know who reviewed the test? The current model can be misleading, when the test is changed there's still a "reviewed" link, but that person hasn't necessarily reviewed the edited test.

This was always optional and was only used when reviewing tests outside the tooling (or before the tooling existed). Tests can be reviewed and approved directly in Shepherd (and Shepherd approvals are also triggered by reviewer metadata or simply merging a GitHub pull request by those with approval authority).


d) Specification links I'm kinda unconvinced by, but I don't really care enough to argue over. I know Shepherd uses it.

Shepherd uses it for tracking test to spec mapping, but it’s also required by the build system. This is how the build system determines which tests go in which test suites. The test harness also relies on the spec links to generate the spec annotations (visible in the CSSWG drafts). So this one is most definitely necessary (and in fact is the only metadata our tooling really relies on).


e) Requirement flags I feel we should really revisit. We want to have enough flags to be able to run the tests, especially in CI. I'm not so interested in running tests through Shepherd's UI, because I simply think it's not valuable—it's almost never done, because it's pointlessly slow. For browsers, we should aim at getting the tests run in CI systems (probably with some way to upload results to Shepherd so we can have the CR-exit-criteria views there), and minor UAs can also likely run the tests in a more efficient way (as you want to determine pass/fail by unique screenshot, not by looking at a thousand tests, all of which say "This text should be green" identically).

So:

* ahem — we should simply just state "the CSS test suite requires the Ahem font to be available" and get rid of this flag
* animated — this is good because it has a real use (excluding tests from automated testing)
* asis — this makes sense with the current build system
* combo — do we actually care? is anyone doing anything with this? In CI systems you likely want to run all the files, combo and not.

This flag was used in the CSS 2.1 test suite to good effect (for historical process reasons I won’t go into here). It may be useful in the future to help meet CR exit criteria for other specs.

* dom — sure, I suppose. not very useful for actual browsers, to be fair, so just extra overhead to release tests.
* font — we should simply just state "the CSS test suite requires these fonts to be installed" and get rid of this flag
* history — is there any UA that *doesn't* support session history? Yes, in *theory* one could exist, but if we don't know of one, we shouldn't optimise for it. The cost of metadata is too high (yes, even a flag!).
* HTMLonly — why do we have this rather than just using asis and HTML source files?

This is an indicator to the build system to not generate an XHTML version of the test (as it wouldn’t make sense)

* http — we should just move to using the same mechanism as web-platform-tests for HTTP headers (rather than .htaccess), and then this can statically be determined by whether test.html.headers exists for a given test.html, leaving less metadata.

Sure.

* image — like history, is this not just for a hypothetical UA?
* interact — sure.
* invalid — do we need an actual flag for this? I presume we want this for lint tools, in which case we should probably have a better (generic) way to silence bogus lint rules for a given test.

In addition to the lint tools, the build code might use this (don’t recall if it does at the moment).

* may — on the whole reasonable
* namespace — is this not just for a hypothetical UA?
* nonHTML — can we not just use asis?

Like HTMLonly, it prevents the build system from generating an HTML version of the test

* paged — sure
* scroll — sure
* should — same as may
* speech — um, I guess
* svg — do we still want to treat SVG as something not universal?
* userstyle — sure
* 32bit — is this not just for a hypothetical UA at this point?

Opera cared about this at one point, Presto didn’t use full 32 bit ints for some values.

* 96dpi — is this not required by CSS 2.1 now, and hence redundant? (96px = 1in per CSS 2.1)

I feel like we shouldn't add metadata for hypothetical UAs that may or may not exist in the future. It adds overhead to contributing to the test suite, and we're likely to end up with the flags missing all over the place. (We'll end up with the flags needed for CI (animated, interact, userstyle, paged, speech) missing all over the place, mixed in among flags only needed for existing browsers—which already support all the optional features we have flags for!)

Also, I wonder if we should just merge "animated", "interact", and "userstyle" into one? "userstyle" can probably be justified as CI tools can run those tests in an automated manner by having some metadata within the tool to set the stylesheet (do we want to add a <link rel="user-stylesheet"> so that we can have required user stylesheets in testinfo.data?). "animated" only makes sense to split out from "interact" if anyone is ever going to verify animated content automatically.

Animated and interact should remain separate: an animated test could (at least in theory) be tested automatically. One requiring user interaction would potentially require a different system for automatic testing (and some way of scripting the interaction).


For the sake of CI, we essentially have a few categories of tests:

 * reftests
 * visual tests that can be verified by screenshots
 * testharness.js tests
 * any of the three above that need special setup in the CI tool (a non-standard browser window size, a user stylesheet needing set, potentially paged media though I'm not sure if anyone actually runs paged media tests in CI tools?)
 * manual tests that cannot be run in CI

Really that gives us seven types of tests, six of which can run in CI. The first two above (so four of the six) can be distinguished by the presence of link[@rel='match' or @rel='mismatch']. We could distinguish testharness.js tests (two more of the six) by the presence of script[@src='/resources/testharness.js']. This means the only things we actually need to be able to filter out by explicit metadata are "needs special CI setup" and "entirely manual". We probably want enough granularity in the metadata such that people can trivially get a list of tests that need special CI setup based upon what their CI supports (e.g., if it can set a user stylesheet but can't run paged media).

We should discuss what we actually need as a result of this and get rid of the rest.

f) Test assertions… are we doing anything with this, beyond what we use specification links for? If not, we should stop requiring them.

These are for test reviewers to help understand what the test is trying to do. It’s often not obvious and can be helpful.


Now, with all that done—I think I've forgotten some of what I meant to say, but there are always future emails for that! I hope that gives some idea of what I want to do!

/gsnedders



Re: Simplifying metadata

Geoffrey Sneddon-4
[All inline.]

On Tue, Oct 27, 2015 at 5:10 PM, Linss, Peter <[hidden email]> wrote:
The only metadata that Shepherd complains about if missing in tests is: title, at least one spec link, and the author. In reference files it only complains about missing author metadata (and will complain about spec links or assertions if present, because they shouldn’t be there).

(and for the record, Shepherd is the test suite manager tool, the test harness is a separate tool, as is the build system, though they do interact)

More comments inline.

On Oct 27, 2015, at 12:31 AM, Geoffrey Sneddon <[hidden email]> wrote:

Yo!

Can we revisit all the metadata we have in the tests? The metadata we *need* is what is sufficient to be able to run the tests, probably within CI systems.

I'm going to go by the assumption (which I think has shown itself to be true on numerous occasions) that the more metadata we require in tests the more hoops people have to jump through to release tests, which discourages submitting tests. And this WG has a real problem with getting good, high-quality test suites such that we're able to advance tests beyond CR.

Disclaimer: this is all based on <http://testthewebforward.org/docs/css-metadata.html>; I'm not totally sure this actually reflects the status-quo of the tests.

a) "CSS 2.1 Reference" as a <title> for potentially hundreds of references is utterly useless—I'd rather do something more descriptive as to what it is. Presto-testo has titles like:

 * Reference rendering - this should be green (green text)
 * Reference rendering - There should be no red below
 * Reference rendering - pass if F in Filler Text is upper-case

This isn't perfect either, but it's more useful than "CSS 2.1 Reference", IMO.

b) We don't need author metadata on any new tests, because that metadata is stored in git/hg. (It's essentially been entirely redundant since we moved away from SVN, as git/hg can store arbitrary authorship data regardless of whether the author has source-tree access.)

The author metadata was originally provided because authors didn’t necessarily have access to the repo and sometimes submitted tests via email or other means. So the first committer was often not the author. I’m fine with only requiring author metadata in that case.

Oh, what I said was based on a mistaken belief about hg. In git there are separate "Author" and "Committer" fields—and the committer can set the Author field to whatever they want, hence in git there's no reason to ever require author metadata in the file. This further makes me feel we should drop hg! As far as I'm aware, the only reason we have the hg/git mirroring is that much of the Shepherd tooling integrates with hg—someone (probably you or me!) should investigate how much work it would be to migrate purely to git. (I heard something about contributors before. Will we actually lose any contributors if we move to git exclusively?)
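[A quick illustration of that separation; the repo path, names, and commit message below are made up:]

```shell
# A committer landing someone else's test credits them via --author;
# no in-file author metadata needed.
rm -rf /tmp/authorship-demo
git init -q /tmp/authorship-demo
git -C /tmp/authorship-demo -c user.name="Repo Committer" \
    -c user.email="committer@example.com" \
    commit --allow-empty --author="Test Author <author@example.com>" \
    -m "Add background-color reftest"
# Author and committer are recorded separately in the commit:
git -C /tmp/authorship-demo log -1 \
    --format='author: %an <%ae>, committer: %cn'
# prints: author: Test Author <author@example.com>, committer: Repo Committer
```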

c) We haven't actively been adding reviewer metadata for quite a while. I suggest if we *really* want reviewer metadata (which I'm not at all sure we do—a single file may be reviewed by multiple people, especially in the testharness.js case), we do it in the commit description (along the lines of Signed-Off-By in the Linux repo). On the whole, I suggest we just go by the assumption that anything in the repo has been reviewed (at the current time outwith work-in-progress and vendor-imports), and don't bother storing the metadata. It doesn't really matter—when do we need to know who reviewed the test? The current model can be misleading, when the test is changed there's still a "reviewed" link, but that person hasn't necessarily reviewed the edited test.

This was always optional and was only used when reviewing tests outside the tooling (or before the tooling existed). Tests can be reviewed and approved directly in Shepherd (and Shepherd approvals are also triggered by reviewer metadata or simply merging a GitHub pull request by those with approval authority).

I was just going by what the documentation says, which says it should be used!
 
d) Specification links I'm kinda unconvinced by, but I don't really care enough to argue over. I know Shepherd uses it.

Shepherd uses it for tracking test to spec mapping, but it’s also required by the build system. This is how the build system determines which tests go in which test suites. The test harness also relies on the spec links to generate the spec annotations (visible in the CSSWG drafts). So this one is most definitely necessary (and in fact is the only metadata our tooling really relies on).

As I said, I don't care to argue over this! I'm surprised we determine test suite membership based on it, though—why don't we just base this on source directory?
 
e) Requirement flags I feel we should really revisit. We want to have enough flags to be able to run the tests, especially in CI. I'm not so interested in running tests through Shepherd's UI, because I simply think it's not valuable—it's almost never done, because it's pointlessly slow. For browsers, we should aim at getting the tests run in CI systems (probably with some way to upload results to Shepherd so we can have the CR-exit-criteria views there), and minor UAs can also likely run the tests in a more efficient way (as you want to determine pass/fail by unique screenshot, not by looking at a thousand tests, all of which say "This text should be green" identically).

So:

* ahem — we should simply just state "the CSS test suite requires the Ahem font to be available" and get rid of this flag
* animated — this is good because it has a real use (excluding tests from automated testing)
* asis — this makes sense with the current build system
* combo — do we actually care? is anyone doing anything with this? In CI systems you likely want to run all the files, combo and not.

This flag was used in the CSS 2.1 test suite to good effect (for historical process reasons I won’t go into here). It may be useful in the future to help meet CR exit criteria for other specs.

I'm not disagreeing with having such patterns—I'm disagreeing with the need for a flag for it!

* dom — sure, I suppose. not very useful for actual browsers, to be fair, so just extra overhead to release tests.
* font — we should simply just state "the CSS test suite requires these fonts to be installed" and get rid of this flag
* history — is there any UA that *doesn't* support session history? Yes, in *theory* one could exist, but if we don't know of one, we shouldn't optimise for it. The cost of metadata is too high (yes, even a flag!).
* HTMLonly — why do we have this rather than just using asis and HTML source files?

This is an indicator to the build system to not generate an XHTML version of the test (as it wouldn’t make sense)

Right, I get that. But isn't this equivalent to having an HTML source file and using the asis flag in it? What's the benefit of it going through the build system?
 
* http — we should just move to using the same mechanism as web-platform-tests for HTTP headers (rather than .htaccess), and then this can statically be determined by whether test.html.headers exists for a given test.html, leaving less metadata.

Sure.

* image — like history, is this not just for a hypothetical UA?
* interact — sure.
* invalid — do we need an actual flag for this? I presume we want this for lint tools, in which case we should probably have a better (generic) way to silence bogus lint rules for a given test.

In addition to the lint tools, the build code might use this (don’t recall if it does at the moment).

The build tools don't.

* may — on the whole reasonable
* namespace — is this not just for a hypothetical UA?
* nonHTML — can we not just use asis?

Like HTMLonly, it prevents the build system from generating an HTML version of the test

See what I said about HTMLonly above.
 
* paged — sure
* scroll — sure
* should — same as may
* speech — um, I guess
* svg — do we still want to treat SVG as something not universal?
* userstyle — sure
* 32bit — is this not just for a hypothetical UA at this point?

Opera cared about this at one point, Presto didn’t use full 32 bit ints for some values.

While I can't actually speak for Opera anymore, I know they're never going to run the test suites against Presto.
 
* 96dpi — is this not required by CSS 2.1 now, and hence redundant? (96px = 1in per CSS 2.1)

I feel like we shouldn't add metadata for hypothetical UAs that may or may not exist in the future. It adds overhead to contributing to the test suite, and we're likely to end up with the flags missing all over the place. (We'll end up with the flags needed for CI (animated, interact, userstyle, paged, speech) missing all over the place, mixed in among flags only needed for existing browsers—which already support all the optional features we have flags for!)

Also, I wonder if we should just merge "animated", "interact", and "userstyle" into one? "userstyle" can probably be justified as CI tools can run those tests in an automated manner by having some metadata within the tool to set the stylesheet (do we want to add a <link rel="user-stylesheet"> so that we can have required user stylesheets in testinfo.data?). "animated" only makes sense to split out from "interact" if anyone is ever going to verify animated content automatically.

Animated and interact should remain separate: an animated test could (at least in theory) be tested automatically. One requiring user interaction would potentially require a different system for automatic testing (and some way of scripting the interaction).

Animated could *in theory* be tested automatically. As it is, I feel like it's there for some theoretical CI system. I think we should worry about such a CI system when anyone has an interest in creating one. I'm not interested in adding metadata (which will end up out of date if it doesn't matter) for theoretical cases which we have no reason to believe will exist in the near future, let alone run the tests.
 
For the sake of CI, we essentially have a few categories of tests:

 * reftests
 * visual tests that can be verified by screenshots
 * testharness.js tests
 * any of the three above that need special setup in the CI tool (a non-standard browser window size, a user stylesheet needing set, potentially paged media though I'm not sure if anyone actually runs paged media tests in CI tools?)
 * manual tests that cannot be run in CI

Really that gives us seven types of tests, six of which can run in CI. The first two above (so four of the six) can be distinguished by the presence of link[@rel='match' or @rel='mismatch']. We could distinguish testharness.js tests (two more of the six) by the presence of script[@src='/resources/testharness.js']. This means the only things we actually need to be able to filter out by explicit metadata are "needs special CI setup" and "entirely manual". We probably want enough granularity in the metadata such that people can trivially get a list of tests that need special CI setup based upon what their CI supports (e.g., if it can set a user stylesheet but can't run paged media).

We should discuss what we actually need as a result of this and get rid of the rest.

f) Test assertions… are we doing anything with this, beyond what we use specification links for? If not, we should stop requiring them.

These are for test reviewers to help understand what the test is trying to do. It’s often not obvious and can be helpful.

In my experience the test assertions don't help much—a few comments placed at the relevant points in the test would be far more helpful for understanding what's going on, IMO.

/gsnedders
 

Re: Simplifying metadata

Peter Linss

On Oct 27, 2015, at 5:50 AM, Geoffrey Sneddon <[hidden email]> wrote:

[All inline.]

On Tue, Oct 27, 2015 at 5:10 PM, Linss, Peter <[hidden email]> wrote:
The only metadata that Shepherd complains about if missing in tests is: title, at least one spec link, and the author. In reference files it only complains about missing author metadata (and will complain about spec links or assertions if present, because they shouldn’t be there).

(and for the record, Shepherd is the test suite manager tool, the test harness is a separate tool, as is the build system, though they do interact)

More comments inline.

On Oct 27, 2015, at 12:31 AM, Geoffrey Sneddon <[hidden email]> wrote:

Yo!

Can we revisit all the metadata we have in the tests? The metadata we *need* is what is sufficient to be able to run the tests, probably within CI systems.

I'm going to go by the assumption (which I think has shown itself to be true on numerous occasions) that the more metadata we require in tests the more hoops people have to jump through to release tests, which discourages submitting tests. And this WG has a real problem with getting good, high-quality test suites such that we're able to advance tests beyond CR.

Disclaimer: this is all based on <http://testthewebforward.org/docs/css-metadata.html>; I'm not totally sure this actually reflects the status-quo of the tests.

a) "CSS 2.1 Reference" as a <title> for potentially hundreds of references is utterly useless—I'd rather do something more descriptive as to what it is. Presto-testo has titles like:

 * Reference rendering - this should be green (green text)
 * Reference rendering - There should be no red below
 * Reference rendering - pass if F in Filler Text is upper-case

This isn't perfect either, but it's more useful than "CSS 2.1 Reference", IMO.

b) We don't need author metadata on any new tests, because that metadata is stored in git/hg. (It's essentially been entirely redundant since we moved away from SVN, as git/hg can store arbitrary authorship data regardless of whether the author has source-tree access.)

The author metadata was originally provided because authors didn’t necessarily have access to the repo and sometimes submitted tests via email or other means. So the first committer was often not the author. I’m fine with only requiring author metadata in that case.

Oh, what I said was based on a mistaken belief about hg. In git there are separate "Author" and "Committer" fields—and the committer can set the Author field to whatever they want, hence in git there's no reason to ever require author metadata in the file. This further makes me feel we should drop hg! As far as I'm aware, the only reason we have the hg/git mirroring is that much of the Shepherd tooling integrates with hg—someone (probably you or me!) should investigate how much work it would be to migrate purely to git. (I heard something about contributors before. Will we actually lose any contributors if we move to git exclusively?)

Both because of the Shepherd tooling and because some contributors prefer hg (and several don’t know git). I can’t speak to how many we’d lose by switching to git exclusively, but I’ve been told by more than one that they don’t want to learn yet another source control system. Whether or not that burden would merely be an inconvenience or would drive some of them away I can’t say, but our contributors are so few and so valuable, that I’d rather not lose a single one.


c) We haven't actively been adding reviewer metadata for quite a while. I suggest if we *really* want reviewer metadata (which I'm not at all sure we do—a single file may be reviewed by multiple people, especially in the testharness.js case), we do it in the commit description (along the lines of Signed-off-by in the Linux repo). On the whole, I suggest we just go by the assumption that anything in the repo has been reviewed (at the current time outwith work-in-progress and vendor-imports), and don't bother storing the metadata. It doesn't really matter—when do we need to know who reviewed the test? The current model can be misleading: when the test is changed there's still a "reviewed" link, but that person hasn't necessarily reviewed the edited test.

This was always optional and was only used when reviewing tests outside the tooling (or before the tooling existed). Tests can be reviewed and approved directly in Shepherd (and Shepherd approvals are also triggered by reviewer metadata or simply merging a GitHub pull request by those with approval authority).

I was just going by what the documentation says, which says it should be used!
 
d) Specification links I'm kinda unconvinced by, but I don't really care enough to argue over. I know Shepherd uses it.

Shepherd uses it for tracking test to spec mapping, but it’s also required by the build system. This is how the build system determines which tests go in which test suites. The test harness also relies on the spec links to generate the spec annotations (visible in the CSSWG drafts). So this one is most definitely necessary (and in fact is the only metadata our tooling really relies on).

As I said, I don't care to argue over this! I'm surprised we determine test suite membership based on it, though—why don't we just base this on source directory?

Because many tests are linked to more than one spec so don’t live in a single test suite. (and I wasn’t arguing, just explaining)

 
e) Requirement flags I feel we should really revisit. We want to have enough flags to be able to run the tests, especially in CI. I'm not so interested in running tests through Shepherd's UI, because I simply think it's not valuable—it's almost never done, because it's pointlessly slow. For browsers, we should aim at getting the tests run in CI systems (probably with some way to upload results to Shepherd so we can have the CR-exit-criteria views there), and minor UAs also likely can run the tests in a more efficient way (as you want to determine pass/fail by unique screenshot, not by looking at a thousand tests all of which say, "This text should be green" identically).

So:

* ahem — we should simply just state "the CSS test suite requires the Ahem font to be available" and get rid of this flag
* animated — this is good because it has a real use (excluding tests from automated testing)
* asis — this makes sense with the current build system
* combo — do we actually care? is anyone doing anything with this? In CI systems you likely want to run all the files, combo and not.

This flag was used in the CSS 2.1 test suite to good effect (for historical process reasons I won’t go into here). It may be useful in the future to help meet CR exit criteria for other specs.

I'm not disagreeing with having such patterns—I'm disagreeing with the need for a flag for it!
* dom — sure, I suppose. not very useful for actual browsers, to be fair, so just extra overhead to release tests.
* font — we should simply just state "the CSS test suite requires these fonts to be installed" and get rid of this flag
* history — is there any UA that *doesn't* support session history? Yes, in *theory* one could exist, but if we don't know of one, we shouldn't optimise for it. The cost of metadata is too high (yes, even a flag!).
* HTMLonly — why do we have this rather than just using asis and HTML source files?

This is an indicator to the build system to not generate an XHTML version of the test (as it wouldn’t make sense)

Right, I get that. But isn't this equivalent to having an HTML source file and using the asis flag in it? What's the benefit of it going through the build system?

All the files go through the build system in one way or another. 

As it’s currently implemented, the HTMLonly/nonHTML flags currently don’t imply asis, they still can get modified by the build system, as in updating paths to reference files, etc, they just avoid the conversion to the other format. The asis flag prevents any modification of the file (and would be used for files with bad markup or other features that would not survive being parsed and re-serialized).

 
* http — we should just move to using the same mechanism as web-platform-tests for HTTP headers (rather than .htaccess), and then this can statically be determined by whether test.html.headers exists for a given test.html, leaving less metadata.

Sure.
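(For reference, the web-platform-tests mechanism works by placing a plain-text file next to the test—e.g. a hypothetical `flexbox-001.html.headers` alongside `flexbox-001.html`—whose lines the wpt server sends as extra response headers for that test, something like:

```text
Content-Type: text/html; charset=utf-8
Cache-Control: no-store
```

So the presence of a `.headers` file is itself the "this test needs HTTP" signal, with no in-file metadata required.)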

* image — like history, is this not just for a hypothetical UA?
* interact — sure.
* invalid — do we need an actual flag for this? I presume we want this for lint tools, in which case we should probably have a better (generic) way to silent bogus lint rules for a given test.

In addition to the lint tools, the build code might use this (don’t recall if it does at the moment).

The build tools don't.
* may — on the whole reasonable
* namespace — is this not just for a hypothetical UA?
* nonHTML — can we not just use asis?

Like HTMLonly, it prevents the build system from generating an HTML version of the test

See what I said about HTMLonly above.
 
* paged — sure
* scroll — sure
* should — same as may
* speech — um, I guess
* svg — do we still want to treat SVG as something not universal?
* userstyle — sure
* 32bit — is this not just for a hypothetical UA at this point?

Opera cared about this at one point, Presto didn’t use full 32 bit ints for some values.

While I can't actually speak for Opera anymore, I know they're never going to run the test suites against Presto.

Sure, just explaining where this came from.

 
* 96dpi — is this not required by CSS 2.1 now, and hence redundant? (96px = 1in per CSS 2.1)

I feel like we shouldn't add metadata for hypothetical UAs that may or may not exist in the future. It adds overhead to contributing to the test suite, and we're likely to end up with the flags missing all over the place. (The flags that end up missing are precisely the ones needed for CI (animated, interact, userstyle, paged, speech), rather than the ones needed to run tests in existing browsers, which support all the optional features we have flags for!)

Also, I wonder if we should just merge "animated", "interact", and "userstyle" into one? "userstyle" can probably be justified as CI tools can run those tests in an automated manner by having some metadata within the tool to set the stylesheet (do we want to add a <link rel="user-stylesheet"> so that we can have required user stylesheets in testinfo.data?). "animated" only makes sense to split out from "interact" if anyone is ever going to verify animated content automatically.

Animated and interact should remain separate, an animated test could (at least in theory) be tested automatically. One requiring user interaction would potentially require a different system for automatic testing (and some way of scripting the interaction).

Animated could *in theory* be tested automatically. As it is, I feel like it's there for some theoretical CI system. I think we should worry about such a CI system when anyone has an interest in creating one. I'm not interested in adding metadata (that will end up out of date if it doesn't matter) for theoretical cases which we have no reason to believe will exist in the near future, let alone run the tests.

But on the other hand, removing existing metadata that may actually be useful in the future is doing work now just to create more potential work later.

I’m all for simplifying the system for new authors, but I don’t see much benefit from dropping flags, there’s not a lot of burden here.

 
For the sake of CI, we essentially have a few categories of tests:

 * reftests
 * visual tests that can be verified by screenshots
 * testharness.js tests
 * any of the three above that need special setup in the CI tool (a non-standard browser window size, a user stylesheet needing to be set, and potentially paged media, though I'm not sure anyone actually runs paged media tests in CI tools)
 * manual tests that cannot be run in CI

Really that gives us seven types of tests, six of which can run in CI. The first two above (so four of the six) can be distinguished by the presence of link[@rel='match' or @rel='mismatch']. We could distinguish testharness.js tests (two more of the six) by the presence of script[@src='/resources/testharness.js']. This means the only things we actually need to be able to filter out by explicit metadata are "needs special CI setup" and "entirely manual". We probably want enough granularity in the metadata such that people can trivially get a list of tests that need special CI setup based upon what their CI supports (e.g., if it can set a user stylesheet but can't run paged media).
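As a rough illustration of that sniffing heuristic (not existing tooling—the file names and patterns here are just a sketch), a CI harness could bucket files like this:

```shell
# Sketch: classify a test file by its markup, per the heuristic above.
# Reftests carry rel="match"/"mismatch" links, testharness.js tests
# include the harness script, everything else needs a screenshot or a
# human. The demo file below is made up.
classify() {
  if grep -Eq 'rel="(match|mismatch)"' "$1"; then
    echo reftest
  elif grep -q 'resources/testharness\.js' "$1"; then
    echo testharness
  else
    echo visual-or-manual
  fi
}

cat > demo-test.html <<'EOF'
<!doctype html>
<title>demo</title>
<link rel="match" href="demo-ref.html">
EOF
classify demo-test.html
```

That prints `reftest` for the demo file; only the "needs special CI setup" and "entirely manual" cases would still need explicit metadata.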

We should discuss what we actually need as a result of this and get rid of the rest.

f) Test assertions… are we doing anything with this, beyond what we use specification links for? If not, we should stop requiring them.

These are for test reviewers to help understand what the test is trying to do. It’s often not obvious and can be helpful.

In my experience I don't find the test assertions help much—a few comments placed at the relevant points in the test would be far more helpful to understand what's going on IMO.

Others have had different experiences and have explicitly asked for assertions to help with reviews. I accept that the same job can be done with comments, but having it in an extractable format is useful too.

Again, we’ve never rejected a test for missing this metadata, it’s at the author’s discretion.


Re: Simplifying metadata

Geoffrey Sneddon-4
FWIW, in the meeting this afternoon that fantasai and I chaired, it became clear that what's holding back several of the browser vendors is the overhead of contributing, and relevant to this thread is that they'd much rather have no metadata in the tests.

On Wed, Oct 28, 2015 at 8:50 AM, Linss, Peter <[hidden email]> wrote:
On Oct 27, 2015, at 5:50 AM, Geoffrey Sneddon <[hidden email]> wrote:

On Tue, Oct 27, 2015 at 5:10 PM, Linss, Peter <[hidden email]> wrote:
On Oct 27, 2015, at 12:31 AM, Geoffrey Sneddon <[hidden email]> wrote:

b) We don't need author metadata on any new tests, because that metadata is stored in git/hg. (It's essentially been entirely redundant since we moved away from SVN, as git/hg can store arbitrary authorship data regardless of whether the author has source-tree access.)

The author metadata was originally provided because authors didn’t necessarily have access to the repo and sometimes submitted tests via email or other means. So the first committer was often not the author. I’m fine with only requiring author metadata in that case.

Oh, what I said was based on a mistaken belief about hg. In git there are separate "Author" and "Committer" fields—and the committer can set the Author field to whatever they want, hence in git there's no reason to ever require author metadata in the file. This further makes me feel we should drop hg! As far as I'm aware the only reason why we have the hg/git mirroring is because much of the Shepherd tooling integrates with hg—someone (probably you or me!) should investigate how much work it would be to migrate purely to git. (I heard something about contributors before. Will we actually lose any contributors if we move to git exclusively?)

Both because of the Shepherd tooling and because some contributors prefer hg (and several don’t know git). I can’t speak to how many we’d lose by switching to git exclusively, but I’ve been told by more than one that they don’t want to learn yet another source control system. Whether or not that burden would merely be an inconvenience or would drive some of them away I can’t say, but our contributors are so few and so valuable, that I’d rather not lose a single one.

The overwhelming majority of CSS test cases are written by browser vendors: we should do what's needed to make it easier for them to upstream their tests. (This isn't really a response to that, but a comment on contributors in general.)
  
d) Specification links I'm kinda unconvinced by, but I don't really care enough to argue over. I know Shepherd uses it.

Shepherd uses it for tracking test to spec mapping, but it’s also required by the build system. This is how the build system determines which tests go in which test suites. The test harness also relies on the spec links to generate the spec annotations (visible in the CSSWG drafts). So this one is most definitely necessary (and in fact is the only metadata our tooling really relies on).

As I said, I don't care to argue over this! I'm surprised we determine test suite membership based on it, though—why don't we just base this on source directory?

Because many tests are linked to more than one spec so don’t live in a single test suite. (and I wasn’t arguing, just explaining)

Having heard what's holding people up from contributing more tests, I want to revisit this. I want to try and get rid of all metadata in the common case. Do you see any way in which we can get rid of this?
 

 
e) Requirement flags I feel we should really revisit. We want to have enough flags to be able to run the tests, especially in CI. I'm not so interested in running tests through Shepherd's UI, because I simply think it's not valuable—it's almost never done, because it's pointlessly slow. For browsers, we should aim at getting the tests run in CI systems (probably with some way to upload results to Shepherd so we can have the CR-exit-criteria views there), and minor UAs also likely can run the tests in a more efficient way (as you want to determine pass/fail by unique screenshot, not by looking at a thousand tests all of which say, "This text should be green" identically).

So:
[…] 
* HTMLonly — why do we have this rather than just using asis and HTML source files?

This is an indicator to the build system to not generate an XHTML version of the test (as it wouldn’t make sense)

Right, I get that. But isn't this equivalent to having an HTML source file and using the asis flag in it? What's the benefit of it going through the build system?

All the files go through the build system in one way or another. 

As it’s currently implemented, the HTMLonly/nonHTML flags currently don’t imply asis, they still can get modified by the build system, as in updating paths to reference files, etc, they just avoid the conversion to the other format. The asis flag prevents any modification of the file (and would be used for files with bad markup or other features that would not survive being parsed and re-serialized).

Ah, I'd forgotten the build system did so much.
 

f) Test assertions… are we doing anything with this, beyond what we use specification links for? If not, we should stop requiring them.

These are for test reviewers to help understand what the test is trying to do. It’s often not obvious and can be helpful.

In my experience I don't find the test assertions help much—a few comments placed at the relevant points in the test would be far more helpful to understand what's going on IMO.

Others have had different experiences and have explicitly asked for assertions to help with reviews. I accept that the same job can be done with comments, but having it in an extractable format is useful too.

So the WebKit guys apparently have a policy of having as little as possible in tests (and of trying to make it obvious what they're testing), which means they often don't have assertions/comments. I'm not sure I really like that, but it's something to consider when it comes to getting them to upstream their tests.
 
Again, we’ve never rejected a test for missing this metadata, it’s at the author’s discretion.

That's not the impression I or many of those representing browser vendors had.

/g

Re: Simplifying metadata

fantasai
In reply to this post by Geoffrey Sneddon-4
On 10/27/2015 09:50 PM, Geoffrey Sneddon wrote:
>
>>     f) Test assertions… are we doing anything with this, beyond what we use specification links for? If not, we should stop
>>     requiring them.
>
>     These are for test reviewers to help understand what the test is trying to do. It’s often not obvious and can be helpful.
>
>
> In my experience I don't find the test assertions help much—a few comments placed at the relevant places in the test would be
> far more helpful to understand what's going on IMO.

This is supposed to be for an overall comment of what is being tested,
and is really important to me as a reviewer when I'm trying to understand
what the test is trying to do. A lot of testers don't write good assertions
(but in some cases the same people write perfectly fine HTML comments that
explain what the test is supposed to be testing), so maybe we want to find a
better way to encourage this information.

Basically, any non-trivial function should have documentation describing
what it does (at a higher fidelity than just the function name), and
exactly the same way any non-trivial test should have documentation
describing what it tests (at a higher fidelity than the filename).

~fantasai


Re: Simplifying metadata

fantasai
In reply to this post by Geoffrey Sneddon-4
On 10/27/2015 04:31 PM, Geoffrey Sneddon wrote:
>
> a) "CSS 2.1 Reference" as a <title> for potentially hundreds of references is utterly useless—I'd rather doing something more
> descriptive as to what it is. Presto-testo has titles like:
>
>   * Reference rendering - this should be green (green text)
>   * Reference rendering - There should be no red below
>   * Reference rendering - pass if F in Filler Text is upper-case
>
> This isn't perfect either, but it's more useful than "CSS 2.1 Reference", IMO.

I don't care what <title> is used on references. :)

> b) We don't need author metadata on any new tests, because that metadata is stored in git/hg. (It's essentially been entirely
> redundant since we moved away from SVN, as git/hg can store arbitrary authorship data regardless of whether the author has
> source-tree access.)

We also use it to generate the acknowledgements on the test suite cover page, btw.

I'd like to get rid of this also, but we should have some way of tracking
this information for copyright and acknowledgement purposes.

> c) We haven't actively been adding reviewer metadata for quite a while. I suggest if we *really* want reviewer metadata (which
> I'm not at all sure we do—a single file may be reviewed by multiple people, especially in the testharness.js case), we do it
> in the commit description (along the lines of Signed-Off-By in the Linux repo). On the whole, I suggest we just go by the
> assumption that anything in the repo has been reviewed (at the current time outwith work-in-progress and vendor-imports), and
> don't bother storing the metadata. It doesn't really matter—when do we need to know who reviewed the test? The current model
> can be misleading, when the test is changed there's still a "reviewed" link, but that person hasn't necessarily reviewed the
> edited test.

I'm fine with dropping this as well.

> d) Specification links I'm kinda unconvinced by, but I don't really care enough to argue over. I know Shepard uses it.

These are, as plinss points out, important to keep.

> e) Requirement flags
>
> * ahem — we should simply just state "the CSS test suite requires the Ahem font to be available" and get rid of this flag

Agree

> * animated — this is good because it has a real use (excluding tests from automated testing)

Agree

> * asis — this makes sense with the current build system

Until we drop the build system, yes. :)

> * combo — do we actually care? is anyone doing anything with this? In CI systems you likely want to run all the files, combo
> and not.

I think we should drop this, it's no longer important imho.

> * dom — sure, I suppose. not very useful for actual browsers, to be fair, so just extra overhead to release tests.

We can key off of <script> and such instead if needed.

> * font — we should simply just state "the CSS test suite requires these fonts to be installed" and get rid of this flag

I'm fine with that.

> * history — is there any UA that *doesn't* support session history? Yes, in *theory* one could exist, but if we don't know of
> one, we shouldn't optimise for it. The cost of metadata is too high (yes, even a flag!).

Yes, PDF renderers don't have history. But we can drop the flag,
I don't think this is important to track -- it's only :visited
that cares afaict.

> * http — we should just move to using the same mechanism as web-platform-tests for HTTP headers (rather than .htaccess), and
> then this can statically be determined by whether test.html.headers exists for a given test.html, leaving less metadata.

I defer to plinss on this.

> * image — like history, is this not just for a hypothetical UA?

Yeah, I think we can get rid of this, too.

> * interact — sure.

Definitely keep.

> * invalid — do we need an actual flag for this? I presume we want this for lint tools, in which case we should probably have a
> better (generic) way to silent bogus lint rules for a given test.

It's both for lint tools, but also so that the test suite can be used by validators.
So I think it's worth keeping.

> * may — on the whole reasonable

Yeah, we need this.

> * namespace — is this not just for a hypothetical UA?

Yeah, we can drop this.

> * paged — sure
> * scroll — sure

These are useful, yes.

> * speech — um, I guess

:)

> * svg — do we still want to treat SVG as something not universal?

No, I don't think we need to do that anymore. It's reasonably easy to detect
if you need that information for some reason.

> * userstyle — sure

> Also, I wonder if we should just merge "animated", "interact", and "userstyle" into one?

I don't think so. I don't think it's easier for people writing tests, either.
It's very easy to determine whether something is animated, requires interaction,
or tests a user style sheet. It's harder to say "it's not supported by our
current CI tools".

Also, I think we should loosen the restrictions on filenames to just
"if there's an index number, it must be zero-filled to 3 digits".

~fantasai


Re: Simplifying metadata

Boris Zbarsky
In reply to this post by Geoffrey Sneddon-4
On 10/27/15 8:50 AM, Geoffrey Sneddon wrote:
> Oh, what I said was based on mistaken belief about hg. In git there are
> separate "Author" and "Committer" fields—and the committer can set the
> Author field to whatever they want

Similar in hg.  Each changeset has a "user" field that identifies the
author of the changeset.  This value persists as the changeset is pushed
across repositories.  In addition, each push to a repository is
associated with whoever did the push.  So whoever does the commit just
needs to set the "user" appropriately and it should be all fine.

> hence in git there's no reason to
> ever require author metadata in the file.

Nor for hg.

-Boris


Re: Simplifying metadata

Gérard Talbot-3
In reply to this post by Geoffrey Sneddon-4
On 2015-10-27 03:31, Geoffrey Sneddon wrote:
> Yo!
>
> Can we revisit all the metadata we have in the tests? The metadata we
> *need* is what is sufficient to be able to run the tests, probably
> within
> CI systems.

What about other needs, like searching for tests? Reviewing tests? Test
coverage?

What is a "CI" system?

Some existing web-aware software with good CSS competence does not
support printing or SVG or some other features (scrollability, 32bit,
XHTML, transitions, etc.). So those flags can be useful.

> I'm going to go by the assumption (which I think has shown itself to be
> true on numerous occasions) that the more metadata we require in tests
> the
> more hoops people have to jump through to release tests, which
> discourages
> submitting tests.

I'd like to believe this discouragement isn't as bad as you believe...

Eight months ago (Feb. 15th, 2015), I asked one major browser manufacturer
why they did not submit tests and never got a response.

> And this WG has a real problem with getting good,
> high-quality test suites such that we're able to advance tests beyond
> CR.

I'm all for a discussion on metadata and documentation ... but what
about bad or wrong tests?
I wish
a) incorrect tests,
b) imprecise tests,
c) tests that cannot fail (unreliable tests, non-trustworthy tests) and
d) tests that do not check what they claim to be checking
would be removed or rehabilitated or dealt with to start with. I've been
asking for this for the last 4 years (June 28th, 2011 and even before
that) and it still has not been dealt with. And I'm not talking about a
few dozen tests here...

> Disclaimer: this is all based on <
> http://testthewebforward.org/docs/css-metadata.html>; I'm not totally
> sure
> this actually reflects the status-quo of the tests.
>
> a) "CSS 2.1 Reference" as a <title> for potentially hundreds of
> references
> is utterly useless—I'd rather doing something more descriptive as to
> what
> it is. Presto-testo has titles like:
>
>  * Reference rendering - this should be green (green text)

Not the shortest IMO but that is another issue.

Isn't that redundant? Test title says: "this should be green (green
text)" and the pass-fail-conditions sentence says the same thing.

Where's the best location to describe the reference rendering? In the
<title> text or in the filename?

>  * Reference rendering - There should be no red below

"below" can safely be dropped IMO.

>  * Reference rendering - pass if F in Filler Text is upper-case

Personally, I think about 30% to 40% of all existing tests could be
re-engineered so that they would be associated with already available,
already created and very frequently reused reference files. When I
create a test, I always try to do this myself. That way,
a) I no longer have to think about creating a reference file,
b) this reduces server load when "doing" a test suite with the test
harness and
c) this reduces the growth of N reference files to be referenced

Examples given:

ref-if-there-is-no-red  : referenced by 290 tests (2 changesets!)
http://test.csswg.org/shepherd/search/reference/name/ref-if-there-is-no-red/
http://test.csswg.org/source/css21/reference/ref-if-there-is-no-red.xht

ref-this-text-should-be-green : referenced by 43 tests
test.csswg.org/shepherd/reference/ref-this-text-should-be-green/
http://test.csswg.org/source/css21/reference/ref-this-text-should-be-green.xht

So, why re-create 2 reference files that already exist?

> This isn't perfect either, but it's more useful than "CSS 2.1
> Reference",
> IMO.
>
> b) We don't need author metadata on any new tests, because that
> metadata is
> stored in git/hg. (It's essentially been entirely redundant since we
> moved
> away from SVN, as git/hg can store arbitrary authorship data regardless
> of
> whether the author has source-tree access.)

There is an owner and an author field in Shepherd...

> c) We haven't actively been adding reviewer metadata for quite a while.
> I
> suggest if we *really* want reviewer metadata (which I'm not at all
> sure we
> do—a single file may be reviewed by multiple people, especially in the
> testharness.js case), we do it in the commit description (along the
> lines
> of Signed-Off-By in the Linux repo). On the whole, I suggest we just go
> by
> the assumption that anything in the repo has been reviewed (at the
> current
> time outwith work-in-progress and vendor-imports),

I strongly disagree. Tests not identified as reviewed should not be
considered reviewed.

You cannot presume or postulate that all test authors have the same
CSS knowledge or depth of CSS experience. Same thing with test reviewers.

> and don't bother storing
> the metadata. It doesn't really matter—when do we need to know who
> reviewed
> the test?

When a test involving a CSS module has been reviewed by one of its CSS
spec editors, such a test becomes more trustworthy. Review of a test
by a competent and independent party is important.

There is such a thing as Quality control, Quality assurance,
de-subjectivizing the evaluation of your own work. It's even a
fundamental principle in justice. I would need more time to explain all
this.

> The current model can be misleading, when the test is changed
> there's still a "reviewed" link, but that person hasn't necessarily
> reviewed the edited test.
>
> d) Specification links I'm kinda unconvinced by, but I don't really
> care
> enough to argue over. I know Shepard uses it.

How well can we know whether a test suite's coverage is good ... if
specification links are not included in the tests?

> e) Requirement flags I feel we should really revisit. We want to have
> enough flags to be able to run the tests, especially in CI. I'm not so
> interested in running tests through Shepard's UI, because I simply
> think
> it's not valuable—it's almost never done, because it's pointlessly
> slow.
> Browsers we should aim at getting them run in CI systems (probably with
> some way to upload results to Shepard so we can have the
> CR-exit-criteria
> views there), and minor UAs also likely can run the tests in a more
> efficient way (as you want to determine pass/fail by unique screenshot,
> not
> looking at a thousand tests all of which say, "This text should be
> green"
> identically).
>
> So:
>
> * ahem — we should simply just state "the CSS test suite requires the
> Ahem
> font to be available" and get rid of this flag
> * animated — this is good because it has a real use (excluding tests
> from
> automated testing)
> * asis — this makes sense with the current build system
> * combo — do we actually care? is anyone doing anything with this? In
> CI
> systems you likely want to run all the files, combo and not.

Personally, I would drop the combo flag. If I recall correctly, this was
introduced in relation to this CSS2.1 test:
http://test.csswg.org/suites/css2.1/20110323/html4/dynamic-top-change-005.htm

> * dom — sure, I suppose. not very useful for actual browsers, to be
> fair,
> so just extra overhead to release tests.
> * font — we should simply just state "the CSS test suite requires these
> fonts to be installed" and get rid of this flag

I disagree. I am convinced more fonts should be added to the special
fonts package for CSS testing: this is already certainly true for CSS
Writing Modes and probably for tests involving text written with various
dominant baselines (hanging, central, etc.).
Not all web-aware software supports embedded fonts, and I believe we
should not rely on embedded fonts anyway.

> * history — is there any UA that *doesn't* support session history?
> Yes, in *theory* one could exist, but if we don't know of one, we
> shouldn't optimise for it. The cost of metadata is too high (yes, even
> a flag!).
> * HTMLonly — why do we have this rather than just using asis and HTML
> source files?
> * http — we should just move to using the same mechanism as
> web-platform-tests for HTTP headers (rather than .htaccess), and then
> this can statically be determined by whether test.html.headers exists
> for a given test.html, leaving less metadata.
> * image — like history, is this not just for a hypothetical UA?
> * interact — sure.
> * invalid — do we need an actual flag for this? I presume we want this
> for lint tools, in which case we should probably have a better
> (generic) way to silence bogus lint rules for a given test.
> * may — on the whole reasonable
> * namespace — is this not just for a hypothetical UA?
> * nonHTML — can we not just use asis?

Yes. nonHTML can be merged with asis.

> * paged — sure
> * scroll — sure
> * should — same as may
> * speech — um, I guess
> * svg — do we still want to treat SVG as something not universal?
> * userstyle — sure
> * 32bit — is this not just for a hypothetical UA at this point?

Some browsers only support 16-bit counter and z-index values (capped
around 65536); some others support 32-bit values. So, the 32bit flag
was introduced for that reason.

http://test.csswg.org/suites/css2.1/latest/html4/chapter-12.html#s12.4

http://test.csswg.org/suites/css2.1/latest/html4/chapter-9.html#s9.9.1


> * 96dpi — is this not required by CSS 2.1 now, and hence redundant?
> (96px = 1in per CSS 2.1)
>
> I feel like we shouldn't add metadata for hypothetical UAs that may or
> may not exist in the future. It adds overhead to contributing to the
> test suite, and we're likely to end up with the flags being missing
> all over the place. (We end up with the flags needed for CI (animated,
> interact, userstyle, paged, speech) missing all over the place, right
> alongside the ones needed to run tests in existing browsers, when
> those browsers support all the optional features we have flags for!)
>
> Also, I wonder if we should just merge "animated", "interact", and
> "userstyle" into one? "userstyle" can probably be justified as CI
> tools can run those tests in an automated manner by having some
> metadata within the tool to set the stylesheet (do we want to add a
> <link rel="user-stylesheet"> so that we can have required user
> stylesheets in testinfo.data?). "animated" only makes sense to split
> out from "interact" if anyone is ever going to verify animated content
> automatically.
>
> For the sake of CI, we essentially have a few categories of tests:
>
>  * reftests
>  * visual tests that can be verified by screenshots
>  * testharness.js tests
>  * any of the three above that need special setup in the CI tool (a
> non-standard browser window size, a user stylesheet needing set,
> potentially paged media, though I'm not sure if anyone actually runs
> paged media tests in CI tools?)
>  * manual tests that cannot be run in CI
>
> Really that gives us seven types of tests, six of which can run in CI.
> The first two above (so four of the six) can be distinguished by the
> presence of link[@rel='match' or @rel='mismatch']. We could
> distinguish testharness.js tests (two more of the six) by the presence
> of script[@src='resources/testharness.js']. This means the only things
> we actually need to be able to filter out by explicit metadata are
> "needs special CI setup" and "entirely manual". We probably want
> enough granularity in the metadata such that people can trivially get
> a list of tests that need special CI setup based upon what their CI
> supports (e.g., if it can set a user stylesheet but can't run paged
> media).
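This markup sniffing is easy to prototype. A rough sketch (an illustration only, not part of any existing tool; it assumes the tests are plain HTML files and uses regexes rather than a real parser):

```python
import re

def classify_test(html: str) -> str:
    """Classify a test by sniffing its markup, per the heuristics above:
    reftests carry link[@rel='match' or @rel='mismatch'], testharness.js
    tests load the testharness.js script, and anything else needs a
    screenshot comparison or a human."""
    if re.search(r"""<link[^>]+rel=["'](?:mis)?match["']""", html):
        return "reftest"
    if re.search(r"""<script[^>]+src=["'][^"']*testharness\.js["']""", html):
        return "testharness"
    return "visual-or-manual"
```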
>
> We should discuss what we actually need as a result of this and get rid
> of
> the rest.
>
> f) Test assertions… are we doing anything with this, beyond what we
> use specification links for? If not, we should stop requiring them.

Test assertions are often difficult for test authors. But they are
valuable for reviewers, for understanding a test (when you revisit one
of your own tests 6 months later, and it happens to be a complex one)
and for test coverage. The need for test assertions would be reduced if
test authors would at least avoid meaningless, unhelpful, unsemantic
id, class and function identifiers and undifferentiated characters in
their tests.

Gérard

>
> Now, that all done—I think I've forgotten some of what I meant to say,
> but
> there's always future emails for that! I hope that sets up some idea of
> what I want to do!
>
> /gsnedders

--
Test Format Guidelines
http://testthewebforward.org/docs/test-format-guidelines.html

Test Style Guidelines
http://testthewebforward.org/docs/test-style-guidelines.html

Test Templates
http://testthewebforward.org/docs/test-templates.html

CSS Naming Guidelines
http://testthewebforward.org/docs/css-naming.html

Test Review Checklist
http://testthewebforward.org/docs/review-checklist.html

CSS Metadata
http://testthewebforward.org/docs/css-metadata.html



Re: Simplifying metadata

Geoffrey Sneddon-4


On Thu, Oct 29, 2015 at 6:09 AM, Gérard Talbot <[hidden email]> wrote:
On 2015-10-27 03:31, Geoffrey Sneddon wrote:
Yo!

Can we revisit all the metadata we have in the tests? The metadata we
*need* is what is sufficient to be able to run the tests, probably within
CI systems.

What about other needs, like searching for tests? reviewing tests? test coverage?

We had some discussion yesterday as to what the priorities for the test suite should be. Those present roughly all agreed:

1) The priority should be to get browsers *regularly* running the tests (because this disproportionately helps with the goal of interoperability).
2) We should make it easier for browser vendors to submit their tests (they have tens of thousands which haven't been submitted, far more than anyone else has).
3) We should ensure the test suite is able to be used for the purpose of CR exit requirements (note these have changed in the 2015 Process document, and given a larger number of tests it will always require a fair amount of analysis of results).
 
What is a "CI" system?

A continuous integration system—where tests are run on every push to the browser's repository. If we want to ensure browsers implement the spec, and don't regress their implementations, we need to ensure the tests are run regularly, which practically means getting them into each browser's CI system.
 
Some existing web-aware software with good CSS competence does not support printing or SVG or some other features (scrollability, 32bit, XHTML, transitions, etc.). So those flags can be useful.

They *can* be. Metadata almost invariably ends up out of date if nobody is actually making use of it, so there's no point in having metadata nobody is making use of.
 
I'm going to go by the assumption (which I think has shown itself to be
true on numerous occasions) that the more metadata we require in tests the
more hoops people have to jump through to release tests, which discourages
submitting tests.

I'd like to believe this discouragement isn't as bad as you believe...

8 months ago (Feb. 15th, 2015), I asked one major browser manufacturer why they did not submit tests and never got a response.

Discussing this at TPAC, *all* the browser vendors said this was *the* reason they didn't submit more tests. It's too much work to add metadata, and it's hard to justify the man-hours to change them to meet the requirements.
 
And this WG has a real problem with getting good,
high-quality test suites such that we're able to advance tests beyond CR.

I'm all for a discussion on metadata and documentation ... but what about bad or wrong tests?
I wish
a) incorrect tests,
b) imprecise tests,
c) tests that cannot fail (unreliable tests, non-trustworthy tests) and
d) tests that do not check what they claim to be checking
would be removed or rehabilitated or dealt with to start with. I've been asking for this for the last 4 years (June 28th, 2011, and even before that) and it still has not been dealt with. And I'm not talking about a few dozen tests here...

a) and b) are quite easily found if browsers are actually running them and are able to contribute fixes back upstream (which is another problem we've had for years). c) and d) are far harder to deal with, because realistically when mistakes slip through review it's hard to find them. Is it worth designing a system to find them? Probably not. If it adds complexity to the common cases then it's not worth it—having them lying around bogusly passing doesn't hurt much.

Disclaimer: this is all based on <
http://testthewebforward.org/docs/css-metadata.html>; I'm not totally sure
this actually reflects the status-quo of the tests.

a) "CSS 2.1 Reference" as a <title> for potentially hundreds of references
is utterly useless—I'd rather doing something more descriptive as to what
it is. Presto-testo has titles like:

 * Reference rendering - this should be green (green text)

Not the shortest IMO but that is another issue.

Isn't that redundant? Test title says: "this should be green (green text)" and the pass-fail-conditions sentence says the same thing.

Where's the best location to describe the reference rendering? In the <title> text or in the filename?

Really the filename is far more important, IMO. I agree with fantasai that titles of references really aren't very important.
 
 * Reference rendering - There should be no red below

"below" can safely be dropped IMO.

 * Reference rendering - pass if F in Filler Text is upper-case

Personally, I think about 30% to 40% of all existing tests could be re-engineered so that they would be associated with already available, already created and very frequently reused reference files. When I create a test, I always try to do this myself. That way,
a) I no longer have to think about creating a reference file,
b) this reduces server load when "doing" a test suite with the test harness and
c) this reduces the growth of N reference files to be referenced

Examples given:

ref-if-there-is-no-red  : referenced by 290 tests (2 changesets!)
http://test.csswg.org/shepherd/search/reference/name/ref-if-there-is-no-red/
http://test.csswg.org/source/css21/reference/ref-if-there-is-no-red.xht

ref-this-text-should-be-green : referenced by 43 tests
test.csswg.org/shepherd/reference/ref-this-text-should-be-green/
http://test.csswg.org/source/css21/reference/ref-this-text-should-be-green.xht

So, why re-create 2 reference files that already exist?

Because if there's 300 tests with "there should be no red below" and 200 with "there should be no red", it can easily be quite hard to remove the word "below", because removing the word can cause later content to reflow and the test to then fail. In the first instance, we should just try to automate all tests *as is*, rather than modifying them, as this is less work. Longer term we can try and reduce the number of references—but having more references isn't really that much of a problem. (Server load isn't really an issue, because the number of requests on test.csswg.org isn't that great—browsers run local copies, and normally only load each render *once* for the whole testsuite.)
 
This isn't perfect either, but it's more useful than "CSS 2.1 Reference",
IMO.

b) We don't need author metadata on any new tests, because that metadata is
stored in git/hg. (It's essentially been entirely redundant since we moved
away from SVN, as git/hg can store arbitrary authorship data regardless of
whether the author has source-tree access.)

There is an owner and an author field in Shepherd...

This just comes from the <link>, as far as I know. It wouldn't be hard to modify Shepherd to also get it from hg.
 
c) We haven't actively been adding reviewer metadata for quite a while. I
suggest if we *really* want reviewer metadata (which I'm not at all sure we
do—a single file may be reviewed by multiple people, especially in the
testharness.js case), we do it in the commit description (along the lines
of Signed-Off-By in the Linux repo). On the whole, I suggest we just go by
the assumption that anything in the repo has been reviewed (at the current
time outwith work-in-progress and vendor-imports),

I strongly disagree. Tests not identified as reviewed should not be considered reviewed.

You can not presume or postulate that any/all test authors have the same CSS knowledge, depth of CSS experience. Same thing with test reviewers.

While true, the policies used in web-platform-tests have, as far as one can tell, worked every bit as well as the CSS policies—by default, anyone can review tests (and it's just done on the trust that people will only review tests they are competent to review), and those with write access are meant to quickly check there's nothing crazy before reviewing. We've not had *any* problems with this policy.

As pointed out, we don't currently require any metadata saying a test has been reviewed. What is the benefit of requiring the metadata? I don't see what the benefit of having something saying it is compared with just relying on everything in the repository being reviewed.
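If reviewer metadata did move into commit descriptions, extracting it later would be mechanical. A minimal sketch (the `Reviewed-by:` trailer name is a hypothetical choice, mirroring the Signed-off-by convention mentioned above):

```python
def reviewers_from_commit(message: str) -> list[str]:
    """Collect Reviewed-by trailers from a commit message, in the
    style of the Linux kernel's Signed-off-by lines."""
    return [
        line.split(":", 1)[1].strip()
        for line in message.splitlines()
        if line.lower().startswith("reviewed-by:")
    ]
```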
 
and don't bother storing
the metadata. It doesn't really matter—when do we need to know who reviewed
the test?

When a test involving a CSS module has been reviewed by one of its CSS spec editors, then such test becomes more trustworthy. Reviewing a test by a competent and independent party is important.

But they *don't* review tests, and we aren't going to get them to (because we haven't managed to for years). If we want to require spec editors to review tests for their specs, we simply won't have a test suite, and that's useless. It'd be better to have a larger test suite, because at the end of the day we can cope with a few bad tests.
 
There is such a thing as Quality control, Quality assurance, de-subjectivizing the evaluation of your own work. It's even a fundamental principle in justice. I would need more time to explain all this.

The experience in web-platform-tests (and this echoes my experiences around several browser vendors too) is that the real way you get good feedback on a test is someone looking into why it fails: reviews (even from good reviewers!) frequently miss things.
 
The current model can be misleading, when the test is changed
there's still a "reviewed" link, but that person hasn't necessarily
reviewed the edited test.

d) Specification links I'm kinda unconvinced by, but I don't really care
enough to argue over. I know Shepherd uses it.

How well can we know if a test suite coverage is good ... if specification links are not edited in tests?

The other thread is discussing this; I'll leave that to the other thread.
 
e) Requirement flags I feel we should really revisit. We want to have
enough flags to be able to run the tests, especially in CI. I'm not so
interested in running tests through Shepherd's UI, because I simply think
it's not valuable—it's almost never done, because it's pointlessly slow.
For browsers, we should aim at getting the tests run in CI systems
(probably with some way to upload results to Shepherd so we can have the
CR-exit-criteria views there), and minor UAs also likely can run the
tests in a more efficient way (as you want to determine pass/fail by
unique screenshot, not by looking at a thousand tests all of which say,
"This text should be green" identically).

So:

* ahem — we should simply just state "the CSS test suite requires the Ahem
font to be available" and get rid of this flag
* animated — this is good because it has a real use (excluding tests from
automated testing)
* asis — this makes sense with the current build system
* combo — do we actually care? is anyone doing anything with this? In CI
systems you likely want to run all the files, combo and not.

Personally, I would drop the combo flag. If I recall correctly, this was introduced in relation to this CSS2.1 test:
http://test.csswg.org/suites/css2.1/20110323/html4/dynamic-top-change-005.htm

* dom — sure, I suppose. not very useful for actual browsers, to be fair,
so just extra overhead to release tests.
* font — we should simply just state "the CSS test suite requires these
fonts to be installed" and get rid of this flag

I disagree. I am convinced more fonts should be added to the special fonts package for CSS testing: this is already certainly true for CSS Writing Modes, and probably for tests involving text with various dominant baselines (hanging, central, etc.).
Not every web-aware software supports embedded fonts, and I believe we should not rely on embedded fonts anyway.

You're agreeing with me—that's what I meant. (That we just have a package of fonts and we require it be installed for *all* tests instead of *some* tests.)
 
* history — is there any UA that *doesn't* support session history? Yes, in
*theory* one could exist, but if we don't know of one, we shouldn't
optimise for it. The cost of metadata is too high (yes, even a flag!).
* HTMLonly — why do we have this rather than just using asis and HTML
source files?
* http — we should just move to using the same mechanism as
web-platform-tests for HTTP headers (rather than .htaccess), and then this
can statically be determined by whether test.html.headers exists for a
given test.html, leaving less metadata.
* image — like history, is this not just for a hypothetical UA?
* interact — sure.
* invalid — do we need an actual flag for this? I presume we want this for
lint tools, in which case we should probably have a better (generic) way to
silence bogus lint rules for a given test.
* may — on the whole reasonable
* namespace — is this not just for a hypothetical UA?
* nonHTML — can we not just use asis?

Yes. nonHTML can be merged with asis.

* paged — sure
* scroll — sure
* should — same as may
* speech — um, I guess
* svg — do we still want to treat SVG as something not universal?
* userstyle — sure
* 32bit — is this not just for a hypothetical UA at this point?

Some browsers only support 16-bit counter and z-index values (capped around 65536); some others support 32-bit values. So, the 32bit flag was introduced for that reason.

http://test.csswg.org/suites/css2.1/latest/html4/chapter-12.html#s12.4

http://test.csswg.org/suites/css2.1/latest/html4/chapter-9.html#s9.9.1


I'm not aware of any actively developed browser that uses anything but 32-bit values (and the web, for better or for worse, relies on this behaviour, such that it seems likely that any future web browser will follow this).
 
* 96dpi — is this not required by CSS 2.1 now, and hence redundant? (96px =
1in per CSS 2.1)

I feel like we shouldn't add metadata for hypothetical UAs that may or may
not exist in the future. It adds overhead to contributing to the test
suite, and we're likely to end up with the flags being missing all over
the place. (We end up with the flags needed for CI (animated, interact,
userstyle, paged, speech) missing all over the place, right alongside the
ones needed to run tests in existing browsers, when those browsers support
all the optional features we have flags for!)

Also, I wonder if we should just merge "animated", "interact", and
"userstyle" into one? "userstyle" can probably be justified as CI tools can
run those tests in an automated manner by having some metadata within the
tool to set the stylesheet (do we want to add a <link
rel="user-stylesheet"> so that we can have required user stylesheets in
testinfo.data?). "animated" only makes sense to split out from "interact"
if anyone is ever going to verify animated content automatically.

For the sake of CI, we essentially have a few categories of tests:

 * reftests
 * visual tests that can be verified by screenshots
 * testharness.js tests
 * any of the three above that need special setup in the CI tool (a
non-standard browser window size, a user stylesheet needing set,
potentially paged media though I'm not sure if anyone actually runs paged
media tests in CI tools?)
 * manual tests that cannot be run in CI

Really that gives us seven types of tests, six of which can run in CI. The
first two above (so four of the six) can be distinguished by the presence
of link[@rel='match' or @rel='mismatch']. We could distinguish
testharness.js tests (two more of the six) by the presence of
script[@src='resources/testharness.js']. This means the only things we
actually need to be able to filter out by explicit metadata are "needs
special CI setup" and "entirely manual". We probably want enough
granularity in the metadata such that people can trivially get a list of
tests that need special CI setup based upon what their CI supports (e.g.,
if it can set a user stylesheet but can't run paged media).

We should discuss what we actually need as a result of this and get rid of
the rest.
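As a concrete note on the http flag above: under the web-platform-tests convention, whether a test needs custom HTTP serving is statically determinable from a sidecar file next to it. A minimal sketch (illustrative only):

```python
import pathlib

def needs_custom_headers(test_path: str) -> bool:
    """True iff a .headers sidecar sits next to the test, i.e.
    test.html.headers next to test.html (the web-platform-tests
    convention for per-file HTTP headers)."""
    return pathlib.Path(test_path + ".headers").exists()
```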

f) Test assertions… are we doing anything with this, beyond what we use
specification links for? If not, we should stop requiring them.

Test assertions are often difficult for test authors. But they are valuable for reviewers, for understanding a test (when you revisit one of your own tests 6 months later, and it happens to be a complex one) and for test coverage. The need for test assertions would be reduced if test authors would at least avoid meaningless, unhelpful, unsemantic id, class and function identifiers and undifferentiated characters in their tests.

Requiring actual assertions will practically guarantee we don't get tests from browser vendors. Yes, I think nobody is denying that in an ideal world in an ideal test suite we'd have clearer, better commented tests. To give a somewhat known quote: "Give them the third best to go on with; the second best comes too late, the best never comes". We're better off working towards what we can actually achieve, rather than a hypothetical better test suite that may well never happen.

/Geoffrey

Re: Simplifying metadata

Florian Rivoal-4
In reply to this post by fantasai

> On 28 Oct 2015, at 17:27, fantasai <[hidden email]> wrote:
>
>> * dom — sure, I suppose. not very useful for actual browsers, to be fair, so just extra overhead to release tests.
>
> We can key off of <script> and such instead if needed.

I've written tests that use <script> but intentionally don't use the "dom" flag. These were manual tests where the script reduced the amount of manual work required to execute the test, but was optional and the test was still valid if scripts were not supported.

Then again, this is sufficiently corner case that I wouldn't object to losing this bit of expressivity, even if I would miss it.

 - Florian



Re: Simplifying metadata

Geoffrey Sneddon-4


On Thu, Oct 29, 2015 at 4:38 PM, Florian Rivoal <[hidden email]> wrote:

> On 28 Oct 2015, at 17:27, fantasai <[hidden email]> wrote:
>
>> * dom — sure, I suppose. not very useful for actual browsers, to be fair, so just extra overhead to release tests.
>
> We can key off of <script> and such instead if needed.

I've written tests that use <script> but intentionally don't use the "dom" flag. These were manual tests where the script reduced the amount of manual work required to execute the test, but was optional and the test was still valid if scripts were not supported.

Then again, this is sufficiently corner case that I wouldn't object to losing this bit of expressivity, even if I would miss it.
 
As soon as a test is manual I don't think we really need much more metadata. It's just not worth it, because the runner can always read it. If we want to distinguish optional scripts, we should just add some flag for tests with scripts that don't *require* scripting—optimise for the common case, make explicit the rare case.

/gsnedders

Re: Simplifying metadata

Gérard Talbot-3
In reply to this post by Geoffrey Sneddon-4
On 2015-10-29 02:05, Geoffrey Sneddon wrote:
> On Thu, Oct 29, 2015 at 6:09 AM, Gérard Talbot
> <[hidden email]>
> wrote:
>
>> On 2015-10-27 03:31, Geoffrey Sneddon wrote:

[snipped]

>> I'm all for a discussion on metadata and documentation ... but what
>> about
>> bad or wrong tests?
>> I wish
>> a) incorrect tests,
>> b) imprecise tests,
>> c) tests that can not fail (unreliable tests, non-trustworthy tests)
>> and
>> d) tests that do not check what they claim to be checking
>> would be removed or rehabilitated or dealt with to start with. I've
>> been
>> asking for this in the last 4 years (june 28th 2011 and even before
>> that)
>> and it still has not been dealt with. And I'm not talking about a few
>> dozen
>> tests here...
>
>
> a) and b) are quite easily found if browsers are actually running them

quite easily found? I have doubts...

> and
> are able to contribute fixes back upstream (which is another problem
> we've
> had for years).

[snipped]

For CSS2.1 tests:

a) 65 CSS2.1 tests with Whiteboard NeedsWork=Incorrect
http://test.csswg.org/shepherd/search/testcase/spec/css21/status/issue/whiteboard/Incorrect/

b) 74 CSS2.1 tests with Whiteboard NeedsWork=Precision
http://test.csswg.org/shepherd/search/testcase/spec/css21/status/issue/whiteboard/Precision/


>> Personally, I think about 30% to 40% of all existing tests could be
>> re-engineered so that they would be associated with already available,
>> already created and very frequently reused reference files. When I
>> create a
>> test, I always try to do this myself. That way,
>> a) I no longer have to think about creating a reference file,
>> b) this reduces server load when "doing" a test suite with the test
>> harness and
>> c) this reduces the growth of N reference files to be referenced
>>
>> Examples given:
>>
>> ref-if-there-is-no-red  : referenced by 290 tests (2 changesets!)
>>
>> http://test.csswg.org/shepherd/search/reference/name/ref-if-there-is-no-red/
>> http://test.csswg.org/source/css21/reference/ref-if-there-is-no-red.xht
>>
>> ref-this-text-should-be-green : referenced by 43 tests
>> test.csswg.org/shepherd/reference/ref-this-text-should-be-green/
>>
>> http://test.csswg.org/source/css21/reference/ref-this-text-should-be-green.xht
>>
>> So, why re-create 2 reference files that already exist?
>
>
> Because if there's 300 tests with "there should be no red below" and
> 200
> with "there should be no red", it can easily be quite hard to remove
> the
> word "below", because removing the word can cause later content to
> reflow
> and the test to then fail.

[snipped]

Geoffrey, I tried to understand what you're saying and just could not.
[Addendum: after more thinking, now I remember 1 test where what you
described could *maybe* happen.]

Eg.

37 tests in
http://test.csswg.org/source/css-conditional-3/
are using the reference file
http://test.csswg.org/source/css-conditional-3/at-supports-001-ref.html
when it would be *_very easy_* to adapt those 37 tests to link to
http://test.csswg.org/source/css21/reference/ref-filled-green-100px-square.xht

I probably could do this *_in less than_* 10 min. thanks to advanced
search and replace.
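That kind of bulk edit is indeed mechanical. A rough sketch of such a rewrite (illustrative only; it assumes the tests reference the old file via rel='match' links, and the file names shown are just examples):

```python
import pathlib
import re

def retarget_references(tests_dir: pathlib.Path, old_ref: str, new_ref: str) -> int:
    """Point every rel='match' link that targets old_ref at new_ref
    instead. Returns the number of files changed."""
    pattern = re.compile(
        r"""(<link[^>]+rel=["']match["'][^>]+href=["'])"""
        + re.escape(old_ref)
        + r"""(["'])"""
    )
    changed = 0
    for test in tests_dir.glob("*.html"):
        html = test.read_text(encoding="utf-8")
        new_html = pattern.sub(r"\g<1>" + new_ref + r"\g<2>", html)
        if new_html != html:
            test.write_text(new_html, encoding="utf-8")
            changed += 1
    return changed
```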

The thing is: the current (and past) documentation does not encourage
test authors to reuse already created and available reference files.

Gérard



Re: Simplifying metadata

Ms2ger

Hi Gérard,

On 10/29/2015 08:02 PM, Gérard Talbot wrote:

> On 2015-10-29 02:05, Geoffrey Sneddon wrote:
>> Because if there's 300 tests with "there should be no red below"
>> and 200 with "there should be no red", it can easily be quite
>> hard to remove the word "below", because removing the word can
>> cause later content to reflow and the test to then fail.
>
> [snipped]
>
> Geoffrey, I tried to understand what you're saying and just could
> not. [Addendum: after more thinking, now I remember 1 test where
> what you described could *maybe* happen.]
>
> Eg.
>
> 37 tests in http://test.csswg.org/source/css-conditional-3/ are
> using, associating with the reference file
> http://test.csswg.org/source/css-conditional-3/at-supports-001-ref.html
> when it would be *_very easy_* to adapt those 37 tests to use, to
> link to
> http://test.csswg.org/source/css21/reference/ref-filled-green-100px-square.xht
> I probably could do this *_in less than_* 10 min. thanks to
> advanced search and replace.

Spending on the order of ten minutes on a test might not sound like a
lot, but in the context of the css21 test suite, it can translate to
up to €100,000 if we have to pay someone to do it. Unless you want to
put that kind of money on the table, I would prefer we focused on
things that could have a meaningful impact on interoperability, such
as writing references for tests that are currently manual, even if
that means that we have to render a hundred more references.

HTH
Ms2ger