Shepherd

7 messages

Shepherd

Anne van Kesteren-4
It's my understanding Shepherd is a tool used for generating
cross-specification cross-references. I also learned it's not public.
Neither the database nor the tool itself. This makes it hard to
propose changes and evaluate where certain things go wrong when
writing specifications.

Would it be possible to open this up?

https://github.com/whatwg/xref is what I used to use, but it's mostly
driven by Anolis which is in decline.


--
https://annevankesteren.nl/


Re: Shepherd

Peter Linss
On Aug 19, 2015, at 3:31 AM, Anne van Kesteren <[hidden email]> wrote:

> It's my understanding Shepherd is a tool used for generating
> cross-specification cross-references. I also learned it's not public.
> Neither the database nor the tool itself. This makes it hard to
> propose changes and evaluate where certain things go wrong when
> writing specifications.
>
> Would it be possible to open this up?
>
> https://github.com/whatwg/xref is what I used to use, but it's mostly
> driven by Anolis which is in decline.

Hi Anne,

First, I don't know who told you Shepherd isn't public; it's been open source since day one. The source is available at:
http://hg.csswg.org/dev/shepherd/

Second, Shepherd isn't the cross-spec cross-reference tool; it's a management system for the CSSWG test suite repository. In addition to being an issue tracker, it validates test sources, has a comprehensive test search system (based on the test metadata), and manages the GitHub <-> Mercurial synchronization of the repo.

Shepherd does have a specification parser and spec DB that it uses to associate test links with specs, but that has long been factored out into a separate module that's used by several systems on the csswg.org server, including the test harness, the Bikeshed online service, and the draft server.

Bikeshed uses the Shepherd API to fetch the specification DB when it generates the cross-spec cross-references.

The source for the specification module is online here:
http://hg.csswg.org/dev/specification/

In particular the specification parser has been factored out to not have any DB dependencies and is here:
http://hg.csswg.org/dev/specification/file/tip/python/specification/specificationparser.py

And all the rest of the tools and modules from csswg.org are here:
http://hg.csswg.org/dev/

The specification DB is fully available via an HTTP/JSON API at any of the following URLs:
https://api.csswg.org/shepherd/spec/
https://api.csswg.org/bikeshed/api/spec/
https://drafts.csswg.org/api/spec/
https://test.csswg.org/harness/api/spec/

(Calling it without any arguments just lists the available specs; by passing arguments you can get detailed information on all anchors in the specs.)
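A minimal sketch of calling the spec endpoint with only Python's standard library. The `spec` query argument in the commented example is a hypothetical name used for illustration; the real argument names are listed in the API's own auto-generated documentation. Any of the four endpoints above should work as the base URL.

```python
import json
import urllib.parse
import urllib.request

# Any of the endpoints listed above is assumed to serve the same spec DB.
SPEC_API = "https://api.csswg.org/shepherd/spec/"

def spec_api_url(base: str = SPEC_API, **args: str) -> str:
    """No args: the spec listing. With args: more detailed anchor data."""
    if not args:
        return base
    return f"{base}?{urllib.parse.urlencode(args)}"

def fetch_json(url: str):
    """GET `url`, asking for JSON, and decode the response body."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)

# Hypothetical argument name, for illustration only:
# anchors = fetch_json(spec_api_url(spec="css-color-3"))
```

`fetch_json(spec_api_url())` would return the spec listing; the shape of that response isn't described here, so check it before relying on any particular field.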

Some (auto-generated though sparse) documentation on the API is available by just pointing a browser at:
https://api.csswg.org/shepherd/
It at least explains the available arguments and returned data formats.

(that URL uses content negotiation, so requesting JSON will give you a JSON home page for the API surface, see:
http://tools.ietf.org/html/draft-nottingham-json-home-03 )
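Because the root URL is content-negotiated, the same address serves HTML to a browser and a JSON home document to a client that asks for one. A minimal sketch: the `application/json-home` media type comes from the json-home draft linked above, and listing plain `application/json` as a fallback is an assumption based on the note that requesting JSON is enough.

```python
import json
import urllib.request

API_ROOT = "https://api.csswg.org/shepherd/"

def api_root_request(want_json: bool, url: str = API_ROOT) -> urllib.request.Request:
    """Same URL either way; only the Accept header differs."""
    if want_json:
        # Prefer the json-home media type, fall back to plain JSON.
        accept = "application/json-home, application/json;q=0.9"
    else:
        accept = "text/html"
    return urllib.request.Request(url, headers={"Accept": accept})

def fetch_home(url: str = API_ROOT) -> dict:
    """Fetch the JSON home document describing the API surface."""
    with urllib.request.urlopen(api_root_request(True, url), timeout=30) as response:
        return json.load(response)
```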

Note that the DB is updated every day at midnight (Pacific) and every time there's a push to the CSSWG, FXTF, or Houdini draft repositories. The DB currently contains information on all the CSSWG specs as well as a handful of specs that the CSSWG specs refer to, such as HTML5, SVG, DOM, WebIDL, etc. Adding more specs is trivial; just let me know if there are any others you need.

There's also a Python API client library that simplifies calling the above APIs. It goes through the JSON home page, which allows server-side API changes without breaking clients. It's at:
http://hg.csswg.org/dev/apiclient/
or
https://github.com/plinss/apiclient

Bikeshed uses this API client when fetching the specification DB.
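A client that goes through the home document survives server-side changes because it looks up each resource's URL by relation name instead of hard-coding paths. A rough sketch of that lookup (not the apiclient library's actual code), assuming the json-home draft's `resources` object whose entries carry either `href` or `href-template`. It expands only simple `{var}` template variables, and the relation names and paths in the test data are made up.

```python
import re
import urllib.parse

# Matches only simple RFC 6570 level-1 variables like {name}; other
# template operators ({?x}, {/x}, ...) are left untouched by this sketch.
_SIMPLE_VAR = re.compile(r"\{([A-Za-z0-9_]+)\}")

def resolve(home: dict, base: str, rel: str, **variables: str) -> str:
    """Return the absolute URL for link relation `rel` in a JSON home doc."""
    resource = home["resources"][rel]
    if "href" in resource:
        return urllib.parse.urljoin(base, resource["href"])

    def expand(match: re.Match) -> str:
        # Percent-encode each value; a missing variable raises KeyError.
        return urllib.parse.quote(variables[match.group(1)], safe="")

    path = _SIMPLE_VAR.sub(expand, resource["href-template"])
    return urllib.parse.urljoin(base, path)
```

If the server later moves a resource, only the home document changes; code calling `resolve` by relation name picks up the new path without modification.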


FWIW, there's also an API there which reads W3C's /TR RDF file and produces JSON output at:
https://api.csswg.org/shepherd/tr/
(and the other API endpoints from above)


Re: Shepherd

Tab Atkins Jr.
On Wed, Aug 19, 2015 at 12:09 PM, Linss, Peter <[hidden email]> wrote:
> first I don't know who told you Shepherd isn't public, as it's been open source since day one, the source is available at:
> http://hg.csswg.org/dev/shepherd/

Sorry about that; I said that because I couldn't find it on your GitHub. ^_^

~TJ


Re: Shepherd

Anne van Kesteren-4
In reply to this post by Peter Linss
Peter, thank you for all this detailed information.

On Wed, Aug 19, 2015 at 9:09 PM, Linss, Peter <[hidden email]> wrote:
> (calling it without any arguments just lists the specs available, by passing args you can get detailed information on all anchors in the specs.)

I see.

Now that it's clear that everything is available publicly, I guess what
I'm mostly looking for is some way to report bugs. The nice thing
about https://github.com/tobie/specref is that the database is in
GitHub and you can inspect it and report issues against it. Even
though the database is fully automated, that still makes it easier to
inspect things manually.

E.g., it's unclear to me whether Shepherd indexes all WHATWG Standards
or whether some cannot be indexed currently. It's unclear what
shortname they've been assigned and why e.g., "dom" doesn't get HTTPS
URLs.

Having it all somewhere as a dump on GitHub or some such would make
that a bit more visible.


--
https://annevankesteren.nl/


Re: Shepherd

Peter Linss

On Aug 19, 2015, at 11:01 PM, Anne van Kesteren <[hidden email]> wrote:

> Peter, thank you for all this detailed information.
>
> On Wed, Aug 19, 2015 at 9:09 PM, Linss, Peter <[hidden email]> wrote:
>> (calling it without any arguments just lists the specs available, by passing args you can get detailed information on all anchors in the specs.)
>
> I see.
>
> Now that it's clear that everything is available publicly, I guess what
> I'm mostly looking for is some way to report bugs.

I did set up a bug tracker years ago but never really published it. FWIW, it's at:
I haven’t even looked at it myself in quite a while… not sure all the issues there are still current or relevant.

These days it’s best to just ping me (or Tab) via email or IRC. Relatively simple things get fixed pretty quickly.

Tab has admin access to the spec db as well for adding new specs or updating URLs and such.

> The nice thing
> about https://github.com/tobie/specref is that the database is in
> GitHub and you can inspect it and report issues against it. Even
> though the database is fully automated, that still makes it easier to
> inspect things manually.

Well, the spec DB contains all the anchors and a bunch of metadata about each (like what each <dfn> defines). The anchor table in MySQL is currently at 50 MiB and gets updated several times per day. I could set up a dump to GitHub, but it's going to get large… not sure that'll help sift through it.

If you just want a list of the indexed specs (and their metadata), that would be considerably smaller, but there are a bunch of tools to help visualize the JSON dump from the API...


> E.g., it's unclear to me whether Shepherd indexes all WHATWG Standards
> or whether some cannot be indexed currently.

The list of specs that it indexes is entered manually; if we're not indexing all of the WHATWG specs, it's just because we haven't added them yet. I'm happy to add any that are missing. There shouldn't be any reason why they can't be indexed (it also handles multipage specs fine). At a minimum it'll find all the anchors and section headings and parse the WebIDL. It may or may not identify the types of all the <dfn> anchors, depending on how they're marked up (it understands Bikeshed markup just fine and has some heuristics for older specs like CSS2.1, SVG, and HTML5).

The one thing it won't parse is ReSpec source, so for ReSpec specs it needs the URL of the ReSpec output (the parser is written in Python and can't run the client-side JS).
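To illustrate the kind of static extraction described above, here is a small sketch using Python's built-in `html.parser`. It is not Shepherd's parser: it just records every `id` and every `<dfn>` with its text and, if present, a `data-dfn-type` attribute (assumed here to be how Bikeshed output marks definition types). Because it only sees the markup it is given, it also shows why ReSpec source, whose final markup is produced by JavaScript, has to be fed in as generated output.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect every id, plus each <dfn>'s id, data-dfn-type and text."""

    def __init__(self):
        super().__init__()
        self.ids = []    # every id attribute, in document order
        self.dfns = []   # (id, dfn type or None, text) per <dfn>
        self._dfn = None # [id, type, text chunks] while inside a <dfn>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("id"):
            self.ids.append(a["id"])
        if tag == "dfn":
            self._dfn = [a.get("id"), a.get("data-dfn-type"), []]

    def handle_data(self, data):
        if self._dfn is not None:
            self._dfn[2].append(data)

    def handle_endtag(self, tag):
        if tag == "dfn" and self._dfn is not None:
            anchor, kind, chunks = self._dfn
            self.dfns.append((anchor, kind, "".join(chunks).strip()))
            self._dfn = None
```

Usage: create an `AnchorCollector`, call `feed()` with the spec's HTML and then `close()`, and read `ids` and `dfns`.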

> It's unclear what
> shortname they've been assigned

For W3C specs we use the /TR shortname; for WHATWG specs we just picked something. Happy to change them, or have you tell us what you want there.

> and why e.g., "dom" doesn't get HTTPS
> URLs.

I believe it does now (we just changed it yesterday).


> Having it all somewhere as a dump on GitHub or some such would make
> that a bit more visible.

We are planning on making a general index page at some point in the not too distant future. Something that will use the anchor DB to create a master index of all the CSS properties, HTML & SVG elements and attributes, and all the IDL constructs, linking back to the specs where they’re all defined. Having that will definitely put any omissions or errors right out front as well.
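A master index like this is essentially a grouping of anchor records by type and then by term. A minimal sketch, assuming each anchor from the DB can be reduced to a hypothetical `Anchor(term, kind, spec, url)` record; the sample records in the test are invented for illustration. Keeping every definition per term, rather than just one, is what would make duplicates or gaps visible.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

class Anchor(NamedTuple):
    term: str   # e.g. "color", "Node"
    kind: str   # e.g. "property", "element", "interface"
    spec: str   # spec shortname
    url: str    # link back to the definition

Index = Dict[str, Dict[str, List[Tuple[str, str]]]]

def build_index(anchors: Iterable[Anchor]) -> Index:
    """Group anchors by kind, then by term, keeping every defining spec."""
    grouped = defaultdict(lambda: defaultdict(list))
    for a in anchors:
        grouped[a.kind][a.term].append((a.spec, a.url))
    # Sort kinds and terms so the index reads alphabetically.
    return {
        kind: {term: defs for term, defs in sorted(terms.items())}
        for kind, terms in sorted(grouped.items())
    }
```

A term defined in two specs (like `color` in the test data) keeps both entries, which is the kind of conflict such an index would put "right out front".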


Re: Shepherd

Anne van Kesteren-4
On Thu, Aug 20, 2015 at 9:06 AM, Linss, Peter <[hidden email]> wrote:
> These days it’s best to just ping me (or Tab) via email or IRC. Relatively
> simple things get fixed pretty quickly.

Thank you, will do.


> Tab has admin access to the spec db as well for adding new specs or updating
> URLs and such.

For WHATWG, we publish https://resources.whatwg.org/biblio.json these
days. I just realized it's missing an entry, but if you scrape that
and go from there, you'll find all our specifications eventually.
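A sketch of the "scrape biblio.json and go from there" approach, assuming the file is a JSON object whose values are entries with an `href` member (check the live file before relying on that shape). It keeps only entries hosted under `*.spec.whatwg.org` and normalizes each to the HTTPS root of that host, so multipage, fragment and plain-HTTP URLs collapse onto one spec.

```python
import json
import urllib.parse
import urllib.request
from typing import List

BIBLIO_URL = "https://resources.whatwg.org/biblio.json"

def load_biblio(url: str = BIBLIO_URL) -> dict:
    """Download and decode the WHATWG bibliography file."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)

def whatwg_spec_roots(biblio: dict) -> List[str]:
    """Return the sorted HTTPS root URL of every *.spec.whatwg.org entry."""
    roots = set()
    for entry in biblio.values():
        # Assumed entry shape: {"href": "...", ...}; skip anything else.
        href = entry.get("href") if isinstance(entry, dict) else None
        if not href:
            continue
        host = urllib.parse.urlsplit(href).hostname or ""
        if host.endswith(".spec.whatwg.org"):
            # Collapse multipage paths and fragments onto the spec's root.
            roots.add(f"https://{host}/")
    return sorted(roots)
```

`whatwg_spec_roots(load_biblio())` would then give the list of specs to add to the spec DB.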


> The one thing it won’t parse is Respec source, so if you have Respec specs,
> it needs a URL of the Respec output (the parser is written in Python and
> can’t run js client-side).

We avoid ReSpec.


> For W3C specs we use the /TR shortname, for WHATWG we just pick something,
> happy to change them or have you tell us what you want there.

All our specifications are published at [shortname].spec.whatwg.org.
That shortname would be ideal.
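With that convention the shortname and the URL can be derived from each other. A small sketch; the helper names are illustrative, not part of Shepherd, and the W3C helper assumes the `https://www.w3.org/TR/<shortname>/` layout implied by "the /TR shortname" quoted above.

```python
import urllib.parse
from typing import Optional

WHATWG_SUFFIX = ".spec.whatwg.org"

def whatwg_url(shortname: str) -> str:
    """[shortname].spec.whatwg.org, always over HTTPS."""
    return f"https://{shortname}{WHATWG_SUFFIX}/"

def whatwg_shortname(url: str) -> Optional[str]:
    """Recover the shortname from any URL on a WHATWG spec host."""
    host = urllib.parse.urlsplit(url).hostname or ""
    if host.endswith(WHATWG_SUFFIX):
        return host[: -len(WHATWG_SUFFIX)]
    return None

def w3c_tr_url(shortname: str) -> str:
    """Assumed /TR layout for W3C specs."""
    return f"https://www.w3.org/TR/{shortname}/"
```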


> We are planning on making a general index page at some point in the not too
> distant future. Something that will use the anchor DB to create a master
> index of all the CSS properties, HTML & SVG elements and attributes, and all
> the IDL constructs, linking back to the specs where they’re all defined.
> Having that will definitely put any omissions or errors right out front as
> well.

Ooh, that sounds nice!


--
https://annevankesteren.nl/


Re: Shepherd

Tab Atkins Jr.
On Thu, Aug 20, 2015 at 12:22 AM, Anne van Kesteren <[hidden email]> wrote:
> On Thu, Aug 20, 2015 at 9:06 AM, Linss, Peter <[hidden email]> wrote:
>> Tab has admin access to the spec db as well for adding new specs or updating
>> URLs and such.
>
> For WHATWG, we publish https://resources.whatwg.org/biblio.json these
> days. Which I just realized is missing an entry, but if you scrap that
> and go from there, you'll find all our specifications eventually.

Note that Bikeshed's data files contain basically all the information
from Shepherd, just in a different format optimized for Bikeshed's
purposes. In particular, the specs.json file is a reformatted version
of the JSON that Shepherd sends me, and contains all the specs that
Shepherd is tracking.

>> We are planning on making a general index page at some point in the not too
>> distant future. Something that will use the anchor DB to create a master
>> index of all the CSS properties, HTML & SVG elements and attributes, and all
>> the IDL constructs, linking back to the specs where they’re all defined.
>> Having that will definitely put any omissions or errors right out front as
>> well.
>
> Ooh, that sounds nice!

I actually already wrote the code to generate indexes from arbitrary
sets of specs, and we're using it in the 2015 snapshot. I should just
go ahead and publish a bare-bones version of the "all the specs" data.

~TJ