Thinking about cross references and ReSpec

Thinking about cross references and ReSpec

Shane McCarron
There was an earlier thread where one side-discussion was about cross references.  Basically, Bikeshed and Shepherd have a system for doing cross-document references to things like terms.  I like the concept of this, but of course Bikeshed and Shepherd are document processors, not client-side magic like ReSpec, so....

I have this *idea*.  Or maybe it is a concept that could be turned into an idea if I expand upon it.  Something like this:
  • Adopt the Bikeshed syntax for definitions.  This is fairly rich, and is well documented at [1].  
  • Define a 'protocol' that ReSpec can use to query / update a definition service with the data from a and dfn elements, respectively.
  • Provide a reference implementation of a service that supports the protocol.
  • Add code to the def / a module of ReSpec that uses the protocol to communicate with a document-defined end point.
So in theory a family of documents could coordinate their term definitions by all pointing to the same service endpoint.  In the PFWG we have a number of documents where this would be a huge help.  I imagine if such a service were available, a lot of other groups that rely upon ReSpec would find it much easier to reference definitions where they live rather than importing them.  
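A minimal sketch of the query / update idea above, using an in-memory stand-in for the shared endpoint. Nothing here comes from an existing service: the class names, the fields on `Definition`, and the `example.org` URL are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    term: str  # text content of the dfn element
    spec: str  # short name of the defining document (hypothetical field)
    url: str   # absolute URL of the anchor to link to


class DefinitionService:
    """In-memory stand-in for a shared definition endpoint (hypothetical)."""

    def __init__(self):
        # lower-cased term -> {spec short name -> Definition}
        self._terms = {}

    def update(self, definitions):
        """'update' message: a document publishes the terms from its dfn elements."""
        for d in definitions:
            self._terms.setdefault(d.term.lower(), {})[d.spec] = d

    def query(self, terms):
        """'query' message: resolve the terms referenced by a elements.

        Unknown terms map to an empty list so a caller can fall back gracefully.
        """
        return {
            t: sorted(self._terms.get(t.lower(), {}).values(), key=lambda d: d.spec)
            for t in terms
        }


service = DefinitionService()
service.update([
    Definition("accessible name", "accname",
               "https://example.org/accname/#dfn-accessible-name"),
])
result = service.query(["Accessible Name", "unknown term"])
```

Returning an empty list for unknown terms, rather than raising, is one way to get the "fall down gracefully" behaviour for documents that reference a term nobody has published yet.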

There are some obvious flaws in a design like this (overhead, speed, fragility, exposure to DoS).  But I don't feel the risks are much worse than the current specref use.  We would need some sort of registry in the reference implementation to help control where updates can come in from, etc.

I know that some of us (e.g., me) work off line.  In that mode, all of these sorts of things fall down.  But as long as they fall down gracefully and consistently, I feel like it is okay.

Anyway - thoughts?  




P.S.  Tobie, I know that you were talking about something in this space for Q4... and maybe this is what you are already thinking about.  If so, consider this brainstorming / requirements gathering.

Re: Thinking about cross references and ReSpec

Tobie Langel-3
On Wed, Oct 1, 2014 at 10:10 PM, Shane McCarron <[hidden email]> wrote:
There was an earlier thread where one side-discussion was about cross references.  Basically, Bikeshed and Shepherd have a system for doing cross-document references to things like terms.  I like the concept of this, but of course Bikeshed and Shepherd are document processors, not client-side magic like ReSpec, so....

I have this *idea*.  Or maybe it is a concept that could be turned into an idea if I expand upon it.  Something like this:
  • Adopt the Bikeshed syntax for definitions.  This is fairly rich, and is well documented at [1].
Agreed. 
  • Define a 'protocol' that ReSpec can use to query / update a definition service with the data from a and dfn elements, respectively.
  • Provide a reference implementation of a service that supports the protocol.
So what I'll be working on will have a JSON over HTTP API. 
  • Add code to the def / a module of ReSpec that uses the protocol to communicate with a document-defined end point.
So in theory a family of documents could coordinate their term definitions by all pointing to the same service endpoint.  In the PFWG we have a number of documents where this would be a huge help.  I imagine if such a service were available, a lot of other groups that rely upon ReSpec would find it much easier to reference definitions where they live rather than importing them. 

There are some obvious flaws in a design like this (overhead, speed, fragility, exposure to DoS).  But I don't feel the risks are much worse than the current specref use.  We would need some sort of registry in the reference implementation to help control where updates can come in from, etc.

My plan for this solution is to do daily crawling of relevant specs, extract the dfn elements, and put them in a DB. Further refinements could include a search API, like the one I added for Specref and exposed within ReSpec.
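A rough sketch of the "extract the dfn and put them in a DB" step, assuming static HTML input. It uses Python's standard `html.parser` and an in-memory SQLite database purely to stay self-contained; it is not the crawler described here, and it would not work for unprocessed ReSpec drafts, which need a JS runtime. The table layout and the sample markup are assumptions.

```python
import sqlite3
from html.parser import HTMLParser


class DfnExtractor(HTMLParser):
    """Collects (id, text) pairs for every dfn element in a document."""

    def __init__(self):
        super().__init__()
        self.definitions = []
        self._current_id = None
        self._buffer = None  # list of text chunks while inside a dfn

    def handle_starttag(self, tag, attrs):
        if tag == "dfn":
            self._current_id = dict(attrs).get("id")
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "dfn" and self._buffer is not None:
            # Collapse internal whitespace so terms compare consistently.
            text = " ".join("".join(self._buffer).split())
            self.definitions.append((self._current_id, text))
            self._buffer = None


def store_definitions(db, spec_url, html):
    """Replace the stored definitions for one spec with a fresh extraction."""
    parser = DfnExtractor()
    parser.feed(html)
    parser.close()
    db.execute("DELETE FROM dfn WHERE spec = ?", (spec_url,))
    db.executemany(
        "INSERT INTO dfn (spec, anchor, term) VALUES (?, ?, ?)",
        [(spec_url, anchor, term) for anchor, term in parser.definitions],
    )
    db.commit()
    return len(parser.definitions)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dfn (spec TEXT, anchor TEXT, term TEXT)")
sample = ('<p>A <dfn id="dfn-worker">worker</dfn> runs scripts; '
          'a <dfn>port</dfn> passes messages.</p>')
count = store_definitions(db, "https://example.org/spec/", sample)
rows = db.execute("SELECT anchor, term FROM dfn ORDER BY term").fetchall()
```

Deleting a spec's rows before re-inserting keeps a daily re-crawl idempotent: terms removed from a draft disappear from the DB on the next run.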

P.S.  Tobie, I know that you were talking about something in this space for Q4... and maybe this is what you are already thinking about.  If so, consider this brainstorming / requirements gathering.

My focus will be on gathering the data and providing a JSON API. Not on actual implementation within ReSpec (which I won't have cycles for at that time, I'm afraid).

--tobie


Re: Thinking about cross references and ReSpec

Robin Berjon-6
On 02/10/2014 10:10 , Tobie Langel wrote:
> My plan for this solution is to do daily crawling of relevant specs and
> extract the dfn and put them in a DB. Further refinements could include
> a search API, like I added for Specref and exposed within Respec.

Could you somehow reuse or modify what Shepherd does here? If it
includes enough information (or additional extraction can be easily
added) and new specs can be added to its crawling (which I suspect ought
to be relatively easy — I recall Peter's code being able to process
quite a lot of different documents) then we can all align, which I
reckon is a win (even without counting the saved cycles).

Shepherd exposes an API that allows you to just simply dump the data it
has. If you look inside update.py in Bikeshed you can see how it works.
What Bikeshed does is, instead of querying services live, allow the user
to regularly call bikeshed update and get a fresh DB (of a bunch of
stuff). The same could be injected into SpecRef.

> My focus will be on gathering the data and providing a JSON API. Not
> on actual implementation within ReSpec (which I won't have cycles for at
> that time, I'm afraid).

The hard part is getting the data. Hooking it into ReSpec oughtn't be
difficult, unless I'm missing something.

--
Robin Berjon - http://berjon.com/ - @robinberjon


Re: Thinking about cross references and ReSpec

Tobie Langel-3
On Thu, Oct 2, 2014 at 12:10 PM, Robin Berjon <[hidden email]> wrote:
On 02/10/2014 10:10 , Tobie Langel wrote:
My plan for this solution is to do daily crawling of relevant specs and
extract the dfn and put them in a DB. Further refinements could include
a search API, like I added for Specref and exposed within Respec.

Could you somehow reuse or modify what Shepherd does here? If it includes enough information (or additional extraction can be easily added) and new specs can be added to its crawling (which I suspect ought to be relatively easy — I recall Peter's code being able to process quite a lot of different documents) then we can all align, which I reckon is a win (even without counting the saved cycles).

I've bumped into way too many painful issues with non-browser-based HTML parsers to waste more time with them. I'm also very interested in gathering data from editor's drafts, which requires a JS runtime for those that use ReSpec.

Shepherd exposes an API that allows you to just simply dump the data it has. If you look inside update.py in Bikeshed you can see how it works. What Bikeshed does is, instead of querying services live, allow the user to regularly call bikeshed update and get a fresh DB (of a bunch of stuff). The same could be injected into SpecRef.

That sounds like a worthwhile idea to explore but seems somewhat orthogonal to this project, no?

My focus will be on gathering the data and providing a JSON API. Not
on actual implementation within ReSpec (which I won't have cycles for at
that time, I'm afraid).

The hard part is getting the data. Hooking it into ReSpec oughtn't be difficult, unless I'm missing something.

Good. (I haven't thought about this at all, so I'll take your word for it). 

--tobie

Re: Thinking about cross references and ReSpec

Shane McCarron


On Thu, Oct 2, 2014 at 6:41 AM, Tobie Langel <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 12:10 PM, Robin Berjon <[hidden email]> wrote:
On 02/10/2014 10:10 , Tobie Langel wrote:
My plan for this solution is to do daily crawling of relevant specs and
extract the dfn and put them in a DB. Further refinements could include
a search API, like I added for Specref and exposed within Respec.

Could you somehow reuse or modify what Shepherd does here? If it includes enough information (or additional extraction can be easily added) and new specs can be added to its crawling (which I suspect ought to be relatively easy — I recall Peter's code being able to process quite a lot of different documents) then we can all align, which I reckon is a win (even without counting the saved cycles).

I've bumped into way too many painful issues with non browser-based HTML parsers to waste more time with them. I'm also very interested in gathering data from editor's draft which requires a JS runtime for those which use ReSpec.

Exactly.  The real value of this is during development - especially of a family of specs such as the (possibly) upcoming HTML5 modules!
 

Shepherd exposes an API that allows you to just simply dump the data it has. If you look inside update.py in Bikeshed you can see how it works. What Bikeshed does is, instead of querying services live, allow the user to regularly call bikeshed update and get a fresh DB (of a bunch of stuff). The same could be injected into SpecRef.

That sounds like a worthwhile idea to explore but seems somewhat orthogonal to this project, no?

It hadn't occurred to me to conflate this feature with SpecRef.  I mean - it's a service, so I guess it can do anything.  Having the exposed references from many specs available to other, unrelated specs is interesting and ultimately useful.  But I agree that it is orthogonal to the goal of making it easier for *related* specs to connect together - particularly during development.


My focus will be on gathering the data and providing a JSON API. Not
on actual implementation within ReSpec (which I won't have cycles for at
that time, I'm afraid).

The hard part is getting the data. Hooking it into ReSpec oughtn't be difficult, unless I'm missing something.

Good. (I haven't thought about this at all, so I'll take your word for it). 

Yeah, I looked at the code for how we talk to SpecRef and it seems pretty straightforward to do a similar integration into the place where we are creating the list of cross references we need to look up.

As an aside, I note that the SpecRef lookup (in ReSpec biblio.js) uses an HTTPS GET.  I would change that to POST so that if there is a huge query we don't overflow URL length limits.  I will create an issue about it.
 


Re: Thinking about cross references and ReSpec

Tobie Langel-3
On Thu, Oct 2, 2014 at 1:49 PM, Shane McCarron <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 6:41 AM, Tobie Langel <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 12:10 PM, Robin Berjon <[hidden email]> wrote:

The hard part is getting the data. Hooking it into ReSpec oughtn't be difficult, unless I'm missing something.

Good. (I haven't thought about this at all, so I'll take your word for it). 

Yeah, I looked at the code for how we talk to SpecRef and it seems pretty straightforward to do a similar integration into the place where we are creating the list of cross references we need to look up.

Yeah, the lookup isn't the part I was worried about. It's the potential syntax changes I'm more concerned with.

As an aside, I note that the SpecRef lookup (in ReSpec biblio.js) uses https GET.  I would change that to POST so that if there is a huge query we don't overflow URL length limits.  I will create an issue about it.

The effective limit is around 2000 chars[1] which should give us over a hundred references. Let's think about fixing it when we cross it, no?
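For a rough feel of the numbers: 2000 characters covering over a hundred references implies fewer than about 20 characters per reference key, separator included. The sketch below illustrates a "GET until it no longer fits, then POST" policy. The endpoint URL and the `refs` parameter name are placeholders rather than the real service's API, and whether the service accepts POST at all is an assumption.

```python
from urllib.parse import urlencode

URL_LIMIT = 2000  # approximate effective limit quoted in this thread
ENDPOINT = "https://example.org/bibrefs"  # placeholder, not the real service URL


def plan_request(refs):
    """Return (method, url, body) for a lookup of the given reference keys."""
    query = urlencode({"refs": ",".join(refs)})
    url = f"{ENDPOINT}?{query}"
    if len(url) <= URL_LIMIT:
        return "GET", url, None
    # Too long for a URL: send the same form-encoded payload as a request body.
    return "POST", ENDPOINT, query


small = plan_request(["HTML5", "DOM4", "WEBIDL"])
large = plan_request([f"HYPOTHETICAL-REF-{i}" for i in range(200)])
```

With three short keys the plan is a GET. The 200 made-up keys produce a query string well past the limit, so the plan switches to POST.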

--tobie


Re: Thinking about cross references and ReSpec

Shane McCarron


On Thu, Oct 2, 2014 at 7:15 AM, Tobie Langel <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 1:49 PM, Shane McCarron <[hidden email]> wrote:


As an aside, I note that the SpecRef lookup (in ReSpec biblio.js) uses https GET.  I would change that to POST so that if there is a huge query we don't overflow URL length limits.  I will create an issue about it.

The effective limit is around 2000 chars[1] which should give us over a hundred references. Let's think about fixing it when we cross it, no?


If you insist.  Is the service not able to handle POST requests?

I only mentioned it because if we are integrating definition cross reference lookups, we could easily overflow the limit.
 


Re: Thinking about cross references and ReSpec

Tobie Langel-3
On Thu, Oct 2, 2014 at 2:18 PM, Shane McCarron <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 7:15 AM, Tobie Langel <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 1:49 PM, Shane McCarron <[hidden email]> wrote:


As an aside, I note that the SpecRef lookup (in ReSpec biblio.js) uses https GET.  I would change that to POST so that if there is a huge query we don't overflow URL length limits.  I will create an issue about it.

The effective limit is around 2000 chars[1] which should give us over a hundred references. Let's think about fixing it when we cross it, no?

If you insist.  Is the service not able to handle POST requests?

Not sure (think I probably purposefully limited it to GET when building the service, but that's easy to change).

I only mentioned it because if we are integrating definition cross reference lookups, we could easily overflow the limit.

That would probably be two different end-points, no? 

--tobie

Re: Thinking about cross references and ReSpec

Shane McCarron


On Thu, Oct 2, 2014 at 7:24 AM, Tobie Langel <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 2:18 PM, Shane McCarron <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 7:15 AM, Tobie Langel <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 1:49 PM, Shane McCarron <[hidden email]> wrote:


As an aside, I note that the SpecRef lookup (in ReSpec biblio.js) uses https GET.  I would change that to POST so that if there is a huge query we don't overflow URL length limits.  I will create an issue about it.

The effective limit is around 2000 chars[1] which should give us over a hundred references. Let's think about fixing it when we cross it, no?

If you insist.  Is the service not able to handle POST requests?

Not sure (think I probably purposefully limited it to GET when building the service, but that's easy to change).

I only mentioned it because if we are integrating definition cross reference lookups, we could easily overflow the limit.

That would probably be two different end-points, no? 

Probably.  I don't know what you had in mind for design.  I expect it is a different XHR call - no need to conflate them as they are in very different portions of the ReSpec implementation.  

On the other hand, if we wanted to limit the number of XHR round trips, we could delay resolving definition references until late in processing: make a single call that returns both the bibliographic AND definition cross reference links, then make the updates to the document.  That might be bad, unmodular design, but it might speed up processing for large documents.
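A sketch of what such a single combined round trip might look like at the data level. The JSON shape (`refs` and `terms` keys) is invented for illustration; no existing endpoint is known to accept it.

```python
import json


def build_combined_request(bib_keys, dfn_terms):
    """Serialize both kinds of pending lookups into one request body."""
    return json.dumps({
        "refs": sorted(set(bib_keys)),
        "terms": sorted(set(dfn_terms)),
    })


def apply_combined_response(response_text, bib_keys, dfn_terms):
    """Split one response into resolved entries and a list of misses."""
    data = json.loads(response_text)
    refs = data.get("refs", {})
    terms = data.get("terms", {})
    resolved = {
        "refs": {k: refs[k] for k in bib_keys if k in refs},
        "terms": {t: terms[t] for t in dfn_terms if t in terms},
    }
    missing = ([k for k in bib_keys if k not in refs]
               + [t for t in dfn_terms if t not in terms])
    return resolved, missing


body = build_combined_request(["HTML5", "HTML5", "DOM4"], ["worker"])
# A made-up response standing in for what a server might return.
fake_response = json.dumps({
    "refs": {"HTML5": {"title": "HTML5"}},
    "terms": {"worker": {"url": "https://example.org/spec/#dfn-worker"}},
})
resolved, missing = apply_combined_response(
    fake_response, ["HTML5", "DOM4"], ["worker"])
```

Keeping the two result sets under separate keys lets the bibliography and definition code stay separate on the client even though they share one request.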

In any event, I will leave it alone for now.

Re: Thinking about cross references and ReSpec

Peter Linss
In reply to this post by Tobie Langel-3

On Oct 2, 2014, at 4:41 AM, Tobie Langel <[hidden email]> wrote:

On Thu, Oct 2, 2014 at 12:10 PM, Robin Berjon <[hidden email]> wrote:
On 02/10/2014 10:10 , Tobie Langel wrote:
My plan for this solution is to do daily crawling of relevant specs and
extract the dfn and put them in a DB. Further refinements could include
a search API, like I added for Specref and exposed within Respec.

Could you somehow reuse or modify what Shepherd does here? If it includes enough information (or additional extraction can be easily added) and new specs can be added to its crawling (which I suspect ought to be relatively easy — I recall Peter's code being able to process quite a lot of different documents)

Yes, adding specs to its crawl is trivial.

then we can all align, which I reckon is a win (even without counting the saved cycles).

I've bumped into way too many painful issues with non browser-based HTML parsers to waste more time with them.

FWIW, Shepherd uses html5lib and AFAICT sees a browser equivalent DOM which it traverses. This hasn’t been an issue to date.

I'm also very interested in gathering data from editor's draft which requires a JS runtime for those which use ReSpec.

At one point I did start to add code to Shepherd’s spec parser (which actually has been completely factored out of Shepherd these days) to handle ReSpec source files. I stopped because ReSpec was under heavy development at the time and I didn’t want to chase a moving target.

Finishing this wouldn’t be that big a deal (and would be made easier if ReSpec uses the Bikeshed dfn markup).


Shepherd exposes an API that allows you to just simply dump the data it has. If you look inside update.py in Bikeshed you can see how it works. What Bikeshed does is, instead of querying services live, allow the user to regularly call bikeshed update and get a fresh DB (of a bunch of stuff). The same could be injected into SpecRef.

Yes, and it’s all JSON over http(s). You can currently query anchor data per spec or simply dump the entire DB. More advanced queries can be added to the API easily.

The API is also self-described via a json-home page (per [1]). Bikeshed uses a Python APIClient I wrote that uses the json-home page to process requests; it's available stand-alone on GitHub[2].
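To illustrate how a client can use a json-home page, here is a small sketch that resolves a link relation to a URL. This is not the APIClient mentioned above, and the sample document, relation names, and paths are invented. It follows the general shape of the json-home draft (a `resources` object whose entries carry either `href` or `href-template`) and only handles simple `{name}` substitutions.

```python
import re
from urllib.parse import quote, urljoin


def resolve(home, base_url, rel, **variables):
    """Resolve a link relation from a json-home document to an absolute URL."""
    resource = home["resources"][rel]
    if "href" in resource:
        return urljoin(base_url, resource["href"])
    # Only plain {name} expansions are supported in this sketch.
    path = re.sub(
        r"\{(\w+)\}",
        lambda m: quote(str(variables[m.group(1)]), safe=""),
        resource["href-template"],
    )
    return urljoin(base_url, path)


# Hypothetical json-home document; real relation names and paths will differ.
home = {
    "resources": {
        "https://example.org/rel/dump": {"href": "/api/dump"},
        "https://example.org/rel/anchors": {
            "href-template": "/api/spec/{spec}/anchors",
            "href-vars": {"spec": "https://example.org/param/spec"},
        },
    }
}
dump_url = resolve(home, "https://example.org/", "https://example.org/rel/dump")
anchors_url = resolve(home, "https://example.org/",
                      "https://example.org/rel/anchors", spec="css-flexbox-1")
```

Because the client looks URLs up by relation name, the server can move its paths around without breaking callers.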


That sounds like a worthwhile idea to explore but seems somewhat orthogonal to this project, no?

My focus will be on gathering the data and providing a JSON API. Not
on actual implementation within ReSpec (which I won't have cycles for at
that time, I'm afraid).

The hard part is getting the data. Hooking it into ReSpec oughtn't be difficult, unless I'm missing something.

Good. (I haven't thought about this at all, so I'll take your word for it). 

--tobie



Re: Thinking about cross references and ReSpec

Tobie Langel-3
On Thu, Oct 2, 2014 at 7:08 PM, Peter Linss <[hidden email]> wrote:

On Oct 2, 2014, at 4:41 AM, Tobie Langel <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 12:10 PM, Robin Berjon <[hidden email]> wrote:
On 02/10/2014 10:10 , Tobie Langel wrote:
My plan for this solution is to do daily crawling of relevant specs and
extract the dfn and put them in a DB. Further refinements could include
a search API, like I added for Specref and exposed within Respec.

Could you somehow reuse or modify what Shepherd does here? If it includes enough information (or additional extraction can be easily added) and new specs can be added to its crawling (which I suspect ought to be relatively easy — I recall Peter's code being able to process quite a lot of different documents)
Yes, adding specs to its crawl is trivial.
then we can all align, which I reckon is a win (even without counting the saved cycles).

I've bumped into way too many painful issues with non browser-based HTML parsers to waste more time with them.
FWIW, Shepherd uses html5lib and AFAICT sees a browser equivalent DOM which it traverses. This hasn’t been an issue to date.

So does jsdom[1]. Yet I've bumped into plenty of very annoying issues with it (even though jsdom actually has a JS runtime, which afaik html5lib doesn't).
 
I'm also very interested in gathering data from editor's draft which requires a JS runtime for those which use ReSpec.

At one point I did start to add code to Shepherd’s spec parser (which actually has been completely factored out of Shepherd these days) to handle ReSpec source files. I stopped because ReSpec was under heavy development at the time and I didn’t want to chase a moving target.

Finishing this wouldn’t be that big a deal (and would be made easier if ReSpec uses the Bikeshed dfn markup).

 Unfortunately, I need a solution that works for ReSpec drafts right away.
Shepherd exposes an API that allows you to just simply dump the data it has. If you look inside update.py in Bikeshed you can see how it works. What Bikeshed does is, instead of querying services live, allow the user to regularly call bikeshed update and get a fresh DB (of a bunch of stuff). The same could be injected into SpecRef.
Yes, and it’s all JSON over http(s). You can currently query anchor data per spec or simply dump the entire DB. More advanced queries can be added to the API easily.
 
Neat.

--tobie

Re: Thinking about cross references and ReSpec

Tab Atkins Jr.
On Thu, Oct 2, 2014 at 2:57 PM, Tobie Langel <[hidden email]> wrote:

> On Thu, Oct 2, 2014 at 7:08 PM, Peter Linss <[hidden email]> wrote:
>> On Oct 2, 2014, at 4:41 AM, Tobie Langel <[hidden email]> wrote:
>>> I've bumped into way too many painful issues with non browser-based HTML
>>> parsers to waste more time with them.
>>
>> FWIW, Shepherd uses html5lib and AFAICT sees a browser equivalent DOM
>> which it traverses. This hasn’t been an issue to date.
>
> So does jsdom[1]. Yet I've bumped into plenty of very annoying issues with
> it (even though jsdom actually has a JS runtime, which afaik html5lib
> doesn't).

Huh.  I've never run into any problems, myself.

~TJ


Re: Thinking about cross references and ReSpec

Shane McCarron
In reply to this post by Tobie Langel-3


On Thu, Oct 2, 2014 at 1:57 PM, Tobie Langel <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 7:08 PM, Peter Linss <[hidden email]> wrote:

On Oct 2, 2014, at 4:41 AM, Tobie Langel <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 12:10 PM, Robin Berjon <[hidden email]> wrote:
On 02/10/2014 10:10 , Tobie Langel wrote:
My plan for this solution is to do daily crawling of relevant specs and
extract the dfn and put them in a DB. Further refinements could include
a search API, like I added for Specref and exposed within Respec.

Could you somehow reuse or modify what Shepherd does here? If it includes enough information (or additional extraction can be easily added) and new specs can be added to its crawling (which I suspect ought to be relatively easy — I recall Peter's code being able to process quite a lot of different documents)
Yes, adding specs to its crawl is trivial.
then we can all align, which I reckon is a win (even without counting the saved cycles).

I've bumped into way too many painful issues with non browser-based HTML parsers to waste more time with them.
FWIW, Shepherd uses html5lib and AFAICT sees a browser equivalent DOM which it traverses. This hasn’t been an issue to date.

So does jsdom[1]. Yet I've bumped into plenty of very annoying issues with it (even though jsdom actually has a JS runtime, which afaik html5lib doesn't).
 
I'm also very interested in gathering data from editor's draft which requires a JS runtime for those which use ReSpec.

At one point I did start to add code to Shepherd’s spec parser (which actually has been completely factored out of Shepherd these days) to handle ReSpec source files. I stopped because ReSpec was under heavy development at the time and I didn’t want to chase a moving target.

Finishing this wouldn’t be that big a deal (and would be made easier if ReSpec uses the Bikeshed dfn markup).

 Unfortunately, I need a solution that works for ReSpec drafts right away.

I would prefer that too. What I am looking for is something where my draft can push its definitions in (with credentials maybe, and on demand through the save menu rather than automatically), and where related drafts can then access them automatically.

 Honestly, this feels like a solved problem.  I would be happy to take a stab at implementing the bikeshed syntax in ReSpec as a way of getting this started.  I find the bikeshed extensions really compelling.


Re: Thinking about cross references and ReSpec

Tobie Langel-3
In reply to this post by Tab Atkins Jr.
On Thu, Oct 2, 2014 at 9:00 PM, Tab Atkins Jr. <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 2:57 PM, Tobie Langel <[hidden email]> wrote:
> So does jsdom[1]. Yet I've bumped into plenty of very annoying issues with
> it (even though jsdom actually has a JS runtime, which afaik html5lib
> doesn't).

Huh.  I've never run into any problems, myself.

Jsdom builds incorrect DOM trees for documents that are missing a <head> element (e.g. the Web Workers spec).

--tobie

Re: Thinking about cross references and ReSpec

Tab Atkins Jr.
On Thu, Oct 2, 2014 at 3:11 PM, Tobie Langel <[hidden email]> wrote:

> On Thu, Oct 2, 2014 at 9:00 PM, Tab Atkins Jr. <[hidden email]> wrote:
>> On Thu, Oct 2, 2014 at 2:57 PM, Tobie Langel <[hidden email]>
>> wrote:
>> > So does jsdom[1]. Yet I've bumped into plenty of very annoying issues
>> > with
>> > it (even though jsdom actually has a JS runtime, which afaik html5lib
>> > doesn't).
>>
>> Huh.  I've never run into any problems, myself.
>
> Jsdom builds incorrect DOM trees for documents that are missing a <head>
> element (e.g. the Web Workers spec).

Okay, then they are *not* implementing the HTML parsing algorithm.  html5lib is.

~TJ


Re: Thinking about cross references and ReSpec

Tobie Langel-3
In reply to this post by Shane McCarron
On Thu, Oct 2, 2014 at 9:11 PM, Shane McCarron <[hidden email]> wrote:
On Thu, Oct 2, 2014 at 1:57 PM, Tobie Langel <[hidden email]> wrote:
 Unfortunately, I need a solution that works for ReSpec drafts right away.

I would prefer that too - something where my draft can push its definitions in (with credentials maybe, and on demand, not automatically, through the save menu?) and correspondingly access them from the related drafts automatically is what I am looking for.

So that's absolutely not what I had in mind. :( What I'm working on involves daily crawls of publicly available drafts, not the ability to push data. I'm not saying the latter couldn't be implemented on top of the former, or in addition to it, but it's just not what I'm working on.
 
 Honestly, this feels like a solved problem.  I would be happy to take a stab at implementing the bikeshed syntax in ReSpec as a way of getting this started.  I find the bikeshed extensions really compelling.

WFM.

--tobie 


Re: Thinking about cross references and ReSpec

Robin Berjon-6
In reply to this post by Tobie Langel-3
On 02/10/2014 21:11 , Tobie Langel wrote:
> Jsdom builds incorrect DOM trees for documents that are missing a <head>
> element (e.g. the Web Workers spec).

I don't think that's the case anymore, jsdom has taken great strides
towards conformance over the past months. If you find an issue, talk to
Domenic. He even made it work in Worker context so that you can now have
a DOM there :)

--
Robin Berjon - http://berjon.com/ - @robinberjon