Comments on /site-meta

Danny Ayers




re. http://www.ietf.org/internet-drafts/draft-nottingham-site-meta-00.txt

Regarding the general approach I'm not convinced that "pre-empting an
authority's URI namespace" is a "necessary evil". I'm sure you are
aware of the primary argument against using well-known names in this
way [1].

A possible alternative would be to include a link in the root
namespace document pointing to the site-meta document.

i.e. client GETs:
http://example.com/

depending on conneg, the doc returned would contain something like:

<link rel="site-meta" href="http://example.com/site-meta" />

or

<rdf:Description rdf:about="http://example.com/">
   <x:siteMeta rdf:resource="http://example.com/site-meta" />
</rdf:Description>

- thus the URI of the metadata document could be decided by the
publisher. While this approach does add a step of indirection, I
believe it would offer greater flexibility in also allowing sub-path
hierarchies of the site to refer to their own, more local, metadata.
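To make the link-based discovery step concrete, here is a minimal sketch in Python of a client pulling the metadata URI out of an HTML representation of the root resource. The "site-meta" relation is the one suggested above; the class and function names are made up for illustration, and fetching the page itself is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SiteMetaLinkFinder(HTMLParser):
    """Collects href values of <link> elements whose rel includes 'site-meta'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "site-meta" in rel_tokens and href:
            # Resolve relative references against the document URI.
            self.found.append(urljoin(self.base_url, href))


def find_site_meta(html_text, base_url):
    """Return every site-meta URI advertised in the given HTML text."""
    finder = SiteMetaLinkFinder(base_url)
    finder.feed(html_text)
    return finder.found
```

Because the publisher picks the href, a document under a sub-path could point at its own, more local, metadata in the same way.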

Regarding the document format, it seems reasonable enough, though I
can't help thinking it might be advantageous to define it as an
extension to the Sitemap Protocol [2], along the lines of the
Semantic Web Crawling extension [3].

Cheers,
Danny.

[1] http://www.w3.org/2001/tag/issues.html#siteData-36
[2] https://www.google.com/webmasters/tools/docs/en/protocol.html
[3] http://sw.deri.org/2007/07/sitemapextension/

--
http://dannyayers.com
~
http://blogs.talis.com/nodalities/this_weeks_semantic_web/


Re: Comments on /site-meta

Mark Nottingham-2

Hi Danny,

On 16/10/2008, at 9:39 PM, Danny Ayers wrote:

> re. http://www.ietf.org/internet-drafts/draft-nottingham-site-meta-00.txt
>
> Regarding the general approach I'm not convinced that "pre-empting an
> authority's URI namespace" is a "necessary evil". I'm sure you are
> aware of the primary argument against using well-known names in this
> way [1].

Yep, that's referenced in the draft :)  I'd note that the W3C has  
included at least one well-known location in a Recommendation, for  
lack of any better mechanism.


> A possible alternative would be to include a link in the root
> namespace document pointing to the site-meta document.
>
> i.e. client GETs:
> http://example.com/
>
> depending on conneg, the doc returned would contain something like:
>
> <link rel="site-meta" href="http://example.com/site-meta" />
>
> or
>
> <rdf:Description rdf:about="http://example.com/">
>   <x:siteMeta rdf:resource="http://example.com/site-meta" />
> </rdf:Description>
>
> - thus the URI of the metadata document could be decided by the
> publisher. While this approach does add a step of indirection, I
> believe it would offer greater flexibility in also allowing sub-path
> hierarchies of the site to refer to their own, more local, metadata.

Two problems:

1) If the site has a normal representation there (i.e., a home page),  
it could be big, which would be an impediment to clients getting the  
metadata quickly (or at all, in the case of resource-constrained use  
cases). Remember, conneg can't be used to get something fundamentally  
different; it needs to be a representation of the *same* resource.

2) The step of indirection is a deal-killer for some users.

Personally, I'm very tempted by using one or more response *headers*  
on the root resource, so you can HEAD for them, but this still  
requires more requests than the embedded-in-site-meta approach, and  
some people balk at that. Given that the whole idea here is to make  
this a slam-dunk solution for the problem (so as to avoid creating any  
*other* new well-known locations), it has to have as few points of  
friction as possible.


> Regarding the document format, it seems reasonable enough, though I
> can't help thinking it might be advantageous to define it as an
> extension to the Sitemap Protocol [2], along the lines of the
> Semantic Web Crawling extension [3].

I'm not a big fan of sitemaps; the format isn't very flexible, and it can
only define metadata for one URI at a time. Frankly, if I were to
implicitly promote an existing format, it'd be Atom (I was tempted to
do this, but came down on the side of creating something simpler; Dave
Orchard always said that the most successful XML vocabularies had four or
fewer elements...) or something like URISpace (but a little less
tortured).

What I'm *really* wondering at this point is if XML itself is too
complex -- i.e., should this be a line-oriented format? One pre-draft  
reviewer already suggested as much.

Cheers,



--
Mark Nottingham     http://www.mnot.net/



Re: Comments on /site-meta

Julian Reschke

Mark Nottingham wrote:
> ...
> What I'm *really* wondering at this point is if XML itself is too
> complex -- i.e., should this be a line-oriented format? One pre-draft
> reviewer already suggested as much.
> ...

No, please don't go there.

On the other hand, you really *should* put the elements into a namespace
(IMHO).

BR, Julian


Re: Comments on /site-meta

Danny Ayers
In reply to this post by Mark Nottingham-2




2008/10/16 Mark Nottingham <[hidden email]>:

Hi Mark,

>> A possible alternative would be to include a link in the root
>> namespace document pointing to the site-meta document.
>>
>> i.e. client GETs:
>> http://example.com/
>>
>> depending on conneg, the doc returned would contain something like:
>>
>> <link rel="site-meta" href="http://example.com/site-meta" />
>>
>> or
>>
>> <rdf:Description rdf:about="http://example.com/">
>>  <x:siteMeta rdf:resource="http://example.com/site-meta" />
>> </rdf:Description>
>>
>> - thus the URI of the metadata document could be decided by the
>> publisher. While this approach does add a step of indirection, I
>> believe it would offer greater flexibility in also allowing sub-path
>> hierarchies of the site to refer to their own, more local, metadata.
>
> Two problems:
>
> 1) If the site has a normal representation there (i.e., a home page), it
> could be big, which would be an impediment to clients getting the metadata
> quickly

Fair point.

> (or at all, in the case of resource-constrained use cases).

I don't get that - do you have an example of such a use case?

> Remember, conneg can't be used to get something fundamentally different; it
> needs to be a representation of the *same* resource.

Yep, but I don't think that's particularly relevant - usual conneg
rules apply, but representations of the root namespace resource MAY
contain a link to the metadata doc.

> 2) The step of indirection is a deal-killer for some users.

For example..?

> Personally, I'm very tempted by using one or more response *headers* on the
> root resource, so you can HEAD for them, but this still requires more
> requests than the embedded-in-site-meta approach, and some people balk at
> that.

That does seem neater; I guess the Link header could be in the frame for
that. Not slam-dunk though.
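For what the header route might look like, here is a rough Python sketch: HEAD the root resource, then pick targets out of a Link header by relation. The "site-meta" relation name is an assumption carried over from the examples earlier in the thread, the function names are made up, and the parsing is deliberately simplified (it does not handle every corner of the Link header grammar).

```python
import http.client
import re

# One link-value: <target> followed by its ;name=value parameters.
_LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)'
)
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def links_with_rel(header_value, rel):
    """Return the targets in a Link header value that carry the given rel."""
    targets = []
    for match in _LINK_VALUE.finditer(header_value):
        target, raw_params = match.group(1), match.group(2)
        for name, quoted, bare in _PARAM.findall(raw_params):
            if name.lower() == "rel":
                # rel may hold several space-separated relation types.
                if rel in (quoted or bare).lower().split():
                    targets.append(target)
                break  # only the first rel parameter is considered
    return targets


def head_link_header(host, timeout=5):
    """HEAD the root resource of host and return its Link header, or None."""
    connection = http.client.HTTPConnection(host, timeout=timeout)
    try:
        connection.request("HEAD", "/")
        return connection.getresponse().getheader("Link")
    finally:
        connection.close()
```

This avoids downloading the home page body, but it is still one request to the root plus one for the metadata document, so the extra round trip remains.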

> Given that the whole idea here is to make this a slam-dunk solution
> for the problem (so as to avoid creating any *other* new well-known
> locations), it has to have as few points of friction as possible.

Do you happen to know if robots.txt has any extension points (or could
be viably revised)? (Got a presentation to prep last minute or I'd go
look :-)

>> Regarding the document format, it seems reasonable enough, though I
>> can't help thinking it might be advantageous to define it as an
>> extension to the Sitemap Protocol [2], along the lines of the
>> Semantic Web Crawling extension [3].
>
> I'm not a big fan of sitemaps; the format isn't very flexible, and it can
> only define metadata for one URI at a time. Frankly, if I were to implicitly
> promote an existing format, it'd be Atom (I was tempted to do this, but came
> down on the side of creating something simpler; Dave Orchard always said
> that the most successful XML vocabularies had four or fewer elements...) or
> something like URISpace (but a little less tortured).

Ok, fair enough.

> What I'm *really* wondering at this point is if XML itself is too complex
> -- i.e., should this be a line-oriented format? One pre-draft reviewer
> already suggested as much.

That sounds reasonable, though it would be good if an agent could make
some sense of the doc without prior knowledge - which is a point, the
current proposed format doesn't have an XML namespace, which pre-empts
any chance of follow-your-nose discovery (a la GRDDL).

Cheers,
Danny.

--
http://dannyayers.com
~
http://blogs.talis.com/nodalities/this_weeks_semantic_web/


Re: Comments on /site-meta

Mark Nottingham-2


On 16/10/2008, at 10:11 PM, Danny Ayers wrote:

>>
>> 1) If the site has a normal representation there (i.e., a home page), it
>> could be big, which would be an impediment to clients getting the
>> metadata quickly
>
> Fair point.
>
>> (or at all, in the case of resource-constrained use cases).
>
> I don't get that - do you have an example of such a use case?

E.g., a mobile device / printer / remote sensor / etc. might have to  
download an entire HTML homepage to get the metadata it needs. True,  
it could drop the connection, but that's nasty and there'd still be  
packets in flight.


>> Remember, conneg can't be used to get something fundamentally different;
>> it needs to be a representation of the *same* resource.
>
> Yep, but I don't think that's particularly relevant - usual conneg
> rules apply, but representations of the root namespace resource MAY
> contain a link to the metadata doc.

Of course. I just think it's stretching it a bit to say that a 10K  
HTML file and three lines of RDF (for example) are representations of  
the same resource...


>> 2) The step of indirection is a deal-killer for some users.
>
> For example..?

It's the extra round-trip time; subjecting all of your users to that
is a big deal to performance-minded people, especially when you're
considering things like users on high-latency, low-bandwidth,
high-loss links, running very popular sites and the bandwidth associated
with doing that, etc. This topic occupied a *lot* of time in the P3P
discussions, and it still comes up with a lot of users considering
this issue today.


>> Given that the whole idea here is to make this a slam-dunk solution
>> for the problem (so as to avoid creating any *other* new well-known
>> locations), it has to have as few points of friction as possible.
>
> Do you happen to know if robots.txt has any extension points (or could
> be viably revised)? (Got a presentation to prep last minute or I'd go
> look :-)

I looked at that, but the situation is really muddy; AIUI some parsers  
will choke on unrecognised content. I actually started out assuming  
robots.txt, but seeing as there isn't even a decent spec for it...


>> What I'm *really* wondering at this point is if XML itself is too complex
>> -- i.e., should this be a line-oriented format? One pre-draft reviewer
>> already suggested as much.
>
> That sounds reasonable, though it would be good if an agent could make
> some sense of the doc without prior knowledge - which is a point, the
> current proposed format doesn't have an XML namespace, which pre-empts
> any chance of follow-your-nose discovery (a la GRDDL).

Yeah, I'm trying to see how far I can get in 2008 without a namespace :)

Question out of the blue -- can GRDDL do dispatch on a media type? If  
not, why not?

Cheers and thanks,


--
Mark Nottingham     http://www.mnot.net/



Re: Comments on /site-meta

Julian Reschke

Mark Nottingham wrote:
> ...
> Yeah, I'm trying to see how far I can get in 2008 without a namespace :)
>
> Question out of the blue -- can GRDDL do dispatch on a media type? If
> not, why not?
> ...

As far as I can tell, GRDDL *usually* uses XSLT, and the XSLT transform
will have no knowledge of the mime type the XML document was served with.

BR, Julian


Re: Comments on /site-meta

Danny Ayers
In reply to this post by Mark Nottingham-2

2008/10/16 Mark Nottingham <[hidden email]>:

> Yeah, I'm trying to see how far I can get in 2008 without a namespace :)

Heh.

> Question out of the blue -- can GRDDL do dispatch on a media type? If not,
> why not?

Nope, though it most probably was suggested at some point. One reason
why not would be the dependency on a registry rather than HTTP-based
discovery. But a few areas of the spec were left open-ended (e.g. how
to use transformation mechanisms other than XSLT), anticipating a
possible future revision - in part because of time constraints, in
part because of lack of deployment experience.

For future reference, I put together a quick GRDDL reference sheet:
http://dannyayers.com/misc/grddl-reference

(PDF I'm afraid - if you know of any way of getting lightweight HTML
from an OpenOffice doc, please let me know)

Suggestion out of the blue - for a simpler format, what about Turtle?

http://www.w3.org/TeamSubmission/turtle/

Cheers,
Danny.


--
http://dannyayers.com
~
http://blogs.talis.com/nodalities/this_weeks_semantic_web/


Re: Comments on /site-meta

Mark Nottingham-2

Ooh. Hmm.

How would you embed another dialect (e.g., robots.txt or p3p.xml)? I  
ask out of ignorance of turtle; the last time I used n3 was probably  
two years ago.



On 16/10/2008, at 10:47 PM, Danny Ayers wrote:

> Suggestion out of the blue - for a simpler format, what about Turtle?
>
> http://www.w3.org/TeamSubmission/turtle/


--
Mark Nottingham     http://www.mnot.net/



Re: Comments on /site-meta

Danny Ayers

2008/10/16 Mark Nottingham <[hidden email]>:
> Ooh. Hmm.
>
> How would you embed another dialect (e.g., robots.txt or p3p.xml)? I ask out
> of ignorance of turtle; the last time I used n3 was probably two years ago.

Turtle is the subset of n3 that corresponds directly to RDF (so it's
more or less isomorphic to RDF/XML).

Which would allow (at least..?) two options - embed the other material
as a (string or XML) literal or map terms in the dialect to terms in
an RDF vocabulary.
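As a rough illustration of the first option (embedding the other dialect as a string literal), here is a small Python sketch that writes a robots.txt body into a Turtle document. The x: prefix, its namespace URI and the x:robots property are invented for the example and are not part of any draft; the site URI is assumed to be already valid and is not escaped.

```python
def turtle_string(text):
    """Escape text as a double-quoted Turtle string literal."""
    escaped = (
        text.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return '"' + escaped + '"'


def site_meta_turtle(site_uri, robots_text):
    """Build a one-triple Turtle document carrying robots_text as a literal."""
    lines = [
        # Hypothetical vocabulary, for illustration only.
        "@prefix x: <http://example.com/ns/site-meta#> .",
        "",
        "<%s>" % site_uri,
        "    x:robots %s ." % turtle_string(robots_text),
    ]
    return "\n".join(lines) + "\n"
```

The second option would instead turn each directive in the dialect into its own triple, which needs an agreed vocabulary but keeps the content queryable as RDF.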

Cheers,
Danny.

--
http://dannyayers.com
~
http://blogs.talis.com/nodalities/this_weeks_semantic_web/


Re: Comments on /site-meta

Danny Ayers
In reply to this post by Danny Ayers

2008/10/16 Eran Hammer-Lahav <[hidden email]>:
> In an effort to continue the community's fragmented and repetitive conversation on this topic, I've attempted to collect all the different solutions raised to this problem in a post titled "Discovery and HTTP" [1]. I would be happy to hear new proposals not previously considered, and am happy to explain why I have ruled out some of these suggestions in my analysis.

Great survey!

The issue of resource description vs. resource representation is one
that's cropped up quite a lot around RDF, though offhand I can't think
where to find relevant material, except URIQA [1] (from about 2004)
which is a set of custom HTTP methods.

There's also tangentially related material at WebDescriptionProposals
[2] (WADL, WIDL, WRDL...).

Cheers,
Danny.

[1] http://sw.nokia.com/uriqa/URIQA.html
[2] http://esw.w3.org/topic/WebDescriptionProposals

--
http://dannyayers.com
~
http://blogs.talis.com/nodalities/this_weeks_semantic_web/