Re: fragment identifiers

Re: fragment identifiers

Martin J. Dürst
Hello Peter,

I have cross-posted to the URI list, because I think it's important to
get input from more experts. People on the URI list, this is about what
to do (or not to do) about fragment identifiers in URNs, raised in the
context of an update of RFC 2141.

On 2011/03/10 13:30, Peter Saint-Andre wrote:
> <hat type='individual'/>
>
> On 3/9/11 2:11 AM, "Martin J. Dürst" wrote:
>>
>> On 2011/03/09 13:51, Peter Saint-Andre wrote:

>> Anyway, from a higher-up view, RFC2141bis is defining the "urn:" URI
>> scheme, and URI scheme definitions in general are supposed to say
>> nothing (or just a little in some exceptional cases) on fragment
>> identifiers. The reason for this is that fragment identifiers are
>> defined per MIME Media Type, not per URI scheme.
>>
>> So if I have something like "urn:foo:bar:baz#here", then the urn spec
>> only has to say what "urn:foo:bar:baz" is supposed to mean, the meaning
>> of "here" is defined by whatever format I might get back when resolving
>> "urn:foo:bar:baz". If I have a browser that resolves (some) urns (I
>> don't know one, but there should be some), this is what already happens,
>> and it shouldn't and won't change. RFC2141bis doesn't have to say
>> anything for this to work.
>>
>> In case RFC2141bis tries to do anything else than the above, that would
>> be a very bad idea, and should be fixed quickly.
>
> Here is what RFC 3986 says:
>
>     The semantics of a fragment identifier are defined by the set of
>     representations that might result from a retrieval action on the
>     primary resource.  The fragment's format and resolution is therefore
>     dependent on the media type [RFC2046] of a potentially retrieved
>     representation, even though such a retrieval is only performed if the
>     URI is dereferenced.  If no such representation exists, then the
>     semantics of the fragment are considered unknown and are effectively
>     unconstrained.  Fragment identifier semantics are independent of the
>     URI scheme and thus cannot be redefined by scheme specifications.
>
> As far as I can see, the semantics of fragment identifiers in URNs would
> not be defined by media types because URNs are not generally resolved
> for the purpose of retrieving a representation.

"not generally" and "not" are not the same. Even for http: URIs, it's
true that they are not always resolved. So in that sense, if I use
http://never_any_server_here.sw.it.aoyama.ac.jp/one/two/three
with some fragment identifier (I'm in control of sw.it.aoyama.ac.jp and
make sure that there never is a server at
never_any_server_here.sw.it.aoyama.ac.jp), then I'm indeed unconstrained.

On the other hand, for quite a few URNs, it would make a lot of sense to
resolve them. Let's say I have set up some proxy or use some dedicated
browser that helps me resolve some URNs. Then the paragraph from RFC
3986 that you cite above clearly applies.
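
A minimal sketch of this reading of RFC 3986, in Python. The function
names and the single text/html rule are assumptions made for
illustration; nothing here comes from RFC 2141bis. It shows that the
fragment is split off the same way for any scheme, and that its meaning
depends only on the media type of whatever representation (if any)
comes back:

```python
def split_fragment(uri):
    """Split a URI reference at the first '#', whatever its scheme."""
    primary, _, fragment = uri.partition("#")
    return primary, fragment


def fragment_meaning(media_type, fragment):
    """Describe how a fragment would be read for one representation.

    media_type is None when no representation can be retrieved.
    """
    if media_type is None:
        return "unconstrained (no representation exists)"
    if media_type == "text/html":
        return "element with id '{}'".format(fragment)
    # For any other type, the media type's own definition decides.
    return "defined by the {} media type".format(media_type)


examples = [
    ("urn:foo:bar:baz#here", "text/html"),
    ("urn:foo:bar:baz#here", None),
    ("http://never_any_server_here.sw.it.aoyama.ac.jp/one/two/three#here", None),
]
for uri, media_type in examples:
    primary, fragment = split_fragment(uri)
    print(primary, "->", fragment_meaning(media_type, fragment))
```

The scheme never appears in fragment_meaning; that is the whole point of
the sketch.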

> Therefore, in the
> context of URNs, the semantics of the fragment would be considered
> unknown and would be effectively unconstrained (at least from the
> perspective of the 'urn:' URI scheme).

Non sequitur.

> 2141bis seems to imply that the semantics of the fragment identifier
> could be constrained by the definition of a particular URN namespace
> (despite the fact that they are not constrained by the 'urn:' URI scheme
> itself).

That would make at least some limited sense, if we could sort namespaces
by whether they (maybe only occasionally) allow resolution, or whether
they are absolutely and terminally never ever going to be used for
resolution. But the last sentence from the paragraph you cite says:

                    Fragment identifier semantics are independent of the
    URI scheme and thus cannot be redefined by scheme specifications.

This not only means that the URN spec (which is just the definition of
the 'urn:' URI scheme) cannot redefine fragment identifier semantics, it
also seems to imply that scheme specifications (including the URN spec)
cannot delegate such semantics to some subspaces of the scheme.

> I'm not sure what the use cases are here, but perhaps folks on
> the list could explain a bit more what they mean by reusing an
> identifier scheme that designates objects of such complexity that it is
> necessary to reference parts of the objects via fragment identifiers.

I'm looking forward to hearing from other people on this list, but
essentially, even if there are very complex objects, there are always
ways to identify components other than using a '#'.

Regards,   Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:[hidden email]

Re: [urn] fragment identifiers

Juha Hakala
Hello Martin; all,

A few comments below.

Martin J. Dürst wrote:
> Hello Peter,
>
> I have cross-posted to the URI list, because I think it's important to
> get input from more experts. People on the URI list, this is about what
> to do (or not to do) about fragment identifiers in URNs, raised in the
> context of an update of RFC 2141.

For the URN community this issue is important because there are
initiatives which are eager to use fragment identifiers. I have heard
rumours that some are already using them. A typical use case would be a
very complex object, such as a structured research data set, within
which many kinds of data should be separately described, identified and
retrieved.

>
> On 2011/03/10 13:30, Peter Saint-Andre wrote:
>> <hat type='individual'/>
>>
>> On 3/9/11 2:11 AM, "Martin J. Dürst" wrote:
>>>
>>> On 2011/03/09 13:51, Peter Saint-Andre wrote:
>
>>> Anyway, from a higher-up view, RFC2141bis is defining the "urn:" URI
>>> scheme, and URI scheme definitions in general are supposed to say
>>> nothing (or just a little in some exceptional cases) on fragment
>>> identifiers. The reason for this is that fragment identifiers are
>>> defined per MIME Media Type, not per URI scheme.
>>>
>>> So if I have something like "urn:foo:bar:baz#here", then the urn spec
>>> only has to say what "urn:foo:bar:baz" is supposed to mean, the meaning
>>> of "here" is defined by whatever format I might get back when resolving
>>> "urn:foo:bar:baz". If I have a browser that resolves (some) urns (I
>>> don't know one, but there should be some), this is what already happens,
>>> and it shouldn't and won't change. RFC2141bis doesn't have to say
>>> anything for this to work.
>>>
>>> In case RFC2141bis tries to do anything else than the above, that would
>>> be a very bad idea, and should be fixed quickly.
>>
>> Here is what RFC 3986 says:
>>
>>     The semantics of a fragment identifier are defined by the set of
>>     representations that might result from a retrieval action on the
>>     primary resource.  The fragment's format and resolution is therefore
>>     dependent on the media type [RFC2046] of a potentially retrieved
>>     representation, even though such a retrieval is only performed if the
>>     URI is dereferenced.  If no such representation exists, then the
>>     semantics of the fragment are considered unknown and are effectively
>>     unconstrained.  Fragment identifier semantics are independent of the
>>     URI scheme and thus cannot be redefined by scheme specifications.
>>
>> As far as I can see, the semantics of fragment identifiers in URNs would
>> not be defined by media types because URNs are not generally resolved
>> for the purpose of retrieving a representation.
>
> "not generally" and "not" are not the same. Even for http: URIs, it's
> true that they are not always resolved. So in that sense, if I use
> http://never_any_server_here.sw.it.aoyama.ac.jp/one/two/three
> with some fragment identifier (I'm in control of sw.it.aoyama.ac.jp and
> make sure that there never is a server at
> never_any_server_here.sw.it.aoyama.ac.jp), then I'm indeed unconstrained.
>
> On the other hand, for quite a few URNs, it would make a lot of sense to
> resolve them. Let's say I have set up some proxy or use some dedicated
> browser that helps me resolve some URNs. Then the paragraph from RFC
> 3986 that you cite above clearly applies.

Persistent identifiers will be used for multiple purposes, and at the
time we assign e.g. a URN to a resource, we have no idea which
resolution services will be needed in the (distant) future. The lifetime
of a PID may be centuries; applications and the functionality they offer
will change many times during such a period. And eventually even the
copyright protection of a document will expire ;-).

Retrieving a representation is one of the key resolution services
supplied already. But there does not need to be a 1:1 relation between a
URN (or any other persistent identifier) and the URI (URL/URLs) it maps
to via a resolution service.

For example, consider:

DOI: 10.1016/B978-0-240-81330-1.00007-5

This is a real Digital Object Identifier based on the ISBN of Tomlinson
Holman's Sound for Film and Television (3rd ed.), but please note that
this DOI does not identify the entire book, just a chapter within
it. The final section of the DOI suffix (00007-5) signifies the second
chapter of the book. Each chapter has its own DOI, and the chapters will
most likely be available for purchase as individual files, so the URIs
these DOIs resolve to will not have <fragment>s in them. But if the above
"extended ISBN" were expressed as a URN, we might come up with something like:

URN:ISBN:978-0-240-81330-1#00007-5

if this were the way in which identifiers for book chapters were
expressed according to the ISBN standard and in the ISBN namespace. This
URN would then resolve to the same PDF file as the DOI above, either in
the same digital library or in some other digital asset management
system.
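
To make the shape of such an identifier concrete, here is a small
Python sketch that splits the imaginary URN above into namespace
identifier, namespace-specific string and fragment. The function name
parse_urn is an assumption, and the sketch only checks the overall
"urn:<NID>:<NSS>" layout, not the full RFC 2141 character rules:

```python
def parse_urn(urn):
    """Split 'urn:<NID>:<NSS>[#fragment]' into (nid, nss, fragment)."""
    # The fragment, if any, starts at the first '#'.
    primary, _, fragment = urn.partition("#")
    parts = primary.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError("not a URN: " + urn)
    _, nid, nss = parts
    # The 'urn' prefix and the NID are case-insensitive; the NSS is kept as-is.
    return nid.lower(), nss, fragment or None


print(parse_urn("URN:ISBN:978-0-240-81330-1#00007-5"))
```

The chapter part ends up in the fragment position, outside the part of
the identifier that the ISBN namespace itself defines.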

>> Therefore, in the
>> context of URNs, the semantics of the fragment would be considered
>> unknown and would be effectively unconstrained (at least from the
>> perspective of the 'urn:' URI scheme).
>
> Non sequitur.
>
>> 2141bis seems to imply that the semantics of the fragment identifier
>> could be constrained by the definition of a particular URN namespace
>> (despite the fact that they are not constrained by the 'urn:' URI scheme
>> itself).

Yes; some namespaces / identifier systems will not allow usage of
<fragment> since the syntax of the identifier does not support such a
thing. For instance, the example shown above

URN:ISBN:978-0-240-81330-1#00007-5, or ISBN string

ISBN 978-0-240-81330-1#00007-5

is imaginary, since the ISBN standard does not actually support this. DOI
does, and one might also construct national bibliography numbers (NBNs)
and consequently URNs which consist of an ISBN and a fragment identifier.
Thus the DOI namespace (if one is registered in the future) and the NBN
namespace should support <fragment>, if we are to give a free hand to
people using these identifiers in the URN context.

> That would make at least some limited sense, if we could sort namespaces
> by whether they (maybe only occasionally) allow resolution, or whether
> they are absolutely and terminally never ever going to be used for
> resolution.

Based on what I have said before, I don't think that resolution is the
crucial factor here. And if I am wrong and it is, then any namespace may
allow resolution at some point in the future when the requirements of
the user community change.

> But the last sentence from the paragraph you cite says:
>
>                    Fragment identifier semantics are independent of the
>    URI scheme and thus cannot be redefined by scheme specifications.
>
> This not only means that the URN spec (which is just the definition of
> the 'urn:' URI scheme) cannot redefine fragment identifier semantics, it
> also seems to imply that scheme specifications (including the URN spec)
> cannot delegate such semantics to some subspaces of the scheme.

Yes.
>
>> I'm not sure what the use cases are here, but perhaps folks on
>> the list could explain a bit more what they mean by reusing an
>> identifier scheme that designates objects of such complexity that it is
>> necessary to reference parts of the objects via fragment identifiers.

I can give one practical example from my own library.

Like many other national libraries, we digitise old books. The outcome
of the process is a METS container, within which the full text of the
book is stored in structured XML (METS/ALTO). The structure expresses
chapters, and some information objects such as images.

Each chapter currently has its own URN:NBN, so in addition to being able
to provide a persistent link to the title page of the book, such links
can also be made to the chapters and other component parts of the book.
We believe that some users will find such functionality useful (and they
will also be happy when the URNs are still functional many years
from now, unlike many URIs that were thought to be cool).

If usage of <fragment> is allowed in RFC2141bis and within the NBN
namespace, we might change the current policy and assign just one
URN:NBN to the book itself, and then fragment identifiers based on the
NBN to the chapters and other component parts of the book. Our URN
resolver would be able to map these URN:NBNs to the correct component
parts within the METS container (or any other container standard we will
rely on in the future).
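
A rough sketch of what such a resolver lookup could look like, in
Python. Every identifier and path in it is invented for illustration;
it is not a description of our actual resolver or of any real NBN:

```python
# Every identifier and path here is invented for illustration.
COMPONENTS = {
    "urn:nbn:fi-example-0001": {
        "": "books/0001/mets.xml",  # no fragment: the book itself
        "chapter-1": "books/0001/alto/chapter-1.xml",
        "chapter-2": "books/0001/alto/chapter-2.xml",
    },
}


def resolve(urn):
    """Map a URN, with or without a fragment, to a component location."""
    primary, _, fragment = urn.partition("#")
    try:
        return COMPONENTS[primary][fragment]
    except KeyError:
        raise LookupError("cannot resolve " + urn) from None


print(resolve("urn:nbn:fi-example-0001"))
print(resolve("urn:nbn:fi-example-0001#chapter-2"))
```

One caveat that comes from HTTP rather than from this sketch: a user
agent does not send the fragment to the server, so a web-based resolver
would only see the part after '#' if it were passed some other way.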

> I'm looking forward to hear from other people on this list, but
> essentially even if there are very complex objects, there are always
> different ways to identify components than using a '#'.

True - in our case, the National Library of Finland can continue the
current policy and assign an NBN to each component part. Nevertheless,
it may be a good idea to allow a choice between the two approaches.
In some cases, using <fragment> can be more convenient than assigning
individual identifiers. Research data sets come to mind; perhaps
somebody from that community can describe the requirements?

Best regards,

Juha
>
> Regards,   Martin.
>

--

  Juha Hakala
  Senior advisor, standardisation and IT

  The National Library of Finland
  P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
  Email [hidden email], tel +358 50 382 7678


Re: [urn] fragment identifiers

Julian Reschke
On 10.03.2011 13:28, Juha Hakala wrote:
> ...
> Persistent identifiers will be used for multiple purposes, and by the
> time we assign e.g. a URN to a resource, we have no idea which
> resolution services will be needed in the (distant) future. Lifetime of
> a PID may be centuries; applications and the functionality they offer
> will change many times during such a period. And eventually even the
> copyright protection of a document will expire ;-).
> ...

I think that statement in itself rules out use of fragment identifiers.
At least if you want to stay in sync with the URI spec (RFC 3986).

> Retrieving a representation is one the key resolution services supplied
> already. But there does not need to be a 1:1 relation between a URN (or
> any other persistent identifier) and the URI (URL/URLs) it maps to via a
> resolution service.
> ...

Even if there *was* a one-to-one mapping, the representation could still
vary based on request header fields (content negotiation), and also over
time.
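
A small Python sketch of the content negotiation point. The
representation table and the simplified "accept" list (an ordered list
of media types, with no q-values) are assumptions for illustration
only. The same URI plus fragment resolves in one representation and
not in another:

```python
# Hypothetical representations of one resource, keyed by media type.
# Each lists the fragment names that representation defines.
REPRESENTATIONS = {
    "text/html": {"chapter2"},
    "text/plain": set(),
}


def negotiate(accept):
    """Return the first acceptable media type that is available, or None."""
    for media_type in accept:
        if media_type in REPRESENTATIONS:
            return media_type
    return None


def fragment_resolves(accept, fragment):
    """True/False if a representation is selected, None if there is none."""
    media_type = negotiate(accept)
    if media_type is None:
        return None
    return fragment in REPRESENTATIONS[media_type]


print(fragment_resolves(["text/html"], "chapter2"))
print(fragment_resolves(["text/plain", "text/html"], "chapter2"))
```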

Best regards, Julian

RE: [urn] fragment identifiers

Evain, Jean-Pierre
In reply to this post by Juha Hakala
Hello there,

One of the most recent activities in this domain is http://www.w3.org/2008/WebVideo/Fragments/

Cheers, Jean-Pierre

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Juha Hakala
Sent: Thursday, 10 March 2011 13:29
To: "Martin J. Dürst"
Cc: Peter Saint-Andre; [hidden email]; [hidden email]
Subject: Re: [urn] fragment identifiers


Re: [urn] fragment identifiers

Juha Hakala
In reply to this post by Julian Reschke
Hello,

Julian Reschke wrote:

> On 10.03.2011 13:28, Juha Hakala wrote:
>> ...
>> Persistent identifiers will be used for multiple purposes, and by the
>> time we assign e.g. a URN to a resource, we have no idea which
>> resolution services will be needed in the (distant) future. Lifetime of
>> a PID may be centuries; applications and the functionality they offer
>> will change many times during such a period. And eventually even the
>> copyright protection of a document will expire ;-).
>> ...
>
> I think that statement in itself rules out use of fragment identifiers.
> At least if you want to stay in sync with the URI spec (RFC 3986).

Can you explain why this would be the case? Please see below why I find
it difficult to agree.

>> Retrieving a representation is one the key resolution services supplied
>> already. But there does not need to be a 1:1 relation between a URN (or
>> any other persistent identifier) and the URI (URL/URLs) it maps to via a
>> resolution service.
>> ...
>
> Even if there *was* a one-to-one mapping, the representation could still
> vary based on request header fields (content negotiation), and also over
> time.

In the future, the applications preserving and delivering past digital
resources will usually be long-term preservation systems (such as Ex
Libris' Rosetta), hosted by national libraries / national archives or
other organisations which are legally obliged to store certain types of
documents (publications, radio and tv programs, government publications)
for future generations.

Eventually, these systems will contain multiple versions of a resource,
produced via migrations of successive versions of the resource. Each
version (or manifestation, as we call them) must be kept to make
roll-back possible, and will have its own identifier that will never
change. When a new version is made, it will get a new identifier, even
if the new and old documents have the same look and feel.

If a certain version of a resource has an internal structure, and the
component parts have fragment level persistent identifiers, then those
identifiers will remain functional for this particular version of the
resource. Earlier and later versions may not have a similar structure,
and if so, they will not have a similar identifier architecture.
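
As a sketch of this policy in Python (the class and all identifiers
are made up for illustration and do not describe Rosetta or any real
system): each manifestation gets its own identifier, identifiers are
never reassigned, and the fragment names belong to one manifestation
only.

```python
class Archive:
    """Toy model: one immutable identifier per manifestation."""

    def __init__(self):
        # identifier -> frozenset of fragment names valid for that version
        self._manifestations = {}

    def add(self, identifier, fragments=()):
        """Register a new manifestation; existing ones are never changed."""
        if identifier in self._manifestations:
            raise ValueError("identifier already assigned: " + identifier)
        self._manifestations[identifier] = frozenset(fragments)

    def has_fragment(self, identifier, fragment):
        """Check a fragment against one specific manifestation."""
        return fragment in self._manifestations[identifier]


archive = Archive()
# Hypothetical identifiers for two manifestations of the same work.
archive.add("urn:nbn:fi-example-v1", ["chapter-1", "chapter-2"])
archive.add("urn:nbn:fi-example-v2")  # migrated version, no internal structure

print(archive.has_fragment("urn:nbn:fi-example-v1", "chapter-1"))
print(archive.has_fragment("urn:nbn:fi-example-v2", "chapter-1"))
```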

From the national library's point of view I do accept the view that
manifestations of works will change over time, but the links between
identifiers and manifestations will not, at least in well-managed
digital archives and URN namespaces. A URN given to the PDF version of
Mr. Teppo Sarkamo's dissertation
(http://urn.fi/URN:ISBN:978-952-10-6832-4) will never change. When a new
version of the book is produced, it will get a different URN:ISBN.

One may of course argue that most systems in which URNs are to be used
will not be built in this manner and that therefore most identified
resources will change in a more or less subtle manner over time. My take
on this is that different URN namespaces may / will have different
policies, and this may have an impact on many things, including the
usage of fragments. But there are namespaces where identifying fragments
may make sense, also when done using the URI <fragment> functionality.

Juha
>
> Best regards, Julian
>

--

  Juha Hakala
  Senior advisor, standardisation and IT

  The National Library of Finland
  P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
  Email [hidden email], tel +358 50 382 7678

Re: [urn] fragment identifiers

Julian Reschke
On 10.03.2011 15:04, Juha Hakala wrote:

> Hello,
>
> Julian Reschke wrote:
>> On 10.03.2011 13:28, Juha Hakala wrote:
>>> ...
>>> Persistent identifiers will be used for multiple purposes, and by the
>>> time we assign e.g. a URN to a resource, we have no idea which
>>> resolution services will be needed in the (distant) future. Lifetime of
>>> a PID may be centuries; applications and the functionality they offer
>>> will change many times during such a period. And eventually even the
>>> copyright protection of a document will expire ;-).
>>> ...
>>
>> I think that statement in itself rules out use of fragment
>> identifiers. At least if you want to stay in sync with the URI spec
>> (RFC 3986).
>
> Can you explain why this would be the case? Please see below why I find
> it difficult to agree.
> ...

<http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.3.5>:

"The semantics of a fragment identifier are defined by the set of
representations that might result from a retrieval action on the primary
resource. The fragment's format and resolution is therefore dependent on
the media type [RFC2046] of a potentially retrieved representation, even
though such a retrieval is only performed if the URI is dereferenced. If
no such representation exists, then the semantics of the fragment are
considered unknown and are effectively unconstrained. Fragment
identifier semantics are independent of the URI scheme and thus cannot
be redefined by scheme specifications."

I think this is pretty clear -- if you *can* have representations,
you're constrained by the media types that are used as representations.
There's no way of avoiding that if you want to stay aligned with the URI spec.

Best regards, Julian

Re: [urn] fragment identifiers

Peter Saint-Andre-2
On 3/10/11 7:12 AM, Julian Reschke wrote:

> On 10.03.2011 15:04, Juha Hakala wrote:
>> Hello,
>>
>> Julian Reschke wrote:
>>> On 10.03.2011 13:28, Juha Hakala wrote:
>>>> ...
>>>> Persistent identifiers will be used for multiple purposes, and by the
>>>> time we assign e.g. a URN to a resource, we have no idea which
>>>> resolution services will be needed in the (distant) future. Lifetime of
>>>> a PID may be centuries; applications and the functionality they offer
>>>> will change many times during such a period. And eventually even the
>>>> copyright protection of a document will expire ;-).
>>>> ...
>>>
>>> I think that statement in itself rules out use of fragment
>>> identifiers. At least if you want to stay in sync with the URI spec
>>> (RFC 3986).
>>
>> Can you explain why this would be the case? Please see below why I find
>> it difficult to agree.
>> ...
>
> <http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.3.5>:
>
> "The semantics of a fragment identifier are defined by the set of
> representations that might result from a retrieval action on the primary
> resource. The fragment's format and resolution is therefore dependent on
> the media type [RFC2046] of a potentially retrieved representation, even
> though such a retrieval is only performed if the URI is dereferenced. If
> no such representation exists, then the semantics of the fragment are
> considered unknown and are effectively unconstrained. Fragment
> identifier semantics are independent of the URI scheme and thus cannot
> be redefined by scheme specifications."
>
> I think this is pretty clear -- if you *can* have representations,
> you're constrained by the media types that are used as representations.
> There's no way avoiding that if you want to stay aligned with the URI spec.
Another way to put it is that you can have representations or free-form
semantics, but not both (because along with representations come the
constraints of media types, according to RFC 3986).

Peter

--
Peter Saint-Andre
https://stpeter.im/





Re: [urn] fragment identifiers

Juha Hakala
Hello,

Peter Saint-Andre wrote:
> On 3/10/11 7:12 AM, Julian Reschke wrote:
>> On 10.03.2011 15:04, Juha Hakala wrote:

>>> ...
>> <http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.3.5>:
>>
>> "The semantics of a fragment identifier are defined by the set of
>> representations that might result from a retrieval action on the primary
>> resource. The fragment's format and resolution is therefore dependent on
>> the media type [RFC2046] of a potentially retrieved representation, even
>> though such a retrieval is only performed if the URI is dereferenced. If
>> no such representation exists, then the semantics of the fragment are
>> considered unknown and are effectively unconstrained. Fragment
>> identifier semantics are independent of the URI scheme and thus cannot
>> be redefined by scheme specifications."
>>
>> I think this is pretty clear -- if you *can* have representations,
>> you're constrained by the media types that are used as representations.
>> There's no way avoiding that if you want to stay aligned with the URI spec.
>
> Another way to put it is that you can have representations or free-form
> semantics, but not both (because along with representations come the
> constraints of media types, according to RFC 3986).

I see now where the problem lies. But it is necessary to consider the
additional complexity that URN resolution brings in.

Depending on the namespace, the relation between an identifier and a
representation can be complex. For instance, a manifestation of an
e-book with a single identifier can consist of multiple files (each
representing a chapter), each file having its own URL. Or, an identifier
may be assigned to a resource which is just a component part within a
larger structured resource (for instance, a metadata record within a
JPEG 2000 file).

What are the options RFC 3986 allows in such a case? It is OK if the URN
resolves to a list of URLs (those of the files of which the resource
consists). But it seems that adding <fragment> to the e-book NBN to
enable retrieval of individual chapters (files) is against the spirit of
RFC 3986. Resolving a URN with no <fragment> into a component part of a
resource such as embedded metadata within an XML file may or may not be
a philosophical problem (technical implementation is possible).

Anyway, it seems that there is a mismatch between the requirements of
RFC 3986 and the way in which some identifier systems are (or will be)
used as URNs, because RFC 3986 does not take into account the
functionality embedded into URN resolution services.

After a URN has been resolved to one or more URLs, there is no longer a
conflict with the stipulations of RFC 3986; these URLs must be aligned
with the data types of the files retrieved. Such alignment may also
exist between a URN and the thing it resolves to, but to require that
this should always be the case may put some counterproductive
constraints on the usage of standard identifiers as URNs.
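
As a rough illustration of the two-step view described above, the sketch below models a resolver that maps one URN to several URLs. Everything in it is hypothetical: the URN, the example.org URLs, the RESOLVER_TABLE and the choice to reattach the fragment to each resolved URL are assumptions, not behaviour defined by RFC 2141 or RFC 3986. It shows that a fragment on the URN does not pick one of the files; it can only be interpreted afterwards, inside each file, according to that file's media type.

```python
# Hypothetical resolver data: one e-book URN mapped to one URL per chapter file.
RESOLVER_TABLE = {
    "urn:nbn:fi-example-ebook": [
        ("http://example.org/ebook/ch1.html", "text/html"),
        ("http://example.org/ebook/ch2.html", "text/html"),
    ],
}


def resolve(urn_reference):
    """Resolve a URN reference to a list of (URL, media type) pairs.

    Assumption: the fragment is kept on the client side, never shown to the
    resolver, and reattached unchanged to every resolved URL.
    """
    urn, _, fragment = urn_reference.partition("#")
    suffix = "#" + fragment if fragment else ""
    return [(url + suffix, media_type)
            for url, media_type in RESOLVER_TABLE.get(urn, [])]


for url, media_type in resolve("urn:nbn:fi-example-ebook#sec2"):
    # "sec2" is now governed by text/html rules within each chapter file.
    print(media_type, url)
```

Selecting a single chapter would need something other than the fragment under this model, for example a separate identifier per chapter.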

Best regards,

Juha


>
> Peter
>

--

  Juha Hakala
  Senior advisor, standardisation and IT

  The National Library of Finland
  P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
  Email [hidden email], tel +358 50 382 7678


Re: [urn] fragment identifiers

Graham Klyne-4
In reply to this post by Peter Saint-Andre-2
Peter Saint-Andre wrote:

>> "The semantics of a fragment identifier are defined by the set of
>> representations that might result from a retrieval action on the primary
>> resource. The fragment's format and resolution is therefore dependent on
>> the media type [RFC2046] of a potentially retrieved representation, even
>> though such a retrieval is only performed if the URI is dereferenced. If
>> no such representation exists, then the semantics of the fragment are
>> considered unknown and are effectively unconstrained. Fragment
>> identifier semantics are independent of the URI scheme and thus cannot
>> be redefined by scheme specifications."
>>
>> I think this is pretty clear -- if you *can* have representations,
>> you're constrained by the media types that are used as representations.
>> There's no way avoiding that if you want to stay aligned with the URI spec.
>
> Another way to put it is that you can have representations or free-form
> semantics, but not both (because along with representations come the
> constraints of media types, according to RFC 3986).

I suppose it depends on what you mean by "free-form semantics".  For RDF, we
weasel-worded our way out of this by linking the semantics of the
URI-with-fragment to the RDF representation associated with the URI.

-- http://www.w3.org/TR/rdf-concepts/#section-fragID

So this is a form of semantics that is quite generic in its applicability, but
it may not be what you mean by "free form", since it is bound to the semantics
of RDF (http://www.w3.org/TR/rdf-mt/).

There's also some interesting W3C TAG discussion nearby...

http://www.w3.org/TR/webarch/#fragid
http://www.w3.org/2001/tag/doc/abstractComponentRefs
http://lists.w3.org/Archives/Public/www-tag/2010Nov/0000.html
... etc.

#g




Re: [urn] fragment identifiers

Peter Saint-Andre-2
On 3/11/11 3:40 AM, Graham Klyne wrote:

> Peter Saint-Andre wrote:
>>> "The semantics of a fragment identifier are defined by the set of
>>> representations that might result from a retrieval action on the primary
>>> resource. The fragment's format and resolution is therefore dependent on
>>> the media type [RFC2046] of a potentially retrieved representation, even
>>> though such a retrieval is only performed if the URI is dereferenced. If
>>> no such representation exists, then the semantics of the fragment are
>>> considered unknown and are effectively unconstrained. Fragment
>>> identifier semantics are independent of the URI scheme and thus cannot
>>> be redefined by scheme specifications."
>>>
>>> I think this is pretty clear -- if you *can* have representations,
>>> you're constrained by the media types that are used as representations.
>>> There's no way avoiding that if you want to stay aligned with the URI
>>> spec.
>>
>> Another way to put it is that you can have representations or free-form
>> semantics, but not both (because along with representations come the
>> constraints of media types, according to RFC 3986).
>
> I suppose it depends on what you mean by "free-form semantics".  For
> RDF, we weasel-worded our way out of this by linking the semantics of
> the URI-with-fragment to the RDF representation associated with the URI.
>
> -- http://www.w3.org/TR/rdf-concepts/#section-fragID
>
> So this is a form of semantics that is quite generic in its
> applicability, but it may not be what you mean by "free form", since it
> is bound to the semantics of RDF (http://www.w3.org/TR/rdf-mt/).
Yes, that makes sense. And clearly the semantics of fragment identifiers
in other technologies (e.g., HTML) are rather loose.

So perhaps one question to ask of the folks working in the URN space is:
what are the semantics of your manifest files (or whatever else might be
returned upon resolution of your URNs)?

Peter

--
Peter Saint-Andre
https://stpeter.im/



