obsoleting 3986 -- what would it look like?

masinter
Initially as a thought experiment, I've started to sketch out what it would look like to obsolete 3986 (URI) with a document that combines it with 3987 (IRI), reverts to the "URL" name, and gives updated parsing advice.

Doing so is pretty ambitious, of course, and likely to lead to all sorts of controversies, but I thought I'd give it a try.

* how much of the introductory and explanatory material from 3986 and 3987 to retain. While it's philosophically and historically interesting, it's also fertile ground for philosophical debates over whether http://larry.masinter.net#the_person could identify, locate, or name me rather than a paragraph of my home page. So I'm tempted to leave all that behind.
* how much of the historical reasons for distinguishing between URIs and IRIs to leave. Again, it's interesting and useful material, but less so for practitioners who just want to know what a URL is and how to use it.
  My temptation at this point is to leave out most of the explanatory material, and just put appendixes for URI, IRI and LEIRI which explain them as prior syntactic restrictions which are still supported by older protocols (including HTTP 1.x). Will HTTP 2.0 support UTF-8 URLs?
* Include URNs? I'm tempted to include at least a pointer to URNbis, but I'm not sure which one.
* I'm having trouble resisting the temptation to put a stake into the httpRange-14 by removing any basis for support of using http URLs to "mean" abstractions or people. Right now I'm considering putting that in a "URLs and Semantic Web" appendix.
* I'll accept sincere offers of co-authorship as long as you're willing to accept the requirements that to obsolete 3986 we need to address current use cases that make reference to 3986, 3987, etc.


<abstract>
  <t>Uniform Resource Locators (URL) are compact strings which form a
  namespace used as identifiers.  The URL namespace is federated:
  there are URL schemes, each with its own semantics and syntactic
  restrictions, and a registry of scheme names.  A relative URL is an
  abbreviated form which can be combined with a base URL to form a new
  URL (relative resolution).  Previously, the terms "Uniform Resource
  Identifier" (URI) and "Internationalized Resource Identifier" (IRI)
  were used to designate syntactic restrictions of the URL space.
  </t>
  <t>This specification brings together these definitions into a single
  specification and updates them to match current widespread usage,
  most notably within the World Wide Web global information and
  application system.
  </t>
  <t>This document is part of a set of documents intended to
  replace RFCs 2141, 3986, 3987 and 4345.</t>
</abstract>




<section title="Introduction">

<t>
  The concept of a "Uniform Resource Locator" was introduced
  by the World Wide Web global information initiative, whose
  use of the concept dates from 1990, and was described in
  "Universal Resource Identifiers in WWW" <xref target="RFC1630"/>.
</t>

<t>
  Uniform Resource Locators (URL) are compact strings which form a
  namespace used as identifiers.  The URL namespace is federated:
  there are URL schemes, each with its own semantics and syntactic
  restrictions, and a registry of scheme names.  A relative URL is an
  abbreviated form which can be combined with a base URL to form a new
  URL (relative resolution).  Previously, the terms "Uniform Resource
  Identifier" (URI) and "Internationalized Resource Identifier" (IRI)
  were used to designate syntactic restrictions of the URL space.
  </t>
<t>
  This specification brings together these definitions into a single
  specification and updates them to match current widespread usage,
  most notably within the World Wide Web global information and
  application system.</t>
<t>
  This specification and its companions, "Comparison of URLs" <xref
  target="url-comparison"/>, "Guidelines for Bidirectional URLs" <xref
  target="url-bidi-guidelines"/>, and "Registration of URL schemes" <xref
  target="url-registration"/>, obsolete <xref target="RFC3986"/>, <xref
  target="RFC3987"/>, and <xref target="RFC4345"/>.
</t>

<section title="Uniform, Resource, Locate">
   
  <t>The original design of URLs, in its various forms, was intended
   to meet several goals. </t>
  <t><list style="hanging">

    <t hangText="Uniform Meaning">
      The intention is that the same URL means (identifies, names,
      locates) the same thing independent of context.</t>

   <t hangText="Resources unlimited">
     The notion of a resource was not limited in scope, with the idea
     that URLs could be used to locate, identify or name not only
     network accessible services, resources and documents, but also
     people, artifacts, abstractions.</t>
   
   <t hangText="Locate, Identify, Name">
     An identifier embodies the information required to distinguish
     what is being identified from all other things within its scope
     of identification.  A locator embodies the information required
     to find and access the thing being located. A name is a component
     of an identifier assigned and resolved by some authority or
     agent. This specification reverts to the most commonly used
     "Locator" designation. </t>
     <t>The roles of URLs as locators, identifiers, and names have often
     been in conflict with the design goal of "Uniform Meaning". Some
     systems may use URLs (and, in particular, HTTP URLs) as identifiers
     for abstractions; this usage is not directly supported by this
     specification.</t>
     <t hangText="Internationalized">
     URLs were originally defined to consist only of characters
     from a limited repertoire: the upper and lower case letters A-Z,
     the digits, and a limited set of punctuation
     characters, with the provision that other data (and the coding
     for other characters) could be included via an escape sequence.
     This use was extended in the later specification of
     Internationalized Resource Identifiers <xref target="RFC3987"/>
     to include characters from a much larger repertoire.
     </t>
     <t>This specification defines the parsing and
     processing of arbitrary strings of
     Unicode characters as input, with the previous syntactic
     restrictions still required by older systems (URI, IRI)
     specified in appendices.</t>
   </list>
  </t>
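
The two mechanisms the draft text mentions, relative resolution against a base URL and escape sequences for characters outside the limited repertoire, can be illustrated with a minimal sketch in Python. It relies on the standard library's urllib.parse, which broadly follows RFC 3986 rather than the combined specification proposed here. The helper names resolve and escape_path and the example.com URLs are hypothetical, chosen only for illustration, and only the path component is escaped (internationalized host names are out of scope for this sketch).

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit


def resolve(base: str, relative: str) -> str:
    """Combine a relative URL with a base URL (relative resolution)."""
    return urljoin(base, relative)


def escape_path(url: str) -> str:
    """Percent-encode non-ASCII path characters as UTF-8 escape sequences.

    Existing escapes are kept as-is because "%" is listed as safe.
    Assumption: only the path is handled; host, query and fragment
    are passed through unchanged.
    """
    parts = urlsplit(url)
    escaped = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, escaped, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(resolve("http://example.com/a/b/c", "../d"))
    print(escape_path("http://example.com/caf\u00e9"))
```

The second helper is roughly the direction of the IRI-to-URI mapping for a single component; a full mapping would also need to deal with the host and with characters that are already reserved.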


Re: obsoleting 3986 -- what would it look like?

James M Snell


On Fri, Nov 2, 2012 at 12:24 AM, Larry Masinter <[hidden email]> wrote:
Initially as a thought experiment, I've started to sketch out what it would look like to obsolete 3986 (URI) with a document that combines it with 3987 (IRI), reverts to the "URL" name, and gives updated parsing advice.

Doing so is pretty ambitious, of course, and likely to lead to all sorts of controversies, but I thought I'd give it a try.


Seems to be a reasonable effort to undertake... you had me at combining 3986 and 3987. I'll happily help any way I can. 
 
* how much of the introductory and explanatory material from 3986 and 3987 to retain. While it's philosophically and historically interesting, it's also fertile ground for philosophical debates over whether http://larry.masinter.net#the_person could identify, locate, or name me rather than a paragraph of my home page. So I'm tempted to leave all that behind.

+1 ... I can't see any reason why the updated spec should delve into any of that.
 
* how much of the historical reasons for distinguishing between URIs and IRIs to leave. Again, it's interesting and useful material, but less so for practitioners who just want to know what a URL is and how to use it.
  My temptation at this point is to leave out most of the explanatory material, and just put appendixes for URI, IRI and LEIRI which explain them as prior syntactic restrictions which are still supported by older protocols (including HTTP 1.x). Will HTTP 2.0 support UTF-8 URLs?

What http 2.0 will support is still up in the air but drawing a line in the sand with this new spec could help to drive that decision ultimately.
 
* Include URNs? I'm tempted to include at least a pointer to URNbis, but I'm not sure which one.

Not convinced it would be necessary to include but could be wrong.
 
* I'm having trouble resisting the temptation to put a stake into the httpRange-14 by removing any basis for support of using http URLs to "mean" abstractions or people. Right now I'm considering putting that in a "URLs and Semantic Web" appendix.

Hmm.. not sure this really needs to be touched on at all really. Why not simply focus on the mechanics of the syntax, parsing, and error handling and avoid the semantics completely. 
 
* I'll accept sincere offers of co-authorship as long as you're willing to accept the requirements that to obsolete 3986 we need to address current use cases that make reference to 3986, 3987, etc.

Happy to help where I can.

- James
 



Re: obsoleting 3986 -- what would it look like?

Erik Wilde-3
hello.

On Nov 2, 2012, at 8:27, James M Snell <[hidden email]> wrote:
On Fri, Nov 2, 2012 at 12:24 AM, Larry Masinter <[hidden email]> wrote:
Initially as a thought experiment, I've started to sketch out what it would look like to obsolete 3986 (URI) with a document that combines it with 3987 (IRI), reverts to the "URL" name, and gives updated parsing advice.
Doing so is pretty ambitious, of course, and likely to lead to all sorts of controversies, but I thought I'd give it a try.
Seems to be a reasonable effort to undertake... you had me at combining 3986 and 3987. I'll happily help any way I can. 

same here, and i think it makes a lot of sense to consolidate things as much as possible. some refactoring, plus some new parts such as the parsing rules. not sure i'd like to deprecate the URI name, which is what i have been using religiously, but in the end, we should pick the name that works best.

* how much of the introductory and explanatory material from 3986 and 3987 to retain. While it's philosophically and historically interesting, it's also fertile ground for philosophical debates over whether http://larry.masinter.net#the_person could identify, locate, or name me rather than a paragraph of my home page. So I'm tempted to leave all that behind.
+1 ... I can't see any reason why the updated spec should delve into any of that.

yes, i agree with this one. any conventions should be left for those who want to use them to define and describe.

* Include URNs? I'm tempted to include at least a pointer to URNbis, but I'm not sure which one.
Not convinced it would be necessary to include but could be wrong.

isn't it really yet another scheme? a little different because it's a scheme for schemes, but it really is nothing but a scheme.

* I'm having trouble resisting the temptation to put a stake into the httpRange-14 by removing any basis for support of using http URLs to "mean" abstractions or people. Right now I'm considering putting that in a "URLs and Semantic Web" appendix.
Hmm.. not sure this really needs to be touched on at all really. Why not simply focus on the mechanics of the syntax, parsing, and error handling and avoid the semantics completely. 

i think avoiding semantics would be the way to go, and the httpRange-14 debate might be one that's best deferred to those layers where people introduce and then need to solve those problems.

* I'll accept sincere offers of co-authorship as long as you're willing to accept the requirements that to obsolete 3986 we need to address current use cases that make reference to 3986, 3987, etc.
Happy to help where I can.

same here. it'll be quite a bit of work, but it would be worth it.

cheers,

dret.

Re: obsoleting 3986 -- what would it look like?

masinter
In reply to this post by masinter
uri should remain a restricted syntax, as compatible as possible with 3986 (same strings legal), and iri too, along with leiri for that matter. but they are deprecated, in that new protocols should expect the broader range of urls if possible.

Sent from mobile Larry

Re: obsoleting 3986 -- what would it look like?

Mark Davis ☕
In reply to this post by masinter
I really like this idea. It would be far easier to understand and deal with, both for developers and for users, than the current nomenclature. I've seen so many problems due to confusion over what is, at heart, the same concept just with different expressions.


— Il meglio è l’inimico del bene —




Re: obsoleting 3986 -- what would it look like?

David Sheets-2
In reply to this post by masinter
Hi Larry,

On Fri, Nov 2, 2012 at 12:24 AM, Larry Masinter <[hidden email]> wrote:

> Initially as a thought experiment, I've started to sketch out what it would look like to obsolete 3986 (URI) with a document that combines it with 3987 (IRI), reverts to the "URL" name, and gives updated parsing advice.
>
> Doing so is pretty ambitious, of course, and likely to lead to all sorts of controversies, but I thought I'd give it a try.
>
> * how much of the introductory and explanatory material from 3986 and 3987 to retain. While it's philosophically and historically interesting, it's also fertile ground for philosophical debates over whether http://larry.masinter.net#the_person could identify, locate, or name me rather than a paragraph of my home page. So I'm tempted to leave all that behind.
> * how much of the historical reasons for distinguishing between URIs and IRIs to leave. Again, it's interesting and useful material, but less so for practitioners who just want to know what a URL is and how to use it.
>   My temptation at this point is to leave out most of the explanatory material, and just put appendixes for URI, IRI and LEIRI which explain them as prior syntactic restrictions which are still supported by older protocols (including HTTP 1.x). Will HTTP 2.0 support UTF-8 URLs?
> * Include URNs? I'm tempted to include at least a pointer to URNbis, but I'm not sure which one.
> * I'm having trouble resisting the temptation to put a stake into the httpRange-14 by removing any basis for support of using http URLs to "mean" abstractions or people. Right now I'm considering putting that in a "URLs and Semantic Web" appendix.
> * I'll accept sincere offers of co-authorship as long as you're willing to accept the requirements that to obsolete 3986 we need to address current use cases that make reference to 3986, 3987, etc.

I am very interested in the aggregation of URI/URN/URL/IRI grammars
and formalization of codepoint translation tables. Does IETF have an
XML vocabulary for expressing ABNF (RFC 5234?) grammars? I am
presently developing machinery for grammar analysis that will be used
to generate reference parsers, serializers, and test suites directly
from the specification(s).

Is there a central repository of RFC XML (RFC 2629) documents? Are you
drafting the neo-URI RFC in a revision control system somewhere?
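
To give a flavour of the component splitting that such a reference parser performs, here is a minimal sketch in Python. It is not generated from the ABNF; it uses the non-validating regular expression given in Appendix B of RFC 3986, and the names Components and split_reference are hypothetical, introduced only for this illustration.

```python
import re
from typing import NamedTuple, Optional

# Non-validating regular expression from RFC 3986, Appendix B.
_REFERENCE_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


class Components(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_reference(ref: str) -> Components:
    """Split a URI reference into its five generic components.

    Every group in the pattern is optional, so the match always succeeds;
    components that are absent come back as None (the path is never None).
    """
    m = _REFERENCE_RE.match(ref)
    assert m is not None
    return Components(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


if __name__ == "__main__":
    print(split_reference("http://larry.masinter.net#the_person"))
```

A generated parser would additionally validate each component against the grammar, which this regular expression deliberately does not do.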

> <abstract>
>   <t>Uniform Resource Locators (URL) are compact strings which form a
>   namespace used as identifiers.  The URL namespace is federated:
>   there are URL schemes, each with its own semantics and syntactic
>   restrictions, and a registry of scheme names.  A relative URL is an
>   abbreviated form which can be combined with a base URL to form a new
>   URL (relative resolution).  Previously, the terms "Uniform Resource
>   Identifier" (URI) and "Internationalized Resource Identifier" (IRI)
>   were used to designate syntactic restrictions of the URL space.
>   </t>
>   <t>This specification brings together these definitions into a single
>   specification and updates them to match current widespread usage,
>   most notably within the World Wide Web global information and
>   application system.
>   </t>
>   <t>This document is part of a set of documents intended to
>   replace RFCs 2141, 3986, 3987 and 4345</t>
> </abstract>

RFC 2141 is about well-known email endpoints for domains. How is this
related to the structure of identifiers?

RFC 4345 is about RC4 modes for SSH? How is this related? Or which
other RFC was meant?

>
> <section title="Introduction">
>
> <t>
>   The concept of a "Uniform Resource Locator" was introduced
>   by the World Wide Web global information initiative, whose
>   use of the concept dates from 1990, and was described in
>   "Universal Resource Identifiers in WWW" <xref target="RFC1630"/>
> </t>
>
> <t>
>   Uniform Resource Locators (URL) are compact strings which form a
>   namespace used as identifiers.  The URL namespace is federated:
>   there are URL schemes, each with its own semantics and syntactic
>   restrictions, and a registry of scheme names.  A relative URL is an
>   abbreviated form which can be combined with a base URL to form a new
>   URL (relative resolution).  Previously, the terms "Uniform Resource
>   Identifier" (URI) and "Internationalized Resource Identifier" (IRI)
>   were used to designate syntactic restrictions of the URL space.
>   </t>
> <t>
>   This specification brings together these definitions into a single
>   specification and updates them to match current widespread usage,
>   most notably within the World Wide Web global information and
>   application system.</t>
> <t>
>   This specification and its companions "Comparison of URLs" <xref
>   target="url-comparison"/> "Guidelines for Bidirectional URLs" <xref
>   target="url-bidi-guidelines"/>, "Registration of URL schemes" <xref
>   target="url-registration"/> obsolete <xref target="RFC3986"/>, <xref
>   target="RFC3987"/>, <xref target="RFC4345"/>.
> </t>
>
> <section title="Uniform, Resource, Locate">
>
>   <t>The original design of URLs and its various forms intended
>    to accomplish many aspects. </t>
>   <t><list style="hanging">
>
>     <t hangText="Uniform Meaning">
>       The intention is that the same URL means (identifies, names,
>       locates) the same thing independent of context.</t>
>
>    <t hangText="Resources unlimited">
>      The notion of a resource was not limited in scope, with the idea
>      that URLs could be used to locate, identify or name not only
>      network accessible services, resources and documents, but also
>      people, artifacts, abstractions.</t>
>
>    <t hangText="Locate, Identify, Name">
>      An identifier embodies the information required to distinguish
>      what is being identified from all other things within its scope
>      of identification.  A locator embodies the information required
>      to find and access the thing being located. A name is a component
>      of an identifier assigned and resolved by some authority or
>      agent. This specification reverts to the most commonly used
>      "Locator" designation. </t>
>      <t>The roles of URLs as locators, identifiers, and names have often
>      been in conflict with the design goal of "Uniform Meaning". Some
>      systems may use URLs (and, in particular, HTTP URLs) as identifiers
>      for abstractions; this usage is not directly supported by this
>      specification.</t>
>      <t hangText="Internationalized">
>
>      <t>URLs were originally defined to consist only of characters
>      from a limited repertoire, selected from the upper
>      and lower case letters A-Z plus a limited set of punctuation
>      characters, with the provision that other data (and the coding
>      for other characters) could be included via an escape sequence.
>      This use was extended in later specifications of
>      Internationalized Resource Identifiers <xref target="RFC3987"/>
>      to include characters from a much larger repertoire.
>      </t>
>      <t>This specification defines the parsing and
>      processing of arbitrary strings of
>      Unicode characters as input, with previous syntactic
>      restrictions still required by older systems (URI, IRI)
>      specified in appendices.</t>
>    </list>
>   </t>

Great! These strings are so critically important for the future health
of the internet; I would love to see their structures completely and
unambiguously defined.

I'll send more information about my ABNF work when I have it (or
you're welcome to snoop; it's open source). Let me know if there is
anything else I can do to help.

Best regards,

David Sheets



Re: obsoleting 3986 -- what would it look like?

Martin J. Dürst
In reply to this post by masinter
Hello Larry,

[cross-posting to [hidden email]]

On 2012/11/02 16:24, Larry Masinter wrote:
> Initially as a thought experiment, I've started to sketch out what it would look like to obsolete 3986 (URI) with a document that combined it with 3987 (IRI), reverts to the "URL" name, and gave updated parsing advice.
>
> Doing so is pretty ambitious, of course,

Yes indeed. I'm wondering why you think that this will be successful if
a less ambitious project (updating the IRI spec and the URI/IRI scheme
registration spec) is having problems getting enough attention.


> and likely to lead to all sorts of controversies, but I thought I'd give
> it a try.
>
> *  how much of the introductory and explanatory material from 3986 and 3987 to retain. While it's philosophically and historically interesting, it's also a fertile ground for philosophical debates over whether http://larry.masinter.net#the_person could identify, locate, or name me rather than a paragraph of my home page. So I'm tempted to leave all that behind.
> * how much of the historical reasons for distinguishing between URIs and IRIs to leave. Again, it's interesting and useful material, but less so for practitioners who just want to know what a URL is and how to use it.
>    My temptation at this point is to leave out most of the explanatory material, and just put appendixes for URI, IRI and LEIRI which explain them as prior syntactic restrictions which are still supported by older protocols (including HTTP 1.x). Will HTTP 2.0 support UTF-8 URLs?

This is all well and good, but one advantage of URIs (and IRIs) is that
they don't allow characters that are used as delimiters. If we move to
URLs as defined by "what browsers grok in HTML", then in each protocol
that uses fixed delimiters, we have to say "URLs, but not containing xyz
delimiters".
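
To make the delimiter point concrete, here is a minimal Python sketch. The character classes are the ones RFC 3986 allows in a URI reference; the function name and the sample protocol delimiters are assumptions made up for this illustration, not taken from any spec.

```python
import string

# Characters that may appear in an RFC 3986 URI reference:
# unreserved, gen-delims, sub-delims, plus "%" for percent-escapes.
URI_CHARS = set(
    string.ascii_letters + string.digits + "-._~"  # unreserved
    + ":/?#[]@"                                     # gen-delims
    + "!$&'()*+,;="                                 # sub-delims
    + "%"
)

# Hypothetical protocol that wraps identifiers in <...> and splits on spaces.
PROTOCOL_DELIMITERS = " <>"


def delimiter_conflicts(candidate: str, delimiters: str) -> set:
    """Return the delimiter characters that occur literally in candidate."""
    return {ch for ch in candidate if ch in delimiters}


# None of these delimiters is a legal URI character, so a valid URI
# can never collide with them.
assert not set(PROTOCOL_DELIMITERS) & URI_CHARS
```

A string accepted by a more lenient "what browsers grok" definition may contain a literal space or angle bracket, in which case `delimiter_conflicts` returns a non-empty set and the protocol would need the extra "but not containing xyz" rule.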


Also, the differences between a valid IRI reference and a valid HTML URL
are very small, and can probably be removed altogether. Here is what
the URL spec said before it was moved from the W3C
(http://dvcs.w3.org/hg/url/) to the WHATWG (http://url.spec.whatwg.org/):

 >>>>>>>>
A URL is a valid URL if at least one of the following conditions holds:

* The URL is a valid URI reference. [RFC3986]
* The URL is a valid IRI reference and it has no query component. [RFC3987]
* The URL is a valid IRI reference and its query component contains no
unescaped non-ASCII characters. [RFC3987]
* The URL is a valid IRI reference and the character encoding of the
URL's Document is UTF-8 or a UTF-16 encoding.
 >>>>>>>>

The conditions in the second, third, and fourth bullet are all related
to the encoding of the query part. If we can get a handle on these in
the IRI spec, then it may be possible to just collapse them. Then the
first bullet can be removed by the fact that every URI reference is an
IRI reference. (It still makes sense to call out the fact that every URI
reference is a valid URL, but it isn't necessary spec-wise.)
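
As a rough sketch of how the four conditions might collapse into one check (Python; `is_iri_reference` is a hypothetical predicate supplied by the caller, since a real IRI-reference validator is out of scope here, and the encoding labels are illustrative assumptions):

```python
from urllib.parse import urlsplit

# Illustrative labels for "UTF-8 or a UTF-16 encoding".
UNICODE_ENCODINGS = {"utf-8", "utf-16", "utf-16le", "utf-16be"}


def query_is_ascii(url: str) -> bool:
    """True when the query is absent or has no unescaped non-ASCII characters."""
    return urlsplit(url).query.isascii()


def is_valid_url(url: str, document_encoding: str, is_iri_reference) -> bool:
    """Collapsed form of the four bullets quoted above.

    Bullet 1 needs no separate test: every URI reference is an IRI
    reference whose query is necessarily ASCII-only.
    """
    if not is_iri_reference(url):
        return False
    # Bullets 2 and 3: no query, or a query without raw non-ASCII characters.
    # Bullet 4: the document encoding is UTF-8 or a UTF-16 encoding.
    return query_is_ascii(url) or document_encoding.lower() in UNICODE_ENCODINGS
```

The only input besides the string itself is the document encoding, and it matters only when the query contains raw non-ASCII characters.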


> * Include URNs? I'm tempted to include at least a pointer to URNbis, but I'm not sure which one.

URNs are just a single URI/IRI/URL scheme, so they shouldn't need any
special treatment, but it is occasionally helpful to call out schemes as
examples.


> * I'm having trouble resisting the temptation to put a stake into the httpRange-14 by removing any basis for support of using http URLs to "mean" abstractions or people. Right now I'm considering putting that in a "URLs and Semantic Web" appendix.

That's indeed tempting. But there's the problem that some software uses
it that way. (Because I'm not on the TAG, httpRange-14 is less of a
problem for me than it is for you).


> * I'll accept sincere offers of co-authorship as long as you're willing to accept the requirements that to obsolete 3986 we need to address current use cases that make reference to 3986, 3987, etc.

I'm willing to help, in particular with stuff relating to IRIs and
internationalization in general. But first we need a wide consensus that
this is the right way to go.


Regards,   Martin.




Re: obsoleting 3986 -- what would it look like?

Julian Reschke
In reply to this post by David Sheets-2
On 2012-11-05 00:29, David Sheets wrote:

> Hi Larry,
>
> On Fri, Nov 2, 2012 at 12:24 AM, Larry Masinter <[hidden email]> wrote:
>> Initially as a thought experiment, I've started to sketch out what it would look like to obsolete 3986 (URI) with a document that combined it with 3987 (IRI), reverts to the "URL" name, and gave updated parsing advice.
>>
>> Doing so is pretty ambitious, of course, and likely to lead to all sorts of controversies, but I thought I'd give it a try.
>>
>> *  how much of the introductory and explanatory material from 3986 and 3987 to retain. While it's philosophically and historically interesting, it's also a fertile ground for philosophical debates over whether http://larry.masinter.net#the_person could identify, locate, or name me rather than a paragraph of my home page. So I'm tempted to leave all that behind.
>> * how much of the historical reasons for distinguishing between URIs and IRIs to leave. Again, it's interesting and useful material, but less so for practitioners who just want to know what a URL is and how to use it.
>>    My temptation at this point is to leave out most of the explanatory material, and just put appendixes for URI, IRI and LEIRI which explain them as prior syntactic restrictions which are still supported by older protocols (including HTTP 1.x). Will HTTP 2.0 support UTF-8 URLs?
>> * Include URNs? I'm tempted to include at least a pointer to URNbis, but I'm not sure which one.
>> * I'm having trouble resisting the temptation to put a stake into the httpRange-14 by removing any basis for support of using http URLs to "mean" abstractions or people. Right now I'm considering putting that in a "URLs and Semantic Web" appendix.
>> * I'll accept sincere offers of co-authorship as long as you're willing to accept the requirements that to obsolete 3986 we need to address current use cases that make reference to 3986, 3987, etc.
>
> I am very interested in the aggregation of URI/URN/URL/IRI grammars
> and formalization of codepoint translation tables. Does IETF have an
> XML vocabulary for expressing ABNF (RFC 5234?) grammars? I am
> presently developing machinery for grammar analysis that will be used
> to generate reference parsers, serializers, and test suites directly
> from the specification(s).

I hacked an XML export option into Bill Fenner's "BAP" (Bill's ABNF
parser). Sources at
<http://trac.tools.ietf.org/wg/httpbis/trac/browser/abnfparser/bap>.

> Is there a central repository of RFC XML (RFC 2629) documents? Are you

xml.resource.org has some. Are you looking for a specific one?

> ...

Best regards, Julian



Re: obsoleting 3986 -- what would it look like?

Martin J. Dürst
In reply to this post by David Sheets-2


On 2012/11/05 8:29, David Sheets wrote:
> Hi Larry,
>
> On Fri, Nov 2, 2012 at 12:24 AM, Larry Masinter<[hidden email]>  wrote:
>> Initially as a thought experiment, I've started to sketch out what it would look like to obsolete 3986 (URI) with a document that combined it with 3987 (IRI), reverts to the "URL" name, and gave updated parsing advice.

> I am very interested in the aggregation of URI/URN/URL/IRI grammars
> and formalization of codepoint translation tables. Does IETF have an
> XML vocabulary for expressing ABNF (RFC 5234?) grammars? I am
> presently developing machinery for grammar analysis that will be used
> to generate reference parsers, serializers, and test suites directly
> from the specification(s).
>
> Is there a central repository of RFC XML (RFC 2629) documents? Are you
> drafting the neo-URI RFC in a revision control system somewhere?

The IRI WG has a subversion repository viewable at
http://trac.tools.ietf.org/wg/iri/trac/browser/draft-ietf-iri-3987bis.

>>    <t>This document is part of a set of documents intended to
>>    replace RFCs 2141, 3986, 3987 and 4345</t>
>> </abstract>
>
> RFC 2141 is about well-known email endpoints for domains. How is this
> related to the structure of identifiers?

http://www.ietf.org/rfc/rfc2141.txt is "URN Syntax". (No well-known
email endpoints as far as I can see.)


> RFC 4345 is about RC4 modes for SSH? How is this related? Or which
> other RFC was meant?

My guess is that Larry meant RFC 4395
(http://www.ietf.org/rfc/rfc4395.txt), Guidelines and Registration
Procedures for New URI Schemes.

Regards,   Martin.


Re: obsoleting 3986 -- what would it look like?

David Sheets-2
In reply to this post by Julian Reschke
On Mon, Nov 5, 2012 at 9:57 AM, Julian Reschke <[hidden email]> wrote:
> I hacked an XML export option into Bill Fenner's "BAP" (Bill's ABNF parser).
> Sources at
> <http://trac.tools.ietf.org/wg/httpbis/trac/browser/abnfparser/bap>.

Great! I'll extract a schema and export the same format. I will report
here with deviations as I find them.

>> Is there a central repository of RFC XML (RFC 2629) documents?
>
> xml.resource.org has some. Are you looking for a specific one?

I didn't see any RFC XML docs (just tools) at that URL but perhaps I
missed something obvious. I am looking for XML representations of
2141, 3986, 3987, and 5234. Does IETF keep XML documents as canonical
representations? They appear to publish the official standards in
ASCII.

It would be super-cool if <http://tools.ietf.org/html/rfcXXXX> had a
link with "[txt|pdf]" for XML or a <link> in the HTML source for an
XML alternate. This seems dependent on IETF practices w.r.t. draft
conformance requirements. Is use of XML and xml2rfc mandatory? If so,
what needs to be done to expose XML representations for all RFCs?

Thanks,

David



Re: obsoleting 3986 -- what would it look like?

Julian Reschke
On 2012-11-13 01:54, David Sheets wrote:

> On Mon, Nov 5, 2012 at 9:57 AM, Julian Reschke <[hidden email]> wrote:
>> I hacked an XML export option into Bill Fenner's "BAP" (Bill's ABNF parser).
>> Sources at
>> <http://trac.tools.ietf.org/wg/httpbis/trac/browser/abnfparser/bap>.
>
> Great! I'll extract a schema and export the same format. I will report
> here with deviations as I find them.
>
>>> Is there a central repository of RFC XML (RFC 2629) documents?
>>
>> xml.resource.org has some. Are you looking for a specific one?
>
> I didn't see any RFC XML docs (just tools) at that URL but perhaps I
> missed something obvious. I am looking for XML representations of

Well hidden, I guess. See <http://xml.resource.org/public/rfc/xml/>.

> 2141, 3986, 3987, and 5234. Does IETF keep XML documents as canonical
> representations? They appear to publish the official standards in
> ASCII.

Yep.

Sources for 3986 and 5234 are at <http://greenbytes.de/tech/webdav/>;
they use extensions, so you'll have to look at
<http://greenbytes.de/tech/webdav/rfc2629xslt/rfc2629xslt.html#clean-for-dtd>.

> It would be super-cool if <http://tools.ietf.org/html/rfcXXXX> had a
> link with "[txt|pdf]" for XML or a <link> in the HTML source for an
> XML alternate. This seems dependent on IETF practices w.r.t. draft
> conformance requirements. Is use of XML and xml2rfc mandatory? If so,

No, it's not.

> what needs to be done to expose XML representations for all RFCs?

You can always ask the RFC Editor whether they have a copy of the XML;
they should for most RFCs of the last ~7 years.

Best regards, Julian