FWD: I-D Action: draft-klensin-iri-sri-00.txt

John C Klensin
Hi.

As many of you know or have deduced from earlier notes, I've
wondered what we could do to make a localization-friendly i18n
structure for a URI overlay if we dropped the strict
compatibility requirement (e.g., that every valid URI must also
be a valid IRI).  If IRIs are protocol elements, that strict
compatibility should no longer be an obvious requirement.

SM and I have hacked together a draft in the hope of starting an
exploration of that question.   It is not complete, probably
contains errors, and we made some design decisions fairly
arbitrarily.  The intent should, however, be relatively clear
and sufficient to start a discussion about tradeoffs and
alternatives.

best,
    john


---------- Forwarded Message ----------
Date: Monday, July 09, 2012 01:27 -0700
From: [hidden email]
To: [hidden email]
Subject: I-D Action: draft-klensin-iri-sri-00.txt


A New Internet-Draft is available from the on-line
Internet-Drafts directories.


        Title           : An XML-based Simple Resource Identifier
        Author(s)       : John C Klensin
                          Subramanian Moonesamy
        Filename        : draft-klensin-iri-sri-00.txt
        Pages           : 10
        Date            : 2012-07-09

Abstract:
   While the URI specification has been widely deployed, it has long
   been recognized that many valid URIs, especially those that contain
   extensive information in the "tail" are unsuitable for user
   presentation, especially for internationalized environments.  IRIs
   have been proposed as a solution for that problem but inherit (and
   are constrained by) the complex and sometimes method-dependent syntax
   model of URIs as well as positional and ordering assumptions that
   make them more difficult to localize than is desirable.

   This specification illustrates a way to define an "above URI" model
   for a localization-friendly simple reference identifier (SRI) that
   explicitly identifies fields and is more appropriate than IRIs to
   support localization.  The current version is intended simply to
   initiate a discussion.  In particular, while it is written to use an
   XML element syntax model, variations using JSON or some other system
   with explicitly-labeled data fields might be as, or more,
   appropriate.


The IETF datatracker status page for this draft is:
https://datatracker.ietf.org/doc/draft-klensin-iri-sri

There's also a htmlized version available at:
http://tools.ietf.org/html/draft-klensin-iri-sri-00


Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/

_______________________________________________
I-D-Announce mailing list
[hidden email]
https://www.ietf.org/mailman/listinfo/i-d-announce
Internet-Draft directories: http://www.ietf.org/shadow.html
or ftp://ftp.ietf.org/ietf/1shadow-sites.txt

---------- End Forwarded Message ----------






RE: I-D Action: draft-klensin-iri-sri-00.txt

Dave Thaler-2
Thanks for writing this, it should generate a useful discussion.
I have some editorial nits I'll leave to separate email, but here are some technical
comments.

1) Whereas the generic URI syntax defines an authority component, my
understanding is that it's part of the scheme-specific hier-part so that a new
scheme can define its "authority" component as something other than
      authority   = [ userinfo "@" ] host [ ":" port ]
If so, there should be some way of expressing an authority component
that is something other than a "host" in the classic sense.   RFC 4395 section 2.2
requires only that new schemes MUST adhere to Section 4.3 of RFC 3986 which is
      absolute-URI  = scheme ":" hier-part [ "?" query ]
So you might want a hier-part element with authority and path being
subsidiary elements.

2) Regarding the "port" element.   One issue for both URIs and this (and IRIs) is that
RFC 6335 recommends use of service names rather than static port numbers
for new protocols.   So I'd expect new schemes to want to be capable of using
them and not just port numbers.   RFC 3986 allows a new scheme to define its own
hier-part, but we should really have a common way to express service names.  
Perhaps by:
      authority   = [ userinfo "@" ] host [ ":" servicename ]
RFC 6335 section 5.1 already requires service names to have a non-digit so there
cannot be ambiguity.

3) For the query element, if you're defining an XML syntax, it might be a good
idea to define some typical way of expressing an unordered list of name/value pairs,
since that's very common.

4) Would be good to add an example showing what happens with "%", since I
couldn't follow the text in section 4.

5) Would be good to add an example showing how <> are encoded.  For example,
if a user types in "</query>" into some app, which then wants to put the input
string into the query component, what would the resulting XML look like?

-Dave






RE: I-D Action: draft-klensin-iri-sri-00.txt

Dave Thaler-2
In reply to this post by John C Klensin
Also on the real substance of the SRI vs IRI discussion...

Browsers today have an address bar.   That address bar today can often contain IRIs,
which are more readable than URIs.   The SRI isn't amenable to that UI paradigm which
was based on the URI-is-an-IRI principle.   In this proposal to get rid of that principle,
it would be worth discussing UI paradigm principles, since the XML representation
can't be easily rendered in such a box.   That means without IRIs, it seems one would either
need a radically different paradigm, or one would need to always display the less-readable
URI form.   I don't think always rendering the URI form would be any better for usability,
so to me this proposal is tightly tied to the discussion of alternate rendering paradigms.

-Dave





Browser bars, etc. (was: RE: I-D Action: draft-klensin-iri-sri-00.txt)

John C Klensin


--On Monday, July 09, 2012 20:03 +0000 Dave Thaler
<[hidden email]> wrote:

> Also on the real substance of the SRI vs IRI discussion...

Indeed, this is part of the key question.   I think the
underlying question is even more complicated than the way you
present it, so I'm going to pull your message apart a bit.  

> Browsers today have an address bar.

And one of the things the UI community has known for many years
is that users don't make useful distinctions about what they see
or type there.  In particular, only a very small fraction of the
user community understands the differences among domain names
(aka "web addresses"), URIs of various sorts, and search terms.
Some modern browsers have essentially given up on those
distinctions entirely, mixing up partial URLs with
autocompletion and searching outside the browser (whether that
search is the history often used in autocompletion, some sort of
web search, or a "friend" search of some flavor).

>   That address bar today
> can often contain IRIs, which are more readable than URIs.

I'll agree with "more readable" but note that there seems to be
an industry trend toward more and more complex URIs which
are usefully accessible only from favorites/bookmarks, imported
references, and search mechanisms of some flavor.  Discussions
in the IETF may lead to reinforcing that trend (e.g., proposals
to incorporate hashes instead of user names and for signed URIs).
If one assumes a URL that goes on for several lines, with most
of the relevant tail components present, possibly embedding
another URL, multiple query components, and so on, then
readability is an unreasonable expectation whether the abstract
text is ASCII or not.  Even if it is not, the marginal
percentage readability improvement from IRIs is likely to be
minuscule.

Even the structure of the address bar doesn't help much.  If the
number of characters in an IRI or URI considerably exceeds the
width of the address bar, readability and usability suffer: I
have yet to see a browser that handles over-long URI strings in
the address bar (such as by scrolling) really well.   IRIs might
help by virtue of being shorter, but the window in which that is
of large benefit to readability is not huge.

So, part of my response (albeit a small one) to concerns about
the address bar is that, unless something like ICANN's new TLD
program changes the trends toward long and complex URLs/URIs,
the whole concept of the address bar as a display of the current
"location" is going to need rethinking in favor of things that
make better sense to users given limited space and attention.
Relative to that requirement, I think URIs, IRIs, and SRIs are
going to fail more or less together.

> The SRI isn't amenable to that UI paradigm which was based on
> the URI-is-an-IRI principle.

If one preserves the [every] URI-is-a-[valid-]IRI principle, a
number of convenient things happen.  Some inconvenient things
happen too, a few of which SM and I tried to identify in the
document.  But one of the reasons this I-D or something very
like it wasn't written a half-dozen years ago was that there
seemed to be general agreement that URI-is-an-IRI made sense.
In that context, IRIs were no worse for localization than IDNs:
perhaps, if one had a good "above DNS" layer that would do the
kinds of language-based matching that end users had been
encouraged to expect, IDNs would be almost useless because that
layer could translate directly to strings with which the user
would not need to deal, but, especially with the dual
relationship between A-labels and U-labels introduced in
IDNA2008, they would not be harmful.  

But, in that regard, IRIs ran into something of the same problem
that IDNA2003 ran into: if one could not reverse the IRI -> URI
transformation and reliably get back the same IRI, then there
are uncomfortable side effects (including on the user's ability
to interpret the address bar regardless of how readable it is).
There were other issues as well, and the WG (or at least an
active fraction of it) concluded that IRIs should be independent
protocol elements, similar to URIs but independent of them and
perhaps most suitable for new protocols.  
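
One way to see the round-trip concern is the small Python sketch below. It is only an illustration under stated assumptions: `iri_to_uri` is a hypothetical helper that percent-encodes the UTF-8 octets of non-ASCII characters and copies ASCII through untouched, which ignores the host component and other details of the real mapping.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character.

    Hypothetical, simplified helper: ASCII text (including an existing
    "%") is copied through unchanged.
    """
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


# Two different IRI strings ...
a = "http://example.com/caf\u00e9"
b = "http://example.com/caf%C3%A9"

# ... collapse to the same URI, so no inverse mapping can tell
# which of the two the user originally saw or typed.
same_uri = iri_to_uri(a) == iri_to_uri(b)
```

Under these assumptions the mapping is many-to-one, which is the property that makes "get back the same IRI" unreliable.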

Depending on how far that goes --and I really don't have a clear
picture of how far it is going-- URI-is-an-IRI works for RFC
3987 IRIs but may not apply for draft-ietf-iri-3987bis IRIs.


>  In this proposal to get rid of
> that principle, it would be worth discussing UI paradigm
> principles, since the XML representation  can't be easily
> rendered in such a box.

Yes, but note that draft-ietf-iri-3987bis proposes to get rid of
that principle too.  Not as dramatically of course -- under
draft-ietf-iri-3987bis, many URIs will still be IRIs -- but,
unless the WG reverses itself, the discussion is not about
whether to get rid of the principle or not but about what to
replace the discarded principle with.

>   That means without IRIs, it seems
> one would either need a radically different paradigm, or one
> would need to always display the less-readable URI form.   I
> don't think always rendering the URI form would be any better
> for usability, so to me this proposal is tightly tied to the
> discussion of alternate rendering paradigms.

Agreed.  But, again, that is "without IRIs that preserve the
URIs-are-always-IRIs principle" and "with many longer and more
complex URIs even without i18n", so I think that we are likely to need a
new rendering paradigm soon anyway.  The bad news is that it is
not clear how the IETF talks about rendering paradigms for
"address bars" given that I don't think we have anything formal
that even mentions address bars or their ilk.

   john





SRI technical issues (was: RE: I-D Action: draft-klensin-iri-sri-00.txt)

John C Klensin
In reply to this post by Dave Thaler-2
--On Monday, July 09, 2012 19:55 +0000 Dave Thaler
<[hidden email]> wrote:

> Thanks for writing this, it should generate a useful
> discussion. I have some editorial nits I'll leave to separate
> email, but here's some technical comments.

Thanks.

General observation: While I find these suggestions very helpful
(and have responded below), they are important only if the WG
wants to pursue the idea or, in some cases, if details are
needed as part of the proof of concept.    Advice from
participants in the WG as to whether they would like to see a
new draft before the cutoff and/or early in IETF 84 (since IRI
isn't scheduled to meet, if at all, until late Thursday
afternoon) would be appreciated.

> 1) Whereas the generic URI syntax defines an authority
> component, my understanding is that it's part of the
> scheme-specific hier-part so that a new scheme can define its
> "authority" component as something other than
>       authority   = [ userinfo "@" ] host [ ":" port ]
> If so, there should be some way of expressing an authority
> component that is something other than a "host" in the classic
> sense.   RFC 4395 section 2.2 requires only that new schemes
> MUST adhere to Section 4.3 of RFC 3986 which is
>       absolute-URI  = scheme ":" hier-part [ "?" query ]
> So you might want a hier-part element with authority and path
> being subsidiary elements.

Yes, almost certainly.  In our relative haste to get the draft
out, we didn't have time to do a really careful review of RFC
3986 for cases like this.   If the WG wishes to pursue the
approach, that will clearly have to be done.  In the interim,
I've noted this in the working copy.
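
For discussion purposes, the layering Dave describes could be sketched in Python roughly as follows. This is not taken from the draft: `split_absolute_uri` and the dictionary keys are hypothetical names, and the sketch deliberately keeps the authority as an opaque string so that a scheme could interpret it as something other than userinfo/host/port. Fragments are ignored because the absolute-URI rule has none.

```python
def split_absolute_uri(uri: str) -> dict:
    """Split along  absolute-URI = scheme ":" hier-part [ "?" query ].

    The authority (if any) is returned as an opaque string; nothing here
    assumes it has the form [ userinfo "@" ] host [ ":" port ].
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError("no scheme found")
    hier_part, _, query = rest.partition("?")
    authority, path = None, hier_part
    if hier_part.startswith("//"):
        # Everything up to the next "/" is the authority component.
        authority, slash, tail = hier_part[2:].partition("/")
        path = slash + tail
    return {
        "scheme": scheme,
        "hier-part": {"authority": authority, "path": path},
        "query": query if "?" in rest else None,
    }


web = split_absolute_uri("http://user@example.com:8080/a/b?x=1")
phone = split_absolute_uri("tel:+12015550123")
```

The nested "hier-part" entry mirrors the suggestion of a hier-part element with authority and path as subsidiary elements; a scheme such as tel: simply has no authority.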

> 2) Regarding the "port" element.   One issue for both URIs and
> this (and IRIs) is that  RFC 6335 recommends use of service
> names rather than static port numbers  for new protocols.   So
> I'd expect new schemes to want to be capable of using  them
> and not just port numbers.   RFC 3986 allows a new scheme to
> define its own  hier-part, but we should really have a common
> way to express service names.   Perhaps by:
>       authority   = [ userinfo "@" ] host [ ":" servicename ]
> RFC 6335 section 5.1 already requires service names to have a
> non-digit so there cannot be ambiguity.

Ack.  Noted in working draft.  Another question is whether
things like servicenames/ports should be expressed as separate
elements or as parameters to existing ones, e.g., whether, for
the domain case, the above would be better expressed as

   <host><domain>xxxx</domain></host>
   <servicename>service-or-port</servicename>

or as, e.g.,

   <host servicename="service-or-port">
     <domain>xxx</domain>
   </host>

That should be left to WG preference and especially people who
have a better-developed sense of XML aesthetics than I do but,
if we are going to go forward with this general idea, it should
be addressed.
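
Whichever layout is chosen, a processor would need to tell a port number from a service name. A minimal Python sketch of that test follows. The function name `classify` is hypothetical, and the service-name rules encoded here (1 to 15 characters, letters, digits and hyphens only, at least one letter, no leading, trailing or adjacent hyphens) are my reading of RFC 6335 section 5.1 and should be checked against the RFC.

```python
import re

# Letters/digits separated by single hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def classify(value: str) -> str:
    """Return "port" or "servicename", or raise ValueError."""
    if value.isascii() and value.isdigit():
        if int(value) <= 65535:
            return "port"
        raise ValueError("port number out of range")
    if (
        1 <= len(value) <= 15
        and _LABEL.fullmatch(value)
        and re.search(r"[A-Za-z]", value)
    ):
        return "servicename"
    raise ValueError("neither a port number nor a service name")


kinds = [classify(v) for v in ("8080", "http", "3com-tsmux")]
```

Because a service name must contain at least one letter, an all-digit value can only be a port, which is the "no ambiguity" point made above.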

> 3) For the query element, if you're defining an XML syntax, it
> might be a good  idea to define some typical way of expressing
> an unordered list of name/value pairs, since that's very
> common.

Yes.  Or providing a canonical ordering, however artificial.
Making comparison easy is A Good Thing.   Advice from XML
experts would be appreciated on this.  In the interim, I've put
a placeholder note in the working draft.
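
As a strawman for the canonical-ordering idea, the Python sketch below serializes name/value pairs sorted by code point. The `<param name="...">` element is purely hypothetical (the draft defines no such element), and sorting by code point is just one arbitrary choice of ordering.

```python
import xml.etree.ElementTree as ET


def query_element(pairs):
    """Build a <query> element whose children are in a canonical order.

    Pairs are sorted by (name, value) in code point order, so the same
    set of pairs always serializes to the same string.
    """
    query = ET.Element("query")
    for name, value in sorted(pairs):
        param = ET.SubElement(query, "param", {"name": name})
        param.text = value
    return query


a = ET.tostring(query_element([("lang", "fr"), ("q", "étoile")]), encoding="unicode")
b = ET.tostring(query_element([("q", "étoile"), ("lang", "fr")]), encoding="unicode")
equal_after_ordering = a == b
```

With an ordering fixed at serialization time, comparing two queries reduces to comparing two strings, regardless of the order in which the pairs were supplied.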

> 4) Would be good to add an example showing what happens with
> "%", since I couldn't follow the text in section 4.

I'll have another look at the text, but the short answer is that
nothing happens to "%".  It is just a normal character.  My hope
is that the only escape arrangements would be those required by
XML and that we can get rid of most of those.   If we need a
character escape (and I hope we don't), it should follow the
\u(NNNN) convention of RFC 5137 rather than encoding octets of
UTF-8.
 
> 5) Would be good to add an example showing how <> are encoded.
> For example, if a user types in "</query>" into some app,
> which then wants to put the input string into the query
> component, what would the resulting XML look like?

Yes.  And that is the case where I'm not sure how to avoid
"&lt;", but requiring the use of CDATA might be the better plan.
Advice from XML experts would be appreciated here.
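
To make the two options concrete, here is a small Python sketch using only the standard library. It assumes a bare `<query>` element with no namespace, which is a simplification of whatever the draft would finally specify.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

user_input = "</query>"

# Option 1: entity escaping, applied automatically by the serializer.
elem = ET.Element("query")
elem.text = user_input
escaped = ET.tostring(elem, encoding="unicode")

# Option 2: a CDATA section, built explicitly with minidom.
doc = minidom.Document()
node = doc.createElement("query")
node.appendChild(doc.createCDATASection(user_input))
cdata = node.toxml()

# Either serialization parses back to the string the user typed.
round_trip = [ET.fromstring(text).text for text in (escaped, cdata)]
```

One caveat for the CDATA route: the content itself must not contain the sequence "]]>", so it does not remove the need for some escaping rule entirely.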

best,
   john






Re: Browser bars, etc.

Peter Saint-Andre-2
In reply to this post by John C Klensin
<hat type='individual'/>

On 7/10/12 1:16 AM, John C Klensin wrote:
>
> --On Monday, July 09, 2012 20:03 +0000 Dave Thaler
> <[hidden email]> wrote:

<snip/>

>>   That address bar today
>> can often contain IRIs, which are more readable than URIs.
>
> I'll agree with "more readable" but note that there seems to be
> an industry trend toward more and more complex URIs which
> are usefully accessible only from favorites/bookmarks, imported
> references, and search mechanisms of some flavor.  Discussions
> in the IETF may lead to reinforcing that trend (e.g., proposals
> to incorporate hashes instead of user names and of signed URIs).
> If one assumes a URL that goes on for several lines, with most
> of the relevant tail components present, possibly embedding
> another URL, multiple query components, and so on, then
> readability is an unreasonable expectation whether the abstract
> text is ASCII or not.  Even if it is not, the marginal
> percentage readability improvement from IRIs is likely to be
> minuscule.

It seems to me that a browser does not need to present the "raw" URI in
the address bar, and ought to display hex-encoded characters there in a
user-friendly manner. So I don't think that IRI vs. URI is all that
relevant for the address bar, whereas it's more relevant for activities
like authoring HTML documents.

Peter

--
Peter Saint-Andre
https://stpeter.im/






Re: Browser bars, etc.

Julian Reschke
On 2012-07-10 20:02, Peter Saint-Andre wrote:
> ...
> It seems to me that a browser does not need to present the "raw" URI in
> the address bar, and ought to display hex-encoded characters there in a

s/characters/octets/

This is part of the problem :-)

> user-friendly manner. So I don't think that IRI vs. URI is all that
> relevant for the address bar, whereas it's more relevant for activities
> like authoring HTML documents.

People paste from the address bar into href attributes. So whatever
works in the browser *will* end up in HTML documents.

Best regards, Julian


Re: Browser bars, etc.

Peter Saint-Andre-2
On 7/10/12 12:09 PM, Julian Reschke wrote:
> On 2012-07-10 20:02, Peter Saint-Andre wrote:
>> ...
>> It seems to me that a browser does not need to present the "raw" URI in
>> the address bar, and ought to display hex-encoded characters there in a
>
> s/characters/octets/
>
> This is part of the problem :-)

As is my doing three things at once!

>> user-friendly manner. So I don't think that IRI vs. URI is all that
>> relevant for the address bar, whereas it's more relevant for activities
>> like authoring HTML documents.
>
> People paste from the address bar into href attributes. So whatever
> works in the browser *will* end up in HTML documents.

You have a point...

/psa




Re: Browser bars, etc.

John C Klensin
In reply to this post by Julian Reschke


--On Tuesday, July 10, 2012 20:09 +0200 Julian Reschke
<[hidden email]> wrote:

> On 2012-07-10 20:02, Peter Saint-Andre wrote:
>> ...
>> It seems to me that a browser does not need to present the
>> "raw" URI in the address bar, and ought to display
>> hex-encoded characters there in a
>
> s/characters/octets/
>
> This is part of the problem :-)

Yes, especially with UTF-8.  And that is why I've been arguing
for some years that the move from ISO 8859-1-friendly %hex
encoding of octets to the basically user-hostile hex encoding of
UTF-8 octets was a mistake that should not be propagated into
something that is explicitly concerned with i18n.  Note that,
with %hex octet encoding of UTF-8, the typical user can't even
tell where one character ends and another begins.  \u(NNNN) and
its variations isn't really user-friendly either, but at least
it is about characters and can be mapped to a character (Unicode
code point) by a single-stage lookup that doesn't require
special programming.
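
The difference can be illustrated with a short Python sketch. `codepoint_escape` is a hypothetical helper that writes escapes in the \u(NNNN) notation used in this thread; `quote` is the standard library's percent-encoder, which works on UTF-8 octets.

```python
from urllib.parse import quote


def codepoint_escape(text: str) -> str:
    """Replace each non-ASCII character with one code point escape."""
    return "".join(
        ch if ord(ch) < 128 else f"\\u({ord(ch):04X})" for ch in text
    )


word = "\u65e5\u672c"               # two characters
octet_form = quote(word)            # one %XX per UTF-8 octet
char_form = codepoint_escape(word)  # one escape per character

octet_escapes = octet_form.count("%")
char_escapes = char_form.count("\\u(")
```

For this two-character string the octet form carries six escapes with nothing to mark where one character stops and the next starts, while the character form carries two, each of which can be looked up directly as a Unicode code point.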

>> user-friendly manner. So I don't think that IRI vs. URI is
>> all that relevant for the address bar, whereas it's more
>> relevant for activities like authoring HTML documents.
>
> People paste from the address bar into href attributes. So
> whatever works in the browser *will* end up in HTML documents.

Yes.  But, to the extent that is true and not _all_ HTML
implementations support IRIs in href attributes, Dave's
assertion about IRI display in the address bar that started this
thread leads to a failure condition.

Again, I think the conclusion is that we need to do some careful
rethinking here in a process that doesn't necessarily favor
either of the present IRI or SRI ideas.

best,
   john





Re: I-D Action: draft-klensin-iri-sri-00.txt (three good reasons why not)

Martin J. Dürst
In reply to this post by Dave Thaler-2
On 2012/07/10 5:03, Dave Thaler wrote:

> Also on the real substance of the SRI vs IRI discussion...
>
> Browsers today have an address bar.   That address bar today can often contain IRIs,
> which are more readable than URIs.   The SRI isn't amenable to that UI paradigm which
> was based on the URI-is-an-IRI principle.   In this proposal to get rid of that principle,
> it would be worth discussing UI paradigm principles, since the XML representation
> can't be easily rendered in such a box.   That means without IRIs, it seems one would either
> need a radically different paradigm, or one would need to always display the less-readable
> URI form.   I don't think always rendering the URI form would be any better for usability,
> so to me this proposal is tightly tied to the discussion of alternate rendering paradigms.

Yes, this is a very good point: SRIs are useless for presentation (in
the browser bar, or as part of a textual reference, or in many other
places).


SRIs are also unsuited for protocols/formats, because they would blow up
and complicate syntax and processing needlessly. How would a format such
as HTML or SVG use SRIs? Would anybody creating a new protocol or format
choose lengthy and convoluted SRIs over URIs, even if they got told that
SRIs were needed because of internationalization? Also, if SRIs are in
XML, they fit well with XML, but how would I put them into a JSON document
or an ASN.1 document or any other format? Would we need a JSON version of
SRIs, and an ASN.1 version, and so on?

As an example, think about how to use an SRI in SVG. It's all XML, so
it's relatively straightforward, but it's hopelessly lengthy. To just
give an example, something that's now as easy as
    <use xlink:href="#étoile" />
(étoile is star in French; this works in SVG as of today!) would change
to something like:
    <use>
      <href>
        <sri>
          <fragment>étoile</fragment>
        </sri>
      </href>
    </use>
(Minor details: I have taken the liberty to remove all namespace
prefixes, and to remove the <uri> element (I don't understand why <sri>
needs to be inside <uri>), and to not use any of <scheme>,
<authority>,..., as the DTD would require because this is only a
relative URI/IRI).
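
To make the size difference concrete, here is a minimal Python sketch that
builds the nested form from a bare fragment and compares it with the compact
attribute form. The element names (use, href, sri, fragment) are taken from
the simplified example above, not from the draft's DTD, and the helper name
fragment_to_sri is hypothetical.

```python
import xml.etree.ElementTree as ET


def fragment_to_sri(fragment: str) -> str:
    """Serialize a fragment-only reference in the simplified nested form.

    Assumption: element names follow the simplified example in this mail
    (no namespaces, no <uri> wrapper), not the draft's actual DTD.
    """
    use = ET.Element("use")
    href = ET.SubElement(use, "href")
    sri = ET.SubElement(href, "sri")
    ET.SubElement(sri, "fragment").text = fragment
    # encoding="unicode" returns a str and keeps non-ASCII characters as-is
    return ET.tostring(use, encoding="unicode")


compact = '<use xlink:href="#étoile" />'
nested = fragment_to_sri("étoile")

print(len(compact), compact)
print(len(nested), nested)
```

Even without indentation or namespace prefixes, the nested serialization is
more than twice as long as the attribute form for the same reference.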


The third technical problem is that SRIs don't solve the comparison
problem. In another mail, John Klensin wrote:

 > I note that draft-ietf-iri-comparison seems intimately tied to
 > (1).  The intent behind (2) includes standardizing information
 > sufficiently that a simple XML structured comparison (i.e.,
 > ignoring irrelevant white space) should suffice without
 > identifier- or scheme-specific comparison rules.

It is essentially impossible to avoid scheme-specific comparison rules.
The reason is that URIs/IRIs can include identifier components from
arbitrary third-party identifier systems.

For an example, let's look at the tel: scheme
(http://tools.ietf.org/html/rfc3966) for telephone numbers.
tel:+1-201-555-0123 is equivalent to tel:+12015550123 (and
tel:+12-0155-501-23 and a few other variants, just to show how this
works). There are some other equivalence rules for tel: URIs, please see
http://tools.ietf.org/html/rfc3966#section-4. Every time such an example
is discovered, this may be addressed by creating the necessary
element(s) in SRIs, but this is a task that will never end.
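
As an illustration of why a generic structural comparison is not enough, the
following Python sketch compares tel: URIs after removing the visual
separators that RFC 3966 defines ("-", ".", "(", ")"). It is a deliberately
partial sketch: it ignores parameters such as extensions and phone-context,
which the RFC's comparison rules also cover, and the function names are
hypothetical.

```python
VISUAL_SEPARATORS = "-.()"  # visual-separator characters from RFC 3966


def tel_number_key(uri: str) -> str:
    """Return the number part of a tel: URI with visual separators removed.

    Assumption: parameters after ';' are simply dropped here; a complete
    comparison would have to take them into account.
    """
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "tel":
        raise ValueError("not a tel: URI: " + uri)
    number = rest.split(";", 1)[0]
    return number.translate(str.maketrans("", "", VISUAL_SEPARATORS))


def tel_equivalent(a: str, b: str) -> bool:
    """Scheme-specific comparison: equal once separators are ignored."""
    return tel_number_key(a) == tel_number_key(b)


variants = ["tel:+1-201-555-0123", "tel:+12015550123", "tel:+12-0155-501-23"]

# Plain string comparison treats all three as different identifiers.
print(len(set(variants)))
# The scheme-aware comparison treats them as the same number.
print(all(tel_equivalent(variants[0], v) for v in variants))
```

A comparison that only looks at element structure and text content would have
the same blind spot as the plain string comparison, unless the tel: rules were
built into it.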


So my current conclusion is that SRIs are not suited for presentation,
are ill-suited for inclusion into protocols/formats, and don't
fundamentally help with processing problems such as comparison.

Regards,    Martin.


Re: Browser bars, etc.

Martin J. Dürst
In reply to this post by John C Klensin
On 2012/07/10 16:16, John C Klensin wrote:

> --On Monday, July 09, 2012 20:03 +0000 Dave Thaler
> <[hidden email]>  wrote:

>> Browsers today have an address bar.

>>    That address bar today
>> can often contain IRIs, which are more readable than URIs.
>
> I'll agree with "more readable" but note that there seems to be
> an industry trend toward more and more complex URIs which
> are usefully accessible only from favorites/bookmarks, imported
> references, and search mechanisms of some flavor.  Discussions
> in the IETF may lead to reinforcing that trend (e.g., proposals
> to incorporate hashes instead of user names and of signed URIs).
> If one assumes a URL that goes on for several lines, with most
> of the relevant tail components present, possibly embedding
> another URL, multiple query components, and so on, then
> readability is an unreasonable expectation whether the abstract
> text is ASCII or not.  Even if it is not, the marginal
> percentage readability improvement from IRIs is likely to be
> minuscule.

There is a trend to longer URIs/IRIs. But that doesn't mean that short
URIs are no longer used. And while the readability improvements may be
marginal in some cases, they are absolutely crucial in others. Say a
student writes a report and uses some Web pages as references. The
readability difference between
    http://ru.wikipedia.org/wiki/Интернет
and
    http://ru.wikipedia.org/wiki/%D0%98%D0%BD%D1%82%D0%B5%D1%80%D0%BD%D0%B5%D1%82
for anybody who can read Russian or is otherwise familiar with the
Cyrillic script, is just huge.
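
The two forms above are related by UTF-8 percent-encoding, which can be
checked with Python's standard library. This is only a sketch for the path
component of this one example; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

iri_path = "/wiki/Интернет"

# quote() encodes the string as UTF-8 and percent-encodes every byte
# outside the unreserved set; "/" is kept because it is in the default
# `safe` argument.
uri_path = quote(iri_path)
print(uri_path)

# For this example the mapping is reversible.
print(unquote(uri_path) == iri_path)

# Each Cyrillic letter takes two UTF-8 bytes, so one visible character
# becomes six characters in the URI form.
print(len(iri_path), len(uri_path))
```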


> Even the structure of the address bar doesn't help much.  If the
> number of characters in an IRI or URI considerably exceeds the
> width of the address bar, readability and usability suffer: I
> have yet to see a browser that handles over-long URI strings in
> the address bar (such as by scrolling) really well.   IRIs might
> help by virtue of being shorter, but the window in which that is
> of large benefit to readability is not huge.

So IRIs help because they show the characters rather than %-encoding,
AND because they make things shorter. Good point.


>> The SRI isn't amenable to that UI paradigm which was based on
>> the URI-is-an-IRI principle.
>
> If one preserves the [every] URI-is-a-[valid-]IRI principle, a
> number of convenient things happen.  Some inconvenient things
> happen too, a few of which SM and I tried to identify in the
> document.  But one of the reasons this I-D or something very
> like it wasn't written a half-dozen years ago was that there
> seemed to be general agreement that URI-is-an-IRI made sense.
> In that context, IRIs were no worse for localization than IDNs:
> perhaps, if one had a good "above DNS" layer that would do the
> kinds of language-based matching that end users had been
> encouraged to expect, IDNs would be almost useless because that
> layer could translate directly to strings with which the user
> would not need to deal, but, especially with the dual
> relationship between A-labels and U-labels introduced in
> IDNA2008, they would not be harmful.
>
> But, in that regard, IRIs ran into something of the same problem
> that IDNA2003 ran into: if one could not reverse the IRI -> URI
> transformation and reliably get back the same IRI, then there
> are uncomfortable side effects (including on the user's ability
> to interpret the address bar regardless of how readable it is).
> There were other issues as well, and the WG (or at least an
> active fraction of it) concluded that IRIs should be independent
> protocol elements, similar to URIs but independent of them and
> perhaps most suitable for new protocols.

You seem to be confused. There is absolutely no change from "every URI
is an IRI". At no point did the WG conclude that IRIs should be totally
independent of URIs. Of course IRIs, like IDNs, are easier to introduce
into new protocols. For existing protocols, whether introducing IRIs
where up to now only URIs have been in use makes sense depends a lot on
existing implementations and so on. Again, that's the same as for IDNs.


> Depending on how far that goes --and I really don't have a clear
> picture of how far it is going-- URI-is-an-IRI works for RFC
> 3987 IRIs but may not apply for draft-ietf-iri-3987bis IRIs.

It applies. If there's a place in the draft where it doesn't, please
tell us and we'll make sure we'll fix it.


>>   In this proposal to get rid of
>> that principle, it would be worth discussing UI paradigm
>> principles, since the XML representation  can't be easily
>> rendered in such a box.
>
> Yes, but note that draft-ietf-iri-3987bis proposes to get rid of
> that principle too.  Not as dramatically of course -- under
> draft-ietf-iri-3987bis, many URIs will still be IRIs

All URIs are still IRIs. If you know about a URI that isn't an IRI
anymore, please just put it in your next mail, and we'll have a look at
it and fix it.


> -- but,
> unless the WG reverses itself, the discussion is not about
> whether to get rid of the principle or not but about what to
> replace the discarded principle with.

The WG doesn't have to reverse itself.

>>    That means without IRIs, it seems
>> one would either need a radically different paradigm, or one
>> would need to always display the less-readable URI form.   I
>> don't think always rendering the URI form would be any better
>> for usability, so to me this proposal is tightly tied to the
>> discussion of alternate rendering paradigms.
>
> Agreed.  But, again, that is "without IRIs that preserve the
> URIs-are-always-IRIs principle"

Which we don't have.

> and "with many longer and more
> URIs even without i18n",

Please note that the URI/IRI ecosystem is quite well organized in that
URIs/IRIs that need to be short get made short (the extreme example is
URI/IRI shorteners so that they fit into Twitter messages), and
URIs/IRIs that benefit from using words rather than just arbitrary
character combinations use them. On the other hand, where it's less
important, URIs (and IRIs) may become overly long and cryptic. In many
ways, the principles of "natural" selection apply. If you don't make a
URI/IRI short where that helps, then the URI/IRI (actually, the resource
behind it) just isn't used.


Regards,    Martin.