Future of Internationalized Identifiers

Future of Internationalized Identifiers

Martin J. Dürst
Like others, I'd like to express my personal view of the future of
internationalized identifiers.

At the start of internationalization, it was very clear that content had
to come first. Fortunately, today, it's easy to write a Web page or an
email in almost any language one wishes.

Identifiers were the next area of concern. In some contexts, e.g. file
names, users now take them for granted. On any OS, it would be a big
hassle if a user had to cook up a romanized or translated name for a
document just because the OS was ASCII-only. I can say that from my own
experience; working in Japan, I get lots of job-related documents with
non-ASCII names, and create some myself. This makes me feel the need for
internationalized identifiers every day.

In the IETF, there has also been a lot of work. Internationalized Domain
Names have come a long way since I put out the first proposal in 1996.
Email Address Internationalization (EAI) went through an experimental
phase and is now very close to completing Proposed Standards. Other
technologies such as XMPP use non-ASCII identifiers, too. With
stringprep and precis, the IETF has also done a lot of work in the area
of equivalence of internationalized identifiers.

For URIs and IDNs, internationalization is available via ACE
(ASCII-compatible encoding). This addresses low-level
backwards-compatibility issues (e.g. HTTP/1.1). But the user obviously
wants to see räksmörgås, and is annoyed by xn--rksmrgs-5wao1o or
r%C3%A4ksm%C3%B6rg%C3%A5s. EAI and XMPP take this a step further: they
just use UTF-8. For EAI, this is amazing because for the longest time,
there was only the mantra "email will always stay 7-bit".
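
To make the three forms above concrete, here is a minimal Python sketch. It
assumes Python's built-in "idna" codec (which implements the older IDNA 2003
rules, not IDNA 2008) and urllib.parse.quote from the standard library; the
helper names to_ace and to_percent_encoded are hypothetical and chosen only
for illustration.

```python
from urllib.parse import quote, unquote

# Written with escapes so the label is in NFC regardless of editor settings.
LABEL = "r\u00e4ksm\u00f6rg\u00e5s"  # räksmörgås


def to_ace(label: str) -> str:
    """Encode one domain label as ACE (Punycode with the xn-- prefix).

    Assumption: uses Python's built-in IDNA 2003 codec.
    """
    return label.encode("idna").decode("ascii")


def to_percent_encoded(segment: str) -> str:
    """Percent-encode the UTF-8 bytes of a URI path segment."""
    return quote(segment, safe="")


if __name__ == "__main__":
    print(to_ace(LABEL))              # xn--rksmrgs-5wao1o
    print(to_percent_encoded(LABEL))  # r%C3%A4ksm%C3%B6rg%C3%A5s
    # What a UTF-8-only protocol would carry on the wire: the raw bytes.
    print(LABEL.encode("utf-8"))
    # Both ASCII forms are reversible, so no information is lost.
    print(to_ace(LABEL).encode("ascii").decode("idna") == LABEL)
    print(unquote(to_percent_encoded(LABEL)) == LABEL)
```

Both ASCII forms round-trip to the original string; the difference is purely
in what has to be decoded before a user can read it.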

So things move, even if not at a very fast pace. And my prediction is
that they will continue to move. Users prefer to see what they can read.
Implementers prefer UTF-8 rather than a charset zoo. If a new protocol
or format (in the IETF or elsewhere) is UTF-8 only, there is not much
point in transferring URIs or IDNs in ACE form. But neither are these just
presentation elements, or just something that needs pre-processing.


Based on this background:

* I support closing the IETF IRI WG. Most of the work on IRIs (from ca.
1995 to 2009) was done without a WG. A WG is not a precondition for work
to get done, and not a way to make work magically faster.

* Some time after RFC 3987 was published, I started to update it
(http://tools.ietf.org/html/draft-duerst-iri-bis-00) to take into
account errata and feedback from implementers. I plan to continue this work.

* I will continue to support Anne in his work on the WHATWG URL spec. In
particular, documenting browser bugwards-compatibility and getting the
browsers to align their implementations in this area is very important,
and very hard.

* There are implementations other than browsers, and technologies other
than HTML and its surroundings. Browsers have very peculiar market
pressures on bugwards compatibility that fortunately don't apply in the
same way to other implementations. Also, other implementations process
URIs/IRIs/URLs in ways that differ from browsers. I plan to work to
make sure these needs are covered, too, in whatever form that may take.

* I hope that we can find a good way to proceed with RFC 4395bis
(registration), and am willing to contribute. There is a lot of good
stuff in there registration-wise and internationalization-wise. Of the
four WG specs, it is the one with the most open issues, but probably the
one which can be moved forward most quickly.


Regards,   Martin.

Re: Future of Internationalized Identifiers

Peter Saint-Andre-2
Hi Martin,

On 11/12/12 4:14 AM, "Martin J. Dürst" wrote:

> Like others, I'd like to express my personal view of the future of
> internationalized identifiers.
>
> At the start of internationalization, it was very clear that content had
> to come first. Fortunately, today, it's easy to write a Web page or an
> email in almost any language one wishes.
>
> Identifiers were a next area of concern. In some contexts, e.g. file
> names, users now take them for granted. On any OS, it would be a big
> hassle if a user had to cook up a romanized or translated name for a
> document just because the OS was ASCII-only. I can say that from my own
> experience; working in Japan, I get lots of job-related documents with
> non-ASCII names, and create some myself. This lets me feel the need for
> internationalized identifiers every day.
>
> In the IETF, there has also been a lot of work. Internationalized Domain
> Names have come a long way since I put out the first proposal in 1996.
> Email Address Internationalization (EAI) went through an experimental
> phase and is now very close to completing Proposed Standards. Other
> technologies such as XMPP use non-ASCII identifiers, too. With
> stringprep and precis, the IETF has also done a lot of work in the area
> of equivalence of internationalized identifiers.
>
> For URIs and IDNs, internationalization is available via ACE
> (ASCII-compatible encoding). This addresses low-level
> backwards-compatibility issues (e.g. HTTP 1.1). But the user obviously
> wants to see räksmörgås, and is annoyed by xn--rksmrgs-5wao1o or
> r%C3%A4ksm%C3%B6rg%C3%A5s. EAI and XMPP take this a step further: they
> just use UTF-8. For EAI, this is amazing because for the longest time,
> there was only the mantra "email will always stay 7-bit".
>
> So things move, even if not at a very fast pace. And my prediction is
> that they will continue to move. Users prefer to see what they can read.
> Implementers prefer UTF-8 rather than a charset zoo. If a new protocol
> or format (in the IETF or elsewhere) is UTF-8 only, there is not much of
> a point to transfer URIs or IDNs in ACE form. But neither are these just
> presentation elements, or just something that needs pre-processing.

That all seems reasonable.

> Based on this background:
>
> * I support closing the IETF IRI WG.

Sadly, I concur with you and Larry here.

> Most of the work on IRIs (from ca.
> 1995 to 2009) was done without a WG. A WG is not a precondition for work
> to get done, and not a way to make work magically faster.

True. As you know, I've put a lot of work into reaching for a successful
conclusion to the IRI WG (not as much as you and Larry, for sure), and
I'm disappointed that we were not able to involve more participants
and contributors. As SM notes in his reply to Larry, that seems to be
the way of the world with regard to internationalization (and even
something as fundamental as URIs).

> * Some time after RFC 3987 was published, I started to update it
> (http://tools.ietf.org/html/draft-duerst-iri-bis-00) to take into
> account errata and feedback from implementers. I plan to continue this
> work.

I'm happy to hear it.

> * I will continue to support Anne in his work on the WHATWG URL spec. In
> particular, documenting browser bugwards-compatibility and getting the
> browsers to align their implementations in this area is very important,
> and very hard.

Indeed.

> * There are other implementations than browsers and other technologies
> than HTML and its surroundings. Browsers have very peculiar market
> pressures on bugwards compatibility that fortunately don't apply in the
> same way to other implementations. Also, other implementations are
> processing URIs/IRIs/URLs in other ways than browsers. I plan to work to
> make sure these needs are covered, too, in whatever form that may take.

Great.

> * I hope that we can find a good way to proceed with RFC 4395bis
> (registration), and am willing to contribute. There is a lot of good
> stuff in there registration-wise and internationalization-wise. Of the
> four WG specs, it is the one with the most open issues, but probably the
> one which can be moved forward most quickly.

Agreed. One possibility is spinning up a small WG in the IETF
Applications Area dedicated to simplifying and modernizing registration
requirements for a variety of technologies (URIs/IRIs, link relations,
etc.). I think it would be good to work on these updates in a
semi-coordinated fashion.

Peter

--
Peter Saint-Andre
https://stpeter.im/