Restrictions on domain names for Top Level Domains (TLDs) for bidi document

masinter
By the way,

I've read over your notes about RTL-TLD, and I'm uncertain how this might be reflected
in the IRI document or even the BIDI document except perhaps as an informational reference
to the IDN specification.

While it's interesting, it doesn't seem to add or remove any restrictions on what a
"legal" IRI is, or how to process an IRI -- or am I missing something?

Larry

===========

>
> *Restrictions on domain names for Top Level Domains (TLDs)*
>
> *Definition:* Right-To-Left Top Level Domains (RTL-TLD). These are
> top-level domains that are in languages using right-to-left
> characters. Namely the Unicode bidi class of the characters that make
> up the TLD is either R or AL (see UAX 9).
>
> Because an IRI must always be rendered left-to-right (see section 2),
> there are a number of cases where an RTL-TLD renders in a way that
> makes it visually unclear which label is the TLD in a particular URL.
> For example:
>
> Logical representation: http://abc.def.GHI/JKL
> Visual representation:  http://abc.def.LKJ/IHG
>
> In the above case the path appears after the registered domain and is
> in the visual location of the TLD. This can confuse the reader as to
> which is the actual TLD. In order to restrict such confusing cases the
> following rules will apply:
>
> 1. An RTL-TLD is a TLD which is in a language where the characters
> draw right to left. An LTR-TLD is a TLD which is in a language where
> the characters draw left to right.
> 2. The characters in an RTL-TLD MUST always be of the same Unicode
> bidi class.
> 3. The characters of a registered domain MUST match the Unicode bidi
> class of the TLD if the TLD is an RTL-TLD.
> 4. If the characters of a registered domain contain more than one bidi
> class, the domain MUST be registered to an LTR-TLD.
>
> The MUST-level restrictions guarantee that the registered domain and its
> corresponding TLD will always appear together and in the same order in
> all possible IRIs. There may be cases where numbers and bidi neutral
> characters may be reordered by the Unicode bidi algorithm in a way
> that changes their visual position relative to the TLD. The above
> rules prevent such cases. If the domain registrar needs to register a
> name that contains characters that are mixed direction (e.g. contains
> numbers, punctuation or LTR characters) then the domain can still be
> registered with a TLD that has left to right characters.
>
> Examples:
>
> A. This is a good case - the TLD is visually followed by the domain:
>
> Logical representation: http://ABC.DEF.GHI/jkl
> Visual representation:  http://IHG.FED.CBA/jkl
>
> B. With an LTR sub-domain there is a sub-optimal case where the path
> appears next to the sub-domain. But in this case it is still clear
> where the TLD and registered domain are in the IRI:
>
> Logical representation: http://abc.DEF.GHI/JKL
> Visual representation:  http://abc.LKJ/IHG.FED
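
Rules 2 to 4 of the quoted proposal can be sketched as a small check on a (registered domain, TLD) pair. This is only an illustration under stated assumptions, not part of the proposal: the function names are hypothetical, a TLD is treated as an RTL-TLD as soon as it contains any character of bidi class R or AL, and "match" in rule 3 is read as "the set of bidi classes is identical". Bidi classes come from Python's standard `unicodedata.bidirectional`.

```python
import unicodedata

# Bidi classes that the proposal's definition treats as right-to-left.
RTL_CLASSES = {"R", "AL"}


def bidi_classes(label):
    """Return the set of Unicode bidi classes used by the characters of a label."""
    return {unicodedata.bidirectional(ch) for ch in label}


def check_registration(registered, tld):
    """Return a list of rule violations for a registered domain under a TLD.

    Hypothetical helper: assumes a TLD counts as an RTL-TLD if any of its
    characters is of class R or AL. An LTR-TLD is never restricted.
    """
    problems = []
    tld_classes = bidi_classes(tld)
    domain_classes = bidi_classes(registered)

    if not (tld_classes & RTL_CLASSES):
        return problems  # LTR-TLD: rules 2 to 4 impose nothing

    if len(tld_classes) != 1:
        problems.append("rule 2: RTL-TLD mixes bidi classes")
    if domain_classes != tld_classes:
        problems.append("rule 3: registered domain does not match the TLD's bidi class")
    if len(domain_classes) > 1:
        problems.append("rule 4: mixed-class domain must use an LTR-TLD")
    return problems
```

Under these assumptions a name containing digits, such as `abc1`, passes under an LTR TLD like `com`, while any name that mixes an R-class letter with a digit is flagged under an RTL-TLD.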

Re: Restrictions on domain names for Top Level Domains (TLDs) for bidi document

Martin J. Dürst
Hello Larry,

On 2012/02/21 15:20, Larry Masinter wrote:
> By the way,
>
> I've read over your notes about RTL-TLD, and I'm uncertain how this might be reflected
> in the IRI document or even the BIDI document except perhaps as an informational reference
> to the IDN specification.

I think the problem that Adil describes happens when IDNs get integrated
into IRIs. An adjacent path component can make a bidi IDN that was
reasonably understandable on its own look quite different and difficult
to parse.
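
The effect of an adjacent path component can be reproduced with a toy reordering routine. The sketch below is a heavy simplification of the Unicode bidi algorithm, not an implementation of UAX 9: it assumes the convention of the quoted examples (uppercase ASCII stands for right-to-left characters, lowercase for left-to-right), treats every other character as neutral, assumes a left-to-right paragraph with no digits or explicit embeddings, and uses a hypothetical function name.

```python
def visual_order(logical):
    """Toy LTR-paragraph reordering: uppercase = RTL, lowercase = LTR, rest neutral."""
    classes = ["R" if c.isupper() else "L" if c.islower() else "N" for c in logical]
    n = len(classes)

    # Resolve neutral runs: R only when both neighbours are R, otherwise
    # they take the paragraph direction (L).
    i = 0
    while i < n:
        if classes[i] != "N":
            i += 1
            continue
        j = i
        while j < n and classes[j] == "N":
            j += 1
        before = classes[i - 1] if i > 0 else "L"
        after = classes[j] if j < n else "L"
        resolved = "R" if before == "R" and after == "R" else "L"
        for k in range(i, j):
            classes[k] = resolved
        i = j

    # Reverse each maximal right-to-left run; left-to-right runs stay in place.
    out = []
    i = 0
    while i < n:
        j = i
        while j < n and classes[j] == classes[i]:
            j += 1
        segment = logical[i:j]
        out.append(segment[::-1] if classes[i] == "R" else segment)
        i = j
    return "".join(out)


print(visual_order("http://abc.def.GHI/JKL"))  # http://abc.def.LKJ/IHG
print(visual_order("http://ABC.DEF.GHI/jkl"))  # http://IHG.FED.CBA/jkl
```

The first call shows the problem: the "/" sits between two right-to-left runs, so the TLD and the path merge into one run and swap visual positions. In the second call the path is left-to-right, the run ends at the TLD, and the domain name stays together.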

I have thought about this case little by little, and my current thinking
is that it might lead to restrictions very similar to those we already
have at the level of an individual component (e.g. a DNS label), but one
level higher, e.g. for all of the domain name, all of the path, and so on.

Another aspect is that Adil looked at the domain name first and foremost
because it's the component most vulnerable with respect to spoofing.

> While it's interesting, it doesn't seem to add or remove any restrictions on what a
> "legal" IRI is, or how to process an IRI -- or am I missing something?

To some extent, that may be a wording issue. Even if we don't want to
make such cases invalid, a strong warning may be in order.

Regards,    Martin.
