Bidi Doc

Shawn Steele
It's been a while since I've taken a look at this document.

IMO the embedding display forcing LTR behavior still doesn't match the feedback I've received from many Arabic speakers, though it does seem to fit the expectations of other BIDI speakers.  It would appear that some users would be best served by RLE-type behavior instead.

Unfortunately this appears to be a user preference and appears to be influenced by their life experience, not necessarily tied directly to the content language.  Users with strong math or CS backgrounds seem more likely to find the LTR behavior acceptable.  If someone's going to read an IRI to someone over the phone, it needs to be in the order they'd read/type it.

FWIW:  Outside of skilled users, the structure of the actual IRI is opaque.  E.g., to a PhD-educated user in a different field, www.foo.com means "foo company's spot on the web", somehow reversing the order of the domain.

-Shawn

RE: Bidi Doc

masinter
Shawn, I'm not sure what part of the Bidi IRI spec would be affected by your comments.  Could you be more specific about which section they refer to?

Thanks,

Larry


-----Original Message-----
From: Shawn Steele [mailto:[hidden email]]
Sent: Monday, February 27, 2012 8:39 AM
To: Adil Allawi; Najib Tounsi
Cc: [hidden email]
Subject: Bidi Doc


RE: Bidi Doc

Shawn Steele
The whole thing?  Starting with 2:

   "Bidirectional IRIs MUST be rendered by using the Unicode
   Bidirectional Algorithm [UNIV6], [UNI9].  Bidirectional IRIs MUST be
   rendered in the same way as they would be if they were in a left-to-
   right embedding; i.e., as if they were preceded by U+202A, LEFT-TO-
   RIGHT EMBEDDING (LRE), and followed by U+202C, POP DIRECTIONAL
   FORMATTING (PDF).  Setting the embedding direction can also be done
   in a higher-level protocol (e.g., the dir='ltr' attribute in HTML)."

That forbids display such as FED.CBA//:http, or more to the point FED.CBA, which many bidi speakers find more intuitive.
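
As an illustration of the embedding the quoted text describes, here is a minimal Python sketch. The helper name `embed` and the sample address are hypothetical; only the code points U+202A (LRE) and U+202C (PDF) come from the quoted text, and U+202B (RLE) is the right-to-left counterpart behind the "RLE-type behavior" mentioned earlier in the thread. It only wraps the string; how the result is actually drawn depends on the bidi implementation of whatever displays it.

```python
# Unicode bidirectional formatting characters.
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed(iri: str, direction: str = "ltr") -> str:
    """Wrap an IRI in an explicit directional embedding (hypothetical helper)."""
    if direction == "ltr":
        return LRE + iri + PDF  # what the quoted "MUST" describes
    if direction == "rtl":
        return RLE + iri + PDF  # the alternative some readers may prefer
    raise ValueError("direction must be 'ltr' or 'rtl'")


wrapped = embed("http://example.com")  # hypothetical address
# Show the code points of the first and last characters.
print([f"U+{ord(c):04X}" for c in (wrapped[0], wrapped[-1])])
# ['U+202A', 'U+202C']
```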

Like I said, our investigation has indicated that this is a user preference.  People with strong math, CS or other such backgrounds are happy with the more computer-like display.  Laymen (for lack of a better word), and some professionals, seem much happier with the RTL label ordering.  There appears to also be a cultural bias, not just math/CS, but it's not perfect.

I think the best case is the "tell me what site you went to" scenario.  For the logical name ABC.DEF, someone on the phone is going to say "I went to A B C <dot> D E F".  And that's what the user is going to type.  If an RTL speaker is transcribing that logical order "ABC.DEF" off the side of the bus, they're going to be reading and writing it naturally from RTL, "FED.CBA" (visual order).  If they were to read the visual "CBA.FED" from the side of a bus, they'd naturally say "D E F <dot> A B C", which is wrong for logical order, and won't work when the guy on the other end of the phone tries to type it in.  They might be able to be trained to read it in a funny way, but I don't think that's at all natural for many speakers.
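
The two orderings in this scenario can be sketched in a few lines of Python. This is a toy model, not an implementation of the Unicode Bidirectional Algorithm (which also has to resolve neutral characters such as the dot). It assumes that every label consists purely of RTL characters, written here as uppercase letters as in the message above; the function names `visual_order` and `read_right_to_left` are made up for the example.

```python
def visual_order(logical: str, label_order: str) -> str:
    """Left-to-right glyph sequence for a dotted name whose labels are all RTL."""
    labels = logical.split(".")
    # Inside an RTL label the first logical character is drawn rightmost,
    # so scanning the glyphs left to right yields the reversed label.
    drawn = [label[::-1] for label in labels]
    if label_order == "rtl":
        drawn.reverse()  # labels themselves also run right to left
    elif label_order != "ltr":
        raise ValueError("label_order must be 'ltr' or 'rtl'")
    return ".".join(drawn)


def read_right_to_left(visual: str) -> str:
    """What a reader scanning the display from right to left would say or type."""
    return visual[::-1]


for order in ("ltr", "rtl"):
    shown = visual_order("ABC.DEF", order)
    print(order, shown, read_right_to_left(shown))
# ltr CBA.FED DEF.ABC
# rtl FED.CBA ABC.DEF
```

Only the RTL label ordering gives back the logical name "ABC.DEF" when the display is read naturally from right to left, which is the round trip the phone scenario depends on.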

To this point, how IRIs such as http://buy.stuff.com/get/your/stuff/here.html?user=23456&account=abcd are displayed is far less important to most people than the display ads which say "pepsi.com".

We computer scientists will be able to read the IRI no matter how it's displayed; we'll know what the spec says, or close enough anyway.  It's the user trying to go to "pepsi.com" that is the most important case.  How we handle the rest of the IRI should be based on that.  We shouldn't force some behavior on the domain name because we have trouble figuring out what to do with query strings.

-Shawn

-----Original Message-----
From: Larry Masinter [mailto:[hidden email]]
Sent: Friday, March 02, 2012 12:32 AM
To: Shawn Steele; Adil Allawi; Najib Tounsi
Cc: [hidden email]
Subject: RE: Bidi Doc


RE: Bidi Doc

masinter
I'm not sure it is a "user" preference as much as it is a situational one. And I think we need to really back away from the idea that IRIs can meet a requirement of "intuitive presentation".

The problem is over-constrained. Between URI and IRI there is a difference in how they meet the conflicting requirements of "global transcription" vs. "local ease of use", where IRIs reflect a design choice to allow more reasonable local names.

However, we still need to maintain at least some level of transcription interoperability: it should at least be possible to construct IRIs such that, when they are displayed by ordinary Unicode display methods and the displayed form is then entered by the user via keyboard, speech, or some other method, the result is the "same" IRI.

You are questioning a "MUST", though, in "Bidirectional IRIs MUST be rendered by using the ...".

> That forbids display such as FED.CBA//:http, or more to the point FED.CBA, which many bidi speakers find more intuitive.

I'm not sure "FED.CBA" counts as "rendered", though.

Would it help if we said that the -bidi- document should be thought of as "best practices" rather than standards track?  Usually when you write a "MUST" it's clear what implementations might be affected. I don't know who would be non-compliant or how to test "Bidirectional IRIs MUST be rendered ...".

Larry


-----Original Message-----
From: Shawn Steele [mailto:[hidden email]]
Sent: Friday, March 02, 2012 9:25 AM
To: Larry Masinter; Adil Allawi; Najib Tounsi
Cc: [hidden email]
Subject: RE: Bidi Doc



RE: Bidi Doc

Shawn Steele
Quick comment: global transcription cannot happen.  I don't know how to write (or type) Arabic or Chinese or...

However, someone who can read it does understand RTL behavior.  I shouldn't force my LTR bias on someone who uses that language.

There's also no real support issue here... So long as all the pieces are ordered consistently, it's easy to recognize which label is first or last.


Sent from my Windows Phone 7

From: Larry Masinter
Sent: 3/2/2012 11:31 AM
To: Shawn Steele; Adil Allawi; Najib Tounsi
Cc: [hidden email]
Subject: RE: Bidi Doc

