RE: Words and spaces

Glenn Adams

Chris,

Thanks for your comment. The TT WG has reviewed this comment and has
agreed upon the following response:

Regarding your question, it depends upon whether the language or writing
system is unknown or unspecified. If either of these cases holds, then,
according to rule 2 of section 8.3.7, each of your examples except the
last would be interpreted as one word. The last would be interpreted as
two words, presuming that the ' ' between "Masayasu" and "Ishikawa" is
represented as #x20. In contrast, if the language or writing system is
known, e.g., if xml:lang="en" is specified on the root element (and no
override appears), then a word unit is determined in accordance with the
rules of that language or writing system. DFXP does not specify these
latter rules in an interoperable manner (nor does Unicode).
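As an illustration only, the Python sketch below applies rule 2 to the six example paragraphs from the original message, treating the language as unknown for every one of them (i.e. ignoring xml:lang). It assumes "XML whitespace" means exactly the four characters of the XML 1.0 S production (#x20, #x9, #xD, #xA); the helper names rule2_words and word_counts are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# XML 1.0 whitespace (production S): space, tab, carriage return, line feed.
# U+3000 and U+2004 are Unicode space characters but are NOT XML whitespace.
XML_WS = re.compile(r"[\x20\x09\x0D\x0A]+")

# The six paragraphs from the original message, as XML source text.
EXAMPLES = [
    '<p>Hello&#x3000;World</p>',
    '<p xml:lang="en">Hello&#x3000;World</p>',
    '<p xml:lang="en">Hello&#x2004;World</p>',
    '<p xml:lang="ja">Hello&#x3000;World</p>',
    '<p xml:lang="ja">Hello&#x2004;World</p>',
    '<p xml:lang="ja">Masayasu Ishikawa</p>',
]

def rule2_words(text):
    """Hypothetical helper: split on runs of XML whitespace (rule 2)."""
    return [w for w in XML_WS.split(text) if w]

def word_counts():
    counts = []
    for src in EXAMPLES:
        # The XML parser resolves &#x3000; etc. into the actual characters;
        # xml:lang is deliberately ignored (language treated as unknown).
        text = ET.fromstring(src).text or ""
        counts.append(len(rule2_words(text)))
    return counts

if __name__ == "__main__":
    print(word_counts())  # [1, 1, 1, 1, 1, 2]
```

Under these assumptions only the final paragraph, whose separator is a plain #x20, splits into two words, which matches the response above.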

Regards,
Glenn


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On
Behalf Of Chris Lilley
Sent: Saturday, June 03, 2006 2:04 AM
To: [hidden email]
Subject: Words and spaces


Hello public-tt,

In section 8.3.7 <flowFunction>:

 The dynamic flow unit word must be interpreted as being dependent upon
 the language or writing system of the affected content. If the language
 or writing system is unknown or unspecified, then word is interpreted
 as follows:

   1. If the affected content consists solely or mostly of Unified CJK
   Ideographic characters or of characters of another Unicode character
   block that are afforded similar treatment to that of Unified CJK
   Ideographic characters, then word is to be interpreted as if
   character were specified.
   
   2. Otherwise, word is to be interpreted as denoting a sequence of one
   or more characters that are not interpreted as an XML whitespace
   character.
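The two fallback rules above can be sketched in Python roughly as follows. This is not taken from the DFXP specification; it rests on stated assumptions: "Unified CJK Ideographic characters" is approximated by the CJK Unified Ideographs block plus Extension A, other blocks "afforded similar treatment" are not modelled, "solely or mostly" is read as "more than half of the non-whitespace characters", and in character mode whitespace is dropped rather than counted. The function names are hypothetical.

```python
import re

# XML 1.0 whitespace (production S): #x20, #x9, #xD, #xA and nothing else.
XML_WS_CHARS = {"\x20", "\x09", "\x0D", "\x0A"}
XML_WS_RUN = re.compile(r"[\x20\x09\x0D\x0A]+")

# ASSUMPTION: approximate "Unified CJK Ideographic characters" by the
# CJK Unified Ideographs block and Extension A only.
CJK_RANGES = [(0x4E00, 0x9FFF), (0x3400, 0x4DBF)]

def is_cjk_ideograph(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)

def fallback_word_units(text):
    """Sketch of the 'word' unit when language/writing system is unknown."""
    non_ws = [ch for ch in text if ch not in XML_WS_CHARS]
    if not non_ws:
        return []
    cjk = sum(1 for ch in non_ws if is_cjk_ideograph(ch))
    # Rule 1. ASSUMPTION: "solely or mostly" = more than half of the
    # non-whitespace characters are CJK ideographs.
    if cjk * 2 > len(non_ws):
        # Treat as if 'character' were specified: one unit per
        # non-whitespace character (whitespace handling is an assumption).
        return non_ws
    # Rule 2: maximal runs of characters that are not XML whitespace.
    return [w for w in XML_WS_RUN.split(text) if w]
```

For example, fallback_word_units("日本語") yields one unit per ideograph, while fallback_word_units("Hello\u3000World") falls through to rule 2 and yields a single word, because U+3000 is not XML whitespace.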

Noting the "must", which is a testable conformance requirement, do the
following paragraphs contain one word or two?

<p>Hello&#x3000;World</p>
<p xml:lang="en">Hello&#x3000;World</p>
<p xml:lang="en">Hello&#x2004;World</p>
<p xml:lang="ja">Hello&#x3000;World</p>
<p xml:lang="ja">Hello&#x2004;World</p>
<p xml:lang="ja">Masayasu Ishikawa</p>

For a list of Unicode space characters, see for example
http://www.cs.tut.fi/~jkorpela/chars/spaces.html
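The crux of the question is that Unicode defines many space characters while rule 2 only recognizes XML whitespace. A small Python sketch using the standard unicodedata module makes the distinction concrete for the characters in the examples; treating general category Zs as "Unicode space" is an assumption of this sketch, and the classify helper is hypothetical.

```python
import unicodedata

# XML 1.0 whitespace (production S).
XML_WHITESPACE = {"\x20", "\x09", "\x0D", "\x0A"}

def classify(cp):
    """Hypothetical helper: compare Unicode 'space' status with XML whitespace."""
    ch = chr(cp)
    return {
        "codepoint": f"U+{cp:04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        # ASSUMPTION: 'Unicode space' = general category Zs (space separator).
        "unicode_space": unicodedata.category(ch) == "Zs",
        "xml_whitespace": ch in XML_WHITESPACE,
    }

if __name__ == "__main__":
    for cp in (0x0020, 0x00A0, 0x2004, 0x3000):
        print(classify(cp))
```

U+2004 THREE-PER-EM SPACE and U+3000 IDEOGRAPHIC SPACE are both space separators in Unicode, yet neither is XML whitespace, so rule 2 does not treat them as word boundaries.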


--
 Chris Lilley                    mailto:[hidden email]
 Interaction Domain Leader
 Co-Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
 Co-Chair, W3C Hypertext CG



Re: Words and spaces

Chris Lilley

On Thursday, July 27, 2006, 5:07:43 PM, Glenn wrote:

GAA> Chris,

GAA> Thanks for your comment. The TT WG has reviewed this comment and has
GAA> agreed upon the following response:

GAA> Regarding your question, it depends upon whether the language or writing
GAA> system is unknown or unspecified. If either of these cases holds, then,
GAA> according to rule 2 of section 8.3.7, each of your examples except the
GAA> last would be interpreted as one word. The last would be interpreted as
GAA> two words, presuming that the ' ' between "Masayasu" and "Ishikawa" is
GAA> represented as #x20. In contrast, if the language or writing system is
GAA> known, e.g., if xml:lang="en" is specified on the root element (and no
GAA> override appears), then a word unit is determined in accordance with the
GAA> rules of that language or writing system. DFXP does not specify these
GAA> latter rules in an interoperable manner (nor does Unicode).

Thank you for the clarification. This response is satisfactory to me.
