[iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for


iri issue tracker
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for

 What term should we use for the kind of text that the Unicode Bidi
 Algorithm was designed for? RFC 3987 and 3987bis use "running text";
 bidi-guidelines (-01) changed to "plain text".

 We have a definition for running text at
 http://tools.ietf.org/html/draft-ietf-iri-3987bis-10#section-1.3:

     running text:  Human text (paragraphs, sentences, phrases) with
        syntax according to orthographic conventions of a natural
        language, as opposed to syntax defined for ease of processing by
        machines (e.g., markup, programming languages).

 In RFC 3987, there are two uses:

 The Unicode Bidirectional Algorithm is designed mainly for running text.

 [UNIXML] is written in the context of running text rather than in that of
 identifiers.

 The first use moved to bidi-guidelines, but the second use is still in
 3987bis. In both cases, the term "plain text" isn't appropriate, because
 the main use of "plain text" is to distinguish it from "fancy text", i.e.
 text with styling and so on. But in both usages above, the distinction
 between "plain text" and "fancy text" is irrelevant. See also
 http://en.wikipedia.org/wiki/Plain_text.

--
----------------------+--------------------------------------
 Reporter:  duerst@…  |      Owner:  draft-ietf-iri-3987bis@…
     Type:  defect    |     Status:  new
 Priority:  major     |  Milestone:
Component:  3987bis   |    Version:
 Severity:  -         |   Keywords:
----------------------+--------------------------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118>
iri <http://tools.ietf.org/wg/iri/>



Re: [iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

Matitiahu Allouche
Since the question is related to Unicode (the kind of text that the Unicode Bidi Algorithm was designed for), maybe we should check the Unicode definition for "plain text". In the Unicode glossary (http://unicode.org/glossary/#P), we find:
Plain Text. Computer-encoded text that consists only of a sequence of code points from a given standard, with no other formatting or structural information. Plain text interchange is commonly used between computer systems that do not share higher-level protocols. (See also rich text.)


Personally, I find this definition appropriate for "the kind of text that the Unicode Bidi Algorithm was designed for", and I prefer "plain text" over "running text". It is also my experience that "plain text" is much more in use in Unicode circles than "running text".

Shalom (Regards),  Mati
      Bidi Architect
      Globalization Center Of Competency - Bidirectional Scripts
      IBM Israel
      Mobile: +972 52 2554160




From:        "iri issue tracker" <[hidden email]>
To:        [hidden email], [hidden email]
Cc:        [hidden email]
Date:        11/03/2012 14:03
Subject:        [iri] #118: What term to use for the kind of text that the Unicode  Bidi Algorithm was designed for





Re: [iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

Martin J. Dürst
Hello Mati,

Many thanks for your comments.

On 2012/03/11 22:07, Matitiahu Allouche wrote:

> Since the question is related to Unicode (the kind of text that the
> Unicode Bidi Algorithm was designed for), maybe we should check the
> Unicode definition for "plain text". In the Unicode glossary (
> http://unicode.org/glossary/#P), we find:
> Plain Text. Computer-encoded text that consists only of a sequence of code
> points from a given standard, with no other formatting or structural
> information. Plain text interchange is commonly used between computer
> systems that do not share higher-level protocols. (See also rich text.)
>
>
> Personally, I find this definition appropriate for "the kind of text that
> the Unicode Bidi Algorithm was designed for", and I prefer "plain text"
> over "running text". It is also my experience that "plain text" is much
> more in use in Unicode circles than "running text".

I agree that if we look at the distinction between plain text and rich
text, then it is appropriate to say that the Bidi Algorithm has been
designed for plain text rather than for rich text. But in the two places
in the spec where we have been using "running text" for the past seven
or more years, it's NOT this distinction between plain text and rich
text that we are after.

To be more specific, it's irrelevant whether an IRI shows up in a plain
text file (.txt) or a rich text file (e.g. MS Word, HTML with
stylesheets,...). We have exactly the same problems with bidi IRIs in
plain text as we have in rich text. This is because although the Bidi
Algorithm was designed for plain text, essentially the same algorithm is
used for rich text. For MS Word, there are usually a few tweaks where it
does not behave exactly the same as the Unicode Bidi Algorithm (the last
one of them is the special behavior regarding parentheses that was
presented and discussed at last year's IUC), but the basics are the
same. Rendered HTML also uses the Unicode Bidi Algorithm for its basic
features.

What the spec is referring to is the fact that the Bidi Algorithm was
designed for sequences of characters, words, and punctuation such as
they turn up in letters, newspaper articles, explanatory text in books,
and so on, as opposed to sequences of characters as they turn up in
artificial stuff such as IRIs, markup source, programming languages, and
so on.

I'm not sure whether "running text" is the best term for this, but I am
very sure "plain text" is wrong for where we want to use it, because
IRIs, markup source, programs, and so on are in many if not most cases
plain text. Running text at least seems to come close, see e.g. the
definition at http://en.wiktionary.org/wiki/running_text.

Regards,   Martin.




RE: [iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

Phillips, Addison-2
>
> I'm not sure whether "running text" is the best term for this, but I am very sure
> "plain text" is wrong for where we want to use it, because IRIs, markup source,
> programs, and so on are in many if not most cases plain text. Running text at
> least seems to come close, see e.g. the definition at
> http://en.wiktionary.org/wiki/running_text.
>

I'm pretty sure that 'running text' is too limiting as well. Is there a need for a specialized term here at all? How about 'text' as the term? Even such "off-line" formats as napkins and bus sides qualify then. As in: "Where an IRI appears in text...."

I notice that the term "running text" in section 1.3 appears exactly once in the document and there only provides a sort of informative explanation of UNIXML.

Addison

Re: [iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

iri issue tracker
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for


Comment (by adil@…):

 Mati writes:

 > Since the question is related to Unicode (the kind of text that the
 Unicode Bidi Algorithm was designed for), maybe we should check the
 Unicode definition for "plain text". In the Unicode glossary
 (http://unicode.org/glossary/#P), we find:
 >
 >Plain Text. Computer-encoded text that consists only of a sequence of
 code points from a given standard, with no other formatting or structural
 information. Plain text interchange is commonly used between computer
 systems that do not share higher-level protocols. (See also
 [http://unicode.org/glossary/#rich_text rich text].)
 >
 >Personally, I find this definition appropriate for "the kind of text that
 the Unicode Bidi Algorithm was designed for", and I prefer "plain text"
 over "running text". It is also my experience that "plain text" is much
 more in use in Unicode circles than "running text".

--
----------------------+---------------------------------------
 Reporter:  duerst@…  |       Owner:  draft-ietf-iri-3987bis@…
     Type:  defect    |      Status:  new
 Priority:  major     |   Milestone:
Component:  3987bis   |     Version:
 Severity:  -         |  Resolution:
 Keywords:            |
----------------------+---------------------------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:1>
iri <http://tools.ietf.org/wg/iri/>



Re: [iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

iri issue tracker
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for


Comment (by adil@…):

 Martin writes:
 >I agree that if we look at the distinction between plain text and rich
 text, then it is appropriate to say that the Bidi Algorithm has been
 designed for plain text rather than for rich text. But in the two places
 in the spec where we have been using "running text" for the past seven or
 more years, it's NOT this distinction between plain text and rich text
 that we are after.
 >
 >To be more specific, it's irrelevant whether an IRI shows up in a plain
 text file (.txt) or a rich text file (e.g. MS Word, HTML with
 stylesheets,...). We have exactly the same problems with bidi IRIs in
 plain text as we have in rich text. This is because although the Bidi
 Algorithm was designed for plain text, essentially the same algorithm is
 used for rich text. For MS Word, there are usually a few tweaks where it
 does not behave exactly the same as the Unicode Bidi Algorithm (the last
 one of them is the special behavior regarding parentheses that was
 presented and discussed at last year's IUC), but the basics are the same.
 Rendered HTML also uses the Unicode Bidi Algorithm for its basic features.
 >
 >What the spec is referring to is the fact that the Bidi Algorithm was
 designed for sequences of characters, words, and punctuation such as they
 turn up in letters, newspaper articles, explanatory text in books, and so
 on, as opposed to sequences of characters as they turn up in artificial
 stuff such as IRIs, markup source, programming languages, and so on.
 >
 >I'm not sure whether "running text" is the best term for this, but I am
 very sure "plain text" is wrong for where we want to use it, because IRIs,
 markup source, programs, and so on are in many if not most cases plain
 text. Running text at least seems to come close, see e.g. the definition
 at http://en.wiktionary.org/wiki/running_text.

--
----------------------+---------------------------------------
 Reporter:  duerst@…  |       Owner:  draft-ietf-iri-3987bis@…
     Type:  defect    |      Status:  new
 Priority:  major     |   Milestone:
Component:  3987bis   |     Version:
 Severity:  -         |  Resolution:
 Keywords:            |
----------------------+---------------------------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:2>
iri <http://tools.ietf.org/wg/iri/>



Re: [iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

iri issue tracker
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for


Comment (by adil@…):

 Addison writes:
 >I'm pretty sure that 'running text' is too limiting as well. Is there a
 need for a specialized term here at all? How about 'text' as the term?
 Even such "off-line" formats as napkins and bus sides qualify then. As in:
 "Where an IRI appears in text...."
 >
 >I notice that the term "running text" in section 1.3 appears exactly once
 in the document and there only provides a sort of informative explanation
 of UNIXML.

--
----------------------+---------------------------------------
 Reporter:  duerst@…  |       Owner:  draft-ietf-iri-3987bis@…
     Type:  defect    |      Status:  new
 Priority:  major     |   Milestone:
Component:  3987bis   |     Version:
 Severity:  -         |  Resolution:
 Keywords:            |
----------------------+---------------------------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:3>
iri <http://tools.ietf.org/wg/iri/>



Re: [iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

iri issue tracker
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for

Changes (by adil@…):

 * owner:  draft-ietf-iri-3987bis@… => adil@…
 * status:  new => assigned


--
----------------------+-----------------------
 Reporter:  duerst@…  |       Owner:  adil@…
     Type:  defect    |      Status:  assigned
 Priority:  major     |   Milestone:
Component:  3987bis   |     Version:
 Severity:  -         |  Resolution:
 Keywords:            |
----------------------+-----------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:4>
iri <http://tools.ietf.org/wg/iri/>



Re: [iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

iri issue tracker
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for

Changes (by adil@…):

 * keywords:   => bidi


--
----------------------+-----------------------
 Reporter:  duerst@…  |       Owner:  adil@…
     Type:  defect    |      Status:  assigned
 Priority:  major     |   Milestone:
Component:  3987bis   |     Version:
 Severity:  -         |  Resolution:
 Keywords:  bidi      |
----------------------+-----------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:5>
iri <http://tools.ietf.org/wg/iri/>



Re: [iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

iri issue tracker
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for

Changes (by adil@…):

 * component:  3987bis => bidi-guidelines


--
-----------------------------+-----------------------
 Reporter:  duerst@…         |       Owner:  adil@…
     Type:  defect           |      Status:  assigned
 Priority:  major            |   Milestone:
Component:  bidi-guidelines  |     Version:
 Severity:  -                |  Resolution:
 Keywords:  bidi             |
-----------------------------+-----------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:6>
iri <http://tools.ietf.org/wg/iri/>



Re: [iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

iri issue tracker
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for


Comment (by adil@…):

 I think the best description is:
 ''The Unicode Bidirectional Algorithm is designed for general purpose
 text''

--
-----------------------------+-----------------------
 Reporter:  duerst@…         |       Owner:  adil@…
     Type:  defect           |      Status:  assigned
 Priority:  major            |   Milestone:
Component:  bidi-guidelines  |     Version:
 Severity:  -                |  Resolution:
 Keywords:  bidi             |
-----------------------------+-----------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:7>
iri <http://tools.ietf.org/wg/iri/>



Re: [iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

iri issue tracker
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for


Comment (by duerst@…):

 The proposal by Adil ("The Unicode Bidirectional Algorithm is designed for
 general purpose text") looks very good to me.

 I had entered the component as "3987bis" originally, because there is a
 definition and one use of "running text" in 3987bis, too.

 In line with Adil's proposal, I propose to change "[UNIXML] is written in
 the context of running text rather than in that of identifiers." to
 "[UNIXML] is written in the context of general purpose text rather than
 in that of identifiers."

 There are two things we can do with the definition we currently have for
 running text: Change it to a definition of general purpose text, or remove
 it. The changed definition would read:

 general purpose text: Human text (paragraphs, sentences,
    phrases) with syntax according to orthographic conventions of a
    natural language, as opposed to syntax defined for ease of
    processing by machines (e.g., markup, programming languages).

 Because we use the term only once in each of two documents, and because we
 use it only in contrast, I propose to remove the definition.

--
-----------------------------+-----------------------
 Reporter:  duerst@…         |       Owner:  adil@…
     Type:  defect           |      Status:  assigned
 Priority:  major            |   Milestone:
Component:  bidi-guidelines  |     Version:
 Severity:  -                |  Resolution:
 Keywords:  bidi             |
-----------------------------+-----------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:8>
iri <http://tools.ietf.org/wg/iri/>



Re: [iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

iri issue tracker
#118: What term to use for the kind of text that the Unicode Bidi Algorithm was
designed for

Changes (by duerst@…):

 * status:  assigned => closed
 * resolution:   => fixed


Comment:

 Using "general purpose text" as proposed by Adil. This was already
 implemented in bidi-guidelines. Also changed in 3987bis, and removed the
 definition, as proposed before.

--
-----------------------------+---------------------
 Reporter:  duerst@…         |       Owner:  adil@…
     Type:  defect           |      Status:  closed
 Priority:  major            |   Milestone:
Component:  bidi-guidelines  |     Version:
 Severity:  -                |  Resolution:  fixed
 Keywords:  bidi             |
-----------------------------+---------------------

Ticket URL: <http://trac.tools.ietf.org/wg/iri/trac/ticket/118#comment:9>
iri <http://tools.ietf.org/wg/iri/>