Updated article: Unicode Bidirectional Algorithm basics

r12a
Since we've had to explain it several times recently during reviews of
specs from other WGs, i extended the final section of this article to
show examples of situations where it is important to be able to apply
rtl base direction to a string in order to achieve correct display.

There's also one example of the need for isolation.

The idea is that we should point people to that when needed during reviews.

See the updated version of the article at:
http://w3c.github.io/i18n-drafts/articles/inline-bidi-markup/uba-basics.en

If there are no objections after a few days, i'll post to the main site.

ri




PS: Please don't suggest additions that will complicate this more – it's
meant to be a simple, accessible article for newbies, to give them the
gist of things.  Other articles, such as those listed at the bottom,
carry greater detail.

Re: Updated article: Unicode Bidirectional Algorithm basics

John Cowan-3
[hidden email] scripsit:

> If there are no objections after a few days, i'll post to the main site.

I would add at least a mention of Eastern Arabic-Indic digits, and
something about mirrored characters like ?.

--
John Cowan          http://www.ccil.org/~cowan        [hidden email]
"Hacking is the true football."  --F.W. Campbell (1863) in response to a
successful attempt to ban shin-kicking from soccer.  Today, it's biting.

Re: Updated article: Unicode Bidirectional Algorithm basics

Asmus Freytag (c)
In reply to this post by r12a
On 7/25/2016 9:33 AM, [hidden email] wrote:
> PS: Please don't suggest additions that will complicate this more – it's meant to be a simple, accessible article for newbies, to give them the gist of things.  Other articles, such as those listed at the bottom, carry greater detail.

Richard,

I was going to comment on the fact that I find it too hard for newbies in some ways.
I think the issue is fixable by rewriting the first five paragraphs of the document to do a better job of setting the scene.



[old]

It is important to understand from the outset that, in all major web browsers, the order of characters in memory (logical) is not the same as the order in which they are displayed (visual).

The set of rules applied by the browser to produce the correct order at the time of display are described by the Unicode Bidirectional Algorithm. We'll generally refer to this as 'the bidi algorithm'.

[new]

==> Words in some languages may be written from right to left, while numbers and other languages may be written from left to right. However, browsers and apps usually store all characters in the same order, generally the order in which they were typed (logical order), which is not the order in which they are displayed (visual order).

To display them, they apply a set of rules that produces the correct order at the time of display. These rules are described by the Unicode Bidirectional Algorithm, or 'bidi algorithm' for short.

[keep]

The rest of this section introduces basic concepts of the bidi algorithm that will help you understand how to manage bidirectional inline text.
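
As an aside for readers following the thread: the logical-order point above can be checked with Python's standard `unicodedata` module. This is only a minimal sketch. It looks up each character's bidi class in memory order and reorders nothing; the sample string and the helper name `bidi_classes` are illustrative assumptions, not taken from the article.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidi class) pairs in logical (memory) order."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Hypothetical sample: a Latin letter, a space, an Arabic letter, a space, a digit.
sample = "a \u0645 1"
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

The classes reported here (L for left-to-right letters, AL for Arabic letters, WS for whitespace, EN for European numbers) are the per-character input that the bidi algorithm works from when it derives the visual order.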

Base direction (direction of the context)

[old]

The order in which text is displayed depends on the base direction assigned to the phrase, paragraph or block that contains it. The base direction is a fundamentally important concept. It establishes a directional context that the bidi algorithm refers to at various points to decide how to handle the text.

[new]

==> In some situations, especially where languages of a different directionality are mixed, the result can depend on the intended overall direction of the text. For example: is it an Arabic text that happens to contain words or phrases in French, or is it a French text that includes some Arabic?

[examples]

In resolving this and other ambiguous situations, the bidi algorithm uses a fundamental concept, the base direction. The base direction is assigned to the phrase, paragraph or block that contains the text to be displayed. It establishes a directional context that the bidi algorithm refers to at various points to decide how to handle the text.

[keep]

In HTML the base direction is either set explicitly by the nearest parent element that uses the dir attribute, or, in the absence of any such attribute, is inherited from the default direction of the document, which is left-to-right.
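
The lookup described in that paragraph could be sketched roughly as follows. This is not how any browser implements it: the function name and the list-of-ancestors input are hypothetical, and the sketch deliberately ignores `dir="auto"` and other details of real HTML.

```python
def resolve_base_direction(ancestor_dirs, document_default="ltr"):
    """Pick the base direction from the nearest ancestor that sets one.

    ancestor_dirs lists the dir values of the enclosing elements, nearest
    first; None means the element has no dir attribute. (Hypothetical input
    shape, chosen only for this sketch.)
    """
    for value in ancestor_dirs:
        if value in ("ltr", "rtl"):
            return value
    # No ancestor set a direction: fall back to the document default.
    return document_default


# Text inside <p>, inside <div dir="rtl">, inside <html>: the nearest dir wins.
print(resolve_base_direction([None, "rtl", None]))
# No dir attribute anywhere: the left-to-right default applies.
print(resolve_base_direction([None, None]))
```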



Now, I think that even after editing these five paragraphs, there's a flow issue in your document, made visible by the need to put a whole sentence in bold face.

I would fix that issue by moving the reworded Base Direction paragraph to the place where you have the highlighted sentence. (As you already have an example, ignore the suggested example, and reword the line about French/Arabic to instead refer to the example that you already have. The sentence about HTML would go before the "If you change..." paragraph)


[new, if moved to past "Note that..happen"]

==> In situations like this where languages of a different directionality are mixed, the result can depend on the intended overall direction of the text.

[old]

In the example above, which has an overall context (ie. base direction) of ltr, you would read 'bahrain', then 'مصر', then 'kuwait'.

[new]

==> In the example above, assuming the base direction (overall context) was ltr, you would read 'bahrain', then 'مصر', then 'kuwait'.
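
To make the run-ordering point concrete, here is a deliberately simplified Python sketch. It is not the UBA: it classifies whole space-separated words by their first strong character, gives words with no strong character the base direction, and leaves the characters inside each run in logical order. The function names are made up for this illustration.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}


def word_direction(word):
    """Classify a word by its first strong character (simplified)."""
    for ch in word:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in RTL_CLASSES:
            return "rtl"
    return None  # no strong character, e.g. digits or punctuation only


def runs_left_to_right(text, base="ltr"):
    """Group words into directional runs and return the runs in the
    order they would sit on the line, leftmost first."""
    runs = []  # list of (direction, [words])
    for word in text.split():
        direction = word_direction(word) or base
        if runs and runs[-1][0] == direction:
            runs[-1][1].append(word)
        else:
            runs.append((direction, [word]))
    joined = [" ".join(words) for _, words in runs]
    # With an rtl base direction the first run starts at the right edge,
    # so the left-to-right layout of the runs is reversed.
    return joined if base == "ltr" else joined[::-1]


print(runs_left_to_right("bahrain مصر kuwait", base="ltr"))
print(runs_left_to_right("bahrain مصر kuwait", base="rtl"))
```

With an ltr base the three runs stay in the order 'bahrain', 'مصر', 'kuwait'. With an rtl base the same stored string lays out with 'kuwait' leftmost and 'bahrain' rightmost.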


I believe making these changes provides a better flow and avoids introducing a concept before it is needed, which is always difficult for a beginner.

A./


Re: Updated article: Unicode Bidirectional Algorithm basics

Asmus Freytag (c)
In reply to this post by John Cowan-3
On 7/25/2016 10:09 AM, John Cowan wrote:
> [hidden email] scripsit:
>
>> If there are no objections after a few days, i'll post to the main site.
>
> I would add at least a mention of Eastern Arabic-Indic digits, and
> something about mirrored characters like ?.

Not discussing mirroring probably ranks as an omission, especially given the still relatively recent updates to the UBA that changed the handling of paired punctuation.

The Eastern digits could be a footnote.

A./

Re: Updated article: Unicode Bidirectional Algorithm basics

r12a
In reply to this post by John Cowan-3
thanks for the suggestions.

On 25/07/2016 18:09, John Cowan wrote:
> I would add at least a mention of Eastern Arabic-Indic digits

This is really intended just to give absolute beginners an initial taste
of how the algorithm works, rather than to cover all the details, so
i'd prefer to include that elsewhere. It is a good thing to cover
somewhere, though, and i'm currently thinking it's worth a separate
article of its own.

> , and
> something about mirrored characters like ?.

? isn't a mirrored character as far as i'm aware.  The Arabic question
mark is a separate character, ؟.  There is, however, a section about
mirrored characters at
https://www.w3.org/International/articles/inline-bidi-markup/index#mirrored
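
For anyone who wants to check this kind of thing, Python's standard `unicodedata.mirrored()` reports the Unicode Bidi_Mirrored property for a character. A small sketch (the helper name `is_mirrored` is just for illustration):

```python
import unicodedata


def is_mirrored(ch):
    """True if the character has the Unicode Bidi_Mirrored property."""
    return bool(unicodedata.mirrored(ch))


# U+003F QUESTION MARK, U+061F ARABIC QUESTION MARK, then some paired punctuation.
for ch in ["?", "\u061F", "(", ")", "<"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: mirrored={is_mirrored(ch)}")
```

Neither question mark is flagged as mirrored, whereas parentheses and the less-than sign are.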

ri


Re: Updated article: Unicode Bidirectional Algorithm basics

Asmus Freytag (c)
In reply to this post by John Cowan-3
Richard,

I think that an unstated goal of your text is to tell people "how to
manage" bidi texts in a very high-level way.

To that end, I would move the trailing exclamation mark example up into
the discussion of base direction.

I would then change the "Beyond the algorithm..." section to "Isolating
text passages" and make it explicitly about how isolation works by
mentioning how to achieve it in HTML in the same general way you
explained how to set base direction.

After "numbers", I would have a (short) section on paired punctuation
with an example or two. Handling pairs, and nested pairs, is now a
function of the UBA.

Finally, the very last paragraph points to other resources, it needs a
header of its own, unless you want to make it part of "Further Reading".

A./

Re: Updated article: Unicode Bidirectional Algorithm basics

r12a
thanks for the suggestions, Asmus. Getting a bit late here, so some
quick answers below.

On 25/07/2016 20:41, Asmus Freytag (c) wrote:
> I think that an unstated goal of your text is to tell people "how to
> manage" bidi texts in a very high-level way.

Actually, teaching how to 'manage' bidi was deliberately not on my
agenda for this article.  It was initially intended simply as a
can-opener, to give newcomers a sufficient peek at the minimum set
of the most basic concepts involved in the UBA so that they could then
understand the other articles we have that do get into the actual
managing aspects (eg. the ones mentioned in the very last para). The
latter are the ones we really need people to read if they are going to
be working with bidi content.

By the addition of the final section i have today extended the purpose
of the article so that i can also point to it for people developing
linked data, csv, annotations, and other such markup-resistant formats
and who have no clue (understandably) about why they should make it
possible to identify the base direction for the strings they are passing
around, and why the UBA isn't sufficient unto itself. (The key message
is stated in the first para: "...it is necessary to use additional
markup, metadata or special approaches to establish the correct base
direction for a range of text").
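
One way to picture the "metadata or special approaches" part is sketched below in Python. It is only an assumption about how a format might carry direction alongside a string, not something the article or any spec prescribes. The record shape `DirectedString` and its method name are hypothetical; the isolate control characters (U+2066, U+2067, U+2069) are real Unicode characters.

```python
from dataclasses import dataclass

# Unicode directional isolate controls.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


@dataclass
class DirectedString:
    """A string plus its base direction (hypothetical record shape)."""
    text: str
    direction: str  # "ltr" or "rtl"

    def for_plain_text(self):
        """Wrap the text in isolate controls for contexts without markup."""
        opener = RLI if self.direction == "rtl" else LRI
        return f"{opener}{self.text}{PDI}"


# An Arabic word with a trailing exclamation mark, stored with rtl metadata.
label = DirectedString("\u0645\u0635\u0631!", "rtl")
print(repr(label.for_plain_text()))
```

The point of keeping `direction` as separate metadata is that a consumer with real markup available (HTML's dir attribute, say) can use that instead, and fall back to control characters only where markup is not possible.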

> To that end, I would move the trailing exclamation mark example up into
> the discussion of base direction.

In fact, that example is indeed used higher up in the section at
http://w3c.github.io/i18n-drafts/articles/inline-bidi-markup/uba-basics.en#embeddedbd

> I would then change the "Beyond the algorithm..." section to "Isolating
> text passages" and make it explicitly about how isolation works by
> mentioning how to achieve it in HTML in the same general way you
> explained how to set base direction.

See above. I actually only added isolation as an afterthought. The main
message of that section is that the UBA is not sufficient on its own.
Isolation is just one aspect of that issue.

> After "numbers", I would have a (short) section on paired punctuation
> with an example or two. Handling pairs, and nested pairs, is now a
> function of the UBA.

That's at
https://www.w3.org/International/articles/inline-bidi-markup/index#mirrored

> Finally, the very last paragraph points to other resources, it needs a
> header of its own, unless you want to make it part of "Further Reading".

Yes, that's probably a good idea. I'll look at it tomorrow.

ri


Re: Updated article: Unicode Bidirectional Algorithm basics

Asmus Freytag (c)
On 7/25/2016 1:09 PM, [hidden email] wrote:
> thanks for the suggestions, Asmus. Getting a bit late here, so some quick answers below.

Probably best if you give your text another pass before we continue much, but here are a few more.

Don't miss the suggestions in a separate message on how to improve the flow of the opening paragraphs.

> On 25/07/2016 20:41, Asmus Freytag (c) wrote:
>> I think that an unstated goal of your text is to tell people "how to
>> manage" bidi texts in a very high-level way.
>
> Actually, teaching how to 'manage' bidi was deliberately not on my agenda for this article.

"How to" is perhaps too strong. But you point out that there are things that need to be managed, and give some names for HTML attributes and values (e.g. dir, ltr, etc.). That's not a detailed prescription, but a teaser. All I'm suggesting is that you parallel that in the discussion on the need for isolation.

Overall I think it is more useful to people to come away with some understanding of what *they* must do, even if the article is otherwise a high-level summary of the algorithm. The take-away is what makes that high-level info relevant.

> It was initially intended simply as a can-opener, to give newcomers a sufficient peek at the minimum set of the most basic concepts involved in the UBA so that they could then understand the other articles we have that do get into the actual managing aspects (eg. the ones mentioned in the very last para). The latter are the ones we really need people to read if they are going to be working with bidi content.

Agreed. If they come away understanding the need for them to learn how to do the detailed management, that's the best outcome.

> By the addition of the final section i have today extended the purpose of the article so that i can also point to it for people developing linked data, csv, annotations, and other such markup-resistant formats and who have no clue (understandably) about why they should make it possible to identify the base direction for the strings they are passing around, and why the UBA isn't sufficient unto itself. (The key message is stated in the first para: "...it is necessary to use additional markup, metadata or special approaches to establish the correct base direction for a range of text").

Sure, but move it to the place in the article and merge it with existing text of the two paragraphs "Again...." and "The markup lang....".
You might break out a section on "base direction and web specifications" or something.

>> To that end, I would move the trailing exclamation mark example up into
>> the discussion of base direction.
>
> In fact, that example is indeed used higher up in the section at
> http://w3c.github.io/i18n-drafts/articles/inline-bidi-markup/uba-basics.en#embeddedbd

That example shows the issue with embedding. The Hebrew example shows the issue WITHOUT embeddings - I think it's a key point and the discussion of these examples wants to be merged and reorganized a bit.

In a way, you want to show that base direction is a fundamental concept, rather than merely telling people that it is.

By the way, in the section you linked, the part of the sentence "on the bottom line, without a definition of the base direction" is really perplexing in its vagueness. What you mean is that you need to specify the base direction for the inside of the quote and that the way to do that is via a <span> or other inline element. Being a bit more specific would help, I think.


>> I would then change the "Beyond the algorithm..." section to "Isolating
>> text passages" and make it explicitly about how isolation works by
>> mentioning how to achieve it in HTML in the same general way you
>> explained how to set base direction.
>
> See above. I actually only added isolation as an afterthought. The main message of that section is that the UBA is not sufficient on its own. Isolation is just one aspect of that issue.

But the article is better for it.

A basic understanding of the UBA means also having a high level understanding of what types of shortcomings must be managed. It's not enough to know that there's "at least one", but it should cover all the "biggies". Base direction is one, but isolation is definitely another.

>> After "numbers", I would have a (short) section on paired punctuation
>> with an example or two. Handling pairs, and nested pairs, is now a
>> function of the UBA.
>
> That's at https://www.w3.org/International/articles/inline-bidi-markup/index#mirrored

The example in that article should no longer occur if the paired bidi brackets part of the UBA is implemented.
(I tried to make it happen in an online CSS tutorial and could not get the parens to come out mismatched.)

But mirroring is a really basic aspect of the UBA, so your "can-opener" needs to introduce the concept in some high-level form. I think of it as not creating a surprise for the reader.

>> Finally, the very last paragraph points to other resources, it needs a
>> header of its own, unless you want to make it part of "Further Reading".
>
> Yes, that's probably a good idea. I'll look at it tomorrow.
>
> ri




Re: Updated article: Unicode Bidirectional Algorithm basics

Lina Kemmel
In reply to this post by r12a
Hello Richard,

> Directional runs
......
> Here's the important bit: the order in which directional runs themselves
> are displayed across the page depends on the *prevailing base direction*

.....

> Embedding changes to the base direction
.....
> To correct this, we need to define the *base direction of the Arabic
> text* plus the exclamation mark to be right-to-left.

A comment on terminology: I'd suggest using just "base direction" when
referring to the paragraph direction, and "embedding direction" otherwise.

Regards,
Lina




From:   [hidden email]
To:     www International <[hidden email]>
Date:   25/07/2016 19:36
Subject:        Updated article: Unicode Bidirectional Algorithm basics







Re: Updated article: Unicode Bidirectional Algorithm basics

r12a
hi Lina,

i'm thinking we should really be raising these comments as github
issues, but i'll reply here for now.


On 26/07/2016 12:12, Lina Kemmel wrote:

>> Directional runs
> ......
>> Here's the important bit: the order in which directional runs themselves
>> are displayed across the page depends on the *prevailing base direction*
>
> .....
>
>> Embedding changes to the base direction
> .....
>> To correct this, we need to define the *base direction of the Arabic
> text*
>> plus the exclamation mark to be right-to-left.
>
> A comment on terminology.. I'd suggest to use just "base direction" when
> referring to the paragraph direction, and "embedding direction" otherwise.

I'm not sure what the difference is, actually, other than the instance
of use, and i'd rather not use different terminology, since i want to
keep this as simple as possible for the reader.

ri