Closed Bug 1144205 Opened 9 years ago Closed 6 years ago

Wrap fallback translations in Unicode control marks to enforce proper text direction

Categories

(Firefox OS Graveyard :: Gaia::L10n, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: stas, Assigned: tedders1)

References

Details

When falling back to a language which has a different directionality than the main language it would make sense to wrap the returned value in LRE+PDF or RLE+PDF (depending on the language) to stop weak characters from changing direction and breaking the UI.
Actually, it looks like we can use FSI and PDI for the same purpose (taken from bug 1144682, thanks Ted):

  + var FSI = '\u2068'; // Unicode FIRST STRONG ISOLATE character.
  + var PDI = '\u2069'; // Unicode POP DIRECTIONAL ISOLATE character.

This should be easier because we don't have to know the direction of the inner text then.
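
To make the idea concrete, here is a minimal sketch of the wrapping. The helper name `isolate` is hypothetical and is not taken from the Gaia L10n library; only the two code points come from the Unicode standard.

```javascript
// Hypothetical helper, not part of the Gaia L10n library.
var FSI = '\u2068'; // U+2068 FIRST STRONG ISOLATE
var PDI = '\u2069'; // U+2069 POP DIRECTIONAL ISOLATE

// Wrap a fallback translation so that its direction is resolved from its
// own first strong character, independently of the surrounding text.
function isolate(text) {
  return FSI + text + PDI;
}

// Example: an English fallback string shown inside an RTL interface.
var wrapped = isolate('Settings');
```

The caller does not need to know the direction of `text`, which is the advantage over LRE/RLE described above.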
As per http://rishida.net/blog/?p=564, in some rare cases FSI can give wrong results if the isolated text itself is bi-directional and the first strongly-typed character is of different directionality than the base text should be.

From http://www.unicode.org/reports/tr9/tr9-31.html#Directional_Formatting_Characters :

"Although the term embedding is used for some explicit formatting characters, the text within the scope of the embedding formatting characters is not independent of the surrounding text. Characters within an embedding can affect the ordering of characters outside, and vice versa. This is not the case with the isolate formatting characters, however. Characters within an isolate cannot affect the ordering of characters outside it, or vice versa. The effect that an isolate as a whole has on the ordering of the surrounding characters is the same as that of a neutral character, whereas an embedding or override roughly has the effect of a strong character.

Directional isolate characters were introduced in Unicode 6.3 after it became apparent that directional embeddings usually have too strong an effect on their surroundings and are thus unnecessarily difficult to use. The new characters were introduced instead of changing the behavior of the existing ones because doing so might have had an undesirable effect on those existing documents that do rely on the old behavior. Nevertheless, the use of the directional isolates instead of embeddings is encouraged in new documents – once target platforms are known to support them."

I suggest that we use LRI and RLI (and PDI).
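
A rough sketch of what that could look like when the direction of the fallback language is known. The function name `isolateForLanguage` and the `RTL_LANGS` list are assumptions made for illustration; a real implementation would take the direction from the locale metadata.

```javascript
var LRI = '\u2066'; // U+2066 LEFT-TO-RIGHT ISOLATE
var RLI = '\u2067'; // U+2067 RIGHT-TO-LEFT ISOLATE
var PDI = '\u2069'; // U+2069 POP DIRECTIONAL ISOLATE

// Assumed, illustrative list of right-to-left language codes.
var RTL_LANGS = ['ar', 'fa', 'he', 'ur'];

// Hypothetical helper: pick the isolate initiator from the language code
// of the fallback translation, then close the isolate with PDI.
function isolateForLanguage(text, lang) {
  var primary = lang.split('-')[0].toLowerCase();
  var initiator = RTL_LANGS.indexOf(primary) !== -1 ? RLI : LRI;
  return initiator + text + PDI;
}
```

Unlike FSI, this does not depend on the first strong character of the text, so it sidesteps the rare failure case mentioned earlier, at the cost of needing to know the language's direction.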
Summary: Wrap fallback translations in LRE/RLE+PDF marks to enforce proper text direction → Wrap fallback translations in Unicode control marks to enforce proper text direction
Blocks: 1152074
Assignee: nobody → tclancy
No longer blocks: 1152074
Blocks: 1152074
Hi Staś,

So, it turns out that Gecko doesn't support the LRI/RLI/FSI/PDI characters yet. (There's an outstanding bug for it.) But Gecko does support <bdi> tags, so let's use them instead.

Each Unicode bidi isolation character has an equivalent <bdi> tag:

LRI is the same as <bdi dir="ltr">.
RLI is the same as <bdi dir="rtl">.
FSI is the same as <bdi dir="auto"> or simply <bdi> ("auto" is the default).
PDI is the same as </bdi>.
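
The mapping above can be sketched as a small string conversion. The names `BDI_FOR_ISOLATE` and `isolatesToBdi` are hypothetical, and the sketch assumes the input is already HTML-escaped and that every isolate initiator has a matching PDI.

```javascript
// Hypothetical lookup table mirroring the list above.
var BDI_FOR_ISOLATE = {
  '\u2066': '<bdi dir="ltr">', // LRI
  '\u2067': '<bdi dir="rtl">', // RLI
  '\u2068': '<bdi>',           // FSI ("auto" is the default)
  '\u2069': '</bdi>'           // PDI
};

// Replace each Unicode isolate character with the equivalent <bdi> markup.
// Assumes `text` is already HTML-escaped and the isolates are balanced.
function isolatesToBdi(text) {
  return text.replace(/[\u2066-\u2069]/g, function (ch) {
    return BDI_FOR_ISOLATE[ch];
  });
}
```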

It also occurs to me that we're going to need to explain this stuff to translators (and other non-experts, as we try to encourage good bidi practices), and the <bdi> tags have the benefit of not being invisible, and being easier to understand. So, I think I tend to prefer the <bdi> tags in general instead of the Unicode control characters.

In practice, I don't think the dir attribute will be needed much (the default of "auto" works most of the time), so translators will just have to learn to use <bdi> and </bdi> whenever they have text of one directionality embedded within text of a different directionality.

(In reply to Staś Małolepszy :stas from comment #2)
> As per http://rishida.net/blog/?p=564, in some rare cases FSI can give wrong
> results if the isolated text itself is bi-directional and the first
> strongly-typed character is of different directionality than the base text
> should be.

This won't be a problem as long as the isolated text itself correctly uses FSI and PDI (or <bdi> and </bdi>). For example, if the isolated text is a Hebrew sentence that happens to start with an English word, everything will work fine as long as the English word is itself wrapped in another layer of <bdi> and </bdi> tags.
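
As an illustration of that nesting, here is a small sketch that builds the markup for a Hebrew sentence starting with an English word. The helpers `escapeHtml` and `bdi` and the sample sentence are made up for this example.

```javascript
// Hypothetical helpers for building the nested markup.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;')
             .replace(/</g, '&lt;')
             .replace(/>/g, '&gt;')
             .replace(/"/g, '&quot;');
}

function bdi(html) {
  return '<bdi>' + html + '</bdi>';
}

// Inner layer: the translator isolates the English word.
var sentence = bdi(escapeHtml('Firefox')) + ' ' + escapeHtml('הוא דפדפן');

// Outer layer: our code isolates the whole fallback translation.
var fallback = bdi(sentence);
```

Because the contents of the inner <bdi> are skipped when the outer element's direction is determined, the outer element should resolve to right-to-left from the first Hebrew letter rather than from the leading English word.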

Alternatively, the translator can manually put <bdi dir="rtl"> tags around the entire bidi string, which will then be wrapped by another pair of <bdi> and </bdi> tags generated by our code. I think this is an inferior option because (1) It's more cumbersome, and (2) You'll still have problems if the English word happens to be followed by a punctuation character. But the option is available, and many translators will be used to "solutions" like this.
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX