1315975 - Remove the automatic insertion of bidi isolation marks

Assignee

Description

•

8 years ago

In bug 1306262 we moved the logic responsible for wrapping interpolated values in FSI and PDI from the MessageContext and into the bindings. Some background: if the interpolated text is LTR and contains weak characters at the beginning or end, e.g. `WPA2-PSK (TKIP)` and the surrounding translation is RTL, the weak characters would inherit the directionality of the surrounding text and render misplaced. The solution we came up long time ago in Firefox OS was to wrap interpolations in FSI and PDI, the Unicode isolation marks. Over the weekend I attended the l10n hackathon in Paris where I talked to our Hebrew localizers. It turns out the Unicode bidi algorithm v6.3 supports the notion of paired brackets (along with other improvements). v6.3 has been implemented as part of bug 922963 and more specifically, in bug 1157727 bracket pairs were added. Let's remove the automatic insertion of FSI/PDI completely. In the rare case where a more precise control of directionality will be required, the developers can wrap external arguments in FSI/PDI and localizers can use the <bdi> tags. Other browsers support Unicode v6.3 well, including Safari, IE and Edge. Unfortunately current versions of Chrome don't support bracket pairs yet: https://bugs.chromium.org/p/chromium/issues/detail?id=302469. I don't think it should stop us from removing this feature from L20n. It's a rare use case and our solution has been causing many unexpected problems to developers so far.

Staś Małolepszy :stas

Assignee

Comment 1

•

8 years ago

Attached file Pull request — Details

Assignee: nobody → stas

Status: NEW → ASSIGNED

Attachment #8808621 - Flags: review?(gandalf)

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 2

•

8 years ago

I don't think I understand the logic here. https://www.w3.org/International/articles/inline-bidi-markup/ describes W3C recommendations which include, among others phrases "The Unicode Consortium recommends that isolation be used as the default for all future inline bidirectional text embeddings." At the Unicode Conference last week there was a whole presentation encouraging people to start using isolation for any embeddings if they're mixing directionality of text. What exactly is the claim that Hebrew localizers told you that goes in the opposite direction?

Staś Małolepszy :stas

Assignee

Comment 3

•

8 years ago

We were all pleasantly surprised that the updated algorithm ships in most of the major browsers. This directly addresses the problem of parentheses. I'm not arguing with the recommendation. I commend it. I just think that inserting FSI/PDI into all interpolations is troublesome and we've seen evidence of it being surprising. Instead of doing this in our library, I would like to encourage developers or localizers to use explicit isolation when they know the interpolated text might be bidi.

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 4

•

8 years ago

> Instead of doing this in our library, I would like to encourage developers or localizers to use explicit isolation when they know the interpolated text might be bidi. If I understand correctly, this is in almost all cases plus cases when they don't know if the interpolated text has different directionality. Which is why we used FSI and not RLI. Asking localizers to mark embeddings that may require directionality isolation is going to be a pretty significant burden to the point where I'd question if it'll happen outside of in result of explicit bug reports. > I just think that inserting FSI/PDI into all interpolations is troublesome and we've seen evidence of it being surprising. I'm not sure if you provided any evidence of it. It feels to me that the solution works as expected and is in line with W3C recommendation.

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 5

•

8 years ago

Attached image example of a difference — Details

I did a quick test based on the recent bug 1313751. It's slightly difference case (placeable) and I understand that passing URI as a variable is not going to be the core scenario, but I think it's a good example of the claim that Unicode by itself doesn't solve bidirectionality and that the recommendation of W3C that we follow currently is improving the outcome.

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 6

•

8 years ago

Sorry, hit send before finished: I modified our l20n.js example to take an argument "url" which is an url and embed it as a placeable into a l20n string.

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 7

•

8 years ago

I also went through the use cases from the W3C article and tried several of them against our code with and without FSI/PDI. In cases I tested removing FSI/PDI brings the error scenario.

Staś Małolepszy :stas

Assignee

Comment 8

•

8 years ago

I'm torn here. As a general rule I like the idea of isolating interpolations. OTOH I see a problem in how the current solution is surprising to the users. If you pass { "foo": "Foo" } as args, you can interpolate it in the translation no problem: key = A { $foo } #1 but you can't use it as a selector: key = A { $foo -> #2 [Foo] Foo [Bar] Bar } or as an argument, with ( "style": "percent" } for instance: key = { NUMBER($num, style: $style) } #3 $foo and $style are strings, but not really. Because of the wrapping done by the bindings they suddenly are special and don't work like other strings. They don't match anything anymore. I'm not sure how serious this is. Perhaps this isn't a common use-case? Admittedly in case of arguments (#3) you should rather encode the style property in $num, via an extended external argument feature. But #2 will be common when passing information about users' genders, so I think this is serious. I also see it as an inconsistency and a gotcha. A "smell" of a suboptimal design. (Maybe we should completely forbid #3?) Which is why my idea with this bug was to remove the automatic wrapping in favor of an explicit opt-in solution. Something like: key = A <bdi>{ $foo }</bdi> key = A { BIDI($foo) } or: formatValues('key', wrap_bidi({foo: "Foo"}) Do you also see this as a problem? Do you have any other ideas of how to fix it?

Flags: needinfo?(gandalf)

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 9

•

8 years ago

(In reply to Staś Małolepszy :stas from comment #8) > I'm torn here. As a general rule I like the idea of isolating > interpolations. OTOH I see a problem in how the current solution is > surprising to the users. Agree, we should aim for the solution to not surprise users. > If you pass { "foo": "Foo" } as args, you can > interpolate it in the translation no problem: > > key = A { $foo } #1 > > but you can't use it as a selector: > > key = A { $foo -> #2 > [Foo] Foo > [Bar] Bar > } Let's make it work. > or as an argument, with ( "style": "percent" } for instance: > > key = { NUMBER($num, style: $style) } #3 Let's make it work. > $foo and $style are strings, but not really. Because of the wrapping done > by the bindings they suddenly are special and don't work like other strings. > They don't match anything anymore. That's a great point. > I'm not sure how serious this is. Perhaps this isn't a common use-case? I definitely think that we should make #1 our priority number one. This is the most common case and it should just work (tm) imho. Everything else is further on the scale of less common and ok to be more complex. > Admittedly in case of arguments (#3) you should rather encode the style > property in $num, via an extended external argument feature. But #2 will be > common when passing information about users' genders, so I think this is > serious. I believe both should work, but that's not the same bug as this one imho. > Which is why my idea with this bug was to remove the automatic wrapping in > favor of an explicit opt-in solution. > > Something like: > > key = A <bdi>{ $foo }</bdi> > key = A { BIDI($foo) } > > or: > > formatValues('key', wrap_bidi({foo: "Foo"}) > > Do you also see this as a problem? Yeah, I do. I believe that neither developer not localizer are in the perfect position to know if the string will need a BIDI. If we think of the relationship as Developer <--> L20n <--> Localizer - then I'd argue that only L20n and only at runtime can know if the string from the localizer and the variable from the developer are of different directions. And I believe that by default we should do what W3C recommends. Wrap. Some gotchas that you reported are reasonable - like the selectors and params and we should fix it in l20n. Some are in my opinion just artifacts of people not thinking that a message may contain those isolation marks because other l10n libraries don't do this. We're changing paradigms and I'm ok changing this one as well. A message from localization library may contain isolation marks. W3C says we should, and they are designed for our use case. Now, we can further optimize it over time to reduce the gotchas, like trying to not use them if directionality doesn't change. I can imagine us trying to evaluate it using some ICU method and decide if we need to wrap BIDI strings around it, but that's gonna have perf impact so I'd prefer to avoid having to do that now. FSI/PDI is perfect for that. It indicates that directionality may change and layout engine should check that. I even start questioning if it should be DOM only. > Do you have any other ideas of how to fix it? Possibly. There are various ways we can approach it - we probably may want to move back the wrapping to library from bindings to facilitate for those more complex uses - and then we may want to add a param that bindings will send to the library. There may be smarter ways, but I'm ok starting with more explicit API. What I strongly believe in is that we should not ask localizers or developers to explicitly state that a given parameter should be wrapped in BIDI. The reason is that there is no scenario I can imagine where a string parameter from a developer that is embedded in the localized message should not be wrapped in it. And if something is supposed to happen in 100/100 cases (or even 98/100 if we find exceptions that don't require it) then we should do everything possible to avoid reducing readability of the localization strings by requiring users to wrap every variable in a builtin function.

Flags: needinfo?(gandalf)

Staś Małolepszy :stas

Assignee

Comment 10

•

8 years ago

Zibi makes a compelling argument for keeping the wrapping and even moving it back into MessageContext. We talked about it in detail in a meeting on Monday and the consensus was to use wrapping by default in MessageContext for all interpolations, with two provisions: 1) it would be good to limit the wrapping to string only in the future, 2) wrapping can be turned off by passing an option to the MessageContext constructor. The second one in particular address most of my concerns about wrapping making it hard to work with the MessageContext in some scenarios. I filed bug 1317989 to implement this.

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

Resolution: --- → WONTFIX

Staś Małolepszy :stas

Assignee

Updated

•

8 years ago

No longer blocks: 1291693

Zibi Braniecki [:zbraniecki][:gandalf]

Updated

•

8 years ago

Attachment #8808621 - Flags: review?(gandalf)

Pull request 8 years ago Staś Małolepszy :stas 40 bytes, text/x-github-pull-request		Details \| Review
example of a difference 8 years ago Zibi Braniecki [:zbraniecki][:gandalf] 100.96 KB, image/png		Details

Bugzilla

Remove the automatic insertion of bidi isolation marks

Categories

(L20n :: JS Library, defect)

Tracking

(Not tracked)

People

(Reporter: stas, Assigned: stas)

References

Details

(Whiteboard: [gecko-l20n])

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Updated

Updated

Attachment

General

Description

File Name

Content Type