Closed Bug 1144682 Opened 10 years ago Closed 10 years ago

[RTL] L10n placeholders need to distinguish between strings intended for machines and strings intended for humans

Categories

(Firefox OS Graveyard :: Gaia::First Time Experience, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: tedders1, Assigned: tedders1)

References

Details

(Whiteboard: [3.0-Daily-Testing][systemsfe])

Attachments

(1 file, 1 obsolete file)

When using navigator.mozL10n, you can specify placeholders with the syntax "{{ placeholder }}". However, there's no way to specify whether the resulting string is intended for humans or machines. If a string is intended for humans: 1) When 'placeholder' is replaced by a number, the number should be formatted with commas and dots, so that 1000000 becomes 1,000,000 (or 1.000.000 if you're in continental Europe or Quebec). 2) When 'placeholder' is replaced by a string, the string should be enclosed in Unicode bidi isolation characters, as described here: https://bugzilla.mozilla.org/show_bug.cgi?id=1136859#c33 But these same substitutions should not happen when the resulting string is intended for a machine, such as when you're constructing a URL or a JSON packet. +++ This bug was initially created as a clone of Bug #1136859 +++
Assignee: nobody → tclancy
Blocks: 1136859
No longer depends on: 1136859
This patch adds support for a new kind of placeholder with the notation "[[ placeholder ]]", which can be used in addition to "{{ placeholder }}". Whereas "{{ placeholder }}" means that the substitution is intended for a human, "[[ placeholder ]]" means that the substitution is intended for a machine. Substitutions intended for humans are by far the more common type in our code base. In comparison, there are only about a dozen places where substitutions are intended for machines. That's why I let the former keep the existing syntax. For a good illustration of the difference between "{{ placeholder }}" and "[[ placeholder ]]", see the changes to ftu.properties in the attached patch. I'm adding :smontagu to the review just so he can agree (or not) that this is philosophically the right thing to do. This patch contains modifications to shared/js/l10n.js, which is :stas is a peer for. :sfoster is added for changes to ftu.properties. I'm aware that this patch breaks a number of UI tests which rely on string comparison. I'll update those tests.
Attachment #8579393 - Flags: review?(stas)
Attachment #8579393 - Flags: review?(smontagu)
Attachment #8579393 - Flags: review?(sfoster)
Attachment #8579381 - Attachment is obsolete: true
Comment on attachment 8579393 [details] [review] Fix string substitution for bidi contexts Note in the PR about incrementing string IDs. From the point of view of the FTU app, I'm happy to rubber-stamp this. But that's contingent on Stas' approval of the [[ ]] markers and that whole display vs. machine-readable string. I'm not quite sure we're solving this in the right place, but that's definitely the l10n team's call as they are downstream of these changes and will be most affected by the new syntax
Attachment #8579393 - Flags: review?(sfoster) → review+
Comment on attachment 8579393 [details] [review] Fix string substitution for bidi contexts Ted, thanks so much for putting effort into this. It's an interesting idea that I'd like to discuss further; I'm canceling the review request for now so that I can ask a few questions and provide some background information on l10n.js. l10n.js is a stepping stone to a new localization file format and library: L20n. In the pre-1.0 days of Firefox OS we needed a localization solution and L20n wasn't ready yet so we went with the .properties files. They stuck for longer than we thought they would (as it often happens) but we're close to moving to the L20n format soon (bug 1027684). You can read more about L20n at http://l20n.org and consult the syntax spec at http://l20n.github.io/spec/grammar.html. There might be better ways to achieve the same thing using the idioms that we already have in the L20n syntax. For instance, the L20n syntax supports expressions in placeables which can call special functions ('macros'), either defined by the localizers or by the localization library. See http://l20n.org/learn/building-macros-and-expressions/. We could provide a special macro for wrapping with or stripping the bidi markers, like this: {{ @cldr.wrapbidi(name) }} {{ @cldr.stripbidi(name) }} The wrap bidi one could be made to be the default for all placeables. Macros are similar to filters from some templating libraries ({{ name | wrapbidi }}), but a little bit more powerful. See also bug 897877 about adding @cldr.plural. Currently macro calls are not supported in properties files, but we've been planning to add them at some point in the near future; or we'd get them as part of the switch to the L20n syntax. Furthermore, the L20n syntax differentiates between localizer-provided and developer-provided variables in placeables. In *.properties files there is no such distinction: {{ name }} can mean a reference to another string in the same file/localization context (useful for things like brandShortName), or to a dynamic variable provided by the developer. In L20n, OTOH, we have {{ name }} for the former case and {{ $name }} for the latter. Knowing which kind of reference we're dealing with we could wrap developer-provided variables by default, but not other strings. Another alternative would be to allow localizers to use <bdi>, dir="auto" or FSI/PDI themselves when needed. I like this approach because it gives a lot of control to the localizers which is the pinnacle of L20n. Thanks to L20n's custom macros, the localizer can create their own version of wrapbidi macro and use it in their translations. <bdi> also has the advantage of being semantically meaningful. There's also an education theme here: by moving this to localizer-land we'd raise awareness of bi-directional text and HTML's and Unicode's solutions to it. I'm close to landing bug 994357 which will make it easy to safely use HTML tags and entities in translations without the awful requirement of defining it as the innerHTML attribute. Can you file a separate bug for using number.toLocaleString(), please? Let's keep this one contained to the bidi issue. (BTW I think we'll need to pass a language code to toLocaleString or it'll return English formatting by default.) Lastly, I think there's another case where it would make sense to wrap the whole string in Unicode isolation marks: when a translation is missing we fall back to the default language which may have a different directionality. It the fallback translation contains weak characters at its extremes the text will look broken because they'll assume the surrounding context's direction instead of the fallback language's one. I filed bug 1144205 for this. Let me know what you think. Thanks again for working on this!
Attachment #8579393 - Flags: review?(stas)
Hi Staś, Okay, if we're going to move to L20n, I see the value in using a syntax which is forward-compatible with L20n. However, I think this means we're going to need to add new syntax to L20n. It doesn't have to be "[[ placeholder ]]". I chose that because we currently use {[ ]} for plural macros in our properties files, so I was looking for something that fit with {{ }} and {[ ]}. Just let me know what syntax you want and I can implement it. > Another alternative would be to allow localizers to use <bdi>, dir="auto" or FSI/PDI themselves when needed. I don't think we can leave this up to localizers. For one thing, the insertion of Unicode directional isolation characters needs to happen even for English UIs which haven't been translated by a localizer. If you have an English string that says "Hello {{$user.name}}, you have new mail" and the user's name is in Hebrew, the comma will end up displayed in the wrong place unless we insert directional isolation characters around the user's name. Secondly, this needs to be done almost everywhere. If the bidi wrapping has to be done in the .l20n files every time there's a substitution, then the syntax will become cumbersome, ugly, and error-prone, and people won't enjoy using it. They'll look for a different tool. And if they do stick with L20n, they will frequently forget to put in the bidi wrapping, making this a constant source of easily-missed bugs. Thirdly, I expect localizers to understand the grammar of the languages their using, but I don't expect them to understand Unicode bidi characters. We'd be asking them to do something that is outside their area of expertise. I expect their responses to range from "Huh? What? I don't understand this" to "Why can't the system insert these characters automatically?". > In L20n, OTOH, we have {{ name }} for the former case and {{ $name }} for the latter. Knowing which kind of reference we're dealing with we could wrap developer-provided variables by default, but not other strings. That's an interesting idea. > There might be better ways to achieve the same thing using the idioms that we already have in the L20n syntax. [...] We could provide a special macro for wrapping with or stripping the bidi markers, like this: > > {{ @cldr.wrapbidi(name) }} > {{ @cldr.stripbidi(name) }} > > The wrap bidi one could be made to be the default for all placeables. I agree that wrapping needs to be the default. My question is, at what point is the wrapping done? I thought the wrapping would be done when the {{ }} is evaluated. But that obviously won't work with what you're proposing. (There's no point in calling @cldr.stripbidi() if the outer {{ }} brackets put the isolation characters back again.) One possibility would be to wrap whenever $name is evaluated. In that case, we'd also have to wrap something like $name.attribute (such as $brandName.short). But consider the following use case: > <greeting[$user.gender] { > masculine: "Hello guy", > feminine: "Hello gal", > *default: Hello person" > }> We obviously don't want to wrap $user.gender in this case. So $name.attribute by itself can't be wrapped. But {{ $name.attribute }} would need to be wrapped. But {{ @cldr.stripbidi($name.attribute) }} should not be wrapped. This raises questions. Exactly when does wrapping happen? Does {{ ($name).attribute }} get wrapped? What about {{ my_macro($name).attribute }} ? What about {{ @cldr.otherfunction($name) }} ? So, I don't think a @cldr.stripbidi function is the answer. (It might still be useful, but not for this.) I think the more obvious thing to do is to apply wrapping whenever a string is substituted into another string. That is, whenever the {{ }} is evaluated. And that means we'll need a new syntax for specifying when wrapping shouldn't happen. > Can you file a separate bug for using number.toLocaleString(), please? Let's keep this one contained to the bidi issue. Okay, I'll do that. But I think it wouldn't hurt to consider number formatting while considering the bidi issue. I'm of the belief that numbers should be formatted for their locale by default. (If it's not the default, it becomes cumbersome to do, and often won't be done at all.) Number formatting is part of localization, and the correct behaviour should be the default behaviour. So again, we'd need a syntax that means "disable the default". > Lastly, I think there's another case where it would make sense to wrap the whole string in Unicode isolation marks: when a translation is missing we fall back to the default language which may have a different directionality. That's a good idea too. Or, as an alternative, maybe we could always wrap the whole string in Unicode directional isolation characters, regardless of the language. It wouldn't hurt.
By the way, I realize now that [[ ]] was a bad syntax choice. A good syntax choice should make clear which is the usual case and which is the abnormal case. But if you look at {{ }} and [[ ]], it's not clear which is the usual case and which is the abnormal case. Maybe the syntax for a non-wrapping substitution should be something like {{#!# placeholder }}. You know, something that is clearly weirder looking than {{ placeholder }}.
So what do you think, Staś? It would be good if we could decide on something soon, because this issue is blocking 1136859.
Flags: needinfo?(stas)
(In reply to Ted Clancy [:tedders1] from comment #7) > So what do you think, Staś? It would be good if we could decide on something > soon, because this issue is blocking 1136859. Let me respond to this first and I'll comment again later today to address your earlier points. I think that for bug 1136859 we should just do what we've been doing so far: innerHTML with <bdi><a></a></bdi> or <a dir="auto">. This should fix the issue right now and we can replace this witha proper fix once we agree on what to do in this bug. It's easy to grep for innerHTML in *.properties and we have bug 1027117 for removing it when bug 994357 lands. I don't want to rush a decision on a new addition to L20n syntax because of a single slash which is improperly positioned by the bidi algorithm when a work-around exists :)
(In reply to Ted Clancy [:tedders1] from comment #5) > That's a good idea too. Or, as an alternative, maybe we could always wrap > the whole string in Unicode directional isolation characters, regardless of > the language. It wouldn't hurt. I would be actually surprised if it didn't hurt. We'd need input from layout team, but I'd expect that wrapping *every* string in our UI in DIC characters should cost more than having the same UI without those characters. Or, let me say this this way - if there's some benefit of wrapping all strings in HTML file in DIC characters, why don't we do this on Gecko level? Why is it not just part of HTML standard?
(In reply to Ted Clancy [:tedders1] from comment #5) > I don't think we can leave this up to localizers. > > For one thing, the insertion of Unicode directional isolation characters > needs to happen even for English UIs which haven't been translated by a > localizer. If you have an English string that says "Hello {{$user.name}}, > you have new mail" and the user's name is in Hebrew, the comma will end up > displayed in the wrong place unless we insert directional isolation > characters around the user's name. The English string is also a translation of sorts; the developer being the localizer into English. Developers need to know what they're doing; they need to know to use <bdi> in this case. I don't think it's unreasonable to expect them to know about this good practice. > Secondly, this needs to be done almost everywhere. If the bidi wrapping has > to be done in the .l20n files every time there's a substitution, then the > syntax will become cumbersome, ugly, and error-prone, and people won't enjoy > using it. They'll look for a different tool. And if they do stick with L20n, > they will frequently forget to put in the bidi wrapping, making this a > constant source of easily-missed bugs. These are good points and I'm not opposed to the idea of extending the L20n syntax. At the same time, I'm trying to evaluate if there are simpler ways of achieving the same goal. Requiring to explicitly use <bdi> or FSI/PDI might be more error-prone, but it also gives more control to the developer and the localizer alike. We can also rely on authoring tools to warn localizers about the missing <bdi> elements in interpolated strings. > Thirdly, I expect localizers to understand the grammar of the languages > their using, but I don't expect them to understand Unicode bidi characters. > We'd be asking them to do something that is outside their area of expertise. > I expect their responses to range from "Huh? What? I don't understand this" > to "Why can't the system insert these characters automatically?". I want developers to learn about <bdi> because it's their job to know about bi-directional text. So I'm okay if L20n expects developers to create strings like: <hello "Hello, <bdi>{{ $user.name }}</bdi>"> Then, when the localizer sees this string and translates it into their language, I fully expect them to follow the structure and keep <bdi> in the translation as well. > This raises questions. Exactly when does wrapping happen? Does {{ > ($name).attribute }} get wrapped? What about {{ my_macro($name).attribute }} > ? What about {{ @cldr.otherfunction($name) }} ? Ah, you're right, my bad. The macros don't really help because the wrapping needs to be the very last step in the string interpolation process. The only thing that could work is the @cldr.wrapbidi macro but only if we don't wrap {{}} by default. It would also need to be used everywhere, so I don't see too much of a benefit over using <bdi>. > I think the more obvious thing to do is to apply wrapping whenever a string > is substituted into another string. That is, whenever the {{ }} is > evaluated. > > And that means we'll need a new syntax for specifying when wrapping > shouldn't happen. Right, I think we agree here. If we want the wrapping to happen by default, we'll need new syntax the no-wrap case. As you can see from my other comments, it's the "by default" part that I'm still not convinced about :) > That's a good idea too. Or, as an alternative, maybe we could always wrap > the whole string in Unicode directional isolation characters, regardless of > the language. It wouldn't hurt. That's also a possibility. I'm not opposed but I'd like to first understand if this could have any performance impact.
Flags: needinfo?(stas)
To re-cap, I think there's a number of way we can move forward here: 1. {{ }} is human-readable by default (wraps bidi, uses number.toLocaleString etc.) and we introduce a new syntax for the machine-readable case, 2. {{ }} is machine-readable by default and: 2.1. we provide a @humanReadable macro (name TBD), 2.2. we privide @cldr.wrapbidi and @cldr.formatNumber macros, 2.3. we provide @FSI and @PDI static macros which insert the respective Unicode control marks, 2.4. we recommend the use of <bdi> around placeables which may contain text of unknown directionality. Zibi, Axel -- what are your thoughts on this?
Flags: needinfo?(l10n)
Flags: needinfo?(gandalf)
I also wonder how common will the need for machine-readable placeables actually be. For URLs in hrefs, bug 994357 will introduce the overlaying mechanism which takes the href from the source elements and copies it to the appropriate element in the translation. The localizers won't be able to define they're own hrefs for security reasons. For numbers, I can imagine something like <elementWidthInEm "10.5">, but that's rather hacky and I'm not a fan of this practice in general. Maybe it would make sense to make {{ }} human readable by default and that's it? No machine-readable case at all.
I personally think that "machine-" vs "human-readable" is not the right way to slice it. Expecting to have a machine-readable string after a localizer touched it sounds like a recipe for bustages, just generally. I also think that the "getLocalizedLink" pattern creates more problems than it solves, I'd be much easier if href and className were set after localizing the innerHTML. The questions I see are: - what happens if you mix texts of different directionality? Does the bidi algorithm do the right thing? I see how URLs are funky, but that can be fixed with explicitly wrapping it. Not sure if we have URLs with arabic script parts on the web? I wonder how people expect those to be displayed. What other use cases are there? - what's the right default numberformat?
Flags: needinfo?(l10n)
Hi Staś, > I think that for bug 1136859 we should just do what we've been doing so far [...] I don't want to rush a > decision on a new addition to L20n syntax because of a single slash which is improperly positioned by the > bidi algorithm when a work-around exists :) Okay, I'll fix Bug 1136859 in a different way for now. > So I'm okay if L20n expects developers to create > strings like: > > <hello "Hello, <bdi>{{ $user.name }}</bdi>"> > > Then, when the localizer sees this string and translates it into their > language, I fully expect them to follow the structure and keep <bdi> in the > translation as well. You have more faith in both developers and localizers than I do. > I also wonder how common will the need for machine-readable placeables actually be. For URLs in hrefs, > bug 994357 will introduce the overlaying mechanism which takes the href from the source elements and > copies it to the appropriate element in the translation. That's a good point. Maybe it's a mistake to have any HTML attributes in the localization files. As Axel says in Comment 13, HTML tags are something the translators might mess up. And that leaves us without a use-case for generating machine-readable strings, so maybe it's not necessary. (By the way, I like the work you've done on DOM overlays and I think it's a good idea. But as long as the re-ordering problem remains, I don't think it's quite ready for prime time.) > Maybe it would make sense to make {{ }} human readable by default and that's it? No machine-readable > case at all. Yes, maybe that's the solution. In which case, feel free to close this Bugzilla ticket and create a new one. I have one more idea: Just in case there is an odd corner case where the developer needs to prevent wrapping, maybe it could be turned off via a pragma? Something that applies to the whole file, instead of to a particular substitution? Or maybe the developer can call L20n.disableWrapping() before loading the .l20n file?
As discussed above, I'm closing this ticket. All {{ }} substitutions should be considered human readable (the few exceptions should be rewritten). I created Bug 1154438 for inserting the bidi isolation characters around string substitutions. I think Bug 835748 will take care of handling numbers.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INVALID
Flags: needinfo?(gandalf)
Issue has been resolved as Invalid. Removing verifyme keyword.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmercado)
Keywords: verifyme
Flags: needinfo?(jmercado)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: