Closed Bug 1450781 Opened 6 years ago Closed 6 years ago

Enable pseudolocalization in Fluent

Categories

(Core :: Internationalization, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox62 --- fixed

People

(Reporter: zbraniecki, Assigned: zbraniecki)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 2 obsolete files)

Fluent should provide basic pseudolocalization strategy to help developers working with the UI test it against RTL or longer strings.
IIRC you said today that you have some ideas on how this could be done correctly. I'll write the fluent part in github, but wouldn't mind collecting your feedback here before so that I can test it in Firefox :)
Flags: needinfo?(stas)
Comment on attachment 8964397 [details]
Bug 1450781 - Enable pseudolocalization in Fluent.

https://reviewboard.mozilla.org/r/233122/#review243274

I like how simple this is, thanks Zibi! I have a number of inline comments and I'd like to see another revision of this patch, but the general direction looks great.

It would be great to support the bidi pseudolocale right away. The code is written so I don't think it adds much complexity. If you want to add it later, however, please design the pref name such that developers are able to choose from more than one pseudo strategy.

Note that you'll also need to make a few more changes in MessageContext in places where it checks for strings: the Type function and the Pattern function.

::: intl/l10n/L10nRegistry.jsm:249
(Diff revision 1)
>    }
>  };
>  
> +function pseudo(msg) {
> +  let chars = {
> +    'caps': [550,385,391,7698,7702,401,403,294,298,308,310,319,7742,544,510,420,586,344,350,358,364,7804,7814,7818,7822,7824],

Why use charcodes instead of the actual characters?

::: intl/l10n/L10nRegistry.jsm:256
(Diff revision 1)
> +  };
> +  return msg.replace(/[a-z]/ig, (ch) => {
> +    let cc = ch.charCodeAt(0);
> +    if (cc >= 97 && cc <= 122) {
> +      if (ch === 'a' || ch === 'e' || ch === 'o' || ch === 'u') {
> +        return String.fromCharCode(chars['small'][cc - 97]) + String.fromCharCode(chars['small'][cc - 97]);

Please use an intermediate variable here so that the doubling of vowels is more evident.

::: intl/l10n/L10nRegistry.jsm:293
(Diff revision 1)
>      return L10nRegistry.sources.get(sourcesOrder[i]).fetchFile(locale, resourceId);
>    });
>  
>    const ctxPromise = Promise.all(fetchPromises).then(
>      dataSets => {
> +      const usePseudo = Services.prefs.getBoolPref("intl.l10n.use-pseudo", false);

How about choosing a name which allows us to add more pseudolocaliation strategies in the future? `intl.l10n.pseudo.accented` and `intl.l10n.pseudo.bidi` would be my picks.

I understand that you'd then have to observe two (and even more in the future) prefs and resolve conflicts between them. It doesn't scale. Perhaps the easiest solution is to make it a string-typed pref? `intl.l10n.pseudo` with possible known values: "" (empty), "accented" and "bidi".

::: intl/l10n/L10nRegistry.jsm:295
(Diff revision 1)
>  
>    const ctxPromise = Promise.all(fetchPromises).then(
>      dataSets => {
> +      const usePseudo = Services.prefs.getBoolPref("intl.l10n.use-pseudo", false);
> +      MSG_CONTEXT_OPTIONS.transform = usePseudo ? pseudo : undefined;
>        const ctx = new MessageContext(locale, MSG_CONTEXT_OPTIONS);

I suggest using `Object.assign` here (or the object spread syntax) so that `MSG_CONTEXT_OPTIONS` doesn't change on runtime:

    const ctx = new MessageContext(
        locale,
        {
            ...MSG_CONTEXT_OPTIONS,
            transform: PSEUDO_STRATEGIES[pseudoNameFromPref]
        }
    );

::: intl/l10n/MessageContext.jsm:1712
(Diff revision 1)
>     *
>     * @param   {string|Array<string>} locales - Locale or locales of the context
>     * @param   {Object} [options]
>     * @returns {MessageContext}
>     */
> -  constructor(locales, { functions = {}, useIsolating = true } = {}) {
> +  constructor(locales, { functions = {}, useIsolating = true, transform = undefined } = {}) {

I ran a quick test and there's no significant difference between checking of `transform` is defined and calling a noop function. I recommend the default value of the transform parameter to be `x => x`.

::: intl/l10n/MessageContext.jsm:1826
(Diff revision 1)
> -      return message;
> +      return this._transform ? this._transform(message) : message;
>      }
>  
>      // optimize simple-string entities with attributes
>      if (typeof message.val === "string") {
>        return message.val;

This also requires the call to `this._transform`.
Flags: needinfo?(stas)
Priority: -- → P3
Attachment #8964397 - Attachment is obsolete: true
Assignee: nobody → gandalf
Status: NEW → ASSIGNED
Comment on attachment 8964397 [details]
Bug 1450781 - Enable pseudolocalization in Fluent.

https://reviewboard.mozilla.org/r/233122/#review243274

> Why use charcodes instead of the actual characters?

Unfortunately JSM doesn't handle Unicode at all :(
> I ran a quick test and there's no significant difference between checking of `transform` is defined and > calling a noop function. I recommend the default value of the transform parameter to be `x => x`.

How much can I challenge you on that?

My take is that any complexity that is debug-mode should not affect production. If we were to use a build time preprocessing, I'd be fine here, but putting thousands of string operations per runtime of a major product through a noop just for convinience of the developers is the wrong tradeoff imho.

I asked on #jsapi and the response I got was that they're not expensive, but they're also not optimized away fully resulting in "dozens of machine operations" per call.

Could we maybe instead have two formats?

```
class MessageContext {
  constructor(transform) {
    this.format = if (transform) this.formatWithTransform : this.formatWithoutTransform;
  }
}
```

What do you think?
Comment on attachment 8974534 [details]
Bug 1450781 - Enable pseudolocalization in Fluent.

Another round of feedback ready :)

The interesting tibid is that we don't reverse the direction of layout yet. We don't set `dir` on `<html>` and its equivalent on XUL documents yet. There's a separate pref `ui.direction` which enforces layout direction.

I assume that in dev preferences we may end up flipping both for bidi, but for now it works.

I also think that eventually we'll probably want to make a bit more sophisticated transform functions that also transform major punctuation signs like `!?` and `[]` etc. but for now I think it's a great start :)

If we'll be ok with the direction, I'll write the patch for fluent.js repo.
Attachment #8974534 - Flags: feedback?(stas)
Also, wondering if you think we should add `[` and `]` to wrap up text boundries here or start with the well-tested algorithm and extend it later?
Comment on attachment 8974534 [details]
Bug 1450781 - Enable pseudolocalization in Fluent.

Axel: Would love to get your feedback on the approach going forward here.

You can download a build here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3313bb475149650e3b36334c439a063139e9056e

and in browser console flip `Services.prefs.setStringPref("intl.l10n.pseudo", "accented");`

You can also flip it to `bidi` but then you'll also need to set `intl.uidirection` to `1` to get the full RTL experience I believe.

Questions:
 - does it look useful?
 - is it a good first step?
 - any thoughts on what else should it do?
Attachment #8974534 - Flags: feedback?(l10n)
Comment on attachment 8974534 [details]
Bug 1450781 - Enable pseudolocalization in Fluent.

General feedback is good.

One thing I'd change is the extent to which we pseudo. I think half as pseudo would be pseudo enough, I find the UI really hard to use.

Maybe only accent consonants, and double latin vowels for accented, and no doubling at all for bidi? Vowels make up for a significant part of word discovery, so keeping those would be great. This is in particular important as we're having a global switch, so people need to use the full browser (eventually) in this layout. I don't think we'd loose a lot in functionality.

Can we just force the intl.direction pref if we have pseudo == 'bidi'?

There are a couple of strings in the general pref page that are not pseudo'ed, but coming from fluent, like the default browser check, and applications-description. Finding hard-coded strings and re-translation problems is part of the target here, right? We need to make sure we're not breaking this on the fluent side. Or is this stuff that actually has re-translation problems right now?
Attachment #8974534 - Flags: feedback?(l10n) → feedback+
Comment on attachment 8974534 [details]
Bug 1450781 - Enable pseudolocalization in Fluent.

https://reviewboard.mozilla.org/r/242868/#review252380

I like the direction, Zibi! I'm very sorry I kept you waiting here.

You'll need to make two more changes to the resolver, in Type and in Pattern, where the code checks for strings.

I think accesskeys are broken now in both accented (funny characters) and in bidi (funny characters surrounded by RLO and PDI). I wonder if it would make sense to opt out of the transform for simple strings (i.e. when format() doesn't go into resolve()) when the string is 1-char-long.

::: intl/l10n/L10nRegistry.jsm:247
(Diff revision 2)
>        }
>      }
>    }
>  };
>  
> +const ACCENTED_MAP = {

Maybe add comments from https://github.com/l20n/l20n.js/blob/v2.x/src/lib/pseudo.js?

::: intl/l10n/L10nRegistry.jsm:275
(Diff revision 2)
> +    return prefix + part.replace(/[a-z]/ig, (ch) => {
> +      let cc = ch.charCodeAt(0);
> +      if (cc >= 97 && cc <= 122) {
> +        if (ch === "a" || ch === "e" || ch === "o" || ch === "u") {
> +          const newChar = String.fromCodePoint(map.small[cc - 97]);
> +          return newChar + newChar;

I find the text hard to read when it's bidi and when the vowels are duplicated at the same time. I'd turn duplication on only in "accented". FWIW, I don't mind the duplicate vowels being accented ones.

::: intl/l10n/MessageContext.jsm:1838
(Diff revision 2)
> -      return message;
> +      return this._transform ? this._transform(message) : message;
>      }
>  
>      // optimize simple-string entities with attributes
>      if (typeof message.val === "string") {
> -      return message.val;
> +      return this._transform ? this._transform(message.val) : message.val;

I'm not fond of this check because it will leak into the resolver code as well, where you'll need to run strings through the transform function as well. (See my top-level comment about Type and Pattern.) 

I ran a quick test with `make -C fluent build && make perf-jsshell` and on my machine there's no observable difference between running a noop transform and checking `this._transform` in each `format()` call. A special-purpose `formatWithTransform` doesn't solve the problem of needing to call the transform in the resolver.

If you feel strongly about not using a noop transform (even despite the numbers; or if you're getting different numbers), I won't block on this if check here.
Attachment #8974534 - Flags: feedback?(stas) → feedback+
Attachment #8974534 - Attachment is obsolete: true
Comment on attachment 8982646 [details]
Bug 1450781 - Enable pseudolocalization in Fluent.

https://reviewboard.mozilla.org/r/248606/#review255906
Attachment #8982646 - Flags: review?(dtownsend) → review+
Comment on attachment 8982646 [details]
Bug 1450781 - Enable pseudolocalization in Fluent.

https://reviewboard.mozilla.org/r/248606/#review255908

Please add a test that setting the pref to an invalid pseudolocale doesn't break things.
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ec66192b6be9
Enable pseudolocalization in Fluent. r=mossop
https://hg.mozilla.org/mozilla-central/rev/ec66192b6be9
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: