Closed Bug 1581960 Opened 5 years ago Closed 5 years ago

Use fluent-locale-rs for LocaleService::NegotiateLanguages

Categories

(Core :: Internationalization, task, P3)

task

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox72 --- fixed

People

(Reporter: zbraniecki, Assigned: zbraniecki)

References

Details

Attachments

(2 files)

On top of bug 1571915, we should switch to use fluent-locale-rs for language negotiation.

I ran the following test:

{let start = performance.now();Services.locale.negotiateLanguages(

["de", "pl", "fr", "uk", "it", "ar", "de-AT", "uk", "fa", "cak", "fy-NL", "ga-IE", "ff", "eo", "es-AR", "es-MX", "wo", "xh", "lv", "pa-IN", "ne-NP", "ms", "ltd", "kn", "lij", "oc"],
["es", "en-GB", "sr-Latn-RS", "de", "es-CA", "en-CA", "en-GB", "en-AU", "sk", "mk", "zh-CN", "zh-Hans", "zh-Hant", "hi-IN", "gu-IN", "nn-NO", "nb-NO", "ne-NP", "trs", "tl", "nl", "oc", "kk", "km", "sq", "bs", "az", "ast", "an", "ga-IE", "ach", "fa", "eu", "sv-SE", "fy-NL", "ia", "son"]

);let end = performance.now();console.log(end - start);}

only monitoring the CPP time and compared it to Rust:

CPP: 310 us
Rust: 50 us

See Also: → 1571915

Plugged into LocaleService, from JS:

m-c: 0.33 ms
patch: 0.089 ms

I'm almost done with the last blocker which is to have likely subtags implemented in unic-langid - https://github.com/zbraniecki/unic-locale/pull/15

With those two ready, I can vendor in unic-langid and fluent-locale and replace all the guts of MozLocale and LocaleService::NegotiateLanguages with them.

This will be needed for bug 1560038, but also brings performance wins on its own.
I'll do closer measurements of the perf impact, but initial values from comment 1 are likely to either stay or even improve as we continue to iterate over the crates, and since we perform ~10 negotiations during Firefox startup, I'm planning to measure if there's a noticable startup perf win from it!

There's one thing to tackle tho. At the moment the likely subtags from Rust (which are on their own much faster than ICU4C - (135.453 us -> 19 us in my microbench) add ~70kb of data.

It would be nice to offset that addition with carving out likely subtags from ICU, but I'm not sure if we'll be able to do it.

In my initial testing it seems that outside of MozLocale, uloc_addLikelySubtags is used in js/src/builtin/intl/Locale and several internal methods of ICU [0].

I'm not sure if we can vendor in unic-langid` into SpiderMonkey, and I'm not sure if we want to. It should be feature compatible, is faster, uses less memory and is backed up by CLDR, but it may be a bigger project since I haven't seen Rust floating around in SM yet.
NI on :waldo on that part.

The other question is about internal uses. If we would to pull out likelySubtags data from ICU would the methods in tzfmt.cpp, tzgnames.cpp and tznames_impl.cpp break? Do we use it?
NI on :m_kato and :jfkthame on that one.

I'm ok leaving the likely subtags in ICU and duplicating it in Gecko from unic-langid. I didn't measure the binary impact yet but I will soon. Maybe 70k is ok?

But we'll start adding more of the rust intl APIs like https://github.com/unclenachoduh/pluralrules next and I'd like to start a conversation about if there are ways to minimize our ICU data surface as we move it to Rust Intl crates.

[0] https://searchfox.org/mozilla-central/search?q=uloc_addLikelySubtags&path=

Flags: needinfo?(m_kato)
Flags: needinfo?(jwalden)
Flags: needinfo?(jfkthame)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #2)

The other question is about internal uses. If we would to pull out likelySubtags data from ICU would the methods in tzfmt.cpp, tzgnames.cpp and tznames_impl.cpp break? Do we use it?
NI on :m_kato and :jfkthame on that one.

I don't think that this causes break. Of course, we need test ja-JP-mac locale for application UI locale.

I'm ok leaving the likely subtags in ICU and duplicating it in Gecko from unic-langid. I didn't measure the binary impact yet but I will soon. Maybe 70k is ok?

Now although GV is aar package, we don't take care of size now for GV. 70K is no problem since we will remove Fennec only code (reducing a lot of Java and UX code).

Flags: needinfo?(m_kato)

Zibi, I was wanting to experiment a bit with the implementation in unic-langid, so I cloned your git repo and did a cargo build and cargo test ... but AFAICS it doesn't seem to be running any likelysubtags tests for me. Is there something I need to do in order to exercise this locally? (It's probably some really trivial thing, but I'm not familiar with the tools/environment...)

Flags: needinfo?(jfkthame) → needinfo?(gandalf)

Yes! Sorry I didn't document it yet, but I made the likelysubtags an optional feature to save on size. cargo test --features "likelysubtags". For fluent-locale the feature name is "cldr" - you can find the available features in Cargo.toml

Flags: needinfo?(gandalf)

Ah - thanks. And I need to run that from within the /unic-langid-impl/ directory, apparently; a top-level cargo test doesn't go there, right?

yeah, we have benchmarks and tests in unic-langid-impl, but the user-facing crate is unic-langid - that's due to a temporary limitation of Rust which requires us to add separate crates for procedural macros, so that langid!("en-US") works.

so, to test things:

cd unic-langid-impl
cargo test --features "likelysubtags"
cargo bench --features "likelysubtags"

and if you want macros:

cd unic-langid
cargo run --example simple-langid --features "macros"
Attached patch langneg.diffSplinter Review

As I finalize bug 157915 its time to take this one on top.

I started by collecting all negotiations that happen at startup and their cost using today's m-c without unic-langid switch. Attaching the patch I used.

The doc is here: https://docs.google.com/spreadsheets/d/13x2S_xhGCD8ArUz9MiFPMSxCwrIVsbe1vb3Va35vGc8/edit?usp=sharing

What strikes me most is how many negotiations we do with empty list of availables. That feels like a mistake.

Assignee: nobody → gandalf
Flags: needinfo?(jwalden)

The other thing are the three negotiations with en,b-d,b-e,b-1-d,b-1-e, as available. Looking at searchfox I see it comes from search extensions - https://searchfox.org/mozilla-central/search?q=b-1-e&path=

Mark - are you the right person to look into this? It seems like we're trying to pass strings like b-e and b-1-d as language identifiers, while they are not.
Our logic will filter them out, but I'm wondering if there's something worth checking in the code that accumulates available locales for the negotiation.

Flags: needinfo?(standard8)
Status: NEW → ASSIGNED

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #10)

The other thing are the three negotiations with en,b-d,b-e,b-1-d,b-1-e, as available. Looking at searchfox I see it comes from search extensions - https://searchfox.org/mozilla-central/search?q=b-1-e&path=

Mark - are you the right person to look into this? It seems like we're trying to pass strings like b-e and b-1-d as language identifiers, while they are not.

The search service uses explicit overrides in the WebExtensions we ship to achieve various things. We do this currently to allow us to group extensions into one which a) avoids duplicating code, b) makes the whole config simpler, c) allows us to do things like specify different urls to use for a particular region. If we didn't do this, we'd have 4 Google WebExtensions, for example.

I think Google is the only one where we don't have "legal" locales. However, some of the other extensions, we are explicitly forcing the locale to sometimes be different from the user's, e.g. due to wanting a particular engine setup due to being in a certain region.

A couple of code pointers:

Our logic will filter them out, but I'm wondering if there's something worth checking in the code that accumulates available locales for the negotiation.

Sorry, I don't understand the implications of filtering them out. Where is the filtering being done, what does that affect?

Is that also a silent filter? If so, that's a little concerning.

If this stops the WebExtensions from including those locales or letting us select them, then we need to find another solution. With the modern configuration, we are working towards being able to rework the Google extension to only have the one configuration (i.e. not code-based names for locales), we're aiming towards 72 for that, but we're not going to be able to remove/break the old legacy configuration for a couple releases due to making sure modern ships.

If you can add a bit more detail, that'd be useful to know what the effects are here.

Flags: needinfo?(standard8) → needinfo?(gandalf)

Sorry, I don't understand the implications of filtering them out. Where is the filtering being done, what does that affect?

If you pass sth like this:

let requested = ["pl"];
let available = ["en", "b-1-d", "b-e", "pl", "ru", "fr"];
let supported = negotiateLanguages(requested, available);

then we have to figure out what to do with b-1-d and b-e. They are not language identifiers and we won't allow user to ever select such string as a requested locale, so it'll never match.

What we do right now is filter them out, so that the real list of available passed to negotiation is ["en", "pl", "ru", "fr"] in such a case.

I'd like to strengthen it, initially just issuing a warning if you try to pass invalid locales as available/default, to make it easier to catch when you pass something that is not a Unicode Language Identifier there, rather than just quietly filter them out.

My question is, why is b-1-d passed to available locales? It's not a locale. What is it meant to match against? In which scenario? Because no user has b-1-d locale...

Flags: needinfo?(gandalf) → needinfo?(standard8)

Initial results based on https://docs.google.com/spreadsheets/d/13x2S_xhGCD8ArUz9MiFPMSxCwrIVsbe1vb3Va35vGc8/edit#gid=0

== New profile:

Count: 33 -> 33 (0%)
Sum: 916946 -> 269912 (-70%)
Average: 27786 -> 8179 (-70%)
Min: 4264 -> 3331 (-22%)
Max: 75692 -> 26340 (-65%)
Stdev: 15649 -> 5700 (-64%)

== Old profile (avg. of 3 runs)

Count: 18 -> 18 (0%)
Sum: 444869 -> 107093 (-76%)
Average: 24715 -> 5950 (-76%)
Min: 7936 -> 1046 (-87%)
Max: 88947 -> 23522 (-74%)
Stdev: 21533 -> 6505 (-70%)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #12)

What we do right now is filter them out, so that the real list of available passed to negotiation is ["en", "pl", "ru", "fr"] in such a case.

Ok, that will break the search service at this time.

I'd like to strengthen it, initially just issuing a warning if you try to pass invalid locales as available/default, to make it easier to catch when you pass something that is not a Unicode Language Identifier there, rather than just quietly filter them out.

Just a warning might be ok, though I think the search service will be making that happen every startup.

My question is, why is b-1-d passed to available locales? It's not a locale. What is it meant to match against? In which scenario? Because no user has b-1-d locale...

I think I explained it in comment 11. I'm not sure what I can add to that. To be explicit, it is a hack that made some things simpler.

Flags: needinfo?(standard8)

Ok, that will break the search service at this time.

I don't understand this sentence. What does "will" mean here? The filtering works this way (rejecting invalid language identifiers) since its inception. Which means it always filtered out invalid language identifiers.

But even if it didn't filter them out, they would still not match to anything ever, because language negotiation matches a requested locale to an available locale. If we didn't filter an invalid available, we would still never accept an invalid requested (user cannot set their locale in Firefox to "b-1-d") so there is never a match.

What do you mean by "that will break"? I'm reporting a behavior that is likely not working and never has been. Is that actionable?

I think I explained it in comment 11. I'm not sure what I can add to that. To be explicit, it is a hack that made some things simpler.

I don't understand the explanation from comment 11 then. I don't understand what is the intention of a matching between any requested locale and "b-1-d" and what is the expected outcome (assuming the b-1-d code has been passed intentionally).

In other words, if you say that the current behavior of negotiateLanguages doesn't work for the search service in the b-1-d scenario, what would be the behavior, or output, you'd expect to say that it does work?

Flags: needinfo?(standard8)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #15)

Ok, that will break the search service at this time.

I don't understand this sentence. What does "will" mean here? The filtering works this way (rejecting invalid language identifiers) since its inception. Which means it always filtered out invalid language identifiers.

Oh, were you just pointing this out as a potential issue, not something that would be changed by this patch? I assumed the functionality would be changing with this patch, and hence might then break things.

The search service code works today, and we have tests to verify it in toolkit/components/search. I suspect it works because in the bad cases, we're telling the add-on manager exactly what to load. In a few releases, we'll hopefully stop using these bad identifiers.

Flags: needinfo?(standard8)

Yes, I don't expect any change in result of this patchset, but I would like to start warning on passing incorrect language codes to the algorithm (not in this issue, in the near future) which would make search code trigger warnings in the console.

I'll file a bug to find a way to avoid passing incorrect values to the negotiation algorithm.

Priority: -- → P3
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5ff314660cab
Use fluent-locale-rs for LocaleService::NegotiateLanguages. r=emilio
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: