Switch to `unic-langid` to back MozLocale
Categories
(Core :: Internationalization, task, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox72 | --- | fixed |
People
(Reporter: zbraniecki, Assigned: zbraniecki)
References
Details
Attachments
(3 files, 1 obsolete file)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
928 bytes,
patch
|
Details | Diff | Splinter Review | |
807 bytes,
patch
|
Details | Diff | Splinter Review |
As a warm-up to bug 1560038, I'd like to move MozLocale to use unic-langid
.
This will reduce the number of codepaths we have to parse Language Identifiers and pave the way to use unic-locale
there in the future.
This will also unify our locale management between LocaleService and fluent-rs.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 1•4 years ago
|
||
Assignee | ||
Comment 2•4 years ago
|
||
The patch is still a WIP, but is almost fully functional (two tests to fix), and ready for initial Rust/cbindgen/FFI review from :emilio. Setting NI!
Comment 3•4 years ago
|
||
It looks reasonable, I took a pass on phab, but I think there's missing code? I don't see the implementation of the FFI functions.
Updated•4 years ago
|
Assignee | ||
Comment 4•4 years ago
|
||
Assignee | ||
Comment 5•4 years ago
|
||
I profiled the startup locale parsing between central and this patch. There are small differences due to changes in callers, but overall the results are similar.
We parse ~400 language identifiers during startup.
On average, parsing a locale takes:
- central: 665 ns
- unic-langid: 417 ns (37% faster)
data: https://docs.google.com/spreadsheets/d/1eWjslErJmzwGnWSX5eNXctjwa_KlK9QkyyM3NqPHgm8/edit#gid=0
Assignee | ||
Comment 6•4 years ago
|
||
The patch is now passing all tests and is pending a couple review decisions from emilio.
I updated the spreadsheet with new data, which I separated for the initial cold startup with a new profile, and 3 runs with the same profile averaged.
== New profile:
Count: 956 -> 786 (-18%)
Sum: 671661 -> 403094 (-40%)
Average: 702 -> 512 (-28%)
Min: 255 -> 148 (-42%)
Max: 4595 -> 11901 (150%)
Stdev: 497 -> 986 (98%)
The count drops thanks to the change in how we calculate likely subtags (without having to parse the output into a new Locale).
The average and min drops, which comes from better performance (already measured before).
The combination of the two impacts the total time that the parsing takes, with 40% win, which is really nice!
Max goes up - the first parsing takes significantly longer with Rust than C++. I'm wondering if it is because that becomes the first point where we load gecko-rust lib? In result stdev goes up.
I'm not super worried about it, because majority of parsings are now much faster, and the total numbers are very good.
It still may be worth investigating why the max goes up.
== Old profile (avg. of 3 runs)
Count: 340 -> 274 (-20%)
Sum: 238151 -> 144606 (-39%)
Average: 523 -> 394 (-25%)
Min: 212 -> 112 (-47%)
Max: 10340 -> 9618 (-7%)
Stdev: 606 -> 812 (33%)
Here we can see that the total count stays 20% lower, total time remains 40% lower, average remains 25% lower and min remains 45% lower!
Max is now lower by 7%, which makes me think that the new profile max may have been a fluke. Stdev is still a little bit higher.
All in all, I think it looks really good!
Updated•4 years ago
|
Assignee | ||
Comment 7•4 years ago
|
||
Hi Jonathan - can you please take a look at the r? in the PR?
Assignee | ||
Comment 8•4 years ago
|
||
Assignee | ||
Comment 9•4 years ago
|
||
I updated spreadsheet with new numbers based on today's nightly: https://docs.google.com/spreadsheets/d/1eWjslErJmzwGnWSX5eNXctjwa_KlK9QkyyM3NqPHgm8/edit#gid=0
== New profile:
Count: 956 -> 784 (-18%)
Sum: 671661 -> 142324 (-79%)
Average: 702 -> 181 (-74%)
Min: 255 -> 46 (-82%)
Max: 4595 -> 5424 (+18%)
Stdev: 497 -> 391 (-21%)
== Old profile (avg. of 3 runs)
Count: 454 -> 339 (-25%)
Sum: 317534 -> 64443 (-79%)
Average: 697 -> 190 (-72%)
Min: 282 -> 45 (-84%)
Max: 13786 -> 7819 (-43%)
Stdev: 808 -> 595 (-26%)
This looks really good. I also measured unic-langid
against raw ICU4C 64 - https://github.com/zbraniecki/intl-measurements#current-results-as-of-oct-30-2019
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 10•4 years ago
|
||
Adding a patch I used to dump all MozLocale constructors without the unic-langid patch and their performance cost.
Assignee | ||
Comment 11•4 years ago
|
||
Adding a patch I'm using with the unic-langid patch to collect all MozLocale constructors and their performance cost.
Comment 12•3 years ago
|
||
Pushed by zbraniecki@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5a8fc2574be5 Switch MozLocale to use unic-langid. r=emilio,jfkthame
Comment 13•3 years ago
|
||
bugherder |
Description
•