[meta] Migrate Fluent in Gecko off of JavaScript
Categories
(Core :: Internationalization, task, P1)
Tracking
()
People
(Reporter: zbraniecki, Assigned: zbraniecki)
References
(Blocks 2 open bugs)
Details
(Keywords: meta)
Attachments
(1 file)
2.37 KB,
patch
While the majority of Fluent in Gecko is already in either Rust or C++, there are still two pieces in JavaScript.
There are three main reasons to move away from JS here:
- Performance (see bug 1613705 comment 6 for some rough estimates)
- Memory (same estimate gives us ~800kb savings)
- Architecture - current architecture makes JS code block first paint and layout of the initial window.
Comment 1 • Assignee • 4 years ago
We now have the first functional pieces operational, and are starting to tie things up. The planned order of steps is as follows:
- (djg) Land bug 1660393 to get C++ `L10nRegistry::Load(Sync)` working
- (djg) Factor out `chunk-vec` as a separate PR against `fluent-rs` (https://github.com/zbraniecki/fluent-rs/pull/3)
- (djg/zibi) Merge the `l10nregistry-rs` PR from :djg (https://github.com/zbraniecki/l10nregistry-rs/pull/1)
- (zibi) Polish and release `l10nregistry-rs`
- (zibi/djg) Release `chunk-vec`
- (zibi) Plug `l10nregistry-rs` into `L10nRegistry` in Gecko and expose via `XPIDL`
- (djg/zibi) Merge the `fluent-fallback` PR from :djg (https://github.com/zbraniecki/fluent-rs/pull/3)
- (zibi) Write a PR that moves `Localization.cpp` to use `fluent-fallback`
- (djg) Get the `Future->Promise` for `fluent-fallback` to `Localization` WebIDL use
- (zibi) Clean up `Localization`/`DOMLocalization`/`DocumentL10n` to remove the no longer needed JSContext
- (zibi) Remove `Localization.jsm` and `L10nRegistry.jsm`
Comment 2 • Assignee • 4 years ago
A set of markers used in performance profiles for identifying:
- `l10n_start_URL` - when the document encounters the initial FTL link
- `l10n_trigger_URL` - when the document triggers the initial translation phase
- `l10n_end_URL` - when the document reports the initial translation as completed
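As a rough illustration of how these markers can be used, the intervals between them can be computed from their timestamps. The sketch below is not the profiler's API: the `Marker` type, the `l10n_phases` helper, the sample timestamps, and the assumption that `URL` in each marker name is replaced by the document's URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str       # e.g. "l10n_start_about:preferences" (assumed naming)
    time_ms: float  # timestamp in milliseconds


def l10n_phases(markers, url):
    """Return (end - start, end - trigger) in ms for one document."""
    times = {m.name: m.time_ms for m in markers}
    start = times[f"l10n_start_{url}"]
    trigger = times[f"l10n_trigger_{url}"]
    end = times[f"l10n_end_{url}"]
    return end - start, end - trigger


# Made-up timestamps, for illustration only.
sample = [
    Marker("l10n_start_about:preferences", 100.0),
    Marker("l10n_trigger_about:preferences", 130.0),
    Marker("l10n_end_about:preferences", 140.0),
]
total_ms, translation_ms = l10n_phases(sample, "about:preferences")
print(f"end - start: {total_ms}ms, end - trigger: {translation_ms}ms")
```

The first value covers everything from the initial FTL link to completion, and the second isolates the translation phase itself.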
Comment 3 • Assignee • 4 years ago
Final pre-review numbers!
With the advancements in Gecko bindings I was able to profile startup with the markers as described above.
Here are my profiles based on mozilla-central from the past weekend:
mozilla-central (using JS L10nRegistry and Localization):
1ms intervals:
https://share.firefox.dev/2Mmn6Te
https://share.firefox.dev/3a3R1bd
https://share.firefox.dev/3c8uIUn
0.1ms intervals:
https://share.firefox.dev/3qGfs5c
https://share.firefox.dev/3og8jqH
mozilla-central + l10nregistry-rs + localization-rs:
1ms intervals:
https://share.firefox.dev/3sNsqjv
https://share.firefox.dev/2LRgPzj
https://share.firefox.dev/2KO9C2q
0.1ms intervals:
https://share.firefox.dev/2Y7fAyi
https://share.firefox.dev/2YaePVh
In the main process you can find browser.xhtml and about:preferences, and in the content process about:home and about:newtab.
I'd appreciate any eyeballs that may want to evaluate anything standing out.
From my evaluation it looks like we're generally in good shape, and what remains is:
- Consider whether we want to prefetch L10n, either sync or async, and then apply translations as we parse instead of collecting elements and applying translations afterwards.
- Consider whether we want to maintain the XUL cache, and what its real value is once we're out of the JS realm on the blocking path.
- A bunch of micro-optimizations in the Fluent parser around slice iteration and bytes retrieval.
- Further Gecko/XPCOM/DOM bindings optimizations to minimize the cost there (hope to catch those in the review process!)
- In about:newtab there seems to be a large cost of JSON parsing, likely `l10n-args`. Is there a chance we can parse JSON faster?
I consider those optimizations optional and not blocking the landing of this work now, because the performance numbers look good!
I'll share more details in the next comment.
Comment 4 • Assignee • 4 years ago
I evaluated performance of four documents:
- browser.xhtml
- about:preferences
- about:home
- about:newtab
using two methods:
- 1ms profiler time, and `l10n_end - l10n_trigger` memory
- talos tests
Profiler
From the profiler, I used the opt build and measured `l10n_end - l10n_start` and `l10n_end - l10n_trigger`, the former being similarly noisy to `talos`, and the latter being much cleaner. The latter is the real difference, the phase where localization is applied. If you look at the profiles, almost nothing happens before then, as we don't currently `prefetch`, so we can focus on the `end - trigger` phase.
We need to recognize that the profiler adds some overhead and in theory may give us different results, so it is important to cross-check with talos. In this case, I think the results are quite consistent: Talos matches `end - start` in the Profiler results, while `end - trigger` is the isolated difference that represents the actual perf difference from the change.
There's also a little bit of first-run difference, so I used the average of the 2nd and 3rd runs for the tables below (stdev between them is low):
Document | JS (ms) | Rust (ms) | Diff | % |
---|---|---|---|---|
browser.xhtml | 7.5 | 4.7 | -2.8ms | -37.3% |
Preferences | 19 | 9.6 | -9.5ms | -65.8% |
about:home | 79 | 14 | -65ms | -82.27% |
about:newtab | 122 | 80 | -42ms | -34.42% |

Document | JS (mem) | Rust (mem) | Diff | % |
---|---|---|---|---|
browser.xhtml | 1.3mb | 0.9mb | -0.4mb | -30.76% |
Preferences | 1.94mb | 1.26mb | -0.68mb | -35.05% |
about:home | 5.25mb | 0.97mb | -4.28mb | -81.50% |
about:newtab | 6.8mb | 2.8mb | -4.00mb | -58.82% |
Both numbers, time and memory, go significantly down!
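For reference, the Diff and % columns are plain absolute and relative changes from the JS value to the Rust value. A minimal sketch of that arithmetic, using a few timing rows copied from the table above (the `relative_change` helper name is made up for this example, and the table's own figures may have been computed from unrounded measurements):

```python
def relative_change(before, after):
    """Return (absolute difference, percent change) going from before to after."""
    diff = after - before
    return diff, diff / before * 100.0


# (JS, Rust) timings in ms, copied from the timing table above.
timings_ms = {
    "browser.xhtml": (7.5, 4.7),
    "about:home": (79, 14),
    "about:newtab": (122, 80),
}

for doc, (js, rust) in timings_ms.items():
    diff, pct = relative_change(js, rust)
    print(f"{doc}: {diff:+.1f}ms ({pct:+.2f}%)")
```

A negative percentage means the Rust path is faster (or smaller, for memory) than the JS path.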
Talos
Unfortunately, talos tests are quite noisy, so it's really hard to pinpoint the wins, but one of the wins with the patches is that the `stdev` goes noticeably down, so I hope this also makes the talos tests a better tool for evaluating further optimizations.
I tried to run it with ~40 reps, but stdev is consistently high enough that cutting 3ms from `browser.xhtml` or even 10ms from `about:preferences` is indistinguishable from noise when stdev is 15-20ms!
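To make the noise point concrete, here is a small sketch that compares the change in the mean against the run-to-run spread. The run timings are made up for illustration and are not talos data; `delta_vs_noise` is a hypothetical helper, not part of any talos tooling.

```python
import statistics


def delta_vs_noise(before_ms, after_ms):
    """Return (change in mean, larger of the two sample stdevs)."""
    delta = statistics.mean(after_ms) - statistics.mean(before_ms)
    noise = max(statistics.stdev(before_ms), statistics.stdev(after_ms))
    return delta, noise


# Made-up repeated runs (ms): every "after" run is 3ms faster than "before".
before = [100.0, 120.0, 140.0, 110.0, 130.0]
after = [97.0, 117.0, 137.0, 107.0, 127.0]

delta, noise = delta_vs_noise(before, after)
print(f"delta={delta:+.1f}ms, stdev={noise:.1f}ms, ratio={abs(delta) / noise:.2f}")
```

When the ratio is well below 1, individual runs from the two builds overlap heavily, which is why a small improvement is hard to see in a single comparison.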
As a result, my read from talos is that most numbers go down, in several cases quite significantly. stdev also goes down, which is great for the value of talos going forward :)
Test | Platform | JS (ms) | Rust (ms) | Diff | % |
---|---|---|---|---|---|
ts_paint | Linux | 253.1 | 254.98 | +1.88ms | +0.7% |
ts_paint | MacOS | 928.1 | 934.32 | +6.22ms | +0.67% |
ts_paint | Windows | 365.88 | 359.85 | -6.03ms | -1.64% |
twinopen | Linux | 342.67 | 343.88 | +1.21ms | +0.35% |
twinopen | MacOS | 124.54 | 122.0 | -2.54ms | -2.03% |
twinopen | Windows | 104.5 | 101.66 | -2.84ms | -2.74% |
about_newtab | Linux | 30.85 | 30.21 | -0.64ms | -2.07% |
about_newtab | MacOS | 32.36 | 32.08 | -0.28ms | -0.86% |
about_newtab | Windows | 31.81 | 29.74 | -2.07ms | -6.50% |
about_preferences_basic | Linux | 124.39 | 102.0 | -22.39ms | -17.99% |
about_preferences_basic | MacOS | 107.73 | 104.84 | -2.89ms | -2.68% |
about_preferences_basic | Windows | 116.19 | 105.94 | -10.25ms | -8.82% |
Here's the full compare view: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=8771dfdc8694a91053b5e86c0a8ad9de34b68393&newProject=try&newRevision=c7f8b45423c3f228ad170c0a9b668e424f9abc96
With profiler wins in both time and memory, and talos showing a general downward trend, some strong wins, and much lower stdev in all tests, I'm comfortable recommending this change with the numbers as we have them right now.
Once we're closer to landing, I'll redo the talos tests to see if maybe we get more significant wins.
Comment 5 • Assignee • 4 years ago
Latest benchmarks: bug 1613705 comment 37
Latest talos numbers: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=4e7fdee308deafa3bebc6f177caf5d1720ee369f&newProject=try&newRevision=fd42ad55cf7527849a589454153c6e3bf1a38b11&framework=1
The status of the patchset:
- FileSource - mostly reviewed, likely close to final state, some opportunity to profile I/O
- L10nRegistry - in review, seems to be stabilizing, likely in last rounds of review
- Localization - first round of reviews, functionality complete
And on the crate side:
- fluent-syntax - stable, documented, good test coverage
- fluent-bundle - stable, documented, good test coverage
- fluent-fallback - to be documented and cleaned up, but stable
- l10nregistry - to be documented and cleaned up, but stable
Comment 6 • Assignee • 4 years ago
Comment 7 • Assignee • 3 years ago
This is now fixed and in beta.