Closed Bug 1660391 Opened 1 year ago Closed 2 months ago

[meta] Migrate Fluent in Gecko off of JavaScript

Categories: Core :: Internationalization, task, P1
Status: RESOLVED FIXED
People: Reporter: zbraniecki, Assigned: zbraniecki
References: Blocks 2 open bugs
Details: Keywords: meta
Attachments: 1 file

While the majority of Fluent in Gecko is already in either Rust or C++, there are still two pieces in JavaScript.

There are three main reasons to move away from JS here:

  • Performance (see bug 1613705 comment 6 for some rough estimates)
  • Memory (the same estimate gives us ~800 KB savings)
  • Architecture: the current architecture makes JS code block first paint and layout of the initial window.
Priority: -- → P1
Depends on: 1613705
Depends on: 1660392

We now have the first functional pieces operational and are starting to tie things together. The planned order of steps is as follows:

  1. (djg) Land bug 1660393 to get C++ L10nRegistry::Load(Sync) working
  2. (djg) Factor out chunk-vec as a separate PR against fluent-rs (https://github.com/zbraniecki/fluent-rs/pull/3)
  3. (djg/zibi) Merge the l10nregistry-rs PR from :djg (https://github.com/zbraniecki/l10nregistry-rs/pull/1)
  4. (zibi) polish and release l10nregistry-rs
  5. (zibi/djg) release chunk-vec
  6. (zibi) Plug l10nregistry-rs into L10nRegistry in Gecko and expose via XPIDL
  7. (djg/zibi) Merge the fluent-fallback PR from :djg (https://github.com/zbraniecki/fluent-rs/pull/3)
  8. (zibi) Write a PR that moves Localization.cpp to use fluent-fallback
  9. (djg) Get the Future->Promise bridge from fluent-fallback working for Localization WebIDL use
  10. (zibi) Clean up Localization/DOMLocalization/DocumentL10n to remove the no longer needed JSContext
  11. (zibi) Remove Localization.jsm and L10nRegistry.jsm
Blocks: 1683759
Blocks: 1685365
Attached patch: markers.diff

A set of markers used in performance profiles for identifying:

  • l10n_start_URL - when the document encounters the initial FTL link
  • l10n_trigger_URL - when the document triggers initial translation phase
  • l10n_end_URL - when the document reports initial translation to be completed
Assignee: nobody → zbraniecki

Final pre-review numbers!

With the advancements in Gecko bindings I was able to profile startup with the markers as described above.

Here are my profiles based on mozilla-central from the past weekend:

mozilla-central (using JS L10nRegistry and Localization):

1ms intervals:

https://share.firefox.dev/2Mmn6Te
https://share.firefox.dev/3a3R1bd
https://share.firefox.dev/3c8uIUn

0.1ms intervals:

https://share.firefox.dev/3qGfs5c
https://share.firefox.dev/3og8jqH

mozilla-central + l10nregistry-rs + localization-rs:

1ms intervals:

https://share.firefox.dev/3sNsqjv
https://share.firefox.dev/2LRgPzj
https://share.firefox.dev/2KO9C2q

0.1ms intervals:

https://share.firefox.dev/2Y7fAyi
https://share.firefox.dev/2YaePVh

In the main process you can find browser.xhtml and about:preferences, and in the content process about:home and about:newtab.

I'd appreciate extra eyes on these profiles from anyone willing to flag anything that stands out.

From my evaluation it looks like we're generally in good shape, and what remains is:

  • Consider whether we want to prefetch L10n, either sync or async, and then apply translations as we parse instead of collecting elements and applying translations afterwards.
  • Consider whether we want to maintain the XUL cache, and what its value really is once we're out of the JS realm on the blocking path.
  • A bunch of micro-optimizations in the Fluent parser around slice iteration and bytes retrieval.
  • Further Gecko/XPCOM/DOM bindings optimizations to minimize the cost there (I hope to catch those in the review process!)
  • In about:newtab there seems to be a large cost of JSON parsing, likely l10n-args. Is there a chance we can parse JSON faster?

I consider those optimizations optional and not blocking the landing of this work, because the performance numbers look good!
I'll share more details in the next comment.

I evaluated performance of four documents:

  • browser.xhtml
  • about:preferences
  • about:home
  • about:newtab

using two methods:

  • the profiler at 1ms intervals for time, plus memory over the l10n_end - l10n_trigger phase
  • talos tests

Profiler

From the profiler, I used the opt build and measured l10n_end - l10n_start and l10n_end - l10n_trigger. The former is similarly noisy to talos, and the latter is much cleaner. The latter is the real difference, the phase where localization is applied. If you look at the profiles, almost nothing happens before then, as we don't currently prefetch, so we can focus on the end - trigger phase.

We need to recognize that the profiler adds some overhead and in theory may give us different results, so it is important to cross-check with talos. In this case I think the results are quite consistent: talos matches end - start in the profiler results, while end - trigger is the isolated difference that represents the actual perf difference from the change.
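As a minimal sketch of how the two measures relate, assume the three markers for one document have been exported as timestamps in milliseconds. The `L10nMarkers` class and the sample timestamps are hypothetical and are not taken from the profiles linked above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class L10nMarkers:
    """Hypothetical container for one document's marker timestamps (ms)."""
    start: float    # l10n_start: document encounters the initial FTL link
    trigger: float  # l10n_trigger: initial translation phase is triggered
    end: float      # l10n_end: initial translation reported as completed

    def end_minus_start(self) -> float:
        # Whole window; also contains startup work unrelated to l10n.
        return self.end - self.start

    def end_minus_trigger(self) -> float:
        # Isolated phase in which localization is actually applied.
        return self.end - self.trigger


# Made-up timestamps, for illustration only.
markers = L10nMarkers(start=100.0, trigger=160.0, end=165.0)
print("end - start:  ", markers.end_minus_start())
print("end - trigger:", markers.end_minus_trigger())
```

In this made-up case most of the end - start window passes before the trigger, which is why the end - trigger phase is the cleaner number to compare.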

There's also a little bit of first-run difference, so I used the average of the 2nd and 3rd runs for the tables below (stdev between them is low):

| Document | JS (ms) | Rust (ms) | Diff | % |
|---|---|---|---|---|
| browser.xhtml | 7.5 | 4.7 | -2.8ms | -37.3% |
| Preferences | 19 | 9.6 | -9.5ms | -65.8% |
| about:home | 79 | 14 | -65ms | -82.27% |
| about:newtab | 122 | 80 | -42ms | -34.42% |

| Document | JS (mem) | Rust (mem) | Diff | % |
|---|---|---|---|---|
| browser.xhtml | 1.3 MB | 0.9 MB | -0.4 MB | -30.76% |
| Preferences | 1.94 MB | 1.26 MB | -0.68 MB | -35.05% |
| about:home | 5.25 MB | 0.97 MB | -4.28 MB | -81.50% |
| about:newtab | 6.8 MB | 2.8 MB | -4.00 MB | -58.82% |

Both numbers, time and memory, go significantly down!
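For reference, here is a small sketch of how the Diff and % columns can be derived from the JS and Rust columns, including the "skip the first run" averaging. The helper names and the three per-run timings are hypothetical; only the resulting 7.5 ms and 4.7 ms values come from the browser.xhtml row.

```python
from statistics import mean


def mean_of_later_runs(runs_ms):
    """Average all runs except the first, which carries first-run overhead."""
    return mean(runs_ms[1:])


def diff_and_percent(js, rust):
    """Return (absolute difference, percent change) going from JS to Rust."""
    diff = rust - js
    return diff, diff / js * 100.0


# Hypothetical per-run timings; only the 7.5 ms average is from the table.
js_ms = mean_of_later_runs([9.1, 7.4, 7.6])
diff, pct = diff_and_percent(js_ms, 4.7)  # browser.xhtml: JS 7.5, Rust 4.7
print(f"{diff:+.1f}ms {pct:+.1f}%")
```

With these inputs the script prints `-2.8ms -37.3%`, matching the browser.xhtml row. A negative percentage means the Rust implementation is faster or smaller than the JS one.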

Talos

Unfortunately, talos tests are quite noisy, so it's really hard to pinpoint the wins. One of the wins with the patches, though, is that the stdev goes noticeably down, so I hope this also makes the talos tests a better tool for evaluating further optimizations.

I tried to run it with ~40 reps, but stdev is consistently high enough that cutting 3ms from browser.xhtml or even 10ms from about:preferences is indistinguishable from noise when stdev is 15-20ms!
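A rough way to put numbers on that noise, assuming independent and roughly normally distributed runs (an assumption that talos runs may not satisfy): the standard error of the difference between two means of n runs each, with per-run standard deviation s, is s * sqrt(2 / n). The function name below is hypothetical.

```python
import math


def stderr_of_difference(stdev: float, reps: int) -> float:
    """Standard error of (mean_a - mean_b) for two independent samples
    of `reps` runs each, both with per-run standard deviation `stdev`."""
    return stdev * math.sqrt(2.0 / reps)


for stdev in (15.0, 20.0):
    se = stderr_of_difference(stdev, reps=40)
    print(f"stdev={stdev:.0f}ms reps=40 -> standard error ~{se:.2f}ms")
```

Under these assumptions the standard error is roughly 3.4ms at a 15ms stdev and 4.5ms at a 20ms stdev, so a 3ms change is smaller than one standard error of the comparison.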

As a result, my read from talos is that most numbers go down, in several cases quite significantly. stdev also goes down, which is great for the value of talos going forward :)

| Test | Platform | JS (ms) | Rust (ms) | Diff | % |
|---|---|---|---|---|---|
| ts_paint | Linux | 253.1 | 254.98 | +1.88ms | +0.7% |
| ts_paint | MacOS | 928.1 | 934.32 | +6.22ms | +0.67% |
| ts_paint | Windows | 365.88 | 359.85 | -6.03ms | -1.64% |
| twinopen | Linux | 342.67 | 343.88 | +1.21ms | +0.35% |
| twinopen | MacOS | 124.54 | 122.0 | -2.54ms | -2.03% |
| twinopen | Windows | 104.5 | 101.66 | -2.84ms | -2.74% |
| about_newtab | Linux | 30.85 | 30.21 | -0.64ms | -2.07% |
| about_newtab | MacOS | 32.36 | 32.08 | -0.28ms | -0.86% |
| about_newtab | Windows | 31.81 | 29.74 | -2.07ms | -6.50% |
| about_preferences_basic | Linux | 124.39 | 102.0 | -22.39ms | -17.99% |
| about_preferences_basic | MacOS | 107.73 | 104.84 | -2.89ms | -2.68% |
| about_preferences_basic | Windows | 116.19 | 105.94 | -10.25ms | -8.82% |

Here's the full compare view: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=8771dfdc8694a91053b5e86c0a8ad9de34b68393&newProject=try&newRevision=c7f8b45423c3f228ad170c0a9b668e424f9abc96

With profiler wins in both time and memory, and talos showing a general downward trend, some strong wins, and much lower stdev in all tests, I'm comfortable recommending this change with the numbers as we have them right now.

Once we're closer to landing, I'll redo the talos tests to see if maybe we get more significant wins.

Summary: Migrate Fluent in Gecko off of JavaScript → [meta] Migrate Fluent in Gecko off of JavaScript
Depends on: 1672317

Latest benchmarks: bug 1613705 comment 37
Latest talos numbers: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=4e7fdee308deafa3bebc6f177caf5d1720ee369f&newProject=try&newRevision=fd42ad55cf7527849a589454153c6e3bf1a38b11&framework=1

The status of the patchset:

  • FileSource - mostly reviewed, likely close to final state, some opportunity to profile I/O
  • L10nRegistry - in review, seems to be stabilizing, likely in last rounds of review
  • Localization - first round of reviews, functionality complete

And on the crate side:

  • fluent-syntax - stable, documented, good test coverage
  • fluent-bundle - stable, documented, good test coverage
  • fluent-fallback - to be documented and cleaned up, but stable
  • l10nregistry - to be documented and cleaned up, but stable
Depends on: 1723886

This is now fixed and in beta.

Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED