Migrate Localization API to Rust
Categories
(Core :: Internationalization, task, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox92 | --- | fixed |
People
(Reporter: zbraniecki, Assigned: zbraniecki)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
Attachments
(20 files, 14 obsolete files)
1.38 KB,
application/x-javascript
|
Details | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
Bug 1613705 - [localization] part2: Switch Localization class to use localization-ffi. r?nika,emilio
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
17.54 KB,
application/x-javascript
|
Details | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
Bug 1613705 - [localization] part7: Re-enable custom locales argument to Localization. r?emilio,nika
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
Once we have bug 1560038 fixed, the next step will be to remove mozILocalization, and Localization.jsm.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 1•5 years ago
•
|
||
Here's the upper bound of the wins - https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=dba043e0c81a6b8422d69873b28de38fc8ad336e&newProject=try&newRevision=df8beab44d49776375e3ad24c6d4ea2711363c9b&framework=1
I removed L10nRegistry.jsm and Localization.jsm and basically perform no localization. I was able to launch the browser and use it in limited capability without l10n :)
Assignee | ||
Comment 2•5 years ago
•
|
||
The results are very preliminary. There's still a lot of potential improvements on all levels, but here are talos numbers from a fairly complete build that replaces L10nRegistry.jsm and Localization.jsm with Rust/C++: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9bdda7f8c5ace35705e0961b55fba99f99fbe77c&newProject=try&newRevision=9b4e419e79518947b597079390e3cd3adfba760a&framework=1
sessionrestore, startup_about_home, ts_paint, twinopen wins!
This is just based on a very raw POC, with a number of limitations:
- arguments are not yet passed
- I/O is synchronous even for preferences (in m-c sync is only used for startup)
- No fallback
- L10nRegistry with its hot-plugging is yet to be added - https://github.com/zbraniecki/l10nregistry-rs
Despite that, I believe the results should remain similar when missing features get added for all startup path tests, while non-startup path is a matter of sync/async as the only likely significant factor. The rest of the missing bits are either affecting small number of strings (arguments) or fallback scenarios.
Further optimization avenues are:
- fluent-syntax (Parser)
- Use of unsafe slicing shows up to 20% win on parsing https://github.com/projectfluent/fluent-rs/issues/82
- Lexer branch shows up to 66% win on parsing https://github.com/projectfluent/fluent-rs/pull/161
- Pushing parsing off the main thread may also help
- fluent-bundle (resolver)
- Writer pattern can give up to 10% perf win on resolution https://github.com/projectfluent/fluent-rs/pull/127
- Gecko bindings
- A number of suboptimal allocations and UTF8<-->UTF16 conversions that we can still optimize
- Some of the API calls are unnecessary slow because we needed it for the JSMs
- IntlMemoizer is currently created per-bundle, but could be shared between all bundles in a single process/locale
It's hard to evaluate how those improvements would translate to actual talos numbers, but the numbers we get already are often fairly close to the upper bounds listed in the previous comment, so my hope is that they would mostly help us get wins in areas which the upper bound shown as potential wins but this talos compare doesn't show any significant wins for yet (quite often showing wins but below significance level).
What's nice is that basically all of the things I listed here are already reasonable to assume as achievable, not just a potential research, but as a clear forward path.
Comment 3•5 years ago
|
||
Just to make sure, is https://searchfox.org/mozilla-central/rev/cc0f9906505d0ffa0d3a29a80142379e21cb5fb2/browser/components/newtab/lib/RemoteL10n.jsm#65 with explicit creation of DOMLocalization
on your list of things to track?
Assignee | ||
Comment 4•5 years ago
|
||
Assignee | ||
Comment 5•5 years ago
•
|
||
Just to make sure, is https://searchfox.org/mozilla-central/rev/cc0f9906505d0ffa0d3a29a80142379e21cb5fb2/browser/components/newtab/lib/RemoteL10n.jsm#65 with explicit creation of DOMLocalization on your list of things to track?
Yep! All IDLs will retain functionality! (I'm adding two options that are in mozILocalization to Localization.webidl).
Assignee | ||
Comment 6•5 years ago
|
||
Arraching WIP patch!
The WIP patch is extremely simple, reduces the whole l10nregistry and localization to the bare minimum, but it does work and Firefox with it is completely usable!
Similarly to bug 1560038 comment 57 I did some initial performance measurements.
This time, I used a much simpler test on the preferences.xhtml
page:
{
async function testResolving() {
let l10nElements = Array.from(document.querySelectorAll("[data-l10n-id]")).map(elem => document.l10n.getAttributes(elem));
performance.mark("fluent-resolve-start");
let resolveStart = performance.now();
let values = await document.l10n.formatMessages(l10nElements);
let resolveEnd = performance.now();
performance.mark("fluent-resolve-end");
console.log(`perf(resolve): ${resolveEnd-resolveStart}`);
}
testResolving();
}
and generated profiles:
Variant | Profile Fast | Profile Detailed |
---|---|---|
Localization.jsm+Fluent.jsm | https://perfht.ml/2w973QU | https://perfht.ml/2PpKi1L |
localization-rs+fluent-rs | https://perfht.ml/3caOyMo | https://perfht.ml/2T0GGW6 |
All profiles have markers for fluent-resolving start/end.
- Fast
This one collects minimal amount of data, uses 1ms sampling, and is meant to keep the overhead low to give us a good approximation of the actual performance difference.
Results
Test | JS | Rust | Decrease | Percent |
---|---|---|---|---|
Resolve Time | 9.2ms | 1.2ms | -8.0 ms | -86.95% |
Resolve Memory | 1080 KB | 226 KB | -854 KB | -79.07% |
Hooray! Seems like my worry from bug 1560038 comment 57 is alleviated with the move of Localization+L10nRegistry classes to Rust!
As mentioned there, I still expect a Writer
trait to yield some performance/memory gains, but even without that, we're seeing a pretty substantial drop in both!
- Detailed Resolver
The overhead makes reading the profiles a bit tricky, but core characteristics are preserved. JS FormatPattern
takes ~8ms, while Rust format_pattern
takes 0.7ms.
The whole profile is still a bit noisy, and I won't dive too deep into it yet.
Next, I'd like to clean up the bindings between Localization.cpp and Localization-rs and in particular figure out how to handle async I/O.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 7•4 years ago
|
||
This could be useful for an IDL-backed I/O - https://phabricator.services.mozilla.com/D76743
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 8•4 years ago
|
||
Assignee | ||
Comment 9•4 years ago
|
||
With the latest version of the WIP I get to compare Localization
(JS) and Localization2
(Rust) classes.
I tested loading a main preferences.ftl
file as if it was loaded for Preferences UI and formatting them all. This scenario should give us some approximation of how this work may affect Preferences UI Localization performance/memory cost.
This is an early evaluation, because the Rust side is using sync I/O, as we wait to integrate async, but except of that, the results are from a complete translation with the same behavior of fallbacking and analogous L10nRegistry in Rust.
Variant | Profile | Time (ms) | Memory (Mb) |
---|---|---|---|
JS (1) | https://share.firefox.dev/37ayRok | 12.19 | 8.58 |
JS (2) | https://share.firefox.dev/31bovAU | 12.05 | 8.64 |
JS (3) | https://share.firefox.dev/3lR1UkU | 13.14 | 8.47 |
Rust (1) | https://share.firefox.dev/37evJYy | 5.38 | 1.55 |
Rust (2) | https://share.firefox.dev/3lUA8Ec | 5.62 | 1.79 |
Rust (3) | https://share.firefox.dev/3dA4F73 | 6.85 | 1.73 |
Steps to Reproduce:
- Apply the WIP patch
- Take https://github.com/projectfluent/fluent-rs/blob/master/fluent-bundle/benches/preferences.ftl
- Put it in `browser/locales/en-US/browser/preferences/preferences2.ftl
- Rebuild Firefox
- Open Firefox with Profiler
- Wait 5s for startup to settle
- Open Browser Console
- Paste the example script using either
Localization
orLocalization2
) - Start Profiler (ctrl+shift+1)
- Execute the script in browser console
- Capture Profile (ctrl+shift+2)
- Main Process, Markers, select
localization
.
Assignee | ||
Comment 10•4 years ago
|
||
Depends on D91709
Updated•4 years ago
|
Assignee | ||
Comment 11•4 years ago
|
||
Updated the patches to @djg's work which is now merged to master of l10nregistry-rs and fluent-rs repos.
The first PR allows for easy testing of the Localization
in isolation. Steps to work with it are in the previous comment
Next steps for it are:
- Plug the async logic so that we can call the test sync or async using
Localization2
- Clean up the code to minimize allocations and I/O
The second patch swaps Localization
to use the new Localization2
allowing us to remove Localization.jsm
and L10nRegistry.jsm
. I got it to run startup and preferences, so this is what I'm running against Talos at the moment.
Assignee | ||
Comment 12•4 years ago
|
||
I folded preferences2.ftl
sample file into the first patch, so the new steps to reproduce are:
- Apply the WIP patch
- Rebuild Firefox
- Open Firefox with Profiler
- Wait 5s for startup to settle
- Open Browser Console
- Paste the example script using either Localization or Localization2)
- Start Profiler (ctrl+shift+1)
- Execute the script in browser console
- Capture Profile (ctrl+shift+2)
- in the profile, select Main Process, Markers, and filter
localization
.
Assignee | ||
Comment 13•4 years ago
|
||
Update on the performance front.
The benchmarks above show the performance of loading 1 file and formatting a lot of messages in it.
When testing the performance of the whole new system we noticed that those performance wins don't translate to real talos numbers.
I profiled it and found that the reason is that l10nregistry-rs
core algorithm for source orders was not using any optimization techniques. It is a very complex algorithm and the JS version of it [0] has been hard to write, hard to maintain and not very well understood.
I spent last two weeks writing a new version of it named ProblemSolver
.
In the process I learned that the problem scope is actually not well solved by generic algorithms and the class of problems we're solving here requires quite fine-tuned heuristics.
I wrote two new algorithms - serial for synchronous I/O and parallel for asynchronous, and documented the algorithm and optimizations extensively [1].
The good news is that the new test (attaching) is showing a really promising perf wins! This test measures loading of a large localization context based on Preferences UI with 16 resources from 2 sources.
I separated "cold" which is the first call that actually performs I/O and caches the result, and "warm" which are all consequitive loads that re-use the cache.
Here are the results as of today:
Variant | Cold | Warm |
---|---|---|
L10nRegistry.jsm | 10.2 | 4.1 |
l10nregistry-rs (master) | 341.6 | 329.1 |
l10nregistry-rs (pruning) | 6.6 | 1.5 |
There's still more work to be done to get the asynchronous code hooked up into L10nRegistry, but the results look very promising!
My next step is to test it against Talos in hopes that the perf wins on resolution and on I/O will result in end-to-end neutral or wins.
[0] https://searchfox.org/mozilla-central/rev/277ab3925eae21b419167a34624ec0ab518d0c94/intl/l10n/L10nRegistry.jsm#346-393
[1] https://github.com/zbraniecki/l10nregistry-rs/blob/pruning2/src/solver/README.md
Assignee | ||
Comment 14•4 years ago
|
||
The improvements lead to at least comparable talos run! https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=6dec23a0ddbf46cc40d26a1f66ef83cd24abba2f&newProject=try&newRevision=328221e959a3c7cb917a37c0beeacd639212425b&framework=1
There's still a number of performance improvements that should lead to wins:
- We compare async (JSM) to sync (rust) for now. Async should lead to improvements
- We allocate unnecessairly, we'll clean it up on the way
- The async optimizations are not complete. I'll work on that next
Assignee | ||
Comment 15•4 years ago
|
||
Assignee | ||
Comment 16•4 years ago
|
||
Assignee | ||
Comment 17•4 years ago
|
||
New numbers based on the latest advancements in the L10nRegistry pruning3
problem solver algorithmic branch:
Formatting heavy test:
Localization | Sync | Cold | Warm |
---|---|---|---|
JavaScript | false | 21.0 | 15.1 |
Rust | false | 13.3 | 7.5 |
JavaScript | true | 19.8 | 15.2 |
Rust | true | 8.3 | 6.9 |
I/O heavy test:
Localization | Sync | Cold | Warm |
---|---|---|---|
JavaScript | false | 25.4 | 4.3 |
Rust | false | 13.4 | 3.1 |
JavaScript | true | 11.4 | 4.0 |
Rust | true | 6.4 | 1.5 |
Assignee | ||
Comment 18•4 years ago
|
||
Assignee | ||
Comment 19•4 years ago
|
||
Third test - this one replicates loading all messages necessary for browser.xhtml localization.
Assignee | ||
Comment 20•4 years ago
|
||
I released fluent-bundle 0.14
and fluent-fallback 0.1.0
and am almost ready to release initial l10nregistry-rs
!
I believe at this point all the features in Rust are complete, although the code would benefit from cleanups and documentation.
I vendored in those versions into class Localization2
on D100643.
Using that PR I got the following results:
Localization | Sync | Cold (ms) | Warm (ms) |
---|---|---|---|
JavaScript | false | 18.6 | 14.6 |
JavaScript | true | 18.9 | 14.8 |
Rust | false | 12.0 | 8.5 |
Rust | true | 8.1 | 7.1 |
I/O heavy test:
Localization | Sync | Cold (ms) | Warm (ms) |
---|---|---|---|
JavaScript | false | 17.3 | 4.5 |
JavaScript | true | 8.7 | 3.3 |
Rust | false | 15.7 | 3.5 |
Rust | true | 5.9 | 1.7 |
I also added a third test that replicates very closely what browser.xhtml localization requires.
This test is also available in https://github.com/zbraniecki/l10nregistry-rs crate as cargo bench --all-features registry/scenarios/browser
and cargo bench --all-features localization/scenarios/browser
.
The results are as follows:
Localization | Sync | Cold | Warm |
---|---|---|---|
Rust (standalone) | false | 414 us | |
Rust (standalone) | true | 394 us | |
JavaScript | false | 9.8 ms | 7.1 ms |
JavaScript | true | 8.3 ms | 7.9 ms |
Rust (gecko) | false | 15.0 ms | 4.7 ms |
Rust (gecko) | true | 6.6 ms | 3.2 ms |
Things to notice:
- I the browser.xhtml test async Rust in Gecko takes a long time on cold load. No idea why.
- Standalone Rust completes the same test in ~10% of the time it takes when launched from JS in Gecko. I assume some of that is inevitable, but it seems like the Gecko bits wrangling costs ~90% of the time. Likely we clone a lot of Strings, where
fluent-fallback
returnsCow
's and we neednsCString
. Wondering if we can clean that up.
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 21•4 years ago
|
||
Assignee | ||
Comment 22•4 years ago
|
||
I plugged the new chain into the DOM bindings and ran talos test:
./mach talos-test --activeTests about_preferences_basic --tpcycles 10
The result's I got:
Test | JS | Rust |
---|---|---|
Preferences | 203 | 687.40 |
blank | 25.96 | 452.45 |
#search | 384.95 | 143.56 |
#privacy | 460.77 | 202.98 |
#sync | 431.44 | 129.13 |
#home | 187.09 | 162.91 |
The subpages seems to work much faster, but there's something slow going on with the main load and switching to blank. I'm wondering if it's something related to destructors, and costly drops.
I'll try to debug it, but :djg - maybe you'll have an idea on what may be the cause of it?
Updated•4 years ago
|
Assignee | ||
Comment 23•4 years ago
|
||
I'm getting shutdown leaks and crashes with those two patches in debug mode - https://paste.mozilla.org/RFJXgwdg - likely related to the perf issue above.
Comment 24•4 years ago
|
||
(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #22)
Test JS Rust Preferences 203 687.40 blank 25.96 452.45
Wow that is a lot slower.
Assignee | ||
Comment 25•4 years ago
|
||
I applied latest patches from Dan, and rebased the second patchset on top, but I'm still seeing shutdown crashes and "leaking the world". Trying to debug.
Assignee | ||
Comment 26•4 years ago
|
||
:emilio spotted that the new Localization
path wasn't GC'ing mGlobal
! That fixed the leak.
New talos numbers:
./mach talos-test --activeTests about_preferences_basic --tpcycles 10
The result's I got:
Test | JS | Rust |
---|---|---|
Preferences | 203 | 555.01 |
blank | 25.96 | 387.77 |
#search | 384.95 | 148.31 |
#privacy | 460.77 | 186.58 |
#sync | 431.44 | 117.44 |
#home | 187.09 | 150.73 |
The blank
and main Preferences
are still significantly slower, while the rest is either on par of faster.
Assignee | ||
Comment 27•4 years ago
|
||
With the fixes in place and on top of fluent-rs 0.14, the test benchmarks listed here start showing consistent wins:
Format heavy test:
Localization | Sync | Cold (ms) | Warm (ms) |
---|---|---|---|
JavaScript | false | 22.1 | 14.2 |
Rust | false | 11.8 | 7.3 |
JavaScript | true | 20.0 | 16.3 |
Rust | true | 8.6 | 6.8 |
I/O heavy test:
Localization | Sync | Cold (ms) | Warm (ms) |
---|---|---|---|
JavaScript | false | 21.9 | 3.9 |
Rust | false | 14.3 | 1.9 |
JavaScript | true | 12.0 | 2.8 |
Rust | true | 5.5 | 1.3 |
browser.xhtml test:
Localization | Sync | Cold (ms) | Warm (ms) |
---|---|---|---|
JavaScript | false | 11.5 | 7.1 |
Rust | false | 11.9 | 3.5 |
JavaScript | true | 8.4 | 7.3 |
Rust | true | 7.5 | 3.1 |
I'm going to focus on talos tests now.
Assignee | ||
Comment 28•4 years ago
|
||
First positive results from talos!
The about_preferences wins are suspiciously large, but I am able to run Preferences views and the talos test locally and they seem to work. I can also run most of the mochitest for Preferences UI. I'm going to debug further.
The other wins are below noise level, but at most show some wins.
I'm going to debug the ts_paint a little bit more to understand where the performance goes, but I think we're getting to the point where we can start separating out patches for full review and landing!
Assignee | ||
Comment 30•4 years ago
|
||
Spinning off L10nRegistry part that is gearing up for landing first in bug 1660392.
Assignee | ||
Comment 31•4 years ago
|
||
Depends on D102490
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 32•4 years ago
|
||
Depends on D102096
Updated•4 years ago
|
Assignee | ||
Comment 33•4 years ago
|
||
Since I separated out the fluent-fallback
FFI to localization-ffi
crate, I can't link this last PR:
1:01.81 Finished release [optimized] target(s) in 2.23s
1:27.07 Compiling geckoservo v0.0.1 (/projects/mozilla-unified/servo/ports/geckolib)
1:27.69 Compiling gkrust v0.1.0 (/projects/mozilla-unified/toolkit/library/rust)
1:31.85 Finished release [optimized] target(s) in 1m 13s
1:32.04 toolkit/library/build/libxul.so
1:33.60 ld.lld: error: undefined hidden symbol: localization_new
1:33.60 >>> referenced by Localization.cpp:107 (/projects/mozilla-unified/intl/l10n/Localization.cpp:107)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::Create(nsTArray<nsTString<char> > const&, bool))
1:33.60 >>> referenced by Localization.cpp:107 (/projects/mozilla-unified/intl/l10n/Localization.cpp:107)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::Localization(nsTArray<nsTString<char> > const&, bool))
1:33.60 >>> referenced by Localization.cpp:117 (/projects/mozilla-unified/intl/l10n/Localization.cpp:117)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::Localization(nsIGlobalObject*, nsTArray<nsTString<char> > const&, bool))
1:33.60 >>> referenced 3 more times
1:33.60 ld.lld: error: undefined hidden symbol: localization_release
1:33.60 >>> referenced by LocalizationBindings.h:20 (/projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/dist/include/mozilla/intl/LocalizationBindings.h:20)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::~Localization())
1:33.60 >>> referenced by LocalizationBindings.h:20 (/projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/dist/include/mozilla/intl/LocalizationBindings.h:20)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::~Localization())
1:33.60 ld.lld: error: undefined hidden symbol: localization_upgrade
1:33.60 >>> referenced by Localization.cpp:406 (/projects/mozilla-unified/intl/l10n/Localization.cpp:406)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::SetIsSync(bool))
1:33.60 >>> referenced by Localization.cpp:406 (/projects/mozilla-unified/intl/l10n/Localization.cpp:406)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::Upgrade())
1:33.60 ld.lld: error: undefined hidden symbol: localization_add_res_id
1:33.60 >>> referenced by Localization.cpp:197 (/projects/mozilla-unified/intl/l10n/Localization.cpp:197)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::AddResourceId(nsTSubstring<char16_t> const&))
1:33.60 ld.lld: error: undefined hidden symbol: localization_remove_res_id
1:33.60 >>> referenced by Localization.cpp:202 (/projects/mozilla-unified/intl/l10n/Localization.cpp:202)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::RemoveResourceId(nsTSubstring<char16_t> const&))
1:33.60 ld.lld: error: undefined hidden symbol: localization_add_res_ids
1:33.60 >>> referenced by Localization.cpp:210 (/projects/mozilla-unified/intl/l10n/Localization.cpp:210)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::AddResourceIds(nsTArray<nsTString<char16_t> > const&))
1:33.60 ld.lld: error: undefined hidden symbol: localization_remove_res_ids
1:33.60 >>> referenced by Localization.cpp:219 (/projects/mozilla-unified/intl/l10n/Localization.cpp:219)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::RemoveResourceIds(nsTArray<nsTString<char16_t> > const&))
1:33.60 ld.lld: error: undefined hidden symbol: localization_format_value
1:33.60 >>> referenced by Localization.cpp:232 (/projects/mozilla-unified/intl/l10n/Localization.cpp:232)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::FormatValue(nsTSubstring<char> const&, mozilla::dom::Optional<mozilla::dom::Record<nsTString<char>, mozilla::dom::Nullable<mozilla::dom::OwningUTF8StringOrDouble> > > const&, mozilla::ErrorResult&))
1:33.60 ld.lld: error: undefined hidden symbol: localization_format_values
1:33.60 >>> referenced by Localization.cpp:246 (/projects/mozilla-unified/intl/l10n/Localization.cpp:246)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::FormatValues(mozilla::dom::Sequence<mozilla::dom::OwningUTF8StringOrL10nIdArgs> const&, mozilla::ErrorResult&))
1:33.60 ld.lld: error: undefined hidden symbol: localization_format_messages
1:33.60 >>> referenced by Localization.cpp:278 (/projects/mozilla-unified/intl/l10n/Localization.cpp:278)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::FormatMessages(mozilla::dom::Sequence<mozilla::dom::OwningUTF8StringOrL10nIdArgs> const&, mozilla::ErrorResult&))
1:33.60 ld.lld: error: undefined hidden symbol: localization_format_value_sync
1:33.60 >>> referenced by Localization.cpp:317 (/projects/mozilla-unified/intl/l10n/Localization.cpp:317)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::FormatValueSync(nsTSubstring<char> const&, mozilla::dom::Optional<mozilla::dom::Record<nsTString<char>, mozilla::dom::Nullable<mozilla::dom::OwningUTF8StringOrDouble> > > const&, nsTSubstring<char>&, mozilla::ErrorResult&))
1:33.60 ld.lld: error: undefined hidden symbol: localization_format_values_sync
1:33.60 >>> referenced by Localization.cpp:342 (/projects/mozilla-unified/intl/l10n/Localization.cpp:342)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::FormatValuesSync(mozilla::dom::Sequence<mozilla::dom::OwningUTF8StringOrL10nIdArgs> const&, nsTArray<nsTString<char> >&, mozilla::ErrorResult&))
1:33.60 ld.lld: error: undefined hidden symbol: localization_format_messages_sync
1:33.60 >>> referenced by Localization.cpp:367 (/projects/mozilla-unified/intl/l10n/Localization.cpp:367)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::FormatMessagesSync(mozilla::dom::Sequence<mozilla::dom::OwningUTF8StringOrL10nIdArgs> const&, nsTArray<RefPtr<mozIL10nMessage> >&, mozilla::ErrorResult&))
1:33.60 ld.lld: error: undefined hidden symbol: localization_prefetch
1:33.60 >>> referenced by Localization.cpp:405 (/projects/mozilla-unified/intl/l10n/Localization.cpp:405)
1:33.60 >>> /projects/mozilla-unified/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(mozilla::intl::Localization::Prefetch())
1:33.75 clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
1:33.75 make[4]: *** [/projects/mozilla-unified/config/rules.mk:545: libxul.so] Error 1
1:33.75 make[3]: *** [/projects/mozilla-unified/config/recurse.mk:72: toolkit/library/build/target] Error 2
1:33.75 make[2]: *** [/projects/mozilla-unified/config/recurse.mk:34: compile] Error 2
1:33.75 make[1]: *** [/projects/mozilla-unified/config/rules.mk:355: default] Error 2
1:33.75 make: *** [client.mk:89: build] Error 2
1:33.75 274 compiler warnings present.
1:33.79 Failed to parse ccache stats output: secondary config (readonly) /etc/ccache.conf
I may have made some trivial mistake, but I compared this to other -ffi
crates I introduced and tested that structs exposed via cbindgen out of localization-ffi
are available, so my suspicion is that if I'm not making some trivial mistake I can't find, then maybe I'm hitting some implicit limit of functions exposed via cbindgen?
Nika, Emilio, can you help unblock me here?
Comment 34•4 years ago
|
||
diff --git a/toolkit/library/rust/shared/lib.rs b/toolkit/library/rust/shared/lib.rs
index 658446ae9c41..de834e0fa621 100644
--- a/toolkit/library/rust/shared/lib.rs
+++ b/toolkit/library/rust/shared/lib.rs
@@ -34,6 +34,7 @@ extern crate http_sfv;
extern crate jsrust_shared;
extern crate kvstore;
extern crate l10nregistry_ffi;
+extern crate localization_ffi;
extern crate mapped_hyph;
extern crate mozurl;
extern crate mp4parse_capi;
Assignee | ||
Comment 35•4 years ago
|
||
Assignee | ||
Comment 36•4 years ago
|
||
Assignee | ||
Comment 37•4 years ago
|
||
Final part of the patchset is coming together. I requested initial pass review from :emilio, to align on the general direction.
Perf numbers look good:
Format heavy test:
Localization | Sync | Cold (ms) | Warm (ms) |
---|---|---|---|
JavaScript | false | 12.8 | 8.5 |
Rust | false | 5.5 | 3.4 |
JavaScript | true | 9.9 | 8.2 |
Rust | true | 3.7 | 3.4 |
I/O heavy test:
Localization | Sync | Cold (ms) | Warm (ms) |
---|---|---|---|
JavaScript | false | 16.9 | 3.3 |
Rust | false | 6.1 | 0.7 |
JavaScript | true | 4.9 | 2.5 |
Rust | true | 1.7 | 0.3 |
browser.xhtml test:
Localization | Sync | Cold (ms) | Warm (ms) |
---|---|---|---|
JavaScript | false | 3.8 | 5.6 |
Rust | false | 1.4 | 1.4 |
JavaScript | true | 6.6 | 4.7 |
Rust | true | 1.2 | 1.2 |
Assignee | ||
Comment 38•4 years ago
|
||
Depends on D102096
Assignee | ||
Comment 39•4 years ago
|
||
Depends on D104788
Assignee | ||
Comment 40•4 years ago
|
||
Depends on D104789
Updated•4 years ago
|
Assignee | ||
Comment 41•4 years ago
|
||
I separated the patch here into three pieces:
- localization-ffi
- swapping Localization.cpp to use localization-ffi
- Updating callers to use the new API in Localization class
I hope it'll make it easier to review, although first and second are deeply correlated.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 42•4 years ago
|
||
Depends on D104790
Assignee | ||
Comment 43•4 years ago
|
||
test updated to current m-c
Assignee | ||
Comment 44•4 years ago
|
||
Comment on attachment 9202225 [details]
Test for formatting of many strings at once
retiring the formatting
benchmark. It mostly is covered by browser-context
bench which is more representative.
Assignee | ||
Comment 45•4 years ago
|
||
With most of the underlying code - FileSource and L10nRegistry now close to r+, I retested against today's m-c. This time I tested three scenarios (sync and async):
- m-c
- filesource-rs + l10nregistry-rs, but Localization in JSM
- full patchset without JS
I/O heavy test:
Localization | Sync | Cold (ms) | Warm (ms) |
---|---|---|---|
m-c | false | 13.6 | 2.4 |
FileSource + L10nRegistry | false | 8.8 | 1.1 |
Rust | false | 4.8 | 0.7 |
m-c | true | 6.4 | 1.9 |
FileSource + L10nRegistry | true | 2.0 | 0.6 |
JavaScript | true | 1.9 | 0.4 |
browser.xhtml test:
Localization | Sync | Cold (ms) | Warm (ms) |
---|---|---|---|
m-c | false | 4.8 | 4.1 |
FileSource + L10nRegistry | false | 3.1 | 3.0 |
Rust | false | 1.8 | 1.7 |
m-c | true | 6.0 | 4.5 |
FileSource + L10nRegistry | true | 4.3 | 4.0 |
JavaScript | true | 1.3 | 1.3 |
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 46•4 years ago
|
||
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 47•4 years ago
|
||
Depends on D111178
Assignee | ||
Comment 48•4 years ago
|
||
Depends on D113247
Updated•4 years ago
|
Assignee | ||
Comment 49•4 years ago
|
||
I was able to resume finishing this patchset.
The core functionality works and the performance is satisfactory, so I'm focused now on completing edge cases, and passing tests.
In part6 I am trying to reinstate preferences observation to reset Localization on prefs change, and for that I need to make Localization
implement nsISupportsWeakObserver
. I did clean it up in part2 because I didn't need to hold references to JS objects anymore, so I hope it should be simpler, but at the moment with part6
in it fails to link with:
0:51.33 ld.lld: error: undefined hidden symbol: mozilla::intl::Localization::GetWeakReference(nsIWeakReference**)
0:51.33 >>> referenced by Unified_cpp_intl_l10n0.cpp
0:51.33 >>> /home/mozilla/projects/mozilla-central/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(vtable for mozilla::intl::Localization)
0:51.33 >>> referenced by Unified_cpp_dom_l10n0.cpp
0:51.33 >>> /home/mozilla/projects/mozilla-central/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../dom/l10n/Unified_cpp_dom_l10n0.o:(vtable for mozilla::dom::DOMLocalization)
0:51.33 >>> referenced by Unified_cpp_dom_l10n0.cpp
0:51.33 >>> /home/mozilla/projects/mozilla-central/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../dom/l10n/Unified_cpp_dom_l10n0.o:(vtable for mozilla::dom::DocumentL10n)
0:51.34 ld.lld: error: undefined hidden symbol: non-virtual thunk to mozilla::intl::Localization::GetWeakReference(nsIWeakReference**)
0:51.34 >>> referenced by Unified_cpp_intl_l10n0.cpp
0:51.34 >>> /home/mozilla/projects/mozilla-central/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../intl/l10n/Unified_cpp_intl_l10n0.o:(vtable for mozilla::intl::Localization)
0:51.34 >>> referenced by Unified_cpp_dom_l10n0.cpp
0:51.34 >>> /home/mozilla/projects/mozilla-central/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../dom/l10n/Unified_cpp_dom_l10n0.o:(vtable for mozilla::dom::DOMLocalization)
0:51.34 >>> referenced by Unified_cpp_dom_l10n0.cpp
0:51.34 >>> /home/mozilla/projects/mozilla-central/obj-x86_64-pc-linux-gnu-opt/toolkit/library/build/../../../dom/l10n/Unified_cpp_dom_l10n0.o:(vtable for mozilla::dom::DocumentL10n)
What's the right magic to get WrapperCache + SupportsWeakReference + Observer DECL/IMPL macros?
Comment 50•4 years ago
|
||
Make the type inherit like this:
// header file
class Localization : public nsIObserver, public nsSupportsWeakReference, public nsWrapperCache {
public:
NS_DECL_CYCLE_COLLECTING_ISUPPORTS
NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS_AMBIGUOUS(Localization, nsIObserver)
NS_DECL_NSIOBSERVER
// ... etc
};
// cpp file
NS_IMPL_CYCLE_COLLECTING_ADDREF(Localization)
NS_IMPL_CYCLE_COLLECTING_RELEASE(Localization)
NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION(Localization)
NS_WRAPPERCACHE_INTERFACE_MAP_ENTRY
NS_INTERFACE_MAP_ENTRY(nsIObserver)
NS_INTERFACE_MAP_ENTRY(nsISupportsWeakReference)
NS_INTERFACE_MAP_ENTRY_AMBIGUOUS(nsISupports, nsIObserver)
NS_INTERFACE_MAP_END
NS_IMPL_CYCLE_COLLECTION_WRAPPERCACHE_WEAK(Localization, mFieldsHere)
note that the base class is nsSupportsWeakReference
not nsISupportsWeakReference
. That type will implement the interface for you (https://searchfox.org/mozilla-central/rev/6143c3b83436184644a55afad83cd4d4396f7fec/xpcom/base/nsWeakReference.h#18).
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 51•4 years ago
|
||
Depends on D113248
Assignee | ||
Comment 52•4 years ago
|
||
Depends on D113570
Updated•4 years ago
|
Assignee | ||
Comment 53•4 years ago
|
||
Depends on D113571
Assignee | ||
Comment 54•4 years ago
|
||
Depends on D113691
Updated•4 years ago
|
Assignee | ||
Comment 55•4 years ago
|
||
Status update - the full patchset is now passing all tests in intl/l10n/test
and all-but-one in dom/l10n/test
!
Next steps:
- Fix the final test
- Run a try build
- Take another look at the prefetch
- Ask for reviews
Assignee | ||
Comment 56•4 years ago
|
||
It took me a bit, but I was able to narrow down the problem scope, but I don't think I understand what is going on.
The issue boils down to:
pub fn format_messages(
&self,
keys: &ThinVec<L10nKey>,
promise: &xpcom::Promise,
callback: extern "C" fn(&xpcom::Promise, Option<&ThinVec<OptionalL10nMessage>>, &ThinVec<nsCString>),
) {
let keys: Vec<FluentL10nKey> = keys.into_iter().map(|k| k.into()).collect();
let this = RefPtr::new(self);
let strong_promise = RefPtr::new(promise);
moz_task::spawn_current_thread(async move {
let mut errors = vec![];
let ret_val = this
.inner
.borrow()
.format_messages(&keys, &mut errors)
.await
.into_iter()
.map(|msg| {
if let Some(msg) = msg {
OptionalL10nMessage {
is_present: true,
message: msg.into(),
}
} else {
OptionalL10nMessage {
is_present: false,
message: L10nMessage::default(),
}
}
})
.collect::<ThinVec<_>>();
assert_eq!(keys.len(), ret_val.len());
let errors = errors.into_iter().map(|err| err.to_string().into()).collect();
callback(&strong_promise, Some(&ret_val), &errors);
})
.expect("Failed to spawn future");
}
}
I think what is going on is that if this method is called twice concurrently the first call thread closure is never executed, and only the second is.
To reproduce, you need the whole (yes 27 patches) stack and then:
var loc = new Localization(["browser/preferences/preferences.ftl"], false);
var value = loc.formatValue("do-not-track-learn-more");
var value2 = loc.formatValue("do-not-track-learn-more");
console.log(await value);
console.log(await value2);
in a fresh browser (if the localization resources are cached, the problem is not reproducible).
On the other hand those two work well:
var loc = new Localization(["browser/preferences/preferences.ftl"], false);
var value = loc.formatValue("do-not-track-learn-more");
var value2 = loc.formatValue("do-not-track-learn-more");
//console.log(await value);
console.log(await value2);
var loc = new Localization(["browser/preferences/preferences.ftl"], false);
var value = await loc.formatValue("do-not-track-learn-more");
var value2 = await loc.formatValue("do-not-track-learn-more");
console.log(value);
console.log(value2);
My debugging got to the point where if I call moz_task::spawn_current_thread
, and then again before the first one is resolved, then the first one never finishes.
I'm not sure if that's the right interpretation of what I see, but I'm hoping maybe Nika or Emilio can recognize if I'm on the right track and what may be causing it.
Comment 57•4 years ago
|
||
If the two closures are executing but it's not resolving, that means that this.inner().borrow().format_messages(...).await
is not returning, right? If so you need to dig into why that might be...
I'm happy to look into it a bit further, but if you want me to, mind pasting a try push with all the needed patches etc? Thanks.
Assignee | ||
Comment 58•4 years ago
|
||
If the two closures are executing but it's not resolving, that means that this.inner().borrow().format_messages(...).await is not returning, right?
I don't think so.
The order of printf's that I see is:
pub fn format_messages(
&self,
keys: &ThinVec<L10nKey>,
promise: &xpcom::Promise,
callback: extern "C" fn(&xpcom::Promise, Option<&ThinVec<OptionalL10nMessage>>, &ThinVec<nsCString>),
) {
let keys: Vec<FluentL10nKey> = keys.into_iter().map(|k| k.into()).collect();
let this = RefPtr::new(self);
let strong_promise = RefPtr::new(promise);
println!("format_messages before: {:?}", keys.len());
moz_task::spawn_current_thread(async move {
println!("format_messages inside: {:?}", keys.len());
// (...)
})
.expect("Failed to spawn future");
}
}
and if I call: format_messages
with first 2 keys, and then 4 keys, I see:
- Calling
format_messages with 2 keys
format_messages before: 2
- Calling
format_messages with 4 keys
format_messages before: 4
format_messages inside: 4
while if I call it just once, I see:
- Calling
format_messages with 2 keys
format_messages before: 2
format_messages inside: 2
It looks as if the second call discarded the first thread?
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 59•4 years ago
|
||
Assignee | ||
Comment 60•4 years ago
|
||
I'll try to catch that separately, but it seems like I'm leaking the World - https://treeherder.mozilla.org/jobs?repo=try&revision=0a120a2884f504bf500096c585af75073a8641a2&selectedTaskRun=VO1Azw3aR1SzOWke8DuhPA.0
Nika, Emilio - if you're debugging this, maybe you'll stumble upon where (or maybe its correlated?)
Comment 61•4 years ago
|
||
So I'm confused, how is format_messages involved with the snippet you provided to repro?
var loc = new Localization(["browser/preferences/preferences.ftl"], false);
var value = loc.formatValue("do-not-track-learn-more");
var value2 = loc.formatValue("do-not-track-learn-more");
console.log(await value);
console.log(await value2);
What I do when I run that is:
format_value out: "do-not-track-learn-more", None
format_value out: "do-not-track-learn-more", None
format_value in: "do-not-track-learn-more", None
format_value in: "do-not-track-learn-more", None
(no console log)
Comment 62•4 years ago
|
||
So you have two tasks listening to two AsyncCacheStream
s backed by the same Cache
object, and what I see is that one of the tasks get stuck in the await and never gets waken up:
format_value out: "do-not-track-learn-more", None
format_value out: "do-not-track-learn-more", None
format_value in: "do-not-track-learn-more", None
AsyncCacheStream::poll_next(0x7fd13be90c78, 0, 0)
poll_next: false
format_value in: "do-not-track-learn-more", None
AsyncCacheStream::poll_next(0x7fd13be90598, 0, 0)
poll_next: false
AsyncCacheStream::poll_next(0x7fd13be90598, 0, 0)
poll_next: true
format_value after: "do-not-track-learn-more", None, "Learn more"
I'm not sure who's responsibility is it to awake the task, and whether it's an AsyncCacheStream
bug (and it should keep track of pending wakers and such), or it's a moz_task
bug.
Comment 63•4 years ago
|
||
Yeah, I think it is a bug in AsyncCacheStream and https://github.com/projectfluent/fluent-rs/pull/224 should fix it. But note the disclaimer there.
Comment 64•4 years ago
|
||
It seems like emilio has figured the bug out, clearing ni?
Assignee | ||
Comment 65•4 years ago
|
||
Fixed the leak and intl tests are passing!
We still seem to have some leaks, but they're weird - "negative leaks".
Here are talos numbers from today - https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=bc8ebc5ba34ff5812479228f1fa5f3a8e8a01903&newProject=try&newRevision=33cd25ff3d083a677275b124313ca71ad7816b4e&framework=1
Here is a try auto
build with the negative leaks
- https://treeherder.mozilla.org/jobs?repo=try&revision=88fdd620c265ea2634776e44732717d1416ae1dc
Next steps:
- Debug the
negative leaks
- Identify other errors from try build and fix them
- Get review
- Land!
Assignee | ||
Comment 66•4 years ago
|
||
There is a small number of talos tests crashing with:
[task 2021-05-05T21:09:05.024Z] 21:09:05 INFO - PROCESS-CRASH | about_newtab_with_snippets | application crashed [@ RustMozCrash(char const*, int, char const*)]
[task 2021-05-05T21:09:05.024Z] 21:09:05 INFO - Crash dump filename: C:\Users\task_1620244936\AppData\Local\Temp\tmpj5ydi9sc\profile\minidumps\4d5e2a47-da00-45f4-aa9c-14efc8d21c96.dmp
[task 2021-05-05T21:09:05.024Z] 21:09:05 INFO - Mozilla crash reason: assertion failed: self.state.is(POLL)
[task 2021-05-05T21:09:05.024Z] 21:09:05 INFO - Operating system: Windows NT
[task 2021-05-05T21:09:05.025Z] 21:09:05 INFO - 10.0.17134
[task 2021-05-05T21:09:05.025Z] 21:09:05 INFO - CPU: amd64
[task 2021-05-05T21:09:05.025Z] 21:09:05 INFO - family 6 model 94 stepping 3
[task 2021-05-05T21:09:05.025Z] 21:09:05 INFO - 8 CPUs
[task 2021-05-05T21:09:05.026Z] 21:09:05 INFO - GPU: UNKNOWN
[task 2021-05-05T21:09:05.026Z] 21:09:05 INFO - Crash reason: EXCEPTION_BREAKPOINT
[task 2021-05-05T21:09:05.026Z] 21:09:05 INFO - Crash address: 0x5f50f680
[task 2021-05-05T21:09:05.027Z] 21:09:05 INFO - Assertion: Unknown assertion type 0x00000000
[task 2021-05-05T21:09:05.027Z] 21:09:05 INFO - Process uptime: 0 seconds
[task 2021-05-05T21:09:05.027Z] 21:09:05 INFO - Thread 0 (crashed)
[task 2021-05-05T21:09:05.028Z] 21:09:05 INFO - 0 xul.dll!RustMozCrash(char const*, int, char const*) [wrappers.cpp:33cd25ff3d083a677275b124313ca71ad7816b4e : 16 + 0x10]
Assignee | ||
Comment 67•4 years ago
|
||
Assignee | ||
Comment 68•4 years ago
|
||
talos starts looking green - https://treeherder.mozilla.org/jobs?repo=try&revision=da03228c46f311ee1373be193715cf817a7697b0
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 69•3 years ago
|
||
Depends on D114508
Assignee | ||
Comment 70•3 years ago
|
||
Depends on D116738
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 71•3 years ago
|
||
Depends on D116739
Assignee | ||
Comment 72•3 years ago
|
||
Assignee | ||
Comment 73•3 years ago
|
||
Depends on D116755
Assignee | ||
Comment 74•3 years ago
|
||
Depends on D116791
Assignee | ||
Comment 75•3 years ago
|
||
I'm crunching through the remaining test failures and I got from 68 to 28 this week: https://treeherder.mozilla.org/jobs?repo=try&revision=382db0aee1b94ca7a02a364df175a98286d6b6d0&selectedTaskRun=Lhiybl-4Ty-OhOxlC4LR_w.0
Most of the remaining issues seems to be racy tests that are affected by bug 1672317 comment 10 change in how we handle microtasks.
Fixing them is tricky but I think worth it as we're mostly fixing what should already be async but relies on particular microtask order, so hopefully those fixes will result in fewer intermittents.
Assignee | ||
Comment 76•3 years ago
|
||
Harry - can I get your help debugging https://treeherder.mozilla.org/logviewer?job_id=341789731&repo=try&lineNumber=4892 ?
I can reproduce it locally with my patchstack (34 patches, sorry!) in debug mode using ./mach test browser/components/urlbar/tests/browser/browser_keyword.js --headless --verify
(won't hit it without --verify).
The issue is confusing because if I console.log(element.innerHTML)
in [0] I can see that urlbarView-action
has data-l10n-args
and textContent, but not data-l10n-id
. So it did hit some path that did removeAttribute('data-l10n-id')
but not textContent = ""
, and my only guess is that it is some race that my patchset changes from winning to losing.
The logic that checks if actions
is empty is here: https://searchfox.org/mozilla-central/source/browser/components/urlbar/tests/UrlbarTestUtils.jsm#227
but the code that sets/unsets/changes urlbarView-action
id/args and textContent is spread around and in some cases instead of setting textContent =""
when data-l10n-id is removed, a callback is set, like here: https://searchfox.org/mozilla-central/source/browser/components/urlbar/UrlbarView.jsm#1453-1455 and I'm not sure why.
In my local --verify
+ debug test I'm hitting the error in line 115 and line 175 of the test. Can you advice on what may be going on assuming it's just some race?
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 77•3 years ago
|
||
I think I have all remaining test failures under control.
The general outline is that we're now green + one test browser/components/urlbar/tests/browser/browser_keyword*.js
which harry is looking at and a number of intermittents that this patchset changes intermittentness off.
My general sense is that in most cases we are turning those intermittents into either permaorganges or at least more likely to reproduce which is good, but it leaves a question whether we want to chase fixing all of them before we merge this.
My plan is to continue looking into the intermittents while the PRs are being reviewed, but once we have the code approach, I'd like to talk to sherrifs about landing the patchset and filing follow ups for intermittens. Fixing them should be easier once maintainers of the respective areas have a m-c build that reproduces them rather than having to apply this massive patchstack to repro.
In all cases we found so far the failures are test-specific (not affecting users) and are caused by tests firing rapidly many operations and assuming that some condition is met in a given microtask, while in fact it is now done in the next one. As you can see from part 18, the fixes are almost always just BrowserTestUtils.waitForCondition
, and I'm ok continuing with that.
The side benefit is that I believe that this patchset will generally reduce the number of intermittents that are related to l10n being async, but I'm concerned about 9 months, 33 part patchset, being eventually blocked on those oranges, so I hope the sherrifs and Mozilla developer community will be open to unblock landing of that patchset soon.
Assignee | ||
Comment 78•3 years ago
|
||
:nika - when you're reviewing this, the amount of oranges indicates to me that we're inherently affecting some task/microtask in this move from jsm->native as it flares a lot of racy tests.
The previous two parts (FileSource and L10nRegistry) don't trigger it, so I assume something changes about it in this patchset around FormatMessages
(which is used by the Fluent DOM under the hood) timings which extenuates the races.
I'm happy to see that we get a chance to identify and fix those races, but I'd like to ask you to pay attention to when reviewing to if we may be accidentally pushing something that used to be a microtask to a new task and if so, should we be concerned about perf impact?
I tested talos and telemetry and it seems like this should improve the perf quite nicely, but it also may be that some new event loop cycle is not captured by the talos tests.
Assignee | ||
Comment 79•3 years ago
|
||
Full test coverage on try on linux opt/debug:
- central - https://treeherder.mozilla.org/jobs?repo=try&revision=4e3a04e49f076bb4455413891d602cef5b3ad502
- filesource - https://treeherder.mozilla.org/jobs?repo=try&revision=65bdb1f374787b762e31dc0707a901cdb1860be9
- l10nregistry - https://treeherder.mozilla.org/jobs?repo=try&revision=bb6bc06f347767463e4a63748a029cdec3b10dd6
- localization - https://treeherder.mozilla.org/jobs?repo=try&revision=1a22f2f6fcdc417547abbaab707fda41f7c66806
All oranges seem to be known intermittent. As I stated in previous comments - we definitely affect the frequency of many of them which may help debug them and fix the raciness.
I'll keep working on fixing the tests but I'd like to talk about landing this next week assuming we can finish the reviews even if we still have some intermittents. None of them seem to be affecting users and I struggle to get help from code authors when I have to ask them to apply 38 patch patchset and rebuild Firefox to reproduce their intermittents.
One functional area to validate is the microtask shift that I flagged in comment 38. I'll be looking into it this week.
NI on :nika to bring attention to comment 38 :)
Assignee | ||
Comment 80•3 years ago
|
||
I spun off bug 1717241 to handle prefetch as a follow up to this work.
Performance without prefetch looks good: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=ca904ed8b066e7c8e1953aa3ea1ad7b4006c271e&newProject=try&newRevision=2896e14a12f069b66b1d3657dcbb7cf0c47b590e&framework=1
Comment 81•3 years ago
|
||
(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #78)
:nika - when you're reviewing this, the amount of oranges indicates to me that we're inherently affecting some task/microtask in this move from jsm->native as it flares a lot of racy tests.
The previous two parts (FileSource and L10nRegistry) don't trigger it, so I assume something changes about it in this patchset around
FormatMessages
(which is used by the Fluent DOM under the hood) timings which extenuates the races.I'm happy to see that we get a chance to identify and fix those races, but I'd like to ask you to pay attention to when reviewing to if we may be accidentally pushing something that used to be a microtask to a new task and if so, should we be concerned about perf impact?
I tested talos and telemetry and it seems like this should improve the perf quite nicely, but it also may be that some new event loop cycle is not captured by the talos tests.
The use of spawn_current_thread
will always cause at least one spin of the event loop to occur after the task is spawned before the code for the task is run, so any function which uses that will always spin the event loop at least once, even if no async work needs to be done. We probably don't need to worry too much about the perf impact, but that might be what you're observing.
Avoiding this extra pass to the event loop would technically be possible, but would take some work and wouldn't be something we'd want to happen in all cases (as it would mean that the spawn
call could evaluate synchronously, which would be quite surprising).
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 82•3 years ago
|
||
Updated•3 years ago
|
Assignee | ||
Comment 83•3 years ago
|
||
Green try: https://treeherder.mozilla.org/jobs?repo=try&revision=26605e722071f894e06eace21a2825092e984bcc
My plan is to land FileSource and L10nRegistry first, and iron out the part 17 with :mixedpuppy before landing this bug next week.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 84•3 years ago
|
||
Comment 85•3 years ago
|
||
Comment 86•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f542b20eaa7b
https://hg.mozilla.org/mozilla-central/rev/5b399ae149f6
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 87•3 years ago
|
||
Updated•3 years ago
|
Comment 88•3 years ago
|
||
Comment 89•3 years ago
•
|
||
Backed out for several test failures.
Failure logs:
-
newtab: https://treeherder.mozilla.org/logviewer?job_id=347099969&repo=autoland
-
non-unified: https://treeherder.mozilla.org/logviewer?job_id=347097644&repo=autoland
-
browser chrome:
https://treeherder.mozilla.org/logviewer?job_id=347099957&repo=autoland
https://treeherder.mozilla.org/logviewer?job_id=347099966&repo=autoland
https://treeherder.mozilla.org/logviewer?job_id=347099752&repo=autoland
https://treeherder.mozilla.org/logviewer?job_id=347100618&repo=autoland
https://treeherder.mozilla.org/logviewer?job_id=347100843&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/87e93141e0869f6da5b08f3ba1fd133bfe7d4433
Assignee | ||
Comment 90•3 years ago
|
||
Hmm, seems like this landed only part1-part11.
Comment 91•3 years ago
|
||
Comment 92•3 years ago
|
||
Comment 93•3 years ago
•
|
||
Backed out for causing build bustages complaining about Document.cpp.
Backout link: https://hg.mozilla.org/integration/autoland/rev/4d5846f47a99e2302b8d6f792d0af824e7c68848
Failure log:
-for bustage: https://treeherder.mozilla.org/logviewer?job_id=347225349&repo=autoland&lineNumber=5359
-for bc failures: https://treeherder.mozilla.org/logviewer?job_id=347232795&repo=autoland&lineNumber=3819
https://treeherder.mozilla.org/logviewer?job_id=347232708&repo=autoland&lineNumber=13214
Comment 94•3 years ago
|
||
Comment 95•3 years ago
|
||
Comment 96•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/878b052ebeaa
https://hg.mozilla.org/mozilla-central/rev/92b87b0f776c
https://hg.mozilla.org/mozilla-central/rev/7e901027262a
https://hg.mozilla.org/mozilla-central/rev/1873a25dc33d
https://hg.mozilla.org/mozilla-central/rev/7e85ba836ad5
https://hg.mozilla.org/mozilla-central/rev/cd03ec17b66c
https://hg.mozilla.org/mozilla-central/rev/ec4e0ef9955d
https://hg.mozilla.org/mozilla-central/rev/181487bdf6c0
https://hg.mozilla.org/mozilla-central/rev/85194e5b7a9e
https://hg.mozilla.org/mozilla-central/rev/41ff3770502d
https://hg.mozilla.org/mozilla-central/rev/0d942291a203
https://hg.mozilla.org/mozilla-central/rev/c85e6e320576
https://hg.mozilla.org/mozilla-central/rev/48fa4e836ca1
https://hg.mozilla.org/mozilla-central/rev/0950a9e56ceb
https://hg.mozilla.org/mozilla-central/rev/e030a54c907f
https://hg.mozilla.org/mozilla-central/rev/064a935a55ce
https://hg.mozilla.org/mozilla-central/rev/bce61215b4d0
https://hg.mozilla.org/mozilla-central/rev/120f83c79d5c
https://hg.mozilla.org/mozilla-central/rev/343aef4f4fe4
Assignee | ||
Updated•3 years ago
|
Comment 97•3 years ago
|
||
== Change summary for alert #30856 (as of Tue, 10 Aug 2021 10:40:10 GMT) ==
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
3% | welcome | ContentfulSpeedIndex | linux1804-64-shippable-qr | cold webrender | 1,650.75 -> 1,608.92 |
2% | welcome | ContentfulSpeedIndex | linux1804-64-shippable-qr | cold webrender | 1,642.88 -> 1,616.00 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=30856
Comment 98•3 years ago
|
||
== Change summary for alert #30870 (as of Tue, 10 Aug 2021 18:35:23 GMT) ==
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
9% | about_preferences_basic | linux1804-64-shippable-qr | e10s stylo webrender | 139.37 -> 126.31 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=30870
Updated•2 years ago
|
Description
•