Make Localization.jsm stateless
Categories
(Core :: Internationalization, task, P3)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox79 | --- | fixed |
People
(Reporter: zbraniecki, Assigned: zbraniecki)
References
(Blocks 1 open bug)
Details
Attachments
(10 files, 1 obsolete file)
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
5.56 KB,
text/plain
|
Details | |
|
6.02 KB,
patch
|
Details | Diff | Splinter Review | |
|
1.21 KB,
patch
|
Details | Diff | Splinter Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review |
As the next step on the path to bug 1613705 after bug 1627809 I'd like to make JSM class stateless.
This would mean that all state is stored in C++ and JSM is called only to perform async iteration. This would make it easy to load the JSM lazily the first time we need to translate things until we remove it completely.
| Assignee | ||
Updated•6 years ago
|
| Assignee | ||
Comment 1•6 years ago
|
||
| Assignee | ||
Comment 2•6 years ago
|
||
Depends on D71676
| Assignee | ||
Updated•6 years ago
|
| Assignee | ||
Comment 3•6 years ago
|
||
| Assignee | ||
Comment 4•6 years ago
|
||
| Assignee | ||
Comment 5•6 years ago
|
||
Depends on D71814
| Assignee | ||
Comment 6•6 years ago
|
||
| Assignee | ||
Comment 7•6 years ago
|
||
With this patchset (on top of bug 1627809), and if I disable localization of 6 strings from Web Components, I can once again lazy-load JSM only when the initial translation is required!
I'll clean it all up a bit and measure perf.
| Assignee | ||
Comment 8•6 years ago
|
||
Olli, this is not yet a review request, but I'm running into a strange performance regression between D71814 and D71815.
The whole patchset shows a set of regressions in about_newtab_with_snippets, about_newtab_with_snippets responsiveness and tabswitch
Here's a perfherder between bug 1627809, and this patchset: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=8a7e913268d4abf76b3a37234e178f91a4739822&newProject=try&newRevision=7fe2a312f72cab5eb4434e13286a1434ef6991a0&framework=1
I went through the patchset here one by one, and the first one to show a significant regression is D71815. Here's perfherder for it - https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=213caddd78ada33c36794113323274bf83161621&newProject=try&newRevision=4b9bf729a0cbd7d5a804a560efb5ac24a6565e7f&framework=1
Before this patch, I store generateBundles and generateBundlesSync in C++ and pass it to Localization.jsm where in onChange [0] generates bundle iterator (this.bundles) and stores it in JS.
In most runtime scenarios generateBundles and generateBundlesSync is null and Localization.jsm uses default generator functions.
After this patch, I call regenerateBundles in JSM which returns the bundle iterator to C++ where it is stored as mBundles. Then I pass it to each format* function.
The only thing I can say is that while custom generateBundles and generateBundlesSync is rare, the iterator stored in mBundles is there always.
I wanted to move in this direction so that next I can move the mBundles generation to L10nRegistry and eventually move the remaining functions to C++ and remove Localization.jsm all together.
Can you take a look at this patch and tell me if anything potentially performance sensitive looks suspicious in it - maybe rooting is costly? Or maybe there's a better way to store an async iterator over FluentBundle (IDL backed)?
[0] https://searchfox.org/mozilla-central/source/intl/l10n/Localization.jsm#499
Comment 9•6 years ago
|
||
I don't see immediately anything suspicious. Perf profiles would be good.
Updated•6 years ago
|
| Assignee | ||
Comment 10•5 years ago
|
||
Here's the minimized testcase of the problem.
As per previous talos runs, there's no impact on tspaint, tpaint or sessionrestore, but there is a quite substantial one on about_newtab_with_snippets - both firstpaint and responsiveness.
The minimized testcase is a bit oversimplified - in the real patch I am moving this.bundles to be stored in C++, but that requires an update to a number of functions and I verified that the moment the regression appears is when I assign the result of the call to JSM (which is the iterator returned by the cached method) to mBundles.
It happens both in the full D71815 patch and in this minimized testcase, so I hope this testcase is going to be useful to track down the reason.
I worked today with :emilio and :smaug on #dom looking for what may cause it and we couldn't find anything in CC log, profile etc. but I noticed that it doesn't seem like the regression appears with tpcycles 1 and it only is visible with tpcycles 10 (default in talos is 20).
It may be some leak, or something related to garbage collection, and :smaug suggested that maybe there's some large object that we have to allocate/deallocate and trace from C++ with the patch.
:emilio noticed that the about:newtab uses fluent-react which uses its own Localization instance and calls to L10nRegistry, but I'm not sure if it also uses the WebIDL Localization or not. If not, why does this change affect that page at all?
I'm going to NI :emilio and :smaug in hopes that they may find time to help me further figure that out.
Comment 11•5 years ago
|
||
I think there's a way to get a profile from a Talos run... That's the easiest way to figure this out. Alexandru, do you know how?
Comment 12•5 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #11)
I think there's a way to get a profile from a Talos run... That's the easiest way to figure this out. Alexandru, do you know how?
You can pass --gecko-profile to the try push and then they show up when selecting a run in treeherder. In this case you'd probably want something like ./mach try fuzzy -q "'talos-chrome-e10s" --gecko-profile.
| Assignee | ||
Comment 13•5 years ago
|
||
I'll try that next.
Removing r?'s because I can't be sure this patchset will end up being a possible way forward.
| Assignee | ||
Comment 14•5 years ago
|
||
(In reply to Brian Grinstead [:bgrins] from comment #12)
You can pass --gecko-profile to the try push and then they show up when selecting a run in treeherder. In this case you'd probably want something like
./mach try fuzzy -q "'talos-chrome-e10s" --gecko-profile.
The way I've been launching the talos run is:
./mach try -b o -p all -u none -t chromez --rebuild-talos 20
since adding --gecko-profiler is not working with this syntax, do you know how can I produce that? fuzzy doesn't seem to allow to put --rebuild-talos 20.
Updated•5 years ago
|
Comment 15•5 years ago
|
||
(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #14)
(In reply to Brian Grinstead [:bgrins] from comment #12)
You can pass --gecko-profile to the try push and then they show up when selecting a run in treeherder. In this case you'd probably want something like
./mach try fuzzy -q "'talos-chrome-e10s" --gecko-profile.The way I've been launching the talos run is:
./mach try -b o -p all -u none -t chromez --rebuild-talos 20since adding
--gecko-profileris not working with this syntax, do you know how can I produce that?fuzzydoesn't seem to allow to put--rebuild-talos 20.
If you are taking a profile then you shouldn't need rebuilds - you don't need the pushes for comparison, just to gather a single profile.
But if you do want to re-run talos with fuzzy then you can use the --rebuild flag instead of --rebuild-talos
| Assignee | ||
Comment 16•5 years ago
|
||
I ran two try runs as per comment 12 suggestion:
- without regression: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c06a5c411fb3e50d5501044b3269a2dc6196834
- with regression: https://treeherder.mozilla.org/#/jobs?repo=try&revision=92b6e538241febd1516aca86107c1beba673c533
Unfortunately I don't seem to be getting talos numbers out of them: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=1c06a5c411fb3e50d5501044b3269a2dc6196834&newProject=try&newRevision=92b6e538241febd1516aca86107c1beba673c533&framework=1
which removes my ability to verify if the regression is visible in that try run.
But maybe there is a profile that may be useful out of it?
| Assignee | ||
Comment 17•5 years ago
|
||
Olli, Brian helped me to fish the profiles out of the runs. They're in each "c"'s Job Details.
For example Windows 10 x64 QuantumRender Shippable opt profiles:
- without regression: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/btsMPskBSzuv3Q7gLuiwcQ/runs/0/artifacts/public/test_info/profile_about_newtab_with_snippets.zip
- with regression: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/LfW9rPsNSiipYO1La_Iieg/runs/0/artifacts/public/test_info/profile_about_newtab_with_snippets.zip
I haven't run a parallel try to verify that the talos regression is still there, but I run the same patches. Do you see anything related to this perf issue in those profiles?
If not, I'll redo it rebased on today's master with talos runs to confirm the regression and new profiles.
| Assignee | ||
Comment 18•5 years ago
|
||
link to profiler:
- pre-regression: https://profiler.firefox.com/from-url/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FbtsMPskBSzuv3Q7gLuiwcQ%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_about_newtab_with_snippets.zip
- regression: https://profiler.firefox.com/from-url/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FLfW9rPsNSiipYO1La_Iieg%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_about_newtab_with_snippets.zip
| Assignee | ||
Comment 19•5 years ago
|
||
Smaug suggested to introduce a Destroy method in Localization class which erases mBundles not just on UNLINK and in the destructor, but also from Document::Destroy which happens earlier.
This erased the regression!
master vs. this patchset: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9e20666939f85da8dacc2990bceb34e9fb524859&newProject=try&newRevision=aefe0311d986dfd6b8b4c888ed2147ec3c8c3c42&framework=1
master vs. this patchset + skip non-proto: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9e20666939f85da8dacc2990bceb34e9fb524859&newProject=try&newRevision=c263bcb82dd0241f12204fe3fe18a358556d9aa8&framework=1
The latter is just adding one more change that emulates perf we'd get if we were able to fix bug 1630743. Here's the patch https://hg.mozilla.org/try/rev/8ac6c6e8bf52c3f06cb8cf72af3dbf3cf5495d7d
In other words - the former is the perf if we land this. The latter is the perf if we land this and solve how to actually make us load the JSM lazy in browser.xhtml.
Comment 20•5 years ago
|
||
Comment 21•5 years ago
|
||
Backed out 6 changesets (bug 1631593) for Assertion failure in xpcom/base/nsCycleCollector.cpp. CLOSED TREE
Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=302938532&repo=autoland&lineNumber=995
Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&revision=35b4cd4bb12a8bb12cf30a4b028ce9005af4e84d
Backout:
https://hg.mozilla.org/integration/autoland/rev/b560bd9c4c8fcdb1f24346061e92d04a266f1367
Comment 22•5 years ago
|
||
The assertion is: Assertion failure: result->mParticipant == aParticipant (nsCycleCollectionParticipant shouldn't change!)
I haven't looked at your patch, but off the top of my head, one thing that I think could cause that is that you have a cycle collected class A with a parent class B, and in B's ctor, |this| gets AddRefed. At that point, the vtable of the object will be B, so it'll call B's AddRef. But then later it'll have the vtable of A.
| Assignee | ||
Comment 23•5 years ago
|
||
I got help from :mccr8 and :smaug today on Matrix, but we ended up not being able to find the cause just yet.
I ended up dissecting my patchset until I was able to identify a small change that reliably triggers the issue.
Attached patchset can be applied on top of D71679 (Move aIsSync to Localization C++.) to reproduce the crash in debug mode with ./mach test dom/l10n/tests/mochitest/dom_localization/ --headless.
The crash does not happen on any single test, but it happens when running against the whole directory at the end:
0:12.42 INFO TEST-START | Shutdown
0:12.42 INFO Passed: 55
0:12.42 INFO Failed: 0
0:12.42 INFO Todo: 0
0:12.42 INFO Mode: non-e10s
0:12.42 INFO Slowest: 1081ms - chrome://mochitests/content/chrome/dom/l10n/tests/mochitest/dom_localization/test_attr_sanitized.html
0:12.42 INFO SimpleTest FINISHED
0:12.42 INFO TEST-INFO | Ran 1 Loops
0:12.42 INFO SimpleTest FINISHED
0:12.49 GECKO(175168) [175168, Main Thread] WARNING: '!inner', file /projects/mozilla-unified/dom/ipc/JSWindowActorProtocol.cpp, line 172
0:12.49 GECKO(175168) [175168, Main Thread] WARNING: '!inner', file /projects/mozilla-unified/dom/ipc/JSWindowActorProtocol.cpp, line 172
0:12.49 GECKO(175168) [175168, Main Thread] WARNING: '!inner', file /projects/mozilla-unified/dom/ipc/JSWindowActorProtocol.cpp, line 172
0:12.49 GECKO(175168) [175168, Main Thread] WARNING: '!inner', file /projects/mozilla-unified/dom/ipc/JSWindowActorProtocol.cpp, line 172
0:12.49 GECKO(175168) [175168, Main Thread] WARNING: '!inner', file /projects/mozilla-unified/dom/ipc/JSWindowActorProtocol.cpp, line 172
0:12.49 GECKO(175168) [175168, Main Thread] WARNING: '!inner', file /projects/mozilla-unified/dom/ipc/JSWindowActorProtocol.cpp, line 172
0:12.50 GECKO(175168) [175168, Main Thread] WARNING: '!inner', file /projects/mozilla-unified/dom/ipc/JSWindowActorProtocol.cpp, line 172
0:12.50 GECKO(175168) [175168, Main Thread] WARNING: '!inner', file /projects/mozilla-unified/dom/ipc/JSWindowActorProtocol.cpp, line 172
0:12.56 GECKO(175168) [175168, Main Thread] WARNING: '!inner', file /projects/mozilla-unified/dom/ipc/JSWindowActorProtocol.cpp, line 172
0:12.56 GECKO(175168) [175168, Main Thread] WARNING: '!inner', file /projects/mozilla-unified/dom/ipc/JSWindowActorProtocol.cpp, line 172
0:12.58 GECKO(175168) [175168, Main Thread] WARNING: '!inner', file /projects/mozilla-unified/dom/ipc/JSWindowActorProtocol.cpp, line 172
0:12.58 GECKO(175168) [175168, Main Thread] WARNING: '!inner', file /projects/mozilla-unified/dom/ipc/JSWindowActorProtocol.cpp, line 172
0:12.60 GECKO(175168) [175168, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111 (NS_ERROR_NOT_AVAILABLE): file /projects/mozilla-unified/netwerk/cache/nsApplicationCacheService.cpp, line 161
0:12.76 GECKO(175168) [2020-05-22T02:16:03Z WARN xulstore::persist] tried to remove key that isn't in the store
0:12.76 GECKO(175168) [2020-05-22T02:16:03Z WARN xulstore::persist] tried to remove key that isn't in the store
0:13.49 GECKO(175168) 1590113764377 Marionette TRACE Received observer notification xpcom-will-shutdown
0:13.49 GECKO(175168) 1590113764377 Marionette INFO Stopped listening on port 2828
0:13.49 GECKO(175168) 1590113764378 Marionette DEBUG Marionette stopped listening
0:13.59 GECKO(175168) [175168, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /projects/mozilla-unified/js/xpconnect/loader/mozJSComponentLoader.cpp, line 228
0:13.59 GECKO(175168) [175168, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /projects/mozilla-unified/js/xpconnect/loader/mozJSComponentLoader.cpp, line 255
0:14.88 GECKO(175168) Assertion failure: result->mParticipant == aParticipant (nsCycleCollectionParticipant shouldn't change!), at /projects/mozilla-unified/xpcom/base/nsCycleCollector.cpp:2014
Initializing stack-fixing for the first stack frame, this may take a while...
0:34.36 GECKO(175168) #01: CCGraphBuilder::AddPurpleRoot(void*, nsCycleCollectionParticipant*) (/projects/mozilla-unified/xpcom/base/nsCycleCollector.cpp:2028)
0:34.36 GECKO(175168) #02: SelectPointersVisitor::Visit(nsPurpleBuffer&, nsPurpleBufferEntry*) (/projects/mozilla-unified/xpcom/base/nsCycleCollector.cpp:1053)
0:34.36 GECKO(175168) #03: void nsPurpleBuffer::VisitEntries<SelectPointersVisitor>(SelectPointersVisitor&) (/projects/mozilla-unified/xpcom/base/nsCycleCollector.cpp:957)
0:34.36 GECKO(175168) #04: nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*) (/projects/mozilla-unified/xpcom/base/nsCycleCollector.cpp:1068)
0:34.36 GECKO(175168) #05: nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) (/projects/mozilla-unified/xpcom/base/nsCycleCollector.cpp:3440)
0:34.36 GECKO(175168) #06: nsCycleCollector::ShutdownCollect() (/projects/mozilla-unified/xpcom/base/nsCycleCollector.cpp:3352)
0:34.36 GECKO(175168) #07: nsCycleCollector::Shutdown(bool) (/projects/mozilla-unified/xpcom/base/nsCycleCollector.cpp:3644)
0:34.36 GECKO(175168) #08: nsCycleCollector_shutdown(bool) (/projects/mozilla-unified/obj-x86_64-pc-linux-gnu-dbg/dist/include/mozilla/RefPtr.h:66)
0:34.36 GECKO(175168) #09: mozilla::ShutdownXPCOM(nsIServiceManager*) (/projects/mozilla-unified/xpcom/build/XPCOMInit.cpp:719)
0:34.36 GECKO(175168) #10: ScopedXPCOMStartup::~ScopedXPCOMStartup() (/projects/mozilla-unified/toolkit/xre/nsAppRunner.cpp:1278)
0:34.36 GECKO(175168) #11: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) (/projects/mozilla-unified/obj-x86_64-pc-linux-gnu-dbg/dist/include/mozilla/cxxalloc.h:51)
0:34.37 GECKO(175168) #12: XRE_main(int, char**, mozilla::BootstrapConfig const&) (/projects/mozilla-unified/toolkit/xre/nsAppRunner.cpp:4807)
0:34.48 GECKO(175168) #13: main (/projects/mozilla-unified/browser/app/nsBrowserApp.cpp:0)
0:34.48 GECKO(175168) #14: __libc_start_main (/usr/lib/libc.so.6 + 0x27002)
0:34.48 GECKO(175168) #15: _start (/projects/mozilla-unified/obj-x86_64-pc-linux-gnu-dbg/dist/bin/firefox + 0x3902e)
0:34.48 GECKO(175168) #16: ??? (???:???)
0:34.48 GECKO(175168) ExceptionHandler::GenerateDump cloned child 175396
0:34.48 GECKO(175168) ExceptionHandler::SendContinueSignalToChild sent continue signal to child
0:34.48 GECKO(175168) ExceptionHandler::WaitForContinueSignal waiting for continue signal...
0:34.49 Main app process: exit 11
0:34.49 INFO Buffered messages finished
0:34.49 ERROR TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 11
0:34.49 INFO runtests.py | Application ran for: 0:00:32.086903
0:34.49 INFO zombiecheck | Reading PID log: /tmp/tmpkIOg41pidlog
0:34.49 INFO ==> process 175168 launched child process 175179
0:34.50 INFO zombiecheck | Checking for orphan process with PID: 175179
0:34.50 CRASH: pid:175063. Test:Last test finished. Minidump analysed:False. Signature:[None]
Mozilla crash reason: MOZ_ASSERT(result->mParticipant == aParticipant) (nsCycleCollectionParticipant shouldn't change!)
Crash dump filename: /tmp/tmpX4VMLo.mozrunner/minidumps/54ed273b-aa3c-e3ae-58a1-e42f99a53513.dmp
MINIDUMP_STACKWALK not set, can't process dump.
0:34.50 INFO Stopping web server
0:34.50 INFO Stopping web socket server
0:34.52 INFO Stopping ssltunnel
0:34.54 INFO leakcheck | Processing log file /tmp/tmpX4VMLo.mozrunner/runtests_leaks.log
0:34.54 INFO TEST-INFO | leakcheck | default process: leak threshold set at 0 bytes
0:34.54 INFO TEST-INFO | leakcheck | forkserver process: leak threshold set at 0 bytes
0:34.54 INFO TEST-INFO | leakcheck | gmplugin process: leak threshold set at 20000 bytes
0:34.54 INFO TEST-INFO | leakcheck | gpu process: leak threshold set at 0 bytes
0:34.54 INFO TEST-INFO | leakcheck | plugin process: leak threshold set at 0 bytes
0:34.54 INFO TEST-INFO | leakcheck | rdd process: leak threshold set at 400 bytes
0:34.54 INFO TEST-INFO | leakcheck | socket process: leak threshold set at 0 bytes
0:34.54 INFO TEST-INFO | leakcheck | tab process: leak threshold set at 0 bytes
0:34.54 INFO TEST-INFO | leakcheck | vr process: leak threshold set at 0 bytes
0:34.55 INFO leakcheck | Processing leak log file /tmp/tmpX4VMLo.mozrunner/runtests_leaks.log
0:34.55 FAIL leakcheck: default missing output line for total leaks!
0:34.55 INFO runtests.py | Running tests: end.
0:34.55 INFO Buffered messages finished
0 INFO TEST-START | Shutdown
1 INFO Passed: 55
2 INFO Failed: 0
3 INFO Todo: 0
4 INFO Mode: non-e10s
5 INFO SimpleTest FINISHED
0:34.55 INFO Buffered messages finished
0:34.55 SUITE_END
0:34.55
Overall Summary
===============
mochitest-chrome
~~~~~~~~~~~~~~~~
Ran 89 checks (17 asserts, 55 subtests, 17 tests)
Expected results: 89
Unexpected results: 0
Unexpected Results
------------------
ERROR TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 11
The exact point of when the crash become reproducable are the changes in Localization::Activate. In other words, just adding mGenerateBundles and mGenerateBundlesSync to the class and hooking them into CC doesn't trigger it.
But populating them in Activate triggers the crash.
I don't know what it means exactly and how to address it. I suspect some changes are required for DOMLocalization and/or DocumentL10n to handle those fields in them, but looking at other examples of class inheritance with fields doesn't help and I couldn't find another class with JS::Heap<JS::Value> field that extends another class.
I hope that with that minimized testcase :mccr8 and :smaug will be able to help me.
Comment 24•5 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #22)
I haven't looked at your patch, but off the top of my head, one thing that I think could cause that is that you have a cycle collected class A with a parent class B, and in B's ctor, |this| gets AddRefed. At that point, the vtable of the object will be B, so it'll call B's AddRef. But then later it'll have the vtable of A.
Isn't the problem the call to |mozilla::HoldJSObjects(this)| in the Localization ctor? Seems to me like the QI that HoldJSObjects does to get the CC participant will not return the right participant at that point if the this object is a DOMLocalization (which inherits from Localization). Not really sure why we can't just call HoldJSObjects when we actually get JS objects to hold in Localization::Activate instead.
Comment 25•5 years ago
|
||
Good catch. I forgot that that QI is done eagerly. I'd guess that there aren't any other non-wrapper-cached nsISupports JS-holding CCed classes. In general, I think it is nice to unconditionally call hold in the ctor so you don't forget it, but it seems like a reasonable workaround here.
| Assignee | ||
Comment 26•5 years ago
|
||
Thank you Peter! That was it!
I moved the HoldJSObjects to Activate and them moved RegisterObservers to Init.
I kept the bool Init in Localization so that I don't have to mess with signatures when overriding it in DocumentL10n. Hope that's ok.
| Assignee | ||
Comment 27•5 years ago
|
||
Here's try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3546a98004283bc37b5ef57cdea500d304d478fe
Changes since May 15th:
- Switched Localization to use
NS_IMPL_CYCLE_COLLECTION_MULTI_ZONE_JSHOLDER_CLASSformBundles - Moved
mozilla::HoldJSObjects(this);toLocalization::Activate - Created
Localization::InitandLocalization::Createand movedRegisterObserversthere
If I understand correctly, I need the MULTI_ZONE here because mBundles is an iterator over bundles that are referencing FluentResource that is assigned to L10nRegistry (singleton per process) and can be used in multiple documents.
| Assignee | ||
Comment 28•5 years ago
|
||
I see two new regressions:
- browser/base/content/test/performance/browser_windowopen.js - started looking into it and it's coming up early. In D71677. I suspect something about difference in triggering
onChangein result of Jonathan's feedback. - browser/components/syncedtabs/test/browser/browser_sidebar_syncedtabslist.js - seems like we changed the promise rejection in some place. Will investigate more tomorrow.
| Assignee | ||
Comment 29•5 years ago
|
||
A new regression, that I again need help with. :(
Running the anti-flicker check test in debug mode with this little patch that should be a no-op, leads to flicker regressions.
All the patch is changing is moving the call to onChange from happening in 4 add/removeResourceId(s) calls in JSM, to the same 4 calls in C++ (which call JSM's onChange).
This should be a complete no-op. Nothing else calls those JSM functions as far as I understand, since they're not exposed (nothing uses mozILocalization except of Localization.cpp) and it should work exactly the same. But it isn't, it seems to cause flickers.
To reproduce, apply this patch on top of central and run:
./mach test browser/base/content/test/performance/browser_windowopen.js --headless
The current result:
Unexpected Results
------------------
browser/base/content/test/performance/browser_windowopen.js
FAIL unexpected changed rect: ({x1:325, x2:475, y1:47, y2:60, w:151, h:14}), window width: 1366 -
Stack trace:
chrome://mochikit/content/browser-test.js:test_ok:1299
chrome://mochitests/content/browser/browser/base/content/test/performance/head.js:reportUnexpectedFlicker/rects<:611
chrome://mochitests/content/browser/browser/base/content/test/performance/head.js:reportUnexpectedFlicker:602
chrome://mochitests/content/browser/browser/base/content/test/performance/head.js:withPerfObserver:671
chrome://mochitests/content/browser/browser/base/content/test/performance/browser_windowopen.js:null:106
chrome://mochikit/content/browser-test.js:Tester_execTest/<:1064
chrome://mochikit/content/browser-test.js:Tester_execTest:1104
chrome://mochikit/content/browser-test.js:nextTest/<:927
chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:918
FAIL unexpected changed rect: ({x1:481, x2:537, y1:47, y2:57, w:57, h:11}), window width: 1366 -
Stack trace:
chrome://mochikit/content/browser-test.js:test_ok:1299
chrome://mochitests/content/browser/browser/base/content/test/performance/head.js:reportUnexpectedFlicker/rects<:611
chrome://mochitests/content/browser/browser/base/content/test/performance/head.js:reportUnexpectedFlicker:602
chrome://mochitests/content/browser/browser/base/content/test/performance/head.js:withPerfObserver:671
chrome://mochitests/content/browser/browser/base/content/test/performance/browser_windowopen.js:null:106
chrome://mochikit/content/browser-test.js:Tester_execTest/<:1064
chrome://mochikit/content/browser-test.js:Tester_execTest:1104
chrome://mochikit/content/browser-test.js:nextTest/<:927
chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:918
FAIL should have 0 unknown flickering areas - Got 2, expected 0
Stack trace:
chrome://mochikit/content/browser-test.js:test_is:1327
chrome://mochitests/content/browser/browser/base/content/test/performance/head.js:reportUnexpectedFlicker:628
chrome://mochitests/content/browser/browser/base/content/test/performance/head.js:withPerfObserver:671
chrome://mochitests/content/browser/browser/base/content/test/performance/browser_windowopen.js:null:106
chrome://mochikit/content/browser-test.js:Tester_execTest/<:1064
chrome://mochikit/content/browser-test.js:Tester_execTest:1104
chrome://mochikit/content/browser-test.js:nextTest/<:927
chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:918
I suspected that maybe calling JSM twice has something to do with it, but a broader test where I moved resourceIds to CPP and called JSM only via onChange (so, one call per add/removeResourceId(s), kept reproducing this regression.
I spent 2 days now reproducing different variants of the same regression with minimized testcases and I'm a bit out of ideas on what may be causing it and would like to ask for some sanity check on how this can have any effect on racing, timing, and in result flickering.
Hoping that Peter or Olli will notice where am I making a mistake.
| Assignee | ||
Comment 30•5 years ago
|
||
Actually, even a smaller, two line patch, reproduces it!
| Assignee | ||
Comment 31•5 years ago
|
||
The second orange also seems to be related to the OnChange switch. It only starts being reproducible after D71815 is applied, but the stack is:
0:06.32 GECKO(199110) [Parent 199110, Main Thread] ###!!! ASSERTION: Want to fire DOMNodeRemoved event, but it's not safe: 'Error', file /projects/mozilla-unified/dom/base/nsContentUtils.cpp, line 4469
Initializing stack-fixing for the first stack frame, this may take a while...
0:07.83 INFO runtests.py | Waiting for browser...
0:26.11 GECKO(199110) #01: nsINode::RemoveChild(nsINode&, mozilla::ErrorResult&) (/projects/mozilla-unified/dom/base/nsINode.cpp:777)
0:26.12 GECKO(199110) #02: mozilla::dom::L10nOverlays::GetNodeForNamedElement(mozilla::dom::Element*, mozilla::dom::Element*, nsTArray<mozilla::dom::L10nOverlaysError>&, mozilla::ErrorResult&) (/projects/mozilla-unified/dom/l10n/L10nOverlays.cpp:279)
0:26.12 GECKO(199110) #03: mozilla::dom::L10nOverlays::OverlayChildNodes(mozilla::dom::DocumentFragment*, mozilla::dom::Element*, nsTArray<mozilla::dom::L10nOverlaysError>&, mozilla::ErrorResult&) (/projects/mozilla-unified/obj-x86_64-pc-linux-gnu-dbg/dist/include/mozilla/AlreadyAddRefed.h:146)
0:26.12 GECKO(199110) #04: mozilla::dom::L10nOverlays::TranslateElement(mozilla::dom::Element&, mozilla::dom::L10nMessage const&, nsTArray<mozilla::dom::L10nOverlaysError>&, mozilla::ErrorResult&) (/projects/mozilla-unified/obj-x86_64-pc-linux-gnu-dbg/dist/include/mozilla/ErrorResult.h:434)
0:26.12 GECKO(199110) #05: mozilla::dom::DOMLocalization::ApplyTranslations(nsTArray<nsCOMPtr<mozilla::dom::Element> >&, nsTArray<mozilla::dom::Nullable<mozilla::dom::L10nMessage> >&, nsXULPrototypeDocument*, mozilla::ErrorResult&) (/projects/mozilla-unified/obj-x86_64-pc-linux-gnu-dbg/dist/include/mozilla/ErrorResult.h:434)
0:26.13 GECKO(199110) #06: mozilla::dom::DOMLocalization::TranslateElements(mozilla::dom::Sequence<mozilla::OwningNonNull<mozilla::dom::Element> > const&, nsXULPrototypeDocument*, mozilla::ErrorResult&) (/projects/mozilla-unified/dom/l10n/DOMLocalization.cpp:324)
0:26.13 GECKO(199110) #07: mozilla::dom::DOMLocalization::TranslateFragment(nsINode&, mozilla::ErrorResult&) (/projects/mozilla-unified/dom/l10n/DOMLocalization.cpp:155)
0:26.13 GECKO(199110) #08: mozilla::dom::DOMLocalization::TranslateRoots(mozilla::ErrorResult&) (/projects/mozilla-unified/obj-x86_64-pc-linux-gnu-dbg/dist/include/mozilla/AlreadyAddRefed.h:146)
0:26.13 GECKO(199110) #09: mozilla::dom::DOMLocalization::OnChange() (/projects/mozilla-unified/obj-x86_64-pc-linux-gnu-dbg/dist/include/mozilla/AlreadyAddRefed.h:146)
0:26.13 GECKO(199110) #10: mozilla::intl::Localization::AddResourceId(nsTSubstring<char16_t> const&) (/projects/mozilla-unified/intl/l10n/Localization.cpp:184)
0:26.13 GECKO(199110) #11: mozilla::dom::Document::LocalizationLinkAdded(mozilla::dom::Element*) (/projects/mozilla-unified/dom/base/Document.cpp:0)
0:26.13 GECKO(199110) #12: mozilla::dom::HTMLLinkElement::BindToTree(mozilla::dom::BindContext&, nsINode&) (/projects/mozilla-unified/dom/html/HTMLLinkElement.cpp:134)
0:26.13 GECKO(199110) #13: nsINode::InsertChildBefore(nsIContent*, nsIContent*, bool) (/projects/mozilla-unified/dom/base/nsINode.cpp:1520)
0:26.13 GECKO(199110) #14: nsINode::ReplaceOrInsertBefore(bool, nsINode*, nsINode*, mozilla::ErrorResult&) (/projects/mozilla-unified/dom/base/nsINode.cpp:2662)
0:26.13 GECKO(199110) #15: mozilla::dom::Node_Binding::appendChild(JSContext*, JS::Handle<JSObject*>, void*, JSJitMethodCallArgs const&) (/projects/mozilla-unified/obj-x86_64-pc-linux-gnu-dbg/dom/bindings/NodeBinding.cpp:990)
0:26.13 GECKO(199110) #16: bool mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions>(JSContext*, unsigned int, JS::Value*) (/projects/mozilla-unified/dom/bindings/BindingUtils.cpp:3194)
0:26.14 GECKO(199110) #17: CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) (/projects/mozilla-unified/js/src/vm/Interpreter.cpp:493)
0:26.14 GECKO(199110) #18: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) (/projects/mozilla-unified/js/src/vm/Interpreter.cpp:585)
0:26.14 GECKO(199110) #19: js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICCall_Fallback*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) (/projects/mozilla-unified/js/src/jit/BaselineIC.cpp:2990)
0:26.14 GECKO(199110) #20: ??? (???:???)
So my hope is that both oranges are the same issue - the change to OnChange which can be reproduced with the minimal testacase, makes us fire the translation via JSM's OnChange at a different moment. The first failure in browser_windowopen.js is reproducible with the minimized two-line testcase,
the latter requires the patch-stack but shows that with the patchset we fire OnChange at the time when we shouldn't.
I don't understand how the two-liner causes such a drastic change in the even loop.
Comment 32•5 years ago
|
||
That assertion shows a rather bad issue. Something trying to modify DOM inside BindToTree. Need to move the code doing the modification to happen later, possibly using AddScriptRunner.
| Assignee | ||
Comment 33•5 years ago
|
||
That assertion shows a rather bad issue. Something trying to modify DOM inside BindToTree. Need to move the code doing the modification to happen later, possibly using AddScriptRunner.
I understand that, but can you tell me why this two-line patch https://bugzilla.mozilla.org/attachment.cgi?id=9152286 applied on master causes that?
It's very confusing to me because I don't think it should change anything. And if it changes some timing, then I don't understand why, which makes it very hard to work with this code.
Why did we not need the AddScriptRunner before? Is it possible that this patch regresses that, or is it more likely that the bug existed before but with this change it makes it be reported?
I'm afraid of playing with event loop and the microtasks between JS and CPP not understanding why my changes affect that.
| Assignee | ||
Comment 34•5 years ago
|
||
I found it. Both bugs were caused by the OnChange change which made me actually call DOMLocalization::OnChange rather than Localization::OnChange from Add/RemoveResourceId(s).
Switching OnChange to Localization::OnChange fixed both.
I'm sorry for bothering you Peter and Olli so much, I was assuming it's some unknown in the microtask system around DOM and XPCOM, but it's just my subclassing. I triggered new try https://treeherder.mozilla.org/#/jobs?repo=try&revision=c87c07a228e6a4b1ebf521293a889c231815d82f
If it comes green, I'll be able to land.
Thank you!
| Assignee | ||
Comment 35•5 years ago
|
||
The remaining orange:
TEST-UNEXPECTED-FAIL | browser/components/syncedtabs/test/browser/browser_sidebar_syncedtabslist.js | A promise chain failed to handle a rejection: Encountered unsupported value type writing stack-scoped structured clone - stack: (No stack available.)
was caused by us calling Destroy on DocumentL10n which set mBundles to undefined, but didn't cancel the mutation observer, so mutations that happened after the document started being destroyed were triggering localization with empty bundles and since I didn't think its possible I removed the if (!bundles) check in formatWithFallback and the error was from for async (let bundle of bundles) on undefined which cannot produces an error that is not supported.
I added that check back so that the error is a string and added DOMLocalization::Destroy which removed the mutation observer.
| Assignee | ||
Comment 36•5 years ago
|
||
Comment 37•5 years ago
|
||
Comment 38•5 years ago
|
||
Backed out for perma failures.
Failrue logs:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=304437180&repo=autoland&lineNumber=3502
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=304437178&repo=autoland&lineNumber=3444
Backout: https://hg.mozilla.org/integration/autoland/rev/8b62adf33598b50a04bc44eec167827dd9ede178
| Assignee | ||
Comment 39•5 years ago
|
||
| Assignee | ||
Comment 40•5 years ago
|
||
The issue is that aboutNetError.js uses async localization to translate some of the strings, and those may end up applying the translations later.
It doesn't affect visuals, because the strings are hidden by default, but it may affect the tests because the tests don't wait for the async functions to fire the AboutNetErrorLoad event.
Some of those async calls can be avoided and I filed bug 1642725 for that, but to unblock this patchset, I'd like to make the tests wait for l10n to be applied just like we do in L10n tests - https://searchfox.org/mozilla-central/search?q=verifyL10n&case=true&path=
Updated•5 years ago
|
Comment 41•5 years ago
|
||
Comment 42•5 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/af4747c3599d
https://hg.mozilla.org/mozilla-central/rev/b2be75c1d585
https://hg.mozilla.org/mozilla-central/rev/c2e9a178e0a3
https://hg.mozilla.org/mozilla-central/rev/6a395d69e5d2
https://hg.mozilla.org/mozilla-central/rev/de37ec69840f
https://hg.mozilla.org/mozilla-central/rev/4697d1ca4081
https://hg.mozilla.org/mozilla-central/rev/186da53817cc
Description
•