Testcase resizing a canvas N times in a loop without allocating is 31x slower in Firefox, spending all the time in sweep/mark.
Categories: Core :: JavaScript: GC, defect, P2
People: Reporter: mayankleoboy1; Assigned: jonco; NeedInfo
References: Blocks 3 open bugs
Details: Keywords: perf-alert
Attachments: 5 files
Open the attached testcase, enter 500000, and click run.
Firefox: https://share.firefox.dev/4m369sA (52s)
Chrome: https://share.firefox.dev/44HQZU4 (1.7s)
Very artificial, but maybe there is something worth improving?
Not sure what the component should be. Tentatively putting under JS, and cc'ing jonco and Lee.
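The attachment itself isn't reproduced in this report, but a minimal sketch of the pattern the title describes (repeatedly resizing a canvas in a loop) might look like the following; the element, sizes, and loop body are assumptions, not the actual testcase:

```js
// Rough sketch of the reported pattern (an assumption, not the actual attachment):
// every resize discards the canvas backing store and a new one is created,
// so a large memory size change is reported to the JS engine each iteration.
const canvas = document.createElement("canvas");
const ctx = canvas.getContext("2d");
const n = 500000; // the "Enter 500000" value from the description

function run() {
  const start = performance.now();
  for (let i = 0; i < n; i++) {
    // Alternate sizes so every iteration is a real resize.
    const size = (i % 2) ? 2048 : 1024;
    canvas.width = size;
    canvas.height = size;
    // Touch the canvas so the backing store is actually materialized
    // (another assumption about what the attachment does).
    ctx.fillRect(0, 0, 1, 1);
  }
  console.log(`Done in ${(performance.now() - start).toFixed(0)} ms`);
}

run();
```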
Comment 1•26 days ago (Reporter)
Comment 2•26 days ago (Reporter)
You can also see this with the attached testcase that creates a 16K×16K canvas.
Comment 3•24 days ago (Assignee)
I think what's happening is that the loop is repeatedly allocating memory (for the canvas) and then causing it to be freed again. The size change from the allocation triggers a GC, but by the time it starts the memory has already been freed. The threshold for the next GC is then set based on almost nothing being allocated, so the next allocation triggers another GC, and so on.
Maybe we should check at the start of an allocation-triggered GC whether it is still necessary and abort if not.
This situation is pretty unlikely to happen in practice, though.
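To make the feedback loop concrete, here is a rough conceptual model in plain JavaScript; the names (zone, gcBytes, threshold, gcRequested) are illustrative stand-ins, not SpiderMonkey's actual API or fields:

```js
// Conceptual model only; numbers and names are illustrative.
const zone = {
  gcBytes: 0,        // memory currently attributed to the zone
  threshold: 32e6,   // heap size that triggers the next GC
  gcRequested: false,
};

// A large allocation (e.g. a canvas backing store) is reported to the engine.
function onMallocReported(nbytes) {
  zone.gcBytes += nbytes;
  if (zone.gcBytes >= zone.threshold) {
    zone.gcRequested = true; // the GC starts later, not immediately
  }
}

// The same memory is freed again when the canvas is resized.
function onFreeReported(nbytes) {
  zone.gcBytes -= nbytes;
}

// The requested GC is about to actually run.
function startAllocationTriggeredGC(runGC) {
  if (!zone.gcRequested) return;
  zone.gcRequested = false;
  // Proposed check: by now the canvas memory may already have been freed,
  // so skip the collection if the zone is back under its threshold.
  if (zone.gcBytes < zone.threshold) return;
  runGC();
  // Without the check above, the GC runs on an almost-empty heap and the new
  // threshold collapses, so the very next resize triggers yet another GC.
  zone.threshold = Math.max(zone.gcBytes * 1.5, 32e6);
}
```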
Comment 4•24 days ago (Reporter)
Not to derail from the original bug and analysis, but I found that if I change the parallel marking parameters, I can get a ~30% improvement on the testcase.
The testcase is the 16K×16K file.
- N=5000
- javascript.options.mem.gc_max_parallel_marking_threads = 4
- javascript.options.mem.gc_parallel_marking_threshold_mb = 1
- Profile: https://share.firefox.dev/46qG5mV (1.5s)
- Default Nightly: https://share.firefox.dev/414Qfps (2.1-2.4s)
Maybe worth rechecking whether to reduce the threshold size? (Or maybe that should wait for the pending allocator changes.)
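For anyone who wants to reproduce the comparison, here is a user.js sketch of the two pref overrides listed above; the values are the ones tried in this experiment, not a recommendation for new defaults:

```js
// user.js sketch: the pref overrides from this comment.
// Values are the ones used in this experiment, not suggested defaults.
user_pref("javascript.options.mem.gc_max_parallel_marking_threads", 4);
user_pref("javascript.options.mem.gc_parallel_marking_threshold_mb", 1);
```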
Comment 5•22 days ago (Assignee)
This splits up malloc/JIT code checks that were conflated and tidies up a bunch of other things.
JS::AddAssociatedMemory doesn't need to call Zone::maybeTriggerGCOnMalloc as addCellMemory already does this.
Comment 6•22 days ago (Assignee)
This re-checks the heap thresholds for zones in allocation-triggered GCs. This improves the test case a lot, although it still triggers a ton of GCs.
Comment 7•22 days ago (Reporter)
First testcase, N=500000:
- Nightly: https://share.firefox.dev/4lG4l99 (13s)
- So we are about 4x faster here.
Using the 16K×16K testcase with N=5000:
- Profile with latest Nightly (with fix from bug 1978751): https://share.firefox.dev/4meypsC (800ms)
- Previous profile: https://share.firefox.dev/414Qfps (2.1-2.4s)
One caveat: on this testcase, profiling with the Gecko Profiler has a lot of overhead. For instance, profiling the first testcase with the Gecko Profiler took 52s and the captured profile had empty threads for half of its duration. The magnitude of the improvement may therefore not be fully accurate, since timings from the Gecko Profiler and Samply are not comparable.
Comment 9•16 days ago (bugherder)
https://hg.mozilla.org/mozilla-central/rev/a9f62ebf3a27
https://hg.mozilla.org/mozilla-central/rev/2b2cdae238a8
https://hg.mozilla.org/mozilla-central/rev/e5fb733c7f96
Comment 10•16 days ago (Reporter)
- For the first testcase: https://share.firefox.dev/3H6l6eL (16s)
- For the second testcase, with the latest Nightly: https://share.firefox.dev/3U4jvJf (430ms)
Comment 11•15 days ago (Pulsebot)
Comment 12•15 days ago
Backed out as requested by @jonco for increasing GC time in the sp3 benchmark: https://hg.mozilla.org/integration/autoland/rev/3bf8f808fadf731c5592792643c5617c4a10137a
Comment 13•3 days ago
(In reply to Pulsebot from comment #11)
> Pushed by sstanca@mozilla.com:
> https://github.com/mozilla-firefox/firefox/commit/31008b7dd31c
> https://hg.mozilla.org/integration/autoland/rev/3bf8f808fadf
> Revert "Bug 1978224: apply code formatting via Lando" as requested by @jonco for increasing GC time in the sp3 benchmark.
Perfherder has detected a browsertime performance change from push 3bf8f808fadf731c5592792643c5617c4a10137a.
If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.
Improvements:

| Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
|---|---|---|---|---|---|
| 28% | speedometer3 perfstats-NonIdleMajorGC | android-hw-a55-14-0-aarch64-shippable | webrender | 690.74 -> 500.63 | |
| 25% | speedometer3 perfstats-NonIdleMajorGC | linux1804-64-nightlyasrelease-qr | fission webrender | 271.62 -> 203.85 | Before/After |
| 22% | speedometer3 perfstats-MajorGC | android-hw-a55-14-0-aarch64-shippable | webrender | 2,432.60 -> 1,891.57 | |
| 5% | speedometer3 TodoMVC-JavaScript-ES5/Adding100Items/Sync | linux1804-64-nightlyasrelease-qr | fission webrender | 62.53 -> 59.66 | Before/After |
| 4% | speedometer3 TodoMVC-JavaScript-ES5/Adding100Items/total | linux1804-64-nightlyasrelease-qr | fission webrender | 68.02 -> 65.13 | Before/After |
| ... | ... | ... | ... | ... | ... |
| 3% | speedometer3 TodoMVC-jQuery/CompletingAllItems/Sync | android-hw-a55-14-0-aarch64-shippable | webrender | 196.75 -> 191.46 | |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs, you can trigger them yourself from the Treeherder job view or ask a performance sheriff to do that for you.
You can run all of these tests on try with ./mach try perf --alert 46111
The following documentation link provides more information about this command.