Open Bug 1978224 Opened 26 days ago Updated 3 days ago

Testcase resizing canvas N times in a loop without allocating is 31x slower in Firefox, spending all the time in sweep/mark.

Categories

(Core :: JavaScript: GC, defect, P2)

defect

Tracking

()

REOPENED

People

(Reporter: mayankleoboy1, Assigned: jonco, NeedInfo)

References

(Blocks 3 open bugs)

Details

(Keywords: perf-alert)

Attachments

(5 files)

Open testcase
Enter 500000 and click run.

Firefox: https://share.firefox.dev/4m369sA (52s)
Chrome: https://share.firefox.dev/44HQZU4 (1.7s)

Very artificial, but maybe there is something worth improving?
Not sure what the component should be. Tentatively putting under JS, and cc'ing jonco and Lee.

Component: JavaScript Engine → JavaScript: GC
Attached file about:support

You can also see this testcase which creates canvas with size 16Kx16K.

I think what's happening is that the loop is repeatedly allocating memory (for the canvas) and then causing it to be freed again. The size change from allocation triggers a GC, but by the time it starts the memory has already been freed. The threshold for the next GC is then set based on almost nothing being allocated. The next allocation triggers another GC and so on.

Maybe we should check at the start of an allocation-triggered GC whether it is still necessary and abort if not.

This situation is pretty unlikely to happen in practice though.

Not to derail from the original bug and analysis, but i found that if i change the parallel marking parameters, i can get ~30% improvement on the testcase.

Testcase is the 16kX16k file.

  • N=5000
javascript.options.mem.gc_max_parallel_marking_threads = 4
javascript.options.mem.gc_parallel_marking_threshold_mb = 1

Maybe worth rechecking to reduce teh threshold size? (or maybe that should wait for the pending allocator changes)

Severity: -- → S4
Type: task → defect
Priority: -- → P2
Depends on: 1978751

This splits up malloc/JIT code checks that were conflated and tidies up a bunch
of other things.

JS::AddAssociatedMemory doesn't need to call Zone::maybeTriggerGCOnMalloc as
addCellMemory already does this.

Assignee: nobody → jcoppeard
Status: NEW → ASSIGNED

This re-checks the heap thresholds for zones in allocation triggered GCs. This
improves the test case a lot, although it still triggeres a ton of GCs.

First testcase, N=500000

Using the 16kX16k testcase with N=5000:

Caveat is that on this testcase, profiling with the gecko profiler has a lot of overhead. For instance, profiling the first testcase with gecko profiler took 52s and the captured profile had empty threads for half of the profile. Thus, the magnitude of improvement may not be fully accurate as the timings between gecko profiler and Samply are not comparable.

Status: ASSIGNED → RESOLVED
Closed: 16 days ago
Resolution: --- → FIXED
Target Milestone: --- → 143 Branch

For the first testcase: https://share.firefox.dev/3H6l6eL (16s)

Pushed by sstanca@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/31008b7dd31c https://hg.mozilla.org/integration/autoland/rev/3bf8f808fadf Revert "Bug 1978224: apply code formatting via Lando" as requested by @jonco for increasing GC time in the sp3 benchmark.

Backed out as requested by @jonco for increasing GC time in the sp3 benchmark: https://hg.mozilla.org/integration/autoland/rev/3bf8f808fadf731c5592792643c5617c4a10137a

Status: RESOLVED → REOPENED
Flags: needinfo?(jcoppeard)
Resolution: FIXED → ---
Target Milestone: 143 Branch → ---
Regressions: 1980447
Regressions: 1980448
Blocks: 1980560
See Also: 1943230

(In reply to Pulsebot from comment #11)

Pushed by sstanca@mozilla.com:
https://github.com/mozilla-firefox/firefox/commit/31008b7dd31c
https://hg.mozilla.org/integration/autoland/rev/3bf8f808fadf
Revert "Bug 1978224: apply code formatting via Lando" as requested by @jonco
for increasing GC time in the sp3 benchmark.

Perfherder has detected a browsertime performance change from push 3bf8f808fadf731c5592792643c5617c4a10137a.

If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
28% speedometer3 perfstats-NonIdleMajorGC android-hw-a55-14-0-aarch64-shippable webrender 690.74 -> 500.63
25% speedometer3 perfstats-NonIdleMajorGC linux1804-64-nightlyasrelease-qr fission webrender 271.62 -> 203.85 Before/After
22% speedometer3 perfstats-MajorGC android-hw-a55-14-0-aarch64-shippable webrender 2,432.60 -> 1,891.57
5% speedometer3 TodoMVC-JavaScript-ES5/Adding100Items/Sync linux1804-64-nightlyasrelease-qr fission webrender 62.53 -> 59.66 Before/After
4% speedometer3 TodoMVC-JavaScript-ES5/Adding100Items/total linux1804-64-nightlyasrelease-qr fission webrender 68.02 -> 65.13 Before/After
... ... ... ... ... ...
3% speedometer3 TodoMVC-jQuery/CompletingAllItems/Sync android-hw-a55-14-0-aarch64-shippable webrender 196.75 -> 191.46

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 46111

The following documentation link provides more information about this command.

Keywords: perf-alert
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: