Canvas demo creates a lot of garbage that leads to periodic slowdowns and large memory use compared to Chrome
Categories
(Core :: Graphics: Canvas2D, defect, P3)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox138 | --- | fixed |
People
(Reporter: mayankleoboy1, Assigned: smaug)
Details
(Keywords: perf-alert)
Attachments
(7 files)
- 2.48 KB, application/x-zip-compressed
- 47.99 KB, text/plain
- WIP: Bug 1943811, POC: nursery wrapper, data owned by the wrapper, faster deferred finalization, etc (48 bytes, text/x-phabricator-request)
- 48 bytes, text/x-phabricator-request
- 48 bytes, text/x-phabricator-request
- 48 bytes, text/x-phabricator-request
- 48 bytes, text/x-phabricator-request
I have modified the original testcase at https://github.com/cceckman/fiona
I changed the following:
Unmodified: let count = Math.round(canvas.height * canvas.width * load)
Modified: let count = 100 * Math.round(canvas.height * canvas.width * load)
Open the attached modified testcase.
Profile: https://share.firefox.dev/4jpqCYh
Lots of CC/GC activity, and much higher memory use than Chrome (3-4 GB+ vs. a few hundred MB).
You can modify the multiplication factor to change the behaviour/memory use pattern of the demo.
This probably sits somewhere between JS and Canvas.
cc: peterv.
Reporter
Comment 1•11 months ago
Reporter
Updated•11 months ago
Comment 2•11 months ago
My guess is that this is coming from the canvas ImageData. Tentatively moving this to Canvas, but the most important thing here is how the canvas image data is dealt with.
The Performance Impact Calculator has determined this bug's performance impact to be low. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.
Platforms: Windows
Impact on site: Causes noticeable jank
Websites affected: Rare
Resource impact: Some
[x] Affects animation smoothness
[x] Able to reproduce locally
Comment 3•10 months ago
The severity field is not set for this bug.
:lsalzman, could you have a look please?
For more information, please visit BugBot documentation.
Comment 4•10 months ago
Lee, thoughts on this? I'm not sure what to do with this one.
Comment 5•10 months ago
It looks like there are just a large number of Canvas2D getImageData calls here, and every call allocates an ImageData and a new JS array object, even when the caller only wants to immediately read the data out of it. This is in general not a great strategy, and we probably need some sort of lazy/deferred/copy-on-write approach to representing this data so that we don't allocate every time getImageData is called, especially when the data is not modified.
Also, the requests are only for 1x1 pixel areas over and over (a common access pattern for looking up individual pixels), returned as an ImageData encapsulating a JS array, both allocated on the GC heap. So the larger overhead is the ImageData and JS array objects themselves, allocated just to represent the 4 bytes of single-pixel data, not the small pixel value inside them. Pretty embarrassing. But that's potentially a larger project for down the road to fix; I don't know offhand that it would be simple.
Maybe there is some way we could represent these small-area requests inline to reduce the allocations, if not avoid them entirely.
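To make that concrete, here is an illustrative sketch (not from the demo) of what each of those 1x1 lookups materializes; `ctx` stands for any CanvasRenderingContext2D:

```js
// Each 1x1 getImageData() call builds two GC-heap objects just to carry
// 4 bytes of pixel data: an ImageData and the Uint8ClampedArray it wraps.
const px = ctx.getImageData(10, 20, 1, 1);
px.width;   // 1
px.height;  // 1
px.data;    // Uint8ClampedArray(4): [r, g, b, a] for that single pixel
```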
Regardless, it does put a lot of strain on the GC, so maybe the GC team or JS can look into this and see if there's anything that can be done to make this demo run a bit faster.
The offending JS app loop looks like:
```js
for (let i = 0; i < x.length; i++) {
  let xx = Math.round(x[i] * canvas.width);
  let yy = Math.round(y[i] * canvas.height);
  let px = name.getImageData(xx, yy, 1, 1).data;
  if (px[3] !== 0) {
    scales[i] = 0.1
  } else {
    scales[i] = 1
  }
}
```
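For comparison, a minimal sketch (not part of the original demo or of the Gecko fix) of how the app-side loop could sidestep most of those per-pixel allocations: fetch the whole canvas once and index into the returned array. It reuses the variable names from the loop above and assumes the canvas contents do not change while the loop runs:

```js
// Hypothetical rewrite of the loop above: one getImageData() call for the
// whole canvas, then plain array indexing per pixel, instead of allocating
// a fresh ImageData + Uint8ClampedArray for every 1x1 lookup.
const data = name.getImageData(0, 0, canvas.width, canvas.height).data;
for (let i = 0; i < x.length; i++) {
  let xx = Math.round(x[i] * canvas.width);
  let yy = Math.round(y[i] * canvas.height);
  // Pixels are stored as 4 bytes each (RGBA); index 3 is the alpha channel.
  let alpha = data[(yy * canvas.width + xx) * 4 + 3];
  scales[i] = alpha !== 0 ? 0.1 : 1;
}
```

That only works around the pattern in the demo itself; the rest of this bug is about making the per-call allocation cheaper in Gecko.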
Comment 6•10 months ago
I'm going to defer to someone with more knowledge of the JS JITs, as this seems like it'd require some tricky optimizing on the JS side to make work.
I really hope this pattern isn't common, but with the Web you never know.
Assignee
Comment 7•10 months ago
Hmm, could we use the nursery for ImageData's wrapper and avoid most of the CCs and major GCs? Testing.
Assignee
Comment 8•10 months ago
oh, ImageData isn't even wrapper cached, and webidl codegen doesn't support nursery allocations for the wrapper
Comment 9•10 months ago
(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #8)
> oh, ImageData isn't even wrapper cached, and webidl codegen doesn't support nursery allocations for the wrapper
Yes that's worth investigating.
Assignee
Comment 10•10 months ago
I'm trying to recall why ProbablyShortLivingWrapper is limited to wrapper-cached objects only:
Nursery-allocated wrappers for wrapper-cached objects have JSCLASS_FOREGROUND_FINALIZE | JSCLASS_SKIP_NURSERY_FINALIZE | ...
ImageData has JSCLASS_FOREGROUND_FINALIZE, and we do need foreground finalization. But without the wrapper cache we can't check which objects should be finalized (given the current setup).
Maybe the simplest tweak is to make ImageData a WrapperCache object and then use ProbablyShortLivingWrapper.
Edit: looks like that isn't quite enough, but definitely lots of wrappers could be collected during minor GC.
Assignee
Comment 11•10 months ago
Assignee
Comment 12•10 months ago
That patch is just a hack atm, a bunch of random optimizations.
With that, performance is better than in Chrome, but memory usage is higher.
Assignee
Comment 13•10 months ago
Updated•10 months ago
Assignee
Comment 14•10 months ago
Assignee
Comment 15•10 months ago
Assignee
Comment 16•10 months ago
Comment 17•10 months ago
Reporter
Comment 19•10 months ago
When this landed, it increased the installer size on Windows by 55.7KB.
The subsequent backout reversed the increase.
Comment 20•10 months ago
bugherder
Comment 21•10 months ago
https://hg.mozilla.org/mozilla-central/rev/4bde6510221e
https://hg.mozilla.org/mozilla-central/rev/4119b15714ac
https://hg.mozilla.org/mozilla-central/rev/2680374567cc
https://hg.mozilla.org/mozilla-central/rev/491e697c9b9a
Reporter
Comment 22•10 months ago
This is what I get with latest Nightly with 100x: https://share.firefox.dev/4htOhV6
Memory use increases, but very slowly: about a 100 MB increase (170 MB -> 270 MB) over 3 minutes. Speed is similar to Chrome.
Edit: this is what I get with 1000x: https://share.firefox.dev/3DsiuGl
Assignee
Updated•10 months ago
Assignee
Comment 23•10 months ago
FWIW, on my machines the performance is now way better in Nightly than in Chrome.
Reporter
Comment 24•10 months ago
What I get with the attached testcase: https://share.firefox.dev/4kUJCOS
Comment 25•9 months ago
(In reply to Pulsebot from comment #20)
> Pushed by opettay@mozilla.com:
> https://hg.mozilla.org/integration/autoland/rev/4bde6510221e
> avoid calling SegmentedVector::Length() in hot code paths, r=mccr8
> https://hg.mozilla.org/integration/autoland/rev/4119b15714ac
> wrappercache ImageData so that it can be nursery allocated, r=mccr8
> https://hg.mozilla.org/integration/autoland/rev/2680374567cc
> ensure deferred finalization is triggered after minorGC, r=mccr8
> https://hg.mozilla.org/integration/autoland/rev/491e697c9b9a
> delete snow white objects sooner in case there are lots of them, r=mccr8
Perfherder has detected a devtools performance change from push 491e697c9b9a308ea1b106ab76ec49d9501a8b08.
Improvements:
| Ratio | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|
| 7% | damp console.log-in-loop-content-process-undefined | windows11-64-24h2-shippable | e10s fission stylo webrender | 25.12 -> 23.46 |
| 3% | damp console.log-in-loop-content-process-bigint | windows11-64-shippable-qr | e10s fission stylo webrender | 23.19 -> 22.42 |
| 3% | damp console.autocomplete.longInput | windows11-64-shippable-qr | e10s fission stylo webrender | 483.93 -> 469.13 |
| 3% | damp console.log-in-loop-content-process-symbol | windows11-64-shippable-qr | e10s fission stylo webrender | 26.77 -> 26.05 |
| 3% | damp console.log-in-loop-content-process-nan | windows11-64-shippable-qr | e10s fission stylo webrender | 23.39 -> 22.77 |
| ... | ... | ... | ... | ... |
| 2% | damp custom.jsdebugger.stepIn.DAMP | windows11-64-shippable-qr | e10s fission stylo webrender | 1,478.61 -> 1,446.06 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 44323
For more information on performance sheriffing please see our FAQ.
Comment 26•9 months ago
(In reply to Pulsebot from comment #17)
> Pushed by opettay@mozilla.com:
> https://hg.mozilla.org/integration/autoland/rev/2eceb546abd2
> avoid calling SegmentedVector::Length() in hot code paths, r=mccr8
> https://hg.mozilla.org/integration/autoland/rev/8146656f4566
> wrappercache ImageData so that it can be nursery allocated, r=mccr8
> https://hg.mozilla.org/integration/autoland/rev/2f56a6399ae7
> ensure deferred finalization is triggered after minorGC, r=mccr8
> https://hg.mozilla.org/integration/autoland/rev/d676336db245
> delete snow white objects sooner in case there are lots of them, r=mccr8
Perfherder has detected a devtools performance change from push d676336db245fa84b0551b6a524829b4bafcdec4.
Improvements:
| Ratio | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|
| 23% | damp console.log-in-loop-content-process-undefined | windows11-64-shippable-qr | e10s fission stylo webrender | 31.67 -> 24.48 |
| 3% | damp console.log-in-loop-content-process-window | windows11-64-shippable-qr | e10s fission stylo webrender | 225.34 -> 218.68 |
| 2% | damp console.log-in-loop-content-process-number | windows11-64-shippable-qr | e10s fission stylo webrender | 23.36 -> 22.79 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 44419
For more information on performance sheriffing please see our FAQ.
Comment 27•9 months ago
(In reply to agoloman from comment #18)
> Backed out for causing hazard bustages.
Perfherder has detected a devtools performance change from push 95277659511d4318515d0e3ebff692521edaec3b.
Regressions:
| Ratio | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|
| 4% | damp console.log-in-loop-content-process-window | windows11-64-shippable-qr | e10s fission stylo webrender | 216.17 -> 225.79 |
| 4% | damp console.log-in-loop-content-process-document | windows11-64-shippable-qr | e10s fission stylo webrender | 72.57 -> 75.33 |
| 3% | damp console.log-in-loop-content-process-number | windows11-64-shippable-qr | e10s fission stylo webrender | 22.79 -> 23.37 |
As author of one of the patches included in that push, we need your help to address this regression.
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 44417
For more information on performance sheriffing please see our FAQ.