Closed Bug 1943811 · Opened 11 months ago · Closed 10 months ago

Canvas demo creates a lot of garbage that leads to periodic slowdowns and large memory use compared to Chrome

Categories: Core :: Graphics: Canvas2D (defect, P3)

Tracking
RESOLVED FIXED in 138 Branch
Performance Impact: low
Tracking Status: firefox138 --- fixed

People: Reporter: mayankleoboy1; Assigned: smaug

Keywords: perf-alert

Attachments: 7 files

Attached file fiona-main.zip

I have modified the original testcase at https://github.com/cceckman/fiona
I changed the following:

Unmodified: let count = Math.round(canvas.height * canvas.width * load)

Modified: let count = 100 * Math.round(canvas.height * canvas.width * load)
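
For context, a rough sketch of what this multiplier changes, assuming count drives the number of animated points (and therefore the number of per-pixel getImageData lookups per frame). This is an illustrative reconstruction with assumed values, not the demo's actual code:

// Hypothetical sketch; see the repo above for the real code.
const canvas = document.querySelector("canvas"); // assume a 1000x1000 canvas for the numbers below
const load = 0.001;                              // assumed density factor
// Original: roughly 1,000 points.
let count = Math.round(canvas.height * canvas.width * load);
// Modified: roughly 100,000 points, i.e. ~100x more 1x1 getImageData lookups per frame,
// each allocating a short-lived ImageData plus its backing array.
count = 100 * Math.round(canvas.height * canvas.width * load);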

Open the attached modified testcase

Profile: https://share.firefox.dev/4jpqCYh

Lots of CC/GC activity. Much more memory use compared to Chrome (3-4+ GB vs. a few hundred MB).
You can modify the multiplication factor to change the behaviour/memory use pattern of the demo.

This probably sits somewhere between JS and Canvas.
cc: peterv.

Attached file about:support

My guess is that this is coming from the canvas ImageData. Tentatively moving this to Canvas, but the most important thing here is how the canvas image data is dealt with.

The Performance Impact Calculator has determined this bug's performance impact to be low. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: Windows
Impact on site: Causes noticeable jank
Websites affected: Rare
Resource impact: Some
[x] Affects animation smoothness
[x] Able to reproduce locally

Performance Impact: --- → low
Component: Performance → Graphics: Canvas2D

The severity field is not set for this bug.
:lsalzman, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(lsalzman)

Lee, thoughts on this? I'm not sure what to do with this one.

It looks like there are just a large number of Canvas2D getImageData calls here, and every call allocates an ImageData plus a new JS array object simply to read the data out of the canvas. In general this is not a great strategy, and we probably need some sort of lazy/deferred/copy-on-write representation of this data so that we don't allocate on every getImageData call, especially when the data is never modified.

Also, the requests are only for 1x1 pixel areas over and over (a common access pattern for looking up individual pixels), returned as an ImageData encapsulating a JS array, both allocated on the GC heap. So the larger overhead is the ImageData and JS array objects themselves, not the 4 bytes of single-pixel data they carry. Pretty embarrassing. Fixing that properly is potentially a larger project for down the road; I don't know offhand that it would be simple.

Maybe there is some way we could represent these small-area requests inline to avoid most of the allocation, if not avoid it entirely.

Regardless, it does put a lot of strain on the GC, so maybe the GC or JS folks can look into this and see if there's anything that can be done to make this demo run a bit faster.

The offending JS app loop looks like:

for (let i = 0; i < x.length; i++) {
    let xx = Math.round(x[i] * canvas.width);
    let yy = Math.round(y[i] * canvas.height);
    let px = name.getImageData(xx, yy, 1, 1).data;
    if (px[3] !== 0) {
        scales[i] = 0.1
    } else {
        scales[i] = 1
    }
}
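
For reference, a minimal sketch of one way content could avoid most of these per-call allocations: read the whole canvas once and index into the returned pixel buffer. This assumes the canvas is not repainted between lookups, and reuses the variable names from the loop above purely for illustration:

// Hedged sketch: one bulk getImageData call instead of x.length 1x1 calls.
// Assumes `name` is the CanvasRenderingContext2D and the canvas is static during the loop.
let img = name.getImageData(0, 0, canvas.width, canvas.height);
let data = img.data;                      // one Uint8ClampedArray, 4 bytes (RGBA) per pixel
for (let i = 0; i < x.length; i++) {
    let xx = Math.round(x[i] * canvas.width);   // clamp if x[i]/y[i] can be exactly 1.0
    let yy = Math.round(y[i] * canvas.height);
    let alpha = data[(yy * canvas.width + xx) * 4 + 3];  // alpha channel of pixel (xx, yy)
    scales[i] = (alpha !== 0) ? 0.1 : 1;
}

The browser still pays for one full-canvas copy per frame, but that is a single allocation instead of one ImageData plus one array per point, so most of the GC pressure described above disappears.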
Severity: -- → S3
Flags: needinfo?(smaug)
Flags: needinfo?(rhunt)
Flags: needinfo?(lsalzman)
Priority: -- → P3

I'm going to defer to someone with more knowledge of the JS JITs, as this seems like it'd require some tricky optimizing on the JS side to make work.

I really hope this pattern isn't common, but with the Web you never know.

Flags: needinfo?(rhunt) → needinfo?(jdemooij)

Hmm, could we use the nursery for ImageData's wrapper and avoid most of the CCs and major GCs? Testing.

oh, ImageData isn't even wrapper cached, and webidl codegen doesn't support nursery allocations for the wrapper

Flags: needinfo?(smaug)

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #8)

oh, ImageData isn't even wrapper cached, and webidl codegen doesn't support nursery allocations for the wrapper

Yes, that's worth investigating.

Flags: needinfo?(jdemooij) → needinfo?(jcoppeard)

I'm trying to recall why ProbablyShortLivingWrapper is limited to wrappercached objects only:
Nursery allocated wrappers for wrapper cached objects have JSCLASS_FOREGROUND_FINALIZE | JSCLASS_SKIP_NURSERY_FINALIZE | ...
ImageData has JSCLASS_FOREGROUND_FINALIZE. And we do need foreground finalize. But without wrapper cache we can't check which objects should be finalized (given the current setup).

Maybe the simplest tweak is to make ImageData a WrapperCache object and then use ProbablyShortLivingWrapper.
edit: looks like that isn't quite enough, but definitely lots of wrappers could be collected during minor GC.

That patch is just a hack at the moment, a bunch of assorted optimizations.
With it, performance is better than in Chrome, but memory usage is higher.

Assignee: nobody → smaug
Status: NEW → ASSIGNED
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2eceb546abd2
avoid calling SegmentedVector::Length() in hot code paths, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/8146656f4566
wrappercache ImageData so that it can be nursery allocated, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/2f56a6399ae7
ensure deferred finalization is triggered after minorGC, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/d676336db245
delete snow white objects sooner in case there are lots of them, r=mccr8

Backed out for causing hazard bustages.

Flags: needinfo?(smaug)

When this landed, it increased the installer size on Windows by 55.7KB.
The subsequent backout reversed the increase.

Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4bde6510221e
avoid calling SegmentedVector::Length() in hot code paths, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/4119b15714ac
wrappercache ImageData so that it can be nursery allocated, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/2680374567cc
ensure deferred finalization is triggered after minorGC, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/491e697c9b9a
delete snow white objects sooner in case there are lots of them, r=mccr8

This is what I get with the latest Nightly with 100x: https://share.firefox.dev/4htOhV6
Memory use increases, but very slowly: about a 100 MB increase (170 MB -> 270 MB) over 3 minutes. Speed is similar to Chrome.
Edit: this is what I get with 1000x: https://share.firefox.dev/3DsiuGl

Flags: needinfo?(smaug)
Flags: needinfo?(jcoppeard)

FWIW, on my machines the performance is now way better in Nightly than in Chrome.

What I get with the attached testcase: https://share.firefox.dev/4kUJCOS

(In reply to Pulsebot from comment #20)

Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4bde6510221e
avoid calling SegmentedVector::Length() in hot code paths, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/4119b15714ac
wrappercache ImageData so that it can be nursery allocated, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/2680374567cc
ensure deferred finalization is triggered after minorGC, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/491e697c9b9a
delete snow white objects sooner in case there are lots of them, r=mccr8

Perfherder has detected a devtools performance change from push 491e697c9b9a308ea1b106ab76ec49d9501a8b08.

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
7% damp console.log-in-loop-content-process-undefined windows11-64-24h2-shippable e10s fission stylo webrender 25.12 -> 23.46
3% damp console.log-in-loop-content-process-bigint windows11-64-shippable-qr e10s fission stylo webrender 23.19 -> 22.42
3% damp console.autocomplete.longInput windows11-64-shippable-qr e10s fission stylo webrender 483.93 -> 469.13
3% damp console.log-in-loop-content-process-symbol windows11-64-shippable-qr e10s fission stylo webrender 26.77 -> 26.05
3% damp console.log-in-loop-content-process-nan windows11-64-shippable-qr e10s fission stylo webrender 23.39 -> 22.77
... ... ... ... ...
2% damp custom.jsdebugger.stepIn.DAMP windows11-64-shippable-qr e10s fission stylo webrender 1,478.61 -> 1,446.06

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 44323

For more information on performance sheriffing please see our FAQ.

Keywords: perf-alert

(In reply to Pulsebot from comment #17)

Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2eceb546abd2
avoid calling SegmentedVector::Length() in hot code paths, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/8146656f4566
wrappercache ImageData so that it can be nursery allocated, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/2f56a6399ae7
ensure deferred finalization is triggered after minorGC, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/d676336db245
delete snow white objects sooner in case there are lots of them, r=mccr8
Perfherder has detected a devtools performance change from push d676336db245fa84b0551b6a524829b4bafcdec4.

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
23% damp console.log-in-loop-content-process-undefined windows11-64-shippable-qr e10s fission stylo webrender 31.67 -> 24.48
3% damp console.log-in-loop-content-process-window windows11-64-shippable-qr e10s fission stylo webrender 225.34 -> 218.68
2% damp console.log-in-loop-content-process-number windows11-64-shippable-qr e10s fission stylo webrender 23.36 -> 22.79

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 44419

For more information on performance sheriffing please see our FAQ.

(In reply to agoloman from comment #18)

Backed out for causing hazard bustages.

Perfherder has detected a devtools performance change from push 95277659511d4318515d0e3ebff692521edaec3b.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
4% damp console.log-in-loop-content-process-window windows11-64-shippable-qr e10s fission stylo webrender 216.17 -> 225.79
4% damp console.log-in-loop-content-process-document windows11-64-shippable-qr e10s fission stylo webrender 72.57 -> 75.33
3% damp console.log-in-loop-content-process-number windows11-64-shippable-qr e10s fission stylo webrender 22.79 -> 23.37

As author of one of the patches included in that push, we need your help to address this regression.
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 44417

For more information on performance sheriffing please see our FAQ.

