Closed Bug 1943811 · Opened 11 months ago · Closed 10 months ago

Canvas demo creates a lot of garbage that leads to periodic slowdowns and large memory use compared to Chrome

Categories: Core :: Graphics: Canvas2D (defect, P3)

Tracking
RESOLVED FIXED in 138 Branch
Performance Impact: low
Tracking Status: firefox138 --- fixed

People: Reporter: mayankleoboy1; Assigned: smaug

Keywords: perf-alert

Attachments: 7 files

Attached file fiona-main.zip

I have modified the original testcase at https://github.com/cceckman/fiona
I changed the following:

Unmodified: let count = Math.round(canvas.height * canvas.width * load)

Modified: let count = 100 * Math.round(canvas.height * canvas.width * load)
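
For context, a rough sketch of what this multiplier changes, assuming count drives the number of animated points (and therefore the number of per-pixel getImageData lookups per frame). This is an illustrative reconstruction with assumed values, not the demo's actual code:

// Hypothetical sketch; see the repo above for the real code.
const canvas = document.querySelector("canvas"); // assume a 1000x1000 canvas for the numbers below
const load = 0.001;                              // assumed density factor
// Original: roughly 1,000 points.
let count = Math.round(canvas.height * canvas.width * load);
// Modified: roughly 100,000 points, i.e. ~100x more 1x1 getImageData lookups per frame,
// each allocating a short-lived ImageData plus its backing array.
count = 100 * Math.round(canvas.height * canvas.width * load);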

Open the attached modified testcase

Profile: https://share.firefox.dev/4jpqCYh

Lots of CC/GC activity. Much more memory use compared to Chrome (3-4+ GB vs. a few hundred MB).
You can modify the multiplication factor to change the behaviour/memory use pattern of the demo.

This probably sits somewhere between JS and Canvas.
cc: peterv.

Attached file about:support

My guess is that this is coming from the canvas ImageData. Tentatively moving this to Canvas, but the most important thing here is how the canvas image data is dealt with.

The Performance Impact Calculator has determined this bug's performance impact to be low. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: Windows
Impact on site: Causes noticeable jank
Websites affected: Rare
Resource impact: Some
[x] Affects animation smoothness
[x] Able to reproduce locally

Performance Impact: --- → low
Component: Performance → Graphics: Canvas2D

The severity field is not set for this bug.
:lsalzman, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(lsalzman)

Lee, thoughts on this? I'm not sure what to do with this one.

It looks like there are just a large number of Canvas2D getImageData calls here, and every call allocates an ImageData plus a new JS array object simply to read the data out of the canvas. In general this is not a great strategy, and we probably need some sort of lazy/deferred/copy-on-write representation of this data so that we don't allocate on every getImageData call, especially when the data is never modified.

Also, the requests are only for 1x1 pixel areas over and over (a common access pattern for looking up individual pixels), returned as an ImageData encapsulating a JS array, both allocated on the GC heap. So the larger overhead is the ImageData and JS array objects themselves, not the 4 bytes of single-pixel data they carry. Pretty embarrassing. Fixing that properly is potentially a larger project for down the road; I don't know offhand that it would be simple.

Maybe there is some way we could represent these small-area requests inline to avoid most of the allocation, if not avoid it entirely.

Regardless, it does put a lot of strain on the GC, so maybe the GC or JS folks can look into this and see if there's anything that can be done to make this demo run a bit faster.

The offending JS app loop looks like:

for (let i = 0; i < x.length; i++) {
    let xx = Math.round(x[i] * canvas.width);
    let yy = Math.round(y[i] * canvas.height);
    let px = name.getImageData(xx, yy, 1, 1).data;
    if (px[3] !== 0) {
        scales[i] = 0.1
    } else {
        scales[i] = 1
    }
}
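
For reference, a minimal sketch of one way content could avoid most of these per-call allocations: read the whole canvas once and index into the returned pixel buffer. This assumes the canvas is not repainted between lookups, and reuses the variable names from the loop above purely for illustration:

// Hedged sketch: one bulk getImageData call instead of x.length 1x1 calls.
// Assumes `name` is the CanvasRenderingContext2D and the canvas is static during the loop.
let img = name.getImageData(0, 0, canvas.width, canvas.height);
let data = img.data;                      // one Uint8ClampedArray, 4 bytes (RGBA) per pixel
for (let i = 0; i < x.length; i++) {
    let xx = Math.round(x[i] * canvas.width);   // clamp if x[i]/y[i] can be exactly 1.0
    let yy = Math.round(y[i] * canvas.height);
    let alpha = data[(yy * canvas.width + xx) * 4 + 3];  // alpha channel of pixel (xx, yy)
    scales[i] = (alpha !== 0) ? 0.1 : 1;
}

The browser still pays for one full-canvas copy per frame, but that is a single allocation instead of one ImageData plus one array per point, so most of the GC pressure described above disappears.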
Severity: -- → S3
Flags: needinfo?(smaug)
Flags: needinfo?(rhunt)
Flags: needinfo?(lsalzman)
Priority: -- → P3

I'm going to defer to someone with more knowledge of the JS JITs, as this seems like it'd require some tricky optimizing on the JS side to make work.

I really hope this pattern isn't common, but with the Web you never know.

Flags: needinfo?(rhunt) → needinfo?(jdemooij)

Hmm, could we use the nursery for ImageData's wrapper and avoid most of the CCs and major GCs? Testing.

oh, ImageData isn't even wrapper cached, and webidl codegen doesn't support nursery allocations for the wrapper

Flags: needinfo?(smaug)

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #8)

oh, ImageData isn't even wrapper cached, and webidl codegen doesn't support nursery allocations for the wrapper

Yes, that's worth investigating.

Flags: needinfo?(jdemooij) → needinfo?(jcoppeard)

I'm trying to recall why ProbablyShortLivingWrapper is limited to wrappercached objects only:
Nursery allocated wrappers for wrapper cached objects have JSCLASS_FOREGROUND_FINALIZE | JSCLASS_SKIP_NURSERY_FINALIZE | ...
ImageData has JSCLASS_FOREGROUND_FINALIZE. And we do need foreground finalize. But without wrapper cache we can't check which objects should be finalized (given the current setup).

Maybe the simplest tweak is to make ImageData a WrapperCache object and then use ProbablyShortLivingWrapper.
edit: looks like that isn't quite enough, but definitely lots of wrappers could be collected during minor GC.

That patch is just a hack at the moment, a bunch of assorted optimizations.
With it, performance is better than in Chrome, but memory usage is higher.

Assignee: nobody → smaug
Status: NEW → ASSIGNED
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2eceb546abd2
avoid calling SegmentedVector::Length() in hot code paths, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/8146656f4566
wrappercache ImageData so that it can be nursery allocated, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/2f56a6399ae7
ensure deferred finalization is triggered after minorGC, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/d676336db245
delete snow white objects sooner in case there are lots of them, r=mccr8

Backed out for causing hazard bustages.

Flags: needinfo?(smaug)

When this landed, it increased the installer size on Windows by 55.7KB.
The subsequent backout reversed the increase.

Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4bde6510221e
avoid calling SegmentedVector::Length() in hot code paths, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/4119b15714ac
wrappercache ImageData so that it can be nursery allocated, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/2680374567cc
ensure deferred finalization is triggered after minorGC, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/491e697c9b9a
delete snow white objects sooner in case there are lots of them, r=mccr8

This is what I get with the latest Nightly with 100x: https://share.firefox.dev/4htOhV6
Memory use increases, but very slowly: about a 100 MB increase (170 MB -> 270 MB) over 3 minutes. Speed is similar to Chrome.
Edit: this is what I get with 1000x: https://share.firefox.dev/3DsiuGl

Flags: needinfo?(smaug)
Flags: needinfo?(jcoppeard)

FWIW, on my machines the performance is now way better in Nightly than in Chrome.

What I get with the attached testcase: https://share.firefox.dev/4kUJCOS

(In reply to Pulsebot from comment #20)

Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4bde6510221e
avoid calling SegmentedVector::Length() in hot code paths, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/4119b15714ac
wrappercache ImageData so that it can be nursery allocated, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/2680374567cc
ensure deferred finalization is triggered after minorGC, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/491e697c9b9a
delete snow white objects sooner in case there are lots of them, r=mccr8

Perfherder has detected a devtools performance change from push 491e697c9b9a308ea1b106ab76ec49d9501a8b08.

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
7% damp console.log-in-loop-content-process-undefined windows11-64-24h2-shippable e10s fission stylo webrender 25.12 -> 23.46
3% damp console.log-in-loop-content-process-bigint windows11-64-shippable-qr e10s fission stylo webrender 23.19 -> 22.42
3% damp console.autocomplete.longInput windows11-64-shippable-qr e10s fission stylo webrender 483.93 -> 469.13
3% damp console.log-in-loop-content-process-symbol windows11-64-shippable-qr e10s fission stylo webrender 26.77 -> 26.05
3% damp console.log-in-loop-content-process-nan windows11-64-shippable-qr e10s fission stylo webrender 23.39 -> 22.77
... ... ... ... ...
2% damp custom.jsdebugger.stepIn.DAMP windows11-64-shippable-qr e10s fission stylo webrender 1,478.61 -> 1,446.06

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 44323

For more information on performance sheriffing please see our FAQ.

Keywords: perf-alert

(In reply to Pulsebot from comment #17)

Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2eceb546abd2
avoid calling SegmentedVector::Length() in hot code paths, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/8146656f4566
wrappercache ImageData so that it can be nursery allocated, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/2f56a6399ae7
ensure deferred finalization is triggered after minorGC, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/d676336db245
delete snow white objects sooner in case there are lots of them, r=mccr8
Perfherder has detected a devtools performance change from push d676336db245fa84b0551b6a524829b4bafcdec4.

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
23% damp console.log-in-loop-content-process-undefined windows11-64-shippable-qr e10s fission stylo webrender 31.67 -> 24.48
3% damp console.log-in-loop-content-process-window windows11-64-shippable-qr e10s fission stylo webrender 225.34 -> 218.68
2% damp console.log-in-loop-content-process-number windows11-64-shippable-qr e10s fission stylo webrender 23.36 -> 22.79

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 44419

For more information on performance sheriffing please see our FAQ.

(In reply to agoloman from comment #18)

Backed out for causing hazard bustages.

Perfherder has detected a devtools performance change from push 95277659511d4318515d0e3ebff692521edaec3b.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
4% damp console.log-in-loop-content-process-window windows11-64-shippable-qr e10s fission stylo webrender 216.17 -> 225.79
4% damp console.log-in-loop-content-process-document windows11-64-shippable-qr e10s fission stylo webrender 72.57 -> 75.33
3% damp console.log-in-loop-content-process-number windows11-64-shippable-qr e10s fission stylo webrender 22.79 -> 23.37

As author of one of the patches included in that push, we need your help to address this regression.
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 44417

For more information on performance sheriffing please see our FAQ.

