Closed Bug 1870488 Opened 2 years ago Closed 2 years ago

Web Worker Canvas Memory Leak

Categories

(Core :: Graphics: Canvas2D, defect)

Firefox 120
defect

Tracking

()

RESOLVED FIXED
123 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox121 --- wontfix
firefox122 --- wontfix
firefox123 --- fixed

People

(Reporter: tomxor, Assigned: aosmond)

References

(Regressed 1 open bug)

Details

(Keywords: memory-leak, perf-alert)

Attachments

(6 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0

Steps to reproduce:

Set up a Web Worker with an OffscreenCanvas and a requestAnimationFrame loop that uses the 2D context API to modify the canvas in any way on every frame.

The issue has some kind of dependence on browser resource usage. For the simplified test case, a very large canvas of 8k x 8k pixels was sufficient on my machine. Below a certain size threshold the leak does not occur at all.

It's possible to reproduce the issue under realistic conditions through multiple small offscreen canvases on a single page, which is how I discovered the issue. Even when all but one worker are 100% idle, it's possible to trigger the leak by painting to only a single small canvas at a time, i.e. the dependence on resource usage seems to be shared across offscreen canvases on the same page regardless of active use. A minimal sketch of the single-canvas case is included below.
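
For reference, a minimal sketch of the single-canvas setup described above. This is not the exact fiddle; the 8192x8192 size and the worker.js filename are illustrative placeholders.

Main page:

    <canvas id="view" width="8192" height="8192"></canvas>
    <script>
      // Transfer control of the on-page canvas to a dedicated worker.
      const offscreen = document.getElementById("view").transferControlToOffscreen();
      const worker = new Worker("worker.js");
      worker.postMessage({ canvas: offscreen }, [offscreen]);
    </script>

worker.js:

    let ctx;
    onmessage = (e) => {
      ctx = e.data.canvas.getContext("2d");
      requestAnimationFrame(draw);
    };

    function draw() {
      // Any 2D-context drawing on every frame is enough; a full-canvas fill is used here.
      ctx.fillRect(0, 0, ctx.canvas.width, ctx.canvas.height);
      requestAnimationFrame(draw);
    }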

Actual results:

The browser consumes all available RAM and crashes the tab. In my case 16 GiB is consumed in around 15 seconds.

Expected results:

No memory leaks.

Online reduced test case for convenience:

https://jsfiddle.net/we0sqjtd/

The Bugbug bot thinks this bug should belong to the 'Core::Graphics: Canvas2D' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Graphics: Canvas2D
Product: Firefox → Core

Can you type "about:support" in your browser and copy-paste its contents to this bug?

Flags: needinfo?(tomxor)

I've also tried this with a fresh Firefox config, i.e. defaults and no extensions, and found the behaviour to be slightly different: while the tab is open no leak occurs, but as soon as I close the tab, all RAM is quickly consumed and then the whole browser crashes instead of just the tab.

When I restore my original Firefox config the behaviour reverts to what I initially described. But both configurations result in a leak and crash.

I can't reproduce on macOS with either the attachment or the jsfiddle. We'll discuss in triage.

Blocks: gfx-triage

Lee, can you repro?

Flags: needinfo?(lsalzman)

This seems more my area than Lee's. Did you reproduce this on release, or on a recent nightly? We've been making a lot of changes in the past week.

Flags: needinfo?(lsalzman) → needinfo?(tomxor)

Also, if on release, does disabling accelerated canvas help? Flip gfx.canvas.accelerated to false, restart and try to reproduce as normal.

Depends on: 1871207

I found a version of the leak on Windows with GPU canvas, and filed bug 1871207.
Filed bug 1871208 for the D2D canvas leak.

Andrew, I've tried setting gfx.canvas.accelerated to false and restarting; the leak still occurs for me.

Note, I do not know whether this is a regression or has been here all along, since I discovered it while developing something new.

Flags: needinfo?(tomxor)

Andrew, this is on release, via the Flatpak from Flathub:

Name     Application ID       Version  Branch  Arch    Origin   Ref                                Active commit
Firefox  org.mozilla.firefox  120.0.1  stable  x86_64  flathub  org.mozilla.firefox/x86_64/stable  f75dc4d15a98

Noticed there was an update, can also reproduce on v121:

Name     Application ID       Version  Branch  Arch    Origin   Ref                                Active commit
Firefox  org.mozilla.firefox  121.0    stable  x86_64  flathub  org.mozilla.firefox/x86_64/stable  b4ed37eec155

I've noticed the behaviour is a little slower when nothing else is open in the browser.

Using the test case HTML I have to wait 20 to 30 seconds, during which my available RAM hovers around 13-14 GiB; after that initial period, available RAM suddenly plummets and the tab crashes just before it hits 0.

The severity field is not set for this bug.
:lsalzman, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(lsalzman)
Severity: -- → S3
Flags: needinfo?(lsalzman)
Keywords: memory-leak

Sorry for the delay due to the holidays, could you provide an about:memory report? That might shed some light at least on where the allocations are. Thanks!

Flags: needinfo?(tomxor)
Assignee: nobody → aosmond
No longer blocks: gfx-triage

(In reply to Andrew Osmond [:aosmond] (he/him) from comment #16)

Sorry for the delay due to the holidays, could you provide an about:memory report? That might shed some light at least on where the allocations are. Thanks!

If it helps, bug 1871208 has an about:memory report.

Depends on: 1871208
Attached file memory-report.json.gz

I captured this after the attached test case consumed around 12GB, with only a couple GB of RAM left (it crashes when out of RAM).

Flags: needinfo?(tomxor)

Looks like it is mostly consumed by shmems in the content process:
12,288.19 MB ── shmem-allocated
12,288.19 MB ── shmem-mapped

I've narrowed the difference in behaviour between defaults and my config to layers.acceleration.disabled.

When setting this to true, the test case leaks while open in the foreground, as per my initial description.

When setting this to false (the default), no leak occurs while in the foreground, and the tab eventually crashes on its own. However, if I close the tab prematurely after ~10 seconds, all RAM is consumed and the whole browser crashes instead.

I've also played with the interval by using setInterval instead of a rAF loop, and it seems to be very sensitive to the timing: if I make the interval greater or less than the refresh rate interval (~16 ms on my machine) then it doesn't occur so easily, except for some magic numbers like 4 ms where it occurs as easily as 16 ms. A sketch of this variant follows.
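
For clarity, the timing experiment just replaces the rAF loop in the worker sketch above with a fixed interval, along these lines (the 16 ms and 4 ms values are the ones from my machine described above):

    let ctx;
    onmessage = (e) => {
      ctx = e.data.canvas.getContext("2d");
      // Intervals near the ~16 ms refresh period (and a few values like 4 ms)
      // reproduce the leak readily; most other intervals do not.
      setInterval(() => {
        ctx.fillRect(0, 0, ctx.canvas.width, ctx.canvas.height);
      }, 16);
    };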

Profile with all threads, IPC and allocation: https://share.firefox.dev/3HhWXhW

I can reproduce this with SW-WR.

Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

If we use the buffer provider, the problem goes away. I can land that patch to fix this.

This patch adds support for allocating shmem sections for
ImageBridgeChild. The recording infrastructure depends on it.

This allows us to create a TextureClient on a different thread than the
actor without special effort on the part of the allocator. Similarly, it
also allows us to destroy a TextureClient on a different thread if it
has a readlock bound to it.

Attachment #9373358 - Attachment description: Bug 1870488 - Make OffscreenCanvas use PersistentBufferProvider on the display pipeline. → Bug 1870488 - Part 3. Make OffscreenCanvas use PersistentBufferProvider on the display pipeline.
Pushed by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a8db51f72457
Part 1. Implement ImageBridgeChild::GetTileLockAllocator. r=gfx-reviewers,lsalzman
https://hg.mozilla.org/integration/autoland/rev/a3e17caaa159
Part 2. Ensure TextureClient's mReadLock is only created on the IPDL actor thread. r=lsalzman
https://hg.mozilla.org/integration/autoland/rev/cf3ce7d3c82d
Part 3. Make OffscreenCanvas use PersistentBufferProvider on the display pipeline. r=lsalzman
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 123 Branch
No longer depends on: 1871207
Duplicate of this bug: 1871207
No longer depends on: 1871208
Duplicate of this bug: 1871208

(In reply to Pulsebot from comment #27)

Pushed by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a8db51f72457
Part 1. Implement ImageBridgeChild::GetTileLockAllocator.
r=gfx-reviewers,lsalzman
https://hg.mozilla.org/integration/autoland/rev/a3e17caaa159
Part 2. Ensure TextureClient's mReadLock is only created on the IPDL actor
thread. r=lsalzman
https://hg.mozilla.org/integration/autoland/rev/cf3ce7d3c82d
Part 3. Make OffscreenCanvas use PersistentBufferProvider on the display
pipeline. r=lsalzman

We've recorded a large improvement in CI from your patches!

== Change summary for alert #41064 (as of Thu, 18 Jan 2024 23:29:28 GMT) ==

Improvements:

Ratio  Test (mean time across 100 frames)       Platform                     Options                                 Absolute values (old vs new)
19%    offscreencanvas_webcodecs_worker_2d_vp9  windows10-64-ref-hw-2017-qr  e10s fission stylo webgl-ipc webrender  25.73 -> 20.81
19%    offscreencanvas_webcodecs_worker_2d_av1  windows10-64-ref-hw-2017-qr  e10s fission stylo webgl-ipc webrender  26.53 -> 21.53
19%    offscreencanvas_webcodecs_main_2d_vp9    windows10-64-ref-hw-2017-qr  e10s fission stylo webgl-ipc webrender  25.81 -> 20.98
18%    offscreencanvas_webcodecs_main_2d_av1    windows10-64-ref-hw-2017-qr  e10s fission stylo webgl-ipc webrender  26.50 -> 21.64
18%    offscreencanvas_webcodecs_main_2d_vp9    windows10-64-shippable-qr    e10s fission stylo webgl-ipc webrender  13.65 -> 11.16
...    ...                                      ...                          ...                                     ...
2%     offscreencanvas_webcodecs_main_2d_av1    windows10-64-shippable-qr    e10s fission stylo webrender-sw         13.76 -> 13.48

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=41064

Keywords: perf-alert
Regressions: 1877429
Regressions: 1879678
Regressions: 1879833
See Also: → 1909684