Closed Bug 1870488 Opened 2 years ago Closed 2 years ago

Web Worker Canvas Memory Leak

Categories

(Core :: Graphics: Canvas2D, defect)

Firefox 120
defect

Tracking

()

RESOLVED FIXED
123 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox121 --- wontfix
firefox122 --- wontfix
firefox123 --- fixed

People

(Reporter: tomxor, Assigned: aosmond)

References

(Regressed 1 open bug)

Details

(Keywords: memory-leak, perf-alert)

Attachments

(6 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0

Steps to reproduce:

Set up a Web Worker with an OffscreenCanvas and a requestAnimationFrame loop that uses the 2D context API to modify the canvas in any way on every frame.

The issue has some kind of dependence on browser resource usage. For the simplified test case, a very large canvas of 8k x 8k pixels was sufficient on my machine. Below a certain size threshold the leak does not occur at all.

It's possible to reproduce the issue under realistic conditions through multiple small offscreen canvases on a single page, which is how I discovered the issue. Even when all but one worker are 100% idle, it's possible to trigger the leak by painting to only a single small canvas at a time, i.e. the dependence on resource usage seems to be shared across offscreen canvases on the same page regardless of active use. A minimal sketch of the single-canvas case is included below.
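
For reference, a minimal sketch of the single-canvas setup described above. This is not the exact fiddle; the 8192x8192 size and the worker.js filename are illustrative placeholders.

Main page:

    <canvas id="view" width="8192" height="8192"></canvas>
    <script>
      // Transfer control of the on-page canvas to a dedicated worker.
      const offscreen = document.getElementById("view").transferControlToOffscreen();
      const worker = new Worker("worker.js");
      worker.postMessage({ canvas: offscreen }, [offscreen]);
    </script>

worker.js:

    let ctx;
    onmessage = (e) => {
      ctx = e.data.canvas.getContext("2d");
      requestAnimationFrame(draw);
    };

    function draw() {
      // Any 2D-context drawing on every frame is enough; a full-canvas fill is used here.
      ctx.fillRect(0, 0, ctx.canvas.width, ctx.canvas.height);
      requestAnimationFrame(draw);
    }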

Actual results:

The browser consumes all available RAM and crashes the tab. In my case 16 GiB is consumed in around 15 seconds.

Expected results:

No memory leaks.

Online reduced test case for convenience:

https://jsfiddle.net/we0sqjtd/

The Bugbug bot thinks this bug should belong to the 'Core::Graphics: Canvas2D' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Graphics: Canvas2D
Product: Firefox → Core

Can you type "about:support" in your browser and copy-paste its contents to this bug?

Flags: needinfo?(tomxor)

I've also tried this with a fresh Firefox config, i.e. defaults and no extensions, and found the behaviour to be slightly different: while the tab is open no leak occurs, but as soon as I close the tab, all RAM is quickly consumed and then the whole browser crashes instead of just the tab.

When I restore my original Firefox config the behaviour reverts to what I initially described. But both configurations result in a leak and crash.

I can't reproduce on macOS with either the attachment or the jsfiddle. We'll discuss in triage.

Blocks: gfx-triage

Lee, can you repro?

Flags: needinfo?(lsalzman)

This seems more my area than Lee's. Did you reproduce this on release, or on a recent nightly? We've been making a lot of changes in the past week.

Flags: needinfo?(lsalzman) → needinfo?(tomxor)

Also, if on release, does disabling accelerated canvas help? Flip gfx.canvas.accelerated to false, restart and try to reproduce as normal.

Depends on: 1871207

I found a version of the leak on Windows with GPU canvas, and filed bug 1871207.
Filed bug 1871208 for the D2D canvas leak.

Andrew, I've tried setting gfx.canvas.accelerated to false and restarting; the leak still occurs for me.

Note, I do not know whether this is a regression or has been here all along, since I discovered it while developing something new.

Flags: needinfo?(tomxor)

Andrew, this is on release, via the Flatpak from Flathub:

Name     Application ID       Version  Branch  Arch    Origin   Ref                                Active commit
Firefox  org.mozilla.firefox  120.0.1  stable  x86_64  flathub  org.mozilla.firefox/x86_64/stable  f75dc4d15a98

Noticed there was an update, can also reproduce on v121:

Name     Application ID       Version  Branch  Arch    Origin   Ref                                Active commit
Firefox  org.mozilla.firefox  121.0    stable  x86_64  flathub  org.mozilla.firefox/x86_64/stable  b4ed37eec155

I've noticed the behaviour is a little slower when nothing else is open in the browser.

Using the test case HTML I have to wait 20 to 30 seconds, during which my available RAM hovers around 13-14 GiB; after that initial period, available RAM suddenly plummets and the tab crashes just before it hits 0.

The severity field is not set for this bug.
:lsalzman, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(lsalzman)
Severity: -- → S3
Flags: needinfo?(lsalzman)
Keywords: memory-leak

Sorry for the delay due to the holidays, could you provide an about:memory report? That might shed some light at least on where the allocations are. Thanks!

Flags: needinfo?(tomxor)
Assignee: nobody → aosmond
No longer blocks: gfx-triage

(In reply to Andrew Osmond [:aosmond] (he/him) from comment #16)

Sorry for the delay due to the holidays, could you provide an about:memory report? That might shed some light at least on where the allocations are. Thanks!

If it helps, bug 1871208 has an about:memory report.

Depends on: 1871208
Attached file memory-report.json.gz

I captured this after the attached test case consumed around 12GB, with only a couple GB of RAM left (it crashes when out of RAM).

Flags: needinfo?(tomxor)

Looks like it is mostly consumed by shmems in the content process:
12,288.19 MB ── shmem-allocated
12,288.19 MB ── shmem-mapped

I've narrowed the difference in behaviour between defaults and my config to layers.acceleration.disabled.

When setting this to true, the test case leaks while open in the foreground, as per my initial description.

When setting this to false (the default), no leak occurs while in the foreground, and the tab eventually crashes on its own. However, if I close the tab prematurely after ~10 seconds, all RAM is consumed and the whole browser crashes instead.

I've also played with the interval by using setInterval instead of a rAF loop, and it seems to be very sensitive to the timing: if I make the interval greater or less than the refresh rate interval (~16 ms on my machine) then it doesn't occur so easily, except for some magic numbers like 4 ms where it occurs as easily as 16 ms. A sketch of this variant follows.
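
For clarity, the timing experiment just replaces the rAF loop in the worker sketch above with a fixed interval, along these lines (the 16 ms and 4 ms values are the ones from my machine described above):

    let ctx;
    onmessage = (e) => {
      ctx = e.data.canvas.getContext("2d");
      // Intervals near the ~16 ms refresh period (and a few values like 4 ms)
      // reproduce the leak readily; most other intervals do not.
      setInterval(() => {
        ctx.fillRect(0, 0, ctx.canvas.width, ctx.canvas.height);
      }, 16);
    };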

Profile with all threads, IPC and allocation: https://share.firefox.dev/3HhWXhW

I can reproduce this with SW-WR.

Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

If we use the buffer provider, the problem goes away. I can land that patch to fix this.

This patch adds support for allocating shmem sections for
ImageBridgeChild. The recording infrastructure depends on it.

This allows us to create a TextureClient on a different thread than the
actor without special effort on the part of the allocator. Similarly, it
also allows us to destroy a TextureClient on a different thread if it
has a readlock bound to it.

Attachment #9373358 - Attachment description: Bug 1870488 - Make OffscreenCanvas use PersistentBufferProvider on the display pipeline. → Bug 1870488 - Part 3. Make OffscreenCanvas use PersistentBufferProvider on the display pipeline.
Pushed by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a8db51f72457
Part 1. Implement ImageBridgeChild::GetTileLockAllocator. r=gfx-reviewers,lsalzman
https://hg.mozilla.org/integration/autoland/rev/a3e17caaa159
Part 2. Ensure TextureClient's mReadLock is only created on the IPDL actor thread. r=lsalzman
https://hg.mozilla.org/integration/autoland/rev/cf3ce7d3c82d
Part 3. Make OffscreenCanvas use PersistentBufferProvider on the display pipeline. r=lsalzman
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 123 Branch
No longer depends on: 1871207
Duplicate of this bug: 1871207
No longer depends on: 1871208
Duplicate of this bug: 1871208

(In reply to Pulsebot from comment #27)

Pushed by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a8db51f72457
Part 1. Implement ImageBridgeChild::GetTileLockAllocator.
r=gfx-reviewers,lsalzman
https://hg.mozilla.org/integration/autoland/rev/a3e17caaa159
Part 2. Ensure TextureClient's mReadLock is only created on the IPDL actor
thread. r=lsalzman
https://hg.mozilla.org/integration/autoland/rev/cf3ce7d3c82d
Part 3. Make OffscreenCanvas use PersistentBufferProvider on the display
pipeline. r=lsalzman

We've recorded a large improvement in CI from your patches!

== Change summary for alert #41064 (as of Thu, 18 Jan 2024 23:29:28 GMT) ==

Improvements:

Ratio  Test (mean time across 100 frames)       Platform                     Options                                 Absolute values (old vs new)
19%    offscreencanvas_webcodecs_worker_2d_vp9  windows10-64-ref-hw-2017-qr  e10s fission stylo webgl-ipc webrender  25.73 -> 20.81
19%    offscreencanvas_webcodecs_worker_2d_av1  windows10-64-ref-hw-2017-qr  e10s fission stylo webgl-ipc webrender  26.53 -> 21.53
19%    offscreencanvas_webcodecs_main_2d_vp9    windows10-64-ref-hw-2017-qr  e10s fission stylo webgl-ipc webrender  25.81 -> 20.98
18%    offscreencanvas_webcodecs_main_2d_av1    windows10-64-ref-hw-2017-qr  e10s fission stylo webgl-ipc webrender  26.50 -> 21.64
18%    offscreencanvas_webcodecs_main_2d_vp9    windows10-64-shippable-qr    e10s fission stylo webgl-ipc webrender  13.65 -> 11.16
...    ...                                      ...                          ...                                     ...
2%     offscreencanvas_webcodecs_main_2d_av1    windows10-64-shippable-qr    e10s fission stylo webrender-sw         13.76 -> 13.48

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=41064

Keywords: perf-alert
Regressions: 1877429
Regressions: 1879678
Regressions: 1879833
See Also: → 1909684