Closed Bug 1697344 Opened 3 years ago Closed 3 years ago

GPU acceleration of canvas creates a memory leak that can rapidly crash Firefox

Categories

(Core :: Graphics: Canvas2D, defect, P1)

Firefox 88
x86_64
Windows
defect

Tracking


VERIFIED FIXED
89 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox86 --- wontfix
firefox87 --- wontfix
firefox88 --- wontfix
firefox89 --- verified

People

(Reporter: Zolhungaj, Assigned: bobowen)

References

Details

Attachments

(6 files, 1 obsolete file)

Attached file index.html (obsolete) —

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0

Steps to reproduce:

Prerequisites: GPU acceleration must be enabled.
Tested on Windows 10, on Firefox 86 and the latest Nightly release.
GPU: NVIDIA RTX 2080

  1. Open the attached index.html in a new tab
  2. Press the "Click me!" button
  3. Observe the memory usage of Firefox increasing
  4. Be aware that if memory usage exceeds available memory, the system might hang

Actual results:

The memory usage of the GPU process will increase rapidly, at an accelerating rate, potentially to the point where the user's machine will hang.
In my tests it levels out at around 9.5 GB of used memory (which was when the memory usage on my system reached 100%). I also managed to hang the system once when another application started allocating from the already exhausted memory.

Occasionally the GC kicks in and reclaims some memory, but not often enough to prevent Firefox from accumulating RAM.

Expected results:

The memory usage should not increase at such a rapid rate, and the application should not crash the system.

Summary: GPU acceleration of canvas creates a memory leak that will rapidly crash Firefox → GPU acceleration of canvas creates a memory leak that can rapidly crash Firefox

The Bugbug bot thinks this bug should belong to the 'Core::Canvas: 2D' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Canvas: 2D
Product: Firefox → Core

Narrowed down the source of the problem to the creation of a CanvasGradient with at least one color stop; of the methods returning CanvasGradient, CanvasRenderingContext2D.createRadialGradient() is the one leaking fastest.
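
For reference, a minimal sketch of the kind of script that triggers this (illustrative only, not the actual attachment; the sizes and colors are made up): a new radial gradient with a color stop is created on every draw.

// Hypothetical reduction of the repro: a fresh CanvasGradient with one
// color stop is created many times per frame and used to paint.
const canvas = document.querySelector("canvas");
const ctx = canvas.getContext("2d");

function frame() {
  for (let i = 0; i < 1000; i++) {
    const gradient = ctx.createRadialGradient(50, 50, 10, 50, 50, 40);
    gradient.addColorStop(0, "red");
    ctx.fillStyle = gradient;
    ctx.fillRect(0, 0, canvas.width, canvas.height);
  }
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);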

Attachment #9207952 - Attachment is obsolete: true
Status: UNCONFIRMED → NEW
Ever confirmed: true

Looks like this is down to the CanvasGradient holding onto a reference to the GradientStops, coupled with not using the GradientStops cache for recording DrawTargets.

Severity: -- → S3
OS: Unspecified → Windows
Priority: -- → P1
Hardware: Unspecified → x86_64
Assignee: nobody → bobowencode

After some more digging, it appears that this call to CreateRadialGradientBrush is what uses a lot of memory (128K on my machine).

This appears to live as long as the GradientStops.
One way to clean these up more quickly in the remote canvas case is to not cache the stops on the CanvasGradient (patch attached), although, with no "global" caching, that would be worse in the case where the gradient isn't changing.

So, we probably need to look at caching for recording DrawTargets as well.

jgilbert pointed out on Element that we could cause a similar issue even with the cache, so I tweaked the script to deliberately miss the cache.

This causes us to use lots of memory in the content process (with remote canvas disabled).
It copes a bit better (at least on my machine, which has a lot of memory), possibly because the memory pressure in the content process triggers GC and maybe cache invalidation.

If you let it run for a long time, then stop it with a refresh and let it clean up, it doesn't seem to reclaim all the memory, which probably needs looking into as well.
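
As a rough illustration of what "deliberately miss the cache" means here (an assumed variant of the script, not the attached one): give every gradient a color stop no previous gradient used, so identical stops never recur.

// Assumed tweak: vary the color stop on every call so no two gradients
// share the same stops and a GradientStops cache can never hit.
let n = 0;
function drawUniqueGradient(ctx) {
  const gradient = ctx.createRadialGradient(50, 50, 10, 50, 50, 40);
  gradient.addColorStop(0, `rgb(${n & 255}, ${(n >> 8) & 255}, ${(n >> 16) & 255})`);
  n++;
  ctx.fillStyle = gradient;
  ctx.fillRect(0, 0, 100, 100);
}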

See Also: → 1699940
Blocks: 1697476

This is so that we can use it in the canvas worker threads.
It also sets a maximum number of entries, because on Windows the associated Direct2D objects can be fairly big.

In the DrawTargetRecording case we create new GradientStopsRecording each time, and holding onto them in the content process can mean they take a very large amount of memory in the GPU process if a script deliberately creates lots of unique stops.
In the non-recording case, the GradientStops are cached in the content process anyway.

Depends on D109791
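
As an aside, the page-side pattern the cache is designed around looks roughly like the sketch below (a hedged usage note, not part of the patch): the gradient is created once and reused, so only one set of GradientStops is ever needed.

// Contrast with the deliberate-miss script above: one gradient, created
// once; its stops never change, so the same GradientStops can be reused
// (and cached) for every frame.
const canvas = document.querySelector("canvas");
const ctx = canvas.getContext("2d");

const gradient = ctx.createRadialGradient(50, 50, 10, 50, 50, 40);
gradient.addColorStop(0, "red");
gradient.addColorStop(1, "blue");

function frame() {
  ctx.fillStyle = gradient;
  ctx.fillRect(0, 0, canvas.width, canvas.height);
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);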

Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/c991ad5e4a43
p1: Make GradientCache thread safe. r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/b335fb3dfdea
p2: Use the gradient cache in CanvasTranslator. r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/76a3bcdeaa9f
p3: Don't hold the GradientStops object on CanvasGradient. r=jrmuizel

The patch landed in nightly, and beta is affected.
:bobowen, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(bobowencode)

(In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #13)

> The patch landed in nightly, and beta is affected.
> :bobowen, is this bug important enough to require an uplift?
> If not please set status_beta to wontfix.
>
> For more information, please visit auto_nag documentation.

While the issue is easier to cause and more severe with remote canvas, part of it has essentially been around for a long time.
So, given the size of the change, I think it is probably best to just let this roll out normally.

Flags: needinfo?(bobowencode)
Flags: qe-verify+
Blocks: 1586495

I used two machines, one with a GTX 1070 Ti and another with an RTX 2070 Super, and I got the following results:

  • Firefox 86.0:
    -- GTX 1070 Ti: Memory grows constantly; I stopped it at around 9000 MB. GPU usage spikes between 30%, 70%, and 100%.
    -- RTX 2070 Super: Memory grows constantly; I stopped it at around 9000 MB. GPU usage was under 20% without spikes.

  • Firefox 89.0:
    -- GTX 1070 Ti: Memory stays at a maximum of 400 MB; GPU usage does not go over 35% (constantly under 20%, with spikes to 35%).
    -- RTX 2070 Super: Memory stays at a maximum of 400 MB; GPU usage does not go over 35% (constantly under 20%, with spikes to 35%).

I did not get any hangs/crashes because I ended the task at 9000 MB and still had plenty of memory left.
Based on the above, I'll mark this as verified fixed.

Status: RESOLVED → VERIFIED
Flags: qe-verify+