Memory leak to GPU process with a Variable Font test site

RESOLVED FIXED in Firefox 64

Status

()

defect
P1
major
RESOLVED FIXED
9 months ago
9 months ago

People

(Reporter: Fanolian+bugzilla, Assigned: lsalzman)

Tracking

(Blocks 2 bugs, 4 keywords)

Trunk
mozilla64
Unspecified
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox62 wontfix, firefox63 wontfix, firefox64 fixed)

Details

()

Attachments

(8 attachments, 1 obsolete attachment)

Reporter

Description

9 months ago
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0
Build ID: 20181001220118

Steps to reproduce:

(WARNING: may crash your browser and/or system)

1. Make sure WebRender is enabled.
2. Visit https://www.curtismann.org/variablefonts/


Actual results:

Ram usage on GPU process rises rapidly until all physical ram (16GB) are used. Eventually GPU process will crash, respawn and restart the cycle again.
I cannot release the memory with "Minimize memory usage" in about:memory. Killing GPU process in Windows Task Manager does release the ram and Nightly runs smooth again.


My about:support is attached. Some other computer info:
CPU: Intel i7-4771
RAM: 16GB
GPU: Nvidia GTX760 (latest driver 411.70)


Note:
With WebRender force disabled, the variable font tests run smoothly with moderate CPU usage and no memory leaks.


The page author shared some tidbits[1] about the test, which I fully quote below:
You’re so right about that. Do you mind telling me what device/browser you’re on?

Two things:

1) I’ve got some seriously ugly spaghetti code in there.
2) I’m finding that CSS animations involving the new font-variation-setting can be extremely resource intensive dependent on your browser. This is all demo two’s fault. I originally had a 10x10 matrix of interrobangs—it ran buttery smooth on both my mobile browsers (Safari and Chrome), crashed desktop Chrome, and ran alright’ish on desktop Opera and Firefox. The fact that it ran well on mobile and not on my relatively powerful desktop leads me to think there’s some bug in desktop Chrome’s implementation of variable font animations. It’s a pretty new technology though, so I kinda expect some hiccups.

[1]: https://www.reddit.com/r/typography/comments/9kkb51/new_interactions_experiment_with_variable_fonts/
Reporter

Comment 1

9 months ago
The attachment is a memory report from about:memory after GPU process ram usage reaches around 12.5GB (all my available ram) by visiting the variable font test site.

Ram stops increasing, but are not released, once I close the tab.
Reporter

Updated

9 months ago
Has STR: --- → yes
Keywords: reproducible

Comment 2

9 months ago
confirmed on latest nightly.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(lsalzman)

Comment 3

9 months ago
Also happens without WR. In that case, memory growth is slow but steady.
With WR, memory growth is explosive
Flags: needinfo?(jfkthame)
Viewing the site on macOS, I see "slow but steady" memory growth _with_ webrender enabled; I certainly wouldn't describe it as "explosive". Which suggests the most severe issue here may be specific to Windows/WR.

Without WR on macOS, it looks like memory use is still growing, but even more slowly.

(FWIW, Safari seems to handle this quite well, while Chrome blows up memory usage much faster than Firefox.)
Flags: needinfo?(jfkthame)
Debian Testing, KDE, Xorg
The memory leak is within the content process. "Minimize memory usage" works with and without GPU process.

good = memory usage goes even slightly down after loading the page
bad = memory leak

mozregression --good 2018-03-01 --bad 2018-04-01 --pref gfx.webrender.all:true -a https://www.curtismann.org/variablefonts/
> 10:02.87 INFO: Last good revision: 903a79a1efff18fc7cc50db09a3fe5d768adc9a8
> 10:02.87 INFO: First bad revision: 2a7d2767a24da944f9f0b389c3942af30c08425f
> 10:02.87 INFO: Pushlog:
> https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=903a79a1efff18fc7cc50db09a3fe5d768adc9a8&tochange=2a7d2767a24da944f9f0b389c3942af30c08425f

> 2a7d2767a24d	Jonathan Kew — Bug 1447163 - Enable support for OpenType variation fonts. r=dbaron
Blocks: 1447163
Severity: normal → major
OS: Unspecified → All
Also happens without WebRender. Memory usage grows ca. 100 MB per second.
Chrome Dev 71 takes about 15 MB more memory per second. It goes a bit up, then a bit down, then a bit further up, etc. I've had to wait multiple minutes until it reached 1 GB memory usage. So it doesn't seem to be a WebRender problem.
Component: Graphics: WebRender → Layout: Text and Fonts
Has Regression Range: --- → yes
mozregression --launch 2018-10-01 --pref gfx.webrender.all:true -a https://www.curtismann.org/variablefonts/
mozregression --launch 2018-10-01 --pref gfx.webrender.all:false -a https://www.curtismann.org/variablefonts/

Win10, GTX 1060
With and without WebRender, memory usage seemed to grow with more than 1 GB per second.
Chrome Dev 71 grows about 5 MB per second.

(User expectation: A web browser is not VirtualBox or a game. With https://www.reaperbugs.com/index in mind I expect some sort of throttling and/or enforcing lower quality if a website demands too much, until it asks a browser API for unthrottling and got user confirmation.)
Oops, I am WR-qualified.
mozregression --launch 2018-10-01 --pref gfx.webrender.force-disabled:true -a https://www.curtismann.org/variablefonts/

Without WebRender it grows about 40 MB per second. So WebRender has some more badness on Windows. But bug 1447163 didn't even have a regression on record yet.
Reporter

Comment 9

9 months ago
In this video, left instance is WR-enabled and right WR-disabled. The memory leaks exhibit in different kinds of process. 

WR-disabled (00:00): leak in content process
WR-enabled (00:32): leak in GPU process. Nightly performance is much worse and the leak is, as described before, explosive.
Blocks: wr-memory
Component: Layout: Text and Fonts → Graphics: WebRender
Reporter

Comment 10

9 months ago
A local copy of the site.

(The typeface attached is licensed under the SIL open font license.)
Reporter

Comment 11

9 months ago
There is no memory leaks if I disable variation fonts support (layout.css.font-variations.enabled;false), with or without WR.
See Also: → 1495919
(In reply to Fanolian from comment #11)
> There is no memory leaks if I disable variation fonts support
> (layout.css.font-variations.enabled;false), with or without WR.

With variation fonts disabled, the test page is no longer constantly rendering slightly different font instances, so that'll be why it no longer chews up memory.

I'm curious whether this is actually specific to variations, though, or if it would occur if we animate font-size so that numerous font instances are created even when variations are disabled. Could you please try the reduced testcase at https://jfkthame.github.io/test/bug1495661/, both with and without WR enabled, and confirm whether this version shows a similar problem (perhaps not quite as explosively)?
Flags: needinfo?(Fanolian+bugzilla)

Comment 13

9 months ago
WR: explosive memory, that did not go away even after closing the tab, and minimizing memory
non-WR: slow growth (1MB/s roughly), which went away after closing the tab and/or minimizing memory

Comment 14

9 months ago
(In reply to Mayank Bansal from comment #13)
> WR: explosive memory, that did not go away even after closing the tab, and
> minimizing memory
> non-WR: slow growth (1MB/s roughly), which went away after closing the tab
> and/or minimizing memory


this is with layout.css.font-variations.enabled set to false or true
Reporter

Comment 15

9 months ago
(In reply to Jonathan Kew (:jfkthame) from comment #12)
> Could you please try the reduced testcase at https://jfkthame.github.io/test/bug1495661/,
> both with and without WR enabled, and confirm whether this version shows a similar problem
> (perhaps not quite as explosively)?

My WR case is similar to Mayank Bansal's. Ram usage to GPU process climbs to around 5.5GB in 1 minute and can not be released by minimizing memory. The about:memory attached is measured after opening the testcase for 1 min.

As for non-WR, ram usage to content process does not climb steadily but in a couple of batches, albeit very slowly. Ram usage does not reach 250MB after 5 minutes, and some are released after focusing away from the browser as well as minimizing memory.
Flags: needinfo?(Fanolian+bugzilla) → needinfo?(jfkthame)
OK, thanks. I think these results confirm that the issue here isn't really a Variable Fonts bug, it's a general problem (especially with WebRender on Windows) related to the use of very large numbers of font instances. It just showed up now because font variations offer increased scope for animating styles in such a way that numerous font instances are used.

We should probably try to limit the number of font instances that may be alive at any given time in the content process, but the more serious problem is that the GPU process explodes so badly and doesn't release its memory even when the old content-process fonts are discarded.
Flags: needinfo?(jfkthame)
Bug 1495919 gives me a strong suspicion that WR is leaking fonts.
Priority: -- → P1
Flags: needinfo?(lsalzman)
Lee -- can you look into this at a top priority?  This may affect our getting WR into Beta 64.  Our go/no-go decision for that is next week.  Thanks.
Assignee: nobody → lsalzman
Flags: needinfo?(lsalzman)
Assignee

Comment 19

9 months ago
I believe WR PR https://github.com/servo/webrender/pull/3177 will fix this issue. Jonathan, can you apply this patch I attached here and verify?
Flags: needinfo?(lsalzman)
Attachment #9015408 - Flags: feedback?(jfkthame)
This definitely improves things in my macOS build with WR enabled; I'll try a Windows build a bit later and check there too, as that was the most "explosive" example.

I still see a gradual increase in content-process memory, which is unrelated to WR. (Almost all heap-unclassified; I wouldn't be surprised if a lot of it is internal allocation by CoreText for all those font instances, etc.) We may want to look into that separately as a followup, but let's get this resolved first.
Comment on attachment 9015408 [details] [diff] [review]
delete FontInstances from FontContexts when evicting a GlyphCache

Yes, I can confirm this appears to fix the memory explosion caused by animated font properties with webrender enabled.

(However, note that with WR enabled, the actual glyph variations don't seem to be rendering for me; see bug 1441323 comment 17. That's unrelated to this patch, the same problem occurs in standard Nightly.)
Attachment #9015408 - Flags: feedback?(jfkthame) → feedback+
Assignee

Updated

9 months ago
Depends on: 1496670
Depends on: 1495162
Further testing indicates that there's still an issue here, even with the above patch applied.

The animated example no longer continuously eats up memory, but the interactive examples on the https://www.curtismann.org/variablefonts/ page still show a problem. If I interact with either "Demo One" or "Demo Three" for more than a few moments, the GPU process memory use rises quickly, until when it reaches around 2GB the GPU process crashes and the window goes entirely white. This seems to be some kind of OOM crash, although overall my system isn't short of memory yet.

Oh - wait - my build is a 32-bit build. So it's running out of address space. A 64-bit build will presumably hold up a lot better.

Still, it's a bit worrying that the memory grows so quickly when I'm interacting with one of those demos. (And when I stop interacting, the memory doesn't drop.) It seems like the patch above has solved the problem for lots of font instances being continuously generated by a CSS animation, but not if they're generated as a result of JS-driven style changes.
The comment 22 issue isn't specific to webrender, however; intensive interaction with the Demo One and Demo Three examples causes memory growth with WR disabled, too (this time in the content process). The growth is less rapid than with WR, but it still looks like an ever-increasing number of variation font instances are using ever-more memory. So we should check for a possible leak on the gecko side of this, too.
Depends on: 1497768
No longer depends on: 1497768
Here's a further testcase: https://jfkthame.github.io/test/bug1495661/randomsize.html

This uses a single (fairly large: 600K-ish) webfont resource, but uses JS to constantly instantiate it at new random sizes. Without WebRender, this causes slow but steady content-process memory growth on Windows: I'm typically seeing something like 1-2MB/second. But with WebRender enabled, the GPU process grows MUCH faster -- a couple hundred MB/second -- and fairly quickly dies.

This doesn't happen on macOS, either with or without WebRender: growth remains very slow in either case.

Any thoughts, Lee?
Flags: needinfo?(lsalzman)
FWIW, memory usage of https://jfkthame.github.io/test/bug1495661/randomsize.html on Linux also shows only very slow growth, either with or without WR. It's just the Windows/WR case that shows runaway growth and quickly crashes.
Another observation is that the random-size example https://jfkthame.github.io/test/bug1495661/randomsize.html does NOT leak, or at least not significantly, on windows/webrender if I disable downloadable fonts, so that it is instantiating thousands of sizes of a system-installed font instead.

From the symptoms, it looks very much as though when we use a webfont, and send the raw font data over to the GPU process for rendering, that copy of the data -- which gets repeated for every font instance -- is leaking. But I don't have a good enough grasp of the data flow and ownership, etc., on the webrender side to really understand what's going on here.
Assignee

Comment 27

9 months ago
We shouldn't be creating new UnscaledFontDWrites for every ScaledFontDWrite we create with variations. We just need to be able to recover the font file data and the default variation settings from the UnscaledFontDWrite, as opposed to the actual styled settings. So it's okay for the ScaledFontDWrite's IDWriteFontFace to vary from the UnscaledFontDWrite's.
Flags: needinfo?(lsalzman)
Attachment #9016076 - Flags: review?(jfkthame)
Assignee

Comment 28

9 months ago
v2... small bug fix.
Attachment #9016076 - Attachment is obsolete: true
Attachment #9016076 - Flags: review?(jfkthame)
Attachment #9016083 - Flags: review?(jfkthame)
Comment on attachment 9016083 [details] [diff] [review]
reuse the default UnscaledFontDWrite for variation fonts

Review of attachment 9016083 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!

As a followup, I think we should also handle the bold-simulation case in the same way; there's no need for a separate mUnscaledFontBold in the font entry. I'll post an additional patch to include this.
Attachment #9016083 - Flags: review?(jfkthame) → review+
I've started a try run with both the above patches, to check we haven't broken anything too badly: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ffc3e883d9775f20cede891426bfbdebcebc0f4

Comment 32

9 months ago
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d492774b453f
reuse the default UnscaledFontDWrite for variation fonts. r=jfkthame
Assignee

Updated

9 months ago
Attachment #9016246 - Flags: review?(lsalzman)

Comment 33

9 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/d492774b453f
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Next Fx63 build is the RC, which seems too late for taking this in that release.
You need to log in before you can comment on or make changes to this bug.