Open Bug 1677364 Opened 10 months ago Updated 9 months ago

After OOM crash on a contrived testcase, the GPU process doesn't release 4.1GB of memory unless force-terminated from about:support

Categories

(Core :: Graphics: WebRender, defect)

defect

Tracking

()

People

(Reporter: mayankleoboy1, Assigned: bradwerth)

References

(Depends on 2 open bugs, Blocks 3 open bugs)

Details

Attachments

(4 files)

Attached file Moire - Copy.html
  1. Use Nightly + WebRender + Windows x64
  2. Open the testcase. The tab should OOM-crash.

Look at the task manager. It will show one Firefox process using a large amount of memory (4.1 GB in my case). Trying to minimize the memory from about:memory has no effect.
To reclaim the memory, you have to go to about:support and click "terminate GPU process".

Expected result: The browser releases the memory and becomes usable again.

Fission doesn't fix this.

Attached file memory-report.json.gz
Attached file about:support
Summary: After OOM crash, the GPU process doesn't release memory → On a contrived testcase, after OOM crash, the GPU process doesn't release 4.1GB of memory unless force-terminated from about:support
Summary: On a contrived testcase, after OOM crash, the GPU process doesn't release 4.1GB of memory unless force-terminated from about:support → After OOM crash on a contrived testcase, the GPU process doesn't release 4.1GB of memory unless force-terminated from about:support
See Also: → 1623557
Blocks: wr-memory
Blocks: wr-perf
Blocks: wr-stability

:gw, do you have any idea about the high memory usage?

Flags: needinfo?(gwatson)
Blocks: gfx-triage
Severity: -- → S3

Nothing obvious I can think of - would need to investigate / repro to see what is happening here, I think. WR will retain render targets for some time in a pool, but they should be getting freed, same for texture cache pages. It's possible that the display list from the OOM'ed content process doesn't get removed, but I wouldn't expect that to be a huge amount of memory.

Flags: needinfo?(gwatson)

I can reproduce this, or at least lock up the browser, by opening the testcase. I'll investigate further.

Assignee: nobody → bwerth

Okay, in reproducing this it's becoming clear that the initial setup code (adding 500 * 3 = 1500 overlapping circles) is what causes the memory explosion, even without the animation loop. I'll see what I can find by investigating the memory usage pattern with a smaller number of circles.

Attached file 200circles.html

This is a simplified version of the Moire testcase that does no animation and generates only 200 circles. On my system (macOS, software WebRender with no compositor), the baseline memory usage is ~300 MB. While this testcase is being rendered -- and only while it's being rendered -- memory usage leaps to 7.5 GB. Immediately afterwards it drops back to the lower level.

With the 200circles testcase, here's a profile taken after opening the testcase, letting the browser settle, opening a new about:home tab, and focusing it. Recording begins, the about:home tab is closed, and recording continues until the 200circles content is visible again. https://share.firefox.dev/2LeG6m8

Most of the time is spent in cs_border_solid_frag::run.

Breaking into rendering on a debug build shows that update_texture_cache is processing 676 entries in update_list, many of which appear to be 2048 x 2048 4-byte-per-texel allocations (16 MB each). 676 of those would be 10.5 GB, so not all of the allocations are that size, but if roughly 70% of them are 16 MB allocations, that would account for the memory bloat I see on my system. I'll see if I can determine when and why so many large textures are being added to update_list.
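As a sanity check on those numbers, here's a back-of-the-envelope sketch (not WebRender code; the 70% fraction is just the guess from the comment above):

```python
# Rough check of the texture memory estimate: 676 update_list entries,
# many of them 2048x2048 textures at 4 bytes per texel (RGBA8).
BYTES_PER_TEXEL = 4
side = 2048
entries = 676

per_texture = side * side * BYTES_PER_TEXEL   # bytes per full-size texture
print(per_texture / 2**20)                    # 16.0 MiB each

worst_case = entries * per_texture            # if every entry were full size
print(worst_case / 2**30)                     # ~10.56 GiB, matching the "10.5 GB" figure

# Assumed fraction: if ~70% of entries are full 2048x2048 textures...
estimate = 0.7 * worst_case
print(estimate / 2**30)                       # ~7.4 GiB, close to the observed 7.5 GB peak
```

The arithmetic lines up: a 70/30 mix of full-size and smaller textures is consistent with the observed peak.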

Hmm... adding some logging to texture_cache.rs shows that the allocation requests come from allocate_standalone_entry. They start at 518 x 518 and grow steadily, increasing every 4 allocations. By allocation #385 they reach 2048 x 2048 and stay that size through the final allocation, #672. I suppose this corresponds to the increasing circle sizes in the testcase, with each circle fully covered by a texture, or perhaps by multiple textures once the maximum size is reached.
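To make that growth pattern concrete, here's a hypothetical model of the allocation sizes: start at 518, bump the side length every 4 allocations, cap at 2048. The ~16 px step is inferred from covering 2048 - 518 px in roughly 385/4 bumps; the real increments in texture_cache.rs may differ.

```python
# Hypothetical model of the allocate_standalone_entry sizes described above.
# STEP=16 is an inferred value, not taken from the WebRender source.
START, CAP, STEP, TOTAL = 518, 2048, 16, 672

sides = [min(CAP, START + STEP * (i // 4)) for i in range(TOTAL)]

first_at_cap = next(i for i, s in enumerate(sides) if s == CAP) + 1
print(first_at_cap)                           # 385, matching the logged allocation number

total_bytes = sum(s * s * 4 for s in sides)   # 4 bytes per texel (RGBA8)
print(total_bytes / 2**30)                    # ~7.1 GiB, in the ballpark of the observed bloat
```

Under this model the total transient allocation is around 7 GiB, which fits the observed ~7.5 GB peak reasonably well.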

I'll see if I can understand what triggers the allocations, in terms of a stack trace. Then the next step will be batching these better, or overwriting an old texture with new texture data, etc.

No longer blocks: gfx-triage

See also: https://github.com/servo/webrender/issues/2706
The intention in filing this bug was to investigate the GPU process not releasing memory after an OOM crash.

Depends on: 1664060
Depends on: 1549734