GPU Process Hardware Decode Memory Leak (multiple GB) Windows AMD Leak since ff126
Categories: Core :: Graphics, defect
People: Reporter: vincent, Assigned: alwu
References: Blocks 2 open bugs, Regression
Details: Keywords: regression
Attachments (3 files, 1 obsolete file):
- 32.64 KB, text/plain
- 48 bytes, text/x-phabricator-request (pascalc: approval-mozilla-beta+, dmeehan: approval-mozilla-release+)
- 37.70 KB, text/plain
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0
Steps to reproduce:
Just browse some HTML5 video content such as YouTube, 9GAG, etc.
This issue started once Firefox on my main PC (AMD 7900XTX) was updated to Firefox 126 (Developer Edition), and it started to affect my laptop (AMD 7840HS) once Firefox 126 stable was published.
Both are on the latest available drivers.
media.hardware-video-decoding.enabled = false seems to prevent this from happening.
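The workaround pref can be set in about:config, or persisted through a `user.js` file in the profile directory. The following is a minimal sketch that only formats the `user.js` line; the helper name `user_pref_line` is hypothetical, and writing the line into a profile is left to the reader.

```python
import json


def user_pref_line(name, value):
    """Format one user.js line; value may be a bool, int, or str."""
    # json.dumps renders False as `false` and quotes strings,
    # which matches the JavaScript literal syntax user.js expects.
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


# Workaround from this report: turn hardware video decoding off.
print(user_pref_line("media.hardware-video-decoding.enabled", False))
```

Note that this disables hardware decoding entirely, so it is a workaround rather than a fix.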
Actual results:
Ever-increasing GPU process commit size (it reached around 60 GB a few times before I manually killed the process, and can jump to 10 GB within a few minutes of scrolling through videos). Paged size seems normal (i.e., memory is allocated but not used?).
Expected results:
Same behavior as with firefox 125, a normal GPU process memory commit size.
YouTube Shorts seems to be the best way to reproduce; I got it to 19 GB in a few seconds.
Partial copy of about:memory for this GPU process:
170.99 MB ── d3d11-shared-textures
0.00 MB ── gfx-d2d-vram-draw-target
36.00 MB ── gfx-d2d-vram-source-surface
18,989.10 MB ── gpu-committed
1,506.33 MB ── gpu-dedicated
114.39 MB ── gpu-shared
39.39 MB ── heap-allocated
1.00 MB ── heap-chunksize
19,225.47 MB ── private
613.02 MB ── resident
343.38 MB ── resident-unique
0.00 MB ── shmem-allocated
250.35 MB ── shmem-mapped
44.91 MB ── system-heap-allocated
0 ── unresolved-ipc-responses
2,125,875.48 MB ── vsize
130,383,950.69 MB ── vsize-max-contiguous
38 ── webgl-buffer-count
0.02 MB ── webgl-buffer-memory
1 ── webgl-context-count
0 ── webgl-renderbuffer-count
0.00 MB ── webgl-renderbuffer-memory
20 ── webgl-shader-count
10 ── webgl-texture-count
21.21 MB ── webgl-texture-memory
Process Explorer only reports 2.5 GB of committed GPU memory and 1.1 GB of dedicated GPU memory.
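To compare the counters in an about:memory excerpt like the one above, the text can be parsed into a dictionary. This is a rough sketch, assuming the `<size> MB ── <name>` line layout shown in the excerpt; `parse_memory_report` is a hypothetical helper, not part of any Firefox tooling.

```python
import re

# Matches lines such as "18,989.10 MB ── gpu-committed".
# Plain counters without an "MB" unit (e.g. webgl-buffer-count) are skipped.
LINE_RE = re.compile(r"^\s*([\d,]+\.\d+) MB ── (\S+)\s*$")


def parse_memory_report(text):
    """Return {counter_name: size_in_mb} for every MB-valued line."""
    sizes = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            sizes[match.group(2)] = float(match.group(1).replace(",", ""))
    return sizes


sample = """\
18,989.10 MB ── gpu-committed
1,506.33 MB ── gpu-dedicated
114.39 MB ── gpu-shared
19,225.47 MB ── private
613.02 MB ── resident
0 ── unresolved-ipc-responses
"""

report = parse_memory_report(sample)
ratio = report["gpu-committed"] / report["gpu-dedicated"]
print(f"gpu-committed is {ratio:.1f}x gpu-dedicated")
```

In the excerpt, gpu-committed and private are both around 19 GB while resident stays near 600 MB, which is the pattern the report describes.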
Comment 2 • 10 months ago
The Bugbug bot thinks this bug should belong to the 'Core::Graphics' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 3 • 10 months ago
Can you:
- Attach the contents of your about:support to this bug
- Use https://mozilla.github.io/mozregression/ tool to do a bisection and find the change/bug that leads to this issue?
Comment 4 • 10 months ago
Sotaro, any ideas what might have caused this, around the time of 126?
Comment 5 • 10 months ago
Alastor, any ideas about what might have caused this around 126?
Comment 6 • 10 months ago
I wonder if Bug 1888354 might be related to this bug.
From Bug 1313883, with AMD GPUs, it seems necessary to use the same device for hardware video decoders.
I did the bisect twice to be sure.
2024-05-16T09:12:06.456000: INFO : Narrowed integration regression window from [c6228adc, 236ed2ef] (3 builds) to [c6228adc, b4e3dcc4] (2 builds) (~1 steps left)
2024-05-16T09:12:07.770000: INFO : The bisection is done.
2024-05-16T09:12:07.771000: INFO : Stopped
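For anyone collecting several bisection runs, the final window can be pulled out of the mozregression log with a regular expression. This is a sketch that assumes the "Narrowed ... regression window" line format shown above; `last_window` is a hypothetical helper, and it assumes the first changeset in each pair is the last good build and the second is the first bad one.

```python
import re

WINDOW_RE = re.compile(
    r"Narrowed (\w+) regression window from \[(\w+), (\w+)\] \((\d+) builds\) "
    r"to \[(\w+), (\w+)\] \((\d+) builds\)"
)


def last_window(log_text):
    """Return (good, bad) changeset prefixes from the last 'Narrowed' line, or None."""
    window = None
    for match in WINDOW_RE.finditer(log_text):
        # Groups 5 and 6 are the narrowed (new) window.
        window = (match.group(5), match.group(6))
    return window


log = (
    "2024-05-16T09:12:06.456000: INFO : Narrowed integration regression window "
    "from [c6228adc, 236ed2ef] (3 builds) to [c6228adc, b4e3dcc4] (2 builds) "
    "(~1 steps left)\n"
    "2024-05-16T09:12:07.770000: INFO : The bisection is done.\n"
)
print(last_window(log))
```

Running the bisection twice, as the reporter did, and comparing the two results is a cheap way to rule out a flaky good/bad verdict.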
Comment 9 • 10 months ago
Ashley, can you try reproducing this on your Radeon 780M?
Comment 10 (Assignee) • 10 months ago
Yes I think Bug 1888354 caused this, so it seems we should always reuse the device on AMD cards.
Comment 11 (Assignee) • 10 months ago
As bug 1896823 has disabled the device reuse for intel gen12, it doesn't
seem necessary to add this workaround to disable the device reuse.
We should revert this and investigate whether we can use DXVA decoder
directly via ffmpeg in Bug 1893427.
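The policy described in comments 10 and 11 can be summarized as a toy decision function. This is only an illustration of the reasoning in this thread, not Firefox code: the function name, the string vendor labels, and the `is_intel_gen12` flag are all hypothetical.

```python
def should_reuse_decoder_device(vendor, is_intel_gen12=False):
    """Toy model of the device-reuse policy discussed in this bug.

    Hypothetical helper: it mirrors the comments, not the actual Gecko logic.
    """
    # Per comment 11, bug 1896823 already disables reuse for Intel Gen12,
    # so no blanket "never reuse" workaround is needed.
    if vendor == "intel" and is_intel_gen12:
        return False
    # Everything else, including AMD (comment 10), reuses the device.
    return True


for vendor, gen12 in [("amd", False), ("intel", True), ("intel", False)]:
    print(vendor, gen12, should_reuse_decoder_device(vendor, gen12))
```

Under this reading, the revert restores the long-standing reuse behavior for AMD while keeping the narrower Intel Gen12 exception.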
Comment 12 • 10 months ago
I'm doing some regression testing on this to confirm whether we're cleaning up some objects correctly.
Comment 13 (Assignee) • 10 months ago
Could you help me verify if this build works for you? Thanks!
Comment 14 • 10 months ago
Comment 15 • 10 months ago
NeedInfo - I've verified this regression in committed memory usage when repeatedly recreating the video decode device (e.g. scrolling on YouTube Shorts) on current AMD drivers (Adrenalin 24.5.1), but could not repro the issue on older AMD drivers (I do not recall exact version but it was in 22.* series before I updated to latest).
Steps to reproduce:
- Open Task Manager, switch to Details view, make sure the Commit size column is enabled (right click any column header and choose Select Columns to open a dialog that lets you select the columns to show)
- Open any affected version of Firefox
- Go to https://www.youtube.com/shorts and play a video, scroll to see additional videos, scroll up and down, this causes a lot of video decode devices to be created and destroyed in a short time.
Observed behavior: Commit size of one of the Firefox.exe processes increases by gigabytes in a short time while scrolling through YouTube Shorts.
On my laptop with an AMD Ryzen 7840U (Radeon 780M graphics), the memory shows up in regular commit rather than gpu-commit, likely due to shared memory. The Dedicated GPU Memory and Shared GPU Memory columns did not appear to change much when reproducing the issue.
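When following the steps above, it can help to note the Commit size at intervals and compute a growth rate instead of eyeballing the column. This is a minimal sketch under stated assumptions: the function names and the 500 MB/min threshold are hypothetical, and the sample values are illustrative rather than measurements from this bug.

```python
def commit_growth_mb_per_min(samples):
    """samples: list of (seconds_elapsed, commit_mb) pairs in time order."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (c1 - c0) / ((t1 - t0) / 60.0)


def looks_like_leak(samples, threshold_mb_per_min=500.0):
    """Flag sustained growth above an arbitrary, user-chosen threshold."""
    return commit_growth_mb_per_min(samples) > threshold_mb_per_min


# Illustrative readings noted by hand from Task Manager while scrolling.
healthy = [(0, 1000.0), (60, 1050.0), (120, 1020.0)]
leaking = [(0, 1000.0), (60, 6000.0), (120, 11000.0)]

print(looks_like_leak(healthy), looks_like_leak(leaking))
```

Only the first and last samples are used, so a brief spike in the middle does not affect the result.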
Comment 16 • 10 months ago
I've now also successfully reproed the memory leak on Adrenalin 24.2.1 on a desktop (Ryzen 7950X3D + Radeon 7900XTX).
Comment 17 (Assignee) • 10 months ago
Could you help me verify if this build fixes the memory leak for you?
Comment 18 (Reporter) • 10 months ago
(In reply to Alastor Wu [:alwu] from comment #17)
Could you help me verify if this build fixes the memory leak for you?
Build looks good: GPU memory stays around 1 GB, about:support still shows hardware decoding on, and the Task Manager GPU view shows codec engine usage.
Comment 19 (Assignee) • 9 months ago
As bug 1896823 has disabled the device reuse for intel gen12, it doesn't
seem necessary to add this workaround to disable the device reuse.
We should revert this and investigate whether we can use DXVA decoder
directly via ffmpeg in Bug 1893427.
Original Revision: https://phabricator.services.mozilla.com/D210721
Comment 20 • 9 months ago
Comment 21 (Assignee) • 9 months ago
Comment on attachment 9402306 [details]
Bug 1897006 - revert the change of disabling reuse device.
Beta/Release Uplift Approval Request
- User impact if declined: High GPU memory usage
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This reverts the previous change of not reusing the decoder device and falls back to reusing the device, which is not a new feature and has been in use for a long while.
- String changes made/needed: No
- Is Android affected?: No
Comment 22 • 9 months ago
An AMD ticket to investigate the root cause of the leak has been filed. This doesn't seem to happen in other applications' playback scenarios. Is Firefox doing anything unusual when using the D3D11 device?
Comment 23 • 9 months ago (bugherder)
Comment 24 • 9 months ago
Comment on attachment 9402306 [details]
Bug 1897006 - revert the change of disabling reuse device.
Approved for 127 beta 4, thanks.
Comment 25 • 9 months ago (uplift)
Comment 26 • 9 months ago (bugherder uplift)
Comment 27 • 9 months ago
:alwu could you add a release uplift request?
We can include this in a Fx126 dot release
Comment 28 (Assignee) • 9 months ago
Comment on attachment 9402306 [details]
Bug 1897006 - revert the change of disabling reuse device.
Beta/Release Uplift Approval Request
- User impact if declined: High GPU memory usage for users with certain AMD cards
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: Yes
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This reverts the previous change of not reusing the decoder device and falls back to reusing the device, which is not a new feature and has been in use for a long while.
- String changes made/needed:
- Is Android affected?: No
Comment 30 • 9 months ago
Hello! I have reproduced the issue with Firefox 126.0 on Windows 10 x64 with an AMD 7800XT using Adrenalin 24.2.1 on a desktop machine by following the steps from comment 15. The GPU memory values in the about:processes page and in Task Manager > Commit size increase after scrolling through multiple short videos.
The issue no longer reproduces with Firefox 128.0a1 (2024-05-21) and 127.0b4 on the same machine using the steps from comment 15. The GPU memory stays at ~1 GB in about:processes and in Task Manager > Commit size when scrolling through YouTube Shorts videos.
Comment 31 • 9 months ago
(In reply to Donal Meehan [:dmeehan] from comment #27)
:alwu could you add a release uplift request?
We can include this in a Fx126 dot release
Here are some recent threads on Reddit that could be related to this issue, to help inform the uplift decision:
https://www.reddit.com/r/firefox/comments/1cwmuha/firefox_126_crashing_the_gpu_driver_when_watching/
https://www.reddit.com/r/firefox/comments/1cxkkvh/firefox_memory_leak/
https://www.reddit.com/r/firefox/comments/1cxskhw/firefox_running_into_memory_leaks_often/
Comment 32 • 9 months ago
Here are some recent threads on Reddit that could be related to this issue, to help inform the uplift decision:
https://www.reddit.com/r/firefox/comments/1cwmuha/firefox_126_crashing_the_gpu_driver_when_watching/
https://www.reddit.com/r/firefox/comments/1cxkkvh/firefox_memory_leak/
https://www.reddit.com/r/firefox/comments/1cxskhw/firefox_running_into_memory_leaks_often/
This will be included in the planned Fx126 dot release.
The uplift request I asked about will remain pending until closer to release preparation.
Comment 33 • 9 months ago
Comment on attachment 9402306 [details]
Bug 1897006 - revert the change of disabling reuse device.
Approved for 126.0.1
Comment 34 • 9 months ago (uplift)
Comment 36 • 9 months ago
Verified fixed with Firefox 126.0.1 on Windows 10 x64 using the steps from comment 15. The GPU memory stays at ~1 GB in about:processes and in Task Manager > Commit size when scrolling through YouTube Shorts videos.