Closed Bug 1897006 Opened 18 days ago Closed 15 days ago

GPU Process Hardware Decode Memory Leak (multiple GB) Windows AMD Leak since ff126

Categories

(Core :: Graphics, defect)

Firefox 126
defect

Tracking

()

VERIFIED FIXED
128 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox126 + verified
firefox127 + verified
firefox128 + verified

People

(Reporter: vincent, Assigned: alwu, NeedInfo)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: regression)

Attachments

(3 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0

Steps to reproduce:

Just browse some HTML5 video content, such as YouTube, 9GAG, etc.

This issue started once Firefox on my main PC (AMD 7900XTX) was updated to Firefox 126 (Developer Edition), and it started to affect my laptop (AMD 7840HS) once Firefox 126 stable was published.

Both are on the latest available drivers.

media.hardware-video-decoding.enabled = false seems to prevent this from happening.
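
As an illustration of the workaround above, the pref could be set persistently through a `user.js` file in the Firefox profile folder. The following is only a minimal sketch: the helper names and the profile path passed to `append_pref` are hypothetical, and the same change can be made by hand in about:config.

```python
from pathlib import Path

# Pref named in this report; setting it to false disables hardware video decoding.
PREF_NAME = "media.hardware-video-decoding.enabled"


def user_pref_line(name: str, value: bool) -> str:
    """Format a boolean pref in the user_pref(...) syntax used by user.js."""
    return f'user_pref("{name}", {"true" if value else "false"});'


def append_pref(profile_dir: Path, name: str, value: bool) -> Path:
    """Append the pref to user.js inside profile_dir.

    profile_dir is supplied by the caller (hypothetical here); the real
    location is listed under "Profile Folder" in about:support.
    """
    user_js = profile_dir / "user.js"
    with user_js.open("a", encoding="utf-8") as fh:
        fh.write(user_pref_line(name, value) + "\n")
    return user_js
```

Note that this trades the leak for software decoding, so it is a stopgap rather than a fix.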

Actual results:

Ever-increasing GPU process commit size (it reached around 60GB a few times before I manually killed the process, and it can climb to 10GB within a few minutes of scrolling videos). Paged size seems normal (i.e., memory is allocated but not used?).

Expected results:

Same behavior as with Firefox 125: a normal GPU process memory commit size.

YouTube Shorts seems to be the best way to reproduce; it got to 19GB in a few seconds.

Partial copy of about:memory for this GPU Process

        170.99 MB ── d3d11-shared-textures
          0.00 MB ── gfx-d2d-vram-draw-target
         36.00 MB ── gfx-d2d-vram-source-surface
     18,989.10 MB ── gpu-committed
      1,506.33 MB ── gpu-dedicated
        114.39 MB ── gpu-shared
         39.39 MB ── heap-allocated
          1.00 MB ── heap-chunksize
     19,225.47 MB ── private
        613.02 MB ── resident
        343.38 MB ── resident-unique
          0.00 MB ── shmem-allocated
        250.35 MB ── shmem-mapped
         44.91 MB ── system-heap-allocated
                0 ── unresolved-ipc-responses
  2,125,875.48 MB ── vsize
130,383,950.69 MB ── vsize-max-contiguous
               38 ── webgl-buffer-count
          0.02 MB ── webgl-buffer-memory
                1 ── webgl-context-count
                0 ── webgl-renderbuffer-count
          0.00 MB ── webgl-renderbuffer-memory
               20 ── webgl-shader-count
               10 ── webgl-texture-count
         21.21 MB ── webgl-texture-memory

Process Explorer only reports 2.5GB of committed GPU memory and 1.1GB of dedicated GPU memory.
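
To make the gap between committed and resident memory in the excerpt above easier to see, here is a rough Python sketch that parses about:memory lines of this shape and subtracts `resident` from `private`. The function names are made up for illustration, and treating `private` as the process commit charge is an assumption, not something the report confirms.

```python
import re

# A few lines copied from the about:memory excerpt in this report.
SAMPLE = """\
     18,989.10 MB ── gpu-committed
      1,506.33 MB ── gpu-dedicated
        114.39 MB ── gpu-shared
     19,225.47 MB ── private
        613.02 MB ── resident
                38 ── webgl-buffer-count
"""

# value, optional "MB" unit, the box-drawing separator, then the reporter name.
LINE_RE = re.compile(r"^\s*([\d,]+(?:\.\d+)?)\s*(MB)?\s*──\s*(\S+)\s*$")


def parse_about_memory(text):
    """Return {name: (value, unit)}; unit is 'MB' or None for plain counters."""
    entries = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            value = float(match.group(1).replace(",", ""))
            entries[match.group(3)] = (value, match.group(2))
    return entries


def committed_not_resident_mb(entries):
    """Private minus resident memory in MB (assumes 'private' ~ commit charge)."""
    return entries["private"][0] - entries["resident"][0]
```

With the sample lines, almost all of the private memory is not resident, which matches the observation that memory is committed without being paged in.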

The Bugbug bot thinks this bug should belong to the 'Core::Graphics' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Graphics
Product: Firefox → Core

Can you:

  1. Attach the contents of your about:support to this bug
  2. Use https://mozilla.github.io/mozregression/ tool to do a bisection and find the change/bug that leads to this issue?
Flags: needinfo?(vincent)

Sotaro, any ideas what might have caused this, around the time of 126?

Flags: needinfo?(sotaro.ikeda.g)

Alastor, any ideas about what might have caused this around 126?

Flags: needinfo?(alwu)

I wonder if Bug 1888354 might be related to this bug.

From Bug 1313883, it seems that with AMD GPUs it is necessary to use the same device for hardware video decoders.

Flags: needinfo?(sotaro.ikeda.g)
See Also: → 1313883

I did the bisect twice to be sure.

2024-05-16T09:12:06.456000: INFO : Narrowed integration regression window from [c6228adc, 236ed2ef] (3 builds) to [c6228adc, b4e3dcc4] (2 builds) (~1 steps left)
2024-05-16T09:12:07.770000: INFO : The bisection is done.
2024-05-16T09:12:07.771000: INFO : Stopped
Flags: needinfo?(vincent)
Keywords: regression
Regressed by: 1888354
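
For readers unfamiliar with how a bisection narrows a regression window like the one logged above, the idea can be sketched as a binary search over an ordered list of builds. This is a generic illustration, not mozregression's actual implementation; the build labels and the `is_bad` callback are hypothetical.

```python
from typing import Callable, Sequence


def bisect_first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad build.

    Assumes builds[0] is known good, builds[-1] is known bad, and that
    every build after the first bad one is also bad.
    """
    lo, hi = 0, len(builds) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return builds[hi]
```

Each test halves the window, which is why only a handful of builds need to be checked manually even across a large date range.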

Ashley, can you try reproducing this on your Radeon 780M?

Flags: needinfo?(ahale)

Yes, I think Bug 1888354 caused this, so it seems we should always reuse the device on AMD cards.

Assignee: nobody → alwu
Flags: needinfo?(alwu)
Status: UNCONFIRMED → NEW
Ever confirmed: true

As bug 1896823 has disabled the device reuse for intel gen12, it doesn't
seem necessary to add this workaround to disable the device reuse.

We should revert this and investigate whether we can use DXVA decoder
directly via ffmpeg in Bug 1893427.

I'm doing some regression testing on this to confirm whether we're cleaning up some objects correctly.

Could you help me verify if this build works for you? Thanks!

Flags: needinfo?(vincent)

NeedInfo - I've verified this regression in committed memory usage when repeatedly recreating the video decode device (e.g. scrolling on YouTube Shorts) on current AMD drivers (Adrenalin 24.5.1), but could not reproduce the issue on older AMD drivers (I do not recall the exact version, but it was in the 22.* series before I updated to the latest).

Steps to reproduce:

  • Open Task Manager, switch to Details view, make sure the Commit size column is enabled (right click any column header and choose Select Columns to open a dialog that lets you select the columns to show)
  • Open any affected version of Firefox
  • Go to https://www.youtube.com/shorts and play a video, then scroll up and down through additional videos. This causes a lot of video decode devices to be created and destroyed in a short time.

Observed behavior: Commit size of one of the Firefox.exe processes increases by gigabytes in a short time while scrolling through YouTube Shorts.
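
If the Commit size readings from the steps above are noted down over time, the growth can be quantified with a small helper like the one below. This is a sketch under stated assumptions: the samples are (seconds, megabytes) pairs collected by hand or by any tool, and the 10 MB/s threshold is an arbitrary illustrative value, not a figure from this bug.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (time in seconds, commit size in MB)


def growth_mb_per_s(samples: Sequence[Sample]) -> float:
    """Average commit growth between the first and last sample."""
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (m1 - m0) / (t1 - t0)


def looks_like_leak(samples: Sequence[Sample], threshold_mb_per_s: float = 10.0) -> bool:
    """Flag sustained growth above a hypothetical threshold."""
    return growth_mb_per_s(samples) > threshold_mb_per_s
```

For example, a process going from 1,000 MB to 7,000 MB over 60 seconds averages 100 MB/s and would be flagged, while one drifting from 1,000 MB to 1,010 MB would not.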

On my laptop with AMD Ryzen 7840U (Radeon 780M graphics), the memory shows up in regular commit rather than gpu-commit, likely due to shared memory. The Dedicated GPU Memory and Shared GPU Memory columns did not appear to change much when reproducing the issue.

Flags: needinfo?(ahale) → needinfo?(paul.blinzer)

I've now also successfully reproduced the memory leak on Adrenalin 24.2.1 on a desktop (Ryzen 7950X3D + Radeon 7900XTX).

Could you help me verify if this build fixes the memory leak for you?

Flags: needinfo?(ahale)

(In reply to Alastor Wu [:alwu] from comment #17)

Could you help me verify if this build fixes the memory leak for you?

The build looks good: GPU memory stays around 1GB, about:support still shows hardware decoding on, and Task Manager's GPU view shows codec engine usage.

Flags: needinfo?(vincent)
Severity: -- → S2

As bug 1896823 has disabled the device reuse for intel gen12, it doesn't
seem necessary to add this workaround to disable the device reuse.

We should revert this and investigate whether we can use DXVA decoder
directly via ffmpeg in Bug 1893427.

Original Revision: https://phabricator.services.mozilla.com/D210721

Attachment #9402543 - Flags: approval-mozilla-beta?
Attachment #9402543 - Attachment is obsolete: true
Attachment #9402543 - Flags: approval-mozilla-beta?
Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/07e212fd2be0
revert the change of disabling reuse device. r=sotaro

Comment on attachment 9402306 [details]
Bug 1897006 - revert the change of disabling reuse device.

Beta/Release Uplift Approval Request

  • User impact if declined: High GPU memory usage
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This reverts the previous change of not reusing the decoder device and falls back to reusing the device, which is not a new feature and has been in use for a long while.
  • String changes made/needed: No
  • Is Android affected?: No
Attachment #9402306 - Flags: approval-mozilla-beta?

An AMD ticket to investigate the root cause of the leak has been filed. This doesn't seem to happen in other applications' playback scenarios; is Firefox doing anything unusual when using the D3D11 device?

Flags: needinfo?(paul.blinzer)
Depends on: 1896823
Status: NEW → RESOLVED
Closed: 15 days ago
Resolution: --- → FIXED
Target Milestone: --- → 128 Branch

Comment on attachment 9402306 [details]
Bug 1897006 - revert the change of disabling reuse device.

Approved for 127 beta 4, thanks.

Attachment #9402306 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
See Also: → 1897875
Blocks: 1847453

:alwu could you add a release uplift request?
We can include this in a Fx126 dot release

Flags: needinfo?(alwu)

Comment on attachment 9402306 [details]
Bug 1897006 - revert the change of disabling reuse device.

Beta/Release Uplift Approval Request

  • User impact if declined: High GPU memory usage for users with certain AMD cards
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This reverts the previous change of not reusing the decoder device and falls back to reusing the device, which is not a new feature and has been in use for a long while.
  • String changes made/needed:
  • Is Android affected?: No
Flags: needinfo?(alwu)
Attachment #9402306 - Flags: approval-mozilla-release?
Duplicate of this bug: 1818982
See Also: → 1820789
QA Whiteboard: [qa-triaged]
Flags: qe-verify+

Hello! I have reproduced the issue with Firefox 126.0 on Windows 10 x64 with an AMD 7800XT using Adrenalin 24.2.1 on a desktop machine by following the steps from comment 15. The GPU memory value in the about:processes page and the Commit size value in Task Manager both increase after scrolling through multiple short videos.
The issue no longer reproduces with Firefox 128.0a1 (2024-05-21) and 127.0b4 on the same machine using the steps from comment 15. The GPU memory stays at ~1GB in about:processes and in Task Manager > Commit size when scrolling through YouTube Shorts videos.

QA Whiteboard: [qa-triaged]
Flags: qe-verify+
Blocks: 1820789
See Also: → 1897881

(In reply to Donal Meehan [:dmeehan] from comment #27)

:alwu could you add a release uplift request?
We can include this in a Fx126 dot release

Here are some recent threads on Reddit that could be related to this issue, to help inform the uplift decision:

https://www.reddit.com/r/firefox/comments/1cwmuha/firefox_126_crashing_the_gpu_driver_when_watching/
https://www.reddit.com/r/firefox/comments/1cxkkvh/firefox_memory_leak/
https://www.reddit.com/r/firefox/comments/1cxskhw/firefox_running_into_memory_leaks_often/

This will be included in the Fx126 planned dot release.
The uplift request I asked about will remain pending until closer to the release preparation.

Comment on attachment 9402306 [details]
Bug 1897006 - revert the change of disabling reuse device.

Approved for 126.0.1

Attachment #9402306 - Flags: approval-mozilla-release? → approval-mozilla-release+
Duplicate of this bug: 1897875

Verified fixed with Firefox 126.0.1 on Windows 10 x64 using the steps from comment 15. The GPU memory stays at ~1GB in about:processes and in Task Manager > Commit size when scrolling through YouTube Shorts videos.

Status: RESOLVED → VERIFIED
Has STR: --- → yes