Closed Bug 1663227 Opened 4 years ago Closed 4 years ago

100% RAM and VRAM usage

Categories

(Core :: Audio/Video: Playback, defect, P1)

80 Branch
defect

Tracking


VERIFIED FIXED
82 Branch
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 82+ verified
firefox80 --- wontfix
firefox81 - wontfix
firefox82 + verified

People

(Reporter: ttfh3500, Assigned: jya)

References

(Regression)

Details

(Keywords: memory-footprint, regression, reproducible)

Attachments

(9 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0

Steps to reproduce:

Open https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3080/#design-container
Animation starts

Actual results:

RAM usage increases to 8GB (from 1.5GB), VRAM usage increases to 4GB (from 250MB)
The system freezes

Additional notes:
Firefox version: 80.0.1 (64-bit)
Hardware acceleration is enabled
GPU is GTX 1050ti 4GB
8GB of RAM DDR4 2400MHz

Expected results:

Lower RAM usage

I can reproduce the issue on Nightly 82.0a1 (20200904094341) on Windows 10.

Status: UNCONFIRMED → NEW
Ever confirmed: true

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → Graphics: WebRender
Product: Firefox → Core
Has Regression Range: --- → yes
Has STR: --- → yes
Component: Graphics: WebRender → Audio/Video: Playback
Keywords: regression
Regressed by: 1582353
Attached file memory-report.json.gz
Flags: needinfo?(jyavenard)

Setting media.gpu-process-decoder to false seems to reduce memory usage.

The only thing I can think of is that we used to return one image at a time over IPC, but now we can return an array of images when draining the decoder.
However, overall this shouldn't have an impact, as the data would have been accumulated in the RemoteDecoderManagerChild anyway before being returned to the MDSM.

Having said that, I can't reproduce the issue with Nightly on Windows 10 x64 with an AMD 5700XT.
RAM usage stays constant (a very slight increase when the page is opened).
GPU memory stays steady too.

Can you reproduce the issue with this build? https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/G3z_OKsQR4CoIw5-yAocLw/runs/0/artifacts/public/build/install/sea/target.installer.exe

Flags: needinfo?(jyavenard) → needinfo?(alice0775)

I can't reproduce on Windows 10 64-bit with an NVIDIA 1050 Ti.
I also don't see how bug 1582353 could have anything to do with it.

The number of memory allocations and when they are freed stayed identical.

(In reply to Jean-Yves Avenard [:jya] from comment #6)

> Can you reproduce the issue with this build? https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/G3z_OKsQR4CoIw5-yAocLw/runs/0/artifacts/public/build/install/sea/target.installer.exe

I can still reproduce the issue on the try-build.

Flags: needinfo?(alice0775)

(In reply to Alice0775 White from comment #8)

> (In reply to Jean-Yves Avenard [:jya] from comment #6)
>
> > Can you reproduce the issue with this build? https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/G3z_OKsQR4CoIw5-yAocLw/runs/0/artifacts/public/build/install/sea/target.installer.exe
>
> I can still reproduce the issue on the try-build.

Thank you.

I've managed to reproduce the issue on a local build.

Assignee: nobody → jyavenard

Actually, no, I can't reproduce it; that was another bug I had locally in my working changes.

Are you certain of your regression range? With a build containing the given range, I can't reproduce it either.

Could you give these two builds a try to confirm the regression range:
https://drive.google.com/drive/folders/1XlXuCj1SqUNy1Y-PHrIBISfhEKHYK93x?usp=sharing

firefox-71.0a1.en-US.win64.with.installer.exe is a build using last year's code with the change from bug 1582353 applied.
firefox-71.0a1.en-US.win64.without.installer.exe is a build using last year's code without the change from bug 1582353.

Narrowing the regression would help greatly.

Thank you very much in advance.

Flags: needinfo?(alice0775)

(In reply to Jean-Yves Avenard [:jya] from comment #10)

> Actually, no, I can't reproduce it; that was another bug I had locally in my working changes.
>
> Are you certain of your regression range? With a build containing the given range, I can't reproduce it either.
>
> Could you give these two builds a try to confirm the regression range:
> https://drive.google.com/drive/folders/1XlXuCj1SqUNy1Y-PHrIBISfhEKHYK93x?usp=sharing
>
> firefox-71.0a1.en-US.win64.with.installer.exe is a build using last year's code with the change from bug 1582353 applied.
> firefox-71.0a1.en-US.win64.without.installer.exe is a build using last year's code without the change from bug 1582353.
>
> Narrowing the regression would help greatly.
>
> Thank you very much in advance.

firefox-71.0a1.en-US.win64.with.installer.exe: reproduces the issue
firefox-71.0a1.en-US.win64.without.installer.exe: does not reproduce

Flags: needinfo?(alice0775)
Attached file reduced.zip

Is there any chance you could post your about:support?

What exactly are you doing?

e.g., start Firefox in a clean profile, open https://drive.google.com/drive/folders/1XlXuCj1SqUNy1Y-PHrIBISfhEKHYK93x?usp=sharing and that's it?

The only thing special that I can see here is that Windows never uses a hardware decoder; it uses a software one instead.

(In reply to Jean-Yves Avenard [:jya] from comment #13)

> Is there any chance you could post your about:support?

Yes, I attached it.

> What exactly are you doing?
>
> e.g., start Firefox in a clean profile, open https://drive.google.com/drive/folders/1XlXuCj1SqUNy1Y-PHrIBISfhEKHYK93x?usp=sharing and that's it?

STEP

  1. Create a new profile from firefox.exe -p
  2. Start Firefox with the profile
  3. Open https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3080/#design-container
  4. Wait 10-20 seconds
    → observe memory usage increasing
Attached image image.png

This is very puzzling indeed.

I've tried three different machines (two AMD, one NVIDIA); I can't reproduce the issue, nor can I explain why bug 1582353 would cause that change.

Screen capture attached.

Could you play the video attached in your reduced.zip with the media devtools panel installed (https://addons.mozilla.org/en-US/firefox/addon/devtools-media-panel/)? Open the devtools (Ctrl+Shift+I), select the Media/Webrtc tab, and select the video (make it play in a loop to capture the data more easily).

What type of video decoder is being used? If you expand the JSON, the video entry should show something like "wmf hardware video decoder - nv12 (remote)".

Thank you once again for your patience on this.

Ok, I think I know how to reproduce it.

If I disable the HW decoder and force the Windows software decoder, then I see a leak.

Not sure why the HW decoder would be disabled for you, though. Waiting for the answer to my previous question.

Edit: "wmf software video decoder - yuv420 (remote)"

yep, this is happening with SW decoding for some reason.

awesome, thanks

FYI, if I open the video file directly (and enable loop from the context menu), the memory problem is not observed.

The leak occurs when we return the images via an IPC MozPromise. It doesn't occur if we send the images via a dedicated async Output method.

It seems the issue is in the IPC framework.
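The sketch below is a simplified C++ model of the failure mode described above, not Gecko code. It assumes each decoded image is a parent-side resource that is freed only when the consumer explicitly recycles it; `ImagePool`, `DecodeRequest` and `SimulateLeak` are hypothetical names, and the real MozPromise/IPDL machinery is not modeled. If results are delivered through a promise whose consumer has already disconnected (for example after a seek), the resolve callback never runs and the images are never recycled.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for the parent-side image allocator: every image it
// hands out stays allocated until Recycle() is called for it.
class ImagePool {
 public:
  int Allocate() {
    ++mOutstanding;
    return mNextId++;
  }
  void Recycle(int /* aId */) {
    assert(mOutstanding > 0);
    --mOutstanding;
  }
  int Outstanding() const { return mOutstanding; }

 private:
  int mNextId = 0;
  int mOutstanding = 0;
};

// Hypothetical promise-like request: once the consumer has disconnected
// (e.g. after a seek), the resolve callback is simply never invoked.
struct DecodeRequest {
  bool mDisconnected = false;
  std::function<void(const std::vector<int>&)> mOnResolve;

  void Resolve(const std::vector<int>& aImages) {
    if (mDisconnected) {
      return;  // The images are dropped here; nobody calls Recycle().
    }
    mOnResolve(aImages);
  }
};

// Runs aRequests decode requests of aFrames images each, disconnecting every
// second request, and returns how many images were never recycled.
int SimulateLeak(int aRequests, int aFrames) {
  ImagePool pool;
  for (int i = 0; i < aRequests; ++i) {
    DecodeRequest request;
    request.mDisconnected = (i % 2 == 1);
    request.mOnResolve = [&pool](const std::vector<int>& aImages) {
      for (int id : aImages) {
        pool.Recycle(id);  // The consumer returns each image when done.
      }
    };
    std::vector<int> images;
    for (int f = 0; f < aFrames; ++f) {
      images.push_back(pool.Allocate());
    }
    request.Resolve(images);
  }
  return pool.Outstanding();
}
```

Under these assumptions, `SimulateLeak(10, 3)` leaves 15 images outstanding (five disconnected requests of three images each), and the count keeps growing for as long as the page keeps seeking.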

See Also: → 1639544

(In reply to Jean-Yves Avenard [:jya] from comment #20)

> The leak occurs when we return the images via an IPC MozPromise. It doesn't occur if we send the images via a dedicated async Output method.
>
> It seems the issue is in the IPC framework.

I assume cross-referencing bug 1639544 here was a mistake? It has nothing to do with this bug; it was a graphics driver issue. Most likely you wanted to add a different one.

See Also: 1639544
See Also: → 1648309
Depends on: 1664362

[Tracking Requested - why for this release]: Massive memory leak when constantly seeking in videos

This is a quick & dirty solution designed for easy uplift to beta.

We add a RemoteImageHolder container class that will take ownership of the image transferred across IPC.
Should the image not be received by the RemoteDecoderChild for whatever reason (either because the decode promise got disconnected or because the handler exited early), the parent will be notified to recycle the image.
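The ownership idea behind RemoteImageHolder can be sketched as a move-only RAII wrapper. This is a hypothetical illustration, not the actual class: `ImagePool`, `ImageHolder` and `SimulateWithHolders` are invented names, and "notifying the parent" is modeled as a direct `Recycle()` call rather than an IPC message. The holder recycles its image in its destructor unless the consumer calls `Take()`, so an image dropped because the promise was disconnected or the handler exited early is still returned.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical parent-side allocator: images stay allocated until
// Recycle() is called for them.
class ImagePool {
 public:
  int Allocate() {
    ++mOutstanding;
    return mNextId++;
  }
  void Recycle(int /* aId */) {
    assert(mOutstanding > 0);
    --mOutstanding;
  }
  int Outstanding() const { return mOutstanding; }

 private:
  int mNextId = 0;
  int mOutstanding = 0;
};

// Hypothetical analogue of RemoteImageHolder: owns one transferred image and
// asks the pool to recycle it on destruction, unless Take() was called.
class ImageHolder {
 public:
  ImageHolder(ImagePool* aPool, int aId) : mPool(aPool), mId(aId) {}
  ImageHolder(ImageHolder&& aOther) noexcept
      : mPool(std::exchange(aOther.mPool, nullptr)), mId(aOther.mId) {}
  ImageHolder(const ImageHolder&) = delete;
  ImageHolder& operator=(const ImageHolder&) = delete;
  ImageHolder& operator=(ImageHolder&&) = delete;
  ~ImageHolder() {
    if (mPool) {
      mPool->Recycle(mId);  // Never consumed: notify the owner.
    }
  }

  // The consumer takes the image; from now on it is responsible for it.
  int Take() {
    assert(mPool != nullptr);
    mPool = nullptr;
    return mId;
  }

 private:
  ImagePool* mPool;
  int mId;
};

// Every second request is disconnected before the consumer sees it, as in a
// page that keeps seeking. Returns the number of images never recycled.
int SimulateWithHolders(int aRequests, int aFrames) {
  ImagePool pool;
  for (int i = 0; i < aRequests; ++i) {
    std::vector<ImageHolder> images;
    images.reserve(aFrames);
    for (int f = 0; f < aFrames; ++f) {
      images.emplace_back(&pool, pool.Allocate());
    }
    const bool disconnected = (i % 2 == 1);
    if (!disconnected) {
      for (ImageHolder& holder : images) {
        pool.Recycle(holder.Take());  // Consumer uses, then returns, the image.
      }
    }
    // `images` goes out of scope here; holders that were never taken
    // recycle their image in the destructor.
  }
  return pool.Outstanding();
}
```

Under these assumptions `SimulateWithHolders(10, 3)` returns 0: the images of disconnected requests are recycled when their holders are destroyed. Making the holder move-only ensures that exactly one object is ever responsible for recycling a given image.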

IPDL binding code still makes many copies, not all of which we can remove. As such, the ArrayOfRemoteVideoData object is made refcounted so that we move a pointer instead, bypassing const issues.
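The reason for making the array refcounted can be illustrated with a small C++ model, assuming generated glue code that only exposes the payload as a `const&`, so its contents cannot be moved out. Here `std::shared_ptr` stands in for Gecko's refcounting, and `Frame`, `PassThroughConstRef` and the two counting functions are hypothetical; the sketch only counts deep copies and says nothing about the actual IPDL code paths.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical decoded frame that counts how often it is deep-copied.
struct Frame {
  static int sCopies;
  std::vector<unsigned char> mPixels;

  explicit Frame(std::size_t aSize) : mPixels(aSize) {}
  Frame(const Frame& aOther) : mPixels(aOther.mPixels) { ++sCopies; }
  Frame(Frame&&) = default;
};
int Frame::sCopies = 0;

// Hypothetical refcounted analogue of ArrayOfRemoteVideoData.
using SharedFrames = std::shared_ptr<std::vector<Frame>>;

// Stand-in for glue code that only hands us a const reference: the value
// can only be copied out of it, never moved.
template <typename T>
T PassThroughConstRef(const T& aValue) {
  return aValue;
}

// Passing a plain array through the const& glue copies every frame.
int CopiesForPlainArray(int aFrames) {
  Frame::sCopies = 0;
  std::vector<Frame> frames;
  frames.reserve(aFrames);
  for (int i = 0; i < aFrames; ++i) {
    frames.emplace_back(1024);
  }
  std::vector<Frame> received = PassThroughConstRef(frames);
  (void)received;
  return Frame::sCopies;
}

// Passing a refcounted array only copies the pointer and bumps the count.
int CopiesForSharedArray(int aFrames) {
  Frame::sCopies = 0;
  SharedFrames frames = std::make_shared<std::vector<Frame>>();
  frames->reserve(aFrames);
  for (int i = 0; i < aFrames; ++i) {
    frames->emplace_back(1024);
  }
  SharedFrames received = PassThroughConstRef(frames);
  (void)received;
  return Frame::sCopies;
}
```

In this model, `CopiesForPlainArray(4)` performs four frame copies, while `CopiesForSharedArray(4)` performs none, because copying the refcounted pointer leaves the frames themselves untouched.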

Depends on D90203

Fx81 is in RC already and this is an old issue.

Pushed by jyavenard@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ce029dee9e29 P1. Always process decoded video promise to avoid leaks. r=mattwoodrow
Keywords: leave-open
Severity: -- → S2
Priority: -- → P1
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/29db5f8655d0 P2. Wrap image in a RemoteImageHolder and use move semantics. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/a6b09fc8e9f8 P3. Have RemoteImageHandler handles image's lifecycle even in the parent. r=mattwoodrow

Is the leave-open still needed after P2/P3 land?

Flags: needinfo?(jyavenard)
Flags: needinfo?(jyavenard)
Keywords: leave-open
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch
Regressions: 1665945

Is this something we should consider uplifting to ESR78? Please nominate if so.

Flags: needinfo?(jyavenard)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #32)

> Is this something we should consider uplifting to ESR78? Please nominate if so.

Definitely.
I'll nominate the hackish approach, because it's going to be easier to backport.

Flags: needinfo?(jyavenard)

Comment on attachment 9175740 [details]
Bug 1663227 - P1. Always process decoded video promise to avoid leaks. r?mattwoodrow

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Causes gigabytes of memory usage within a few seconds. After a short while, it renders the browser unusable.
  • User impact if declined: Gigabytes of memory in use, never freed.
  • Fix Landed on Version: 82
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This is the quick & dirty patch that just plugs the main issue but doesn't resolve it entirely.
    The two later patches contain the proper fix; however, that fix is quite extensive, which makes its whole impact harder to assess.
  • String or UUID changes made by this patch: none
Attachment #9175740 - Flags: approval-mozilla-esr78?

Comment on attachment 9175740 [details]
Bug 1663227 - P1. Always process decoded video promise to avoid leaks. r?mattwoodrow

Thanks for the band-aid uplift fix. Approved for 78.4esr.

Attachment #9175740 - Flags: approval-mozilla-esr78? → approval-mozilla-esr78+
QA Whiteboard: [qa-triaged]

Hi Alice and reporter,
Can you please check out the fixes on Beta 82 and ESR? I can't reproduce this on my end with an AMD 5700XT, an NVIDIA GeForce GTX 1050, or Intel HD Graphics 4600.

Flags: needinfo?(ttfh3500)
Flags: needinfo?(alice0775)

The site seems to have changed. I can no longer reproduce the issue with the URL from comment #0 on the bad build.
However, I can reproduce the issue with reduced.zip on the bad build.

Reproduced the issue with reduced.zip on Nightly82a1(20200916095656) and esr78.3.0 Windows10.
And verified fix on Nightly82a1(20200916153738) Windows10.
And also verified fix on Firefox82.0b6.

However, I can still reproduce the issue with reduced.zip on 78.3.1esr windows10.

Flags: needinfo?(alice0775)

Marking 82 as verified as per Comment 17.
Alice, could you check it out now on ESR 78.4?

Flags: needinfo?(alice0775)

(In reply to Timea Cernea [:tbabos] from comment #39)

> Marking 82 as verified as per Comment 17.
> Alice, could you check it out now on ESR 78.4?

I can reproduce the issue on ESR 78.3.1.
And verified fix on ESR 78.4.0 RC build2(20201013163257).

Flags: needinfo?(alice0775)

Thanks Alice!
Closing the bug as Verified-fixed based on Comment 38 and Comment 40.

Status: RESOLVED → VERIFIED
QA Whiteboard: [qa-triaged]
Flags: qe-verify+
Flags: needinfo?(ttfh3500)
