Closed Bug 1743750 Opened 3 years ago Closed 3 years ago

High never-clearing heap-unclassified memory with VA-API and ffvpx

Categories

(Core :: Audio/Video: Playback, defect)

Firefox 96
defect

Tracking

()

VERIFIED FIXED
98 Branch
Tracking Status
firefox98 --- verified

People

(Reporter: tgnff242, Assigned: stransky)

References

(Blocks 1 open bug)

Details

(Keywords: nightly-community)

Attachments

(3 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:96.0) Gecko/20100101 Firefox/96.0

Steps to reproduce:

  1. You'll need a GPU that can decode VP9 for ffvpx to be used.
  2. In a new profile, set media.ffmpeg.vaapi.enabled to true and restart Firefox.
  3. To make the memory accumulate quickly, open a few 4K VP9 videos, each in its own window, and let them play at the highest speed for at least a few minutes.
  4. Open about:memory and watch the memory of the decoding process.
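
For watching the decoding process from outside the browser, a small helper can sample its resident set size. This is only a sketch under assumptions: it is Linux-only, `read_rss_kb` is a hypothetical helper name, and the PID of the decoding (RDD) process has to be looked up separately, for example in about:memory. RSS is a coarser figure than the heap-unclassified entry, but steady growth that never drops would point to the same kind of leak.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: returns the resident set size (VmRSS) of `pid`
 * in kB, or -1 if it cannot be read. Linux-only, since it parses
 * /proc/<pid>/status. */
long read_rss_kb(pid_t pid)
{
    char path[64];
    char line[256];
    long rss_kb = -1;

    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);

    FILE *fp = fopen(path, "r");
    if (!fp)
        return -1;

    /* Scan for the "VmRSS:" line and parse the number that follows it. */
    while (fgets(line, sizeof(line), fp)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            if (sscanf(line + 6, "%ld", &rss_kb) != 1)
                rss_kb = -1;
            break;
        }
    }

    fclose(fp);
    return rss_kb;
}
```

Calling `read_rss_kb()` every few seconds while the videos play and logging the result should be enough to see whether the number keeps climbing.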

Actual results:

As the videos are being decoded, a lot of memory accumulates under heap-unclassified in the decoding process, and it is never released unless the process is terminated.

Expected results:

The memory should be cleared normally, just like when ffvpx is disabled and ffmpeg is used instead.

This is not a regression. The issue has been present since ffvpx got VA-API support in Bug 1660336.

Has STR: --- → yes

Set the Severity to S4 for now. Feel free to change it.
Martin, would you mind having a look?

Severity: -- → S4
Flags: needinfo?(stransky)

Can you test again now that we have updated ffvpx to 4.4.1 (Bug 1652958)?
Thanks.

Flags: needinfo?(stransky) → needinfo?(tgnff242)

Yes, I can still reproduce it in 20220106090415.

Flags: needinfo?(tgnff242)

I see that too.

Assignee: nobody → stransky
Status: UNCONFIRMED → NEW
Ever confirmed: true

Do you use the radeon driver? I can reproduce it on radeon only; on Intel the memory usage is stable and low.

Flags: needinfo?(tgnff242)

It may not be related to radeon itself as playback via mpv does not show this bug.

Also, I see that with VP9 only. AV1 playback via ffvpx seems to be OK.

Yes, I'm using an AMD GPU. I can't check AV1 myself, however, as my GPU doesn't support it.

Just to be clear, if I disable ffvpx (media.ffvpx.enabled:false), Firefox can still decode VP9 through ffmpeg/VA-API without leaking memory.

Flags: needinfo?(tgnff242)

Thanks, tracking it down right now. The memory allocations are pretty regular and huge (1MB per sec) so it may be doable to find it.

Found it, the memory is allocated here:

#11 0x00005594427c6255 in malloc(size_t) (arg1=564105) at /raid/src2/memory/build/malloc_decls.h:51
#12 0x00007f97e489de06 in vlVaCreateBuffer
    (ctx=0x7f97e40ea4c0, context=<optimized out>, type=VASliceDataBufferType, size=564105, num_elements=<optimized out>, data=0x7f97df403000, buf_id=0x7f97e0bf3a04)
    at ../src/gallium/frontends/va/buffer.c:56
#13 0x00007f97ebd791de in vaCreateBuffer (dpy=0x7f97e2d07a40, context=20, type=VASliceDataBufferType, size=564105, num_elements=1, data=<optimized out>, buf_id=0x7f97e0bf3a04)
    at /usr/src/debug/libva-2.13.0-3.2.fc35.x86_64/va/va.c:1374
#14 0x00007f97ebb9ddd5 in vaCreateBuffer (dpy=0x7f97e2d07a40, context=20, type=VASliceDataBufferType, size=564105, num_elements=1, data=0x7f97df403000, buf_id=0x7f97e0bf3a04)
    at /raid/src2/media/ffvpx/mozva/mozva.c:230
#15 0x00007f97eb78351d in ff_vaapi_decode_make_slice_buffer
    (avctx=0x7f97e1dfa200, pic=0x7f97e2b88100, params_data=0x7f97e34fdb98, params_size=316, slice_data=0x7f97df403000, slice_size=564105)
    at /raid/src2/media/ffvpx/libavcodec/vaapi_decode.c:102
#16 0x00007f97eb78697c in vaapi_vp9_decode_slice (avctx=0x7f97e1dfa200, buffer=0x7f97df403000 "\204", size=564105) at /raid/src2/media/ffvpx/libavcodec/vaapi_vp9.c:160
#17 0x00007f97eb7e4ef0 in vp9_decode_frame (avctx=0x7f97e1dfa200, frame=0x7f97e3353a00, got_frame=0x7f97e34fdeac, pkt=0x7f97e415c880) at /raid/src2/media/ffvpx/libavcodec/vp9.c:1637
#18 0x00007f97eb709de3 in decode_simple_internal (avctx=0x7f97e1dfa200, frame=0x7f97e3353a00, discarded_samples=0x7f97e34fdef0) at /raid/src2/media/ffvpx/libavcodec/decode.c:329
#19 0x00007f97eb709a5a in decode_simple_receive_frame (avctx=0x7f97e1dfa200, frame=0x7f97e3353a00) at /raid/src2/media/ffvpx/libavcodec/decode.c:530
#20 0x00007f97eb705590 in decode_receive_frame_internal (avctx=0x7f97e1dfa200, frame=0x7f97e3353a00) at /raid/src2/media/ffvpx/libavcodec/decode.c:550
#21 0x00007f97eb705471 in avcodec_send_packet (avctx=0x7f97e1dfa200, avpkt=0x7f97e34fe080) at /raid/src2/media/ffvpx/libavcodec/decode.c:617
#22 0x00007f9810897580 in mozilla::FFmpegVideoDecoder<46465650>::DoDecode(mozilla::MediaRawData*, unsigned char*, int, bool*, nsTArray<RefPtr<mozilla::MediaData> >&)
    (this=0x7f97e1ad3400, aSample=0x7f97dddb74c0, aData=0x7f97e0b03000 "\204", aSize=593464, aGotFrame=0x7f97e34fe277, aResults=mozilla::MediaDataDecoder::DecodedData &)
    at /raid/src2/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp:494

vaCreateBuffer() is called without vaDestroyBuffer(). Not sure why it affects ffvpx and not system ffmpeg.

It's because ffvpx is built for the libva API < 1.0, so the buffers are not released.
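
To illustrate the pattern described above, here is a minimal, self-contained model of the leak. It does not call libva. `mock_driver`, `mock_create_buffer`, `mock_destroy_buffer` and `mock_decode_frame` are hypothetical stand-ins for the driver, `vaCreateBuffer()`, `vaDestroyBuffer()` and the per-frame decode path. The `caller_destroys` flag is an assumption about how the build-time API version changes behavior: when it is off, the caller relies on someone else to free the slice buffer, which matches the symptom in the backtrace.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver's buffer table (not libva). */
#define MAX_BUFFERS 64

typedef struct {
    void  *slots[MAX_BUFFERS]; /* NULL means the id is free */
    size_t live;               /* number of buffers currently allocated */
} mock_driver;

/* Stand-in for vaCreateBuffer(): returns a buffer id, or -1 on failure. */
int mock_create_buffer(mock_driver *drv, size_t size)
{
    for (int id = 0; id < MAX_BUFFERS; id++) {
        if (!drv->slots[id]) {
            drv->slots[id] = malloc(size);
            if (!drv->slots[id])
                return -1;
            drv->live++;
            return id;
        }
    }
    return -1; /* table full */
}

/* Stand-in for vaDestroyBuffer(): frees the buffer behind `id`, if any. */
void mock_destroy_buffer(mock_driver *drv, int id)
{
    if (id < 0 || id >= MAX_BUFFERS || !drv->slots[id])
        return;
    free(drv->slots[id]);
    drv->slots[id] = NULL;
    drv->live--;
}

/* Models decoding one frame: create a slice buffer, "render" it, and
 * destroy it only when caller_destroys is non-zero. With the flag off,
 * every frame leaves one buffer behind. */
int mock_decode_frame(mock_driver *drv, size_t slice_size, int caller_destroys)
{
    int id = mock_create_buffer(drv, slice_size);
    if (id < 0)
        return -1;

    /* The real code would submit the buffer for rendering here. */

    if (caller_destroys)
        mock_destroy_buffer(drv, id);
    return 0;
}
```

Running `mock_decode_frame()` in a loop with `caller_destroys` set to 0 makes `live` grow by one per frame, while setting it to 1 keeps `live` at zero. That is the difference between the leaking and the fixed behavior in this simplified model.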

Pushed by stransky@redhat.com:
https://hg.mozilla.org/integration/autoland/rev/70a0af56b87c
Provide more logging to ffmpeg decoder init r=alwu,media-playback-reviewers
https://hg.mozilla.org/integration/autoland/rev/5a706be23b7a
Build bundled ffvpx with VA-API 1.0 support r=alwu
https://hg.mozilla.org/integration/autoland/rev/336a9119e1ee
Add missing VA-API 1.0 function wrappers r=alwu
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch
Flags: qe-verify+

Hi tgn-ff! I'm not sure whether I'm seeing the same issue you reported in comment 0 on an affected Nightly build. I get approximately the same results on a fixed build, Beta 98.0b8, under Ubuntu 18.04 x64.

Could you please help us check whether the issue is fixed on your end?

Flags: needinfo?(tgnff242)

I had verified the fix at the time.

However, I too started seeing something similar recently (~4 days ago), although it's likely a bug somewhere else, since it affects the content process instead of the RDD, and I've encountered it 1) with both ffvpx and ffmpeg, 2) with VA-API both enabled and disabled, and 3) with both online and offline videos. It's also very intermittent, and I had no luck reproducing the issue at all when using mozregression. I've tried to use DMD with the nightly build, but the output isn't useful [1]. I'm not a developer, and I'm not sure how to track it down.

[1]: The memory is allocated at:

    #02: ???[/opt/Software/firefox/firefox-bin +0x2562c]
    #03: ???[/opt/Software/firefox/firefox-bin +0x22e5a]
    #04: malloc[/opt/Software/firefox/firefox-bin +0x6d851]
    #05: ???[/opt/Software/firefox/libxul.so +0x528f509]
    #06: ???[/opt/Software/firefox/libxul.so +0x528eda4]
    #07: ???[/opt/Software/firefox/libxul.so +0x528eda4]
    #08: ???[/opt/Software/firefox/libxul.so +0x528eda4]
Flags: needinfo?(tgnff242)

The issue I was referring to in comment 18 is new: Bug 1757184.

This bug here is definitely fixed.

That's great! Thank you for taking the time to verify this fix. Closing this as verified fixed per comment 18.

Status: RESOLVED → VERIFIED
Flags: qe-verify+