Closed Bug 1762725 Opened 4 years ago Closed 3 years ago

Crash in [@ VideoFrameSurface::ReleaseVAAPIData]

Categories

(Core :: Audio/Video: Playback, defect, P3)

Desktop
Linux
defect

Tracking

()

RESOLVED FIXED
102 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox99 --- unaffected
firefox100 --- wontfix
firefox101 --- fixed
firefox102 --- fixed

People

(Reporter: toadking, Assigned: stransky)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(4 files, 2 obsolete files)

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/6991c31e-0d0f-4128-8ed6-d9c440220402

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0 libavutil.so.57 av_log 
1 libavutil.so.57 av_buffer_pool_uninit 
2 libavutil.so.57 av_buffer_unref 
3 libxul.so mozilla::VideoFrameSurface<59>::ReleaseVAAPIData dom/media/platforms/ffmpeg/FFmpegVideoFramePool.cpp:59
4 libxul.so mozilla::VideoFramePool<59>::ReleaseUnusedVAAPIFrames dom/media/platforms/ffmpeg/FFmpegVideoFramePool.cpp:90
5 libxul.so mozilla::FFmpegVideoDecoder<59>::DoDecode dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp:815
6 libxul.so mozilla::FFmpegDataDecoder<59>::DoDecode dom/media/platforms/ffmpeg/FFmpegDataDecoder.cpp:192
7 libxul.so mozilla::FFmpegDataDecoder<59>::ProcessDecode dom/media/platforms/ffmpeg/FFmpegDataDecoder.cpp:146
8 libxul.so mozilla::detail::ProxyRunnable<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, true>, RefPtr<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, true> >  xpcom/threads/MozPromise.h:1538
9 libxul.so mozilla::TaskQueue::Runner::Run xpcom/threads/TaskQueue.cpp:196

Another possible FFMPEG 5.0 crash like Bug 1759137. Got it a couple times when trying to seek through a VOD on Twitch, but it eventually stopped happening.

Thanks for the report! The stats for this signature show it starting recently and increasing steadily. It looks like something introduced late in the Fx 100 cycle that is now in beta.

Martin, are you aware of any recent changes that could cause this?

Severity: -- → S2
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(stransky)
Keywords: crash
Priority: -- → P3

Can you run Firefox with the MOZ_LOG="PlatformDecoderModule:5" env variable set, make it crash, and attach the log here?
Thanks.

Flags: needinfo?(stransky) → needinfo?(toadking)
Summary: Crash in [@ av_log] → [FFMPEG 5.0] Crash in [@ av_log]

Attached the trace with that logging enabled. I browsed through some videos on Twitter. The first few worked, then I tried to seek in one of the videos and it crashed.

Crash: https://crash-stats.mozilla.org/report/index/36e15db5-05d3-4f61-8639-a81600220409

Build 20220408094506, Linux 64-bit.

Flags: needinfo?(toadking)

That's NVDEC so NVIDIA closed source drivers, right?

Please attach the log with

MOZ_LOG="PlatformDecoderModule:5, Dmabuf:5"

env variables.
Thanks.

Flags: needinfo?(toadking)
Attachment #9271571 - Attachment is obsolete: true
Flags: needinfo?(toadking)

Do I understand correctly that RDD VA-API process crashed and video decoding was restarted without VA-API?

Flags: needinfo?(toadking)
Assignee: nobody → stransky
Status: NEW → ASSIGNED
Attachment #9273222 - Attachment description: Bug 1762725 [Linux] ReleaseVAAPIData(): Clear mLib to avoid doube free of mHWAVBuffer/mAVHWDeviceContext r?alwu → Bug 1762725 [Linux] ReleaseVAAPIData(): Clear mLib to avoid possible doube free of mHWAVBuffer/mAVHWDeviceContext r?alwu

I don't expect the patches here will fix that but we need more logs for it.

Summary: [FFMPEG 5.0] Crash in [@ av_log] → [FFMPEG 5.0][NVIDIA] Crash in [@ av_log]

(In reply to Martin Stránský [:stransky] (ni? me) from comment #7)

Do I understand correctly that RDD VA-API process crashed and video decoding was restarted without VA-API?

That appears to be the case. I don't know how to explicitly check if video decoding is done with or without VAAPI but I can't get it to crash on a video again after seeking until I refresh the page.

I'm also not sure if this bug is NVIDIA only since the crash signature seems to affect NVIDIA, Intel, and AMD graphics adapters.

Flags: needinfo?(toadking)

Definitely not NVIDIA-only. I've had this crash happen on an Intel-only system (e.g. bp-5c4f7bfa-6b86-4acb-9354-5d7dc0220415 and bp-2d770fba-61c6-4725-b395-d7a4d0220415).

PS: This and bug 1759596 (av_vlog at the top instead of av_log) are probably the same issue.

Summary: [FFMPEG 5.0][NVIDIA] Crash in [@ av_log] → [FFMPEG 5.0] Crash in [@ av_log]
Crash Signature: [@ av_log] → [@ av_log] [@ av_vlog]
Summary: [FFMPEG 5.0] Crash in [@ av_log] → [FFMPEG 5.0] Crash in [@ VideoFrameSurface::ReleaseVAAPIData]
Crash Signature: [@ av_log] [@ av_vlog] → [@ av_log] [@ av_vlog] [@ libavutil.so.56@0x271c9] [@ mozalloc_abort | abort | libavcodec.so.59@0x33bcda] [@ mozilla::detail::MutexImpl::lock | mozilla::FFmpegDataDecoder<T>::ProcessDecode] [@ vaapi_buffer_free | buffer_pool_free | buffer_replace] [@ vaa…
Crash Signature: [@ av_log] [@ av_vlog] [@ libavutil.so.56@0x271c9] [@ mozalloc_abort | abort | libavcodec.so.59@0x33bcda] [@ mozilla::detail::MutexImpl::lock | mozilla::FFmpegDataDecoder<T>::ProcessDecode] [@ vaapi_buffer_free | buffer_pool_free | buffer_replace] [@ vaa… → [@ av_log] [@ av_vlog] [@ libavutil.so.56@0x271c9] [@ mozalloc_abort | abort | libavcodec.so.59@0x33bcda] [@ mozilla::detail::MutexImpl::lock | mozilla::FFmpegDataDecoder<T>::ProcessDecode] [@ vaapi_buffer_free | buffer_pool_free | buffer_replace] […
Regressed by: 1750760

Set release status flags based on info from the regressing bug 1750760

Pushed by stransky@redhat.com: https://hg.mozilla.org/integration/autoland/rev/c362f710befa [Linux] Add more logs to LockVAAPIData()/ReleaseVAAPIData r=alwu https://hg.mozilla.org/integration/autoland/rev/40fb5163a33e [Linux] ReleaseVAAPIData(): Clear mLib to avoid possible doube free of mHWAVBuffer/mAVHWDeviceContext r=alwu
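
The second patch title above describes clearing mLib so that a repeated ReleaseVAAPIData() call cannot free the same buffers twice. The patch itself is not quoted in this thread, so the following is only a minimal sketch of that general pattern. `FakeLib`, `SurfaceSketch`, and `UnrefCallsAfterDoubleRelease` are hypothetical names, not Firefox or FFmpeg code; only the member name `mLib` is borrowed from the patch title.

```cpp
#include <cassert>

// Hypothetical stand-in for the FFmpeg library wrapper; it only counts
// how many times an "unref" was requested.
struct FakeLib {
  int unrefCalls = 0;
  void Unref() { ++unrefCalls; }
};

// Hypothetical surface holder. Release() clears mLib after the first
// call, so a second call becomes a no-op instead of a double free.
class SurfaceSketch {
 public:
  void Lock(FakeLib* aLib) { mLib = aLib; }

  void Release() {
    if (!mLib) {
      return;  // Already released, nothing left to free.
    }
    mLib->Unref();
    mLib = nullptr;  // Guard against a second release.
  }

 private:
  FakeLib* mLib = nullptr;
};

// Locks once, releases twice, and reports how many unrefs happened.
int UnrefCallsAfterDoubleRelease() {
  FakeLib lib;
  SurfaceSketch surface;
  surface.Lock(&lib);
  surface.Release();
  surface.Release();
  return lib.unrefCalls;
}
```

Calling `UnrefCallsAfterDoubleRelease()` exercises the guard: the second `Release()` returns early because `mLib` was cleared by the first one.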

Can you please test latest nightly and attach MOZ_LOG="PlatformDecoderModule:5" log here?
Thanks.

Flags: needinfo?(toadking)
Has Regression Range: --- → yes

I can reproduce that with ffmpeg 5.0 + h264 vaapi decoder. Happens when video is rewound.

This one reverts Bug 1759137 and switch back to reference hw_frames_ctx. We need that as hw_frames_ctx holds surface pool which can be recreated when h.264 VA-API decoder seeks in video stream.

This bug is caused by patch from Bug 1759137 which was wrong. I reopened Bug 1759137 for further investigation.
While testing on ffmpeg 5.0.1 I can clearly reproduce this one, but I can't replicate the crash from Bug 1759137. Let's keep an eye on that.
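
To illustrate the reasoning in the comment above, here is a rough model of why a surface may need to reference the frames context rather than only the device context. It uses `std::shared_ptr` as a stand-in for FFmpeg's reference-counted buffers. All types and functions here (`DeviceContext`, `FramesContext`, `Decoder`, `Surface`, and the two helpers) are hypothetical simplifications, not the FFmpeg or Firefox API, and the sketch assumes, as the comment states, that the decoder may recreate its frames context (and with it the surface pool) on seek.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a ref-counted hardware device context.
struct DeviceContext {};

// Hypothetical stand-in for a ref-counted frames context. It owns the
// surface pool and keeps the device alive.
struct FramesContext {
  std::shared_ptr<DeviceContext> device;
  std::vector<int> surfacePool;  // Placeholder for VA surfaces.
  explicit FramesContext(std::shared_ptr<DeviceContext> aDevice)
      : device(std::move(aDevice)), surfacePool{1, 2, 3} {}
};

// Hypothetical decoder: a seek drops the old frames context and creates
// a new one on the same device.
struct Decoder {
  std::shared_ptr<DeviceContext> device = std::make_shared<DeviceContext>();
  std::shared_ptr<FramesContext> frames =
      std::make_shared<FramesContext>(device);
  void Seek() { frames = std::make_shared<FramesContext>(device); }
};

// Hypothetical surface that holds a reference to the frames context.
struct Surface {
  std::shared_ptr<FramesContext> framesRef;
  void Lock(const Decoder& aDecoder) { framesRef = aDecoder.frames; }
  void Release() { framesRef.reset(); }
};

// With a frames-context reference, the old pool outlives the seek and is
// freed only once the surface releases it.
bool PoolSurvivesSeekWithFramesRef() {
  Decoder decoder;
  std::weak_ptr<FramesContext> oldPool = decoder.frames;
  Surface surface;
  surface.Lock(decoder);
  decoder.Seek();
  bool aliveAfterSeek = !oldPool.expired();
  surface.Release();
  return aliveAfterSeek && oldPool.expired();
}

// With only a device reference, the old pool is gone right after the
// seek, so any surface still pointing into it would dangle.
bool PoolGoneAfterSeekWithDeviceRefOnly() {
  Decoder decoder;
  std::weak_ptr<FramesContext> oldPool = decoder.frames;
  std::shared_ptr<DeviceContext> deviceRef = decoder.device;
  decoder.Seek();
  return oldPool.expired();
}
```

In this model both helpers return true: holding the frames context keeps the old pool alive across the seek, while holding only the device does not. Whether the real crash follows exactly this path is not established in this thread.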

Keywords: leave-open
OS: Unspecified → Linux
Regressed by: 1759137
No longer regressed by: 1750760
Hardware: Unspecified → Desktop

Sorry for the late update. Not sure how useful this will be now but here's another log with the extra logging patches on latest nightly.

Crash: https://crash-stats.mozilla.org/report/index/a76e65cd-4a68-434a-b923-faf0d0220430

Attachment #9273090 - Attachment is obsolete: true
Flags: needinfo?(toadking)

No idea if related, but I see Nightly crashes with FFmpeg 4.4.2: https://bugzilla.mozilla.org/show_bug.cgi?id=1767431.

Set release status flags based on info from the regressing bug 1759137

Can this bug report title be updated, and the FFmpeg version [5.0] be removed as it is also happening with 4.4.2?

Summary: [FFMPEG 5.0] Crash in [@ VideoFrameSurface::ReleaseVAAPIData] → Crash in [@ VideoFrameSurface::ReleaseVAAPIData]
Pushed by stransky@redhat.com: https://hg.mozilla.org/integration/autoland/rev/32ba4b5dfd3e [VA-API] Keep AVHWFrameContext instead of AVHWDeviceContext r=alwu

I do not have two-factor authentication enabled, so can’t comment at Phabricator. So, here:

This one reverts Bug 1759137 and switch back to reference hw_frames_ctx. We need that as hw_frames_ctx holds surface pool which can be recreated when h.264 VA-API decoder seeks in video stream.

s/switch/switches/

Crash Signature: [@ av_log] [@ av_vlog] [@ libavutil.so.56@0x271c9] [@ mozalloc_abort | abort | libavcodec.so.59@0x33bcda] [@ mozilla::detail::MutexImpl::lock | mozilla::FFmpegDataDecoder<T>::ProcessDecode] [@ vaapi_buffer_free | buffer_pool_free | → [@ av_log] [@ av_vlog] [@ av_opt_get ] [@ libavutil.so.56@0x271c9] [@ mozalloc_abort | abort | libavcodec.so.59@0x33bcda] [@ mozilla::detail::MutexImpl::lock | mozilla::FFmpegDataDecoder<T>::ProcessDecode] [@ vaapi_buffer_free | buffer_pool_free |
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 102 Branch

Comment on attachment 9274447 [details]
Bug 1762725 [VA-API] Keep AVHWFrameContext instead of AVHWDeviceContext r?alwu

Beta/Release Uplift Approval Request

  • User impact if declined: Crashes in VA-API video playback on Linux
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Reverts to previous state - backout of Bug 1759137 which was wrong.
    Also disabled by default but it's widely used so it will reduce BZ noise.
  • String changes made/needed: none
  • Is Android affected?: No
Attachment #9274447 - Flags: approval-mozilla-beta?
Attachment #9273221 - Flags: approval-mozilla-beta?
Attachment #9273222 - Flags: approval-mozilla-beta?

I can confirm, it’s fixed for me in 102.0a1, 20220505185614.

(In reply to abonnements from comment #32)

Maybe there are separate issues, for me it's still crashing in 102.0a1 (2022-05-05) (64-bit)
bp-023fce89-c2b3-4db0-953f-5023d0220506
bp-33f3ce2e-e26e-4c62-bd45-baba10220506
bp-76134918-de43-4dcf-92c3-4cf9f0220506
bp-956e87ce-8660-4a96-88cd-ebb860220506
bp-a68ced14-382e-4379-9ba3-d65d10220506

That's something different.

(In reply to Martin Stránský [:stransky] (ni? me) from comment #33)

That's something different.

Why was it marked as a duplicate then? https://bugzilla.mozilla.org/show_bug.cgi?id=1759596

(In reply to abonnements from comment #34)

(In reply to Martin Stránský [:stransky] (ni? me) from comment #33)

That's something different.

Why was it marked as a duplicate then? https://bugzilla.mozilla.org/show_bug.cgi?id=1759596

That may be a mistake. But that bug contains various comments/signatures. Please file a new one for that. The

Assertion src->f->buf[0] failed at libavcodec/h264_picture.c:105

may be valid here.

Comment on attachment 9273221 [details]
Bug 1762725 [Linux] Add more logs to LockVAAPIData()/ReleaseVAAPIData r?alwu

The first two patches landed on 101 prior to the Nightly->Beta merge.

Attachment #9273221 - Flags: approval-mozilla-beta?
Attachment #9273222 - Flags: approval-mozilla-beta?

Comment on attachment 9274447 [details]
Bug 1762725 [VA-API] Keep AVHWFrameContext instead of AVHWDeviceContext r?alwu

Approved for 101.0b4.

Attachment #9274447 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

([@ mozalloc_abort | abort | libavcodec.so.59@0x33bcda] is bug 1759596.)

Crash Signature: [@ av_log] [@ av_vlog] [@ av_opt_get ] [@ libavutil.so.56@0x271c9] [@ mozalloc_abort | abort | libavcodec.so.59@0x33bcda] [@ mozilla::detail::MutexImpl::lock | mozilla::FFmpegDataDecoder<T>::ProcessDecode] [@ vaapi_buffer_free | buffer_pool_free | b… → [@ av_log] [@ av_vlog] [@ av_opt_get ] [@ libavutil.so.56@0x271c9] [@ mozilla::detail::MutexImpl::lock | mozilla::FFmpegDataDecoder<T>::ProcessDecode] [@ vaapi_buffer_free | buffer_pool_free | buffer_replace] [@ vaapi_buffer_free | pool_release_buff…