Closed Bug 1752282 Opened 2 years ago Closed 2 years ago

Crash in [@ mozilla::VideoFrameSurfaceVAAPI::ReleaseVAAPIData]

Categories

(Core :: Audio/Video: Playback, defect)

Desktop
Linux
defect

Tracking

VERIFIED FIXED
98 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox96 --- unaffected
firefox97 --- unaffected
firefox98 + verified

People

(Reporter: pascalc, Assigned: stransky)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

Attachments

(1 file)

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/f354b2cb-01e5-4383-b43a-4a0e80220127

Reason: SIGSEGV / SI_KERNEL

Top 10 frames of crashing thread:

0 libxul.so mozilla::VideoFrameSurfaceVAAPI::ReleaseVAAPIData dom/media/platforms/ffmpeg/FFmpegVideoFramePool.cpp:58
1 libxul.so mozilla::VideoFrameSurfaceVAAPI::~VideoFrameSurfaceVAAPI dom/media/platforms/ffmpeg/FFmpegVideoFramePool.cpp:73
2 libxul.so mozilla::VideoFrameSurfaceVAAPI::~VideoFrameSurfaceVAAPI dom/media/platforms/ffmpeg/FFmpegVideoFramePool.cpp:69
3 libxul.so mozilla::VideoFramePool::GetVideoFrameSurface dom/media/platforms/ffmpeg/FFmpegVideoFramePool.cpp:142
4 libxul.so mozilla::FFmpegVideoDecoder<58>::CreateImageVAAPI dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp:1126
5 libxul.so mozilla::FFmpegVideoDecoder<58>::DoDecode dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp:861
6 libxul.so mozilla::FFmpegDataDecoder<58>::DoDecode dom/media/platforms/ffmpeg/FFmpegDataDecoder.cpp:192
7 libxul.so mozilla::FFmpegDataDecoder<58>::ProcessDecode dom/media/platforms/ffmpeg/FFmpegDataDecoder.cpp:146
8 libxul.so mozilla::detail::ProxyRunnable<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, true>, RefPtr<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, true> >  xpcom/threads/MozPromise.h:1536
9 libxul.so mozilla::TaskQueue::Runner::Run xpcom/threads/TaskQueue.cpp:206

Bug 1751710 seems related to this crash signature and landed in this build. Martin, Emilio, can you confirm? Thanks.

Flags: needinfo?(stransky)
Flags: needinfo?(emilio)
Regressed by: 1751710

Will look at it, thanks.

Flags: needinfo?(stransky)
Flags: needinfo?(emilio)

I think that comes from mixed dmabuf/vaapi surfaces and will be fixed by Bug 1752097.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → DUPLICATE
Has Regression Range: --- → yes

This is actually something different; I can reproduce it.

Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Regressed by: 1724385
No longer regressed by: 1751710
  • Right now we mark a VideoFrameSurface as used in the VideoFrameSurface constructor (for newly created surfaces) and in
    GetFreeVideoFrameSurface() (for recycled ones).

    This patch removes both and instead marks the surface as used in VideoFramePool::GetVideoFrameSurface(), in both cases,
    once the VideoFrameSurface is really used.

  • Call av_buffer_unref() only if the VideoFrameSurface is locked, i.e. we have a valid mAVHWFramesContext/mHWAVBuffer.
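
The first point of the patch description can be pictured with a minimal sketch. This is not the Gecko code: `Surface`, `Pool`, `GetSurface()`, `GetFreeSurface()` and the `importOk` flag are hypothetical names invented for illustration. It only shows the idea of marking a surface as used at a single hand-out point, so a surface that is never handed out stays recyclable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-in for VideoFrameSurface (not the Gecko class).
struct Surface {
  bool used = false;  // No longer set by the constructor.
  void MarkAsUsed() { used = true; }
  void Release() { used = false; }
};

class Pool {
 public:
  // The single hand-out point. `importOk` is an assumed flag that models
  // whether the decoded frame could actually be attached to the surface.
  Surface* GetSurface(bool importOk) {
    Surface* surface = GetFreeSurface();
    if (!surface) {
      mSurfaces.push_back(std::make_unique<Surface>());
      surface = mSurfaces.back().get();
    }
    if (!importOk) {
      return nullptr;  // The surface was never marked, so it stays recyclable.
    }
    surface->MarkAsUsed();  // New and recycled surfaces are both marked here.
    assert(surface->used);
    return surface;
  }

  std::size_t Size() const { return mSurfaces.size(); }

 private:
  // Only looks up an unused surface; it does not mark it.
  Surface* GetFreeSurface() {
    for (auto& s : mSurfaces) {
      if (!s->used) return s.get();
    }
    return nullptr;
  }

  std::vector<std::unique_ptr<Surface>> mSurfaces;
};
```

With this shape, a failed hand-out leaves the pool with one unused surface that the next successful call picks up, instead of leaking a surface that is flagged as used but holds no data.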

Assignee: nobody → stransky
Depends on: 1752097

This crash happens when we fail to create an EGLImage over decoded dmabuf memory (right now due to the RDD sandbox, Bug 1751363) and then release a VideoFrameSurface that does not have mLib/mAVHWFramesContext/mHWAVBuffer set (i.e. it's not locked).
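
A rough sketch of the guarded release, under stated assumptions: `FakeBufferRef`, `FakeBufferUnref()` and this `Surface` are hypothetical stand-ins, not FFmpeg's AVBufferRef, av_buffer_unref() or the Gecko class. The unset function pointer models a surface whose mLib was never set, so calling through it unconditionally would be the kind of invalid access the crash report shows.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted FFmpeg buffer.
struct FakeBufferRef {
  int refcount;
};

int gUnrefCalls = 0;  // Counts calls so the guard can be observed.

// Stand-in for av_buffer_unref(): drops one reference and nulls the pointer.
void FakeBufferUnref(FakeBufferRef** buf) {
  ++gUnrefCalls;
  if (buf && *buf) {
    --(*buf)->refcount;
    *buf = nullptr;
  }
}

struct Surface {
  using UnrefFn = void (*)(FakeBufferRef**);

  // All three members are only set while the surface is locked.
  UnrefFn unref = nullptr;  // Models the library handle (mLib).
  FakeBufferRef* framesContext = nullptr;
  FakeBufferRef* hwBuffer = nullptr;

  bool IsLocked() const { return unref != nullptr; }

  void Lock(UnrefFn fn, FakeBufferRef* ctx, FakeBufferRef* buf) {
    assert(fn && ctx && buf);
    unref = fn;
    framesContext = ctx;
    hwBuffer = buf;
  }

  // Releases the decoder data only if the surface was actually locked.
  void ReleaseData() {
    if (!IsLocked()) {
      return;  // Nothing was attached, so there is nothing to unref.
    }
    unref(&hwBuffer);
    unref(&framesContext);
    unref = nullptr;
  }
};
```

Releasing an unlocked surface is then a no-op, and releasing a locked one twice only drops the references once.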

If we can't land a fix today, we should back out the regressor, as the number of affected installs is very high for Nightly.

I'm hitting this, I have thousands of crashes in about:crashes. Whenever I hit this, Firefox slows down so much that I can't use it.
I imagine there is a retry mechanism to ensure the RDD process starts at some point; perhaps we should have some kind of wait between the retries.
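
One possible shape for such a wait is a capped exponential backoff. The sketch below is purely illustrative: `RestartDelay()` and its constants are hypothetical and do not describe how Firefox actually restarts the RDD process.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical policy: wait 100 ms before the first retry, double the wait
// for each further attempt, and never wait longer than 10 s.
std::chrono::milliseconds RestartDelay(unsigned attempt) {
  using std::chrono::milliseconds;
  const milliseconds base(100);
  const milliseconds cap(10000);
  // Clamp the exponent so the shift below cannot overflow.
  const unsigned shift = std::min(attempt, 10u);
  const milliseconds delay =
      std::min(cap, milliseconds(base.count() * (1LL << shift)));
  assert(delay <= cap);
  return delay;
}
```

A caller would sleep for `RestartDelay(n)` before the n-th restart attempt and reset `n` once the process has stayed up for a while.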

The crashes you see are a combination of Bug 1746232 (RDD is restarted too fast), Bug 1751363 and Bug 1751709.
VA-API is broken right now due to Bug 1751363 and Bug 1751709 (sandbox issues).

Also this is an experimental feature and it's disabled by default - users need to flip prefs at about:config to enable it.

I mean, when we revert Bug 1724385 we'll just crash with different stacktrace - but we will still crash due to RDD sandbox.

Also Bug 1752493 may help here although I don't know how to implement that yet.

(In reply to Martin Stránský [:stransky] (ni? me) from comment #11)

I mean, when we revert Bug 1724385 we'll just crash with different stacktrace - but we will still crash due to RDD sandbox.

There must be something that caused this to be more frequent than before, as I was not hitting it a couple of days ago.

I've run mozregression using:

mozregression --good 2022-01-24 --bad 2022-01-28 --pref media.ffmpeg.vaapi.enabled:true -a https://tekeye.uk/html/html5-video-test-page

The first video on the page causes the RDD process to crash.

Last good revision: 5c02a8fc256d00be53d63de9d8e299ababdc78a7
First bad revision: e8d0c3d85c48bc793ee27bcc9fbd3bcde819781f
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=5c02a8fc256d00be53d63de9d8e299ababdc78a7&tochange=e8d0c3d85c48bc793ee27bcc9fbd3bcde819781f

Regressed by bug 1752014.

Regressed by: 1752014

Gnome Wayland, Debian Testing, Intel
I've got the same range:

history:

  1. good: video uses vaapi

  2. video plays, but without vaapi
    caused by:
    mozregression --good 2022-01-10 --bad 5c02a8fc256d00be53d63de9d8e299ababdc78a7 --pref media.ffmpeg.vaapi.enabled:true -a https://bug1619882.bmoattachments.org/attachment.cgi?id=9149605 -B debug -P stdout

8:40.34 INFO: Last good revision: 5834c469b533e08bbd08a8ffbaad1388e5d61344
8:40.34 INFO: First bad revision: c3d14f2b234a7494edf3231113af52f5fc2d2355
8:40.34 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=5834c469b533e08bbd08a8ffbaad1388e5d61344&tochange=c3d14f2b234a7494edf3231113af52f5fc2d2355

c3d14f2b234a7494edf3231113af52f5fc2d2355 stransky — Bug 1724385 [Linux] Try to create EGLImage over decoded VA-API video frame r=alwu,emilio,media-playback-reviewers


  3. this bug: RDD process crashes multiple times
    caused by:
    mozregression --good 2022-01-10 --bad 2022-01-27 --pref media.ffmpeg.vaapi.enabled:true -a https://bug1619882.bmoattachments.org/attachment.cgi?id=9149605 -B debug -P stdout

11:47.31 INFO: Last good revision: 5c02a8fc256d00be53d63de9d8e299ababdc78a7
11:47.31 INFO: First bad revision: e8d0c3d85c48bc793ee27bcc9fbd3bcde819781f
11:47.31 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=5c02a8fc256d00be53d63de9d8e299ababdc78a7&tochange=e8d0c3d85c48bc793ee27bcc9fbd3bcde819781f

e8d0c3d85c48bc793ee27bcc9fbd3bcde819781f Emilio Cobos Álvarez — Bug 1752014 - Cleanup VideoFramePool::GetFreeVideoFrameSurface. r=stransky,media-playback-reviewers

Summary

  • The sandbox violation breaks VAAPI. It doesn't seem to crash for me, but it does for stransky;
    that seems to be intentional, to force bug 1751363 + bug 1751709 to get fixed.
    A GL context seems to be required in the RDD process (but is blocked by the sandbox)

    • to display the hardware decoded video frame as bookmark preview image: bug 1751363
    • to check if VAAPI decoded image can be successfully imported: bug 1724385
      • Its fix seems to cause the sandbox violation in any case now because it requires a GL context in RDD.
  • bug 1752014 made the already broken VAAPI crash (this bug), but wouldn't/shouldn't this be fixed by comment 5 + bug 1752097 to reduce complexity (because there is bug 1713276 for better sw decoding performance now)?

Attachment #9261042 - Attachment description: Bug 1752282 Mark VideoFrameSurface as used in VideoFramePool::GetVideoFrameSurface() r?alwu → Bug 1752282 [Linux] Mark VideoFrameSurface as used in VideoFramePool::GetVideoFrameSurface() r?alwu
Pushed by stransky@redhat.com:
https://hg.mozilla.org/integration/autoland/rev/d87eb81f8495
[Linux] Mark VideoFrameSurface as used in VideoFramePool::GetVideoFrameSurface() r=alwu,media-playback-reviewers
Status: REOPENED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch
Flags: qe-verify+

I was not able to reproduce this crash using the STR from comment 14, on an affected Nightly build (2021-01-27) with Ubuntu 18.04 x64. The first video from the link did not crash Firefox when played.

Hi, Jan Alexander! Given that you were able to find the regression range, could you please help us verify this bug on Beta 98?

Flags: needinfo?(jan.steffens)

2022-01-28 still crashes. 98.0b7 works fine for me.

Flags: needinfo?(jan.steffens)

(In reply to Jan Alexander Steffens [:heftig] from comment #21)

2022-01-28 still crashes. 98.0b7 works fine for me.

Great! Thanks for confirming. Marking this as verified fixed.

Status: RESOLVED → VERIFIED
Flags: qe-verify+