Crash in [@ DMABufSurfaceYUV::UpdateYUVData]
Categories
(Core :: Widget: Gtk, defect)
Tracking
()
People
(Reporter: sefeng, Assigned: stransky)
References
(Blocks 1 open bug)
Details
(Keywords: crash)
Crash Data
Attachments
(5 files)
Crash report: https://crash-stats.mozilla.org/report/index/60545573-c384-4685-8c4a-8cb200200825
Top 10 frames of crashing thread:
0 libgallium_dri.so nouveau_drm_screen_create
1 libgallium_dri.so nouveau_drm_screen_create
2 libgallium_dri.so nouveau_drm_screen_create
3 libgallium_dri.so __driDriverGetExtensions_zink
4 libgallium_dri.so libgallium_dri.so@0x1287d8
5 libxul.so DMABufSurfaceYUV::UpdateYUVData widget/gtk/DMABufSurface.cpp:867
6 libxul.so DMABufSurfaceYUV::CreateYUVSurface widget/gtk/DMABufSurface.cpp:734
7 libxul.so mozilla::FFmpegVideoDecoder<58>::CreateImageDMABuf dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp:776
8 libxul.so mozilla::FFmpegVideoDecoder<58>::DoDecode dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp:504
9 libxul.so mozilla::FFmpegDataDecoder<58>::DoDecode dom/media/platforms/ffmpeg/FFmpegDataDecoder.cpp:181
This seems related to our Wayland support. I'm not sure whether it is actionable, since the crashes happen in the Nouveau driver.
Comment 1•4 years ago
Interesting that I hit the "nouveau" code path on Intel, but I'm seeing this crash on Amazon Chime during video calling.
bpcde98884-be8b-4362-9d18-1a5a80200904
I have a similar SIGSEGV backtrace with radeonsi and X11:
#0 0x00007fc107a82267 in u_transfer_unmap_vtbl () at /usr/lib64/dri/radeonsi_dri.so
#1 0x00007fc107a82e6f in _tc_sync.constprop.0 () at /usr/lib64/dri/radeonsi_dri.so
#2 0x00007fc107a873e3 in tc_flush () at /usr/lib64/dri/radeonsi_dri.so
#3 0x00007fc1070f18c9 in st_context_flush () at /usr/lib64/dri/radeonsi_dri.so
#4 0x00007fc1070e67f9 in dri_flush () at /usr/lib64/dri/radeonsi_dri.so
#5 0x00007fc110817dee in DMABufSurfaceYUV::UpdateYUVData(void**, int*) () at /usr/lib64/firefox/libxul.so
#6 0x00007fc110817f4d in DMABufSurfaceYUV::CreateYUVSurface(int, int, void**, int*) () at /usr/lib64/firefox/libxul.so
#7 0x00007fc110509965 in mozilla::FFmpegVideoDecoder<58>::CreateImageDMABuf(long, long, long, nsTArray<RefPtr<mozilla::MediaData> >&) () at /usr/lib64/firefox/libxul.so
#8 0x00007fc11050a010 in mozilla::FFmpegVideoDecoder<58>::DoDecode(mozilla::MediaRawData*, unsigned char*, int, bool*, nsTArray<RefPtr<mozilla::MediaData> >&) () at /usr/lib64/firefox/libxul.so
#9 0x00007fc110508881 in mozilla::FFmpegDataDecoder<58>::DoDecode(mozilla::MediaRawData*, bool*, nsTArray<RefPtr<mozilla::MediaData> >&) () at /usr/lib64/firefox/libxul.so
#10 0x00007fc11050b1c2 in mozilla::FFmpegDataDecoder<58>::ProcessDecode(mozilla::MediaRawData*) () at /usr/lib64/firefox/libxul.so
#11 0x00007fc111ee7d97 in mozilla::detail::ProxyRunnable<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, true>, RefPtr<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, true> > (mozilla::FFmpegDataDecoder<58>::*)(mozilla::MediaRawData*), mozilla::FFmpegDataDecoder<58>, mozilla::MediaRawData*>::Run() () at /usr/lib64/firefox/libxul.so
#12 0x00007fc111736db2 in mozilla::TaskQueue::Runner::Run() () at /usr/lib64/firefox/libxul.so
#13 0x00007fc1117369a0 in nsThreadPool::Run() () at /usr/lib64/firefox/libxul.so
#14 0x00007fc111489dd7 in nsThread::ProcessNextEvent(bool, bool*) () at /usr/lib64/firefox/libxul.so
#15 0x00007fc111489b90 in NS_ProcessNextEvent(nsIThread*, bool) () at /usr/lib64/firefox/libxul.so
#16 0x00007fc1114a1e3e in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) () at /usr/lib64/firefox/libxul.so
#17 0x00007fc1118a6d05 in MessageLoop::Run() () at /usr/lib64/firefox/libxul.so
#18 0x00007fc111735a57 in nsThread::ThreadFunc(void*) () at /usr/lib64/firefox/libxul.so
#19 0x00007fc1186f5150 in _pt_root () at /lib64/libnspr4.so
#20 0x00007fc118c3d3f9 in start_thread () at /lib64/libpthread.so.0
#21 0x00007fc118818903 in clone () at /lib64/libc.so.6
Assignee
Comment 3•4 years ago
This can happen when more than one GPU is installed on the system (integrated + dedicated, for instance) and we use the wrong GPU.
It may also happen when incorrect data is uploaded to GPU textures, which is possible when VA-API does not support a particular format (say VP8/VP9), so we decode in software with ffmpeg and then upload the frames to the GPU (dmabuf).
It may help to attach the terminal output of:
lspci | grep "VGA"
vainfo --display drm --device /dev/dri/renderD128
vainfo --display drm --device /dev/dri/renderD129
and run Firefox with
MOZ_LOG="PlatformDecoderModule:5"
and attach the log here. I may add more logging to the dmabuf module if we see more similar issues.
Reporter
Comment 4•4 years ago
Not sure why I am needinfo'ed. I guess you want me to try those commands, Martin?
However, I've never hit this crash. I filed this bug because I was analyzing crash reports.
I have only one GPU, so it is probably the second case.
It seems to crash frequently when there are several (more than two) VP8 videos on the same page.
I can reproduce this easily with this HTML file:
<!DOCTYPE html>
<html>
<body>
<video autoplay muted>
<source src=http://techslides.com/demos/sample-videos/small.webm type=video/webm>
</video>
<video autoplay muted>
<source src=http://techslides.com/demos/sample-videos/small.webm type=video/webm>
</video>
<video autoplay muted>
<source src=http://techslides.com/demos/sample-videos/small.webm type=video/webm>
</video>
</body>
</html>
lspci:
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7)
vainfo:
libva info: VA-API version 1.9.0
libva info: Trying to open /usr/lib64/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_9
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.9 (libva 2.9.0)
vainfo: Driver version: Mesa Gallium driver 20.2.2 for Radeon RX 580 Series (POLARIS10, DRM 3.39.0, 5.9.8-200.fc33.x86_64, LLVM 11.0.0)
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSlice
VAProfileH264High : VAEntrypointVLD
VAProfileH264High : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileHEVCMain10 : VAEntrypointVLD
VAProfileJPEGBaseline : VAEntrypointVLD
VAProfileNone : VAEntrypointVideoProc
Firefox 83 terminal output with MOZ_LOG="PlatformDecoderModule:5" when page crashed.
Assignee
Comment 7•4 years ago
Adding to my TODO list, thanks for the reproducer.
Assignee
Comment 9•4 years ago
I can reproduce it now on the radeon/gallium driver. When running a debug build, it complains about a double free.
Assignee
Comment 11•4 years ago
Hm, looks like a multi-threading issue after all. Here are the significant parts of the backtraces:
[Switching to thread 175 (Thread 0x7f17160fe640 (LWP 180491))]
#5 0x00007f179c22f1e0 in <signal handler called> () at /lib64/libpthread.so.0
#6 0x000055f70fc51bad in arena_run_reg_dalloc(arena_run_t*, arena_bin_t*, void*, unsigned long) (run=0x7f172f8d5000, bin=0x7f179bb001b8, ptr=0x7f172f8d5bf0, size=80)
at /raid/src/memory/build/mozjemalloc.cpp:2209
#7 0x000055f70fc5150d in arena_t::DallocSmall(arena_chunk_t*, void*, arena_chunk_map_t*) (this=0x7f179bb00000, aChunk=0x7f172f800000, aPtr=0x7f172f8d5bf0, aMapElm=0x7f172f801418)
at /raid/src/memory/build/mozjemalloc.cpp:3288
#8 0x000055f70fc51067 in arena_dalloc(void*, unsigned long, arena_t*) (aPtr=0x7f172f8d5bf0, aOffset=875504, aArena=0x0) at /raid/src/memory/build/mozjemalloc.cpp:3372
#9 0x000055f70fc573d5 in BaseAllocator::free(void*) (this=0x7f17160fbd60, aPtr=0x7f172f8d5bf0) at /raid/src/memory/build/mozjemalloc.cpp:4137
#10 0x000055f70fc53bc5 in Allocator<MozJemallocBase>::free(void*) (arg1=0x7f172f8d5bf0) at /raid/src/memory/build/malloc_decls.h:54
#11 0x000055f70fc8cca6 in PageFree(mozilla::Maybe<unsigned long> const&, void*) (aArenaId=..., aPtr=0x7f172f8d5bf0) at /raid/src/memory/replace/phc/PHC.cpp:1281
#12 0x000055f70fc8d426 in replace_free(void*) (aPtr=0x7f172f8d5bf0) at /raid/src/memory/replace/phc/PHC.cpp:1317
#13 0x000055f70fc47d57 in Allocator<ReplaceMallocBase>::free(void*) (arg1=0x7f172f8d5bf0) at /raid/src/memory/build/malloc_decls.h:54
#14 0x000055f70fc47cf5 in free(void*) (arg1=0x7f172f8d5bf0) at /raid/src/memory/build/malloc_decls.h:54
#15 0x00007f176a905783 in tc_batch_execute (job=job@entry=0x7f17148025a0, thread_index=thread_index@entry=0) at ../src/gallium/auxiliary/util/u_threaded_context.c:163
#16 0x00007f176a905b49 in _tc_sync (tc=tc@entry=0x7f1714802000, func=<optimized out>, info=<optimized out>) at ../src/gallium/auxiliary/util/u_threaded_context.c:277
#17 0x00007f176a90618e in tc_transfer_map (_pipe=0x7f1714802000, resource=0x7f16fc613800, level=0, usage=2, box=0x7f17160fc800, transfer=<optimized out>)
at ../src/gallium/auxiliary/util/u_threaded_context.c:1589
#18 0x00007f1769e9cecc in pipe_transfer_map (transfer=0x7f17160fc7f8, h=160, w=<optimized out>, y=0, x=0, access=2, layer=0, level=0, resource=<optimized out>, context=<optimized out>)
at ../src/gallium/auxiliary/util/u_inlines.h:486
#19 dri2_map_image (context=<optimized out>, image=0x7f1711fac880, x0=0, y0=0, width=280, height=160, flags=2, stride=0x7f16f998a8ec, data=0x7f16f998a8d0)
at ../src/gallium/frontends/dri/dri2.c:1661
#20 0x00007f1770c1aab3 in gbm_dri_bo_map (_bo=0x7f1711fac0b0, x=0, y=0, width=280, height=<optimized out>, flags=2, stride=0x7f16f998a8ec, map_data=0x7f16f998a8d0)
at ../src/gbm/backends/dri/gbm_dri.c:1231
#21 0x00007f178f84b6eb in mozilla::widget::nsGbmLib::Map(gbm_bo*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int*, void**)
(bo=0x7f1711fac0b0, x=0, y=0, width=280, height=160, flags=2, stride=0x7f16f998a8ec, map_data=0x7f16f998a8d0) at /raid/src/widget/gtk/DMABufLibWrapper.h:71
(gdb) thread 176
#7 0x00007f179c22f1e0 in <signal handler called> () at /lib64/libpthread.so.0
#8 0x00007f176a8fee77 in u_transfer_unmap_vtbl (pipe=0x7f17149ba000, transfer=0x7f171489b7e0) at ../src/gallium/auxiliary/util/u_transfer.c:158
#9 0x00007f176a905783 in tc_batch_execute (job=job@entry=0x7f17148025a0, thread_index=thread_index@entry=0) at ../src/gallium/auxiliary/util/u_threaded_context.c:163
#10 0x00007f176a905b49 in _tc_sync (tc=tc@entry=0x7f1714802000, func=<optimized out>, info=<optimized out>) at ../src/gallium/auxiliary/util/u_threaded_context.c:277
#11 0x00007f176a905d08 in tc_flush (_pipe=0x7f1714802000, fence=0x0, flags=1) at ../src/gallium/auxiliary/util/u_threaded_context.c:2188
#12 0x00007f1769eacd7d in st_context_flush (stctxi=0x7f17148b1000, flags=3, fence=0x0, before_flush_cb=0x0, args=0x7f17153968d0) at ../src/mesa/state_tracker/st_manager.c:674
#13 0x00007f1769ea18e1 in dri_flush (cPriv=<optimized out>, dPriv=<optimized out>, flags=<optimized out>, reason=<optimized out>) at ../src/gallium/frontends/dri/dri_drawable.c:536
#14 0x00007f178f84b911 in mozilla::widget::nsGbmLib::Unmap(gbm_bo*, void*) (bo=0x7f1714aedfb0, map_data=0x7f171489b4c0) at /raid/src/widget/gtk/DMABufLibWrapper.h:73
#15 0x00007f178f84833a in DMABufSurface::Unmap(int) (this=0x7f1714add430, aPlane=0) at /raid/src/widget/gtk/DMABufSurface.cpp:612
Assignee
Comment 12•4 years ago
When ffmpeg decodes video to dmabuf surfaces and the dmabuf backend fails to allocate one (for instance because we're running out of file descriptors),
we need to disable dmabuf surfaces and restart the video decoder to create a non-dmabuf ImageHost.
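The fallback described above can be illustrated with a minimal sketch. It is not the actual patch: FakeDmaBufAllocator, FakeDecoder, OutputKind and the restart counter are hypothetical stand-ins, and "restarting the decoder" is reduced to flipping a flag and counting.

```cpp
#include <cassert>

// All names below are hypothetical stand-ins, not Firefox classes.
enum class OutputKind { DmaBuf, SharedMemory };

// Pretend dmabuf allocator that fails once its descriptor budget is spent.
class FakeDmaBufAllocator {
 public:
  explicit FakeDmaBufAllocator(int aBudget) : mRemaining(aBudget) {
    assert(aBudget >= 0);
  }

  bool Allocate() {
    if (mRemaining == 0) {
      return false;
    }
    --mRemaining;
    return true;
  }

 private:
  int mRemaining;
};

class FakeDecoder {
 public:
  explicit FakeDecoder(FakeDmaBufAllocator& aAllocator)
      : mAllocator(aAllocator) {}

  // Produce one frame; fall back permanently once dmabuf allocation fails.
  OutputKind OutputFrame() {
    if (mUseDmaBuf) {
      if (mAllocator.Allocate()) {
        return OutputKind::DmaBuf;
      }
      mUseDmaBuf = false;  // disable dmabuf surfaces from now on
      ++mRestarts;         // stands in for restarting the decoder
    }
    return OutputKind::SharedMemory;
  }

  bool UsesDmaBuf() const { return mUseDmaBuf; }
  int Restarts() const { return mRestarts; }

 private:
  FakeDmaBufAllocator& mAllocator;
  bool mUseDmaBuf = true;
  int mRestarts = 0;
};
```

The point of the design is that the failure is handled once: after the first failed allocation, later frames never touch the dmabuf path again.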
Assignee
Comment 13•4 years ago
It's possible that DMABufSurface::CreateDMABufSurface() fails, for instance when we're running out of file descriptors. In that case mSurface is null
and we need to check it before we use it.
Also implement DMABUFTextureHostOGL::IsValid() to report the mSurface state.
Depends on D107976
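As a rough illustration of the null check and the IsValid() idea, here is a self-contained sketch. FakeSurface, CreateFakeSurface and FakeTextureHost are hypothetical names invented for this example; the real DMABUFTextureHostOGL has a different interface.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a dmabuf surface.
struct FakeSurface {
  int width;
  int height;
};

// Hypothetical factory; returns nullptr when allocation fails.
std::shared_ptr<FakeSurface> CreateFakeSurface(int aWidth, int aHeight,
                                               bool aSimulateFailure) {
  assert(aWidth > 0 && aHeight > 0);
  if (aSimulateFailure) {
    return nullptr;
  }
  return std::make_shared<FakeSurface>(FakeSurface{aWidth, aHeight});
}

class FakeTextureHost {
 public:
  explicit FakeTextureHost(std::shared_ptr<FakeSurface> aSurface)
      : mSurface(std::move(aSurface)) {}

  // Reports whether surface creation succeeded.
  bool IsValid() const { return mSurface != nullptr; }

  // Returns false instead of dereferencing a null surface.
  bool GetSize(int* aWidth, int* aHeight) const {
    if (!mSurface) {
      return false;
    }
    *aWidth = mSurface->width;
    *aHeight = mSurface->height;
    return true;
  }

 private:
  std::shared_ptr<FakeSurface> mSurface;
};
```

Callers would check IsValid() (or the return value of GetSize()) before using the host, so a failed allocation becomes an error path rather than a crash.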
Assignee
Comment 14•4 years ago
When multiple DMABuf surfaces are used (for instance during video playback) we can run out of free file descriptors.
To avoid this, open DMABuf file descriptors only when they are needed, i.e. when DMABuf objects are mapped to user
space, mapped as EGLImages, or shared with other processes.
- Implement OpenFileDescriptors()/CloseFileDescriptors() methods to provide this functionality, and also
  OpenFileDescriptorForPlane()/CloseFileDescriptorForPlane() for particular planes.
- Use a mutex to protect the parts where file descriptors are used.
- Make functions which use file descriptors fail-safe, i.e. return an error code when we can't get a file descriptor for a DMABuf object
  and propagate it.
Depends on D107977
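The three points above (open on demand, guard with a mutex, fail safely) can be sketched in isolation as follows. LazyDescriptor and its opener/closer callbacks are assumptions made up for this example; they are not the OpenFileDescriptors()/CloseFileDescriptors() implementation from the patch.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical helper: holds no descriptor while idle and opens one only
// for the duration of an operation that needs it.
class LazyDescriptor {
 public:
  using Opener = std::function<int()>;      // returns fd >= 0, or -1 on failure
  using Closer = std::function<void(int)>;  // releases a descriptor

  LazyDescriptor(Opener aOpen, Closer aClose)
      : mOpen(std::move(aOpen)), mClose(std::move(aClose)) {
    assert(mOpen && mClose);
  }

  // Open a descriptor, run aUse with it, then close it again.
  // Returns false (without calling aUse) when no descriptor is available,
  // so the caller can propagate the error.
  bool WithDescriptor(const std::function<void(int)>& aUse) {
    std::lock_guard<std::mutex> lock(mMutex);
    int fd = mOpen();
    if (fd < 0) {
      return false;
    }
    aUse(fd);
    mClose(fd);
    return true;
  }

 private:
  std::mutex mMutex;  // serializes every use of the descriptor
  Opener mOpen;
  Closer mClose;
};
```

In this sketch a surface that is merely sitting in a queue costs no descriptor at all; only surfaces that are being mapped or shared at that moment count against the process limit.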
Assignee
Comment 15•4 years ago
See https://gitlab.freedesktop.org/mesa/mesa/-/issues/4422 for details. Mesa/radeon can't handle multiple map/unmap operations, so protect them with a lock.
Depends on D107978
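A minimal sketch of the locking idea, assuming one process-wide mutex shared by all surfaces. FakeDriver stands in for the driver's map/unmap entry points and records how many threads are inside it at once; LockedSurface and MaxConcurrencyUnderLock are likewise hypothetical names, not code from the patch.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a driver entry point that must not be
// entered by two threads at the same time.
class FakeDriver {
 public:
  void MapOrUnmap() {
    int inside = ++mInside;
    // Record the highest number of threads seen inside the driver.
    int prev = mMaxInside.load();
    while (inside > prev && !mMaxInside.compare_exchange_weak(prev, inside)) {
    }
    std::this_thread::yield();
    --mInside;
  }

  int MaxConcurrency() const { return mMaxInside.load(); }

 private:
  std::atomic<int> mInside{0};
  std::atomic<int> mMaxInside{0};
};

class LockedSurface {
 public:
  explicit LockedSurface(FakeDriver& aDriver) : mDriver(aDriver) {}

  void Map() {
    std::lock_guard<std::mutex> lock(MapLock());
    mDriver.MapOrUnmap();
  }

  void Unmap() {
    std::lock_guard<std::mutex> lock(MapLock());
    mDriver.MapOrUnmap();
  }

 private:
  // One lock shared by every surface, so map and unmap never overlap.
  static std::mutex& MapLock() {
    static std::mutex sLock;
    return sLock;
  }

  FakeDriver& mDriver;
};

// Runs map/unmap from several threads and reports the observed overlap.
int MaxConcurrencyUnderLock(int aThreads, int aIterations) {
  assert(aThreads > 0 && aIterations > 0);
  FakeDriver driver;
  std::vector<std::thread> workers;
  for (int t = 0; t < aThreads; ++t) {
    workers.emplace_back([&driver, aIterations] {
      LockedSurface surface(driver);
      for (int i = 0; i < aIterations; ++i) {
        surface.Map();
        surface.Unmap();
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return driver.MaxConcurrency();
}
```

The lock is deliberately global rather than per surface: in the backtraces above the two threads operate on different gbm_bo objects but still collide inside the same driver context.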
Comment 16•4 years ago
Comment 17•4 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a96a4ccad4aa
https://hg.mozilla.org/mozilla-central/rev/b19b750f1ecf
https://hg.mozilla.org/mozilla-central/rev/c063ca73efb0
https://hg.mozilla.org/mozilla-central/rev/cbc4cfcb9a3b