Crash in [@ __driDriverGetExtensions_d3d12 ]
Categories
(Core :: Graphics: CanvasWebGL, defect)
Tracking
()
People
(Reporter: yannis, Unassigned)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
(Keywords: topcrash)
Crash Data
This main process crash has started spiking with the 118.0 release, coming from Ubuntu and Linux Mint users. Almost all crashes seem to involve the same version of libgallium_dri.so, which appears in the stack as libgallium_dri.so/e4404ad02a06448d4ff6f23d4fe768210, and the same adapter driver version, 23.0.4.0. Looking at a few crashes, these look like snap builds, but I'm not sure how to confirm that they all are.
Example crash report: here
Crashing stack of the CanvasRenderer thread:
0 libgallium_dri!__driDriverGetExtensions_d3d12
1 libgallium_dri!__driDriverGetExtensions_d3d12
2 libgallium_dri!__driDriverGetExtensions_d3d12
3 libxul!mozilla::gl::GLContext::raw_fReadPixels(int, int, int, int, unsigned int, unsigned int, void*) @ /build/firefox/parts/firefox/build/gfx/gl/GLContext.h:1561
4 libxul!mozilla::WebGLContext::DoReadPixelsAndConvert(mozilla::webgl::FormatInfo const*, mozilla::webgl::ReadPixelsDesc const&, unsigned long, unsigned long, unsigned int) @ /build/firefox/parts/firefox/build/dom/canvas/WebGLContextGL.cpp:0
5 libxul!mozilla::WebGLContext::ReadPixelsImpl(mozilla::webgl::ReadPixelsDesc const&, unsigned long, unsigned long) @ /build/firefox/parts/firefox/build/dom/canvas/WebGLContextGL.cpp:0
6 libxul!mozilla::WebGLContext::ReadPixelsInto(mozilla::webgl::ReadPixelsDesc const&, mozilla::Range<unsigned char> const&) @ /build/firefox/parts/firefox/build/dom/canvas/WebGLContextGL.cpp:920
7 libxul!mozilla::HostWebGLContext::ReadPixelsInto(mozilla::webgl::ReadPixelsDesc const&, mozilla::Range<unsigned char> const&) const @ /build/firefox/parts/firefox/build/dom/canvas/HostWebGLContext.h:658
7 libxul!mozilla::dom::WebGLParent::RecvReadPixels(mozilla::webgl::ReadPixelsDesc const&, mozilla::dom::ReadPixelsBuffer&&, mozilla::webgl::ReadPixelsResultIpc*) @ /build/firefox/parts/firefox/build/dom/canvas/WebGLParent.cpp:184
8 libxul!mozilla::dom::PWebGLParent::OnMessageReceived(IPC::Message const&, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message> >&) @ s3:gecko-generated-sources-l1:2a248686e6e0592b8e6aeed598ab60f1e5d153bd90beb378a1562e772b556e4d8616c947012c792440631bb6a3dbad1d1e41fc97871d765fdf918ca9e7423973/ipc/ipdl/PWebGLParent.cpp::637
9 libxul!mozilla::gfx::PCanvasManagerParent::OnMessageReceived(IPC::Message const&, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message> >&) @ s3:gecko-generated-sources-l1:f6791f264875ee31de704005f3dbb0c82e497cfa447fbba9d259d8f7eb70227ef7e57bfc03642158a030baedd5ec025b42d4a74aa33034e3a461f2fcaef5e1f3/ipc/ipdl/PCanvasManagerParent.cpp::383
a libxul!mozilla::ipc::MessageChannel::DispatchSyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message> >&) @ /build/firefox/parts/firefox/build/ipc/glue/MessageChannel.cpp:1778
The stack above is the most common, but by aggregating over proto signatures we can see some variety in the callers of __driDriverGetExtensions_d3d12, e.g. mozilla::gl::GLContext::fTexSubImage2D, mozilla::gl::GLContext::fClientWaitSync, mozilla::gl::GLContext::fBufferSubData, webrender::device::gl::Device::reset_state, mozilla::gl::GLContext::raw_fDrawElements, etc.
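To illustrate the kind of aggregation described above, here is a minimal sketch that finds the first frame outside the driver module in each stack and counts how often each caller appears. It assumes frames are plain strings in the "module!function(args)" form used in the stack above; the helper names first_caller and count_callers are hypothetical and are not part of any crash-stats tooling.

```python
from collections import Counter

# Module name as it appears in the crashing stack above.
DRIVER_MODULE = "libgallium_dri"


def first_caller(frames):
    """Return the function of the first frame outside the driver module.

    Each frame is a string such as "libxul!ns::Class::method(int, int)".
    The argument list, if any, is dropped. Returns None when every frame
    belongs to the driver module.
    """
    for frame in frames:
        module, _, function = frame.partition("!")
        if module != DRIVER_MODULE:
            # Fall back to the raw text when the frame has no "!" separator.
            return (function or module).split("(")[0]
    return None


def count_callers(stacks):
    """Count how many stacks share each first non-driver caller."""
    return Counter(first_caller(frames) for frames in stacks)
```

Grouping on the first non-driver frame is only one possible heuristic; it works here because the top three frames all resolve to the same generic driver symbol.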
Comment 1•2 years ago
Aggregating the signature on distribution id reveals not just canonical-002 but also mozilla and canonical, so I would suspect this is not limited to snap. Mesa upstream bug?
Comment 2•2 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 5 desktop browser crashes on Linux on release
For more information, please visit BugBot documentation.
Comment 3•2 years ago
I'm looking for better symbols but I have a feeling that I know what this crash is about.
Comment 4•2 years ago
I have reprocessed the crash in comment 0 with the new symbols and this is bug 1850271.
Comment 6•2 years ago
This is not a dup of 1850271. Different signature.
Comment 7•2 years ago
(In reply to Lee Salzman [:lsalzman] from comment #6)
This is not a dup of 1850271. Different signature.
See comment 4 and the reprocessed crash: https://crash-stats.mozilla.org/report/index/2206fd66-0ec8-4e08-a716-ac25b0231003
Comment 8•2 years ago
Some of the crashes may match the reprocessed one, which does seem to match bug 1850271, but several other __driDriverGetExtensions_d3d12 crashes have completely different causes, e.g. https://crash-stats.mozilla.org/report/index/ec68aa94-aede-4fb6-8f64-b406e0231003
This crash signature is so generic that I would prefer we not lump the two different crash signatures together, because then we're combining many different bugs with different causes, which will make them harder to diagnose rather than easier.
Comment 9•2 years ago
(In reply to Lee Salzman [:lsalzman] from comment #8)
This crash signature is so generic that I would prefer we not lump the two different crash signatures together, because then we're combining many different bugs with different causes, which will make them harder to diagnose rather than easier.
I agree, let's wait for the volume to go down. In case there are more missing symbols that would help break more signatures away from this generic one, I'll look them up. In the meantime I have identified the issue that caused us to miss the debug information for this build (and thus the detailed function names), but I haven't fixed it yet.
Reporter
Comment 10•2 years ago
The crashes we receive here are likely no-symbol versions of various known bugs then: bug 1850271 (raw_fReadPixels), bug 1855911 (fTexSubImage2D), bug 1852794 (fClientWaitSync), bug 1855688 (fBufferSubData), bug 1817816 (reset_state), bug 1855686 (raw_fDrawElements)...
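The mapping listed in this comment can be written down as a small lookup table. The sketch below only encodes the bug numbers given above; the function name bug_for_caller is hypothetical, and matching on the unqualified function name is an assumption made for illustration rather than how the crash-stats signature generation works.

```python
# Caller function name -> bug number, exactly as listed in the comment above.
KNOWN_BUGS = {
    "raw_fReadPixels": 1850271,
    "fTexSubImage2D": 1855911,
    "fClientWaitSync": 1852794,
    "fBufferSubData": 1855688,
    "reset_state": 1817816,
    "raw_fDrawElements": 1855686,
}


def bug_for_caller(caller):
    """Map a qualified caller such as "ns::Class::method(args)" to a bug.

    The argument list and namespace qualifiers are stripped before the
    lookup. Returns None for callers that are not in the table.
    """
    name = caller.split("(")[0].rsplit("::", 1)[-1]
    return KNOWN_BUGS.get(name)
```

Callers that return None would still need manual triage, since the list above is explicitly incomplete.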
Comment 11•2 years ago
FYI I've found all the missing symbols for this signature, it should disappear starting tomorrow. I cannot reprocess old crashes because of an issue we're facing with reprocessing unfortunately.
Reporter
Comment 12•2 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #11)
FYI I've found all the missing symbols for this signature, it should disappear starting tomorrow.
The crash reports have successfully moved to the distinct bugs mentioned in comment 10 (and maybe a few others I missed), so follow-up work should happen there. There has been a significant volume bump in all of them, which, although expected as a consequence of this bug, looks rather concerning.
Comment 13•2 years ago
We're having another spike and it's being caused by this job, which ran a few days ago. Looking through the log, it seems we're finding all the debuginfo files (debuginfod is filling in some missing ones). If we check the lines where the affected symbol is being emitted, we can see that the debug information is present. However, in the output we only have the PUBLIC symbols and the CFI information: no lines, inlined functions, non-public functions, etc. This tells me that the debug information file is present but likely empty. Alexandre, can you double-check next week?
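A check along these lines can be sketched as follows, assuming the output is a Breakpad-format text symbol file in which each record starts with a keyword (MODULE, INFO, FILE, FUNC, PUBLIC, INLINE, INLINE_ORIGIN, STACK) and source line records start with a hexadecimal address. The function names summarize_sym and looks_stripped are hypothetical and are not part of the symbol scraping job.

```python
from collections import Counter

# Record keywords of the Breakpad text symbol format, other than STACK.
RECORD_KEYWORDS = {
    "MODULE", "INFO", "FILE", "FUNC", "PUBLIC", "INLINE", "INLINE_ORIGIN",
}


def summarize_sym(lines):
    """Count the records of each kind in a Breakpad-style symbol file."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "STACK" and len(parts) > 1:
            # Distinguish "STACK CFI" from "STACK WIN" records.
            counts["STACK " + parts[1]] += 1
        elif parts[0] in RECORD_KEYWORDS:
            counts[parts[0]] += 1
        else:
            # Anything else is treated as a source line record.
            counts["LINE"] += 1
    return counts


def looks_stripped(counts):
    """True when only PUBLIC symbols are present: no FUNC or line records."""
    return counts["PUBLIC"] > 0 and counts["FUNC"] == 0 and counts["LINE"] == 0
```

A file for which looks_stripped returns True matches the symptom described above: the symbols came from the ELF symbol table rather than from real debug information.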
Comment 14•2 years ago
I downloaded the following two files:
- https://launchpad.net/~desktop-snappers/+snap/gnome-42-2204-sdk/+build/2275597/+files/gnome-42-2204-sdk_0+git.3741e59_amd64.debug
- https://launchpad.net/~desktop-snappers/+snap/gnome-42-2204-sdk/+build/2275597/+files/gnome-42-2204-sdk_0+git.3741e59_amd64.snap
I've confirmed that the debug information is indeed empty... but we can get proper debug information from debuginfod. I'll open a separate bug to work around this issue and I'll reprocess the affected files tomorrow morning.
Comment 15•2 years ago
I've rescraped the affected files, so the volume here should go down again. Let's keep this closed.