Segfault in mozilla::DoCompressedTexImage running WebGL Conformance Tests
Categories: Core :: Graphics, defect, P2
People: Reporter: leonard, Unassigned
References: Blocks 1 open bug
Attachments: 1 file (506.76 KB, image/png)
User Agent: Mozilla/5.0 (X11; Linux aarch64; rv:105.0) Gecko/20100101 Firefox/105.0
Steps to reproduce:
Using the build from https://bugzilla.mozilla.org/show_bug.cgi?id=1696691#c8 and the Mesa 22.3 development version in release build configuration (including the freedreno driver fixes from https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17888 ), I ran https://registry.khronos.org/webgl/sdk/tests/webgl-conformance-tests.html
Actual results:
Segmentation fault during the negativetextureapi test, which is part of the gles3 test suite. Note that running only the gles3 test suite does not trigger the segmentation fault, but running the complete WebGL Conformance Test Suite reliably triggers the segfault at negativetextureapi after around 90 minutes.
Thread 38 "CanvasRenderer" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f5323e100 (LWP 21259)]
0x0000007ff0b51e40 in mozilla::DoCompressedTexImage(mozilla::gl::GLContext*, StrongGLenum<TexImageTargetDetails>, int, unsigned int, int, int, int, int, void const*) () from /home/leonard/Downloads/firefox/libxul.so
(gdb) bt
#0 0x0000007ff0b51e40 in mozilla::DoCompressedTexImage(mozilla::gl::GLContext*, StrongGLenum<TexImageTargetDetails>, int, unsigned int, int, int, int, int, void const*) () at /home/leonard/Downloads/firefox/libxul.so
#1 0x0000007ff0b518bc in mozilla::WebGLTexture::CompressedTexImage(bool, unsigned int, unsigned int, unsigned int, mozilla::avec3<unsigned int> const&, mozilla::avec3<unsigned int> const&, mozilla::Range<unsigned char const> const&, unsigned int, mozilla::Maybe<unsigned long> const&) () at /home/leonard/Downloads/firefox/libxul.so
#2 0x0000007ff0b11b54 in mozilla::WebGLContext::CompressedTexImage(bool, unsigned int, unsigned int, unsigned int, mozilla::avec3<unsigned int>, mozilla::avec3<unsigned int>, mozilla::Range<unsigned char const> const&, unsigned int, mozilla::Maybe<unsigned long> const&) const () at /home/leonard/Downloads/firefox/libxul.so
#3 0x0000007ff0b382e0 in _ZZN7mozilla16MethodDispatcherINS_21WebGLMethodDispatcherELm75EMNS_16HostWebGLContextEKFvbjjjRKNS_5avec3IjEES6_RKNS_9RawBufferIhEEjRKNS_5MaybeImEEEXadL_ZNKS2_18CompressedTexImageEbjjjS6_S6_SA_jSE_EEE15DispatchCommandIS2_EEbRT_mRNS_5webgl17RangeConsumerViewEENKUlDpRT_E_clIJbjjjS4_S4_S8_jSC_EEEDaSQ_ () at /home/leonard/Downloads/firefox/libxul.so
#4 0x0000007ff0b22dfc in mozilla::dom::WebGLParent::RecvDispatchCommands(mozilla::ipc::Shmem&&, unsigned long) () at /home/leonard/Downloads/firefox/libxul.so
#5 0x0000007ff0b73310 in mozilla::dom::PWebGLParent::OnMessageReceived(IPC::Message const&) () at /home/leonard/Downloads/firefox/libxul.so
#6 0x0000007fefd1162c in mozilla::gfx::PCanvasManagerParent::OnMessageReceived(IPC::Message const&) () at /home/leonard/Downloads/firefox/libxul.so
#7 0x0000007fef7eb2ac in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) () at /home/leonard/Downloads/firefox/libxul.so
#8 0x0000007fef7ea3e0 in mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message> >) () at /home/leonard/Downloads/firefox/libxul.so
#9 0x0000007fef7ea73c in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::ipc::MessageChannel::MessageTask&) () at /home/leonard/Downloads/firefox/libxul.so
#10 0x0000007fef7eacac in mozilla::ipc::MessageChannel::MessageTask::Run() () at /home/leonard/Downloads/firefox/libxul.so
#11 0x0000007fef194e04 in nsThread::ProcessNextEvent(bool, bool*) () at /home/leonard/Downloads/firefox/libxul.so
#12 0x0000007fef198b90 in NS_ProcessNextEvent(nsIThread*, bool) () at /home/leonard/Downloads/firefox/libxul.so
#13 0x0000007fef7ee19c in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) () at /home/leonard/Downloads/firefox/libxul.so
#14 0x0000007fef7a3264 in MessageLoop::Run() () at /home/leonard/Downloads/firefox/libxul.so
#15 0x0000007fef192524 in nsThread::ThreadFunc(void*) () at /home/leonard/Downloads/firefox/libxul.so
#16 0x0000007ff6fca094 in _pt_root () at /home/leonard/Downloads/firefox/libnspr4.so
#17 0x0000005555618888 in set_alt_signal_stack_and_start(PthreadCreateParams*) ()
#18 0x0000007ff7f79f3c in start_thread (arg=0x0) at pthread_create.c:481
#19 0x0000007ff7bb2cdc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79
Expected results:
No segmentation fault.
Comment 1•3 years ago
Thank you for filing. A 90-minute reproduction case is challenging to work with. Andrew, do you have any insight into what might be happening here?
Comment 2•3 years ago
Reading https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17888#note_1502076 :
> Note this was happening because either ffox or the khr webgl tests are creating a 16k x 16k FBO rgba8 + z24s8.. so that adds up to 2GB of RAM (actually probably more because of UBWC). Which exhausts half of the GPUs address space. Unsurprisingly, as a result some of the tests are hitting problems with OoM.
It sounds to me like this is somewhat expected and acceptable. So IMO we can close this.
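For reference, the 2GB figure quoted above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes 4 bytes per pixel for both rgba8 and z24s8 and ignores tiling, alignment and UBWC metadata, so the real footprint is probably larger. The function names are made up for illustration and do not come from Firefox or Mesa.

```python
# Rough size of a 16k x 16k FBO with an rgba8 colour attachment and a
# z24s8 depth/stencil attachment. Assumption: 4 bytes per pixel for each
# format, no tiling, alignment or UBWC overhead.
WIDTH = HEIGHT = 16 * 1024
BYTES_PER_PIXEL = {"rgba8": 4, "z24s8": 4}
GIB = 1024 ** 3


def attachment_bytes(fmt, width=WIDTH, height=HEIGHT):
    """Bytes needed for one attachment of the given format."""
    return width * height * BYTES_PER_PIXEL[fmt]


def fbo_bytes(formats=("rgba8", "z24s8")):
    """Total bytes for all attachments of the framebuffer."""
    return sum(attachment_bytes(fmt) for fmt in formats)


if __name__ == "__main__":
    for fmt in BYTES_PER_PIXEL:
        print(fmt, attachment_bytes(fmt) / GIB, "GiB")
    print("total", fbo_bytes() / GIB, "GiB")
```

Each attachment comes to 1 GiB, so the pair adds up to 2 GiB before any driver overhead.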
Comment 3•3 years ago (Reporter)
> Note this was happening because either ffox or the khr webgl tests are creating a 16k x 16k FBO rgba8 + z24s8.. so that adds up to 2GB of RAM (actually probably more because of UBWC). Which exhausts half of the GPUs address space. Unsurprisingly, as a result some of the tests are hitting problems with OoM.
> it sounds to me like this is somewhat expected and acceptable. So IMO we can close this.
The segfault caused by this 2GB allocation is fixed by https://gitlab.freedesktop.org/robclark/mesa/-/commit/2d7f00d5c86e8a234f1c198c3bb0dd8f132a1f31 . The segfault reported here may be a different one: it happens in mozilla::DoCompressedTexImage, and the Mesa driver does not appear in the backtrace.
Comment 4•3 years ago
My reading of that comment is that, with the commit, the driver now allows such big allocations instead of failing straight away, so it is still possible that this is just some kind of OoM error.
To be a bit more sure, could you post the output of bt full from gdb? I don't see what could actually crash in https://searchfox.org/mozilla-central/source/dom/canvas/WebGLTextureUpload.cpp#652-671 apart from the GL context being invalid or something similar.
The worrying part here is that IIUC this crashes FF, not just the tab - that's correct, right?
Comment 5•3 years ago (Reporter)
With https://gitlab.freedesktop.org/mesa/mesa/-/commit/2bc1d08c48bd3b309eb9b65db5ac1d7749f512cd and https://gitlab.freedesktop.org/mesa/mesa/-/commit/401d03e1e947279306a9cccc8b86996c940ef91b ("Clip advertised GPU memory at MIN of system memory and gpu virtual address space size") I'm no longer able to reproduce this segfault. Instead, Firefox reliably gets killed by the Linux OOM killer after about 5 minutes, rather than after the 90 minutes required to reproduce the segfault.
> The worrying part here is that IIUC this crashes FF, not just the tab - that's correct, right?
Yes.
I'll close this issue for now as it's not reproducible with the updated, current Mesa 22.3 development version in release build configuration.
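As an illustration of what the quoted commit title describes, the clipping rule can be sketched as below. This is not Mesa's code: the function name and the example sizes are hypothetical, and only the "MIN of system memory and GPU virtual address space size" rule is taken from the commit title.

```python
# Sketch of the rule named in the commit title: advertise no more GPU memory
# than the smaller of system memory and the GPU virtual address space.
# Hypothetical helper, not taken from Mesa.
GIB = 1024 ** 3


def advertised_gpu_memory(system_memory_bytes, gpu_va_size_bytes):
    """Return the clipped amount of GPU memory to advertise, in bytes."""
    return min(system_memory_bytes, gpu_va_size_bytes)


if __name__ == "__main__":
    # Example values are assumptions chosen for illustration only.
    print(advertised_gpu_memory(8 * GIB, 4 * GIB) // GIB, "GiB")
    print(advertised_gpu_memory(2 * GIB, 4 * GIB) // GIB, "GiB")
```

With a limit like this in place, an application that sizes its allocations from the advertised value is less likely to request more than the GPU can actually map.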
Comment 6•3 years ago
> firefox reliably gets killed by the Linux OOM killer after a few minutes
This is a bit sad: only the process in question should be killed, or FF should even shut it down itself. But I'm not sure there's anything we can do about this atm, especially as long as EGL doesn't have a way to advertise available GPU RAM (https://gitlab.freedesktop.org/mesa/mesa/-/issues/2976). So yeah, should be fine for now.