Hit MOZ_CRASH(Caught GL error 507 at push_debug_group_khr) at gfx/wr/webrender/src/device/gl.rs:1494 in debug mode
Categories: Core :: Graphics: WebRender, defect
People: Reporter: manuel, Unassigned
Attachments: 3 files
My debug build of Firefox always crashes, either instantly or after some time. To run test cases in debug mode without crashing, I have to pass --headless.
0:13.99 GECKO(922581) [WARN webrender::device::gl] Missing optimized shader source for gpu_cache_update
0:14.16 GECKO(922581) [Child 922741: Main Thread]: I/DocShellAndDOMWindowLeak --DOCSHELL 71f917928c00 == 0 [pid = 922741] [id = 0] [url = about:blank]
0:14.19 GECKO(922581) [Child 922741: Main Thread]: I/DocShellAndDOMWindowLeak --DOMWINDOW == 3 (71f91792d3e0) [pid = 922741] [serial = 1] [outer = 0] [url = about:blank]
0:14.33 GECKO(922581) [GFX1-]: Caught GL error 507 at push_debug_group_khr
0:14.34 GECKO(922581) [ERROR webrender::device::gl] Caught GL error 507 at push_debug_group_khr
0:14.34 GECKO(922581) [922581] Hit MOZ_CRASH(Caught GL error 507 at push_debug_group_khr) at gfx/wr/webrender/src/device/gl.rs:1494
Initializing stack-fixing for the first stack frame, this may take a while...
0:35.01 GECKO(922581) #01: RustMozCrash (/home/user/dev/gecko5/mozglue/static/rust/wrappers.cpp:18)
0:35.01 GECKO(922581) #02: mozglue_static::panic_hook (mozglue/static/rust/lib.rs:80)
0:35.01 GECKO(922581) #03: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xd4bc56c)
0:35.02 GECKO(922581) #04: std::panicking::rust_panic_with_hook (library/alloc/src/boxed.rs:0)
0:35.02 GECKO(922581) #05: std::panicking::begin_panic_handler::{{closure}} (library/std/src/panicking.rs:0)
0:35.02 GECKO(922581) #06: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xe26b726)
0:35.02 GECKO(922581) #07: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xe26def4)
0:35.02 GECKO(922581) #08: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xe2b8735)
0:35.02 GECKO(922581) #09: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xd004453)
0:35.02 GECKO(922581) #10: webrender::device::query_gl::GpuMarker::new (gfx/wr/webrender/src/device/query_gl.rs:283)
0:35.03 GECKO(922581) #11: webrender::device::query_gl::GpuProfiler::start_marker (gfx/wr/webrender/src/device/query_gl.rs:266)
0:35.03 GECKO(922581) #12: webrender::renderer::Renderer::render_impl (gfx/wr/webrender/src/renderer/mod.rs:1477)
0:35.03 GECKO(922581) #13: webrender::renderer::Renderer::render (gfx/wr/webrender/src/renderer/mod.rs:1253)
0:35.03 GECKO(922581) #14: wr_renderer_render (gfx/webrender_bindings/src/bindings.rs:649)
0:35.04 GECKO(922581) #15: mozilla::wr::RendererOGL::UpdateAndRender(mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*, mozilla::wr::RendererStats*) (/home/user/dev/gecko5/gfx/webrender_bindings/RendererOGL.cpp:191)
0:35.05 GECKO(922581) #16: mozilla::wr::RenderThread::UpdateAndRender(mozilla::wr::WrWindowId, mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> const&, mozilla::TimeStamp const&, bool, mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:804)
0:35.05 GECKO(922581) #17: mozilla::wr::RenderThread::HandleFrameOneDocInner(mozilla::wr::WrWindowId, bool, bool, mozilla::Maybe<mozilla::wr::FramePublishId>) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:648)
0:35.05 GECKO(922581) #18: mozilla::wr::RenderThread::WrNotifierEvent_HandleNewFrameReady(mozilla::wr::WrWindowId, bool, mozilla::wr::FramePublishId) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:558)
0:35.05 GECKO(922581) #19: mozilla::wr::RenderThread::HandleWrNotifierEvents(mozilla::wr::WrWindowId) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:0)
0:35.05 GECKO(922581) #20: mozilla::detail::RunnableMethodImpl<mozilla::wr::RenderThread*, void (mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId), true, (mozilla::RunnableKind)0, mozilla::wr::WrWindowId>::Run() (/home/user/dev/gecko5/obj-debug-sccache/dist/include/nsThreadUtils.h:1134)
0:35.05 GECKO(922581) #21: nsThread::ProcessNextEvent(bool, bool*) (/home/user/dev/gecko5/xpcom/threads/nsThread.cpp:1199)
0:35.05 GECKO(922581) #22: NS_ProcessNextEvent(nsIThread*, bool) (/home/user/dev/gecko5/xpcom/threads/nsThreadUtils.cpp:480)
0:35.05 GECKO(922581) #23: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) (/home/user/dev/gecko5/ipc/glue/MessagePump.cpp:0)
0:35.05 GECKO(922581) #24: MessageLoop::Run() (/home/user/dev/gecko5/ipc/chromium/src/base/message_loop.cc:346)
0:35.05 GECKO(922581) #25: nsThread::ThreadFunc(void*) (/home/user/dev/gecko5/xpcom/threads/nsThread.cpp:372)
0:35.08 GECKO(922581) #26: _pt_root (/home/user/dev/gecko5/nsprpub/pr/src/pthreads/ptthread.c:204)
0:35.21 GECKO(922581) #27: set_alt_signal_stack_and_start(PthreadCreateParams*) (/home/user/dev/gecko5/mozglue/interposers/pthread_create_interposer.cpp:81)
0:35.21 GECKO(922581) #28: ??? (/usr/lib/libc.so.6 + 0x92ded)
0:35.21 GECKO(922581) #29: ??? (/usr/lib/libc.so.6 + 0x1160dc)
0:35.21 GECKO(922581) #30: ??? (???:???)
Pernosco session: https://pernos.co/debug/LhA3tU0VfwYJW6-2hsmC5w/index.html
Comment 1•1 year ago
Can you please attach your about:support?
The gpu_cache_update warning is unrelated.
In bug 1879858 we saw glPushDebugGroup always returning an error (which causes us to assert in debug builds), though on a different OS and with a different error code.
If you set the pref gfx.webrender.enable-gpu-markers to false does the issue persist?
Comment 2•1 year ago (Reporter)
Comment 3•1 year ago (Reporter)
The crash still happens during the test when I disable gfx.webrender.enable-gpu-markers through browser.toml.
Comment 4•1 year ago
With the same stacktrace? Or do we crash due to a different OpenGL function returning an error? We shouldn't still be calling glPushDebugGroup with that pref disabled...
Comment 5•1 year ago (Reporter)
Ah yes, it is a different OpenGL function. I didn't notice.
0:04.51 GECKO(31929) [GFX1-]: Caught GL error 507 at bind_framebuffer
0:04.51 GECKO(31929) [ERROR webrender::device::gl] Caught GL error 507 at bind_framebuffer
0:04.51 GECKO(31929) [31929] Hit MOZ_CRASH(Caught GL error 507 at bind_framebuffer) at gfx/wr/webrender/src/device/gl.rs:1494
Initializing stack-fixing for the first stack frame, this may take a while...
0:04.70 INFO runtests.py | Waiting for browser...
0:21.64 GECKO(31929) #01: RustMozCrash (/home/user/dev/gecko5/mozglue/static/rust/wrappers.cpp:18)
0:21.65 GECKO(31929) #02: mozglue_static::panic_hook (mozglue/static/rust/lib.rs:80)
0:21.65 GECKO(31929) #03: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xd4bc56c)
0:21.65 GECKO(31929) #04: std::panicking::rust_panic_with_hook (library/alloc/src/boxed.rs:0)
0:21.65 GECKO(31929) #05: std::panicking::begin_panic_handler::{{closure}} (library/std/src/panicking.rs:0)
0:21.65 GECKO(31929) #06: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xe26b726)
0:21.65 GECKO(31929) #07: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xe26def4)
0:21.65 GECKO(31929) #08: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xe2b8735)
0:21.65 GECKO(31929) #09: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xcfff523)
0:21.66 GECKO(31929) #10: webrender::device::gl::Device::init_fbos (gfx/wr/webrender/src/device/gl.rs:2867)
0:21.66 GECKO(31929) #11: webrender::device::gl::Device::create_texture (gfx/wr/webrender/src/device/gl.rs:2660)
0:21.66 GECKO(31929) #12: webrender::renderer::Renderer::update_texture_cache (gfx/wr/webrender/src/renderer/mod.rs:1901)
0:21.66 GECKO(31929) #13: webrender::renderer::Renderer::render_impl (gfx/wr/webrender/src/renderer/mod.rs:1491)
0:21.66 GECKO(31929) #14: webrender::renderer::Renderer::render (gfx/wr/webrender/src/renderer/mod.rs:1253)
0:21.66 GECKO(31929) #15: wr_renderer_render (gfx/webrender_bindings/src/bindings.rs:649)
0:21.67 GECKO(31929) #16: mozilla::wr::RendererOGL::UpdateAndRender(mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*, mozilla::wr::RendererStats*) (/home/user/dev/gecko5/gfx/webrender_bindings/RendererOGL.cpp:191)
0:21.67 GECKO(31929) #17: mozilla::wr::RenderThread::UpdateAndRender(mozilla::wr::WrWindowId, mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> const&, mozilla::TimeStamp const&, bool, mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:804)
0:21.68 GECKO(31929) #18: mozilla::wr::RenderThread::HandleFrameOneDocInner(mozilla::wr::WrWindowId, bool, bool, mozilla::Maybe<mozilla::wr::FramePublishId>) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:648)
0:21.68 GECKO(31929) #19: mozilla::wr::RenderThread::WrNotifierEvent_HandleNewFrameReady(mozilla::wr::WrWindowId, bool, mozilla::wr::FramePublishId) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:558)
0:21.68 GECKO(31929) #20: mozilla::wr::RenderThread::HandleWrNotifierEvents(mozilla::wr::WrWindowId) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:0)
0:21.68 GECKO(31929) #21: mozilla::detail::RunnableMethodImpl<mozilla::wr::RenderThread*, void (mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId), true, (mozilla::RunnableKind)0, mozilla::wr::WrWindowId>::Run() (/home/user/dev/gecko5/obj-debug-sccache/dist/include/nsThreadUtils.h:1134)
0:21.68 GECKO(31929) #22: nsThread::ProcessNextEvent(bool, bool*) (/home/user/dev/gecko5/xpcom/threads/nsThread.cpp:1199)
0:21.68 GECKO(31929) #23: NS_ProcessNextEvent(nsIThread*, bool) (/home/user/dev/gecko5/xpcom/threads/nsThreadUtils.cpp:480)
0:21.68 GECKO(31929) #24: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) (/home/user/dev/gecko5/ipc/glue/MessagePump.cpp:0)
0:21.68 GECKO(31929) #25: MessageLoop::Run() (/home/user/dev/gecko5/ipc/chromium/src/base/message_loop.cc:346)
0:21.68 GECKO(31929) #26: nsThread::ThreadFunc(void*) (/home/user/dev/gecko5/xpcom/threads/nsThread.cpp:372)
0:21.69 GECKO(31929) #27: _pt_root (/home/user/dev/gecko5/nsprpub/pr/src/pthreads/ptthread.c:204)
0:21.77 GECKO(31929) #28: set_alt_signal_stack_and_start(PthreadCreateParams*) (/home/user/dev/gecko5/mozglue/interposers/pthread_create_interposer.cpp:81)
0:21.77 GECKO(31929) #29: ??? (/usr/lib/libc.so.6 + 0x92ded)
0:21.77 GECKO(31929) #30: ??? (/usr/lib/libc.so.6 + 0x1160dc)
0:21.77 GECKO(31929) #31: ??? (???:???)
I can record a Pernosco session if that would be useful (the one above was just a by-product of recording a Pernosco session for a different bug).
Comment 6•1 year ago (Reporter)
For reference, it happens on all test cases where a browser window opens, most of the time either immediately after the browser window opens or when I focus the window. I am running the test with ./mach test toolkit/components/antitracking/bouncetrackingprotection/test/browser/browser_bouncetracking_telemetry_purge_count.js.
Comment 7•1 year ago
Okay so it seems you consistently lose your GL context and then the next GL call generates this error, causing us to assert in debug builds.
Bug 1879858 is therefore unrelated; glPushDebugGroup just happened to be the first GL function called after the context was lost.
There may be something broken in the GL setup on your machine to cause the context to be lost so easily. Andrew, Nical, any ideas?
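For reference, the code in the log can be decoded as follows. This is a minimal sketch, assuming the logged value 507 is hexadecimal (0x0507), which matches the context-loss diagnosis above. The constants are the standard OpenGL error values; the helper name describe_logged_gl_error is made up for illustration and is not part of WebRender.

```rust
/// Standard OpenGL error codes and their names.
const GL_ERROR_NAMES: &[(u32, &str)] = &[
    (0x0000, "GL_NO_ERROR"),
    (0x0500, "GL_INVALID_ENUM"),
    (0x0501, "GL_INVALID_VALUE"),
    (0x0502, "GL_INVALID_OPERATION"),
    (0x0503, "GL_STACK_OVERFLOW"),
    (0x0504, "GL_STACK_UNDERFLOW"),
    (0x0505, "GL_OUT_OF_MEMORY"),
    (0x0506, "GL_INVALID_FRAMEBUFFER_OPERATION"),
    (0x0507, "GL_CONTEXT_LOST"),
];

/// Hypothetical helper: take the code exactly as printed in the log
/// (assumed to be hexadecimal, without a 0x prefix) and return its name.
pub fn describe_logged_gl_error(logged: &str) -> Option<&'static str> {
    let code = u32::from_str_radix(logged, 16).ok()?;
    GL_ERROR_NAMES
        .iter()
        .find(|(known, _)| *known == code)
        .map(|(_, name)| *name)
}
```

Under that assumption, describe_logged_gl_error("507") yields GL_CONTEXT_LOST, which is why the failing function differs between runs: whichever GL call comes first after the loss reports the error.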
Comment 8•1 year ago
What happens if you set gfx.webrender.prefer-robustness to false?
Comment 9•1 year ago (Reporter)
Flipping that pref to false makes the crashes disappear.
Comment 10•1 year ago
(In reply to Manuel Bucher [:manuel] from comment #9)
Flipping that pref to false makes the crashes disappear.
And Firefox works normally?
Comment 11•1 year ago
Your driver is very new. Could this be a recent regression?
https://www.nvidia.com/Download/driverResults.aspx/226768/en-us/
We don't have any blocklisting support for robustness, and this makes me wonder whether or not we should....
Comment 12•1 year ago
I understand it is a bit of a pain, but would you be willing to try downgrading your NVIDIA driver version to see if you can regression window this? Assuming it is indeed a driver issue.
Comment 13•1 year ago
Actually, does this only happen in debug builds? You don't see it in release builds?
Comment 14•1 year ago (Reporter)
(In reply to Andrew Osmond [:aosmond] (he/him) from comment #10)
And Firefox works normally?
Yes, tests are passing. Looks normal.
(In reply to Andrew Osmond [:aosmond] (he/him) from comment #12)
I understand it is a bit of a pain, but would you be willing to try downgrading your NVIDIA driver version to see if you can regression window this? Assuming it is indeed a driver issue.
I can try that, but it will take time.
Actually, this only happens in debug builds? You don't see in release builds?
Yeah, probably due to this being a MOZ_ASSERT, the crash is only happening in debug. But the window looks normal in release, and nothing weird is happening in Firefox. Now that I think about it, I do see odd behavior when starting other applications: new windows that I open (mostly nautilus) take longer than usual for the first paint and just display the previous window for some time, even when I drag them around. However, that doesn't happen in Firefox.
Comment 15•1 year ago (Reporter)
Yeah, probably due to this being a MOZ_ASSERT, the crash is only happening in debug.
Oh, it's a MOZ_CRASH, so it would also trigger in release. I'm pretty sure that it's not crashing in release, because I disabled debug to run the test cases. But let me verify again. (I will report back once the non-debug build has compiled and I have run the test case.)
Comment 16•1 year ago
Even though it's a MOZ_CRASH it's (by default) only enabled in debug builds: https://searchfox.org/mozilla-central/rev/cb1060f7b4581e6c2d30f1accc84c7d807132d82/gfx/wr/webrender/src/device/gl.rs#1488
But setting the pref gfx.webrender.panic-on-gl-error to true will also enable it in release builds, so you can try that. It probably needs a restart after setting the pref to take effect.
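The rule described in this comment can be sketched roughly as below. This is an illustration only, not the code at the searchfox link: the function names are hypothetical, and the boolean parameter merely stands in for the value of gfx.webrender.panic-on-gl-error.

```rust
/// Hypothetical helper: the crash on a GL error is enabled in debug builds
/// by default, and in any build when the pref is set to true.
/// `pref_panic_on_gl_error` stands in for gfx.webrender.panic-on-gl-error.
pub fn panics_on_gl_error(is_debug_build: bool, pref_panic_on_gl_error: bool) -> bool {
    is_debug_build || pref_panic_on_gl_error
}

/// The same decision for whatever build this code is compiled into.
/// `cfg!(debug_assertions)` is true when debug assertions are enabled.
pub fn panics_on_gl_error_in_this_build(pref_panic_on_gl_error: bool) -> bool {
    panics_on_gl_error(cfg!(debug_assertions), pref_panic_on_gl_error)
}
```

With this shape, a release build with the pref left at false is the only combination that does not crash on a GL error.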
Comment 17•1 year ago
(In reply to Jamie Nicol [:jnicol] from comment #16)
Even though it's a MOZ_CRASH it's (by default) only enabled in debug builds: https://searchfox.org/mozilla-central/rev/cb1060f7b4581e6c2d30f1accc84c7d807132d82/gfx/wr/webrender/src/device/gl.rs#1488
But setting the pref gfx.webrender.panic-on-gl-error to true will also enable it in release builds, so you can try that. Probably needs a restart after setting the pref to take effect
What happens to the user if we don't panic? I guess it triggers a context reset, we tear down everything and try again....
Comment 18•1 year ago
Maybe we shouldn't be panicking for context reset errors specifically by default.
Comment 19•1 year ago (Reporter)
(In reply to Jamie Nicol [:jnicol] from comment #16)
Even though it's a MOZ_CRASH it's (by default) only enabled in debug builds: https://searchfox.org/mozilla-central/rev/cb1060f7b4581e6c2d30f1accc84c7d807132d82/gfx/wr/webrender/src/device/gl.rs#1488
But setting the pref gfx.webrender.panic-on-gl-error to true will also enable it in release builds, so you can try that. Probably needs a restart after setting the pref to take effect
It also crashes in release mode iff gfx.webrender.panic-on-gl-error=true:
Mozilla crash reason: Caught GL error 507 at get_integer_v
Crash dump filename: /tmp/tmpbvddhwm3.mozrunner/minidumps/720cc7d4-64c7-0369-914f-fb27d73fe77e.dmp
Operating system: Linux
Comment 20•1 year ago
Okay. The fact that it works in release, without rendering issues, suggests to me that these errors are recoverable and we are panicking unnecessarily. Someone should then write a patch to ignore context-lost errors, possibly add a pref so that someone who really wants to crash even on a context loss can continue to, and let the context-lost recovery algorithm do its job.
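One possible shape for such a patch, as a sketch under stated assumptions rather than an actual WebRender change: the enum, the function, and the second pref (panic_on_context_lost) are all hypothetical names introduced here, and 0x0507 is the standard value of GL_CONTEXT_LOST.

```rust
pub const GL_NO_ERROR: u32 = 0;
pub const GL_CONTEXT_LOST: u32 = 0x0507;

/// What to do after checking the GL error state (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GlErrorAction {
    /// No error was reported.
    Nothing,
    /// Log the error and keep going.
    LogOnly,
    /// Context was lost: do not crash, let context-loss recovery handle it.
    LetRecoveryRun,
    /// Crash, as the current debug-build behavior does.
    Panic,
}

/// `panic_on_gl_error` stands in for gfx.webrender.panic-on-gl-error.
/// `panic_on_context_lost` is an assumed new pref for people who still
/// want to crash on context loss.
pub fn action_for_gl_error(
    code: u32,
    panic_on_gl_error: bool,
    panic_on_context_lost: bool,
) -> GlErrorAction {
    match code {
        GL_NO_ERROR => GlErrorAction::Nothing,
        GL_CONTEXT_LOST if panic_on_context_lost => GlErrorAction::Panic,
        GL_CONTEXT_LOST => GlErrorAction::LetRecoveryRun,
        _ if panic_on_gl_error => GlErrorAction::Panic,
        _ => GlErrorAction::LogOnly,
    }
}
```

The design choice here is that context loss is decided by its own switch, so keeping the general panic enabled in debug builds would no longer crash on a recoverable context reset.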
Comment 21•1 year ago (Reporter)
The problem disappeared. Attaching my update log since last week.
Probably interesting lines:
[2024-06-24T13:55:11+0200] [ALPM] upgraded mesa (1:24.1.1-1 -> 1:24.1.2-1)
[2024-06-24T13:55:13+0200] [ALPM] upgraded nvidia-utils (550.90.07-2 -> 550.90.07-3)
[2024-06-24T13:55:13+0200] [ALPM-SCRIPTLET] If you run into trouble with CUDA not being available, run nvidia-modprobe first.
[2024-06-24T13:55:13+0200] [ALPM-SCRIPTLET] If you use GDM on Wayland, you might have to run systemctl enable --now nvidia-resume.service
[...]
[2024-06-24T13:55:13+0200] [ALPM] upgraded sdl2 (2.30.3-1 -> 2.30.4-1)
[...]
[2024-06-24T13:55:15+0200] [ALPM] upgraded gegl (0.4.48-3 -> 0.4.48-4)
[...]
[2024-06-24T13:55:22+0200] [ALPM] upgraded vulkan-intel (1:24.1.1-1 -> 1:24.1.2-1)
[2024-06-24T13:55:22+0200] [ALPM] upgraded vulkan-radeon (1:24.1.1-1 -> 1:24.1.2-1)
There was a driver update and I can try whether that update really fixed the problem.
Comment 22•1 year ago (Reporter)
Comment 23•1 year ago (Reporter)
I didn't consider a driver update before submitting the bug report, because the PC is new to me and I had the problem from the beginning. If it really was a driver problem, closing as WORKSFORME would be fine by me.
Comment 24•1 year ago (Reporter)
There was a driver update and I can try whether that update really fixed the problem.
I'm not sure whether that statement was wrong, but the problem reoccurred, and I haven't gotten around to reproducing it with different driver versions (it's a bit of a pain, because it involves restarting the computer). Sorry 😔