Open Bug 1903319 Opened 1 year ago Updated 1 year ago

Hit MOZ_CRASH(Caught GL error 507 at push_debug_group_khr) at gfx/wr/webrender/src/device/gl.rs:1494 in debug mode

Categories

(Core :: Graphics: WebRender, defect)

People

(Reporter: manuel, Unassigned)

Attachments

(3 files)

My Firefox in debug mode always crashes, either instantly or after some time. I have to run test cases with --headless in debug mode to avoid the crashes.

 0:13.99 GECKO(922581) [WARN  webrender::device::gl] Missing optimized shader source for gpu_cache_update
 0:14.16 GECKO(922581) [Child 922741: Main Thread]: I/DocShellAndDOMWindowLeak --DOCSHELL 71f917928c00 == 0 [pid = 922741] [id = 0] [url = about:blank]
 0:14.19 GECKO(922581) [Child 922741: Main Thread]: I/DocShellAndDOMWindowLeak --DOMWINDOW == 3 (71f91792d3e0) [pid = 922741] [serial = 1] [outer = 0] [url = about:blank]
 0:14.33 GECKO(922581) [GFX1-]: Caught GL error 507 at push_debug_group_khr
 0:14.34 GECKO(922581) [ERROR webrender::device::gl] Caught GL error 507 at push_debug_group_khr
 0:14.34 GECKO(922581) [922581] Hit MOZ_CRASH(Caught GL error 507 at push_debug_group_khr) at gfx/wr/webrender/src/device/gl.rs:1494
Initializing stack-fixing for the first stack frame, this may take a while...
 0:35.01 GECKO(922581) #01: RustMozCrash (/home/user/dev/gecko5/mozglue/static/rust/wrappers.cpp:18)
 0:35.01 GECKO(922581) #02: mozglue_static::panic_hook (mozglue/static/rust/lib.rs:80)
 0:35.01 GECKO(922581) #03: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xd4bc56c)
 0:35.02 GECKO(922581) #04: std::panicking::rust_panic_with_hook (library/alloc/src/boxed.rs:0)
 0:35.02 GECKO(922581) #05: std::panicking::begin_panic_handler::{{closure}} (library/std/src/panicking.rs:0)
 0:35.02 GECKO(922581) #06: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xe26b726)
 0:35.02 GECKO(922581) #07: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xe26def4)
 0:35.02 GECKO(922581) #08: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xe2b8735)
 0:35.02 GECKO(922581) #09: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xd004453)
 0:35.02 GECKO(922581) #10: webrender::device::query_gl::GpuMarker::new (gfx/wr/webrender/src/device/query_gl.rs:283)
 0:35.03 GECKO(922581) #11: webrender::device::query_gl::GpuProfiler::start_marker (gfx/wr/webrender/src/device/query_gl.rs:266)
 0:35.03 GECKO(922581) #12: webrender::renderer::Renderer::render_impl (gfx/wr/webrender/src/renderer/mod.rs:1477)
 0:35.03 GECKO(922581) #13: webrender::renderer::Renderer::render (gfx/wr/webrender/src/renderer/mod.rs:1253)
 0:35.03 GECKO(922581) #14: wr_renderer_render (gfx/webrender_bindings/src/bindings.rs:649)
 0:35.04 GECKO(922581) #15: mozilla::wr::RendererOGL::UpdateAndRender(mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*, mozilla::wr::RendererStats*) (/home/user/dev/gecko5/gfx/webrender_bindings/RendererOGL.cpp:191)
 0:35.05 GECKO(922581) #16: mozilla::wr::RenderThread::UpdateAndRender(mozilla::wr::WrWindowId, mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> const&, mozilla::TimeStamp const&, bool, mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:804)
 0:35.05 GECKO(922581) #17: mozilla::wr::RenderThread::HandleFrameOneDocInner(mozilla::wr::WrWindowId, bool, bool, mozilla::Maybe<mozilla::wr::FramePublishId>) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:648)
 0:35.05 GECKO(922581) #18: mozilla::wr::RenderThread::WrNotifierEvent_HandleNewFrameReady(mozilla::wr::WrWindowId, bool, mozilla::wr::FramePublishId) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:558)
 0:35.05 GECKO(922581) #19: mozilla::wr::RenderThread::HandleWrNotifierEvents(mozilla::wr::WrWindowId) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:0)
 0:35.05 GECKO(922581) #20: mozilla::detail::RunnableMethodImpl<mozilla::wr::RenderThread*, void (mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId), true, (mozilla::RunnableKind)0, mozilla::wr::WrWindowId>::Run() (/home/user/dev/gecko5/obj-debug-sccache/dist/include/nsThreadUtils.h:1134)
 0:35.05 GECKO(922581) #21: nsThread::ProcessNextEvent(bool, bool*) (/home/user/dev/gecko5/xpcom/threads/nsThread.cpp:1199)
 0:35.05 GECKO(922581) #22: NS_ProcessNextEvent(nsIThread*, bool) (/home/user/dev/gecko5/xpcom/threads/nsThreadUtils.cpp:480)
 0:35.05 GECKO(922581) #23: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) (/home/user/dev/gecko5/ipc/glue/MessagePump.cpp:0)
 0:35.05 GECKO(922581) #24: MessageLoop::Run() (/home/user/dev/gecko5/ipc/chromium/src/base/message_loop.cc:346)
 0:35.05 GECKO(922581) #25: nsThread::ThreadFunc(void*) (/home/user/dev/gecko5/xpcom/threads/nsThread.cpp:372)
 0:35.08 GECKO(922581) #26: _pt_root (/home/user/dev/gecko5/nsprpub/pr/src/pthreads/ptthread.c:204)
 0:35.21 GECKO(922581) #27: set_alt_signal_stack_and_start(PthreadCreateParams*) (/home/user/dev/gecko5/mozglue/interposers/pthread_create_interposer.cpp:81)
 0:35.21 GECKO(922581) #28: ??? (/usr/lib/libc.so.6 + 0x92ded)
 0:35.21 GECKO(922581) #29: ??? (/usr/lib/libc.so.6 + 0x1160dc)
 0:35.21 GECKO(922581) #30: ??? (???:???)

Pernosco session: https://pernos.co/debug/LhA3tU0VfwYJW6-2hsmC5w/index.html

Can you please attach your about:support?

The gpu_cache_update warning is unrelated.

In bug 1879858 we saw glPushDebugGroup always returning an error (which causes us to assert in debug builds), though on a different OS and with a different error code.

If you set the pref gfx.webrender.enable-gpu-markers to false does the issue persist?

Flags: needinfo?(manuel)
See Also: 1691760, 1879858
Attached file about:support.json
Flags: needinfo?(manuel)

The crash still happens during the test when I disable gfx.webrender.enable-gpu-markers through browser.toml.

With the same stacktrace? Or do we crash due to a different OpenGL function returning an error? We shouldn't still be calling glPushDebugGroup with that pref disabled...

Flags: needinfo?(manuel)

Ah yes, it is a different OpenGL function. I didn't notice.

 0:04.51 GECKO(31929) [GFX1-]: Caught GL error 507 at bind_framebuffer
 0:04.51 GECKO(31929) [ERROR webrender::device::gl] Caught GL error 507 at bind_framebuffer
 0:04.51 GECKO(31929) [31929] Hit MOZ_CRASH(Caught GL error 507 at bind_framebuffer) at gfx/wr/webrender/src/device/gl.rs:1494
Initializing stack-fixing for the first stack frame, this may take a while...
 0:04.70 INFO runtests.py | Waiting for browser...
 0:21.64 GECKO(31929) #01: RustMozCrash (/home/user/dev/gecko5/mozglue/static/rust/wrappers.cpp:18)
 0:21.65 GECKO(31929) #02: mozglue_static::panic_hook (mozglue/static/rust/lib.rs:80)
 0:21.65 GECKO(31929) #03: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xd4bc56c)
 0:21.65 GECKO(31929) #04: std::panicking::rust_panic_with_hook (library/alloc/src/boxed.rs:0)
 0:21.65 GECKO(31929) #05: std::panicking::begin_panic_handler::{{closure}} (library/std/src/panicking.rs:0)
 0:21.65 GECKO(31929) #06: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xe26b726)
 0:21.65 GECKO(31929) #07: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xe26def4)
 0:21.65 GECKO(31929) #08: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xe2b8735)
 0:21.65 GECKO(31929) #09: ??? (/home/user/dev/gecko5/obj-debug-sccache/dist/bin/libxul.so + 0xcfff523)
 0:21.66 GECKO(31929) #10: webrender::device::gl::Device::init_fbos (gfx/wr/webrender/src/device/gl.rs:2867)
 0:21.66 GECKO(31929) #11: webrender::device::gl::Device::create_texture (gfx/wr/webrender/src/device/gl.rs:2660)
 0:21.66 GECKO(31929) #12: webrender::renderer::Renderer::update_texture_cache (gfx/wr/webrender/src/renderer/mod.rs:1901)
 0:21.66 GECKO(31929) #13: webrender::renderer::Renderer::render_impl (gfx/wr/webrender/src/renderer/mod.rs:1491)
 0:21.66 GECKO(31929) #14: webrender::renderer::Renderer::render (gfx/wr/webrender/src/renderer/mod.rs:1253)
 0:21.66 GECKO(31929) #15: wr_renderer_render (gfx/webrender_bindings/src/bindings.rs:649)
 0:21.67 GECKO(31929) #16: mozilla::wr::RendererOGL::UpdateAndRender(mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*, mozilla::wr::RendererStats*) (/home/user/dev/gecko5/gfx/webrender_bindings/RendererOGL.cpp:191)
 0:21.67 GECKO(31929) #17: mozilla::wr::RenderThread::UpdateAndRender(mozilla::wr::WrWindowId, mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> const&, mozilla::TimeStamp const&, bool, mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:804)
 0:21.68 GECKO(31929) #18: mozilla::wr::RenderThread::HandleFrameOneDocInner(mozilla::wr::WrWindowId, bool, bool, mozilla::Maybe<mozilla::wr::FramePublishId>) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:648)
 0:21.68 GECKO(31929) #19: mozilla::wr::RenderThread::WrNotifierEvent_HandleNewFrameReady(mozilla::wr::WrWindowId, bool, mozilla::wr::FramePublishId) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:558)
 0:21.68 GECKO(31929) #20: mozilla::wr::RenderThread::HandleWrNotifierEvents(mozilla::wr::WrWindowId) (/home/user/dev/gecko5/gfx/webrender_bindings/RenderThread.cpp:0)
 0:21.68 GECKO(31929) #21: mozilla::detail::RunnableMethodImpl<mozilla::wr::RenderThread*, void (mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId), true, (mozilla::RunnableKind)0, mozilla::wr::WrWindowId>::Run() (/home/user/dev/gecko5/obj-debug-sccache/dist/include/nsThreadUtils.h:1134)
 0:21.68 GECKO(31929) #22: nsThread::ProcessNextEvent(bool, bool*) (/home/user/dev/gecko5/xpcom/threads/nsThread.cpp:1199)
 0:21.68 GECKO(31929) #23: NS_ProcessNextEvent(nsIThread*, bool) (/home/user/dev/gecko5/xpcom/threads/nsThreadUtils.cpp:480)
 0:21.68 GECKO(31929) #24: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) (/home/user/dev/gecko5/ipc/glue/MessagePump.cpp:0)
 0:21.68 GECKO(31929) #25: MessageLoop::Run() (/home/user/dev/gecko5/ipc/chromium/src/base/message_loop.cc:346)
 0:21.68 GECKO(31929) #26: nsThread::ThreadFunc(void*) (/home/user/dev/gecko5/xpcom/threads/nsThread.cpp:372)
 0:21.69 GECKO(31929) #27: _pt_root (/home/user/dev/gecko5/nsprpub/pr/src/pthreads/ptthread.c:204)
 0:21.77 GECKO(31929) #28: set_alt_signal_stack_and_start(PthreadCreateParams*) (/home/user/dev/gecko5/mozglue/interposers/pthread_create_interposer.cpp:81)
 0:21.77 GECKO(31929) #29: ??? (/usr/lib/libc.so.6 + 0x92ded)
 0:21.77 GECKO(31929) #30: ??? (/usr/lib/libc.so.6 + 0x1160dc)
 0:21.77 GECKO(31929) #31: ??? (???:???)

I can record a pernosco session if that is interesting (The one above was just a side product of wanting to record a pernosco session for a different bug).

Flags: needinfo?(manuel)

For reference, it happens on all test cases where a browser window opens, most of the time either immediately after the browser window opens or when I focus the window. I am running the test with ./mach test toolkit/components/antitracking/bouncetrackingprotection/test/browser/browser_bouncetracking_telemetry_purge_count.js.

Severity: -- → S3

Okay, so it seems you consistently lose your GL context, and then the next GL call generates this error, causing us to assert in debug builds.
Bug 1879858 is therefore unrelated; glPushDebugGroup just happened to be the first GL function called after the context was lost.
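
For reference, a minimal sketch of how the logged code lines up with that diagnosis. It assumes the number in "Caught GL error 507" is printed in hexadecimal, which fits the context-loss reading since 0x0507 is GL_CONTEXT_LOST in the OpenGL specification. The function names are hypothetical helpers for reading these logs, not WebRender APIs.

```rust
/// Maps a raw OpenGL error code to its symbolic name.
/// The numeric values come from the OpenGL / OpenGL ES specifications.
fn gl_error_name(code: u32) -> &'static str {
    match code {
        0x0000 => "GL_NO_ERROR",
        0x0500 => "GL_INVALID_ENUM",
        0x0501 => "GL_INVALID_VALUE",
        0x0502 => "GL_INVALID_OPERATION",
        0x0503 => "GL_STACK_OVERFLOW",
        0x0504 => "GL_STACK_UNDERFLOW",
        0x0505 => "GL_OUT_OF_MEMORY",
        0x0506 => "GL_INVALID_FRAMEBUFFER_OPERATION",
        0x0507 => "GL_CONTEXT_LOST",
        _ => "unknown",
    }
}

/// Hypothetical log helper: extracts (error code, GL function name) from a
/// line containing "Caught GL error <code> at <function>".
/// Assumption: the code is written in hexadecimal without a "0x" prefix.
fn parse_caught_gl_error(line: &str) -> Option<(u32, &str)> {
    let rest = line.split("Caught GL error ").nth(1)?;
    let mut parts = rest.split_whitespace();
    let code = u32::from_str_radix(parts.next()?, 16).ok()?;
    if parts.next()? != "at" {
        return None;
    }
    // The MOZ_CRASH line wraps the message in parentheses, so drop a trailing ')'.
    let func = parts.next()?.trim_end_matches(')');
    Some((code, func))
}
```

Under that assumption, both the push_debug_group_khr and the bind_framebuffer lines decode to the same GL_CONTEXT_LOST code, which is what you would expect if the failing function is simply whichever GL call comes first after the loss.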

There may be something broken in the GL setup on your machine to cause the context to be lost so easily. Andrew, Nical, any ideas?

Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(aosmond)
See Also: 1879858

What happens if you set gfx.webrender.prefer-robustness to false?

Flags: needinfo?(aosmond) → needinfo?(manuel)

Flipping that pref to false makes the crashes disappear.

Flags: needinfo?(manuel)

(In reply to Manuel Bucher [:manuel] from comment #9)

Flipping that pref to false makes the crashes disappear.

And Firefox works normally?

Your driver is very new. Could this be a recent regression?
https://www.nvidia.com/Download/driverResults.aspx/226768/en-us/

We don't have any blocklisting support for robustness, and this makes me wonder whether or not we should....

I understand it is a bit of a pain, but would you be willing to try downgrading your NVIDIA driver version to see if you can regression window this? Assuming it is indeed a driver issue.

Flags: needinfo?(manuel)

Actually, this only happens in debug builds? You don't see it in release builds?

(In reply to Andrew Osmond [:aosmond] (he/him) from comment #10)

And Firefox works normally?

Yes, tests are passing. Looks normal.

(In reply to Andrew Osmond [:aosmond] (he/him) from comment #12)

I understand it is a bit of a pain, but would you be willing to try downgrading your NVIDIA driver version to see if you can regression window this? Assuming it is indeed a driver issue.

I can try that, but it will take time.

Actually, this only happens in debug builds? You don't see it in release builds?

Yeah, probably due to this being a MOZ_ASSERT, the crash is only happening in debug. But the window looks normal in release. With Firefox nothing weird is happening. Now that I think about it, I do see weird behavior when starting other applications: new windows that I open (mostly nautilus) take longer than usual for the first paint and just display the previous window for some time, even when I drag them around. However, that doesn't happen in Firefox.

Yeah, probably due to this being a MOZ_ASSERT, the crash is only happening in debug.

Oh, it's a MOZ_CRASH, so it would also trigger in release. I'm pretty sure that it's not crashing in release, because I disabled debug to run the test cases. But let me verify again. (I will report back once the non-debug build has compiled and I have run the test case.)

Even though it's a MOZ_CRASH it's (by default) only enabled in debug builds: https://searchfox.org/mozilla-central/rev/cb1060f7b4581e6c2d30f1accc84c7d807132d82/gfx/wr/webrender/src/device/gl.rs#1488

But setting the pref gfx.webrender.panic-on-gl-error to true will also enable it in release builds, so you can try that. Probably needs a restart after setting the pref to take effect
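
A minimal sketch of the behaviour as described in this comment, not WebRender's actual code: GL errors panic in debug builds by default, and the pref additionally turns the panic on in release builds. The function names are hypothetical, and using `cfg!(debug_assertions)` as a stand-in for "debug build" is an assumption.

```rust
/// Hypothetical helper (not a WebRender API): decides whether a caught GL
/// error should panic, given the build type and the value of the
/// gfx.webrender.panic-on-gl-error pref.
fn panics_on_gl_error(is_debug_build: bool, pref_panic_on_gl_error: bool) -> bool {
    // Debug builds panic by default; the pref enables the same in release.
    is_debug_build || pref_panic_on_gl_error
}

/// Same decision, using the compile-time build type of this program.
/// Assumption: debug_assertions is a reasonable proxy for a debug build.
fn panics_on_gl_error_in_this_build(pref_panic_on_gl_error: bool) -> bool {
    panics_on_gl_error(cfg!(debug_assertions), pref_panic_on_gl_error)
}
```

With this model, a release build with the pref left at false only logs the error, which would explain why the crash was first noticed only in debug builds.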

(In reply to Jamie Nicol [:jnicol] from comment #16)

Even though it's a MOZ_CRASH it's (by default) only enabled in debug builds: https://searchfox.org/mozilla-central/rev/cb1060f7b4581e6c2d30f1accc84c7d807132d82/gfx/wr/webrender/src/device/gl.rs#1488

But setting the pref gfx.webrender.panic-on-gl-error to true will also enable it in release builds, so you can try that. Probably needs a restart after setting the pref to take effect

What happens to the user if we don't panic? I guess it triggers a context reset, we tear down everything and try again....

Maybe we shouldn't be panicking for context reset errors specifically by default.

(In reply to Jamie Nicol [:jnicol] from comment #16)

Even though it's a MOZ_CRASH it's (by default) only enabled in debug builds: https://searchfox.org/mozilla-central/rev/cb1060f7b4581e6c2d30f1accc84c7d807132d82/gfx/wr/webrender/src/device/gl.rs#1488

But setting the pref gfx.webrender.panic-on-gl-error to true will also enable it in release builds, so you can try that. Probably needs a restart after setting the pref to take effect

Also crashes in release mode iff gfx.webrender.panic-on-gl-error=true

Mozilla crash reason: Caught GL error 507 at get_integer_v
Crash dump filename: /tmp/tmpbvddhwm3.mozrunner/minidumps/720cc7d4-64c7-0369-914f-fb27d73fe77e.dmp
Operating system: Linux
Flags: needinfo?(manuel)

Okay. The fact that it works in release, without rendering issues, suggests to me that these errors are recoverable and we are panicking unnecessarily. Someone should then write a patch that ignores context-lost errors in this check, possibly adding a pref so that anyone who really wants to crash on a context loss still can, and lets the context-loss recovery algorithm do its job.
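
One way the proposal above could look, as a rough sketch only. The enum, the function, and the `panic_on_context_lost` flag are all hypothetical names invented for illustration; no such pref exists in the thread. The constant 0x0507 is the GL_CONTEXT_LOST value from the OpenGL specification.

```rust
/// GL_CONTEXT_LOST as defined by the OpenGL specification.
const GL_CONTEXT_LOST: u32 = 0x0507;

/// Hypothetical outcome of the error check.
#[derive(Debug, PartialEq)]
enum GlErrorAction {
    /// Abort with a crash, as happens today when panicking is enabled.
    Panic,
    /// Skip the panic and leave it to the context-loss recovery path.
    Recover,
    /// Log the error and continue.
    LogOnly,
}

/// Hypothetical policy: context loss is treated as recoverable unless a
/// separate opt-in flag (an assumed new pref) asks for a crash; every other
/// GL error keeps following the existing panic-on-gl-error setting.
fn action_for_gl_error(
    code: u32,
    panic_on_gl_error: bool,
    panic_on_context_lost: bool,
) -> GlErrorAction {
    if code == GL_CONTEXT_LOST {
        if panic_on_context_lost {
            GlErrorAction::Panic
        } else {
            GlErrorAction::Recover
        }
    } else if panic_on_gl_error {
        GlErrorAction::Panic
    } else {
        GlErrorAction::LogOnly
    }
}
```

Keeping the context-loss flag separate from the general one means debug builds would still catch ordinary GL errors while no longer crashing on a lost context.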

Flags: needinfo?(nical.bugzilla)
Attached file pacman.log

The problem disappeared. Attaching my update log since last week.
Probably interesting lines:

[2024-06-24T13:55:11+0200] [ALPM] upgraded mesa (1:24.1.1-1 -> 1:24.1.2-1)
[2024-06-24T13:55:13+0200] [ALPM] upgraded nvidia-utils (550.90.07-2 -> 550.90.07-3)
[2024-06-24T13:55:13+0200] [ALPM-SCRIPTLET] If you run into trouble with CUDA not being available, run nvidia-modprobe first.
[2024-06-24T13:55:13+0200] [ALPM-SCRIPTLET] If you use GDM on Wayland, you might have to run systemctl enable --now nvidia-resume.service
[...]
[2024-06-24T13:55:13+0200] [ALPM] upgraded sdl2 (2.30.3-1 -> 2.30.4-1)
[...]
[2024-06-24T13:55:15+0200] [ALPM] upgraded gegl (0.4.48-3 -> 0.4.48-4)
[...]
[2024-06-24T13:55:22+0200] [ALPM] upgraded vulkan-intel (1:24.1.1-1 -> 1:24.1.2-1)
[2024-06-24T13:55:22+0200] [ALPM] upgraded vulkan-radeon (1:24.1.1-1 -> 1:24.1.2-1)

There was a driver update and I can try whether that update really fixed the problem.
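
To narrow down which package versions changed, the upgrade entries in a log like the one above can be picked out mechanically. This is a small sketch that assumes every entry has the shape `[timestamp] [ALPM] upgraded NAME (OLD -> NEW)` seen in the excerpt; `parse_upgrade` is a hypothetical helper, not part of any pacman tooling.

```rust
/// Hypothetical helper: parses one pacman log line of the form
/// "[timestamp] [ALPM] upgraded NAME (OLD -> NEW)" into (name, old, new).
/// Returns None for lines that are not upgrade entries.
fn parse_upgrade(line: &str) -> Option<(&str, &str, &str)> {
    let rest = line.split("[ALPM] upgraded ").nth(1)?;
    // The package name runs up to the opening parenthesis of the versions.
    let (name, versions) = rest.split_once(" (")?;
    // Keep only the text before the closing parenthesis.
    let versions = versions.split(')').next()?;
    let (old, new) = versions.split_once(" -> ")?;
    Some((name, old, new))
}

/// Collects the upgrade entries whose package name contains `needle`,
/// for example "nvidia" or "mesa".
fn upgrades_matching<'a>(log: &'a str, needle: &str) -> Vec<(&'a str, &'a str, &'a str)> {
    log.lines()
        .filter_map(parse_upgrade)
        .filter(|(name, _, _)| name.contains(needle))
        .collect()
}
```

For instance, filtering the log for "nvidia" would list the nvidia-utils change, which gives the old and new versions to compare when testing whether the driver is responsible.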

Flags: needinfo?(manuel)
Flags: needinfo?(manuel)

I didn't consider a driver update before submitting the bug report because the PC is new to me and I had the problem from the beginning. If it really was a driver problem, closing as WORKSFORME would be fine by me.

Flags: needinfo?(manuel)

There was a driver update and I can try whether that update really fixed the problem.

Not sure if that statement was wrong, but the problem recurred, and I haven't gotten around to reproducing it with different driver versions (it's a bit of a pain, because it involves restarting the computer). Sorry 😔

Flags: needinfo?(manuel)