Closed Bug 1553887 Opened 3 years ago Closed 3 years ago

[wayland] Render with WR before first frame callback causes crash

Categories

(Core :: Widget: Gtk, defect, P2)

Desktop
Linux
defect

RESOLVED WORKSFORME

People

(Reporter: kennylevinsen, Unassigned)

References

(Blocks 2 open bugs)

Attachments

(1 file, 1 obsolete file)

If mEGLSurface in RenderCompositorEGL has not yet been set (on Wayland this does not happen until the first frame callback), WebRender crashes with "Hit MOZ_CRASH(Caught GL error 506 at clear) at gfx/wr/webrender/src/device/gl.rs:1209".

I can hit this race on debug builds. Some users with faster machines might be able to hit this bug on release builds.

Backtrace:

#0  0x00007ffff009bd39 in MOZ_Crash(char const*, int, char const*)
    (aFilename=0x7fffd83b71b2 "gfx/wr/webrender/src/device/gl.rs", aLine=1209, aReason=0x7fffd83b6faa "Caught GL error 506 at clear")
    at /home/kenny/src/hg.mozilla.org/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Assertions.h:313
#1  0x00007ffff009bd39 in GeckoCrash(char const*, int, char const*)
    (aFilename=0x7fffd83b71b2 "gfx/wr/webrender/src/device/gl.rs", aLine=1209, aReason=0x7fffd83b6faa "Caught GL error 506 at clear") at /home/kenny/src/hg.mozilla.org/mozilla-central/toolkit/xre/nsAppRunner.cpp:5084
#2  0x00007ffff16689e0 in gkrust_shared::panic_hook (info=0x7fffd83b7438)
    at toolkit/library/rust/shared/lib.rs:243
#3  0x00007ffff1668c68 in core::ops::function::Fn::call ()
    at /rustc/50a0defd5a93523067ef239936cc2e0755220904/src/libcore/ops/function.rs:69
#4  0x00007ffff4745879 in rust_panic_with_hook () at src/libstd/panicking.rs:478
#5  0x00007ffff4745312 in continue_panic_fmt () at src/libstd/panicking.rs:381
#6  0x00007ffff474525f in begin_panic_fmt () at src/libstd/panicking.rs:336
#7  0x00007ffff26b7dab in webrender::device::gl::Device::new::{{closure}} (gl=..., name=..., code=1286)
    at gfx/wr/webrender/src/device/gl.rs:1209
#8  0x00007ffff26f342f in <gleam::gl::ErrorReactingGl<F> as gleam::gl::Gl>::clear
    (self=0x7fffd22b5110, buffer_mask=16640)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/third_party/rust/gleam/src/gl.rs:97
#9  0x00007ffff29d261e in webrender::device::gl::Device::clear_target
    (self=0x7fffd11e4020, color=..., depth=..., rect=...) at gfx/wr/webrender/src/device/gl.rs:2977
#10 0x00007ffff28c746e in webrender::renderer::Renderer::draw_tile_frame
    (self=0x7fffd11e4000, frame=0x7fffd11e7208, device_size=..., frame_id=..., stats=0x7fffd83b8ab0, clear_framebuffer=true) at gfx/wr/webrender/src/renderer.rs:4771
#11 0x00007ffff2db6142 in webrender::renderer::Renderer::render_impl::{{closure}} ()
    at gfx/wr/webrender/src/renderer.rs:3058
#12 0x00007ffff2da13e4 in webrender::profiler::TimeProfileCounter::profile (self=0x7fffd83b8b58, callback=...)
    at gfx/wr/webrender/src/profiler.rs:282
#13 0x00007ffff28be34d in webrender::renderer::Renderer::render_impl (self=0x7fffd11e4000, device_size=...)
    at gfx/wr/webrender/src/renderer.rs:3034
#14 0x00007ffff28bde83 in webrender::renderer::Renderer::render (self=0x7fffd11e4000, device_size=...)
    at gfx/wr/webrender/src/renderer.rs:2959
#15 0x00007ffff24256ee in wr_renderer_render
    (renderer=0x7fffd11e4000, width=1912, height=2119, had_slow_frame=false, out_stats=0x7fffd83b95b0)
    at gfx/webrender_bindings/src/bindings.rs:663
#16 0x00007fffea23cbc4 in mozilla::wr::RendererOGL::UpdateAndRender(mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool, mozilla::wr::RendererStats*)
    (this=0x7fffd11c5900, aReadbackSize=..., aReadbackFormat=..., aReadbackBuffer=..., aHadSlowFrame=false, aOutStats=0x7fffd83b95b0) at /home/kenny/src/hg.mozilla.org/mozilla-central/gfx/webrender_bindings/RendererOGL.cpp:123
#17 0x00007fffea23c20b in mozilla::wr::RenderThread::UpdateAndRender(mozilla::wr::WrWindowId, mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> const&, mozilla::TimeStamp const&, bool, mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool)
    (this=0x7ffff78b3700, aWindowId=..., aStartId=..., aStartTime=..., aRender=true, aReadbackSize=..., aReadbackFormat=..., aReadbackBuffer=..., aHadSlowFrame=false)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/gfx/webrender_bindings/RenderThread.cpp:374
#18 0x00007fffea23bd00 in mozilla::wr::RenderThread::HandleFrame(mozilla::wr::WrWindowId, bool)
    (this=0x7ffff78b3700, aWindowId=..., aRender=true)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/gfx/webrender_bindings/RenderThread.cpp:266
#19 0x00007fffea262e28 in mozilla::detail::RunnableMethodArguments<mozilla::wr::WrWindowId, bool>::applyImpl<mozilla::wr::RenderThread, void (mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId, bool), StoreCopyPassByConstLRef<mozilla::wr::WrWindowId>, StoreCopyPassByConstLRef<bool>, 0ul, 1ul>(mozilla::wr::RenderThread*, void (mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId, bool), mozilla::Tuple<StoreCopyPassByConstLRef<mozilla::wr::WrWindowId>, StoreCopyPassByConstLRef<bool> >&, std::integer_sequence<unsigned long, 0ul, 1ul>) (o=0x7ffff78b3700, m=
    (void (mozilla::wr::RenderThread::*)(mozilla::wr::RenderThread * const, mozilla::wr::WrWindowId, bool)) 0x7fffea23b930 <mozilla::wr::RenderThread::HandleFrame(mozilla::wr::WrWindowId, bool)>, args=...)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/nsThreadUtils.h:1122
#20 0x00007fffea262d4d in mozilla::detail::RunnableMethodArguments<mozilla::wr::WrWindowId, bool>::apply<mozilla::wr::RenderThread, void (mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId, bool)>(mozilla::wr::RenderThread*, void (mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId, bool)) (this=0x7fffd11ec1e0, o=0x7ffff78b3700, m=
    (void (mozilla::wr::RenderThread::*)(mozilla::wr::RenderThread * const, mozilla::wr::WrWindowId, bool)) 0x7fffea23b930 <mozilla::wr::RenderThread::HandleFrame(mozilla::wr::WrWindowId, bool)>)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/nsThreadUtils.h:1128
#21 0x00007fffea262b2e in mozilla::detail::RunnableMethodImpl<mozilla::wr::RenderThread*, void (mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId, bool), true, (mozilla::RunnableKind)0, mozilla::wr::WrWindowId, bool>::Run()
    (this=0x7fffd11ec1a0)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/nsThreadUtils.h:1174
#22 0x00007fffe8c0bcd9 in MessageLoop::RunTask(already_AddRefed<nsIRunnable>) (this=0x7fffd83b9ce8, aTask=...)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/ipc/chromium/src/base/message_loop.cc:442
#23 0x00007fffe8c0c496 in MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&)
    (this=0x7fffd83b9ce8, pending_task=...)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/ipc/chromium/src/base/message_loop.cc:450
#24 0x00007fffe8c0c6d0 in MessageLoop::DoWork() (this=0x7fffd83b9ce8)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/ipc/chromium/src/base/message_loop.cc:523
#25 0x00007fffe8c0d568 in base::MessagePumpDefault::Run(base::MessagePump::Delegate*)
    (this=0x7fffde302280, delegate=0x7fffd83b9ce8)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/ipc/chromium/src/base/message_pump_default.cc:35
#26 0x00007fffe8c0bb1f in MessageLoop::RunInternal() (this=0x7fffd83b9ce8)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/ipc/chromium/src/base/message_loop.cc:315
#27 0x00007fffe8c0ba95 in MessageLoop::RunHandler() (this=0x7fffd83b9ce8)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/ipc/chromium/src/base/message_loop.cc:308
#28 0x00007fffe8c0ba4a in MessageLoop::Run() (this=0x7fffd83b9ce8)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/ipc/chromium/src/base/message_loop.cc:290
#29 0x00007fffe8c2d838 in base::Thread::ThreadMain() (this=0x7fffd82abf10)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/ipc/chromium/src/base/thread.cc:192
#30 0x00007fffe8c12aee in ThreadFunc(void*) (closure=0x7fffd82abf10)
    at /home/kenny/src/hg.mozilla.org/mozilla-central/ipc/chromium/src/base/platform_thread_posix.cc:40
#31 0x00007ffff7f96a92 in start_thread () at /usr/lib/libpthread.so.0
#32 0x00007ffff7bcfcd3 in clone () at /usr/lib/libc.so.6
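Frames #7 and #8 show how the GL error becomes a panic: gleam's ErrorReactingGl wraps each GL call, checks the error state afterwards, and passes any error to a closure defined in webrender's Device::new, which panics. The following is a minimal sketch of that pattern only. MiniGl, ErrorReacting, IncompleteFramebufferGl and gl_error_message are hypothetical stand-ins, not gleam's or webrender's actual types, and the reaction here prints instead of panicking. The buffer_mask 16640 from frame #8 is GL_COLOR_BUFFER_BIT (0x4000) | GL_DEPTH_BUFFER_BIT (0x0100).

```rust
use std::cell::Cell;

// Hypothetical, minimal stand-in for a GL backend (not gleam's real trait).
trait MiniGl {
    fn clear(&self, buffer_mask: u32);
    fn get_error(&self) -> u32;
}

const GL_NO_ERROR: u32 = 0x0000;
const GL_INVALID_FRAMEBUFFER_OPERATION: u32 = 0x0506;

// Assumed message format; the "506" in the crash suggests the code is printed in hex.
fn gl_error_message(name: &str, code: u32) -> String {
    format!("Caught GL error {:x} at {}", code, name)
}

// Error-reacting wrapper: after each wrapped call, query the inner context
// for a pending error and hand (call name, code) to the reaction closure.
struct ErrorReacting<G: MiniGl, F: Fn(&str, u32)> {
    inner: G,
    react: F,
}

impl<G: MiniGl, F: Fn(&str, u32)> ErrorReacting<G, F> {
    fn clear(&self, buffer_mask: u32) {
        self.inner.clear(buffer_mask);
        let code = self.inner.get_error();
        if code != GL_NO_ERROR {
            (self.react)("clear", code);
        }
    }
}

// Fake context whose default framebuffer is incomplete, so every clear fails.
struct IncompleteFramebufferGl {
    pending: Cell<u32>,
}

impl MiniGl for IncompleteFramebufferGl {
    fn clear(&self, _buffer_mask: u32) {
        self.pending.set(GL_INVALID_FRAMEBUFFER_OPERATION);
    }
    fn get_error(&self) -> u32 {
        // Reading the error resets it, like glGetError().
        self.pending.replace(GL_NO_ERROR)
    }
}

fn main() {
    let gl = ErrorReacting {
        inner: IncompleteFramebufferGl { pending: Cell::new(GL_NO_ERROR) },
        // WebRender's reaction panics; this sketch only prints.
        react: |name: &str, code: u32| println!("{}", gl_error_message(name, code)),
    };
    gl.clear(16640); // COLOR_BUFFER_BIT | DEPTH_BUFFER_BIT, as in frame #8
}
```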

This is necessary on Wayland, where the EGL surface is not ready until the next frame callback. Letting WebRender go ahead with rendering without a proper EGL surface leads to panics.

GLContextEGL::MakeCurrentImpl() uses mFallbackSurface if no EGLSurface is set on the GLContextEGL. So I wonder whether the problem is mFallbackSurface on Wayland.
https://searchfox.org/mozilla-central/source/gfx/gl/GLContextProviderEGL.cpp#432

"Caught GL error 506 at clear" in comment 0 means GL_INVALID_FRAMEBUFFER_OPERATION.
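The panic message appears to print the code in hexadecimal: frame #7 of the backtrace shows code=1286, which is 0x0506. A small lookup sketch using the standard OpenGL error enum values; the function name gl_error_name is hypothetical.

```rust
/// Map a glGetError() code to its symbolic name (standard OpenGL / GLES values).
fn gl_error_name(code: u32) -> &'static str {
    match code {
        0x0000 => "GL_NO_ERROR",
        0x0500 => "GL_INVALID_ENUM",
        0x0501 => "GL_INVALID_VALUE",
        0x0502 => "GL_INVALID_OPERATION",
        0x0503 => "GL_STACK_OVERFLOW",
        0x0504 => "GL_STACK_UNDERFLOW",
        0x0505 => "GL_OUT_OF_MEMORY",
        0x0506 => "GL_INVALID_FRAMEBUFFER_OPERATION",
        0x0507 => "GL_CONTEXT_LOST",
        _ => "unknown GL error",
    }
}

fn main() {
    // Frame #7 reports code=1286 in decimal; the crash message shows it as hex "506".
    let code = 1286;
    println!("{} = 0x{:04x} = {}", code, code, gl_error_name(code));
}
```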

(In reply to Sotaro Ikeda [:sotaro] from comment #2)

GLContextEGL::MakeCurrentImpl() uses mFallbackSurface if no EGLSurface is set to GLContextEGL. Then I wonder if it is a problem of mFallbackSurface on Wayland.

When I tested on Wayland on Ubuntu 18.04, mFallbackSurface was not created because KHR_surfaceless_context was supported on Wayland.
https://searchfox.org/mozilla-central/source/gfx/gl/GLContextProviderEGL.cpp#189

According to the following extension spec, making a context current with no surface is equivalent to having an incomplete framebuffer object bound. This could cause a GL_INVALID_FRAMEBUFFER_OPERATION error if rendering to the default framebuffer happens.
https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_surfaceless_context.txt
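The hypothesis in this comment can be modeled as a small, hypothetical sketch (none of these names are Gecko APIs): MakeCurrentImpl() falls back to mFallbackSurface when no EGLSurface is set, no fallback is created when KHR_surfaceless_context is supported, and with no surface at all the default framebuffer is incomplete, so a clear records GL_INVALID_FRAMEBUFFER_OPERATION. It models only this reasoning; the later test with a 1x1 fallback surface in this bug still produced the GL error.

```rust
// Hypothetical model of the hypothesis above; none of these names are Gecko APIs.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Surface {
    Window,   // the real EGLSurface, available once the frame callback has run
    Fallback, // an mFallbackSurface-style stand-in
}

const GL_NO_ERROR: u32 = 0x0000;
const GL_INVALID_FRAMEBUFFER_OPERATION: u32 = 0x0506;

/// No fallback surface is created when surfaceless contexts are supported.
fn fallback_surface(surfaceless_supported: bool) -> Option<Surface> {
    if surfaceless_supported {
        None
    } else {
        Some(Surface::Fallback)
    }
}

/// Prefer the real surface, then the fallback; `None` means the context is
/// made current without any surface.
fn current_surface(window: Option<Surface>, fallback: Option<Surface>) -> Option<Surface> {
    window.or(fallback)
}

/// Without a surface the default framebuffer is incomplete, so clearing it
/// records GL_INVALID_FRAMEBUFFER_OPERATION.
fn clear_default_framebuffer(current: Option<Surface>) -> u32 {
    match current {
        Some(_) => GL_NO_ERROR,
        None => GL_INVALID_FRAMEBUFFER_OPERATION,
    }
}

fn main() {
    let fallback = fallback_surface(true); // surfaceless supported, as observed on Ubuntu 18.04

    // Before the first frame callback: no window surface yet.
    let before = current_surface(None, fallback);
    println!("before callback: {:?} -> 0x{:04x}", before, clear_default_framebuffer(before));

    // After the frame callback has set the real surface.
    let after = current_surface(Some(Surface::Window), fallback);
    println!("after callback: {:?} -> 0x{:04x}", after, clear_default_framebuffer(after));
}
```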

It seems that we could create a default fallback EGLSurface with GLContextEGL::CreateWaylandBufferSurface().
https://searchfox.org/mozilla-central/source/gfx/gl/GLContextProviderEGL.cpp#706

Just plopping CreateWaylandBufferSurface in for a 1x1 buffer didn't seem to work. Maybe surface creation fails due to incompatible config/flags? I haven't tried digging deeper, just quickly checked if the issue was still reproducible.

An easy way to guarantee hitting the crash is to comment out the mEGLSurface assignment in RenderCompositorEGL.cpp.

With the patch, a 1x1 EGLSurface was successfully created on Ubuntu 18.04 for me.

Assignee: nobody → sotaro.ikeda.g

Kenny Levinsen, can you check whether Attachment 9067912 works for you?

Flags: needinfo?(bugzilla)
Assignee: sotaro.ikeda.g → nobody

It doesn't work in the sense that the GL error is still present, but the buffer surface creation itself seems to succeed. I added some extra fprintfs to your patch, and the fallback surface appears to be created successfully. I suspect the same goes for my experiment, where I just extended CreateFallbackSurface with a branch for Wayland sessions.

Maybe the issue isn't Wayland-specific. Wayland might simply be exercising it more because of its lazy EGL surface initialization, which results in greater use of the fallback surface.

I suppose the next step is to figure out what GL operation is failing.

Flags: needinfo?(bugzilla)
Priority: -- → P2
Blocks: wayland, wr-linux
OS: Unspecified → Linux
Hardware: Unspecified → Desktop

Bug 1565785 and Bug 1566468 seem to address the problem.

Depends on: 1566468
Depends on: 1565785

:kennylevinsen, does the problem still happen with the latest nightly?

Flags: needinfo?(bugzilla)

I wonder if Bug 1565785 and Bug 1588987 might address the problem.

I apologize for the late response.

#1565583 implemented a check that interrupts BeginFrame if no surface is ready yet. This technically fixes this issue, but I now often see the missing-surface message instead, and I believe the missing surface might be related to some odd issues seen in extension popups.
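A hypothetical sketch of such a guard, assuming a compositor that only receives its EGL surface from the first frame callback: begin_frame() returns false and the frame is skipped instead of rendering into a surfaceless context. Compositor, EglSurfaceHandle, on_frame_callback and render are illustrative names, not the actual RenderCompositorEGL API.

```rust
/// Stand-in for an EGLSurface handle (hypothetical).
struct EglSurfaceHandle;

struct Compositor {
    egl_surface: Option<EglSurfaceHandle>,
    skipped_frames: u32,
}

impl Compositor {
    fn new() -> Self {
        Compositor { egl_surface: None, skipped_frames: 0 }
    }

    /// Assumed to run when the first Wayland frame callback fires.
    fn on_frame_callback(&mut self, surface: EglSurfaceHandle) {
        self.egl_surface = Some(surface);
    }

    /// Returns false, and logs, when no surface is ready yet.
    fn begin_frame(&mut self) -> bool {
        if self.egl_surface.is_none() {
            self.skipped_frames += 1;
            eprintln!("begin_frame: no EGL surface yet, skipping frame");
            return false;
        }
        true
    }
}

/// Renders one frame; returns whether anything was drawn.
fn render(compositor: &mut Compositor) -> bool {
    if !compositor.begin_frame() {
        return false;
    }
    // Drawing and buffer swapping would happen here.
    true
}

fn main() {
    let mut compositor = Compositor::new();
    let drawn_early = render(&mut compositor); // before the frame callback
    compositor.on_frame_callback(EglSurfaceHandle);
    let drawn_later = render(&mut compositor);
    println!("early: {}, later: {}, skipped: {}", drawn_early, drawn_later, compositor.skipped_frames);
}
```

The trade-off matches the one described above: no crash, but frames are dropped, with a logged message, until the surface exists.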

However, from the perspective of crashing, the issue is fixed. I don't seem to be able to resolve this issue as fixed, though, only "invalid" etc.

Flags: needinfo?(bugzilla)

Okay, thanks.

Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
Attachment #9067108 - Attachment is obsolete: true