[Wayland] Freeze at WaitForSyncNotify() / get_back_bo()
Component: Core :: Graphics: WebRender (defect, P2)
Reporter: stransky; Assignee: stransky
References: Blocks 2 open bugs, Regressed 1 open bug
Attachments: 1 file
This is a follow-up to Bug 1733754.
Comment 1 • 3 years ago (Assignee)
STR: With the latest Nightly, open the testcase from https://bugzilla.mozilla.org/show_bug.cgi?id=1733754#c0 and repeatedly open and close the 'Hightres' popup.
Comment 2 • 3 years ago (Assignee)
This is a deadlock between the widget code and Mesa's rendering code.
We request a synchronous repaint and block the event loop while waiting for it. But the event loop is needed to release buffers for the render thread: Mesa has no buffer available and is waiting for a new one.
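The cycle described above can be reproduced in a toy C++ model. This is a hypothetical sketch, not Firefox or Mesa code: `SharedState`, `bufferReleased`, `frameDone` and `SyncFlushCompletes` are illustrative names, the main thread's wait stands in for WaitForSyncNotify() and the render thread's wait stands in for get_back_bo(). Timeouts are added only so the example terminates instead of hanging; in the real bug both threads wait indefinitely.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state between the "main thread" and the "render thread".
struct SharedState {
  std::mutex m;
  std::condition_variable cv;
  bool bufferReleased = false;  // set only when the main loop dispatches the release event
  bool frameDone = false;       // set by the render thread once it obtained a buffer
};

// Returns true if the synchronous flush completed, false if it timed out
// (i.e. the two threads were waiting on each other).
bool SyncFlushCompletes(bool mainLoopDispatchesEvents) {
  SharedState s;

  std::thread render([&s] {
    std::unique_lock<std::mutex> lock(s.m);
    // Stand-in for get_back_bo(): wait for a free buffer.
    if (!s.cv.wait_for(lock, std::chrono::milliseconds(200),
                       [&s] { return s.bufferReleased; })) {
      return;  // no buffer ever arrived
    }
    s.frameDone = true;
    s.cv.notify_all();
  });

  bool done = false;
  {
    std::unique_lock<std::mutex> lock(s.m);
    if (mainLoopDispatchesEvents) {
      // Only a main thread that keeps dispatching events can release the buffer.
      s.bufferReleased = true;
      s.cv.notify_all();
    }
    // Stand-in for WaitForSyncNotify(): block until the frame is reported done.
    done = s.cv.wait_for(lock, std::chrono::milliseconds(500),
                         [&s] { return s.frameDone; });
  }
  render.join();
  return done;
}
```

`SyncFlushCompletes(false)` models the reported freeze (both waits time out), while `SyncFlushCompletes(true)` shows that the frame finishes as soon as the release event is allowed to be dispatched.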
Comment 3 • 3 years ago (Assignee)
We're blocked here:
#6 0x00007f01df9b1c9f in mozilla::ipc::MessageChannel::WaitForSyncNotify(bool) (this=0x7f01875e34f8) at /raid/src2/ipc/glue/MessageChannel.cpp:2233
#7 0x00007f01df9b1054 in mozilla::ipc::MessageChannel::Send(mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message> >, IPC::Message*) (this=0x7f01875e34f8, aMsg=[(class IPC::Message *) 0x0], aReply=0x7fffe9253c88) at /raid/src2/ipc/glue/MessageChannel.cpp:1449
#8 0x00007f01df9f344a in mozilla::ipc::IProtocol::ChannelSend(IPC::Message*, IPC::Message*) (this=0x7f0165367200, aMsg=0x7f01658fe5e0, aReply=0x7fffe9253c88)
at /raid/src2/ipc/glue/ProtocolUtils.cpp:534
#9 0x00007f01dfb0e778 in mozilla::layers::PCompositorBridgeChild::SendFlushRendering(mozilla::wr::RenderReasons const&) (this=0x7f0165367200, aReasons=...)
at PCompositorBridgeChild.cpp:1084
#10 0x00007f01e09fd3ab in mozilla::layers::CompositorBridgeChild::SendFlushRendering(mozilla::wr::RenderReasons const&) (this=0x7f0165367200, aReasons=...)
at /raid/src2/gfx/layers/ipc/CompositorBridgeChild.cpp:403
#11 0x00007f01e0b0f20b in mozilla::layers::WebRenderLayerManager::FlushRendering(mozilla::wr::RenderReasons) (this=0x7f016e2a0800, aReasons=...)
at /raid/src2/gfx/layers/wr/WebRenderLayerManager.cpp:709
#12 0x00007f01e4b094ed in nsWindow::OnConfigureEvent(_GtkWidget*, _GdkEventConfigure*) (this=0x7f016e342400, aWidget=0x7f016e343660, aEvent=0x7f016e12d740)
at /raid/src2/widget/gtk/nsWindow.cpp:3864
#13 0x00007f01e4b1468f in configure_event_cb(_GtkWidget*, _GdkEventConfigure*) (widget=0x7f016e343660, event=0x7f016e12d740) at /raid/src2/widget/gtk/nsWindow.cpp:7675
[...]
#20 0x00007f01ef878286 in gtk_widget_event (widget=0x7f016e343660, event=0x7f016e12d740) at /usr/src/debug/gtk3-3.24.30-1.1.fc34.x86_64/gtk/gtkwidget.c:7378
#21 0x00007f01ef696e77 in gtk_main_do_event (event=0x7f016e12d740) at /usr/src/debug/gtk3-3.24.30-1.1.fc34.x86_64/gtk/gtkmain.c:1861
#22 0x00007f01ef28326b in _gdk_event_emit (event=0x7f016e12d740) at /usr/src/debug/gtk3-3.24.30-1.1.fc34.x86_64/gdk/gdkevents.c:73
#23 0x00007f01ef30f2b9 in gdk_event_source_dispatch (base=0x7f01f0aeff80, callback=0x0, data=0x0) at wayland/gdkeventsource.c:124
It's because we request SynchronouslyRepaintOnResize().
Comment 4 • 3 years ago (Assignee)
This affects remote popups (extensions) as we use HW acceleration there.
Comment 5 • 3 years ago (Assignee)
Mesa bt:
#1 0x00007f4213d92d7c in wl_display_dispatch_queue () at /lib64/libwayland-client.so.0
#2 0x00007f4213d93aff in wl_display_roundtrip_queue () at /lib64/libwayland-client.so.0
#3 0x00007f41cb53a309 in get_back_bo (dri2_surf=0x7f4189eb1400) at ../src/egl/drivers/dri2/platform_wayland.c:570
#4 update_buffers (dri2_surf=dri2_surf@entry=0x7f4189eb1400) at ../src/egl/drivers/dri2/platform_wayland.c:696
#5 0x00007f41cb53a4a4 in update_buffers_if_needed (dri2_surf=0x7f4189eb1400) at ../src/egl/drivers/dri2/platform_wayland.c:726
#6 dri2_wl_query_buffer_age (disp=<optimized out>, surface=0x7f4189eb1400) at ../src/egl/drivers/dri2/platform_wayland.c:1171
#7 0x00007f41cb52edba in _eglQuerySurface (disp=<optimized out>, surface=0x7f4189eb1400, attribute=<optimized out>, value=0x7f41cabfd0e4) at ../src/egl/main/eglsurface.c:546
#8 0x00007f41cb524775 in eglQuerySurface (dpy=0x7f41cb4a2000, surface=<optimized out>, attribute=12605, value=0x7f41cabfd0e4) at ../src/egl/main/eglapi.c:1225
#9 0x00007f42056c1b82 in mozilla::gl::GLLibraryEGL::fQuerySurface(void*, void*, int, int*) const
(this=0x7f41cb438400, dpy=0x7f41cb4a2000, surface=0x7f4189eb1400, attribute=12605, value=0x7f41cabfd0e4) at /raid/src2/gfx/gl/GLLibraryEGL.h:347
#10 0x00007f42056a505f in mozilla::gl::EglDisplay::fQuerySurface(void*, int, int*) const (this=0x7f41cb4f1080, surface=0x7f4189eb1400, attribute=12605, value=0x7f41cabfd0e4)
at /raid/src2/gfx/gl/GLLibraryEGL.h:721
#11 0x00007f42056a4fd6 in mozilla::gl::GLContextEGL::GetBufferAge() const (this=0x7f41bb9ca000) at /raid/src2/gfx/gl/GLContextProviderEGL.cpp:599
#12 0x00007f4205bc83e4 in mozilla::wr::RenderCompositorEGL::GetBufferAge() const (this=0x7f4193259760) at /raid/src2/gfx/webrender_bindings/RenderCompositorEGL.cpp:274
#13 0x00007f4205bda784 in mozilla::wr::RendererOGL::UpdateAndRender(mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*, mozilla::wr::RendererStats*)
(this=0x7f41974514c0, aReadbackSize=..., aReadbackFormat=..., aReadbackBuffer=..., aNeedsYFlip=0x0, aOutStats=0x7f41cabfd350) at /raid/src2/gfx/webrender_bindings/RendererOGL.cpp:171
#14 0x00007f4205bd96b1 in mozilla::wr::RenderThread::UpdateAndRender(mozilla::wr::WrWindowId, mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> const&, mozilla::TimeStamp const&, bool, mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*) (this=0x7f41ead4dab0, aWindowId=..., aStartId=..., aStartTime=..., aRender=true, aReadbackSize=..., aReadbackFormat=..., aReadbackBuffer=..., aNeedsYFlip=0x0)
at /raid/src2/gfx/webrender_bindings/RenderThread.cpp:501
#15 0x00007f4205bd8b8c in mozilla::wr::RenderThread::HandleFrameOneDoc(mozilla::wr::WrWindowId, bool) (this=0x7f41ead4dab0, aWindowId=..., aRender=true)
We're blocked in get_back_bo() on the render thread while the main thread is blocked in WaitForSyncNotify().
Comment 6 • 3 years ago (Assignee)
This is a general issue: we can deadlock at any time between Mesa's 'get buffer' code on the render thread and the 'wait for reply' code on the main thread. It looks like we can't use synchronous events on Wayland.
Comment 7 • 3 years ago (Assignee)
Synchronous FlushRendering() causes a deadlock on Wayland.
The main thread is blocked in MessageChannel::WaitForSyncNotify() and isn't processing events from the system.
The rendering thread is blocked in RendererOGL::UpdateAndRender() / eglQuerySurface(). Mesa is waiting for a free back buffer in get_back_bo(): it spins on wl_display_roundtrip_queue() waiting for a free buffer, but never gets one because the main loop is blocked.
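Following the conclusion that synchronous waits can't be used here, the general way out of such a cycle is to never block the main thread on the render thread: post the flush, return to the event loop, and receive completion as another event. The C++ sketch below illustrates that pattern only; it is not the patch that landed for this bug, and `MainLoop`, `Post`, `Run`, `Quit` and `AsyncFlushCompletes` are hypothetical names. The timeouts again exist only so the toy cannot hang.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical single-threaded task loop standing in for the GTK main loop.
class MainLoop {
 public:
  void Post(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mMutex);
    mTasks.push_back(std::move(task));
    mCv.notify_one();
  }
  // Runs tasks until Quit() is called (returns true) or the timeout expires (false).
  bool Run(std::chrono::milliseconds timeout) {
    auto deadline = std::chrono::steady_clock::now() + timeout;
    std::unique_lock<std::mutex> lock(mMutex);
    while (!mQuit) {
      if (!mCv.wait_until(lock, deadline, [this] { return !mTasks.empty(); })) {
        return false;
      }
      auto task = std::move(mTasks.front());
      mTasks.pop_front();
      lock.unlock();  // run the task without holding the queue lock
      task();
      lock.lock();
    }
    return true;
  }
  void Quit() {
    std::lock_guard<std::mutex> lock(mMutex);
    mQuit = true;
  }

 private:
  std::mutex mMutex;
  std::condition_variable mCv;
  std::deque<std::function<void()>> mTasks;
  bool mQuit = false;
};

bool AsyncFlushCompletes() {
  MainLoop loop;
  std::mutex m;
  std::condition_variable cv;
  bool bufferReleased = false;

  // "Flush request": the render thread starts a frame and waits for a buffer.
  std::thread render([&] {
    std::unique_lock<std::mutex> lock(m);
    bool gotBuffer = cv.wait_for(lock, std::chrono::milliseconds(500),
                                 [&] { return bufferReleased; });
    lock.unlock();
    // Completion is delivered back to the main thread as an event.
    if (gotBuffer) loop.Post([&loop] { loop.Quit(); });
  });

  // The compositor's buffer-release event is queued on the main loop; since the
  // main thread is not blocked in a sync wait, it actually gets dispatched.
  loop.Post([&] {
    std::lock_guard<std::mutex> lock(m);
    bufferReleased = true;
    cv.notify_all();
  });

  bool done = loop.Run(std::chrono::milliseconds(1000));
  render.join();
  return done;
}
```

The design choice is that the main thread only ever waits inside its own event loop, so any event the render thread depends on can still be processed.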
Comment 9 • 3 years ago
bugherder
Comment 11 • 3 years ago (Assignee)
Let's keep this in Nightly only; it touches the WebRender repaint code and I don't want to introduce any regressions there.
Comment 12 • 3 years ago
Testing with Weston, this causes a regression where the display stays fully black until some event kicks Weston (moving the mouse inside the window, focusing the title bar, etc.).
STR:
- Launch Weston (weston 9.0.0 from Ubuntu 21.04) with
  weston --shell=desktop-shell.so
- Launch Firefox with
  MOZ_ENABLE_WAYLAND=1 WAYLAND_DISPLAY=wayland-0 GDK_DPI_SCALE=2 ./mach run
Expected:
- A brief flash of full black, then immediate display of the Firefox UI.
Observed:
- The Wayland screen turns fully black and no painting happens.
mozregression points to this bug:
4:43.42 INFO: No more integration revisions, bisection finished.
4:43.42 INFO: Last good revision: ab5f761072d345eacdff74c945af6ed41b848920
4:43.42 INFO: First bad revision: 9fcf1f9ef0e71a241b50df4f158fa4936da30a8b
4:43.42 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ab5f761072d345eacdff74c945af6ed41b848920&tochange=9fcf1f9ef0e71a241b50df4f158fa4936da30a8b
Comment 13 • 3 years ago (Assignee)
Can you test a different compositor please?
Thanks.
Comment 14 • 3 years ago
(In reply to Martin Stránský [:stransky] (ni? me) from comment #13)
> Can you test a different compositor please?
> Thanks.
Is there a compositor you'd recommend? I'm mostly testing with Weston because I use it in kiosk mode for full-screen display on devices like the Pinephone.
Comment 15 • 3 years ago (Assignee)
Please test mutter.
Comment 16 • 3 years ago
That works fine in Mutter.
Comment 17 • 3 years ago
(In reply to [:fabrice] Fabrice Desré from comment #12)
> Testing with Weston, this causes a regression where the display stays full black until some event kicks weston (moving the mouse inside the window, focusing the title bar, etc.).
I remember that this has been an issue before in the Weston kiosk mode. And I'm pretty sure the underlying issue was a Weston bug :/
Comment 18 • 3 years ago
(In reply to Robert Mader [:rmader] from comment #17)
> (In reply to [:fabrice] Fabrice Desré from comment #12)
>> Testing with Weston, this causes a regression where the display stays full black until some event kicks weston (moving the mouse inside the window, focusing the title bar, etc.).
> I remember that this has been an issue before in the Weston kiosk mode. And I'm pretty sure the underlying issue was a Weston bug :/
Hi Robert! I guess you're thinking about https://gitlab.freedesktop.org/wayland/weston/-/issues/473 but I run into this regression with a self-compiled version of Weston that includes the fix for #473. Now, this doesn't mean there is not another issue in Weston :)