[Wayland][Webrender] Lock surface in GLContextEGL::SwapBuffers()
Categories
(Core :: Graphics, defect)
Tracking
()
People
(Reporter: rmader, Assigned: rmader)
References
(Blocks 1 open bug)
Details
Attachments
(1 obsolete file)
We usually lock wl_surface
s via moz_container_wayland_surface_lock
in order to protect us from race conditions. However, GLContextEGL::SwapBuffers()
also chains up to wl_surface_commit()
, without being locked. Most of our calls to moz_container_wayland_surface_lock
happen on the main thread, while WR calls GLContextEGL::SwapBuffers()
on another thread, so there's a realistic chance to get surface commits while we consider the surface locked, potentially causing different kinds of crashes we occasionally see on Wayland with WR.
Lets also lock the wl_surface
when calling GLContextEGL::SwapBuffers()
.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 1•3 years ago
|
||
This is not well tested yet, but it appears to work well and I'd be glad
for feedback whether the approach looks reasonable.
Comment 2•3 years ago
|
||
Robert, can you try to reproduce the issue without the patch? For instance you can run mozilla test suite locally under wayland, I think mochitest/reftest may be useful or some test which uses popups as we have reports about popups used.
You need to build tree with ac_add_options --enable-tests and I also expect you'd need an optimized build to catch the race-condition, then run:
MOZ_ENABLE_WAYLAND=1 ./mach mochitest --enable-webrender
you can choose to run tests by directory:
MOZ_ENABLE_WAYLAND=1 ./mach mochitest docshell/test/ --enable-webrender
Comment 3•3 years ago
•
|
||
For reference there's a try I run under wayland, you can see various tests there and how they fail :)
https://treeherder.mozilla.org/jobs?repo=try&revision=2a4d6b19d3cfce3dea771d327872df38d8da906e&selectedTaskRun=V12qw91cTJuOuHsYEu-Gjg.0
Assignee | ||
Comment 4•3 years ago
|
||
(In reply to Martin Stránský [:stransky] from comment #2)
Robert, can you try to reproduce the issue without the patch? ...
Will try :)
What I already can say is that I do occasionally get crashes which appear to be related to popups - however I find them incredible hard to reproduce. Hopefully the tests make them easier to reproduce.
Comment 5•3 years ago
|
||
Robert, I think I found the underlying issue here. It's caused by the thread specific event loop:
https://searchfox.org/mozilla-central/rev/c7cf087b6e1384608ca3989f042f12f7cabd0a5f/widget/gtk/nsWaylandDisplay.cpp#56
and it's also cause of other similar bugs. There's a race condition between this and loop and the one at Gtk/main thread. Will look at it asap, I have a WIP already.
Assignee | ||
Comment 6•3 years ago
|
||
(In reply to Martin Stránský [:stransky] from comment #5)
Robert, I think I found the underlying issue here.
... I have a WIP already.
\o/
Strongly looking forward to it - I had two crashes today alone with the latest Fedora build (although with widget.wayland_vsync.enabled
enabled) because of protocol errors - thus no crash reports created :(
It's pretty much the only crash I've seen in a while, so fixing that should make Wayland + WR much more stable from what I can see.
Updated•3 years ago
|
Assignee | ||
Comment 7•3 years ago
|
||
Martin, while your patches in bug 1648698 will likely fix the crashes, IIUC they do fully address the error chain. AFAIK we only hit the code paths such as WindowSurfaceWayland::FrameCallbackHandler()
when we previously crashed in WR, falling back to the basic renderer, i.e. when we previously got the following warnings:
We don't have EGLSurface to draw into. Called too early?
Compositors might be mixed (5,1)
From what I can see the fallback now happens much less often after bug 1680505 landed (and you backported it to the Fedora FF build), i.e. after a We don't have EGLSurface to draw into. Called too early?
we usually keep using WR.
However the fallback still happens sometimes and I think we need to address that in the first place. Maybe the approach here can archive just that, I'll try a bit.
Assignee | ||
Comment 8•3 years ago
|
||
Hm, the approach here doesn't seem to help much - I guess we first need to fix bug 1681107. Also adding bug 1648698 to "See also"
Assignee | ||
Comment 9•3 years ago
|
||
Closing this one - haven't seen any relate crashes since bug bug 1681107 landed, so we probably don't need this.
Updated•3 years ago
|
Description
•