Closed Bug 1680961 Opened 3 years ago Closed 3 years ago

[Wayland][Webrender] Lock surface in GLContextEGL::SwapBuffers()

Categories

(Core :: Graphics, defect)

defect

Tracking

()

RESOLVED INVALID

People

(Reporter: rmader, Assigned: rmader)

References

(Blocks 1 open bug)

Details

Attachments

(1 obsolete file)

We usually lock wl_surfaces via moz_container_wayland_surface_lock in order to protect us from race conditions. However, GLContextEGL::SwapBuffers() also chains up to wl_surface_commit(), without being locked. Most of our calls to moz_container_wayland_surface_lock happen on the main thread, while WR calls GLContextEGL::SwapBuffers() on another thread, so there's a realistic chance to get surface commits while we consider the surface locked, potentially causing different kinds of crashes we occasionally see on Wayland with WR.

Lets also lock the wl_surface when calling GLContextEGL::SwapBuffers().

Assignee: nobody → robert.mader
See Also: → 1655282

This is not well tested yet, but it appears to work well and I'd be glad
for feedback whether the approach looks reasonable.

Robert, can you try to reproduce the issue without the patch? For instance you can run mozilla test suite locally under wayland, I think mochitest/reftest may be useful or some test which uses popups as we have reports about popups used.

You need to build tree with ac_add_options --enable-tests and I also expect you'd need an optimized build to catch the race-condition, then run:

MOZ_ENABLE_WAYLAND=1 ./mach mochitest --enable-webrender

you can choose to run tests by directory:

MOZ_ENABLE_WAYLAND=1 ./mach mochitest docshell/test/ --enable-webrender

For reference there's a try I run under wayland, you can see various tests there and how they fail :)
https://treeherder.mozilla.org/jobs?repo=try&revision=2a4d6b19d3cfce3dea771d327872df38d8da906e&selectedTaskRun=V12qw91cTJuOuHsYEu-Gjg.0

(In reply to Martin Stránský [:stransky] from comment #2)

Robert, can you try to reproduce the issue without the patch? ...

Will try :)

What I already can say is that I do occasionally get crashes which appear to be related to popups - however I find them incredible hard to reproduce. Hopefully the tests make them easier to reproduce.

Robert, I think I found the underlying issue here. It's caused by the thread specific event loop:
https://searchfox.org/mozilla-central/rev/c7cf087b6e1384608ca3989f042f12f7cabd0a5f/widget/gtk/nsWaylandDisplay.cpp#56

and it's also cause of other similar bugs. There's a race condition between this and loop and the one at Gtk/main thread. Will look at it asap, I have a WIP already.

(In reply to Martin Stránský [:stransky] from comment #5)

Robert, I think I found the underlying issue here.
... I have a WIP already.

\o/

Strongly looking forward to it - I had two crashes today alone with the latest Fedora build (although with widget.wayland_vsync.enabled enabled) because of protocol errors - thus no crash reports created :(

It's pretty much the only crash I've seen in a while, so fixing that should make Wayland + WR much more stable from what I can see.

Blocks: wr-linux-wayland
No longer blocks: wayland

Martin, while your patches in bug 1648698 will likely fix the crashes, IIUC they do fully address the error chain. AFAIK we only hit the code paths such as WindowSurfaceWayland::FrameCallbackHandler() when we previously crashed in WR, falling back to the basic renderer, i.e. when we previously got the following warnings:

We don't have EGLSurface to draw into. Called too early?
Compositors might be mixed (5,1)

From what I can see the fallback now happens much less often after bug 1680505 landed (and you backported it to the Fedora FF build), i.e. after a We don't have EGLSurface to draw into. Called too early? we usually keep using WR.
However the fallback still happens sometimes and I think we need to address that in the first place. Maybe the approach here can archive just that, I'll try a bit.

Hm, the approach here doesn't seem to help much - I guess we first need to fix bug 1681107. Also adding bug 1648698 to "See also"

See Also: → 1681107, 1648698
See Also: → 1645677

Closing this one - haven't seen any relate crashes since bug bug 1681107 landed, so we probably don't need this.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INVALID
Attachment #9191533 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: