Closed Bug 1680961 Opened 3 years ago Closed 3 years ago

[Wayland][Webrender] Lock surface in GLContextEGL::SwapBuffers()

Tracking

()

Status:

RESOLVED INVALID

People

(Reporter: rmader, Assigned: rmader)

References

(Blocks 1 open bug)

Details

Attachments

(1 obsolete file)

Bug 1680961 - RFC: [Wayland][Webrender] Lock surface in GLContextEGL::SwapBuffers(), r?stransky 3 years ago Robert Mader [:rmader] 47 bytes, text/x-phabricator-request		Details \| Review

Robert Mader [:rmader]

Assignee

Description

•

3 years ago

•

Edited

We usually lock wl_surfaces via moz_container_wayland_surface_lock in order to protect us from race conditions. However, GLContextEGL::SwapBuffers() also chains up to wl_surface_commit(), without being locked. Most of our calls to moz_container_wayland_surface_lock happen on the main thread, while WR calls GLContextEGL::SwapBuffers() on another thread, so there's a realistic chance to get surface commits while we consider the surface locked, potentially causing different kinds of crashes we occasionally see on Wayland with WR.

Lets also lock the wl_surface when calling GLContextEGL::SwapBuffers().

Robert Mader [:rmader]

Assignee

Updated

•

3 years ago

Assignee: nobody → robert.mader

Robert Mader [:rmader]

Assignee

Updated

•

3 years ago

Comment 1

•

3 years ago

Attached file Bug 1680961 - RFC: [Wayland][Webrender] Lock surface in GLContextEGL::SwapBuffers(), r?stransky (obsolete) — Details

This is not well tested yet, but it appears to work well and I'd be glad
for feedback whether the approach looks reasonable.

Martin Stránský [:stransky] (ni? me)

Comment 2

•

3 years ago

Robert, can you try to reproduce the issue without the patch? For instance you can run mozilla test suite locally under wayland, I think mochitest/reftest may be useful or some test which uses popups as we have reports about popups used.

You need to build tree with ac_add_options --enable-tests and I also expect you'd need an optimized build to catch the race-condition, then run:

MOZ_ENABLE_WAYLAND=1 ./mach mochitest --enable-webrender

you can choose to run tests by directory:

MOZ_ENABLE_WAYLAND=1 ./mach mochitest docshell/test/ --enable-webrender

Martin Stránský [:stransky] (ni? me)

Comment 3

•

3 years ago

•

Edited

For reference there's a try I run under wayland, you can see various tests there and how they fail :)
https://treeherder.mozilla.org/jobs?repo=try&revision=2a4d6b19d3cfce3dea771d327872df38d8da906e&selectedTaskRun=V12qw91cTJuOuHsYEu-Gjg.0

Robert Mader [:rmader]

Assignee

Comment 4

•

3 years ago

(In reply to Martin Stránský [:stransky] from comment #2)

Robert, can you try to reproduce the issue without the patch? ...

Will try :)

What I already can say is that I do occasionally get crashes which appear to be related to popups - however I find them incredible hard to reproduce. Hopefully the tests make them easier to reproduce.

Martin Stránský [:stransky] (ni? me)

Comment 5

•

3 years ago

Robert, I think I found the underlying issue here. It's caused by the thread specific event loop:
https://searchfox.org/mozilla-central/rev/c7cf087b6e1384608ca3989f042f12f7cabd0a5f/widget/gtk/nsWaylandDisplay.cpp#56

and it's also cause of other similar bugs. There's a race condition between this and loop and the one at Gtk/main thread. Will look at it asap, I have a WIP already.

Robert Mader [:rmader]

Assignee

Comment 6

•

3 years ago

(In reply to Martin Stránský [:stransky] from comment #5)

Robert, I think I found the underlying issue here.
... I have a WIP already.

\o/

Strongly looking forward to it - I had two crashes today alone with the latest Fedora build (although with widget.wayland_vsync.enabled enabled) because of protocol errors - thus no crash reports created :(

It's pretty much the only crash I've seen in a while, so fixing that should make Wayland + WR much more stable from what I can see.

Darkspirit

Updated

•

3 years ago

Blocks: wr-linux-wayland
No longer blocks: wayland

Robert Mader [:rmader]

Assignee

Comment 7

•

3 years ago

Martin, while your patches in bug 1648698 will likely fix the crashes, IIUC they do fully address the error chain. AFAIK we only hit the code paths such as WindowSurfaceWayland::FrameCallbackHandler() when we previously crashed in WR, falling back to the basic renderer, i.e. when we previously got the following warnings:

We don't have EGLSurface to draw into. Called too early?
Compositors might be mixed (5,1)

From what I can see the fallback now happens much less often after bug 1680505 landed (and you backported it to the Fedora FF build), i.e. after a We don't have EGLSurface to draw into. Called too early? we usually keep using WR.
However the fallback still happens sometimes and I think we need to address that in the first place. Maybe the approach here can archive just that, I'll try a bit.

Robert Mader [:rmader]

Assignee

Comment 8

•

3 years ago

Hm, the approach here doesn't seem to help much - I guess we first need to fix bug 1681107. Also adding bug 1648698 to "See also"

Updated

•

3 years ago

Comment 9

•

3 years ago

Closing this one - haven't seen any relate crashes since bug bug 1681107 landed, so we probably don't need this.

Status: NEW → RESOLVED

Closed: 3 years ago

Resolution: --- → INVALID

Phabricator Automation

Updated

•

3 years ago

Attachment #9191533 - Attachment is obsolete: true

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

[Wayland][Webrender] Lock surface in GLContextEGL::SwapBuffers()

Categories

(Core :: Graphics, defect)

Tracking

()

People

(Reporter: rmader, Assigned: rmader)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Attachments

(1 obsolete file)

Description

Updated

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Updated

Comment 7

Comment 8

Updated

Comment 9

Updated

Attachment

General

Description

File Name

Content Type