Closed Bug 1712665 Opened 3 years ago Closed 3 years ago

[X11][EGL][WR] UI freezes when the system's compositor is suspended

Categories

(Core :: Graphics: WebRender, defect, P3)

Firefox 90
Unspecified
Linux
defect

Tracking

()

RESOLVED FIXED
93 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- disabled
firefox88 --- disabled
firefox89 --- disabled
firefox90 --- disabled
firefox91 --- disabled
firefox92 --- disabled
firefox93 --- fixed

People

(Reporter: tgnff242, Assigned: rmader)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: nightly-community, regression)

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0

Steps to reproduce:

Step (2) requires KWin. I can't test other compositors currently.

  1. Start Firefox in X11 with EGL (MOZ_X11_EGL=1).
  2. Suspend the compositor (Alt + Shift + F12).
  3. Try to open a few new tabs.

Actual results:

The UI freezes. If you click on a bookmark, however, the titlebar suggests that it's loaded normally.

Some applications, including mpv (if it's not started with --x11-bypass-compositor=no), suspend the compositor by default when they are in fullscreen which triggers this issue in Firefox.

Expected results:

mozregression result:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=de1a1b350e9e0fb606cc7f5b709df544af8dd313&tochange=52299c7cbec44f2fe75273acdf2aed8e2496931c

Has Regression Range: --- → yes
Has STR: --- → yes
Regressed by: 1684194
Attached file about:support.txt
Blocks: wr-linux
OS: Unspecified → Linux
Blocks: linux-egl

:stransky, can you comment to the bug?

Flags: needinfo?(stransky)

Set release status flags based on info from the regressing bug 1684194

I think we need to restart WebRender and re-create all surfaces in such case. I expect Gtk fires some event for it so we can track the change.
Do I understand correctly the this bug affects Firefox fullscreen mode or not?

Flags: needinfo?(stransky) → needinfo?(tgnff242)
Priority: -- → P3
Summary: UI freezes when the system's compositor is suspended if EGL is used → [KDE] UI freezes when the system's compositor is suspended if EGL is used

No, the state of Firefox's window is not important. Setting Firefox in fullscreen or maximising its window doesn't suspend the compositor.

Flags: needinfo?(tgnff242)

In my case. Firefox freezes each time I open or close a game because the game stop the compositor.
The workaround is:
Close
Open

End xD..

But before it was not that way. Egl enabled on x11 didn't make Firefox freezes each time composition is enabled or disabled so it has to be a regression in some point of Firefox stable releases. (Or a regression on manjaro libs )
Can someone confirm on others so?

(In reply to albertogomezmarin from comment #6)

In my case. Firefox freezes each time I open or close a game because the game stop the compositor.
The workaround is:
Close
Open

End xD..

But before it was not that way. Egl enabled on x11 didn't make Firefox freezes each time composition is enabled or disabled so it has to be a regression in some point of Firefox stable releases. (Or a regression on manjaro libs )
Can someone confirm on others so?

More info I think it happens since one month or more

This sounds like a wontfix due to unexpected user behavior, but I might be wrong. Andrew, any thoughts?

Flags: needinfo?(aosmond)

It sounds like a potential blocker for an EGL rollout to release. We should leave it open for now.

Some applications, including mpv (if it's not started with --x11-bypass-compositor=no), suspend the compositor by default when they are in fullscreen which triggers this issue in Firefox.

IIUC it is relatively normal X11 WM behaviour - Gnome does this as well when a game is launched. One open question is if other WMs also trigger this bug. If it's only KWin, then it should probably also get fixed there.

It's reported in our chat group that GNOME has this issue too.

I just tried to restart picom (a standalone compositor) on my Awesome WM, and now I have to drag this tab out from the frozen window to make this comment.

I just want to chime in that I have the same issue. Ever since 88 (and 89), ff's ui will freeze when a compositor is suspended on kwin - for example when playing a game which suspends it. It was not an issue with 87.

Removing MOZ_X11_EGL=1 from /etc/environment works around the issue. But this is a regression, since it worked fine up until FF87 (incl) even with the envvar set.

See Also: → 1714199

I see this too with Fedora 34's KDE spin on FF89; as others have said unsetting MOZ_X11_EGL is a workaround this issue.

I can instantly reproduce this on KDE/Fedora 34 but not on Gnome (and double-checked that e.g. a game was indeed unredirected). Now I wonder if this a FF or KDE bug. Maybe the difference is between fullscreen unredirecting and actually disabling compositing - the previous only affecting the application in question, while the later affects all apps.

Vlad, have you heard similar reports for Kwin before?

Flags: needinfo?(vlad.zahorodnii)

I don't recall seeing this bug specifically. For what it's worth, this issue can be reproduced on my machine too. It's worth noting that when kwin stops compositing, it will nuke everything that has to do with compositing, including the compositing selection. At quick glance, it's either an app bug, e.g. it doesn't gracefully handle the case where the compositing manager selection disappears, or a bug down in the stack, e.g. in xorg.

Flags: needinfo?(vlad.zahorodnii)

It's worth noting that this feels like a regression on FF's side, since FF87 handled the compositor shutdown gracefully and perfectly and there's been no relevant kde plasma update in the time of 88 update when the issue popped up.

For GNOME + Xorg you'll need to alt+f2 r to restart the window manager, which is much less annoying.

Thanks, restarting Gnome via alt+f2 r does reproduce the issue for me. What changed in bug 1684194 is that we now use a single shared GL context, like we do on Wayland. Previously we created one for every window, like we do on GLX. Will have a look.

At quick glance, it's either an app bug, e.g. it doesn't gracefully handle the case where the compositing manager selection disappears, or a bug down in the stack, e.g. in xorg.

We apparently handled it previously, so yeah, this will potentially require some deep digging :/

Blocks: 1695933
Summary: [KDE] UI freezes when the system's compositor is suspended if EGL is used → [X11][EGL][WR] UI freezes when the system's compositor is suspended
See Also: → 1718959

Hi! this issue could be related to 1693445

Starting firefox without the MOZ_X11_EGL=1 variable prevents this from happening. Of course there will be a loss of video acceleration, but it's a workaround for now
Plasma 5.22.3

See Also: → 1693445
Status: UNCONFIRMED → NEW
Ever confirmed: true
See Also: → 1725538

So reverting bug 1684194 does indeed work around this issue, but it should be noted that even then we don't handle things clearly. Debug builds will independently of GLX or EGL run into an assert in [1] and there are also a bunch of warnings: CP+[GFX1-]: Failed to connect WebRenderBridgeChild.

This sounds to me like even before bug 1684194 we only handled the case by chance, not intention. Will take a look if I can make things work.

1: https://searchfox.org/mozilla-central/source/gfx/layers/ipc/ContentCompositorBridgeParent.cpp#214

Good news: all what's missing is a call to RenderCompositorEGL::Resume() because the RenderCompositorEGL object gets destroyed and recreated. Now I just have to find the best place to put it.

(In reply to Robert Mader [:rmader] from comment #27)

Good news: all what's missing is a call to RenderCompositorEGL::Resume() because the RenderCompositorEGL object gets destroyed and recreated. Now I just have to find the best place to put it.

Martin, as you probably know best how the pause/resume logic is handled in nsWindow, do you have an idea what we could do here? I think all we need is to call nsWindow::ResumeCompositor() after the reset.

For the record, you can trigger this in a Gnome Xorg session via alt+f2 -> r

Flags: needinfo?(stransky)

(In reply to Robert Mader [:rmader] from comment #28)

(In reply to Robert Mader [:rmader] from comment #27)

Good news: all what's missing is a call to RenderCompositorEGL::Resume() because the RenderCompositorEGL object gets destroyed and recreated. Now I just have to find the best place to put it.

Martin, as you probably know best how the pause/resume logic is handled in nsWindow, do you have an idea what we could do here? I think all we need is to call nsWindow::ResumeCompositor() after the reset.

For the record, you can trigger this in a Gnome Xorg session via alt+f2 -> r

Yes, I think it may be fairly easy.

You can add the Pause/Resume block at void nsWindow::OnCompositedChanged() routine. It may be interesting check how screen_composited_changed_cb() behaves - does it reset the compositor before or after the widget_composited_changed_cb callback?

I think the best solution is to call

mCompositorState = COMPOSITOR_PAUSED_MISSING_EGL_WINDOW;
ResumeCompositorHiddenWindow();

(or we can add a parameter to ResumeCompositorHiddenWindow() like ResumeCompositorHiddenWindow(bool aForceResume = false)
to skip COMPOSITOR_PAUSED_MISSING_EGL_WINDOW check).

Flags: needinfo?(stransky)

To ensure we don't freeze when the compositing state changes and
makes a similar call in Create() redundant.

Note: we can't do this in OnCompositedChanged() as it gets called
too early.

Assignee: nobody → robert.mader
Status: NEW → ASSIGNED
See Also: → 1720388
See Also: → 1727703
Pushed by robert.mader@posteo.de:
https://hg.mozilla.org/integration/autoland/rev/c41693441230
Ensure we reset the EGLSurface on new CompositorWidgetDelegate on X11/EGL, r=stransky
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 93 Branch

Thanks once again for your efforts @Robert Mader!
May I ask if there could be still cases left where the window content breaks (e.g. when running uncomposited all the time and changing virtual desktops etc.)? If no, could this complete fix also be achieved for the GLX backend (Nvidia users, sigh)? With GLX, it's not that bad like it was with EGL, but it can appear "glitchy" after resuming compositing by losing window content temporarily/until rendered scene changes.

(In reply to walmartguy from comment #33)

Thanks once again for your efforts @Robert Mader!
May I ask if there could be still cases left where the window content breaks (e.g. when running uncomposited all the time and changing virtual desktops etc.)?

No, after this fix there won't be any bug anymore ;) Sure there still can be bugs, please file issues for them if you still see them with with this patch (with the EGL backend).

If no, could this complete fix also be achieved for the GLX backend (Nvidia users, sigh)? With GLX, it's not that bad like it was with EGL, but it can appear "glitchy" after resuming compositing by losing window content temporarily/until rendered scene changes.

This patch is strictly EGL focussed - and TBH, I really hope we can make the EGL backend the default eventually soon as it has so many benefits. As to Nvidia, I got reports that the EGL backend works on the recent 470 driver series - maybe you can try that?

(In reply to Robert Mader [:rmader] from comment #34)

No, after this fix there won't be any bug anymore ;) Sure there still can be bugs, please file issues for them if you still see them with with this patch (with the EGL backend).

The "good" news: The latest Nightly build still freezes also with EGL + Nvidia, so I should be able to test and confirm the fix as soon as it ships in Nightly. :)

This patch is strictly EGL focussed - and TBH, I really hope we can make the EGL backend the default eventually soon as it has so many benefits. As to Nvidia, I got reports that the EGL backend works on the recent 470 driver series - maybe you can try that?

First impression is that it seems to just work. However (and that's the bad news), still with the ugly 60fps fixed timer vsync. :(

(In reply to walmartguy from comment #35)

This patch is strictly EGL focussed - and TBH, I really hope we can make the EGL backend the default eventually soon as it has so many benefits. As to Nvidia, I got reports that the EGL backend works on the recent 470 driver series - maybe you can try that?

First impression is that it seems to just work. However (and that's the bad news), still with the ugly 60fps fixed timer vsync. :(

That's intentional because of bug 1669275. Good to have this confirmed! If the Nvidia driver does not necessarily crash anymore when mixing EGL and GLX, I'll create a patch to also allow the GLX vsync work on Nvidia with EGL and ping you to test it - maybe we're lucky it also just works ¯_(ツ)_/¯

I've tried 93b1 with Nvidia: I can confirm that Firefox doesn't freeze anymore. However, just like with the GLX backend, the window content needs a little bit of "pushing" (e.g. scrolling input) to re-appear.

See Also: → 1708498

Does current status mean that the bug won't be fixed in ESR 91?

(In reply to jirtahupsu from comment #40)

Does current status mean that the bug won't be fixed in ESR 91?

Yes, correct, EGL/X11 is disabled by default in ESR 91, therefore the fix hasn't been backported.
EGL/X11 is enabled by default in 94, currently Beta. Firefox 94 will be released 2021-11-02.
If you want to test EGL/X11, please use https://nightly.mozilla.org or https://beta.mozilla.org, otherwise you need to wait for Firefox 94.

(In reply to Darkspirit from comment #41)

Yes, correct, EGL/X11 is disabled by default in ESR 91, therefore the fix hasn't been backported.
But many people enable it for HW accelerated video (every manual you see will tell you to activate it), so I don't see why it wouldn't be backported, especially if it is disabled by default, so no (even theoretical) breakage for the default installs.
Otherwise people on ESR will have to suffer for another year without HW acceleration or hanging FF window every time they restart Gnome.

It's a policy to not backport patches for non-default features to ESR and chances are low that a such a request would get accepted. Given that this seems to be a quite commonly activated feature and that the chances of breakage are indeed very low (the code is explicitly guarded), we can give it a try.

Comment on attachment 9238211 [details]
Bug 1712665 - Ensure we reset the EGLSurface on new CompositorWidgetDelegate on X11/EGL, r=stransky

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: This fixes the most common bug for a popular but non-default feature. The feature (X11/EGL) is needed for hardware video decoding (important for battery life) and got enabled by default in the 94 release.
  • User impact if declined: ESR users who would like to activate hardware video decoding may see regular freezes.
  • Fix Landed on Version: 93
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The code change is simple and explicitly guarded with a check for the feature and thus can't regress default configurations. There are also no known regressions from the patch.
  • String or UUID changes made by this patch:
Attachment #9238211 - Flags: approval-mozilla-esr91?

Comment on attachment 9238211 [details]
Bug 1712665 - Ensure we reset the EGLSurface on new CompositorWidgetDelegate on X11/EGL, r=stransky

This will need a rebased patch for ESR91 before we can consider taking it there.

Flags: needinfo?(robert.mader)
Attachment #9238211 - Flags: approval-mozilla-esr91?
Blocks: 1739748

Backporting to 91 is unfortunately not trivial, thus not going to do it (would require extensive testing i.e. too dangerous for ESR).

Flags: needinfo?(robert.mader)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: