Closed Bug 1635186 Opened 4 years ago Closed 2 years ago

[Xwayland] Webrender causes firefox to lag when video is played behind another window

Categories

(Core :: Graphics: WebRender, defect)

78 Branch
Desktop
Linux
defect

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox78 --- disabled

People

(Reporter: sajdl.vojtech, Assigned: rmader)

References

(Depends on 1 open bug)

Details

Attachments

(3 files, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0

Steps to reproduce:

I tried finding a regression range for this bug, but it pointed me to WebRender being enabled on AMD GPUs. Disabling WebRender helps. Each window is open on a different monitor. I'm running Arch Linux; my GPU is an AMD RX 5700 XT.

  1. Open any video and have it playing in one window; have Google open in another
  2. Put the video window behind another window (e.g. Slack)
  3. Try typing inside the Google search bar

Actual results:

Browser is unresponsive and typing is really slow

Expected results:

Browser should be responsive

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → Graphics: WebRender
Product: Firefox → Core

Still present after updating to Nightly 78. From the looks of it, it is not limited to playing videos: every time a window that is not directly visible needs to draw something, Firefox lags. I noticed it with the developer tools too, when refreshing a page while leaving the devtools window in the background.

When reporting I forgot to mention that I'm running Gnome 3.36 on Wayland if that helps.

OS: Unspecified → Linux
Hardware: Unspecified → Desktop
Version: 77 Branch → 78 Branch

From further testing this doesn't seem to affect Gnome Xorg session, so it's very probably something Wayland related.

Because this bug's Severity is normal and has not been changed, and this bug's priority is -- (none,) indicating it has not been previously triaged, the bug's Severity is being updated to -- (default, untriaged.)

Severity: normal → --

Can you share the contents of about:support as a text file?

Flags: needinfo?(sajdl.vojtech)
Blocks: wr-linux
Severity: -- → S3
Attached file about:support.txt
Flags: needinfo?(sajdl.vojtech)

Can't reproduce here, but there are a couple of possible reasons... let's start with this: could you check whether it still happens when you enable widget.wayland_vsync.enabled?

Flags: needinfo?(sajdl.vojtech)

Tried it just now; it doesn't seem to do much. It might have delayed the lagging by a few seconds (might be placebo, though). I tested with one of my two monitors disconnected, and that actually resolved the issue. It is worth noting that the two monitors have different refresh rates. Setting them to the same refresh rate didn't help at all when I last tested, so I omitted that info.

Flags: needinfo?(sajdl.vojtech)

Window Protocol: x11
Desktop Environment: gnome

(Robert Mader [:rmader] from comment #7)

Can't reproduce here but there are a couple of possible reasons ... lets start with this: could you try if it still happens if you enable widget.wayland_vsync.enabled?

Can you try this with the env var set (e.g. $ MOZ_ENABLE_WAYLAND=1 path/to/firefox) on Gnome Wayland?

Oh, it's on X11, not on Wayland - so that option doesn't have any effect. Sorry for the noise then!

Window Protocol: x11

Good catch, I thought I had MOZ_ENABLE_WAYLAND=1 in the launch command (and kinda thought it was on by default by now). Turns out I don't have it set on my main PC, and setting it actually fixed the issue for me.

Using that option causes Firefox to crash when using autoscroll - is that a known issue? I couldn't find much on Bugzilla, but I'm not sure how to search here properly.

[GFX1-]: window is null
[GFX1-]: Failed to create EGLSurface
[GFX1-]: We don't have EGLSurface to draw into. Called too early?
[GFX1-]: Compositors might be mixed (5,2)
ExceptionHandler::WaitForContinueSignal waiting for continue signal...
ExceptionHandler::GenerateDump cloned child 27008
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
Exiting due to channel error.

Autoscroll works fine here - don't know about an existing issue about it though.

Concerning the issue: we should probably change the title to reflect that it happens on Xwayland (which is a special case, as Firefox has a weird vsync implementation that works better on plain X11 than on Xwayland) and only in a multi-monitor setup.

Finally, we don't know yet whether compositors other than Gnome's are affected - I wouldn't be surprised, given mutter issues 3 and 1162 (both hopefully resolved in 3.38, as there's very active work on both of them).

https://gitlab.gnome.org/GNOME/mutter/-/issues/3
https://gitlab.gnome.org/GNOME/mutter/-/issues/1162

Summary: Webrender causes firefox to lag when video is played behind another window → Webrender causes firefox to lag when video is played behind another window (Gnome/Xwayland)
Depends on: wayland-nightly
Blocks: wayland
No longer depends on: wayland-nightly

Filed bug 1638084 for the crashing autoscroll icon.

I can confirm this issue occurs with Xwayland only when:

  • There are (at least) two windows open
  • There is any animation in the first window (video, GIF... for example https://en.wikipedia.org/wiki/GIF#/media/File:Rotating_earth_(large).gif)
  • The animation happens in the selected tab. If another tab with no animation is selected, there is no lag
  • The window where the lag occurs (the second window) is maximized. If not, there is no lag.
  • If both windows play an animation, only the focused window is laggy. You can see that with Gnome when you put both windows in different workspaces (see video attachment). I'm not entirely sure about this one, because if your window lags, focusing another application doesn't stop the lag...
  • The animation can happen in the sidebar; in that case it's still the other window that lags (not the window with the sidebar)

The lag caused by this issue makes the entire Firefox UI unresponsive, even to keyboard shortcuts.
The UI is rendered at around 1 fps, and I don't see any kind of spike in CPU or GPU usage.

(In reply to filman230 from comment #20)

Could you try reproducing this with

  • MOZ_X11_EGL=1 env var set
  • without the above env var but layers.acceleration.force-enabled enabled (and webrender force-disabled)

on nightly? The reason I'm asking is that there are several different things that could cause the issue and the above options force two of the code paths I would think about first. Sorry for not trying to reproduce myself atm, little time :(

Also, could everyone who sees the issue post their Xorg version (I assume some 1.20 point release)? Thanks!

(In reply to Robert Mader [:rmader] from comment #22)

The xorg-server version is 1.20.9-2, and yes, I was (and still am) on Nightly.

I'm sorry, I don't know why, but now when there are animations in both windows, there is no lag... (I updated nightly though, it was one day old...)

Is having gfx.webrender.all set to false enough to force-disable webrender?

So, here is what I got. This is a bit complicated, so I'm not entirely sure I reported it correctly. There is always an animation playing in the background in another window:

In both cases those options improved the situation: if there is an animation in another window, you can browse without problems, scrolling is fluid, the page is responsive...
BUT if you touch any UI element, AND there is no animation inside your tab, then your window will lag. If there is an animation (like a GIF), it won't lag. If there are animations in both windows, there is no lag caused by the UI.
By "touch any UI element", I mean using the megabar (typing text), or simply hovering over the buttons in the megabar (favorite button, Pocket, reader mode...).
This is where the subtle difference comes into play (not entirely sure about this):
In both cases, clicking on the megabar and typing text will cause your window to lag (not sure if it would stop after some time), but with MOZ_X11_EGL=1, hovering over the buttons causes a lag of a few seconds, while with layers.acceleration.force-enabled the lag is much longer (never stops?).

I don't know if it's useful, but here is the output in the terminal with MOZ_X11_EGL=1:

Can't find symbol 'eglGetNativeClientBufferANDROID'.
Can't find symbol 'eglQuerySurfacePointerANGLE'.
Can't find symbol 'eglCreateStreamKHR'.
Can't find symbol 'eglDestroyStreamKHR'.
Can't find symbol 'eglQueryStreamKHR'.
Can't find symbol 'eglStreamConsumerGLTextureExternalKHR'.
Can't find symbol 'eglStreamConsumerAcquireKHR'.
Can't find symbol 'eglStreamConsumerReleaseKHR'.
Can't find symbol 'eglStreamConsumerGLTextureExternalAttribsNV'.
Can't find symbol 'eglCreateStreamProducerD3DTextureANGLE'.
Can't find symbol 'eglStreamPostD3DTextureANGLE'

I'm sorry if this is confusing.

I think I now know what's happening here: IIUC it's either a bug in Xwayland or in the GLX vsync implementation.

Xwayland has a mechanism to adapt its reported refresh rate to Wayland frame callbacks. If no callbacks arrive, it throttles to 1Hz (you can test this simply by running glxgears and hiding it behind some other window - it will fall back to 1fps in a Wayland session).

Although the GLX vsync source on Xwayland does not properly detect the higher refresh rates, the throttling mechanism still appears to apply to it. You can reproduce by:

  • open FF X11 in Gnome Shell Wayland session
  • open https://www.vsynctester.com/
  • cover FF with some other window (I used gnome-terminal)
  • wait a few seconds
  • switch to FF again

The website will report a much lower fps for the last few seconds, then quickly catch up again.
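The throttling mechanism described above can be pictured with a toy model. This is a sketch under stated assumptions, not Xwayland's implementation: `XwaylandClockModel` and its fields are hypothetical, 60 Hz is an assumed monitor rate, and the 1 Hz fallback is the rate given in the comment. A window keeps ticking at the output rate while the compositor sends it frame callbacks and drops to 1 Hz when they stop.

```cpp
#include <cassert>

// Hypothetical toy model (not Xwayland source code) of the throttling
// described above: while the Wayland compositor sends frame callbacks for a
// window, Xwayland ticks at the output rate; when they stop, it falls back
// to 1 Hz.
struct XwaylandClockModel {
    double outputHz = 60.0;    // assumed refresh rate of the monitor
    double throttledHz = 1.0;  // fallback rate when no frame callbacks arrive

    // Tick rate a window sees, depending on frame callback delivery.
    double effectiveHz(bool receivingFrameCallbacks) const {
        return receivingFrameCallbacks ? outputHz : throttledHz;
    }

    // Number of ticks (and therefore at most this many vsync-paced frames)
    // a window sees over `seconds`, e.g. while covered by another window.
    double ticksDuring(double seconds, bool receivingFrameCallbacks) const {
        return seconds * effectiveHz(receivingFrameCallbacks);
    }
};
```

In this model, covering the window for 5 seconds yields 5 ticks instead of 300, which is the kind of drop the vsynctester.com reproduction above exposes before the page catches up again.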

Now in the case described above, one window is always on a hidden workspace, thus not getting frame callbacks and thus being throttled (as soon as the overview is opened, frame callbacks get sent to all windows). So apparently the throttling to 1Hz somehow happens even though one window is completely visible. Why the effect only becomes visible when both windows render, I can only speculate - but my guess is that if one window is idle, its vsync source gets stopped, so it does not call into glXWaitVideoSync.

To summarize, I think this could be a bug in the glXWaitVideoSync implementation, either in mesa or Xwayland, or in the way we handle the vsync source.

Michel, does this sound reasonable / possible?

Flags: needinfo?(michel)

(In reply to Robert Mader [:rmader] from comment #24)

Michel, does this sound reasonable / possible?

I'm afraid not.

I'm able to reproduce the problem, but glXWaitVideoSyncSGI seems to be consistently returning after just a fraction of a second.

(Since glXWaitVideoSyncSGI doesn't take a drawable parameter, the Mesa implementation uses the drawable of the current GLX context, which AFAICT is the root window on the GLXVsyncThread. Since there's no Wayland object corresponding to the X11 root window, Xwayland uses a fake ~60 Hz timer for its MSC.)

AFAICT the problem is that the Renderer thread of the main Firefox process repeatedly calls glXSwapBuffers for the background window, which in the long term only returns after ~1s (when one of the previous buffer swaps has actually completed).
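The explanation above can be illustrated with a small model of one render thread presenting several windows in turn. This is a sketch under assumptions: the `Window` struct and both functions are hypothetical, and the blocking times are illustrative. A visible window's swap is taken to block for about one 60 Hz frame, and a hidden window's swap for about 1 s, matching the "~1s" from the comment. The point is that a blocking swap for one window stalls presentation of every other window on the same thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the situation described above: a single render
// thread calls SwapBuffers for each window with new content, one after the
// other. swapBlockMs is how long that call blocks for the given window
// (illustrative values: ~16.7 ms for a visible window at 60 Hz, ~1000 ms for
// a hidden window whose previous swaps only complete at 1 Hz).
struct Window {
    bool hasNewContent;  // e.g. a playing video or animated GIF
    double swapBlockMs;  // time SwapBuffers blocks for this window
};

// Duration of one pass of the render loop over all windows.
double renderPassMs(const std::vector<Window>& windows) {
    double total = 0.0;
    for (const Window& w : windows) {
        if (w.hasNewContent) {
            total += w.swapBlockMs;  // the thread cannot do anything else meanwhile
        }
    }
    return total;
}

// Frames per second every window on this thread is limited to, since each
// window is presented at most once per pass.
double effectiveFps(const std::vector<Window>& windows) {
    const double pass = renderPassMs(windows);
    return pass > 0.0 ? 1000.0 / pass : 0.0;
}
```

With both windows animating, the visible window drops below 1 fps in this model, in line with the ~1 fps UI reported earlier; when the hidden window has nothing new to draw, the visible window keeps 60 fps.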

Flags: needinfo?(michel)

(In reply to Michel Dänzer from comment #25)

AFAICT the problem is that the Renderer thread of the main Firefox process repeatedly calls glXSwapBuffers for the background window, which in the long term only returns after ~1s (when one of the previous buffer swaps has actually completed).

Ah that makes sense, thanks!

So I think the solution here is to disable the GLX vsync source and use the software 60Hz timer on Xwayland. There's no regression for us here, as Xwayland also serves us a software 60Hz timer. Also, the native Wayland backend now ships a frame-callback-based source, i.e. users needing 144Hz etc. can just switch to that.

Assignee: nobody → robert.mader
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

Xwayland will give us a 60Hz timer for glXWaitVideoSyncSGI anyway,
but an optimization in Xwayland to reduce that to 1Hz if a window is
occluded can cause issues for us in multi-window cases.

In unaffected (i.e. single window) cases this will make us consume
more resources, as rendering will not get throttled to 1Hz anymore
when hidden. The native Wayland backend supports this, however.

Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/5e30c026c632
Do not use GLX vsync source on Xwayland, r=stransky
Flags: needinfo?(robert.mader)

Thanks. This possibly needs a more comprehensive approach.

Flags: needinfo?(robert.mader)

I increasingly think we'll have to implement bug 1640779 for this. The good news is that after bug 1645528 a lot of per-window infrastructure is now in place and used by default on Wayland.

Depends on: 1640779

(In reply to Robert Mader [:rmader] from comment #32)

I increasingly think we'll have to implement bug 1640779 for this.

Seems doubtful that (by itself) would make any difference either, since the issue here isn't directly related to VSync functionality; it's that SwapBuffers blocks, which will be an issue as long as Firefox calls SwapBuffers for multiple windows from the same thread.

(In reply to Michel Dänzer from comment #33)

The idea is that the EGL vsync source would throttle SwapBuffers for those windows down to 1Hz under exactly the same conditions where we hit this issue, IIUC.

See Also: → 1628437
Summary: Webrender causes firefox to lag when video is played behind another window (Gnome/Xwayland) → Webrender causes firefox to lag when video is played behind another window (Xwayland)
Attachment #9205902 - Attachment is obsolete: true
No longer blocks: wayland
Summary: Webrender causes firefox to lag when video is played behind another window (Xwayland) → [EGL] Webrender causes firefox to lag when video is played behind another window (Xwayland)

I ran into this today (using Nightly on Ubuntu 21.04), due to having a Google Doc as the foreground tab in a background window. (Specifically, it was my 1:1 notes doc with my manager.) The doc is only 7 US Letter pages long, so this isn't a huge-amounts-of-content-being-painted sort of issue. It may have been due to the blinking cursor (either my own local blinking-cursor or the second blinking-cursor from me-having-the-doc-open-in-another-tab).

I was getting ~1 second paint latency for UI and interacting with other windows. Things went back to normal when I closed the Google Doc. (And I was able to make the issue happen again by opening a new background-window with the same Google Doc again.)

Summary: [EGL] Webrender causes firefox to lag when video is played behind another window (Xwayland) → [Xwayland] Webrender causes firefox to lag when video is played behind another window

Just for the record: strictly speaking this is not a Firefox issue but an Xwayland one - some other applications have been reported to be affected as well, for example Steam. The big question for us is whether it's worth trying to implement the proposed solution from comment 32, or rather just pushing the native Wayland backend to become production-ready (bug 1543600). The fact that the EGL-based solution would likely not cover e.g. the NVIDIA proprietary drivers for a while makes me think that going native Wayland is the better investment.

So for everyone affected I'd recommend to enable the native Wayland backend via MOZ_ENABLE_WAYLAND=1 (see also https://mastransky.wordpress.com/2020/03/16/wayland-x11-how-to-run-firefox-in-mixed-environment/)

(In reply to Robert Mader [:rmader] from comment #37)

Just for the record: strictly speaking this is not a Firefox issue but a Xwayland one

It's the Wayland compositor which stops sending frame callbacks. Xwayland doesn't hold anything back.

  • some other applications have been reported to be affected as well, for example steam.

Right, same issue there, calling SwapBuffers for multiple windows on the same thread.

The big question for is whether it's worth to try to implement the proposed solution from comment 32, or rather just push the native Wayland backend to get production ready (bug 1543600). Given that the solution via EGL would likely not cover e.g. nvidia prop. drivers for a while makes me think that going native Wayland is the better investment.

Agreed.

For some odd reason I haven't been able to reproduce this for a while now on recent Gnome + recent Firefox. Can anyone confirm they still see this when running Nightly in a Wayland session (with the Firefox X11 backend, of course)?

The behavior of this EGL/Xwayland bug seems similar to GLX/Nvidia bug 1716049.

(Sotaro Ikeda [:sotaro] from bug 1716049 comment #10)

When WebRender is enabled, RenderCompositorOGL and GLContextGLX are used. RenderCompositorOGL creates a GLContextGLX for each window. I wonder if that might be related to the problem. If RenderCompositorEGL is used, only one GLContextEGL is created for all windows.

Then I would assume bug 1684194 has fixed this EGL/Xwayland bug.

But comment 36 was after bug 1684194.

(In reply to Darkspirit from comment #42)

But comment 36 was after bug 1684194.

It's not obvious to me whether comment 36 was on EGL. Daniel, could you retest a recent Nightly (which enables EGL by default on recent Mesa) and check whether you still see the issue?
And Jan, could you also confirm that you can't reproduce it?
It would be such a relief if this was finally fixed!

Flags: needinfo?(dholbert)

Jan, can you quickly confirm that you also can't reproduce the issue?

The odd thing is that I also can't reproduce the issue with GLX (gfx.x11-egl.force-disabled). To me this indicates that something must have changed in Xwayland or Mesa, but I'm not aware of anything that could have fixed it. Then again, other apps like Steam were also hit by this bug, so there's a chance that something got fixed somewhere.

In any case, I'm very inclined to re-enable HW-WR on Xwayland on recent Mesa - maybe only if EGL is available, which increases the chance that this does not happen (because of bug 1684194).

Flags: needinfo?(jan)
Blocks: 1730671

As discussed on IRC, apparently Firefox sets swap interval 0 now, which avoids the problem.

Attached video 2021-09-14_21-06-15.mp4

Debian Testing (Frankendebian with Mesa 21.2.1 from unstable), Gnome (X)Wayland, Intel Iris Graphics 6100 (BDW GT3)

(Michel Dänzer from comment #48)

As discussed on IRC, apparently Firefox sets swap interval 0 now, which avoids the problem.

Bug 1515448 set fSwapInterval(0) on EGL/Wayland (the visible window froze when the other window was on another workspace and invisible).
bug 1684194 + bug 1713468 + bug 1695933 brought the fix to X11+Xwayland.


Yes, it was bug 1684194 which fixed EGL/Xwayland:
https://hg.mozilla.org/integration/autoland/shortlog/52299c7cbec4
last bad: MOZ_X11_EGL=1 mozregression --repo autoland --launch de1a1b350e9e0fb606cc7f5b709df544af8dd313 --pref gfx.webrender.all:true -a https://www.vsynctester.com/ -a https://www.vsynctester.com/
first good: MOZ_X11_EGL=1 mozregression --repo autoland --launch 52299c7cbec44f2fe75273acdf2aed8e2496931c --pref gfx.webrender.all:true -a https://www.vsynctester.com/ -a https://www.vsynctester.com/


This bug is still reproducible with GLX/Xwayland:
Attached screencast: mozregression --launch 20210913213224 --pref gfx.x11-egl.force-disabled:true gfx.webrender.all:true -a https://www.vsynctester.com/ -a https://www.vsynctester.com/

Nvidia bug 1716049 seems to be similar to this GLX/Xwayland bug.
After submitting this comment I will switch to a regular X11 session and test there.

Yes, GLX/Xwayland can be fixed by setting swap interval to 0 (layout.frame_rate=0):
mozregression --launch 20210913213224 --pref gfx.x11-egl.force-disabled:true gfx.webrender.all:true layout.frame_rate:0 -a https://www.vsynctester.com/ -a https://www.vsynctester.com/

https://searchfox.org/mozilla-central/rev/fb7c66cb59ccc282aecfe157b05dc12b1e38753f/gfx/gl/GLContextProviderGLX.cpp#482-486

// Many GLX implementations default to blocking until the next
// VBlank when calling glXSwapBuffers. We want to run unthrottled
// in ASAP mode. See bug 1280744.
const bool isASAP = (StaticPrefs::layout_frame_rate() == 0);
mGLX->fSwapInterval(*mDisplay, mDrawable, isASAP ? 0 : 1);
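The excerpt above boils down to a mapping from the layout.frame_rate pref to the GLX swap interval, and the comments above report that interval 0 avoids the lag. The sketch below restates that mapping as a pure function next to a deliberately simplified model of why it helps. The function names are hypothetical. The model assumes that with interval 1 a swap can block for up to one tick of the window's clock (about 1 s for a throttled hidden window), while with interval 0 it does not wait for a tick; this is a simplification, not a statement about any specific driver.

```cpp
#include <cassert>

// Restates the logic of the excerpt above as a pure function (hypothetical
// name): layout.frame_rate == 0 means "ASAP mode", which selects swap
// interval 0; any other value (including the default -1) selects 1.
int swapIntervalForFrameRatePref(int layoutFrameRate) {
    const bool isASAP = (layoutFrameRate == 0);
    return isASAP ? 0 : 1;
}

// Simplified model (assumption): with swap interval 1 a swap may block until
// the next tick of the window's clock; with swap interval 0 it does not wait.
double worstCaseSwapWaitMs(int swapInterval, double tickIntervalMs) {
    return swapInterval == 0 ? 0.0 : tickIntervalMs;
}
```

Under this model, a hidden window throttled to 1 Hz costs the shared render thread up to 1000 ms per swap with the default pref, but nothing with layout.frame_rate=0. That matches the observation that setting the pref to 0 fixes GLX/Xwayland.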

Some assumptions seem to be made based on the layout.frame_rate pref:
https://searchfox.org/mozilla-central/rev/fb7c66cb59ccc282aecfe157b05dc12b1e38753f/gfx/layers/ipc/CompositorBridgeParent.cpp#248

static int32_t CalculateCompositionFrameRate() {
// Used when layout.frame_rate is -1. Needs to be kept in sync with
// DEFAULT_FRAME_RATE in nsRefreshDriver.cpp.
// TODO: This should actually return the vsync rate.

Maybe the layout.frame_rate pref should be left untouched (= -1)
and the code directly be changed to mGLX->fSwapInterval(*mDisplay, mDrawable, 0);?
I don't know. I am not a programmer.

Flags: needinfo?(jan)
Flags: needinfo?(dholbert)

(In reply to Darkspirit from comment #49)

Nvidia bug 1716049 seems to be similar to this GLX/Xwayland bug.
After submitting this comment I will switch to a regular X11 session and test there.

I moved the second Firefox window

  1. behind the terminal and
  2. onto another workspace

and could not immediately reproduce any problem on:

  • GLX/Gnome X11/Intel
  • GLX/KDE X11 without compositing/Intel

If they are affected, reproducing it would take more time or different STR.

This bug is still reproducible with GLX/Xwayland

Thanks Jan! Good to know that you can still reproduce the issue on GLX but not on EGL after bug 1684194. I'll take that as a "go" for bug 1730671, but will still wait till shortly before the next beta (as it doesn't affect nightly). Fingers crossed that all of this (enabling EGL by default etc.) sticks!

No longer depends on: 1640779

(Darkspirit from bug 1730822 comment 7)

WebRender (Software OpenGL)

[...]

apparently does not use RenderCompositorEGL yet.
Yes, SW WR OpenGL is affected by bug 1635186 on EGL/Xwayland while regular WR is not anymore.
MOZ_X11_EGL=1 mozregression --launch 2021-09-15 --pref devtools.chrome.enabled:true gfx.webrender.all:true gfx.webrender.software:true gfx.webrender.software.opengl:true -a https://www.vsynctester.com/ -a https://www.vsynctester.com/

See Also: → 1732365
No longer blocks: wr-linux

This bug essentially depends on bug 788319 - as we unblock more hardware/drivers for EGL, it will automatically solve the issue here as well.

Depends on: linux-egl

Closing as there's nothing to track here any more. We'll only enable HW-WR on Xwayland for setups that are also qualified for EGL, which solves the issue here.
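The closing policy can be written as a small gating check. This is a sketch under assumptions: `Platform`, its fields and `allowHardwareWebRender()` are hypothetical names, and "EGL-qualified" stands for whatever blocklist checks the linux-egl work applies, which is not modelled here. The rationale in the comments comes from earlier in the thread: the EGL path shares one GL context across windows and swaps with interval 0.

```cpp
#include <cassert>

// Hypothetical names illustrating the policy in the closing comment; the
// real decision is made by Firefox's GPU blocklist, which is not modelled.
struct Platform {
    bool x11Backend;     // Firefox is running its X11 backend
    bool underXwayland;  // ...inside a Wayland session
    bool eglQualified;   // setup is qualified for EGL (see linux-egl)
};

bool allowHardwareWebRender(const Platform& p) {
    if (p.x11Backend && p.underXwayland) {
        // On Xwayland, only the EGL path (one GL context for all windows,
        // swap interval 0) was found to avoid the lag, so require it.
        return p.eglQualified;
    }
    // Other configurations are decided elsewhere; not covered by this sketch.
    return true;
}
```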

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME