SW WR/Gnome Wayland/Intel GM45: Regression: Firefox 95+ is lagging
Categories
(Core :: Widget: Gtk, defect, P2)
Tracking
()
| Release | Tracking | Status |
|---|---|---|
| firefox-esr91 | --- | unaffected |
| firefox95 | --- | wontfix |
| firefox96 | --- | fixed |
| firefox97 | --- | fixed |
People
(Reporter: kevin, Unassigned)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: regression)
Attachments
(5 files, 1 obsolete file)
I have a couple of ThinkPads with Intel GM45 graphics, running Ubuntu 20.04.3 with GNOME Shell 3.36.8.
Since the update to Firefox 95, it's been very laggy; especially noticeable while scrolling up and down, or when a popup window opens (for example when downloading a file). Strangely, the bug is less prominent when the browser window is smaller.
I have a feeling that EGL has something to do with it. When I force the Firefox snap on Ubuntu to run on Xwayland / GLX by setting WAYLAND_DISABLE=1 as an environment variable prior to starting Firefox, the bug is not present. However, when I set gfx.x11-egl.force-enabled to true in about:config, the lagging also happens on Xwayland.
One may assume that EGL is simply broken on this chipset, but that is not the case. When I run snap revert firefox to go back to 94.0.2, the bug is not present, even when running on Wayland / EGL.
So I started testing the tarball releases from https://www.mozilla.org/firefox/
The problems there are identical, so it doesn't seem to be specific to Ubuntu or snap.
I tested 94.0.2, 95.0, 96.0b2 and 97.0a1. I ran all of them with MOZ_ENABLE_WAYLAND=1.
94.0.2 works perfectly fine, and all the other releases show lag while scrolling and sometimes even cause short compositor lockups.
So "something" has changed between 94.0.2 and 95.0, breaking the Wayland / EGL backend at least on gen4 Intel graphics (and maybe other platforms as well).
Comment 1•3 years ago
Thanks for the report! Please try to find a regression range:
$ sudo apt install python3-pip
$ pip3 install --user mozregression
$ MOZ_ENABLE_WAYLAND=1 ~/.local/bin/mozregression --good 94 --bad 95
| Reporter | ||
Comment 2•3 years ago
18:51.36 INFO: Narrowed integration regression window from [334e3d59, 77e655ba] (3 builds) to [334e3d59, f632271e] (2 builds) (~1 steps left)
18:51.36 INFO: No more integration revisions, bisection finished.
18:51.36 INFO: Last good revision: 334e3d59d932dd2850dec7c582f32f49b504e450
18:51.36 INFO: First bad revision: f632271e9d62ad6b7e90acd52e77d3e02a66980e
18:51.36 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=334e3d59d932dd2850dec7c582f32f49b504e450&tochange=f632271e9d62ad6b7e90acd52e77d3e02a66980e
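For anyone repeating the bisection, the pushlog link can be rebuilt from the two revision lines of the log. This is only a minimal sketch: the helper name `pushlog_url` is hypothetical, and the base URL is copied from the link above rather than from any mozregression API.

```python
import re
from urllib.parse import urlencode

# Base URL as it appears in the pushlog link posted in this comment.
PUSHLOG_BASE = "https://hg.mozilla.org/integration/autoland/pushloghtml"


def pushlog_url(log_text: str) -> str:
    """Build a pushlog URL from the 'Last good' / 'First bad' log lines."""
    good = re.search(r"Last good revision: ([0-9a-f]{40})", log_text)
    bad = re.search(r"First bad revision: ([0-9a-f]{40})", log_text)
    if not (good and bad):
        raise ValueError("log does not contain both revision lines")
    query = urlencode({"fromchange": good.group(1), "tochange": bad.group(1)})
    return f"{PUSHLOG_BASE}?{query}"


# The two relevant lines from the log above.
LOG = """\
18:51.36 INFO: Last good revision: 334e3d59d932dd2850dec7c582f32f49b504e450
18:51.36 INFO: First bad revision: f632271e9d62ad6b7e90acd52e77d3e02a66980e
"""

print(pushlog_url(LOG))
```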
Comment 3•3 years ago
Kevin, can you share the about:support from an affected device ("Copy text to clipboard" -> paste it here into a comment, if asked to turn it into attachment -> yes)? I also have a GM45 device around (Thinkpad T400) and haven't seen any issue so far, but usually I run it with gfx.webrender.compositor.force-enabled enabled, which may make a difference.
| Reporter | ||
Comment 4•3 years ago
Here is an example of a Lenovo X200 I have. I have the same issue on my T400 and T500.
When I visited my parents today, I wasn't able to reproduce it on two X220's with Sandy Bridge graphics though.
Comment 5•3 years ago
So the main difference here should be that these devices run on SW-WR, not HW-WR.
| Reporter | ||
Comment 6•3 years ago
If you're using an internal monitor with a resolution of 1280x800, the bug is not very noticeable. But when you use a larger screen (like the T500's 1680x1050 display or an external monitor), it becomes a lot more pronounced. Resizing the window to make it smaller also "improves" the situation.
It is most easily triggered by going to a site with a lot of text, like about:support, and then scrolling up and down continuously.
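To put rough numbers on the resolution dependence, the sketch below compares the size of one full-screen frame at the two resolutions mentioned. It assumes 4 bytes per pixel and a window covering the whole screen; neither is stated in the report, and whether the per-frame cost really scales linearly with pixel count is also an assumption.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit pixels, not stated in the report


def frame_bytes(width: int, height: int, bpp: int = BYTES_PER_PIXEL) -> int:
    """Size in bytes of one full frame at the given resolution."""
    return width * height * bpp


small = frame_bytes(1280, 800)   # internal panel mentioned above
large = frame_bytes(1680, 1050)  # T500 panel mentioned above
ratio = round(large / small, 2)

print(small, large, ratio)
```

Under these assumptions the larger panel needs roughly 1.7 times as many bytes per frame, which is at least consistent with the lag getting worse on bigger screens and better in smaller windows.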
Comment 7•3 years ago
It doesn't reproduce for me here (on Fedora 35, Gnome 41, GTK 3.24.30), which might point us to something.
The "all the other releases show lag while scrolling and sometimes even cause short compositor lockups." together with the regressing bug looks to me like this could be an issue around opaque regions. These regions tell the Wayland compositor which areas are opaque so it doesn't need to draw what's behind it. When not setting them appropriately, this usually causes exactly what you describe - the compositor being super busy painting way more than necessary.
On Wayland, getting optimal opaque regions requires GTK 3.24.25 (see bug 1668805), however Ubuntu 20.04 only ships 3.24.20 IIUC (https://packages.ubuntu.com/focal-updates/libgtk-3-0). This means the compositor (Mutter) has to draw the whole content area twice, assuming the GDK wayland surface properly gets its opaque region set. In the regressing bug 1737068 we might have messed up setting the opaque region with the GTK APIs, resulting in the compositor having to draw the full area at least three times, easily making performance fall off a cliff.
Martin, unfortunately I have very limited time for Firefox atm and can't look into this myself - can you check if bug 1668805 potentially results in GTK not setting its own opaque area any more, maybe due to remapping?
Additional thoughts:
- EGL should not be related here as this is all about software rendering - no GL involved. There is however a certain chance that we somewhere take a different code path with EGL even when using SW rendering.
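The opaque-region reasoning in this comment can be illustrated with a toy overdraw model. It is a deliberate simplification, not how Mutter works internally: every layer is assumed to cover the same rectangle, and the three-layer stack (desktop, GDK surface, Firefox content) is a hypothetical stand-in for the real surface tree.

```python
def painted_area(layers):
    """Total area painted for a stack of same-sized layers.

    `layers` is a list of (area, is_opaque) tuples ordered bottom to top.
    A layer has to be painted unless some layer above it is declared opaque.
    """
    total = 0
    occluded = False
    for area, is_opaque in reversed(layers):  # walk from the top down
        if not occluded:
            total += area
        if is_opaque:
            occluded = True
    return total


AREA = 1680 * 1050  # content area in pixels, using the T500 panel as an example

# Hypothetical stack, bottom to top: desktop, GDK surface, Firefox content.
optimal = painted_area([(AREA, True), (AREA, True), (AREA, True)])
old_gtk = painted_area([(AREA, True), (AREA, True), (AREA, False)])
broken = painted_area([(AREA, True), (AREA, False), (AREA, False)])

print(optimal // AREA, old_gtk // AREA, broken // AREA)
```

In this model the content area is painted once when every opaque region is set, twice when only the GDK surface is opaque, and three times when neither Firefox layer is, mirroring the "twice" and "at least three times" cases described above.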
Comment 8•3 years ago
Kevin, is there any chance that you can update one of the devices to Ubuntu 21.10 (or just run a live session) and check if the issue still occurs there?
| Reporter | ||
Comment 9•3 years ago
I have tested 21.10 and 22.04 even. The bug is also present there.
Comment 10•3 years ago
(In reply to Kevin Keijzer from comment #9)
I have tested 21.10 and 22.04 even. The bug is also present there.
Could you give Fedora 35 a go so we know it's not a distro issue?
| Reporter | ||
Comment 11•3 years ago
I have tested a Fedora 35 live USB. I ran sudo dnf update firefox, which installed Firefox 95.0-1-fc35. The bug is absolutely present there as well.
However, I have found a way to make the bug disappear. When I set gfx.webrender.software.opengl to true, everything seems fine again. This is the case with the deb, snap and tarballs on Ubuntu 20.04, 21.10 and 22.04 and with the rpm and tarballs on Fedora 35.
I used completely clean profiles for this, with only this one setting changed.
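A clean profile with only this one pref changed can also be prepared from a script, which makes the test easier to repeat across builds. The sketch below writes a `user.js` into a throwaway directory; the helper names and the directory prefix are hypothetical, and only the pref name comes from this comment.

```python
import json
import tempfile
from pathlib import Path


def user_pref_line(name, value):
    """Format one pref in the user_pref("name", value); syntax used by user.js."""
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


def write_user_js(profile_dir, prefs):
    """Write the given prefs to <profile_dir>/user.js and return the file path."""
    path = Path(profile_dir) / "user.js"
    lines = [user_pref_line(name, value) for name, value in prefs.items()]
    path.write_text("\n".join(lines) + "\n")
    return path


# Throwaway profile directory holding only the single changed setting.
profile = tempfile.mkdtemp(prefix="fx-swgl-")
user_js = write_user_js(profile, {"gfx.webrender.software.opengl": True})

print(user_js.read_text(), end="")
```

Firefox can then be pointed at that directory with its `-profile` command-line option.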
Comment 12•3 years ago
Set release status flags based on info from the regressing bug 1737068
Comment 13•3 years ago
(In reply to Kevin Keijzer from comment #0)
When I force the Firefox snap on Ubuntu to run on Xwayland / GLX by setting WAYLAND_DISABLE=1 as an environment variable prior to starting Firefox, the bug is not present.
Please attach about:support from that configuration. (With default gfx.x11-egl.force-enabled=false.)
| Reporter | ||
Comment 14•3 years ago
| apparently-unaffected glx sw-wr xwayland | ||
Comment 15•3 years ago
Thanks!
(In reply to Kevin Keijzer from comment #0)
However, when I set gfx.x11-egl.force-enabled to true in about:config, the lagging also happens on Xwayland.
Please attach that about:support, too.
| Reporter | ||
Comment 16•3 years ago
Comment 17•3 years ago
Thanks for all the information, Kevin, that is really appreciated. I'm very puzzled that I haven't been able to reproduce the issue, even though I'm using a device with the exact same GPU device ID.
The fact that gfx.webrender.software.opengl appears to help is quite interesting, even though I'm not quite sure what to make of it yet.
In order to validate my theory in comment 7, could you:
- use a distro with Gnome >= 3.38 and GTK >= 3.24.25
- when the issue is present, enable the opaque region debugger (alt+f2 (run command) -> lg -> Meta.add_debug_paint_flag(Meta.DebugPaintFlag.OPAQUE_REGION); you can disable it again with Meta.remove_debug_paint_flag(Meta.DebugPaintFlag.OPAQUE_REGION))
- make sure that the firefox window has a green overlay (as opposed to purple)?
Broken opaque regions would be very helpful to rule out.
| Reporter | ||
Comment 18•3 years ago
I asked a friend with a T400 to test it and he could reproduce it as well. (He's on Ubuntu 20.04 with the Firefox 95 deb with MOZ_ENABLE_WAYLAND=1.)
As requested, I'm testing on a fully updated Ubuntu 22.04 (which has GNOME Shell 40.5 and libgtk 3.24.30).
The overlay stays green all the time, even when the scrolling stutters. I never see any purple unless I open menus or windows pop up.
Comment 19•3 years ago
(In reply to Kevin Keijzer from comment #18)
...
Thanks! So it's not about opaque regions... Does disabling frame callback throttling help (widget.wayland.vsync.enabled:false)?
| Reporter | ||
Comment 20•3 years ago
No, widget.wayland.vsync.enabled has no noticeable effect.
I've also made two videos, which could perhaps clarify things a bit.
https://quietlife.nl/files/tmp/default.mp4 is Firefox 95 on Wayland with the default settings.
https://quietlife.nl/files/tmp/opengl.mp4 is Firefox 95 on Wayland with gfx.webrender.software.opengl=true.
I'm basically just scrolling up and down in about:support all the time.
Comment 21•3 years ago
I think that when we moved code from nsWindow::Create() to nsWindow::OnMap(), we either left something behind or changed the window configuration. Maybe something changes between nsWindow::Create() and nsWindow::OnMap(), so we now create a different window, perhaps with different compositing hints.
Do you see any error messages when you run Firefox 95 from a terminal? Can you attach about:support from Firefox 94, where the bug is not visible?
Thanks.
| Reporter | ||
Comment 22•3 years ago
I don't get any output on stdout or in journalctl whatsoever when starting Firefox.
| Reporter | ||
Comment 23•3 years ago
Sorry, the previous one was without MOZ_ENABLE_WAYLAND=1.
Comment 24•3 years ago
Can you try to get a profile during scrolling? Go to https://profiler.firefox.com, select Graphics in the combo box, enable profiling, scroll on the about:support page, and share the profile ID.
Thanks.
| Reporter | ||
Comment 25•3 years ago
Comment 26•3 years ago
Thanks, that's what we need here. Can you please also attach a profile from Firefox 94 for reference?
Thanks.
Comment 27•3 years ago
Kevin,
which binaries did you use to get the profile? Looks like the symbols are shifted somehow so I can't clearly identify which code is called.
From the profile it looks like we're blocked in the IPC bridge and failing to get events from the OS. But to make sure, I'd need a profile from Mozilla binaries.
Please try to download and run binaries directly from Mozilla:
Firefox 95:
http://archive.mozilla.org/pub/firefox/candidates/95.0-candidates/build1/linux-x86_64/en-US/firefox-95.0.tar.bz2
Firefox 94:
http://archive.mozilla.org/pub/firefox/candidates/94.0.2-candidates/build2/linux-x86_64/en-US/firefox-94.0.2.tar.bz2
and capture the profile for both and attach it here.
Thanks!
| Reporter | ||
Comment 28•3 years ago
Firefox 94.0.2: https://share.firefox.dev/3yvXirV
Firefox 95.0: https://share.firefox.dev/3pSBlzr
| Reporter | ||
Comment 29•3 years ago
Not sure if it's a coincidence, but when the profiler is running, the lagging seems to be less noticeable. As soon as I disable it, it gets a lot worse again.
Comment 30•3 years ago
Thanks. I see 70% of CPU time in posix_fallocate, which is the Wayland buffer allocation routine. But you say you can also reproduce this on Xwayland with gfx.x11-egl.force-enabled = true. Can you please attach such a profile from Mozilla Firefox 95?
Thanks.
| Reporter | ||
Comment 31•3 years ago
Comment 32•3 years ago
Thanks. I see 95 is lagging in scrolling events so I wonder if that's related to scrolling only or if it's reproducible with general rendering.
Please try to run this webgl demo:
https://webglsamples.org/blob/blob.html
in both 95 and 94 and try to compare rendering performance (fps).
Thanks.
| Reporter | ||
Comment 33•3 years ago
In both cases I get 30~35 fps. I also don't see any visible differences in the animation between 94 and 95.
Comment 34•3 years ago
Thanks. Looks like a scrolling regression, but I don't see any obvious reason why we lag here. On X11 we lag at gdk_window_get_origin(), where X11 roundtrips are performed, and on Wayland we stall at posix_fallocate, but without an obvious reason.
Comment 35•3 years ago
I think I'm getting close; it seems to be related to early in-process compositing init. Please run the broken Firefox with the MOZ_LOG="Widget:5" env variable and attach the log here. Also, please try scrolling on a different site, like YouTube or Facebook, to see if the broken scrolling shows up there too.
Thanks.
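One way to capture the requested log reproducibly is to build the environment in a small script. This is a sketch under assumptions: the helper names and the binary path are hypothetical, and `MOZ_LOG_FILE` (Gecko's variable for redirecting log output to a file) is not mentioned in this thread.

```python
import os
import subprocess


def firefox_env(base=None, wayland=True, log_modules="Widget:5", log_file=None):
    """Return a copy of the environment with the variables used in this bug set."""
    env = dict(os.environ if base is None else base)
    if wayland:
        env["MOZ_ENABLE_WAYLAND"] = "1"   # run on the native Wayland backend
    if log_modules:
        env["MOZ_LOG"] = log_modules      # e.g. "Widget:5" as requested above
    if log_file:
        env["MOZ_LOG_FILE"] = log_file    # assumption: write the log to a file
    return env


def launch(binary, env):
    """Start Firefox with the prepared environment (binary path is hypothetical)."""
    return subprocess.Popen([binary], env=env)


env = firefox_env(base={}, log_file="/tmp/firefox-widget.log")
print(env)
```

Calling `launch("./firefox/firefox", env)` from the directory where the tarball was unpacked would then start the build with widget logging enabled; merge `os.environ` in via `base=None` for a real run so that display variables are preserved.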
| Reporter | ||
Comment 36•3 years ago
The scrolling glitch definitely happens on other websites as well. I just used about:support as an easy example because it's long and contains a lot of text, and you can't blame slow JavaScript rendering there.
Comment 37•3 years ago
Looks like it's related to an early nsWindow::GetCompositorWidgetInitData() call, made before we set up the GdkWindow, so we create GtkCompositorWidget::GtkCompositorWidget() early. But I don't understand how it can affect the SW-based rendering here.
Comment 38•3 years ago
Kevin, could you try the build/steps from bug 1744896 comment 69 just to be sure this isn't for some reason a duplicate of bug 1744896?
| Reporter | ||
Comment 39•3 years ago
I'm having a hard time replicating the issue with that build, so I think the issues may indeed be related.
Comment 40•3 years ago
(In reply to Kevin Keijzer from comment #39)
I'm having a hard time replicating the issue with that build, so I think the issues may indeed be related.
Thanks! Given the weirdness bug 1744896 triggers regarding vsync and refresh drivers, it would actually make sense to me if this is a duplicate.
Closing for now - it would be great if you could test the upcoming nightly at some point and confirm that things are indeed back to normal. The patch should also get uplifted to 96.