Open Bug 1880467 Opened 2 months ago Updated 1 month ago

[Scale change] Firefox continues rendering windows on non-visible workspaces

Categories

(Core :: Widget: Gtk, defect, P2)

Firefox 122
defect

Tracking

()

UNCONFIRMED

People

(Reporter: hugo, Unassigned)

References

(Blocks 1 open bug)

Details

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0

Steps to reproduce:

  1. This is likely far easier to reproduce with tiling enabled, or a tiling compositor.
  2. Open many Firefox windows (I have several dozen right now).
  3. Switch to another workspace.
    3a. Change the output resolution OR
    3b. Change the size of the system status bar, dock, or similar shell component (stopping the status bar can trigger this under swaywm) OR
    3c. Unplug the output and plug it back in again.

During (3), because the screen size has changed, all Firefox windows get resized.

Actual results:

Firefox starts re-rendering every non-visible window. When many windows are open in various workspaces, this can spike CPU to 100%, trigger the out-of-memory killer or (most often) both.

Expected results:

Firefox should not re-render windows that are in non-visible workspaces, even when resized. To be a bit more specific, Firefox should not re-render any windows until it receives a wl_surface::frame event. This event indicates that the compositor needs a new frame, which implies that Firefox is [about to be] visible again.
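
A minimal sketch of the behaviour requested above, written as a toy Python model rather than Firefox or Wayland code. The class and method names (`Surface`, `on_configure`, `on_frame_callback`) are hypothetical, and the model ignores how a frame callback is requested from the compositor. It only illustrates the idea: resizes of a hidden surface are recorded and coalesced, and the expensive render happens once, when the compositor asks for a frame.

```python
class Surface:
    """Toy model of a client surface that defers rendering until a frame callback."""

    def __init__(self, width, height):
        self.rendered_size = (width, height)  # size of the last rendered buffer
        self.pending_size = None              # latest size that has not been rendered yet
        self.render_count = 0                 # number of (expensive) renders performed

    def on_configure(self, width, height):
        # Only record the new size; a later configure overwrites an earlier one.
        if (width, height) != self.rendered_size:
            self.pending_size = (width, height)
        else:
            self.pending_size = None

    def on_frame_callback(self):
        # The compositor wants a new frame, so the surface is (about to be) visible.
        if self.pending_size is not None:
            self.rendered_size = self.pending_size
            self.pending_size = None
            self.render_count += 1  # stands in for layout + paint + buffer allocation


hidden = Surface(1920, 1080)
# Three resizes arrive while the window sits on a non-visible workspace.
for size in [(1920, 1104), (1371, 771), (1280, 720)]:
    hidden.on_configure(*size)
renders_while_hidden = hidden.render_count

# The workspace becomes visible again and the compositor requests a frame.
hidden.on_frame_callback()
```

In this model the three background resizes cost nothing, and a single render at the final size happens when the window becomes visible.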

This issue should be marked as related to https://bugzilla.mozilla.org/show_bug.cgi?id=635134

Reproduction steps 3a, 3b and 3c are three separate ways to trigger this; only one of the three needs to be executed to reproduce the issue. They don't need to be executed in sequence.

Yes, we refresh any window whose size, scale, etc. has changed. It may need extra work to postpone the change until the window is repainted. I wonder if nsWindow::NotifyOcclusionState() works correctly here and mIsFullyOccluded is correctly set for non-active windows.

Component: Untriaged → Widget: Gtk
Priority: -- → P3
Product: Firefox → Core

Can you confirm that it doesn't affect X11? Because X11 is missing idle page inhibit (Bug 1826291).

Flags: needinfo?(hugo)

Sorry, I'm afraid that I'm not the right person to confirm the state of this on X11. I don't have a working X11 setup on any of my hosts. I tried setting up X11 and i3 on my laptop, but it looks like I need to manually configure display drivers... or something like that.

I don't think that testing on XWayland will yield valid results.

Flags: needinfo?(hugo)

XWayland is the same as X.Org from Firefox POV and in this case it should give the same results.

Firefox with XWayland crashes at startup:

> unset WAYLAND_DISPLAY 
> unset MOZ_ENABLE_WAYLAND
> firefox
Failed to open curl lib from binary, use libcurl.so instead
[Parent 14950, IPC I/O Parent] WARNING: process 15003 exited on signal 9: file /home/buildozer/aports/community/firefox/src/firefox-122.0.1/ipc/chromium/src/base/process_util_posix.cc:265
ATTENTION: default value of option mesa_glthread overridden by environment.
DRI3 not available
failed to load driver: zink
vulkan: No DRI3 support detected - required for presentation
Note: you can probably enable DRI3 in your Xorg config
MESA: error: zink: could not create swapchain
ExceptionHandler::GenerateDump cloned child 15219
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
ExceptionHandler::WaitForContinueSignal waiting for continue signal...
Exiting due to channel error.
Failed to open curl lib from binary, use libcurl.so instead
[Parent 15335, IPC I/O Parent] WARNING: process 15398 exited on signal 9: file /home/buildozer/aports/community/firefox/src/firefox-122.0.1/ipc/chromium/src/base/process_util_posix.cc:265

Are you missing the DISPLAY variable too?

DISPLAY is properly set. The small window that asks me to pick a profile does render properly. When I hit enter, Firefox crashes before rendering.

I tried on a different host and it does seem to work there. Results under XWayland:

  • I had about 39 Firefox windows on workspace 1 (with lots of various websites open).
  • Switched to workspace 2. After a few moments of idle, Firefox settled on around 30% CPU.
  • Stopped waybar (system status bar). This slightly increases the screen space for all clients, so they all resize.
  • Firefox spiked to 250% CPU for about two seconds. System memory usage also temporarily jumps by about 1GB.
  • I changed the display scale between 1.5 and 1.4 a few times. CPU used by Firefox also spiked to about 300%.

The same steps on Wayland (natively):

  • I had about 39 Firefox windows on workspace 1 (with lots of various websites open).
  • Switched to workspace 2. After a few moments of idle, Firefox settled on around 5-10% CPU.
  • Stopped waybar (system status bar). This slightly increases the screen space for all clients, so they all resize.
  • Firefox spiked to 200% CPU for about two seconds. System memory usage also temporarily jumps.
  • I changed the display scale between 1.5 and 1.4 a few times. CPU used by Firefox also spiked.

Note that the profile was the exact same Firefox profile with the same windows. It seems that the baseline for CPU usage when idle and in background is a lot lower on Wayland.

To answer the above inquiry, it does seem like windows in non-visible workspaces keep getting re-rendered under XWayland.

I see. Can you attach a backtrace of the crash, please? Please run in gdb or use coredumpctl.
Thanks.

I did a bit more looking into this. I ran Firefox with WAYLAND_DEBUG=1 and moved all of its windows into workspace 2.

I then switched to workspace 1 and changed the output scale from 1.5 to 1.4. CPU and memory usage spiked. The last thing that I saw was kswapd0 peaking at 100% and then the system froze.

I rebooted, added a lot more swap and changed the output scale again from 1.5 to 1.4. Immediately after I change the output scale, Firefox logged the following:

https://paste.sr.ht/~whynothugo/d99f3d76275eac7df8dfb09794a3da62babfef07

It seems that when the scale changes, Firefox immediately reallocates buffers for all of its surfaces. It's also committing all these surfaces, so I'd guess it's actually rendering every single window too.

As a side note: I notice a lack of fractional-scale-v1 in the logs, so Firefox is also allocating buffers about 30% larger than necessary and rendering windows at an even higher resolution than strictly necessary.
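
To put rough numbers on that side note: a client that does not support fractional scaling sees the next integer scale (a later comment in this thread notes that 1.6 appears as 2 to such clients). The sketch below assumes a hypothetical 1280x720 logical window on an output scaled at 1.5; the figures are illustrative arithmetic, not measurements taken from the logs.

```python
import math


def buffer_size(logical_w, logical_h, scale):
    """Buffer dimensions in pixels for a logical window size at a given scale."""
    return (round(logical_w * scale), round(logical_h * scale))


logical = (1280, 720)                     # hypothetical window size in logical pixels
output_scale = 1.5                        # output scale mentioned in the report
integer_scale = math.ceil(output_scale)   # scale seen without fractional scaling

needed = buffer_size(*logical, output_scale)      # (1920, 1080)
allocated = buffer_size(*logical, integer_scale)  # (2560, 1440)

# Overhead per dimension and in total pixel count.
linear_overhead = allocated[0] / needed[0] - 1
pixel_overhead = (allocated[0] * allocated[1]) / (needed[0] * needed[1]) - 1
```

Under these assumptions the buffer is about 33% larger per dimension, which matches the "about 30%" estimate, and about 78% larger when counted in pixels.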

The reproduction steps above are quite artificial and are written this way to ease reproducing the issue in a controlled way.

But please note that this issue does get triggered in everyday situations, e.g.:

  • Yesterday I worked all day on my laptop, lots of Firefox windows open.
  • At the end of the day, I plugged my laptop into a TV to watch a movie with family. All Firefox windows were in background workspaces.
  • Once the movie started, I closed my laptop lid.
  • The laptop output was disabled, and Firefox started re-rendering all the non-visible windows to the other output's size/scale.
  • CPU and memory usage peaked. Video playback stuttered dramatically for a few minutes. Eventually the system froze.
  • A hard reboot was required.

(In reply to Hugo Osvaldo Barrera from comment #11)

It seems that when the scale changes, Firefox immediately reallocates buffers for all of its surfaces. It's also committing all these surfaces, so I'd guess it's actually rendering every single window too.

OnScaleChange() is here:

https://searchfox.org/mozilla-central/rev/098f910d0593b12a42089dd8f40dcd19d1121430/widget/gtk/nsWindow.cpp#5390

Can you run Firefox in a terminal with MOZ_LOG="Widget:5" and attach the log here? I'd expect OnScaleChanged log entries for visible windows only but let's see.

Thanks.

Flags: needinfo?(hugo)
Flags: needinfo?(stransky)
Priority: P3 → P2
Summary: Firefox continues rendering windows on non-visible workspaces → [Scale change] Firefox continues rendering windows on non-visible workspaces

I ran MOZ_LOG="Widget:5" firefox 2>&1 | tee firefox.log to grab the logs. The full logs are here: https://paste.sr.ht/~whynothugo/47eb80c779446c9005b179865bd15dd3524467a7

Before changing the scale, I ran tail -n1 -f firefox.log | tee firefox.log2. This second log file starts from the moment that I changed the scale. I hope that this will be a bit more useful: https://paste.sr.ht/~whynothugo/78cab875a092340d5741696b5e8982f0b46c46c1

I changed the scale from 1.6 to 1, but for clients that don't support fractional scaling, the change would appear to be from 2 to 1. The logs here reflect this.

I don't think that the scale change itself triggers the issue directly; I think it does so transitively. The scale change implicitly resizes the windows, and it is the window resize that triggers the issue.

Flags: needinfo?(hugo)

I ran another experiment to reproduce this:

  1. Ran MOZ_LOG="Widget:5" firefox 2>&1 | tee firefox.log
  2. Switched to another desktop, waited for logs to settle.
  3. Ran tail -n1 -f firefox.log | tee firefox.log2, to split logs generated from this point forward.
  4. Closed the laptop lid, opened the laptop lid again.

The system slowed down to a crawl, but actually came back to life after a minute when the OOM killer kicked in.

firefox.log: https://paste.sr.ht/~whynothugo/4f6a9351b66ea4063143639da9baf42bbc37f7e4
firefox.log2: https://paste.sr.ht/~whynothugo/0eb06337f72f08bf2ba2a10e4b6e4cedbb5d9eb4

Thanks, will look at it.

Looks like it's partially a regression from Bug 1829493.

See Also: → 1829493

I see that we're getting the events only if the Firefox window is maximized. If the Firefox window is in the normal state, it doesn't get any events after suspend/resume.

Hugo, can you confirm that?
Thanks.

Flags: needinfo?(hugo)

The main issue here is that we're getting nsWindow::OnExposeEvent() even for a hidden window if it's maximized and the monitor setup changes in some way (scale, suspend, and so on).

Looks like configure handler is run during resize even if window is hidden:

That sounds correct to me; the window is notified of the resize even if not visible. But the next frame shouldn't be rendered until the next wl_surface::frame event (this also means that content flow on pages doesn't need to be recomputed yet either).

I see that we're getting the events only if the Firefox window is maximized. If the Firefox window is in the normal state, it doesn't get any events after suspend/resume.

I'm using sway, which is a tiling compositor. Windows are tiled in a few different ways:

  • Tabbed: windows occupy the entire screen height/width, minus a tab bar on top to switch between windows.
  • Vertically split: each window occupies 100% of the screen height and 50% (assuming just two windows) of the screen width.
  • Horizontally split
  • Other permutations of the above.

There is no concept of maximisation, but windows behave similarly to maximised windows on floating compositors. Specifically, window size is tied to output resolution and scale. What you observe sounds consistent.

I suspect that if you increase screen resolution, the floating windows don't need to resize. If you decrease screen resolution, I assume that floating windows larger than the target resolution would need to resize/shrink.
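
The suspicion above can be sketched as a toy model. Everything here is hypothetical: the function names, the tabbed layout with a 24px bar, and the assumption that a floating window only shrinks when it no longer fits the output. Real compositors are free to behave differently.

```python
def tiled_size(output_w, output_h, bar_h=24):
    """A tabbed/tiled window fills the output minus a bar, so it tracks the output size."""
    return (output_w, output_h - bar_h)


def floating_size(win_w, win_h, output_w, output_h):
    """A floating window keeps its size unless it no longer fits the output."""
    return (min(win_w, output_w), min(win_h, output_h))


def needs_resize(old_size, new_size):
    return old_size != new_size


old_output = (1920, 1080)
bigger_output = (2560, 1440)
smaller_output = (1280, 720)
floating = (1600, 900)  # hypothetical floating window size

# A tiled window is resized on any output change.
tiled_on_grow = needs_resize(tiled_size(*old_output), tiled_size(*bigger_output))
# A floating window is untouched when the output grows...
floating_on_grow = needs_resize(floating, floating_size(*floating, *bigger_output))
# ...but must shrink when the output becomes smaller than the window.
floating_on_shrink = needs_resize(floating, floating_size(*floating, *smaller_output))
```

In this model, tiled (or maximised) windows receive a resize on every output change, while floating windows receive one only when the output shrinks below their size. That would be consistent with events arriving only for maximized windows.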

The main issue here is that we're getting nsWindow::OnExposeEvent() even for a hidden window if it's maximized and the monitor setup changes in some way (scale, suspend, and so on).

So there is an assumption somewhere that if a window was resized it was also made visible too?

Flags: needinfo?(hugo)

I suppose that if you unplug and re-plug a monitor (or if you close a laptop lid and open it again, which implicitly does the same) it should also trigger for non-maximised windows.

Note that another path to trigger the same scenario is to stop and/or start the system status bar (which I do when I reconfigure it).

Because the status bar is removed from the bottom of the screen, all windows have 24px more height space and get resized. I'm sure an equivalent situation can be reproduced in GNOME/KDE by maximising a window, switching to another workspace, and re-configuring the fixed space taken up by the dock.

I'm afraid it's not possible to fix this bug on the Firefox side. We're getting a size change + expose request even if the window is not visible. We need to react accordingly because all the indications we have state that the window is visible (we have valid surfaces etc.).

It may be possible to postpone all changes to the first frame callback event, but that would be very difficult: it needs radical changes to how we handle widget events, and it may introduce regressions in the resize code (flickering and so on), since rendering and resizing would be delayed.

I suggest filing this bug against mutter/gnome, as it should not send events to invisible windows at all, or should delay such changes until the window is visible (note that we don't have info about workspaces on Wayland).

Flags: needinfo?(stransky)

We're getting a size change + expose request even if the window is not visible. We need to react accordingly because all the indications we have state that the window is visible (we have valid surfaces etc.).

By "expose request", do you mean a wl_surface::frame event?

I reproduced this issue with swaywm, and the logs (with WAYLAND_DEBUG=1), indicate that no frame event is being delivered by the compositor.

Are you seeing a wl_surface::frame event for non-visible windows on Mutter?

I suggest filing this bug against mutter/gnome, as it should not send events to invisible windows at all, or should delay such changes until the window is visible (note that we don't have info about workspaces on Wayland).

The Wayland protocol does not have any such requirement; it is perfectly valid to send configure events to background windows. They should not re-render their content until they get a frame event. I asked one of the wlroots developers about this issue before reporting here, and delaying events until a window is visible didn't seem feasible.

(In reply to Hugo Osvaldo Barrera from comment #26)

We're getting a size change + expose request even if the window is not visible. We need to react accordingly because all the indications we have state that the window is visible (we have valid surfaces etc.).

By "expose request", do you mean a wl_surface::frame event?

No, it's the GTK "draw" event, which is issued to a widget when it needs a repaint.

I reproduced this issue with swaywm, and the logs (with WAYLAND_DEBUG=1), indicate that no frame event is being delivered by the compositor.

Are you seeing a wl_surface::frame event for non-visible windows on Mutter?

Yes, that's correct. No frame events for widgets. But we're getting resize events (AFAIK).

I suggest filing this bug against mutter/gnome, as it should not send events to invisible windows at all, or should delay such changes until the window is visible (note that we don't have info about workspaces on Wayland).

The Wayland protocol does not have any such requirement; it is perfectly valid to send configure events to background windows.

It's kind of a chicken-and-egg problem here, and I don't know how to solve it. It may also be a bug in how Gtk3 handles this: it could hold the widget signals until the window is visible.

They should not re-render their content until they get a frame event. I asked one of the wlroots developers about this issue before reporting here, and delaying events until a window is visible didn't seem feasible.

That's not correct. A Wayland client can paint to a wl_surface at any point, and if the window is not visible the wl_surface is deleted. So if you have a valid wl_surface, it means you can paint to it. The frame callback is just an optimization to keep painting in step with the monitor / compositor frame rate; there is no hard requirement to paint only in the frame callback.

Just to make sure that I understand correctly: when the window is resized GTK3 signals both a resize and a "needs redraw" event? If that is the case, it sounds to me like a GTK3 issue; you have no way of disambiguating a "background resize" from a "foreground resize", right?

If my understanding here is correct, it might be worth opening an issue on the GTK3 side; maybe it can be addressed or maybe the devs have additional insight for this scenario.

That's not correct. A Wayland client can paint to a wl_surface at any point, and if the window is not visible the wl_surface is deleted. So if you have a valid wl_surface, it means you can paint to it. The frame callback is just an optimization to keep painting in step with the monitor / compositor frame rate; there is no hard requirement to paint only in the frame callback.

Yes, you are correct; it is legal to re-render the surface at any time. I say it "should" render on a wl_surface::frame event in the sense that rendering in any other situation results in rendering content that might never be displayed (like non-visible windows in this case).

In this situation, Firefox is not doing anything that's strictly disallowed or prohibited by the Wayland protocol itself. It is doing needless work that can result in exhausting system resources.

(In reply to Hugo Osvaldo Barrera from comment #28)

Just to make sure that I understand correctly: when the window is resized GTK3 signals both a resize and a "needs redraw" event? If that is the case, it sounds to me like a GTK3 issue; you have no way of disambiguating a "background resize" from a "foreground resize", right?

If my understanding here is correct, it might be worth opening an issue on the GTK3 side; maybe it can be addressed or maybe the devs have additional insight for this scenario.

Yes, you're right. AFAIK the 'redraw' signal is issued if any part of the window is invalidated. It may or may not be directly related to window visibility, but unfortunately widget visibility tracking is generally broken in Gtk (Bug 1826291).
