[Wayland][WebRender] Browser unresponsive when not all windows are displayed (minimized/different workspace)

RESOLVED FIXED in mozilla68

Status

()

defect
P3
normal
RESOLVED FIXED
7 months ago
3 months ago

People

(Reporter: yorickvanpelt, Unassigned)

Tracking

(Blocks 3 bugs, Regressed 1 bug)

Trunk
mozilla68
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0

Steps to reproduce:

With recent sway beta:
Open firefox. Hit ctrl-N to make a new window. Move the window to another workspace.


Actual results:

Both windows are unresponsive until they are both displayed on the screen.


Expected results:

Browser should stay responsive even if partially hidden. I suspect this is because sway (and other wlroots compositors) does not send frame events for hidden windows, but I don't know the specifics.
Moving to the right component. 
Can you please check it out if this is reproducible with a clean profile and safe mode?
Thanks
Component: Untriaged → Widget: Gtk
Flags: needinfo?(yorickvanpelt)
Product: Firefox → Core
Hi! This seems related to acceleration.
It reproduces on a clean profile with MOZ_ACCELERATED=1, but not with MOZ_ACCELERATED=0 (or safemode).
Flags: needinfo?(yorickvanpelt)

When using EGL on Wayland, clients need to set eglSwapInterval(0), otherwise they'll block when hidden. More info:

https://emersion.fr/blog/2018/wayland-rendering-loop/

For egl to work with FF mobile still, set only set if MOZ_WAYLAND is enabled. I think https://paste.pound-python.org/show/T2j6FhtHY2HzXDM18LAb/ is right, we'll see if it even builds.

So it seems like (from the link in Comment 3) that Wayland does a crazy thing, and repurposes SwapInterval(1) as RAF?
From 1, it sounds like SwapInterval(0) won't tear, but just also won't block, but will allow us to no longer block if we SwapBuffers on a hidden window.

:stransky/:sotaro, am I reading this right and we should SwapInterval(0) on Wayland, because we render multiple windows from one thread?

Flags: needinfo?(stransky)
Flags: needinfo?(sotaro.ikeda.g)
Component: Widget: Gtk → Graphics

Just calling eglSwapInterval(0) won't be enough, you'll also need to register frame callbacks and redraw when you receive one.

Also, #if defined(MOZ_WAYLAND) is not enough, there should be a dynamic check I think

(In reply to Jeff Gilbert [:jgilbert] from comment #5)

:stransky/:sotaro, am I reading this right and we should SwapInterval(0) on Wayland, because we render multiple windows from one thread?

Yea, we should avoid blocking on eglSwapBuffers(). But when I just added calling SwapInterval(0), the block still happened.

Flags: needinfo?(sotaro.ikeda.g)

(In reply to Sotaro Ikeda [:sotaro] from comment #9)

:stransky/:sotaro, am I reading this right and we should SwapInterval(0) on Wayland, because we render multiple windows from one thread?

Yea, we should avoid blocking on eglSwapBuffers(). But when I just added calling SwapInterval(0), the block still happened.

I called the SwapInterval(0) in GLContextEGLFactory::Create() like comment 6. It did not work. In this case, GLContextEGLFactory::Create() did not create EGLSurface, since widget was not fully mapped yet.

But when the SwapInterval(0) was called just after EGLSurface creation, it works as expected. From it, SwapInterval(0) seems to need valid current EGLSurface.

Assignee: nobody → sotaro.ikeda.g

Sotaro Ikeda: just to make sure you've seen this:

Just calling eglSwapInterval(0) won't be enough, you'll also need to register frame callbacks and redraw when you receive one.

(This isn't addressed in the patch you sent)

Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

(In reply to Jeff Gilbert [:jgilbert] from comment #5)

So it seems like (from the link in Comment 3) that Wayland does a crazy
thing, and repurposes SwapInterval(1) as RAF?
From 1, it sounds like SwapInterval(0) won't tear, but just also won't
block, but will allow us to no longer block if we SwapBuffers on a hidden
window.
:stransky/:sotaro, am I reading this right and we should SwapInterval(0) on
Wayland, because we render multiple windows from one thread?

Can you please run Firefox as:

WAYLAND_DEBUG=1 MOZ_ENABLE_WAYLAND=1 ./firefox

and attach the log here? Please mark the point at the log where Firefox hangs. That will be a starting point to find out why we have a hang here. I suspect this is a variant of:

https://bugs.freedesktop.org/show_bug.cgi?id=106753

We already do the VSync on wayland:

https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/gfx/thebes/gfxPlatformGtk.cpp#742)

and that's derived from our SW compositor. When HW syc is missing we use 60fps approximation.

https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/widget/gtk/WindowSurfaceWayland.cpp#556

It may be possible to use the frame callback with eglSwapBuffers() as but I'd like to discuss it a bit with Jonas Ahdal first.

Also that may be Sway specific issue, I don't see that on mutter. Would be great to test it on Weston which is a reference Wayland compositor.

Flags: needinfo?(stransky) → needinfo?(yorickvanpelt)

We already do the VSync on wayland

That's not a good way of doing vsync. The framerate information from wl_output must not be used for synchronization purposes. Instead, use frame callbacks!

BTW if we set eglSwapInterval(0) without the frame callback it may hide the actual problem and cause performance issues. eglSwapInterval(0) + frame callback may produce the same results as eglSwapInterval(1) as we don't use egl from main thread.

(In reply to emersion from comment #14)

We already do the VSync on wayland

That's not a good way of doing vsync. The framerate information from
wl_output must not be used for synchronization purposes. Instead, use
frame callbacks!

Yes, we use the frame callback, see https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/widget/gtk/WindowSurfaceWayland.cpp#556

Sorry for the confusion.

we use the frame callback

But you still rely on GetWaylandFrameDelay. Is there a reason why you use it instead of frame callbacks?

When you receive a frame callback, it doesn't mean "your content has been presented". It means "now is a good time for Firefox to draw and submit a new frame".

Related: https://patchwork.freedesktop.org/patch/258648/

Feel free to ask questions on the Wayland mailing list, or to join the IRC channel (#wayland on Freenode).

Assignee: sotaro.ikeda.g → nobody
Status: ASSIGNED → NEW

I can reproduce this on Gnome and Sway. Dropping [sway] from the title.

Summary: [wayland] [sway] browser unresponsive when not all window displayed → [wayland] browser unresponsive when not all window displayed

Hm. The steps to reproduce aren't obvious in GNOME; I only see the bug if I move windows / workspaces using keyboard shortcuts. Going through the Activities overview seems to work normally.

Steps to reproduce:

  1. Launch Firefox
  2. Ctrl-N -- create a new window
  3. Super-Shift-PgDown -- move window to next workspace
  4. Super-PgUp -- return to previous workspace

As long as you switch between workspaces using Super-PgUp/PgDown, the Firefox windows will be unresponsive.

Triggering the overview (Super) revives the windows.

For me in sway once I move firefox to a desktop/monitor that does not have the scale=1.5 it remains small, no matter how many other windows I open and try and force it to resize. https://imgur.com/a/nSR84ew

(In reply to Vladi from comment #21)

For me in sway once I move firefox to a desktop/monitor that does not have the scale=1.5 it remains small, no matter how many other windows I open and try and force it to resize. https://imgur.com/a/nSR84ew

This seems completely unrelated. Please check Sway's bugtracker if there's any opened issue (I think there is one). If not, please create one.

Priority: -- → P3

On Ubuntu 18.10 I also see this when a window is minimized with WebRender/OpenGL enabled. It also causes problems with videos playing minimized/hidden, there's a memory spike and after a short while the video stops playing until it becomes visible again.

Blocks: wr-linux
Summary: [wayland] browser unresponsive when not all window displayed → [Wayland][WebRender] Browser unresponsive when not all windows are displayed (minimized/different workspace)

I'm toying around with an attempt to use frame callbacks as VSyncSource (as my first attempt to poke at mozilla sources—wish me luck). I have a simple implementation, but I am debugging some hangs.

Unless it's okay to let WebRenders "wrench" event loop pause, we also need some integration there.

I think I can reproduce it with WebRender/OpenGL. But it takes some time to reproduce, it's not immediate.

Flags: needinfo?(yorickvanpelt)

It reproduces easily with sway:

  1. Launch sway
  2. Launch Firefox
  3. Start additional window
  4. Move additional window to a different workspace
  5. Observe that the windows appear frozen.

Another reproduction:

  1. Launch sway
  2. Launch Firefox
  3. Copy text from Firefox
  4. Attempt to paste text on a different workspace.
  5. Observe that paste hangs until Firefox is made visible again.

However, after testing only the eglSwapInterval(0) fix (without the much needed update to a more wayland-specific vsync logic), I see no immediate issue with :sotaro's patch. It seems to solve all the hangs.

Making a proper VSyncSource based on frame-callbacks (or refactoring so that things happen per-surface rather than globally) can be handled separately from this issue.

:stransky, anything holding back :sotaro's patch?

(In reply to Kenny Levinsen from comment #26)

:stransky, anything holding back :sotaro's patch?

eglSwapInterval(0) may cause performance overheads at least (I'm not sure about tearing). I'm going to investigate that as I can reproduce it now and understand what's going on before we eventually enable the eglSwapInterval(0).

I'm unable to reproduce that in debug build - looks like a kind of race condition to me.

Managed to reproduce in debug build:

#0 0x00007fa2b6a84847 in poll () at /lib64/libc.so.6
#1 0x00007fa2b567bb7c in poll (__timeout=-1, __nfds=1, __fds=0x7fa28fb95420) at /usr/include/bits/poll2.h:46
#2 0x00007fa2b567bb7c in wl_display_poll (display=display@entry=0x7fa2b67901c0, events=events@entry=1) at src/wayland-client.c:1713
#3 0x00007fa2b567d63c in wl_display_dispatch_queue (queue=<optimized out>, display=<optimized out>) at src/wayland-client.c:1786
#4 0x00007fa2b567d63c in wl_display_dispatch_queue (display=0x7fa2b67901c0, queue=0x7fa28a148140) at src/wayland-client.c:1759
#5 0x00007fa28b9ee09c in dri2_wl_swap_buffers_with_damage (drv=<optimized out>, disp=0x7fa28de69800, draw=0x7fa28ac74400, rects=0x0, n_rects=0)
at ../src/egl/drivers/dri2/platform_wayland.c:985
#6 0x00007fa28b9dc4b3 in eglSwapBuffers (dpy=0x7fa28de69800, surface=<optimized out>) at ../src/egl/main/eglapi.c:1278
#7 0x00007fa2aaab0ae9 in mozilla::gl::GLLibraryEGL::fSwapBuffers(void*, void*) const (this=0x7fa28de4a3e0, dpy=0x7fa28de69800, surface=0x7fa28ac74400)
at /home/komat/tmp676-trunk-gtk3/src-wayland/gfx/gl/GLLibraryEGL.h:226
#8 0x00007fa2aaa9e2ab in mozilla::gl::GLContextEGL::SwapBuffers() (this=0x7fa28aca9000) at /home/komat/tmp676-trunk-gtk3/src-wayland/gfx/gl/GLContextProviderEGL.cpp:494
#9 0x00007fa2aafec095 in mozilla::wr::RenderCompositorEGL::EndFrame() (this=0x7fa28de5cd00) at /home/komat/tmp676-trunk-gtk3/src-wayland/gfx/webrender_bindings/RenderCompositorEGL.cpp:120

....

As expected the freeze happens at eglSwapBuffers(). I suspect that the wl_surface passed to eglSwapBuffers() does not get the frame callbacks so we freeze here. Needs further debugging with WAYLAND_DEBUG=1 to see actual sequences.

Would be great to test it on Weston which is a reference Wayland compositor.

Weston sends frame callbacks if your surface is on at least one output - you get no frame callbacks if you're not on any output. You will get frame callbacks if your surface is on an output but completely hidden/occluded. However, this shouldn't be considered canonical, and we plan to follow the wlroots/Sway behaviour in future.

Frame callbacks aren't guaranteed to happen at every vblank interval (if the display is idle, you won't get them), nor are they guaranteed to happen at vblank either (Weston delivers them half an interval out of phase by default). They're just a callback for when the compositor thinks would be a good time for you to draw, which leaves a huge amount of discretion to the compositor developer.

Okay let's go with the eglSwapInterval(0) for now.

Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7a2b3a8fab9c
Set eglSwapInterval to 0 on wayland r=stransky

Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Assignee: nobody → sotaro.ikeda.g

please re-open

the root cause is still there. This fix is partial. This fix causes high CPU usage (I imagine setting the interval to 0 could be the cause...).

Assignee: sotaro.ikeda.g → nobody
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Looks like https://bugzilla.mozilla.org/show_bug.cgi?id=1542808 is the issue that handles 'the right fix' going forward, so this is closable. Sorry for the confusion.

The eglSwapInterval(0) is a generally a good thing here, because when the surface is not visible, eglSwapBuffers() may be blocked, according to the wayland specs and as Daniel mentioned here. And when eglSwapBuffers() blocks at Rendering thread also main thread is blocked at expose/paint event so whole Firefox freezes.

So we need to either:

  1. set eglSwapInterval(1) and make eglSwapBuffers() non-blocking at Firefox side which may not be possible (I use that for SW wayland rendering)
  2. set eglSwapInterval(0) and throttle the eglSwapBuffers() cadence.

while the second choice is more realistic.

Status: REOPENED → RESOLVED
Closed: 3 months ago3 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.