Closed Bug 1668805 Opened 4 years ago Closed 3 years ago

[Wayland] Reenable opaque region for content surface

Categories

(Core :: Widget: Gtk, enhancement)

Desktop
Linux
enhancement

Tracking

()

RESOLVED FIXED
87 Branch
Tracking Status
firefox87 --- fixed

People

(Reporter: rmader, Assigned: rmader)

References

(Blocks 2 open bugs, )

Details

Attachments

(1 file, 1 obsolete file)

We disabled setting an opaque region for the content surface as it triggered bug 1615098 on certain Wayland compositors (Gnome >= 3.36 / https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/918, other compositors are likely to do the same sooner or later). As of bug 1617002, widget.wayland.use-opaque-region can be used to force-enable it again.

Not setting the opaque region prevents compositors to make several optimizations, most notable unredirection / direct buffer flipping (Gnome does that since 3.38).

Lets find a way to safely reenable it.

Severity: -- → S3
OS: Unspecified → Linux
Hardware: Unspecified → Desktop

I apologize if this ends up being just noise, but, after using widget.wayland.use-opaque-region continuously for a while, I have some details on exactly how bug 1615098 gets triggered on GNOME 3.38 (they're probably exactly the same on GNOME 3.36 as well), as you can actually avoid triggering the aforementioned bug entirely if you're careful.

In GNOME 3.36 or 3.38 on Ubuntu:

  1. Either make Firefox fullscreen, either by pressing Fn+F11 or by making a video fullscreen, or disable the title bar
  2. Either use the four-finger swipe gesture, not Super+PgUp/Dn, to switch between workspaces, or open Activities

If you simply never make Firefox fullscreen, or disable the title bar, regardless of window size, or you simply never open Activities, or switch to another workspace using the touchpad gesture, then you will never encounter this bug.

To recover proper pointer interaction, the following steps are needed:

  1. Make Firefox no longer fullscreen
  2. Either switch to another workspace, whether using the aforementioned gesture or Super+PgUp/Dn, or open Activities

In the case of disabling the title bar, just re-enable it.

(In reply to Sandi Vujaković from comment #1)

I apologize if this ends up being just noise, but, after using widget.wayland.use-opaque-region continuously for a while, I have some details on exactly how bug 1615098 gets triggered on GNOME 3.38...

Hehe, thanks Sandi :) Actually it's well understood what triggers this behaviour, but still good to have this documented.

Over the holidays I'm currently tinkering with a possible solution for this. The basic problem is that GTKs main surface needs to receive frame callbacks from the compositor to process things. However, since Gnome 3.36 Mutter does not send frame callbacks to surfaces any more that are fully covered by other opaque surfaces - in the case of FF, the "content" sub-surface, which has the actual content you see, covers the gtk main surface if you set widget.wayland.use-opaque-region:true. This kind of optimization is explicitly mentioned in the spec (https://wayland.freedesktop.org/docs/html/apa.html):

A server should avoid signaling the frame callbacks if the surface is not visible in any way, e.g. the surface is off-screen, or completely obscured by other opaque surfaces.

To solve the issue I see basically three ways:

  1. add a Wayland protocol to control frame callback behaviour differently (maybe something like a frame-callback group for surfaces - not very attractive and will likely get rejected by Wayland upstream)
  2. hack into gtk internals or fork it altogether (not enough (wo)man power - we too few already)
  3. do some crazy stuff to "trick" gtk and the compositor to do what we want

I'm currently trying the last solution - the idea is to high-jack frame requests for the GTK toplevel surface and emit them from the content subsurface. It would be hacky but, from what I can see so far, apparently feasible. Given that we would reduce gpu memory bandwith quite a bit (up to 50%) it's worth a try I guess. Currently fighting the linker to let me overwrite symbols from libwayland.so.

In order to keep GDK responsive we need to make sure its surface continues
to receive frame callbacks. This is problematic if the container surface
covers it completely and is opaque - the Wayland spec encourages compositors
to not send frame callbacks obscurred surfaces.

In order to work around that, add a small library that hooks into the
Wayland communication and triggers frame requests from GDK on the
container surface (if mapped).

This is early WIP: we do not explicitely lock and commit the container
surface yet and, more importantly, we need to find a way to build and
preload this library properly.

Assignee: nobody → robert.mader
Component: Graphics → Widget: Gtk

Got a small prototype running and the concept appears to work surprisingly well - hope to get this done for FF 87.

The biggest challenge here (at least for me) will be to get the linking in order. Currently I build the binary as shared lib and just preload it - optimally we'd build it statically and would need to make sure that it's loaded before gtk/gdk/libwayland-client.

Martin: can you help me here? I'd like to have a small binary (libwaylandutils.a or libwaylandutils.so) linked in before gtk and later mozwayland get loaded. Is that feasible? I have still very limited knowledge of the FF build system.

Flags: needinfo?(stransky)

(In reply to Robert Mader [:rmader] from comment #5)

The biggest challenge here (at least for me) will be to get the linking in order. Currently I build the binary as shared lib and just preload it - optimally we'd build it statically and would need to make sure that it's loaded before gtk/gdk/libwayland-client.

Can you help me understand why do you need that?

It's because you don't want inlined empty placeholder functions from mozwayland.c to waylandutils_redirect_wl_surface_frame / waylandutils_unredirect_wl_surface_frame ?

Flags: needinfo?(stransky) → needinfo?(robert.mader)

(In reply to Martin Stránský [:stransky] from comment #6)

Can you help me understand why do you need that?

It's because you don't want inlined empty placeholder functions from mozwayland.c to waylandutils_redirect_wl_surface_frame / waylandutils_unredirect_wl_surface_frame ?

So the problem I'm trying to solve is GDK occasionally doing frame requests (1) via wl_surface_frame, freezing GDK until a frame callback is send by the compositor.
The idea I've come up with is to overwrite wl_surface_frame and make it request a frame for the container surface instead of the gdk surface - i.e. something like wl_surface_frame(gdk_surface) -> wl_surface_frame(moz_container_surface). This works because the frame listener is still the same, so a frame callback for moz_container_surface would still trigger the callback listener in GDK.

wl_surface_frame is a static inline, so the symbol we've to overwrite is wl_proxy_marshal_array_constructor_versioned, checking the opcode to be the right one to overwrite.

This all appears to work well, however in order to make the GDK call to wl_surface_frame to go through our version of wl_proxy_marshal_array_constructor_versioned, we need to make sure the dynamic linker takes up it's symbol before using the one from libwayland-client.so. This can be archived by compiling our own version as shared library (currently named libwaylandutils.so) and preloading it, i.e. LD_PRELOAD=libwaylandutils.so firefox.

Now I'm trying to find a way to tell the build system that it should always load libwaylandutils.so this way. IIUC libmozwayland.so is loaded after libwayland-client.so, i.e. libwayland-client.so has preference. libwaylandutils.so in turn must have preference over libwayland-client.so to work.

IIUC the current order something like: xul.so->libgdk-3.so->libwayland-client.so->libmozwayland.so. We'd want something like xul.so->libwaylandutils.so->libgdk-3.so->libwayland-client.so->libmozwayland.so

1: https://gitlab.gnome.org/GNOME/gtk/-/blob/326d101a57c18cf767494b417c4c8506158d98c5/gdk/wayland/gdkwindow-wayland.c#L670-672

Flags: needinfo?(robert.mader)

Could you put the symbols you want to override directly into libxul?

IIUC not, because we import wayland-client.h, directly or indirectly (via GDK). AFAIK it would always result in a compile error complaining about the method already being declared previously.

Ok, looks like my patch above hurt the eyes of GTK maintainers enough to maybe allow us to get a new GDK API in which does the same thing in a clean manner.

Attachment #9195950 - Attachment is obsolete: true
Attachment #9198813 - Attachment description: Bug 1668805: WIP Use new GDK API to redirect frame requests → Bug 1668805: WIP Use new GDK API to allow setting of opaque regions

Martin, in order to test the new GDK API before landing upstream, do you think we could carry this patch (and the corresponding GTK one) as a distro patch in Fedora for a while to get widespread testing? Just from your site, not GTK maintainers which I'd contact independently?

FTR: some local testing suggests it works quite reliable and has measurable performance benefits. Running intel_gpu_top and FF with https://webglsamples.org/aquarium/aquarium.html in fullscreen gives me about a ~0.5W/15% lower energy consumption - I suppose the relative gain could be even bigger on normal websites where the GPU is less stressed by the content.

Flags: needinfo?(stransky)

(In reply to Robert Mader [:rmader] from comment #13)

Martin, in order to test the new GDK API before landing upstream, do you think we could carry this patch (and the corresponding GTK one) as a distro patch in Fedora for a while to get widespread testing? Just from your site, not GTK maintainers which I'd contact independently?

I think Kalev Lember (klember 'at' redhat.com) can help with the distro/gtk side.

FTR: some local testing suggests it works quite reliable and has measurable performance benefits. Running intel_gpu_top and FF with https://webglsamples.org/aquarium/aquarium.html in fullscreen gives me about a ~0.5W/15% lower energy consumption - I suppose the relative gain could be even bigger on normal websites where the GPU is less stressed by the content.

Okay, I can do that when we have gtk packages available.

Flags: needinfo?(stransky)
Attachment #9198813 - Attachment description: Bug 1668805: WIP Use new GDK API to allow setting of opaque regions → Bug 1668805: Use new GDK API to allow setting of opaque regions

Hej Martin, if you have time, could you give the patch here and the API in gtk a look if it looks sane to you? Especially https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/3111#note_1027713 - I think it's good as it is, but would love to have your opinion on that before the API gets merged upstream.

Flags: needinfo?(stransky)

Hello Robert, it looks ok to me.

Regards https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/3111#note_1027713

Does WebRender use more than one wl_subsurface? I don't have much knowledge about WebRender/GL but Basic compositor / SW_WR uses only one wl_subsurface which is connected to the GdkWindow's one.

Flags: needinfo?(stransky)

(In reply to Martin Stránský [:stransky] from comment #16)

Hello Robert, it looks ok to me.

Cool, thanks for having a look!

Regards https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/3111#note_1027713

Does WebRender use more than one wl_subsurface? I don't have much knowledge about WebRender/GL but Basic compositor / SW_WR uses only one wl_subsurface which is connected to the GdkWindow's one.

Not yet, but I plan to start to work on bug 1617498 (gfx.webrender.compositor) soon, in which case we'd have several subsurfaces in most common situations. It will open up the door for quite nice performance optimizations as we hand over work to the Wayland compositor, which again can make use of hardware features such as overlay planes. That's what WR does on Win10 and MacOS.

One example: when watching youtube non-fullscreen, Webrender would promote the video area into its own subsurface and the Wayland compositor in turn would use direct scanout for it, skipping compositing in both Webrender and the Wayland compositor, saving a lot of blitting. If the compositor (and hardware) supports it, the video could even be passed as YUV directly to the hardware, avoiding color conversions. Support for this already exists in Weston and Mutter will likely get support for in the next cycle (41) :)

Well, in that case you should know better how the wl_subsurfaces of WebRender will be arranged :)

Things are moving and it looks like GTK maintainers would like to merge the new API for the upcoming release at the end of the week.

Here's a new try build: https://treeherder.mozilla.org/jobs?repo=try&revision=b33cf52f7a802717b8e46b3492176a98dfdddf72

Attachment #9198813 - Attachment description: Bug 1668805: Use new GDK API to allow setting of opaque regions → Bug 1668805: Enable opaque region if new GDK API is availabe, r=stransky

The new API has landed \o/
It will be released in GTK 3.24.25 which should become available in distributions like Fedora and Arch in the coming days. Doing some final testing, but this should be good to review now.

So here's a rebased try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f6845649d5289175a994dd2bd194bdd524ce9030

It can be tested with MOZ_ENABLE_WAYLAND=1 mozregression --repo try --launch f6845649d5289175a994dd2bd194bdd524ce9030 and MOZ_ENABLE_WAYLAND=1 mozregression --repo try --launch f6845649d5289175a994dd2bd194bdd524ce9030 --pref widget.wayland.opaque-region.enabled:false. Note that you'll need to wait for GTK 3.24.25.

On Gnome (>= 3.38) you can verify the result by enabling the opaque layer overlay: alt+f2 -> lg -> Meta.add_debug_paint_flag(Meta.DebugPaintFlag.OPAQUE_REGION). You can disable it again with Meta.remove_debug_paint_flag(Meta.DebugPaintFlag.OPAQUE_REGION)

I just gave this a try on an old Thinkpad T400 with 1440x900 display. Comparing the effect of widget.wayland.opaque-region.enabled, scrolling becomes visibly smoother when enabled - i.e. there is frame skipping when it's disabled. Assuming that the same is true for stronger hardware on bigger screens, this appears to be a quite visible win \o/

Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/e939d74498ce
Enable opaque region if new GDK API is availabe, r=stransky
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 87 Branch

Is it supposed to turn back purple when it loses focus?

(In reply to Laurențiu Nicola from comment #25)

Is it supposed to turn back purple when it loses focus?

No...and it doesn't do that here as far as I can see :/
However, I vaguely remember having seen something like that it the past...on which exact Mutter version are you on?

3.38.3, on Arch Linux.

Could you open a follow-up issue about that with you about:support? Does this by chance only happen with the basic compositor?

(In reply to Robert Mader [:rmader] from comment #28)

Could you open a follow-up issue about that with you about:support? Does this by chance only happen with the basic compositor?

No, I'm using WebRender. I'll file an issue if I manage to reproduce it consistently. Right now, the opaque flag seems to behave pretty janky in Mutter: windows turn transparent or opaque when I open or close other windows, or they toggle between partly-transparent and partly-opaque at 1 Hz.

FWIW, I just noticed the same thing with Signal and Skype (both Electron apps). When I'm in Firefox (the three windows partially overlap), all three of them are opaque. If I focus Skype, Signal turns transparent. If I focus Signal, Skype turns transparent until I hover the mouse over it.

So yeah, I don't think this is a Firefox issue.

Ah I see! In that case it's likely just an issue with the opaque layer in Mutter - I didn't bother to make it perfect and occasionally Mutter will do a draw without caring for opaque regions, marking them as purple. If the app doesn't repaint, it will stay that way until the next repaint. We could maybe do better there, but I wouldn't care too much.

So yeah, maybe open a Mutter issue instead.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: