Closed Bug 1648872 Opened 1 year ago Closed 1 year ago

[Wayland] WebRender partial present never worked with official builds

Categories

(Firefox Build System :: General, defect)

x86_64
Linux
defect

Tracking

(firefox-esr68 unaffected, firefox-esr78 disabled, firefox77 unaffected, firefox78 disabled, firefox79 disabled, firefox80 disabled, firefox81 disabled, firefox82 verified)

VERIFIED FIXED
82 Branch
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- disabled
firefox77 --- unaffected
firefox78 --- disabled
firefox79 --- disabled
firefox80 --- disabled
firefox81 --- disabled
firefox82 --- verified

People

(Reporter: jan, Assigned: heftig)

References

(Blocks 1 open bug)

Details

(Keywords: nightly-community)

Attachments

(3 files, 1 obsolete file)

(Jan Alexander Steffens [:heftig] from bug 1640858 comment 12)

Something's wrong with the official nightly, it seems. Enabling GNOME-Shell's damage region painting shows that the official build isn't posting any partial damage and just submitting the whole window, unlike my builds, which do show individual tiles getting updated.


You can toggle the damage region painting by entering the following into the looking glass console (run lg using Alt+F2):

  • enable: Meta.add_clutter_debug_flags(0, Clutter.DrawDebugFlag.PAINT_DAMAGE_REGION, 0)
  • disable: Meta.remove_clutter_debug_flags(0, Clutter.DrawDebugFlag.PAINT_DAMAGE_REGION, 0)

PS: Enabling damage region painting is also one way to disable the shell's partial redrawing optimization, so ironically the bug seems to disappear.

Build from bug 1620076 comment 17:
$ pip3 install --upgrade mozregression
$ MOZ_ENABLE_WAYLAND=1 mozregression --repo autoland --launch 90998e8e1bdd --pref gfx.webrender.all:true gfx.webrender.max-partial-present-rects:-1 -a https://tagesschau.de

Actual: The whole window is red when hovering a link.
Expected: Only the hovered region should be red. It correctly works with the context menu (Basic widget).

Can confirm that, works fine with my own builds.

That is really weird, given my experience in bug 1646202.

Maybe I need to specify what I experience: when running gnome shell and enabling damage region painting (lg -> Meta.add_clutter_debug_flags(0, Clutter.DrawDebugFlag.PAINT_DAMAGE_REGION, 0)) Mutter draws damage regions as reported by applications. With the official build, these always show full damage, indicating that EGL_KHR_swap_buffers_with_damage is not used.

However, that does not necessarily mean that WR doesn't do partial damage internally somehow - maybe it even explains certain glitches.

Severity: -- → S3
Blocks: 1625070
Attachment #9162283 - Attachment mime type: text/x-log → text/plain
Attachment #9162284 - Attachment mime type: text/x-log → text/plain

The logs look all fine in the try build - now I wonder if there's something else that could be triggering a full repaint. Maybe something about GTK or so - can I look up the build specs somewhere? Do we build with GTK 3.14, as required now?

Important to note: on the distribution build of FF 78 on Fedora 32 things work fine. So this is apparently has to do with the build system.

bug 1634204 upgraded Linux requirements. Let's move this bug over to the Firefox Build System component.

Component: Graphics: WebRender → General
Product: Core → Firefox Build System
Keywords: leave-open

This should not fix anything but cleans up the code a bit to make
actually used fallback code easier to spot.

Assignee: nobody → robert.mader
Status: NEW → ASSIGNED
See Also: → 1617002

Hm try push is failing, but things work from a clean repo. I guess I need to schedule clobber or so - can someone point me to how to do that? Can't find it so far :/

Is the GDK_VERSION_3_24 symbol defined in the on-builder GTK version (which is it actually?)? AFAICS GTK doesn't forward-declare those symbols and the builders are on Debian 8 / GTK 3.14 per bug 1634204.

(In reply to Johannes Pfrang [:johnp] from comment #16)

Is the GDK_VERSION_3_24 symbol defined in the on-builder GTK version (which is it actually?)? AFAICS GTK doesn't forward-declare those symbols and the builders are on Debian 8 / GTK 3.14 per bug 1634204.

Thanks, that was it :)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4c6da910a8faa47a1001dde8c78542a36edd53c

This patch doesn't fix the bug. Can it be moved to its own bug in case it causes regressions? Bugs are cheap. Just push an updated commit message with the new bug number.

(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #18)

This patch doesn't fix the bug. Can it be moved to its own bug in case it causes regressions? Bugs are cheap. Just push an updated commit message with the new bug number.

Yep. It's a preparation patch that I'll most likely need here or in bug 1617002, will move it then.

Comment on attachment 9162325 [details]
Bug 1648872 - Remove checks for unsupported GTK3 versions. r=stransky

Revision D82804 was moved to bug 1617002. Setting attachment 9162325 [details] to obsolete.

Attachment #9162325 - Attachment is obsolete: true
Depends on: 1617002
Keywords: leave-open

Ok, getting closer to the root cause. It's not about GTK but rather something about mesa/wayland. The actual place where things go wrong / happen differently is: https://gitlab.freedesktop.org/mesa/mesa/-/blob/63cf8adb12440512226dddbe3e233bcba87c7c18/src/egl/drivers/dri2/platform_wayland.c#L1037-1039

if (wl_proxy_get_version((struct wl_proxy *) dri2_surf->wl_surface_wrapper)
       < WL_SURFACE_DAMAGE_BUFFER_SINCE_VERSION)
      return EGL_FALSE;

i.e. we hit https://gitlab.freedesktop.org/mesa/mesa/-/blob/63cf8adb12440512226dddbe3e233bcba87c7c18/src/egl/drivers/dri2/platform_wayland.c#L1117-1122:

   /* If the compositor doesn't support damage_buffer, we deliberately
    * ignore the damage region and post maximum damage, due to
    * https://bugs.freedesktop.org/78190 */
   if (!n_rects || !try_damage_buffer(dri2_surf, rects, n_rects))
      wl_surface_damage(dri2_surf->wl_surface_wrapper,
                        0, 0, INT32_MAX, INT32_MAX);

So as far as I understand now, the problem is that wl_proxy_get_version returns 0, as wl_proxy is not versioned when compiled on try/official servers. Ignoring the check in mesa works just fine (successfully tested with a try-build), but that's not an option. Versions for these things were introduces in wayland-client 1.10 (feb 2016), while gtk-3.14 only requires 1.6. (~sept 2014).

Karl, would it be possible to bump Wayland requirements to 1.10 and ask older distributions to simply turn it off? That should fix the issue here I think. Something like adding wayland-client >= 1.10.0 to https://searchfox.org/mozilla-central/source/toolkit/moz.configure#279 and making sure it's available on the build servers.

Edit: maybe we can add a runtime check that disables the wayland backend if run on a system with older wayland-client before we enable it by default, so official builds can still run there?

See also: https://gitlab.freedesktop.org/wayland/wayland/-/blob/1.10/src/wayland-client.c (search for version)

Flags: needinfo?(karlt)

Martin, to me it looks like the implicit Wayland requirement is higher than 1.6 anyway, i.e. such systems can build but not run the Wayland backend. Is that right?

Flags: needinfo?(stransky)

I don't have a feel for the consequences of a change in wayland-client requirements.
Martin is the best person to make a decision here.

Flags: needinfo?(karlt)

Robert, IIRC the problem is inside Mesa where proxy is created by wl_proxy_create_wrapper(), right? If so I don't quite understand why wl_proxy_create_wrapper() creates the old proxy in Mesa when FF is built by Mozilla as we use system Mesa, right?

Flags: needinfo?(stransky)

Martin,

The problem seems to be that wl_registry_bind is static inline. The code from Wayland 1.10 creates a versioned proxy using wl_proxy_marshal_constructor_versioned. This version is replicated to other proxies created from this proxy (e.g. by wl_proxy_marshal_constructor in wl_compositor_create_surface or wl_proxy_create_wrapper in Mesa).

But, the wl_registry_bind from Wayland 1.6 creates the proxy using wl_proxy_marshal_constructor, which in Wayland 1.10 then copies wl_registry's version instead of using the version from the arguments. This version is 0 and originally comes from wl_display:

	/* We set this version to 0 for backwards compatibility.
	 *
	 * If a client is using old versions of protocol headers,
	 * it will use unversioned API to create proxies.  Those
	 * proxies will inherit this 0.
	 *
	 * A client could be passing these proxies into library
	 * code newer than the headers that checks proxy
	 * versions.  When the proxy version is reported as 0
	 * the library will know that it can't reliably determine
	 * the proxy version, and should do whatever fallback is
	 * required.
	 *
	 * This trick forces wl_display to always report 0, but
	 * since it's a special object that we can't bind
	 * specific versions of anyway, this should be fine.
	 */
	display->proxy.version = 0;

Wouldn't replacing our uses of wl_registry_bind with something like this be sufficient? I'm not sure the comment is right.

template <class T>
static inline T* WaylandRegistryBind(struct wl_registry* wl_registry,
                                     uint32_t name,
                                     const struct wl_interface* interface,
                                     uint32_t version) {
  struct wl_proxy* id;

  // When libwayland-client does not provide this symbol, it will be
  // linked to the fallback in libmozwayland, which returns NULL.
  id = wl_proxy_marshal_constructor_versioned(
      (struct wl_proxy*)wl_registry, WL_REGISTRY_BIND, interface, version, name,
      interface->name, version, NULL);

  if (id == NULL) {
    id = wl_proxy_marshal_constructor((struct wl_proxy*)wl_registry,
                                      WL_REGISTRY_BIND, interface, name,
                                      interface->name, version, NULL);
  }

  return static_cast<T*>(id);
}

(It's a template because I think this makes it nicer to use, avoiding more casts.)

This indeed looks promising! Unfortunately I won't be able to review and test for the next weeks (PTO) - so Martin, can you have a look (and maybe provide a try push so it can be tested?)

Flags: needinfo?(stransky)

Great, I'll look at it.

Flags: needinfo?(stransky)
Pushed by cbrindusan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6a8ca967e963
Add WaylandRegistryBind, a portable wl_registry_bind. r=stransky
Assignee: robert.mader → jan.steffens
Flags: needinfo?(robert.mader) → needinfo?(jan.steffens)

I see. We need to add wl_proxy_marshal_constructor_versioned() to widget/gtk/mozwayland/mozwayland.h.

Flags: needinfo?(jan.steffens)
Pushed by btara@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/71d1cfda4d39
Add WaylandRegistryBind, a portable wl_registry_bind. r=stransky
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch

Since the status are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.

Verified fixed, thanks! :)
MOZ_ENABLE_WAYLAND=1 mozregression --launch 71d1cfda4d39 --pref gfx.webrender.all:true gfx.webrender.max-partial-present-rects:-1 -a https://www.youtube.com/watch?v=LXb3EKWsInQ

Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.