Open Bug 1795851 Opened 2 months ago Updated 12 hours ago

[wayland] Crash wl_surface@76: error 2: Buffer size (170x113) is not divisible by scale (2)

Categories

(Core :: Widget: Gtk, defect, P3)

Firefox 107
defect

Tracking

()

ASSIGNED

People

(Reporter: luis.pabon, Assigned: stransky)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:107.0) Gecko/20100101 Firefox/107.0

Steps to reproduce:

  • Firefox nightly snap 107.0a1 (2022-10-13)
  • Ubuntu 22.10
  • Sway (master)

I have 3 outputs: laptop's built in 4k panel (scale 2), external 1080p display (scale 1) and another external 900p display (scale 1)

I've been having a lot of instability while moving tabs on Firefox nightly lately.

If I place the firefox window on the scale 2 output, I can move tabs around without issue. If I move firefox to a scale 1 output and then I try to do the same, firefox crashes.

There doesn't seem to be any correlation to which output firefox initially booted into and the crashes. Just which output I'm using.

Actual results:

https://crash-stats.mozilla.org/report/index/bp-f5058e04-5d12-4dbe-bf47-9f5630221018

Expected results:

Should've worked

On firefox 106 beta I also get crashes, but when moving the screen from output to output or moving a PiP window from output to output:

https://crash-stats.mozilla.org/report/index/ae60fb60-72bd-41f5-b069-1d2b20221018

The MOZ_CRASH reason seems to be the same in both cases though, something like:

wl_surface@47: error 2: Buffer size (1920x1049) is not divisible by scale (2)

The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Widget: Gtk
Product: Firefox → Core

Crash report for that debug file I just uploaded: https://crash-stats.mozilla.org/report/index/e3250020-c4dc-4c46-b913-5ac590221018

Just to point out that the firefox beta installation I have is from deb, not snap. Doesn't seem to be related to the installation medium.

Blocks: wayland
Priority: -- → P3
Summary: [wayland] Crash when moving tabs on a mixed dpi environment on the lower dpi output → [wayland] Crash wl_surface@76: error 2: Buffer size (170x113) is not divisible by scale (2)
Duplicate of this bug: 1800214

From my duplicate https://bugzilla.mozilla.org/show_bug.cgi?id=1800214:

It's also reproducible on a clean profile.

I see the crash reason is wl_surface@49: error 2: Buffer size (6450x2529) is not divisible by scale (2)

This means that Firefox is setting the scale for a surface to 2 (e.g.: it says this is scaled up 2x), but the surface is not divisible by 2. This pretty much sounds like a bug in Firefox; the data it is sending to the compositor is inconsistent.

I've no idea why it only triggers with two outputs. It also doesn't matter on which output I run Firefox.

It should be noted that my outputs have different scale; one is 2x and the other is 3x. Note that the values above are divisible by 3. Maybe Firefox is only considering the scale for the larger monitor, but then marking surfaces as scale for the other? I admit that I'm guessing at this point

This USED to work on some compositors, but strictly speaking, this was a bug in the compositor; passing a surface with a size that is not divisible by the scale is a protocol violation.

This is rather intuitive when you think about it: you can't have a surface that's 200x199 and say "this has been scaled to 2x", because that means that the unscaled size should be 100x99.5px, where pixels must be integers.

It seems Firefox currently uses the largest scale of all outputs to render, but it should probably use the scale of the output on which the window has entered anyway. Rendering a 3x a window on a 2x display is pointless (and the compositor has to do the downscaling).

We know scale for every active monitor/window so we can align surface size to scale. But we don't do that now because mutter/kwin (major Wayland compositor) doesn't force it. So it's kind of TODO here.

We may add an assert to your code to make sure we follow this rule.

I believe yet-to-be-released Mutter versions do enforce this, so it might just be a matter of time. This is enforced by wlroots (sway's underlying library), so KWinFT will have the same issue as dependencies update.

Aside from this issue, keep in mind that rendering at the largest possible scale is a waste of resources. Note that the window in the above example is rendered at 6450x2529. For a window on a 3440x1400 screen. The CPU and memory consumption at that scale are not minor.

(In reply to Hugo Osvaldo Barrera from comment #9)

Aside from this issue, keep in mind that rendering at the largest possible scale is a waste of resources. Note that the window in the above example is rendered at 6450x2529. For a window on a 3440x1400 screen. The CPU and memory consumption at that scale are not minor.

Window size it not set by widget code, widget renders what Firefox/layout sends. Not sure if that's some monitor size reporting bug or so?
Anyway, looking forward to see updated mutter so we can fix that.

The main problem is that Firefox (and other apps with the same issue like mpv) becomes unusable on Sway if you have a mixed dpi display setup, as it's constantly crashing when moving things around.

Not sure if that's some monitor size reporting bug or so?

It's not, I'll try rephrasing the issue here: the window size in this case is 2150x843. This display has a scale of 2. So the surface area should have been either 2150x843 (with scale=1) or 4300x1686 (with scale=2).

It could also be larger (e.g. 6450x2529 with scale=3) and the compositor would downscale it. That's kinda wasteful, but valid.

For reference, from wl_surface::set_buffer_scale:

Note that if the scale is larger than 1, then you have to attach a buffer that is larger (by a factor of scale in each dimension) than the desired surface size.

So, given that Firefox is calling set_buffer_scale with value 2, the buffer should be 4300x1686, and not 6450x2529.

My other monitor (which was not the window on which Firefox was rendering) has a scale of 3, and it seems that Firefox is using that value.

If you multiply 2150x843 by 3, you get 6450x2529. Firefox is scaling surfaces up by X, where X=3 is the largest scale of all available displays, but it is reporting that it has scaled up the surfaces by a scale equal to the output's scale (in this case, 2). These values are inconsistent. Firefox is sending a surface of 6450x2529 with scale=2. So this surface scales to 3225x1264.5px, a non-integer size. This is a protocol violation for a reason, and swaywm just crashes when receiving invalid input. Mutter seems to be more lenient and work around it (there's likely some visual artefact though, since the numbers simply don't add up).

This results in two big problems:

  1. As mentioned above, it renders something gigantic which the compositor then has to downscale. This is problematic but non-fatal.
  2. The main topic at hand in this issue: Firefox says "here's a 6450x2529 surface which is scale 2x", but 2529 is CLEARLY not divisible by 2. The input is invalid, and swaywm rejects this invalid input.

This bug is fatal. IF the window has a size Y such that Y/3 is integer but Y/2 is not, Firefox crashes immediately (in this case, 3 and 2 are the scales of each of my displays). It is an EXTREMELY annoying bug, because it means that I basically have to chose between running Firefox, or using two displays, but I cannot do both.

To summarise this in shorter terms:

  • Firefox scales the buffer by 3x.
  • Firefox tells the compositor that the buffer is scaled up by 2x.
  • Firefox send the buffer data.
  • The numbers don't make sense, and the compositor kills the client because the client is sending bogus data.

Anyway, looking forward to see updated mutter so we can fix that.

The fact that Mutter accepts this input is really non-standard behaviour. It's being lenient, ignoring the error, and doing something other than what the spec specifies (maybe it's just discarding a column of pixels? maybe it's overflowing somewhere?). The result should have visual artefacts, because the math just doesn't add up.

Why is Mutter's behaviour here a blocker for this issue?

Window size it not set by widget code, widget renders what Firefox/layout sends.

I don't quite understand this statement.

Duplicate of this bug: 1794018

I though I'd heard of this same issue on Mutter. It turns out the non-standard behaviour is merely a bug, not by design.

The bug has been fixed in https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2188. With this change, Firefox also crashes on Mutter.

Any update on this, Martin? We'll release Sway 1.8 soon, which crashes Firefox because of this bug.

Will test the mutter patch. It's difficult for me to use Sway and even configure two different monitors with different scales.

I'm not sure how you're testing sway, but keep in mind that you can run it nested inside another compositor. It'll run with a virtual output, and render as a window on the parent compositor.

You can use WLR_WL_OUTPUTS=2 to have it run with multiple displays. Each "display" will render as a separate window on the parent compositor. For this particular case, you can configure each output to have a different scale via the configuration file, e.g.:

cat sway.conf
exec foot # run a terminal
output WL-1 scale 2
output WL-2 scale 3
WLR_WL_OUTPUTS=2 sway --config sway.conf

See also: https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/master/docs/env_vars.md

Yeah, the above should be enough to reproduce the bug. Additionally, here are instructions to build Sway: https://github.com/swaywm/sway/wiki/Development-Setup#compiling-as-a-subproject

I used mutter with https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2188 applied and that reliably crashes Firefox.

Assignee: nobody → stransky
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

We need to return correct EGLWindow from moz_container_wayland_get_egl_window() with correct scale/size
and also keep the EGLWindow up to date. In this patch we do:

  • Implement moz_container_wayland_egl_window_needs_size_update() and use it in nsWindow::SetEGLNativeWindowSize().
    Avoid redundant moz_container_wayland_egl_window_set_size()/moz_container_wayland_set_scale_factor() calls as it may lead to resize callback calls to MESA.
  • If wl_container::eglwindow is present, check its size/scale in moz_container_wayland_get_egl_window() and update size/scale if needed.
  • Use nsIntSize single param instead of width/height pairs in some moz_container_* functions.
  • Assert when gtk_widget_get_window(container) returns null.

Depends on D163697

Pushed by stransky@redhat.com:
https://hg.mozilla.org/integration/autoland/rev/75c7055abbc7
[Linux] Add more logging to GtkCompositorWidget r=emilio
https://hg.mozilla.org/integration/autoland/rev/105058a4927e
[Linux] Log screen scales for ScreenHelperGTK r=emilio
https://hg.mozilla.org/integration/autoland/rev/a8770c8bc4f0
[Wayland] Update EGL window size/scale in moz_container_wayland_get_egl_window() r=emilio

Thank you!

You need to log in before you can comment on or make changes to this bug.