Open Bug 1890074 Opened 1 year ago Updated 4 days ago

Crash in [@ HandleGLibMessage] with broken pipe

Tracking

Status: ASSIGNED

Tracking Flags:

Release         Tracking  Status
firefox-esr128  ---       affected
firefox125      ---       wontfix
firefox126      ---       wontfix
firefox127      ---       wontfix
firefox128      ---       wontfix
firefox129      ---       wontfix
firefox130      ---       wontfix
firefox131      ---       wontfix
firefox132      ---       affected
firefox133      ---       affected

People

(Reporter: mccr8, Assigned: stransky)

References

(Blocks 1 open bug)

Details

(Keywords: crash, leave-open)

Crash Data

Attachments

(6 files, 1 obsolete file)

Bug 1890074 [Wayland] Set client wayland buffer size r?emilio (9 months ago, Martin Stránský [:stransky], 48 bytes, text/x-phabricator-request)
Bug 1890074 [Wayland] Use SCHED_FIFO for wayland proxy thread r?emilio (9 months ago, Martin Stránský [:stransky], 48 bytes, text/x-phabricator-request)
Bug 1890074 [Wayland] Print compositor name in protocol handler error r?emilio (9 months ago, Martin Stránský [:stransky], 48 bytes, text/x-phabricator-request)
Bug 1890074 [Linux] Add desktop environment identifier to HandleGLibMessage r?emilio (obsolete; 9 months ago, Martin Stránský [:stransky], 48 bytes, text/x-phabricator-request)
Bug 1890074 [Wayland] Implement wayland proxy state string r?emilio (9 months ago, Martin Stránský [:stransky], 48 bytes, text/x-phabricator-request)
Bug 1890074 [Linux] Print more info by Wayland error handlers r?emilio (9 months ago, Martin Stránský [:stransky], 48 bytes, text/x-phabricator-request)
Bug 1890074 [Linux] Add missing <fstream> to fix GCC build r?emilio (9 months ago, Martin Stránský [:stransky], 48 bytes, text/x-phabricator-request)

Andrew McCreight [:mccr8]

Reporter

Description

•

1 year ago

Crash report: https://crash-stats.mozilla.org/report/index/526646de-7201-4b24-b0b5-e15dc0240405

MOZ_CRASH Reason: Error reading events from display: Broken pipe

Top 10 frames of crashing thread:

0  libxul.so  MOZ_Crash  mfbt/Assertions.h:317
0  libxul.so  HandleGLibMessage  toolkit/xre/nsSigHandlers.cpp:178
1  libxul.so  glib_log_writer_func  toolkit/xre/nsSigHandlers.cpp:205
2  libglib-2.0.so.0  g_log_structured_array  glib/gmessages.c:1994
2  libglib-2.0.so.0  g_log_structured_array  glib/gmessages.c:1967
3  libglib-2.0.so.0  g_log_structured_standard  glib/gmessages.c:2051
4  libgdk-3.so.0  gdk_event_source_check  gdk/wayland/gdkeventsource.c:96
5  libglib-2.0.so.0  g_main_context_check  glib/gmain.c:4072
6  libglib-2.0.so.0  g_main_context_iterate  glib/gmain.c:4245
7  libglib-2.0.so.0  g_main_context_iteration  glib/gmain.c:4313

Pretty high volume of these on Nightly.

Andrew McCreight [:mccr8]

Reporter

Updated

•

1 year ago

Summary: Crash in [@ HandleGLibMessage] → Crash in [@ HandleGLibMessage] with broken pipe

BugBot [:suhaib / :marco/ :calixte]

Comment 1

•

1 year ago

The bug is linked to a topcrash signature, which matches the following criterion:

Top 10 desktop browser crashes on nightly (startup)

For more information, please visit BugBot documentation.

Keywords: topcrash, topcrash-startup

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

1 year ago

Blocks: wayland

Priority: -- → P3

Wayne Mery (:wsmwk)

Updated

•

1 year ago

Updated

•

1 year ago

status-firefox125: --- → affected

status-firefox126: --- → affected

status-firefox127: --- → affected

BugBot [:suhaib / :marco/ :calixte]

Comment 2

•

1 year ago

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash-startup

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

1 year ago

status-firefox128: --- → affected

Liz Henry (:lizzard) (relman/hg->git project)

Comment 3

•

1 year ago

This is still a top crash in Nightly 128.

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

1 year ago

status-firefox125: affected → wontfix

status-firefox126: affected → wontfix

Pascal Chevrel:pascalc

Comment 4

•

1 year ago

The volume of this crash signature spiked on Nightly over the last 2 months.

Wayne Mery (:wsmwk)

Comment 5

•

1 year ago

Indeed, the nightly crash rate has increased more than 4x, starting around late March.

Odd, in the 2 months prior to Feb 20 there were 15 user comments, but since then not a single nightly crash report has a user comment, even though the crash rate is much higher

The following crashes mention scrolling or clicking on mouse

AFAICT none of the crash comments for channels other than nightly (which are numerous) mention scrolling.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

1 year ago

Flags: needinfo?(stransky)

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

1 year ago

status-firefox127: affected → wontfix

status-firefox128: affected → wontfix

status-firefox129: --- → fix-optional

status-firefox130: --- → affected

The 8472

Comment 7

•

11 months ago

(In reply to Wayne Mery (:wsmwk) from comment #5)

Odd, in the 2 months prior to Feb 20 there were 15 user comments, but since then not a single nightly crash report has a user comment, even though the crash rate is much higher

I had a crash that took down sway which also resulted in a firefox crash report tagged as @HandleGLibMessage. Since there's no chance to fill out the crash report dialog in such cases an increase in compositor crashes might explain that.

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

11 months ago

status-firefox131: --- → affected

Dianna Smith [:diannaS]

Updated

•

10 months ago

status-firefox129: fix-optional → wontfix

ksthiele

Comment 8

•

9 months ago

Looks like there is no crash report generated

MOZ_LOG=cubeb:5 MOZ_LOG_FILE=firefox.log flatpak run org.mozilla.firefox
libva info: VA-API version 1.19.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/radeonsi_drv_video.so
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/intel-vaapi-driver/radeonsi_drv_video.so
libva info: Trying to open /usr/lib/x86_64-linux-gnu/GL/lib/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_19
Failed to create /var/home/kay/.var/app/org.mozilla.firefox/cache for shader cache (Permission denied)---disabling.
libva info: va_openDriver() returns 0
Failed to create /var/home/kay/.var/app/org.mozilla.firefox/cache for shader cache (Permission denied)---disabling.
Failed to create /var/home/kay/.var/app/org.mozilla.firefox/cache for shader cache (Permission denied)---disabling.
Failed to create /var/home/kay/.var/app/org.mozilla.firefox/cache for shader cache (Permission denied)---disabling.
Failed to create /var/home/kay/.var/app/org.mozilla.firefox/cache for shader cache (Permission denied)---disabling.
Failed to create /var/home/kay/.var/app/org.mozilla.firefox/cache for shader cache (Permission denied)---disabling.
Failed to create /var/home/kay/.var/app/org.mozilla.firefox/cache for shader cache (Permission denied)---disabling.
Failed to create /var/home/kay/.var/app/org.mozilla.firefox/cache for shader cache (Permission denied)---disabling.
Failed to create /var/home/kay/.var/app/org.mozilla.firefox/cache for shader cache (Permission denied)---disabling.
[Child 523, MediaDecoderStateMachine #1] WARNING: 7f1cecde8820 OpenCubeb() failed to init cubeb: file /builds/worker/checkouts/gecko/dom/media/AudioStream.cpp:285

ksthiele

Comment 9

•

9 months ago

sorry wrong ticket

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 10

•

9 months ago

Will look at it.

Flags: needinfo?(stransky)

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

9 months ago

Flags: needinfo?(stransky)

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

9 months ago

Updated

•

9 months ago

Duplicate of this bug: 1917270

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

9 months ago

Flags: needinfo?(stransky)

Priority: P3 → P2

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 12

•

9 months ago

We're getting crashes from the main process only (although we use wayland-proxy for the main process only). I checked on try that the Wayland proxy is used by default.

There are various crashes produced by Gtk3, like:

Error reading events from display: Broken pipe
Error flushing display: Broken pipe
Error 32 (Broken pipe) dispatching to Wayland display.
Lost connection to Wayland compositor.

but these errors are raised behind the proxy. That means the proxy itself disconnected Gtk3 from the Wayland compositor. We need to add more logs/diagnostics to find out what happens here.
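When triaging reports, these messages can be grouped with a simple substring match. The sketch below is illustrative only: `IsWaylandDisconnectMessage` is a hypothetical helper, not a function from Firefox or Gtk3, and its list holds just the four messages quoted above.

```cpp
#include <array>
#include <string>

// Hypothetical helper: returns true when `message` contains one of the four
// Gtk3 disconnect messages quoted in this comment. A substring match is used
// so that a prefix such as a compositor name does not hide the message.
bool IsWaylandDisconnectMessage(const std::string& message) {
  static const std::array<const char*, 4> kNeedles = {{
      "Error reading events from display: Broken pipe",
      "Error flushing display: Broken pipe",
      "Error 32 (Broken pipe) dispatching to Wayland display.",
      "Lost connection to Wayland compositor.",
  }};
  for (const char* needle : kNeedles) {
    if (message.find(needle) != std::string::npos) {
      return true;
    }
  }
  return false;
}
```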

We should also publish more data about the actual Wayland compositor. It's possible that most of the crashes come from unstable ones like Sway/Hyprland.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

9 months ago

Assignee: nobody → stransky

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

9 months ago

Flags: needinfo?(stransky)

The 8472

Comment 13

•

9 months ago

You also might want to gather pressure stall information (/proc/pressure/*), which can indicate resource starvation/load spikes which could lead to a compositor disconnect if the wayland client can't keep up with the server.
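A minimal sketch of reading that information, assuming the kernel's documented PSI line format (`some avg10=... avg60=... avg300=... total=...`). `PressureLine`, `ParsePressureLine` and `ReadPressureFile` are hypothetical names for illustration, not existing Firefox code.

```cpp
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// One line of a /proc/pressure/{cpu,memory,io} file, for example:
//   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
struct PressureLine {
  std::string kind;              // "some" or "full"
  double avg10 = 0.0;            // percentage of time stalled, 10 s window
  double avg60 = 0.0;            // same, 60 s window
  double avg300 = 0.0;           // same, 300 s window
  unsigned long long total = 0;  // cumulative stall time in microseconds
};

// Parses a single PSI line. Returns false (and leaves *out untouched)
// when the line does not match the expected format.
bool ParsePressureLine(const std::string& line, PressureLine* out) {
  char kind[16] = {0};
  PressureLine parsed;
  int fields = std::sscanf(
      line.c_str(), "%15s avg10=%lf avg60=%lf avg300=%lf total=%llu", kind,
      &parsed.avg10, &parsed.avg60, &parsed.avg300, &parsed.total);
  if (fields != 5) {
    return false;
  }
  parsed.kind = kind;
  *out = parsed;
  return true;
}

// Reads every parsable line from a PSI file such as /proc/pressure/cpu.
// Returns an empty vector when the file is missing or unreadable.
std::vector<PressureLine> ReadPressureFile(const std::string& path) {
  std::vector<PressureLine> result;
  std::ifstream file(path);
  std::string line;
  while (std::getline(file, line)) {
    PressureLine parsed;
    if (ParsePressureLine(line, &parsed)) {
      result.push_back(parsed);
    }
  }
  return result;
}
```

A crash annotation could then record, say, the `avg10` value of the `some` line from `/proc/pressure/cpu` to show whether the system was starved shortly before the disconnect.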

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 14

•

9 months ago

•

Edited

Investigating the reports further. Looks like I don't have access to private report data, but I can access it via Graphs. Some of the reports have 'Shutdown reason' set to AppClose, which looks like we crash on Firefox quit (if I read it correctly).

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

9 months ago

Depends on: 1922116

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

9 months ago

Flags: needinfo?(stransky)

See Also: → https://gitlab.freedesktop.org/wayland/wayland/-/merge_requests/188/diffs?commit_id=f48e02c3cab9581ea76398eb9d37256216f39b6d

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 15

•

9 months ago

https://gitlab.freedesktop.org/wayland/wayland/-/merge_requests/188/diffs?commit_id=f48e02c3cab9581ea76398eb9d37256216f39b6d may be related here. It claims the buffer is unbounded now, but from my testing it's enough to stop event processing for a while (by blocking the wayland proxy, for instance) and focus the frozen Firefox window, and we're disconnected.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

9 months ago

Flags: needinfo?(stransky)

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 16

•

9 months ago

Did more testing here. Firefox with mutter-46 can easily be crashed with a 'broken pipe' crash.

Mutter-47 adds a wl_display_set_default_max_buffer_size(..., 1024 * 1024) call, which extends the default compositor buffer size from 4K to 1M and makes Firefox more stable: I can't reproduce the 'broken pipe' crash any more.

But that's quite unfortunate, as the application itself can't control the server buffer size and has to rely on the default. Let's hope that other compositors will be updated with larger buffers soon.

The Wayland client side provides only wl_display_set_max_buffer_size(), which adjusts the Wayland client buffer size. That's not very useful for us, as we usually suffer from server buffer overflow.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 17

•

9 months ago

As we may still hit such issues, we may consider updating the proxy cache to read events from the compositor more aggressively, to make sure we get all events from the compositor ASAP.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 18

•

9 months ago

Let's also reconsider real-time priority for the thread that reads compositor events. It should not be an issue if we spend most of the time in poll():
https://bugzilla.mozilla.org/show_bug.cgi?id=1743144#c96

  sched_param param;
  if (pthread_attr_getschedparam(&attr, &param) == 0) {
    param.sched_priority = sched_get_priority_max(SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &param);
  }

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 19

•

9 months ago

btw. Looks like KDE/Kwin already adds wl_display_set_default_max_buffer_size() too (https://bugs.kde.org/show_bug.cgi?id=392376)

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 20

•

9 months ago

Attached file Bug 1890074 [Wayland] Set client wayland buffer size r?emilio — Details

With wayland protocol 1.23 we can increase the buffer for wayland events on the client side, as a counterpart of wl_display_set_default_max_buffer_size() on the server side.
Let's use the same value as mutter uses, i.e. a 1M buffer size for events.
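A rough sketch of the idea, not the patch itself. It assumes the wl_display_set_max_buffer_size() entry point is resolved at run time and may be missing on libwayland older than 1.23. `TrySetClientBufferSize`, `SetMaxBufferSizeFn` and `kWaylandBufferSize` are hypothetical names, and `wl_display` is only forward-declared so the sketch compiles without the Wayland headers.

```cpp
#include <cstddef>

// Opaque libwayland-client type, forward-declared for this sketch only.
struct wl_display;

// Same shape as wl_display_set_max_buffer_size(display, max_buffer_size).
using SetMaxBufferSizeFn = void (*)(wl_display*, std::size_t);

// 1M, the value mutter passes to wl_display_set_default_max_buffer_size().
constexpr std::size_t kWaylandBufferSize = 1024 * 1024;

// Hypothetical helper: enlarges the client-side buffer when the symbol was
// found at run time, and reports whether the call was made. With an older
// libwayland (null function pointer) the default buffer size stays in place.
bool TrySetClientBufferSize(SetMaxBufferSizeFn setMaxBufferSize,
                            wl_display* display) {
  if (!setMaxBufferSize || !display) {
    return false;
  }
  setMaxBufferSize(display, kWaylandBufferSize);
  return true;
}
```

Passing the function pointer in, rather than calling the symbol directly, keeps the code loadable on systems whose libwayland predates the call.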

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 21

•

9 months ago

Attached file Bug 1890074 [Wayland] Use SCHED_FIFO for wayland proxy thread r?emilio — Details

Use SCHED_FIFO for the wayland proxy thread instead of SCHED_RR. It means the proxy will not be interrupted until
all events are processed and we wait in poll(). It helps us get all events from the compositor in time
and avoid being disconnected as an unresponsive application.

Depends on D224739
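For context: under SCHED_RR a thread can be preempted at the end of its time slice by other threads of the same priority, while a SCHED_FIFO thread runs until it blocks, yields, or is preempted by a higher-priority thread. The Linux-only sketch below shows one way to request SCHED_FIFO. It is not the patch: `FifoResult` and `TryUseFifoForCurrentThread` are hypothetical names, and it calls sched_setscheduler() on the calling thread instead of adjusting pthread attributes as the proxy code does.

```cpp
#include <cerrno>
#include <sched.h>

struct FifoResult {
  bool enabled;  // true when the calling thread now runs under SCHED_FIFO
  int error;     // errno from sched_setscheduler() on failure, 0 otherwise
};

// Hypothetical helper: asks the kernel to schedule the calling thread with
// SCHED_FIFO at the maximum priority. Without CAP_SYS_NICE or a suitable
// RLIMIT_RTPRIO the call is expected to fail (typically with EPERM), and the
// thread keeps its current policy.
FifoResult TryUseFifoForCurrentThread() {
  sched_param param{};
  param.sched_priority = sched_get_priority_max(SCHED_FIFO);
  // On Linux, pid 0 refers to the calling thread.
  if (sched_setscheduler(0, SCHED_FIFO, &param) == 0) {
    return {true, 0};
  }
  return {false, errno};
}
```

Because the request can be refused, callers should treat a failure as non-fatal and continue with the default policy.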

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 22

•

9 months ago

Let's see if the two patches here help to lower the crash ratio, at least on Nightly.

Flags: needinfo?(stransky)

Emilio Cobos Álvarez (:emilio)

Comment 23

•

9 months ago

Note that some of these crashes are also compositor crashes (e.g. I've seen this when KWin crashed here for example). But sure, the changes look good.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 24

•

9 months ago

(In reply to Emilio Cobos Álvarez (:emilio) from comment #23)

Note that some of these crashes are also compositor crashes (e.g. I've seen this when KWin crashed here for example). But sure, the changes look good.

That's interesting. Let's see how the patches behave in nightly and if there's any change in crash ratio.

Pulsebot

Comment 25

•

9 months ago

Pushed by stransky@redhat.com:
https://hg.mozilla.org/integration/autoland/rev/1c8b487d8fc1 [Wayland] Set client wayland buffer size r=emilio
https://hg.mozilla.org/integration/autoland/rev/6a5f881dcf7f [Wayland] Use SCHED_FIFO for wayland proxy thread r=emilio

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

9 months ago

Depends on: 1923086

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 26

•

9 months ago

(In reply to Emilio Cobos Álvarez (:emilio) from comment #23)

Note that some of these crashes are also compositor crashes (e.g. I've seen this when KWin crashed here for example). But sure, the changes look good.

I think we can detect a Wayland compositor crash in wayland-proxy; I will look at it (Bug 1923086). We can at least filter out such events and provide a better crash message.

Sandor Molnar[:smolnar]

Comment 27

•

9 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/1c8b487d8fc1
https://hg.mozilla.org/mozilla-central/rev/6a5f881dcf7f

Status: NEW → RESOLVED

Closed: 9 months ago

status-firefox133: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 133 Branch

BugBot [:suhaib / :marco/ :calixte]

Comment 28

•

9 months ago

Since nightly and release are affected, beta will likely be affected too.
For more information, please visit BugBot documentation.

status-firefox132: --- → affected

(PTO June 19-July 7) Ryan VanderMeulen

Updated

•

9 months ago

status-firefox130: affected → wontfix

status-firefox131: affected → wontfix

status-firefox-esr128: --- → affected

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 29

•

9 months ago

Let's keep this open to track the crashes. The patches here may not fix the issue, but I hope for a lower crash ratio. Bug 1923086 may filter out compositor crashes, as we use a different crash handler there.

Status: RESOLVED → REOPENED

status-firefox133: fixed → affected

Resolution: FIXED → ---

Target Milestone: 133 Branch → ---

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

9 months ago

Status: REOPENED → ASSIGNED

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

9 months ago

Flags: needinfo?(stransky)

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 30

•

9 months ago

The crash ratio doesn't look affected by the patches here. Let's see if Bug 1923086 makes any difference and filters out compositor crashes.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 31

•

9 months ago

Attached file Bug 1890074 [Wayland] Print compositor name in protocol handler error r?emilio — Details

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

9 months ago

Flags: needinfo?(stransky)

Keywords: leave-open

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

9 months ago

Flags: needinfo?(stransky)

Pulsebot

Comment 32

•

9 months ago

Pushed by stransky@redhat.com: https://hg.mozilla.org/integration/autoland/rev/f17ad05e0870 [Wayland] Print compositor name in protocol handler error r=emilio

Serban Stanca [:SerbanS]

Comment 33

•

9 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f17ad05e0870

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 34

•

9 months ago

Looks like we're still getting crashes without a compositor signature here. That means the crash comes directly from gdk_event_source_check(), at
https://github.com/GNOME/gtk/blob/1d2fe52e96685464b4bd11b7ba597b434ce60ca7/gdk/wayland/gdkeventsource.c#L116

which means the proxy cache is missing. According to the backtraces, the proxy cache is off or was already terminated by a previous error, a closed connection or similar.

We may add more info about the proxy cache state (off / terminated) to the crash report and extend the error messages generated by HandleGLibMessage().

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 35

•

9 months ago

I'll try to add compositor name to HandleGLibMessage() and Wayland proxy state to App Notes.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 36

•

9 months ago

Attached file Bug 1890074 [Linux] Add desktop environment identifier to HandleGLibMessage r?emilio (obsolete) — Details

Phabricator Automation

Updated

•

9 months ago

Attachment #9430717 - Attachment is obsolete: true

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 37

•

9 months ago

Attached file Bug 1890074 [Wayland] Implement wayland proxy state string r?emilio — Details

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 38

•

9 months ago

Attached file Bug 1890074 [Linux] Print more info by Wayland error handlers r?emilio — Details

Depends on D225484

Pulsebot

Comment 39

•

9 months ago

Pushed by stransky@redhat.com:
https://hg.mozilla.org/integration/autoland/rev/3173bfa616ca [Wayland] Implement wayland proxy state string r=emilio
https://hg.mozilla.org/integration/autoland/rev/4b9daccbd5c9 [Linux] Print more info by Wayland error handlers r=emilio

Cristian Tuns

Comment 40

•

9 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/3173bfa616ca
https://hg.mozilla.org/mozilla-central/rev/4b9daccbd5c9

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 41

•

9 months ago

Attached file Bug 1890074 [Linux] Add missing <fstream> to fix GCC build r?emilio — Details

Pulsebot

Comment 42

•

9 months ago

Pushed by stransky@redhat.com: https://hg.mozilla.org/integration/autoland/rev/6d68f30cda4f [Linux] Add missing <fstream> to fix GCC build r=emilio

Sandor Molnar[:smolnar]

Comment 43

•

9 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/6d68f30cda4f

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 44

•

9 months ago

From the early logs it looks like we're getting compositor crashes mostly from KDE (5x KDE, 1x Gnome) but compositor disconnection from both. I think Fedora 41 / Gnome 47 will improve the situation a bit as it comes with an extended buffer on server side.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 45

•

9 months ago

•

Edited

OTOH we're getting 'disconnection' logs from Gnome only so far (4x).

(gnome) Error 32 (Broken pipe) dispatching to Wayland display. Proxy: WP:E WP:RT WP:CA WP:CR WP:CPCA WP:CPCF

That means Firefox operates as expected and the cache is running, but we were disconnected by the compositor for an unknown reason (CPCF - compositor closed the connection to proxy). Looks like Gnome is more picky about application response time, while on KDE it is more likely the compositor itself that crashes.
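A small sketch of how such a log line could be checked for the CPCF flag when scanning reports. `ParseProxyState` and `CompositorClosedConnection` are hypothetical helpers. Only the meaning of `WP:CPCF` is taken from this comment; the other flags are treated as opaque tokens.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the whitespace-separated proxy state flags
// that follow the "Proxy:" marker, or an empty vector when there is none.
std::vector<std::string> ParseProxyState(const std::string& message) {
  std::vector<std::string> flags;
  const std::string marker = "Proxy:";
  std::size_t pos = message.find(marker);
  if (pos == std::string::npos) {
    return flags;
  }
  std::istringstream stream(message.substr(pos + marker.size()));
  std::string flag;
  while (stream >> flag) {
    flags.push_back(flag);
  }
  return flags;
}

// True when the flags include WP:CPCF, which this comment describes as
// "compositor closed the connection to proxy".
bool CompositorClosedConnection(const std::vector<std::string>& flags) {
  for (const std::string& flag : flags) {
    if (flag == "WP:CPCF") {
      return true;
    }
  }
  return false;
}
```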

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

8 months ago

Comment 46

•

8 months ago

We may be covered here, at least from a long-term perspective. Now we know that the HandleGLibMessage crash is caused by heavy system load under Gnome due to a small Wayland message buffer.

The heavy load issue should be addressed by Mutter 47, where the Wayland message buffer is greatly extended (from 4k to 1M) by an explicit wl_display_set_default_max_buffer_size() call.

Flags: needinfo?(stransky)

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 47

•

8 months ago

We see a lot of crashes from Arch. However, Arch has already packaged mutter-47.0 (https://archlinux.org/packages/extra/x86_64/mutter/), so it may reach users soon, as Arch is a rolling-release distro.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 48

•

8 months ago

There are Firefox Wayland crashes on Fedora 41 / Gnome, which contains the extended Wayland buffer and where this is supposed to be fixed. So there must be another reason for these crashes.

A possible issue may be https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/7859 - a blocked Wayland event read if multiple threads are reading (polling) from the Wayland fd connection. That happens if HW rendering is used with the Mesa/Wayland backend - Mesa has its own event loop and waits for the GL front buffer release. We call that code from the Rendering thread via eglSwapBuffers().

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

8 months ago

See Also: → https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/7859

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 49

•

8 months ago

AFAIK we don't use HW rendering in our testsuite, so we may not see such crashes there.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 50

•

8 months ago

I just reproduced the crash locally, Fedora 41 / Gnome:

1. Load the system with something (I used a Firefox compilation in the background).
2. Run a nested Wayland compositor: mutter --nested
3. Run the Firefox testsuite inside the dedicated nested compositor from step 2, so the tests don't interfere with the current desktop session. I used: WAYLAND_DISPLAY=wayland-1 ./mach mochitest --setpref gfx.webrender.software=true dom

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

8 months ago

Flags: needinfo?(stransky)

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 51

•

8 months ago

If it's related to https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/7859 we'd need a custom gtk3 build for testing.

Jan Alexander Steffens [:heftig]

Comment 52

•

7 months ago

gtk!7859 got reverted and replaced by gtk!7865.

See Also: https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/7859 → https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/7865

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

7 months ago

Comment 53

•

7 months ago

Hit repeatedly on Firefox start while debugging something else. I expected to get a Wayland protocol error, but I got this one (the compositor-disconnected bug). The only info I got from the journal is:

Nov 28 09:14:59 fedora-laptop gnome-shell[2305]: WL: error in client communication (pid 148075)
Nov 28 09:14:59 fedora-laptop gnome-shell[2305]: (../src/wayland/meta-wayland-buffer.c:1013):meta_wayland_buffer_finalize: runtime check failed: (buffer->use_count == 0)
Nov 28 09:15:19 fedora-laptop gnome-shell[2305]: meta_wayland_buffer_process_damage: assertion 'buffer->resource' failed

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 54

•

7 months ago

•

Edited

I hit another incarnation of the bug. It crashes in HandleGLibMessage and the journal contains:
Couldn't map window 0x7ff97f923160 as subsurface because its parent is not mapped.

testcase:

1. Create a desktop with two monitors (one with scale 300%, one with 200%), placed side by side.
2. Open Firefox on the 200% screen and go to https://developer.mozilla.org/en-US/docs/Web/API/Geolocation_API/Using_the_Geolocation_API#examples
3. Open the geolocation popup and keep it on top.
4. Flip the Firefox window between the Tiled and Normal states; the tiled mode needs to be on the side where the 300% scaled monitor is located.
5. Repeat until crash.

This testcase doesn't need system load.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 55

•

7 months ago

Filed as https://gitlab.gnome.org/GNOME/mutter/-/issues/3805

Flags: needinfo?(stransky)

See Also: → https://gitlab.gnome.org/GNOME/mutter/-/issues/3805

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 56

•

7 months ago

I wonder if this sequence is the problem:

[2428839.234] {mesa egl surface queue}  -> wl_surface#66.attach(wl_buffer#75, 0, 0)
[2428839.240] {mesa egl surface queue}  -> wl_surface#66.damage_buffer(0, 0, 1024, 512)
[2428839.245] {mesa egl surface queue}  -> wl_surface#66.commit()
[2428839.249] {mesa egl surface queue}  -> wl_display#1.sync(new id wl_callback#70)
[2428839.451] {mesa egl surface queue}  -> wl_buffer#75.destroy()

Mesa destroys wl_buffer#75 before the compositor releases it. That may match the mutter message from journalctl:

Nov 29 08:25:06 fedora-laptop gnome-shell[2307]: meta_wayland_buffer_process_damage: assertion 'buffer->resource' failed
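A WAYLAND_DEBUG log can be scanned for this pattern mechanically. The sketch below is a simplification under stated assumptions: it treats a buffer as held from `attach` until a `wl_buffer#N.release()` line appears (the assumed spelling of the release event in this log format), and it ignores whether the attach was ever committed. `FindId` and `FindEarlyBufferDestroys` are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Returns the numeric id following `prefix` (e.g. "wl_buffer#") in `line`,
// or -1 when absent. `*after` receives the offset just past the digits.
long FindId(const std::string& line, const std::string& prefix,
            std::size_t* after) {
  std::size_t pos = line.find(prefix);
  if (pos == std::string::npos) return -1;
  pos += prefix.size();
  std::size_t end = pos;
  while (end < line.size() &&
         std::isdigit(static_cast<unsigned char>(line[end]))) {
    ++end;
  }
  if (end == pos) return -1;
  *after = end;
  return std::stol(line.substr(pos, end - pos));
}

// Returns the ids of buffers destroyed while still held, i.e. attached to a
// surface with no release line seen in between.
std::vector<long> FindEarlyBufferDestroys(const std::vector<std::string>& lines) {
  std::set<long> held;
  std::vector<long> early;
  for (const std::string& line : lines) {
    std::size_t after = 0;
    long id = FindId(line, "wl_buffer#", &after);
    if (id < 0) continue;
    if (line.find(".attach(") != std::string::npos) {
      held.insert(id);  // wl_surface#N.attach(wl_buffer#id, ...)
    } else if (line.compare(after, 10, ".release()") == 0) {
      held.erase(id);   // compositor handed the buffer back
    } else if (line.compare(after, 10, ".destroy()") == 0 && held.count(id)) {
      early.push_back(id);
      held.erase(id);
    }
  }
  return early;
}
```

Run against the five log lines quoted above, the sketch reports buffer 75.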

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 57

•

7 months ago

Seems to be related to EGL as I can't reproduce it with gfx.webrender.software = true.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 58

•

7 months ago

•

Edited

Jonas Ådahl pointed out that the bug here is related to an actual Firefox bug which causes a Wayland protocol error, but the error is not routed from the server to the client. Local debugging shows that mutter sends:

wl_display#1.error(wl_surface#66, 2, "Buffer size (2337x1767) must be an integer multiple of the buffer_scale (2).")

but the message is not received on Firefox side (with both proxy enabled/disabled) and we're just disconnected by Mutter. Looks like the messages are not flushed to client so we're getting a generic 'client was disconnected' error instead of the real one.
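The rule behind that error message is simple enough to state as code. This is a sketch of the condition as quoted in the message, not mutter's or Firefox's implementation. `BufferSizeMatchesScale` is a hypothetical helper.

```cpp
// Hypothetical helper: true when both buffer dimensions are an integer
// multiple of the buffer scale, as the quoted error message requires.
bool BufferSizeMatchesScale(int width, int height, int scale) {
  if (scale <= 0) {
    return false;  // a non-positive scale is never valid
  }
  return width % scale == 0 && height % scale == 0;
}
```

For the values in the log, `BufferSizeMatchesScale(2337, 1767, 2)` is false because both dimensions are odd, so a fixed scale of 2 cannot be applied to that buffer.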

Jonas also suggested using viewports as a workaround for the fixed scale settings (which cause the issues here), so I'll investigate this direction.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

7 months ago

Depends on: 1934217

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 59

•

7 months ago

Bug 1934217 adds more time to process error messages on Firefox side. Let's see what bugs it reveals.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

7 months ago

Duplicate of this bug: 1934247

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

7 months ago

Flags: needinfo?(stransky)

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Comment 61

•

7 months ago

When Bug 1934217 lands, I expect the HandleGLibMessage crashes to move to the mozilla::widget::WlLogHandler signature (Bug 1932639), where we get the actual Wayland error.

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

7 months ago

Duplicate of this bug: 1932415

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

6 months ago

Flags: needinfo?(stransky)

Comment 63

•

5 months ago

Another crash hidden in HandleGLibMessage / broken pipe is the D&D one (Bug 1941119), which produces the error

[GFX1-]: (gnome) Wayland protocol error: unknown object (4278190080), message error(ous)

and which has recently been a topcrash on Wayland AFAIK.

Priority: P2 → P1

Comment 65

•

4 months ago

(In reply to Martin Stránský [:stransky] (ni? me) from comment #63)

Another crash hidden in HandleGLibMessage / broken pipe is D&D one (Bug 1941119) which produces the
[GFX1-]: (gnome) Wayland protocol error: unknown object (4278190080), message error(ous)
bug which is recently a topcrash on Wayland AFAIK.

Wayland protocol error: unknown object should be fixed by Bug 1949726.

Mathew Hodson

Updated

•

4 months ago

Updated

•

3 months ago

Duplicate of this bug: 1954822

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

25 days ago

Depends on: 1970062

BugBot [:suhaib / :marco/ :calixte]

Comment 68

•

11 days ago

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash

Martin Stránský [:stransky] (ni? me) (PTO, back on Jul 7)

Assignee

Updated

•

4 days ago

Duplicate of this bug: 1966378
