Closed Bug 1652820 Opened 5 years ago Closed 2 years ago

[sway][keyboard / input setup] On sway reload: crash with "Lost connection to Wayland compositor"

Categories: Core :: Widget: Gtk, defect, P3 (79 Branch, x86_64, Linux)
Tracking: RESOLVED FIXED
People: Reporter: luis.pabon, Unassigned
References: Blocks 1 open bug
Attachments: 6 files

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0

Steps to reproduce:

  1. Start firefox (wayland backend) on sway
  2. Reload sway (default shortcut is Mod+Shift+c)

Actual results:

Crash!

When starting sway from the terminal, the following is dumped to the console upon the crash:

Gdk-Message: 18:33:44.305: Lost connection to Wayland compositor.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
[GFX1-]: Receive IPC close with reason=AbnormalShutdown
Exiting due to channel error.

There's nothing on sway's debug logs to indicate it killed Firefox some other way.

Expected results:

No crash!

None of the other apps seem to crash when reloading sway. I only use GTK and Xwayland apps and don't run any Qt apps.

This is not a regression; I've seen this happen with earlier versions of Sway, Ubuntu and Firefox. I only just got around to reporting it.

Ubuntu 20.04
Firefox 79.0~b7+build1-0ubuntu0.20.04.1 (mozillateam/firefox-next ppa)
Sway 1.5-rc2
WebRender enabled
i7-7700HQ / Intel graphics

It doesn't happen on every Sway reload, maybe 7 times out of 10. It's easily reproducible on my system.

Blocks: wayland-sway
Component: Untriaged → Widget: Gtk
OS: Unspecified → Linux
Product: Firefox → Core
Hardware: Unspecified → x86_64

I can confirm this happens for me with every reload of sway. Here are the messages in dmesg:

[ 537.576597] Web Content[7468]: segfault at 0 ip 00007f1bbbb97a50 sp 00007ffe442f1ad8 error 6 in libxkbcommon.so.0.0.0[7f1bbbb8f000+1b000]
[ 537.576605] Code: 2e 0f 1f 84 00 00 00 00 00 90 48 85 f6 48 8d 05 e6 89 ff ff 48 0f 44 f0 48 89 77 08 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 <83> 07 01 48 89 f8 c3 66 0f 1f 84 00 00 00 00 00 41 55 48 89 f2 49
[ 537.576736] audit: type=1701 audit(1595955626.372:130): auid=1000 uid=1000 gid=1000 ses=1 pid=7468 comm=57656220436F6E74656E74 exe="/home/jonm/usr/local/share/software/firefox/firefox-bin" sig=11 res=1
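The dmesg output above already says where the Web Content process died: subtracting the mapping base (the first number in the brackets) from the instruction pointer gives an offset inside libxkbcommon that a tool such as addr2line can resolve against a build with debug symbols. The audit record stores the process name hex-encoded. A minimal Python sketch of both steps follows; the helper names fault_offset and decode_comm are illustrative and not part of any existing tool.

```python
import re

# Matches the kernel's "segfault at ... ip ... in lib[base+size]" message.
SEGFAULT = re.compile(
    r"ip (?P<ip>[0-9a-f]+) .* in "
    r"(?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(dmesg_line):
    """Return (library name, offset of the faulting instruction in its mapping)."""
    match = SEGFAULT.search(dmesg_line)
    if match is None:
        raise ValueError("not a segfault line")
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    return match.group("lib"), ip - base


def decode_comm(hex_comm):
    """Decode the hex-encoded comm= field of an audit record."""
    return bytes.fromhex(hex_comm).decode("utf-8", errors="replace")


# Values copied from the dmesg output quoted above.
line = ("Web Content[7468]: segfault at 0 ip 00007f1bbbb97a50 "
        "sp 00007ffe442f1ad8 error 6 in "
        "libxkbcommon.so.0.0.0[7f1bbbb8f000+1b000]")

lib, offset = fault_offset(line)
print(lib, hex(offset))                       # libxkbcommon.so.0.0.0 0x8a50
print(decode_comm("57656220436F6E74656E74"))  # Web Content
```

"segfault at 0" means the faulting address was 0, so this looks like a NULL pointer being dereferenced inside libxkbcommon rather than in Firefox's own code.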

I'm running Firefox-nightly with Wayland backend and latest Sway.

With G_MESSAGES_DEBUG=all

Exiting due to channel error.
Exiting due to channel error.
Sandbox: Unexpected EOF, op 2 flags 00 path /home/jonm/.config/xkb
Exiting due to channel error.

I don't have a ~/.config/xkb file.

I've also been experiencing this issue.

$ uname -srmov
Linux 5.9.14-1-ck-zen #1 SMP PREEMPT Sun, 13 Dec 2020 02:56:58 +0000 x86_64 GNU/Linux
$ sway --version 
sway version 1.5.1
$ firefox --version
Mozilla Firefox 84.0.2
$ lspci | grep VGA
44:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c3)
$ pacman -Q xf86-video-amdgpu \
 {,xf86-input-}libinput \
 wayland-{,protocols-}git \
 libxkbcommon \
 gtk3 \
 wlroots
xf86-video-amdgpu 19.1.0-2
libinput 1.16.4-1
xf86-input-libinput 0.30.0-1
wayland-git 1.18.0.r25.ga61ae8ec-1
wayland-protocols-git 1.18.189.82d4c15-1
libxkbcommon 1.0.3-1
gtk3 1:3.24.24-2
wlroots 0.12.0-1

This crash seems to only occur when certain xkb configuration commands are used in the sway config file, specifically input * xkb_capslock enabled and/or input * xkb_numlock enabled.

Minimal reproducible sway config file:

input type:keyboard xkb_numlock enabled

Steps to reproduce:

  1. Launch Sway with the provided configuration file. Optionally enable debug output with Sway's -d command line option
  2. Enter Sway's default Mod4+Return key combination to launch a terminal instance (alacritty by default).
  3. Execute the following command to launch Firefox: G_MESSAGES_DEBUG=all WAYLAND_DEBUG=1 MOZ_ENABLE_WAYLAND=1 firefox
  4. Enter Sway's default Mod4+Shift+c key combination to cause Sway to reload its config file.
  5. Repeat step 4 until a Firefox crash occurs.
  6. Comment out the input ... line in the Sway config file, then repeat step 4 and observe that no crash should occur.
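Steps 3 to 5 can be automated so that the number of reloads needed is counted instead of estimated. The following is only a sketch under assumptions: firefox_debug_env and reload_until_exit are made-up helper names, the 20-reload limit and 2-second delay are arbitrary, and swaymsg reload is assumed to be equivalent to the Mod4+Shift+c binding.

```python
import os
import subprocess
import time


def firefox_debug_env(base=None):
    """Return a copy of the environment with the variables from step 3 set."""
    env = dict(os.environ if base is None else base)
    env.update({
        "G_MESSAGES_DEBUG": "all",
        "WAYLAND_DEBUG": "1",
        "MOZ_ENABLE_WAYLAND": "1",
    })
    return env


def reload_until_exit(has_exited, max_reloads=20, delay=2.0,
                      run=subprocess.run, sleep=time.sleep):
    """Reload sway until has_exited() is true.

    Returns the number of reloads that were sent, or None if the process
    survived max_reloads reloads.
    """
    for attempt in range(1, max_reloads + 1):
        run(["swaymsg", "reload"], check=False)
        sleep(delay)  # give sway time to finish reconfiguring
        if has_exited():
            return attempt
    return None
```

From a terminal inside Sway, one would start Firefox with subprocess.Popen(["firefox"], env=firefox_debug_env()), keep the returned object as firefox, and call reload_until_exit(lambda: firefox.poll() is not None). Running it once with the input line present and once with it commented out covers step 6.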

Here are my logs:

This crash seems to only occur when certain xkb configuration commands are used in the sway config file, specifically input * xkb_capslock enabled and/or input * xkb_numlock enabled.

I should add that there may be more, but these ones are culprits for sure.

I also have non-standard xkb options by using a custom layout that swaps the mod keys around:

default partial alphanumeric_keys modifier_keys
xkb_symbols "basic" {
include "pt"
name[Group1] = "Portuguese (Custom)";

# Make CapsLock Mod4 and Windows key Mod5
key <CAPS> { [ Hyper_L ] };
modifier_map Mod4 { Hyper_L };

key <LWIN> { [ Super_L ] };
modifier_map Mod5 { Super_L };

};

I'm seeing similar looking crashes but mostly when connecting/disconnecting external screens so it may be a separate issue.

This is my current sway input configuration, I don't have any issues unless I uncomment the xkb_numlock line:

set {
  $x_cap caps:escape
  $x_alt altwin:swap_alt_win

  $x_kp  keypad:future,kpdl:kposs
  $x_etc shift:both_capslock_cancel
  $x_sp  nbsp:level3

  $xkb_opts $x_cap,$x_alt,$x_kp,$x_etc,$x_sp
}

input {
  type:keyboard {
    repeat_delay 300
    repeat_rate  25
    # xkb_numlock enabled
    xkb_options $xkb_opts
  }
}

I don't have anything esoteric on my input config:

### Touchpad
input "1739:31251:DLL07BE:01_06CB:7A13_Touchpad" {
    # accel_profile flat
    drag enabled
    tap enabled
}

### Keyboards
input "*" xkb_layout gb
input "1118:219:Microsoft_Natural®_Ergonomic_Keyboard_4000" xkb_numlock enabled

### Touchscreen
# map_to_output ensures touchscreen only points to things on the laptop's display - not on any other active display
input "1267:9376:ELAN_Touchscreen" {
    map_to_output eDP-1
    natural_scroll disabled
}

How would libinput settings affect Firefox not handling losing connection to the compositor anyhow?

Removing all of my input config entirely and exiting/loading back into sway has no effect. FF still dies if I reload sway.

Luis, have you tested whether this occurs if you start sway with no configuration at all?

I just have, and indeed it does not.

I have a feeling it might be a timing issue. Sway on the default config reloads almost instantly, whereas with my config it takes a good second or two. I presume it's because stuff like waybar, kanshi and others are also restarting alongside sway.

hello luis, what is the refresh rate of the display you are using? i remember having issues with a high refresh rate display. not sure though. fyi.

Nothing out of the ordinary, right now it's a 4k 60Hz panel with optionally an extra 1080p 60Hz and 720p 60Hz displays when I'm docked.

I would recommend manually bisecting your config until you identify the option that is causing the crash. That’s how I determined xkb_numlock was the culprit for me.

I too originally suspected that the reload delay was the issue, but I’ve ruled that out for myself – it still takes about a second or two to fully reload, but Firefox never crashes as long as those bad options aren’t included. Conversely, when testing with an otherwise empty config, adding that single option didn’t affect the very quick reload time, but it did reliably cause the crash.
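For anyone who wants to bisect their config less manually, the halving procedure can be sketched in Python. This is an illustration only: find_culprit is a made-up name, the crashes callback is something you would have to supply yourself (for example: write the subset to a config file, reload sway a few times, check whether Firefox is still alive), and it assumes one-line commands, since splitting a { ... } block in half produces an invalid config.

```python
def find_culprit(lines, crashes):
    """Halve `lines` repeatedly, keeping a half that still triggers the crash.

    `crashes(subset)` must return True if Firefox dies with that subset.
    Returns [] if even the full list does not crash.
    """
    candidates = list(lines)
    if not crashes(candidates):
        return []
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first, second = candidates[:middle], candidates[middle:]
        if crashes(first):
            candidates = first
        elif crashes(second):
            candidates = second
        else:
            break  # the crash needs lines from both halves; stop narrowing
    return candidates


# Stand-in predicate for demonstration; a real one would drive sway and Firefox.
config = [
    "input * xkb_layout gb",
    "input type:keyboard xkb_numlock enabled",
    "seat * hide_cursor 8000",
    "output * subpixel none",
]
culprit = "input type:keyboard xkb_numlock enabled"
print(find_culprit(config, lambda subset: culprit in subset))
# ['input type:keyboard xkb_numlock enabled']
```

Because the crash is not reported as fully deterministic in this thread, a real crashes callback should probably retry several reloads before declaring a subset safe.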

To be fair, ultimately it does not matter what in particular is tripping Firefox up. FF is the only app crashing when sway reloads and the issue needs to be looked at there.

I agree that the issue seems to lie with Firefox. I just think any additional information regarding what is causing the crash will help a Firefox contributor to debug and fix it sooner.

+1 this happens on my system and it only happens when i start firefox with MOZ_ENABLE_WAYLAND=1.. i usually run three windows with ~10-40 tabs each..

my system:

Linux work-x1c-arch 5.10.14-arch1-1 #1 SMP PREEMPT Sun, 07 Feb 2021 22:42:17 +0000 x86_64 GNU/Linux
sway version 1.5-63420a2c (Feb  8 2021, branch 'master')
Mozilla Firefox 85.0.1

the output i get when it crashes is:

Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.

I've tried

when i remove the input related parts from the sway config it does happen less often, but still does, which leads me to think it is something related to timing..

Definitely related to timing. I've also managed to crash firefox with some weird condition where the clipboard took over a second to be accessed (even with wl-paste), probably related to the data being copied from xwayland. I've also noticed firefox crashes on reload less often nowadays, and it crashes more often on my laptop than my desktop (the former takes 1-2 seconds to reload, the latter under a second, despite nearly identical configuration).

There might be some timeout triggering the channel error after 1 or 2 seconds without communication to the WM.

Please run (and crash) Firefox with WAYLAND_DEBUG=1 env variable and attach the log here.
Thanks.

Flags: needinfo?(luis.pabon)
Attached file firefox.log

Apologies, here's the attachment instead

:luis.pabon please gather the sway debug output (sway -d 2>sway.log), and possibly sway's output of WAYLAND_DEBUG=server. Please try to keep the output minimal - a mostly empty config would be best.

It could be a protocol error from firefox's side, it could be a sway bug, who knows.

Flags: needinfo?(luis.pabon)

The sway command that makes this happen to me is:
seat * hide_cursor 8000
You can get the conf of sway from here:
https://github.com/swaywm/sway/blob/master/config.in

Include the seat line, reload sway, and Firefox will crash with the error:
Lost connection to Wayland compositor

I opened a ticket with sway, but they closed it pointing to this ticket:
https://github.com/swaywm/sway/issues/6085

Looks like this is related to seats. Sway has this function:
wl_event_loop_add_idle
It's only used in the reload path and in the keyboard code; if I remove that call from the keyboard code in sway, I get a different error in Firefox:

Sandbox: Unexpected EOF, op 2 flags 0400000 path /sys/devices/pci0000:00/0000:00:08.1/0000:06:00.0
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Sandbox: Unexpected EOF, op 2 flags 00 path /usr/share/X11/xkb
xkbcommon: ERROR: failed to add default include path /usr/share/X11/xkb

There is another bug from 3 years ago about seats:
https://bugzilla.mozilla.org/show_bug.cgi?id=1466015

wl_event_loop_add_idle just registers a function to be run next time the server event loop is idle. The one you found registers do_reload, which performs the actual reload.

Commenting that out just means that sway will do nothing on reload, so not particularly surprising that a reload-triggered bug goes away.

I removed it from the input/keyboard.c; not from do_reload:

index 541fc90d..5b70e778 100644
--- a/sway/input/keyboard.c
+++ b/sway/input/keyboard.c
@@ -784,8 +784,9 @@ static void sway_keyboard_group_remove(struct sway_keyboard *keyboard) {
 
                // To prevent use-after-free conditions when handling key events, defer
                // freeing the wlr_keyboard_group until idle
-               wl_event_loop_add_idle(server.wl_event_loop,
-                               destroy_empty_wlr_keyboard_group, wlr_group);
+               //wl_event_loop_add_idle(server.wl_event_loop,
+//                             destroy_empty_wlr_keyboard_group, wlr_group);
+               destroy_empty_wlr_keyboard_group(wlr_group);
        }
 }

Anyway, the issue still happens after removing that part of the code.
I included the crash when using the seat command. I ran that test on sway 1.5.1 without modifications to the source.

Looks like this also makes Firefox crash:

input *  {
    xkb_layout us
    xkb_variant intl
}

But it works fine if you specify the keyboard:

input "1452:591:Keychron_Keychron_K1" {
    xkb_layout us
    xkb_variant intl
}

On Nightly with sway-git on my Arch Linux desktop, I don't see this issue, but I do on FF Nightly on a Debian Testing laptop (Sway 1.5). Same Sway configs. Can I try anything on the Debian laptop to troubleshoot?

(In reply to denisdoria+mozilla from comment #28)

Looks like this also makes Firefox crash:

input *  {
    xkb_layout us
    xkb_variant intl
}

But it works fine if you specify the keyboard:

input "1452:591:Keychron_Keychron_K1" {
    xkb_layout us
    xkb_variant intl
}

confirming, this works.

For me it was the same issue, changing input * xkb_numlock enable to input 1133:49971:Logitech_Gaming_Keyboard_G610_Keyboard xkb_numlock enable made firefox no longer crash on sway reload

It seems to happen when there is more than one keyboard affected by the input configuration:

input "3141:30354:SONiX_USB_Keyboard" {
  xkb_layout us
  xkb_numlock enabled
  xkb_options compose:caps
  xkb_variant altgr-intl
}
input "1241:513:USB-HID_Keyboard" {
  xkb_layout us
  xkb_numlock enabled
  xkb_options compose:caps
  xkb_variant altgr-intl
}

makes firefox crash on reload when both keyboards are connected. When I disconnect one of them (and it does not matter which one), firefox does not crash on reload.
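To check how many keyboards an input rule actually applies to, the device list from sway's IPC can be filtered by type. A small Python sketch, assuming the JSON printed by swaymsg -r -t get_inputs is a list of objects with "identifier" and "type" fields; the sample data below is hand-written for illustration (reusing identifiers from this thread) and is not real output.

```python
import json


def keyboard_identifiers(get_inputs_json):
    """Return the identifiers of all devices sway reports as keyboards."""
    devices = json.loads(get_inputs_json)
    return [dev.get("identifier") for dev in devices
            if dev.get("type") == "keyboard"]


# Hand-written sample; real input would come from `swaymsg -r -t get_inputs`.
sample = json.dumps([
    {"identifier": "3141:30354:SONiX_USB_Keyboard", "type": "keyboard"},
    {"identifier": "1241:513:USB-HID_Keyboard", "type": "keyboard"},
    {"identifier": "1267:9376:ELAN_Touchscreen", "type": "touch"},
])
print(keyboard_identifiers(sample))
# ['3141:30354:SONiX_USB_Keyboard', '1241:513:USB-HID_Keyboard']
```

Laptops often expose several extra devices of type keyboard (power buttons, hotkey devices and the like), so the list can be longer than the number of physical keyboards plugged in.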

Attached file sway debug log
Attached file crash2.log

Added output of WAYLAND_DEBUG=1 firefox-trunk while crashing the FF session at the end via sway reload.

Flags: needinfo?(luis.pabon)

The same happens here, except the problem occurs even when there's only one keyboard affected by the input configuration:

input "1:1:AT_Translated_Set_2_keyboard" {
    xkb_options caps:swapescape
}

To be clear here, is this report about a Sway compositor crash or a Firefox crash?

Summary: [sway] On sway reload: crash with "Lost connection to Wayland compositor" → [sway][keyboard / input setup] On sway reload: crash with "Lost connection to Wayland compositor"

(In reply to Martin Stránský [:stransky] (ni? me) from comment #36)

To be clear here, is this report about a Sway compositor crash or a Firefox crash?

Firefox crashes. I've uploaded my crash log with WAYLAND_DEBUG=1.

Attached file ff_debug.log

Do you have any crashes at about:crashes? I don't see any particular crash info in the log.
Does 'coredumpctl list' command show any Firefox crash? (In case you use systemd).

No crashes listed in about:crashes, and coredumpctl list doesn't show any Firefox crash.

That means it's a wayland protocol violation (or at least Sway reports that) and Gtk quits Firefox for that.
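If the compositor did report a protocol violation, it should show up in the WAYLAND_DEBUG=1 log as a wl_display error event shortly before the disconnect. A rough Python sketch for scanning a log for such events follows. The exact line layout differs between libwayland versions, so the pattern only looks for the wl_display@N.error(...) part, and the two sample lines are invented for illustration, not taken from any attachment here.

```python
import re

# Only the "wl_display@<id>.error(<args>)" part is matched; timestamps and
# any queue prefixes that newer libwayland versions print are ignored.
ERROR_EVENT = re.compile(r"wl_display@\d+\.error\((?P<args>.*)\)")


def protocol_errors(log_lines):
    """Return the argument text of every wl_display error event in the log."""
    found = []
    for line in log_lines:
        match = ERROR_EVENT.search(line)
        if match:
            found.append(match.group("args"))
    return found


# Invented sample lines, shaped like typical WAYLAND_DEBUG output.
sample_log = [
    '[1234567.890]  -> wl_surface@30.commit()',
    '[1234567.901] wl_display@1.error(wl_surface@30, 2, "example message")',
]
print(protocol_errors(sample_log))
# ['wl_surface@30, 2, "example message"']
```

An empty result does not rule out the compositor closing the connection for some other reason; it only means no error event reached the client before the socket went away.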

Hi,
I am also facing the same issue using Sway 1.6.1 and Mozilla Firefox 93.0.
I commented out this block in sway/config to prevent the Firefox process from crashing on sway config reload.
input * {
xkb_layout "us"
xkb_variant "altgr-intl"
}

Not sway, but Tumbleweed/KDE/Nvidia. Starting Nightly or Release with MOZ_ENABLE_WAYLAND=1 and dragging a tab causes the browser to close and reports:

Gdk-Message: 13:59:25.835: Lost connection to Wayland compositor.

Exiting due to channel error

I have tried moving a tab for a few seconds and it caused a crash with the exact same log messages:

Gdk-Message: 15:15:19.857: Lost connection to Wayland compositor.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.

Try running firefox with WAYLAND_DEBUG=1 as well and collect the output. There will be a lot.

Attached file firefox_wld_dbg.log

I have just tested Firefox 93 on both Ubuntu (GNOME) and Kubuntu (KDE) in a VM, and GNOME on Wayland does not appear to suffer from this problem, which is odd given that it is reproducible on both KDE and Sway.

FWIW, on my setup, having more windows open seems to make it more likely to happen.

Firefox crashes for me on Sway reload when I am connected to multiple monitors; I get the same error 'Lost connection to Wayland compositor.' This happens even without any inputs declared in the sway config.

Seems to be related to https://github.com/swaywm/sway/issues/6654, though I can't say for sure that that's the only cause.

The crash isn't directly related to reload; I can easily reproduce this with the following command (having my keymap file at ~/.keymap.xkb):

sleep .5; swaymsg 'input * xkb_file ~/.keymap.xkb;input * xkb_file ~/.keymap.xkb;input * xkb_file ~/.keymap.xkb; [... repeat until enough to crash Firefox]'

(sleep .5 is unrelated; it's there so the key-up event of the return key is received by the terminal, otherwise the key is stuck during the command)

On my desktop I need 17 chained input commands.
On my laptop it already crashes with 8 chained input commands. I suspect this is because the laptop has a lot of internal type:keyboard devices.

This also works by repeating seat * hide_cursor 1000 or even output * subpixel none.

The number seems to stay the same if I have a lot of windows, be it Firefox or from other processes.

It doesn't seem related to timing: with a lot of non-Firefox windows open, the operation takes much longer, but the number of needed input commands doesn't change. Opening more windows of the same Firefox instance doesn't change the time.
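The chained command above is easy to generate for any repeat count, which helps when looking for the smallest count that kills Firefox on a given machine. A Python sketch with made-up helper names (chained_command, first_crashing_count); the crashes_at callback is left to the reader, since it has to run swaymsg, check whether Firefox survived, and restart Firefox after every crash.

```python
def chained_command(command, count):
    """Join `count` copies of one sway command with ';', as in the example."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return ";".join([command] * count)


def first_crashing_count(crashes_at, limit=32):
    """Return the smallest count in 1..limit for which crashes_at(count) is true."""
    for count in range(1, limit + 1):
        if crashes_at(count):
            return count
    return None


print(chained_command("input * xkb_file ~/.keymap.xkb", 3))
# input * xkb_file ~/.keymap.xkb;input * xkb_file ~/.keymap.xkb;input * xkb_file ~/.keymap.xkb
```

The resulting string can be passed as a single argument, e.g. subprocess.run(["swaymsg", chained_command("seat * hide_cursor 1000", 17)]). As with the single-quoted shell version, the ~ is then left unexpanded by the caller and reaches sway as written.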

I have a hacky workaround for the crash on reload, reducing input commands to specific devices and running some others in swaymsg:

# configure keyboard asynchronously with swaymsg to work around Firefox crash on reload
exec_always swaymsg input type:keyboard {
    repeat_delay 250
    repeat_rate 35
    xkb_file ~/.keymap.xkb
}
# cannot use xkb_numlock/xkb_capslock in swaymsg; use device identifier to reduce impact
input 1133:49948:Logitech_USB_Keyboard xkb_numlock enabled

exec_always swaymsg seat '*' hide_cursor 5000

I have an extensive configuration besides these commands, but only input, seat, and output commands seem to affect the crashes.

This is on Firefox 98.0.2, Sway 1.7, Arch Linux.

The reload causes sway to reconfigure all input devices. This in turn makes sway notify all clients about the input devices. This is one of the things that makes firefox choke and die.

The example given by @damoz reproduces something similar to the "notify all clients about the input devices" step.

Each input command in sway's config file triggers one more message to clients (AFAIK, also passing layout descriptors).

My suspicion is that when reading these layout descriptors, Firefox blocks on the same thread that receives Wayland events. Debugging with WAYLAND_DEBUG=1 might yield better insight.

Little correction for the workaround above: If the keyboards aren't set up correctly when starting sway, try exec_always swaymsg input '*' { ... } instead of exec_always swaymsg input type:keyboard { ... }.

Here is maybe a more useful set of messages to narrow down the crash:

(crashreporter:8105): Gdk-DEBUG: 09:16:14.436: Compositor prefers decoration mode 'server'ok 
(crashreporter:8105): GLib-DEBUG: 09:16:14.436: unsetenv() is not thread-safe and should not be used after threads are created
(crashreporter:8105): GLib-GIO-DEBUG: 09:16:14.437: _g_io_module_get_default: Found default implementation local (GLocalVfs) for ‘gio-vfs’
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.478: gdk_pixbuf_from_pixdata() called on:
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.478: 	Encoding raw
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.478: 	Dimensions: 14 x 14
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.478: 	Rowstride: 56, Length: 808
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.478: 	Copy pixels == false
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.487: gdk_pixbuf_from_pixdata() called on:
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.487: 	Encoding raw
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.487: 	Dimensions: 14 x 14
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.487: 	Rowstride: 56, Length: 808
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.487: 	Copy pixels == false
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.492: gdk_pixbuf_from_pixdata() called on:
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.492: 	Encoding raw
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.492: 	Dimensions: 14 x 14
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.492: 	Rowstride: 56, Length: 808
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.492: 	Copy pixels == false
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.504: gdk_pixbuf_from_pixdata() called on:
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.504: 	Encoding raw
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.504: 	Dimensions: 14 x 14
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.504: 	Rowstride: 56, Length: 808
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:14.504: 	Copy pixels == false
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:35.209: gdk_pixbuf_from_pixdata() called on:
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:35.209: 	Encoding raw
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:35.209: 	Dimensions: 14 x 14
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:35.209: 	Rowstride: 56, Length: 808
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:35.209: 	Copy pixels == false
Failed to open curl lib from binary, use libcurl.so instead
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:35.289: gdk_pixbuf_from_pixdata() called on:
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:35.289: 	Encoding raw
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:35.289: 	Dimensions: 14 x 14
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:35.289: 	Rowstride: 56, Length: 808
(crashreporter:8105): GdkPixbuf-DEBUG: 09:16:35.289: 	Copy pixels == false

Finally! Thanks for the update. Works great for me now!

Fixed by Sway, see comment 55. If this occurs again it may be worthwhile to refile but it might have a different root cause.

Status: UNCONFIRMED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED