Closed Bug 1606751 Opened 5 years ago Closed 5 years ago

Firefox crashes due to wayland display returning invalid argument 22

Categories

(Core :: Widget: Gtk, defect, P2)

73 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox-esr68 --- unaffected
firefox72 --- unaffected
firefox73 --- fixed
firefox74 --- fixed

People

(Reporter: nagisa, Assigned: stransky)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(2 files)

Wayland Firefox from today (2020-01-02), running under sway from git master (sway version 1.2-d510684c (Jan 2 2020, branch 'master')), starts up fine but then "crashes" fairly quickly, within a couple of seconds. Firefox does not consider this a crash in the usual sense.

Standard output shows this:

(firefox:12834): Gtk-WARNING **: 00:12:20.450: Loading IM context type 'xim' failed
Gdk-Message: 00:12:27.125: Error 22 (Invalid argument) dispatching to Wayland display.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
[GFX1-]: Receive IPC close with reason=AbnormalShutdown
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
[GFX1-]: Receive IPC close with reason=AbnormalShutdown
Exiting due to channel error.
Exiting due to channel error.

I have upgraded from a fairly old version of nightly, but I also upgraded sway at the same time, so I cannot really rule out sway being the issue. OTOH I do not see the same problem with other gtk applications.

Feel free to ask for additional information.

Will test with a clean profile in a moment.

Looks like launching with --ProfileManager prevents this issue from surfacing. I’ll investigate more later.

I found the same today after upgrading both sway/wlroots and firefox-nightly.

I am seeing several different log patterns before exit, which is strange.
3 concurrent sessions each exited at startup but with different output:

❯ firefox-nightly 
Gdk-Message: 21:15:53.194: Error 22 (Invalid argument) dispatching to Wayland display.
Exiting due to channel error.
[GFX1-]: Receive IPC close with reason=AbnormalShutdown
Exiting due to channel error.
                                                                                                                                                                                 
~   6034s
❯ firefox-nightly
ExceptionHandler::GenerateDump cloned child 24956
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
ExceptionHandler::WaitForContinueSignal waiting for continue signal...
###!!! [Parent][RunMessage] Error: Channel error: cannot send/recv
###!!! [Parent][RunMessage] Error: Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x37006D,name=PContent::Msg_SuspendInputEventQueue) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x37006B,name=PContent::Msg_FlushInputEventQueue) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x37006C,name=PContent::Msg_ResumeInputEventQueue) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x37004D,name=PContent::Msg_Shutdown) Channel error: cannot send/recv
###!!! [Parent][RunMessage] Error: Channel error: cannot send/recv
###!!! [Parent][RunMessage] Error: Channel error: cannot send/recv

Gdk-Message: 21:16:52.146: Error 71 (Protocol error) dispatching to Wayland display.
[GFX1-]: Receive IPC close with reason=AbnormalShutdown
Exiting due to channel error.

I found that running with MOZ_ENABLE_WAYLAND=0 firefox-nightly works for now.

Can you please run firefox with WAYLAND_DEBUG=1 env variable set with wayland enabled and attach the log here?
Thanks.

Flags: needinfo?(simonas+bugzilla.mozilla.org)
Priority: -- → P3

Attempts to reproduce with WAYLAND_DEBUG=1 result in a crash: https://crash-stats.mozilla.org/report/index/89ead2cf-c245-4c2d-8b68-1c5e90200103

Flags: needinfo?(simonas+bugzilla.mozilla.org)

Downgrading firefox to nightly from 2019-12-09 fixes the issue.

Therefore, the issue started occurring between 2019-12-09 (works fine) and 2020-01-03 (fails).

Here are a few more identical crashes from attempts to debug with WAYLAND_DEBUG=1:

https://crash-stats.mozilla.org/report/index/0d5ac7e2-ea83-4f18-8ce4-a37020200103
https://crash-stats.mozilla.org/report/index/bp-f02eddfb-da52-4616-b3e5-c23c30200103
https://crash-stats.mozilla.org/report/index/bp-3318e8b2-fef3-47d0-bfe0-adb1c0200103
https://crash-stats.mozilla.org/report/index/bp-33b9a2f7-b8cc-41f6-8dad-52c0b0200103

Here’s the log collected for a session which ultimately resulted in one of the wl_log_set_handler crashes. I'm not sure whether or how it is relevant to the original issue, but here goes.

Thanks, the backtraces are clear, it's a problem with setting an opaque region.

Assignee: nobody → stransky

I suspect this is a sway bug that occurs when a null opaque region is set. I filed https://github.com/swaywm/sway/issues/4875 for further work.

Priority: P3 → P2

Can you try disabling WebRender, i.e. running Firefox with the basic compositor?
Set gfx.webrender.force-disabled to true in about:config and restart Firefox.
Thanks.

Flags: needinfo?(simonas+bugzilla.mozilla.org)

Setting gfx.webrender.force-disabled does not make this issue go away. I think WebRender is already disabled by default on my machine anyway, because about:support reports: WEBRENDER_QUALIFIED blocked-device-too-old by env: Device too old.

Flags: needinfo?(simonas+bugzilla.mozilla.org)

I tried Sway on my Fedora 31 box but I can't reproduce it with the latest nightly.

I can reproduce it now. It's because we use an already released region.

It can be reproduced reliably when doing drag & drop operations.

It really is a multi-thread issue (https://bugzilla.mozilla.org/show_bug.cgi?id=1606848#c2). Here is a log of it:

[(null) 69489: Main Thread]: D/WidgetWayland moz_gtk_widget_get_wl_surface [0x7fffdc246a60] wl_surface 0x7fffdc052060 ID 44

[(null) 69489: Main Thread]: D/Widget nsWindow::UpdateTopLevelOpaqueRegionWayland()
[2031351.728] -> wl_compositor@33.create_region(new id wl_region@110)
[2031351.738] -> wl_region@110.add(26, 23, 960, 1020)
[2031351.747] -> wl_surface@44.set_opaque_region(wl_region@110)
[2031351.752] -> wl_region@110.destroy()
[2031351.758] -> wl_compositor@33.create_region(new id wl_region@108)
[2031351.793] -> wl_region@108.add(0, 0, 960, 1020)

[(null) 69489: Compositor]: D/WidgetWayland moz_container_get_wl_surface [0x7fffdc2dc830] surface 0x7fffd92f6510 ready_to_draw 1
wl_surface_set_opaque_region id 107 0x7fffc93d51f0
moz_container_set_opaque_region region id 108 0x7fffcc217f60 BEGIN
[2031351.902] -> wl_region@107.destroy()
moz_container_set_opaque_region region id END, new region is 0x7fffcc217f60
[(null) 69489: Main Thread]: D/Widget END nsWindow::UpdateTopLevelOpaqueRegionWayland() END
Compositor]: D/WidgetWayland moz_container_get_wl_surface

-> The Compositor thread is inside moz_container_get_wl_surface() while the opaque region is being updated from the Main thread (UpdateTopLevelOpaqueRegionWayland()).

[2031351.943] -> wl_surface@61.set_opaque_region(
Thread 25 "Compositor" received signal SIGSEGV, Segmentation fault.

See Also: → 1538435
Blocks: wayland
Regressed by: 1605120
Has Regression Range: --- → yes

As seen in bug 1606848, this is not restricted to Sway but also affects GNOME.

nsWindow::UpdateOpaqueRegion() is called from the Main thread and collides with
moz_container_get_wl_surface(), which uses the opaque region and is called from the Compositor thread.

As a fix, don't set the opaque region directly on mozcontainer; instead, set a flag to signal that
an update is needed, and calculate/set the opaque region in moz_container_get_wl_surface() or
moz_container_egl_window_set_size().
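
The flag-based approach described above can be sketched roughly as follows. This is not the actual patch: the Container and Rect types, every function name, and the choice of a C11 atomic for the flag are assumptions made for illustration only. The idea it shows is that the Main thread only raises a flag, while the region itself is rebuilt and owned solely by the Compositor thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the Firefox tree. */
typedef struct {
    int x, y, width, height;
} Rect;

typedef struct {
    atomic_bool opaque_region_needs_update; /* raised by the Main thread */
    Rect opaque_region;    /* read and written only on the Compositor thread */
    int width, height;     /* surface size, owned by the Compositor thread */
    int regions_committed; /* counts rebuilds, for illustration */
} Container;

void container_init(Container *c, int width, int height) {
    atomic_init(&c->opaque_region_needs_update, false);
    c->opaque_region = (Rect){0, 0, 0, 0};
    c->width = width;
    c->height = height;
    c->regions_committed = 0;
}

/* Main thread: request a refresh without touching the region itself. */
void container_request_opaque_update(Container *c) {
    atomic_store(&c->opaque_region_needs_update, true);
}

/* Compositor thread: rebuild the region only if a refresh was requested.
 * atomic_exchange reads and clears the flag in one step, so a request
 * is handled once. Returns true if the region was rebuilt. */
static bool container_refresh_opaque_region(Container *c) {
    if (!atomic_exchange(&c->opaque_region_needs_update, false)) {
        return false;
    }
    assert(c->width >= 0 && c->height >= 0);
    c->opaque_region = (Rect){0, 0, c->width, c->height};
    c->regions_committed++;
    return true;
}

/* Compositor thread: stands in for the "get surface" path. */
const Rect *container_get_surface_region(Container *c) {
    container_refresh_opaque_region(c);
    return &c->opaque_region;
}

/* Compositor thread: stands in for the "set size" path. */
void container_set_size(Container *c, int width, int height) {
    c->width = width;
    c->height = height;
    atomic_store(&c->opaque_region_needs_update, true);
    container_refresh_opaque_region(c);
}
```

In this sketch the region is never created or destroyed on the Main thread, so the Compositor thread cannot pick up a region that another thread has already released.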

No longer blocks: wayland-sway
Pushed by nerli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9b54914b2037
[Wayland] Manage opaque region of mozcontainer internally, r=heftig
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74

Hi Martin, does this need a Beta uplift request for 73?

Flags: needinfo?(stransky)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #22)

> Hi Martin, does this need a Beta uplift request for 73?

Yes please. I'll file the uplift request.

Comment on attachment 9118925 [details]
Bug 1606751 [Wayland] Manage opaque region of mozcontainer internally, r?heftig

Beta/Release Uplift Approval Request

  • User impact if declined: Crashes on Wayland backend caused by concurrent writes to mozcontainer.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Linux/Wayland only.
  • String changes made/needed: none
Flags: needinfo?(stransky)
Attachment #9118925 - Flags: approval-mozilla-beta?

Comment on attachment 9118925 [details]
Bug 1606751 [Wayland] Manage opaque region of mozcontainer internally, r?heftig

Wayland crash fix. Approved for 73.0b3.

Attachment #9118925 - Flags: approval-mozilla-beta? → approval-mozilla-beta+