Closed Bug 1654658 Opened 2 years ago Closed 1 year ago

Crash in XGetWindowAttributes / GtkCompositorWidget::GtkCompositorWidget

Categories

(Core :: Graphics: WebRender, defect)

Desktop
Linux
defect

Tracking


RESOLVED WORKSFORME
Tracking Status
firefox-esr78 --- unaffected
firefox90 --- affected
firefox91 --- affected
firefox92 --- affected

People

(Reporter: aosmond, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

This bug is for crash report bp-016f1069-c4fe-47e3-b937-140a00200722.

Top 10 frames of crashing thread:

0 libX11.so.6 XGetWindowAttributes 
1 libxul.so mozilla::widget::GtkCompositorWidget::GtkCompositorWidget widget/gtk/GtkCompositorWidget.cpp:42
2 libxul.so mozilla::widget::CompositorWidgetParent::CompositorWidgetParent widget/gtk/CompositorWidgetParent.cpp:16
3 libxul.so mozilla::layers::CompositorBridgeParent::AllocPCompositorWidgetParent gfx/layers/ipc/CompositorBridgeParent.cpp:2223
4 libxul.so mozilla::layers::PCompositorBridgeParent::OnMessageReceived ipc/ipdl/PCompositorBridgeParent.cpp:1100
5 libxul.so mozilla::layers::PCompositorManagerParent::OnMessageReceived ipc/ipdl/PCompositorManagerParent.cpp:197
6 libxul.so mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2074
7 libxul.so mozilla::ipc::MessageChannel::MessageTask::Run ipc/glue/MessageChannel.cpp:1953
8 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1234
9 libxul.so mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:302

Bug 1603839 (and bug 1633462) fixed a related crash, but we still see this one with X11 and the GPU process.

The first build with this signature is 20200424114754, the build in which bug 1603839 landed.

We don't see this without the GPU process, where things work slightly differently: we are passed a non-null nsWindow and get mXDisplay from there. Given that mXWindow is an ID that XGetWindowAttributes probably (?) verifies, I assume this line gives us a nullptr:

https://searchfox.org/mozilla-central/rev/dcd9c2d2bc19d96d487825eb70c2333a4d60994e/widget/gtk/GtkCompositorWidget.cpp#36

Assuming that is true, disabling the GPU process should work around these crashes. We need to monitor this once bug 1653443 lands.

Blocks: gpu-process-linux-x11
No longer blocks: wr-linux-mvp

Alexandre, this crash signature (now [@ libglib-2.0.so.0@0x5f0f8] instead of [@ XGetWindowAttributes]) spiked starting in Nightly 91. Could this recent spike be a regression from Linux semi-headless mode (bug 1635451)?

GtkCompositorWidget::GtkCompositorWidget is crashing deep inside a call to XGetWindowAttributes(DefaultXDisplay(), mXWindow, &windowAttrs) here:

https://hg.mozilla.org/mozilla-central/file/83f4bfe5ea71990cf8f4ac5aa0f05b7061289d71/widget/gtk/GtkCompositorWidget.cpp#l42

Crash report: https://crash-stats.mozilla.org/report/index/1ee95652-a060-445c-bbff-89a950210713

Reason: SIGTRAP

Top 10 frames of crashing thread:

0 libglib-2.0.so.0 libglib-2.0.so.0@0x5f0f8 
1 libglib-2.0.so.0 libglib-2.0.so.0@0x5adf4 
2 libglib-2.0.so.0 libglib-2.0.so.0@0x5aff0 
3 libgdk-3.so.0 libgdk-3.so.0@0x919d6 
4 libX11.so.6 libX11.so.6@0x43a34 
5 libX11.so.6 libX11.so.6@0x406a7 
6 libX11.so.6 libX11.so.6@0x40744 
7 libX11.so.6 libX11.so.6@0x417a4 
8 libX11.so.6 libX11.so.6@0x277b9 
9 libX11.so.6 libX11.so.6@0x2792a 
Crash Signature: [@ XGetWindowAttributes] → [@ XGetWindowAttributes] [@ libglib-2.0.so.0@0x5f0f8]
Flags: needinfo?(lissyx+mozillians)
See Also: → semi-headless

(In reply to Chris Peterson [:cpeterson] from comment #3)

Alexandre, this crash signature (now [@ libglib-2.0.so.0@0x5f0f8] instead of [@ XGetWindowAttributes]) spiked starting in Nightly 91. Could this recent spike be a regression from Linux semi-headless mode (bug 1635451)?

I'm not sure: most of the crashes with this signature are reported with build ID 20210705095222, according to https://crash-stats.mozilla.org/signature/?product=Firefox&signature=libglib-2.0.so.0%400x5f0f8&date=%3E%3D2021-07-07T15%3A23%3A00.000Z&date=%3C2021-07-14T15%3A23%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#reports

However,

Flags: needinfo?(lissyx+mozillians)

This is the changelog between build ID 20210705095222 and the build before it:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=fac132d8859f8970bd2b754cf923255e87f3c4f6&tochange=095ea66cb9a1dc25f2abb79087dddf6b1b74102f

@ Martin, might your DMABufSurface changes in bug 1712588 have caused an increase in these GtkCompositorWidget crashes calling XGetWindowAttributes?

Flags: needinfo?(stransky)
See Also: semi-headless

(In reply to Chris Peterson [:cpeterson] from comment #5)

@ Martin, might your DMABufSurface changes in bug 1712588 have caused an increase in these GtkCompositorWidget crashes calling XGetWindowAttributes?

No, this is not related.

Flags: needinfo?(stransky)

It's unfortunate that we don't have call stacks from local libraries. Is there any way to reproduce it locally?

(In reply to Martin Stránský [:stransky] (ni? me) from comment #7)

It's unfortunate that we don't have call stacks from local libraries.

@ Gabriele, I thought I read that we now have debug symbols for popular Linux distros' libraries. If so, which distros are those? All of this bug's libglib-2.0.so.0 crashes are from Arch Linux and Manjaro Linux (a fork of Arch Linux).

Is there any way to reproduce it locally?

Unfortunately, I don't know of any STR. None of the crash reports have user comments about what they were doing. The crash reports' URLs are pretty random: just the usual sites, with no pattern.

I suspect this is a regression in Arch Linux's glib. All these crashes are on variants of Arch Linux. They all started on 2021-06-16 on all Firefox channels at the same time (Nightly 91, Beta 90, and Release 89), even though there were no crash reports from Nightly 91 or Beta 89.

Flags: needinfo?(gsvelto)
See Also: → 1722361

(In reply to Chris Peterson [:cpeterson] from comment #8)

@ Gabriele, I thought I read that we now have debug symbols for popular Linux distros' libraries. If so, which distros are those? All of this bug's libglib-2.0.so.0 crashes are from Arch Linux and Manjaro Linux (a fork of Arch Linux).

Yes. Unfortunately, Arch does not provide debug packages; you have to build the package locally to get them, and I have never found the time to do that. We used to scrape the public symbols from their libraries, but since we replaced dump_syms we haven't been able to, due to a limitation in the new tool (not that those symbols were particularly useful, but at least they caused crashes to clump together under one signature).

Unfortunately, I don't know of any STR. None of the crash reports have user comments about what they were doing. The crash reports' URLs are pretty random: just the usual sites, with no pattern.

I found an interesting comment in this crash:

playing a video, opened a new tab, attempted to detach the video tab and move it to other monitor

This led me to this signature on Debian's ESR build: [@ handle_response | XShapeGetRectangles]. The stack trace is similar (though without Arch symbols I can't be sure), and the comments point to a similar issue:

it is someething to do with dragging the tab out im on MATE desktop environment on debian with multiple monitors thanks for trying to fix it

i tried to drag a tab out of firefox

Crashes when draggin anything. Loaded defaul google page. Click-dragged google logo.

Anytime I drag/drop anything. A link, a tab, anything. The browser crashes.

And so on. It could be a different issue, but to reproduce it I'd start with the first comment: detaching a tab while a video is playing.

Flags: needinfo?(gsvelto)

There's another thing that all the Arch crashes seem to have in common: they're on X11; none of them are using Wayland.

I searched a bit for XGetWindowAttributes:
the latest GPU process XGetWindowAttributes crash seems to be bp-449eccc2-2b1d-48eb-92bc-539310210824 with version 82.0b9.


(In reply to Chris Peterson [:cpeterson] from comment #3)

Crash report: https://crash-stats.mozilla.org/report/index/1ee95652-a060-445c-bbff-89a950210713

[@ libglib-2.0.so.0@0x5f0f8 ]

That one has:

GraphicsCriticalError |[G0][GFX1-]: Failed to create EGLSurface!: 0x3009 (t=1.05895) |[G1][GFX1-]: Failed to create EGLSurface (t=1.05898)

GLX is still the default.
If I (a non-programmer) understand correctly, MOZ_X11_EGL=1 with the proprietary Nvidia driver requires a perfectly matching visual between X11 and EGL; otherwise it crashes. Mesa does not seem to be this strict.
An example of such a crash on the X11 MATE desktop environment: bug 1677314.
Nvidia driver 470 has many improvements for EGL/DMA-BUF/Xwayland/Wayland.
EGL on X11 with the XFCE desktop and the proprietary Nvidia driver seems(!) to work fine now: https://bug1729900.bmoattachments.org/attachment.cgi?id=9240298

This change was required for WebRender in the GPU process and for out-of-process (OOP) WebExtensions: https://hg.mozilla.org/mozilla-central/rev/bb6817615317
It seems windowAttrs could be used with the struct's default values (i.e., no visual?) if XGetWindowAttributes() fails.

(In reply to Darkspirit from comment #11)

Latest GPU process XGetWindowAttributes crash seems to be bp-449eccc2-2b1d-48eb-92bc-539310210824 with version 82.0b9.

Correction: there are also:
bp-bc4f3b47-71fa-4f73-93ce-6f8080210503 90.0a1 Ubuntu 20.10

Adapter Vendor ID Intel Corporation (0x8086)
App Notes WR! WR+ EGL? EGL- GL Context? GL Context+
KDE Connect
"screenWidth": 1600,
"screenHeight": 900

E.g., a visual problem on non-composited KDE, or something like that? Or a multitouch tablet with an X11 thread-safety problem?
(bp-1ee95652-a060-445c-bbff-89a950210713 [@ libglib-2.0.so.0@0x5f0f8 ] from comment 3 also has "GL Context+", but mentions "Failed to create EGLSurface", while this one does not.)

bp-5b9b65bc-e2c6-492b-bf6a-8979d0210429 90.0a1 Ubuntu 20.10

Adapter Vendor ID Intel Corporation (0x8086)
App Notes WR! WR+ EGL? EGL- GL Context? GL Context+
GraphicsCriticalError |[G0][GFX1-]: TOpAddBlobImage failed (t=313362) <--------------------------
KDE Connect
"screenWidth": 1600,
"screenHeight": 900

Please re-test with the latest Nightly; we now remove the outdated XWindow from GtkCompositorWidget.

Should be fixed now.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WORKSFORME