Closed Bug 1738066 Opened 3 years ago Closed 3 years ago

Crash in [@ mozilla::detail::MutexImpl::lock | mozilla::WaylandVsyncSource::WaylandDisplay::FrameCallback]

Categories

(Core :: Widget: Gtk, defect, P2)

Firefox 95
Unspecified
Linux
defect

Tracking

RESOLVED DUPLICATE of bug 1738026
Tracking Status
firefox95 --- affected

People

(Reporter: matt.fagnani, Assigned: stransky)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

I ran Firefox Nightly 95.0a1 (2021-10-26) on Wayland in Plasma 5.22.5 in a Fedora 35 KDE Plasma installation. I selected the Help menu, then About Firefox. The About Firefox window showed "Checking for updates..." because the internet connection had disconnected unexpectedly. I closed the About Firefox window. Firefox segmentation faulted in mozilla::detail::MutexImpl::lock at mozglue/misc/Mutex_posix.cpp:118. The crash reason was MOZ_CRASH(mozilla::detail::MutexImpl::mutexLock: pthread_mutex_lock failed). The crash address was 0x0, so a null pointer dereference might have happened. I reproduced this crash a second time. I haven't seen this type of crash before.

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/642ce06c-520b-4566-9ec2-25b380211027

MOZ_CRASH Reason: MOZ_CRASH(mozilla::detail::MutexImpl::mutexLock: pthread_mutex_lock failed)

Top 10 frames of crashing thread:

0 firefox-bin mozilla::detail::MutexImpl::lock mozglue/misc/Mutex_posix.cpp:118
1 libxul.so mozilla::WaylandVsyncSource::WaylandDisplay::FrameCallback widget/gtk/WaylandVsyncSource.cpp:169
2 libffi.so.6 ffi_call_unix64 
3 libffi.so.6 ffi_call 
4 libwayland-client.so.0 wl_closure_invoke.constprop.0 /usr/src/debug/wayland-1.19.0-2.fc35.x86_64/src/connection.c:1018
5 libwayland-client.so.0 dispatch_event.isra.0 /usr/src/debug/wayland-1.19.0-2.fc35.x86_64/src/wayland-client.c:1452
6 libwayland-client.so.0 wl_display_dispatch_queue_pending /usr/src/debug/wayland-1.19.0-2.fc35.x86_64/src/wayland-client.c:1840
7 libgdk-3.so.0 _gdk_wayland_display_queue_events /usr/src/debug/gtk3-3.24.30-4.fc35.x86_64/gdk/wayland/gdkeventsource.c:201
8 libgdk-3.so.0 gdk_display_get_event /usr/src/debug/gtk3-3.24.30-4.fc35.x86_64/gdk/gdkdisplay.c:442
9 libgdk-3.so.0 gdk_event_source_dispatch.lto_priv.2 /usr/src/debug/gtk3-3.24.30-4.fc35.x86_64/gdk/wayland/gdkeventsource.c:120
Blocks: wayland
Keywords: crash

I've reproduced this crash 8 times by closing the About Firefox window while "Checking for updates..." was shown. The crash happened about 50% of the time, so a race condition might be involved. When I ran 95.0a1 (2021-10-27) on Wayland from konsole, the output showed that the WebRender compositor fell back to WebRender (Software) right before the crash, followed by the error mozilla::detail::MutexImpl::mutexLock: pthread_mutex_lock failed: Invalid argument.

[GFX1-]: window is null
[GFX1-]: Failed to create EGLSurface
[GFX1-]: Fallback WR to SW-WR
mozilla::detail::MutexImpl::mutexLock: pthread_mutex_lock failed: Invalid argument
ExceptionHandler::GenerateDump cloned child 8507
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
ExceptionHandler::WaitForContinueSignal waiting for continue signal...
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Failed to open curl lib from binary, use libcurl.so instead

I reproduced the crash twice with the ASan 95.0a1 (2021-10-27) from https://firefox-source-docs.mozilla.org/tools/sanitizer/asan_nightly.html which had some more graphics error details in the output before the crash.

Crash Annotation GraphicsCriticalError: |[0][GFX1-]: window is null (t=25.8439) [GFX1-]: window is null
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: window is null (t=25.8439) |[1][GFX1-]: Failed to create EGLSurface (t=25.8441) [GFX1-]: Failed to create EGLSurface
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: window is null (t=25.8439) |[1][GFX1-]: Failed to create EGLSurface (t=25.8441) |[2][GFX1-]: Fallback WR to SW-WR (t=25.8622) [GFX1-]: Fallback WR to SW-WR
mozilla::detail::MutexImpl::mutexLock: pthread_mutex_lock failed: Invalid argument
AddressSanitizer:DEADLYSIGNAL
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Crash Annotation GraphicsCriticalError: |[C0][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=24.6873)
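
The "Invalid argument" result and the roughly 50% reproduction rate would be consistent with a frame callback being dispatched after the object that owns the mutex has been torn down, although nothing in this report confirms that as the root cause. The following is a minimal sketch of that general pattern and of one common guard (holding only a weak reference in the queued callback). VsyncTarget, EventQueue, and RequestFrame are hypothetical names for illustration; they are not the WaylandVsyncSource code or the fix that landed.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object that owns a mutex and receives
// frame callbacks. This is NOT the real WaylandVsyncSource.
class VsyncTarget {
 public:
  void OnFrame() {
    std::lock_guard<std::mutex> lock(mMutex);
    ++mFrames;
  }
  int Frames() {
    std::lock_guard<std::mutex> lock(mMutex);
    return mFrames;
  }

 private:
  std::mutex mMutex;
  int mFrames = 0;
};

// Hypothetical event queue standing in for events that are already queued
// when the window closes and are dispatched afterwards.
class EventQueue {
 public:
  void Post(std::function<void()> callback) {
    mPending.push_back(std::move(callback));
  }
  // Runs every pending callback and returns how many were dispatched.
  int Dispatch() {
    std::vector<std::function<void()>> pending;
    pending.swap(mPending);
    for (auto& callback : pending) {
      callback();
    }
    return static_cast<int>(pending.size());
  }

 private:
  std::vector<std::function<void()>> mPending;
};

// Queues a frame callback that holds only a weak reference to the target.
// If the target is gone by dispatch time, the callback counts a skip
// instead of locking a mutex that no longer exists.
void RequestFrame(EventQueue& queue,
                  const std::shared_ptr<VsyncTarget>& target,
                  int& skipped) {
  std::weak_ptr<VsyncTarget> weak = target;
  queue.Post([weak, &skipped] {
    if (auto strong = weak.lock()) {
      strong->OnFrame();
    } else {
      ++skipped;
    }
  });
}
```

In this sketch, a callback queued with a raw pointer instead of the weak reference would call OnFrame() on a destroyed object if the target were released between RequestFrame() and Dispatch(), which is the kind of ordering that only fails some of the time.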

Is that a recent regression? Can you please use mozregression tool to find which commit caused it?
https://fedoraproject.org/wiki/How_to_debug_Firefox_problems?rd=Bug_info_Firefox#Use_Mozregression_tool
(It may be a regression from Bug 1737068)

Flags: needinfo?(matt.fagnani)
Assignee: nobody → stransky

(In reply to Martin Stránský [:stransky] (ni? me) from comment #2)

> Is that a recent regression? Can you please use mozregression tool to find which commit caused it?
> https://fedoraproject.org/wiki/How_to_debug_Firefox_problems?rd=Bug_info_Firefox#Use_Mozregression_tool
> (It may be a regression from Bug 1737068)

The first build I saw the crash in was 95.0a1 (2021-10-26) 20211026214417. I don't normally close the About Firefox window while Checking for updates is shown, so I might not have seen it before that. I only closed the About Firefox window while Checking for updates was shown because the internet connection had disconnected unexpectedly. I ran MOZ_ENABLE_WAYLAND=1 mozregression --launch 2021-10-26 to try to reproduce the crash, but the About Firefox window showed Updates disabled by the system administrator. I tried MOZ_ENABLE_WAYLAND=1 mozregression --launch 2021-10-26 --pref app.update.enabled:true but the same message was shown. Given that the crash only happened when Checking for updates was shown, this prevented me from using mozregression normally to identify the changes involved. Do you know how to enable checking for updates while using mozregression?

According to https://hg.mozilla.org/mozilla-central/rev/00569d0fc9b9 the last build without the changes in Bug 1737068 was 20211025093729. I ran MOZ_ENABLE_WAYLAND=1 mozregression --launch 20211025093729 and decompressed ~/.mozilla/mozregression/persist/2021-10-25-09-37-29--mozilla-central--firefox-95.0a1.en-US.linux-x86_64.tar.bz2. I ran that build a few times and closed About Firefox while Checking for updates was shown at least 10 times, and the crash didn't happen. I had to disconnect the internet connection before trying to reproduce the crash; otherwise the update would already have been downloaded and be applying by the time I opened About Firefox. I then ran MOZ_ENABLE_WAYLAND=1 mozregression --launch 20211025214106, which was the first build with the changes, and decompressed ~/.mozilla/mozregression/persist/2021-10-25-21-41-06--mozilla-central--firefox-95.0a1.en-US.linux-x86_64.tar.bz2. The crash happened on the first attempt with 20211025214106. Changes between 20211025093729 and 20211025214106, like those in Bug 1737068, might be the reason for this problem. ThreadSanitizer warnings about data races appeared during the automated tests (https://bugzilla.mozilla.org/show_bug.cgi?id=1737068#c7), which could be related to the crash happening about 50% of the time. Thanks.

Flags: needinfo?(matt.fagnani)
Priority: -- → P2
Regressed by: 1737068
Has Regression Range: --- → yes
Keywords: regression

(In reply to Martin Stránský [:stransky] (ni? me) from comment #2)

> Is that a recent regression? Can you please use mozregression tool to find which commit caused it?
> https://fedoraproject.org/wiki/How_to_debug_Firefox_problems?rd=Bug_info_Firefox#Use_Mozregression_tool
> (It may be a regression from Bug 1737068)

I ran MOZ_ENABLE_WAYLAND=1 mozregression --bad 20211025214106 --good 2021-10-24 because, when I used --good 2021-10-25, mozregression used 20211025214106 for both good and bad and gave this error: ERROR: The url 'https://hg.mozilla.org/mozilla-central/json-pushes?fromchange=6c01444e17210e96f5fb322c7b55a2e9f87ec0b0&tochange=6c01444e17210e96f5fb322c7b55a2e9f87ec0b0' contains no pushlog. Maybe use another range ?

I decompressed each bisected build that mozregression showed, and I tried to reproduce the crash several times with the internet connection disconnected before marking that build as good or bad. The bisection ended with the first bad change being f632271e9d62ad6b7e90acd52e77d3e02a66980e from https://bugzilla.mozilla.org/show_bug.cgi?id=1737068 as you suggested.

21:17.13 INFO: Narrowed integration regression window from [334e3d59, 77e655ba] (3 builds) to [334e3d59, f632271e] (2 builds) (~1 steps left)
21:17.13 INFO: No more integration revisions, bisection finished.
21:17.13 INFO: Last good revision: 334e3d59d932dd2850dec7c582f32f49b504e450
21:17.13 INFO: First bad revision: f632271e9d62ad6b7e90acd52e77d3e02a66980e
21:17.13 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=334e3d59d932dd2850dec7c582f32f49b504e450&tochange=f632271e9d62ad6b7e90acd52e77d3e02a66980e

Can you please try latest nightly with Bug 1738026 fixed?
Thanks.

Flags: needinfo?(matt.fagnani)

Also, if you can reproduce it with the latest nightly, please run Firefox from a terminal with the MOZ_LOG="Widget:5" environment variable set and attach the log here (when it crashes).
Thanks.

(In reply to Martin Stránský [:stransky] (ni? me) from comment #5)

> Can you please try latest nightly with Bug 1738026 fixed?
> Thanks.

I didn't see this crash with 20211030212731 or 20211031095403 after trying to reproduce it about 10 times with each build. The errors and fallback to SW-WR in comment 1 didn't appear with those builds either. Thanks.

Flags: needinfo?(matt.fagnani)

Okay, closing as dupe then.
Thanks.

Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE