Crash in [@ mozilla::detail::MutexImpl::lock | mozilla::layers::NativeSurfaceWayland::FrameCallbackHandler]
Categories
(Core :: Graphics: WebRender, defect, P4)
Tracking
Release | Tracking | Status
---|---|---
firefox-esr78 | --- | unaffected
firefox89 | --- | unaffected
firefox90 | --- | unaffected
firefox91 | --- | fixed
People
(Reporter: nical, Assigned: rmader)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: crash, regression)
Attachments
(1 file)
I'm getting a lot of crashes with Nightly with Wayland enabled (MOZ_ENABLE_WAYLAND=1 env var) and the Wayland compositor enabled (gfx.webrender.compositor.force-enabled = true). I'm running on top of GNOME 40.1.0 (Fedora).
I think I hit it every time I tried to use the Gecko profiler, at the point where the profiler interface is shown.
Crash report: https://crash-stats.mozilla.org/report/index/326a7349-2f62-41ba-a390-1a7bc0210611
MOZ_CRASH Reason: MOZ_CRASH(mozilla::detail::MutexImpl::mutexLock: pthread_mutex_lock failed)
Top 10 frames of crashing thread:
0 firefox-bin mozilla::detail::MutexImpl::lock mozglue/misc/Mutex_posix.cpp:118
1 libxul.so mozilla::layers::NativeSurfaceWayland::FrameCallbackHandler gfx/layers/SurfacePoolWayland.cpp:167
2 libffi.so.6 libffi.so.6@0x6c03
3 libffi.so.6 libffi.so.6@0x6106
4 libwayland-client.so.0 libwayland-client.so.0@0x6d0f
5 libwayland-client.so.0 libwayland-client.so.0@0x742a
6 libwayland-client.so.0 libwayland-client.so.0@0x761b
7 libxul.so {virtual override thunk}
8 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1075
9 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:107
Comment 1 • 4 years ago (Assignee)
Thanks for testing! For some reason I haven't managed to reproduce that here. Do you happen to know when this condition can happen? Only if the mutex is already gone, or if the current thread already locked the mutex, or both?
Comment 2 • 4 years ago
I've seen these crashes while using YouTube. (I also have a boatload of pinned tabs, so that might not mean much.)
Comment 3 • 4 years ago
(In reply to Robert Mader [:rmader] from comment #1)
Thanks for testing! For some reason I haven't managed to reproduce that here. Do you happen to know when this condition can happen? Only if the mutex is already gone, or if the current thread already locked the mutex, or both?
Attempts to recursively lock a non-recursive mutex can be detected (PTHREAD_MUTEX_ERRORCHECK), but this is only enabled with --enable-debug. Otherwise it will just deadlock.
Comment 4 • 4 years ago (Assignee)
(In reply to Jan Alexander Steffens [:heftig] from comment #3)
Attempts to recursively lock a non-recursive mutex can be detected (PTHREAD_MUTEX_ERRORCHECK), but this is only enabled with --enable-debug. Otherwise it will just deadlock.
Do I understand you correctly that a crash implies use after free then? As a recursive lock would just deadlock, not crash?
Comment 5 • 4 years ago
(In reply to Robert Mader [:rmader] from comment #4)
Do I understand you correctly that a crash implies use after free then? As a recursive lock would just deadlock, not crash?
Yes, I'm suspecting a use-after-free here.
Comment 6 • 4 years ago (Assignee)
Unlike buffer release callbacks, which can't happen after the
corresponding buffer was destroyed, frame callbacks can apparently
arrive even if the corresponding surface was destroyed.
This kinda makes sense as frame callbacks have independent objects
which actually can get destroyed manually.
Comment 7 • 4 years ago (Assignee)
Jan, Nico: I've still not been able to reproduce the issue but it would make sense if the patch above solves it. Could you try the following try build and check if it fixes the issues for you? That would be awesome!
https://treeherder.mozilla.org/jobs?repo=try&revision=178a6bceccac26c263486707dca7819da79b4dde
Comment 8 • 4 years ago
Seems I can get Nightly to crash quite often (but not reliably) by having some animated page open (I used https://www.w3.org/2010/05/video/mediaevents.html with the video playing) and then rapidly opening (Ctrl+T) and closing (Ctrl+W) a new tab.
The try build crashed as well (bp-ae7e2b28-0a4d-4a7d-95cf-599400210618), but in pthread_mutex_destroy.
Comment 9 • 4 years ago (Assignee)
Thanks for testing! Still unable to reproduce, but maybe I need a bigger screen like 4K or so to trigger it - then we produce way more tiles.
Can I ask you for one more try with the following build? https://treeherder.mozilla.org/jobs?repo=try&revision=541c8cc0ac3e0b81c3a83e007e20b954ce40991e
It waits for the lock in the destructor and thus should avoid crashing in pthread_mutex_destroy.
Comment 10 • 4 years ago
Got a segfault: bp-42508c5e-d3bc-4719-8991-e57a20210619
PS: Haven't managed to reproduce this one yet. The STR from comment #8 no longer seem to work.
Comment 11 • 4 years ago
(In reply to Robert Mader [:rmader] from comment #9)
Thanks for testing! Still unable to reproduce, but maybe I need a bigger screen like 4K or so to trigger it - then we produce way more tiles.
You could try reducing gfx.webrender.picture-tile-height and -width?
Values too low make Firefox no longer start, though (Error flushing display: Resource temporarily unavailable). For me (3840×2400@2), 256x256 still works and 128x128 does not. If large tile counts cause crashes, does this mean this could get triggered by the automatic tile subdivision?
Comment 12 • 4 years ago
@rmader Regarding maybe needing a bigger screen to produce more tiles, I experienced the crash multiple times and I am on FullHD, but use 75% scaling, so maybe scaling can help to reproduce it more reliably? Just a thought, feel free to ignore if not useful!
Comment 13 • 4 years ago (Assignee)
(In reply to Jan Alexander Steffens [:heftig] from comment #10)
Got a segfault: bp-42508c5e-d3bc-4719-8991-e57a20210619
PS: Haven't managed to reproduce this one yet. The STR from comment #8 no longer seem to work.
So that's an unrelated/new issue? And the patch seems to fix the issue here?
(In reply to Jan Alexander Steffens [:heftig] from comment #11)
You could try reducing gfx.webrender.picture-tile-height and -width?
Values too low make Firefox no longer start, though (Error flushing display: Resource temporarily unavailable). For me (3840×2400@2), 256x256 still works and 128x128 does not. If large tile counts cause crashes, does this mean this could get triggered by the automatic tile subdivision?
Thanks for the hint. I think it's more about the fact that more tiles get created and destroyed, making it more likely to hit the issue.
(In reply to pirminbraun16 from comment #12)
@rmader Regarding maybe needing a bigger screen to produce more tiles, I experienced the crash multiple times and I am on FullHD, but use 75% scaling, so maybe scaling can help to reproduce it more reliably? Just a thought, feel free to ignore if not useful!
Thanks for the hint as well :)
Comment 14 • 4 years ago
Comment 15 • 4 years ago
bugherder