Crash in [@ shutdownhang | __futex_abstimed_wait_common64]
Categories: Core :: XPCOM, defect
People: Reporter: matt.fagnani; Unassigned
Details: Keywords: hang; Whiteboard: [QA-not-reproducible][tbird crash]
I updated to Firefox Nightly 90.0a1 (2021-4-20) in a Fedora 34 KDE Plasma installation. I started Firefox Nightly 90.0a1 (2021-4-20) in Plasma 5.21.4 on Wayland, but Firefox started on X instead because of the errors I reported at https://bugzilla.mozilla.org/show_bug.cgi?id=1706452. I closed Firefox and then tried to open it again, but a message stating that Firefox was still running was shown. The crash reporter appeared about a minute after I closed Firefox. The reason for the crash was "Shutdown hanging after all known phases and workers finished." This crash usually doesn't happen; I've only seen a shutdown crash with that reason once.
Possibly Fission-related (DOMFissionEnabled=1).
Crash report: https://crash-stats.mozilla.org/report/index/7520dc48-86ec-465d-8a90-2a4150210420
MOZ_CRASH Reason: MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
Top 10 frames of crashing thread:
0 libpthread.so.0 __futex_abstimed_wait_common64 /usr/src/debug/glibc-2.33/sysdeps/nptl/futex-internal.c:74
1 libpthread.so.0 __pthread_cond_wait /usr/src/debug/glibc-2.33/nptl/pthread_cond_wait.c:619
2 firefox-bin mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_posix.cpp:108
3 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1093
4 libxul.so NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:548
5 libxul.so nsThreadManager::Shutdown xpcom/threads/nsThreadManager.cpp:420
6 libxul.so mozilla::ShutdownXPCOM xpcom/build/XPCOMInit.cpp:653
7 libxul.so ScopedXPCOMStartup::~ScopedXPCOMStartup toolkit/xre/nsAppRunner.cpp:1668
8 libxul.so XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:5556
9 libxul.so XRE_main toolkit/xre/nsAppRunner.cpp:5598
I'm unable to reproduce this since I only have Ubuntu 20. I updated to Nightly 90.0a1 and saw no issues.
Comment 2•4 years ago
This is not a new crash signature, but its crash volume has increased since Firefox 88 and 89.
The Fission Nightly experiment increased from ~20% Fission to ~40% Fission in Nightly 88. We expect to see more shutdown hangs and ShutDownKills with Fission, not because Fission is causing more, but simply because Fission launches (and shuts down) more content processes. So I suspect this increase in crash volume is correlated with Fission, but not caused by Fission.
Comment 3•4 years ago
The crash reports being referred to in this bug have unfortunately expired, and we never got around to analyzing them. I'm closing this one out as INCOMPLETE.
Comment 4•4 years ago
Here's a more current search.
Volume is really low. cpeterson, is this worth worrying about, or should we keep it closed as incomplete?
Comment 5•4 years ago
(In reply to Mike Kaply [:mkaply] from comment #4)
Volume is really low. cpeterson, is this worth worrying about, or should we keep it closed as incomplete?
I think we should keep a crash bug open as long as we are still receiving crash reports for it, since the cause hasn't been resolved. Here's a crash report from Nightly 96:
bp-094ff203-d415-4e5a-b866-99aea0211103
I see the crash reason is MOZ_CRASH(Shutdown hanging after all known phases and workers finished.), which is the same as in the intermittent test crash bug 1719400. The users and tests might be hanging for different reasons, so these bugs aren't necessarily duplicates.
Comment 6•4 years ago
I looked at my crash reports and found an old one with the signature [@ shutdownhang | __futex_abstimed_wait_common64 ] linked to this bug:
https://crash-stats.mozilla.org/report/index/4c04f911-696a-47c7-8f10-a40f10210517
It doesn't list DOMFissionEnabled = 1 in the crash annotations, and the telemetry environment says fissionEnabled: false. So I guess this bug isn't related to Fission, and you might want to change the whiteboard tag.
I assume that the report will soon be deleted, so here is some of its information:
MOZ_CRASH Reason (Sanitized): MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
Top 10 frames of crashing thread:
0 libpthread.so.0 __futex_abstimed_wait_common64
1 libpthread.so.0 __pthread_cond_wait
2 firefox-bin mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&) mozglue/misc/ConditionVariable_posix.cpp:108
3 libxul.so nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1093
4 libxul.so nsThreadManager::Shutdown() xpcom/threads/nsThreadManager.cpp:420
5 libxul.so mozilla::ShutdownXPCOM(nsIServiceManager*) xpcom/build/XPCOMInit.cpp:655
6 libxul.so ScopedXPCOMStartup::~ScopedXPCOMStartup() toolkit/xre/nsAppRunner.cpp:1674
7 libxul.so XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) toolkit/xre/nsAppRunner.cpp:5582
8 libxul.so XRE_main(int, char**, mozilla::BootstrapConfig const&) toolkit/xre/nsAppRunner.cpp:5624
9 firefox-bin main browser/app/nsBrowserApp.cpp:351
Comment 7•3 years ago
Still very rare...
Thunderbird 101.0a1 shutdownhang | __futex_abstimed_wait_common64 bp-67ba2198-fbfe-41a0-82e8-680ce0220419
Firefox 101.0a1 shutdownhang | __futex_abstimed_wait_common64 bp-3f94d5fc-6a22-4b3a-97db-5f1aa0220430
Comment 8•3 years ago
Unfortunately this stack is quite bad, as the __futex_abstimed_wait_common64 frame doesn't tell us anything about why the futex is being waited on. There are a bunch of different failures like bug 1782445, which also have bad signatures.
:jstutte, do we have a common bug we can/should be duping these to? We might also want to add these calls to the prefix list so that they're split into more useful hang stacks.
Comment 9•3 years ago
I think the most helpful way to facet these is through the xpcom spin event loop stack. I'm not sure if we can automate this meaningfully, though.
I did not check all reports, but the bug to use is probably bug 1505660, with a signature like
[@ shutdownhang | mozilla::SpinEventLoopUntil | nsThread::Shutdown | nsThreadManager::ShutdownNonMainThreads ]
:gsvelto, any ideas how we can prefix these better?
Comment 10•3 years ago
If the xpcom spin event loop stack is a better fit than the actual stack, then we could use it for the crash signature of shutdown hangs. That is, we could make this crash signature go from [@ shutdownhang | __futex_abstimed_wait_common64] to shutdownhang | default: CompositorThreadHolder::Shutdown, and this one similarly turn into shutdownhang | default: ThreadEventTarget::Dispatch. Would this be better? What should we use for crashes where the annotation is empty, like this one?
Comment 11•3 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #10)
That is we could make this crash signature go from [@ shutdownhang | __futex_abstimed_wait_common64] to shutdownhang | default: CompositorThreadHolder::Shutdown and this one similarly turn into shutdownhang | default: ThreadEventTarget::Dispatch. Would this be better? What should we use for crashes where the annotation is empty like this one?
I think we should take care to look only at reports that have nsThread::Shutdown on the stack here and point those to bug 1505660. The rest are different cases, as you can also see from the shutdown phase in the MOZ_CRASH reason. Those with nsThread::Shutdown should all have a SpinEventLoopUntil on the stack. The rest need further analysis.
If we then want to use the xpcom spin event loop stack, we should also revisit some signatures that already arrive at bug 1505660. So the first step is probably to have a signature that makes it easier to assign them to bug 1505660, as a common starting point for further analysis.
Comment 12•3 years ago
I can make some changes to "peel off" the crashes that should fall under bug 1505660 from this signature. Those changes would alter the other crashes under this signature too. Here are some examples of the new signatures:
- 0cd0c04a-33bd-49d0-9e0f-4a8bb0221211
  [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::layers::CompositorThreadHolder::Shutdown]
- ee69c3db-1d9a-4ddc-9e2d-5436a0221116
  [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::ThreadEventTarget::Dispatch]
- d29f54de-9b38-4d0a-bc05-768570221214
  [@ shutdownhang | mozilla::DataStorage::Observe]
- d575cb18-0625-4ea9-b03f-4c4950221214
  [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | nsThread::Shutdown | nsThreadManager::ShutdownNonMainThreads]
The last one would fall under bug 1505660. Would this work for you?
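To make the "prefix list" idea from comment 8 concrete, here is a simplified Python sketch of how a signature could be built from the crashing thread's frames: frames on an "irrelevant" list (the futex/condition-variable wait and event-loop plumbing) are skipped, frames on a "prefix" list are kept while the walk continues, and the first other frame ends the signature. This is not Socorro's actual implementation. The IRRELEVANT and PREFIX lists, the make_signature function, and the second stack are hypothetical, chosen only so the output has the same shape as the signatures listed above; the first stack is the top of the report from the original description.

```python
import re

# HYPOTHETICAL, simplified stand-ins for signature-generation lists.
# Frames matching IRRELEVANT are skipped entirely.
IRRELEVANT = (
    r"__futex_abstimed_wait_common64",
    r"__pthread_cond_wait",
    r"mozilla::detail::ConditionVariableImpl::wait",
    r"nsThread::ProcessNextEvent",
    r"NS_ProcessNextEvent",
)
# Frames matching PREFIX are kept, and the walk continues to the next frame.
PREFIX = (
    r"mozilla::SpinEventLoopUntil",
    r"nsThread::Shutdown",
)


def _matches(frame, patterns):
    """True if the frame name starts with any of the regex patterns."""
    return any(re.match(p, frame) for p in patterns)


def make_signature(frames, shutdown_hang, irrelevant=IRRELEVANT, prefix=PREFIX):
    """Build a signature from the crashing thread's frames (top frame first)."""
    parts = []
    for frame in frames:
        if _matches(frame, irrelevant):
            continue  # skip wait / event-loop plumbing frames
        parts.append(frame)
        if not _matches(frame, prefix):
            break  # first non-prefix frame ends the signature
    if shutdown_hang:
        parts.insert(0, "shutdownhang")
    return " | ".join(parts)


# Top frames from the report in the original description.
comment0 = [
    "__futex_abstimed_wait_common64",
    "__pthread_cond_wait",
    "mozilla::detail::ConditionVariableImpl::wait",
    "nsThread::ProcessNextEvent",
    "NS_ProcessNextEvent",
    "nsThreadManager::Shutdown",
    "mozilla::ShutdownXPCOM",
]

# HYPOTHETICAL stack of the kind that would fall under bug 1505660.
nonmain = [
    "__futex_abstimed_wait_common64",
    "__pthread_cond_wait",
    "mozilla::detail::ConditionVariableImpl::wait",
    "nsThread::ProcessNextEvent",
    "NS_ProcessNextEvent",
    "mozilla::SpinEventLoopUntil<T>",
    "nsThread::Shutdown",
    "nsThreadManager::ShutdownNonMainThreads",
    "nsThreadManager::Shutdown",
]

# No skip lists: the top frame alone becomes the signature.
print(make_signature(comment0, True, irrelevant=(), prefix=()))
# → shutdownhang | __futex_abstimed_wait_common64
print(make_signature(comment0, True))
# → shutdownhang | nsThreadManager::Shutdown
print(make_signature(nonmain, True))
# → shutdownhang | mozilla::SpinEventLoopUntil<T> | nsThread::Shutdown | nsThreadManager::ShutdownNonMainThreads
```

With empty lists, the sketch reproduces the uninformative [@ shutdownhang | __futex_abstimed_wait_common64] bucket; with the lists, stacks that spin in nsThread::Shutdown split off into their own signature, which is the kind of separation discussed here.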
Comment 13•3 years ago
This looks like a good improvement, thanks!
Comment 14•3 years ago
Now that bug 1806107 has landed, the signatures here should disappear and break away into separate ones, as I described in comment 12.
Comment 15•3 years ago
The patch in bug 1806107 was insufficient to clean up the signatures, I'll file another bug.