1706500 - Crash in [@ shutdownhang | __futex_abstimed_wait_common64]

Reporter

Description

•

4 years ago

I updated to Firefox Nightly 90.0a1 (2021-4-20) in a Fedora 34 KDE Plasma installation and started it on Wayland in a Plasma 5.21.4 Wayland session. Firefox started on X instead, due to the errors I reported at https://bugzilla.mozilla.org/show_bug.cgi?id=1706452. I closed Firefox and tried to open it again, but a message stating that Firefox was still running was shown. The crash reporter appeared about a minute after I closed Firefox. The reason for the crash was "Shutdown hanging after all known phases and workers finished." This crash usually doesn't happen; I've only seen a shutdown crash with that reason once.

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/7520dc48-86ec-465d-8a90-2a4150210420

MOZ_CRASH Reason: MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)

Top 10 frames of crashing thread:

0 libpthread.so.0 __futex_abstimed_wait_common64 /usr/src/debug/glibc-2.33/sysdeps/nptl/futex-internal.c:74
1 libpthread.so.0 __pthread_cond_wait /usr/src/debug/glibc-2.33/nptl/pthread_cond_wait.c:619
2 firefox-bin mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_posix.cpp:108
3 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1093
4 libxul.so NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:548
5 libxul.so nsThreadManager::Shutdown xpcom/threads/nsThreadManager.cpp:420
6 libxul.so mozilla::ShutdownXPCOM xpcom/build/XPCOMInit.cpp:653
7 libxul.so ScopedXPCOMStartup::~ScopedXPCOMStartup toolkit/xre/nsAppRunner.cpp:1668
8 libxul.so XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:5556
9 libxul.so XRE_main toolkit/xre/nsAppRunner.cpp:5598

Pablo

Comment 1

•

4 years ago

I'm unable to reproduce this because I only have Ubuntu 20. I updated to Nightly 90.0a1 and saw no issues.

Whiteboard: QA-not-reproducible

Chris Peterson [:cpeterson]

Comment 2

•

4 years ago

This is not a new crash signature, but its crash volume has increased starting in 88 and 89:

https://crash-stats.mozilla.org/search/?signature=~__futex_abstimed_wait_common64&product=Firefox&date=%3E%3D2020-10-27T22%3A09%3A00.000Z&date=%3C2021-04-27T22%3A09%3A00.000Z&_facets=signature&_facets=version&_facets=dom_fission_enabled&_facets=platform&_facets=cpu_arch&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-version

The Fission Nightly experiment increased from ~20% Fission to ~40% Fission in 88 Nightly. We expect to see more shutdown hangs and ShutDownKills with Fission, not because Fission is causing more but simply because Fission launches (and shuts down) more content processes. So I suspect this increase in crash volume is correlated with Fission, but not caused by Fission.

Crash Signature: [@ shutdownhang | __futex_abstimed_wait_common64] → [@ shutdownhang | __futex_abstimed_wait_common64] [@ IPCError-browser | ShutDownKill | __futex_abstimed_wait_common64]

status-firefox88: --- → affected

status-firefox89: --- → affected

status-firefox-esr78: --- → unaffected

Whiteboard: QA-not-reproducible → QA-not-reproducible [not-a-fission-bug]

Mike Conley (:mconley) (:⚙️)

Comment 3

•

4 years ago

The crash reports being referred to in this bug have unfortunately expired, and we never got around to analyzing them. I'm closing this one out as INCOMPLETE.

Status: UNCONFIRMED → RESOLVED

Closed: 4 years ago

Resolution: --- → INCOMPLETE

Mike Kaply [:mkaply]

Comment 4

•

4 years ago

Here's a more current search.

https://crash-stats.mozilla.org/search/?signature=~__futex_abstimed_wait_common64&product=Firefox&date=%3E%3D2021-05-01T22%3A09%3A00.000Z&date=%3C2021-11-04T22%3A09%3A00.000Z&_facets=signature&_facets=version&_facets=dom_fission_enabled&_facets=platform&_facets=cpu_arch&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-version

Volume is really low. cpeterson, is this worth worrying about, or should we keep it closed as incomplete?

Mike Kaply [:mkaply]

Updated

•

4 years ago

Flags: needinfo?(cpeterson)

Chris Peterson [:cpeterson]

Comment 5

•

4 years ago

(In reply to Mike Kaply [:mkaply] from comment #4)

Volume is really low. cpeterson, is this worth worrying about, or should we keep it closed as incomplete?

I think we should keep a crash bug open as long as we are still receiving crash reports for it, since the cause hasn't been resolved. Here's a crash report from Nightly 96:

bp-094ff203-d415-4e5a-b866-99aea0211103

I see the crash reason is MOZ_CRASH(Shutdown hanging after all known phases and workers finished.), which is the same as in the intermittent test crash bug 1719400. The users and tests might be hanging for different reasons, so these bugs aren't necessarily duplicates.

Status: RESOLVED → REOPENED

status-firefox93: --- → affected

status-firefox94: --- → affected

status-firefox95: --- → affected

status-firefox96: --- → affected

status-firefox-esr78: unaffected → affected

status-firefox-esr91: --- → affected

Ever confirmed: true

Flags: needinfo?(cpeterson)

Keywords: hang

Product: Firefox → Core

Resolution: INCOMPLETE → ---

Updated

•

4 years ago

Component: General → XPCOM

Viktor Jägersküpper

Comment 6

•

4 years ago

I looked at my crash reports and found an old one which has the signature [@ shutdownhang | __futex_abstimed_wait_common64 ] linked to this bug, see here:
https://crash-stats.mozilla.org/report/index/4c04f911-696a-47c7-8f10-a40f10210517

It doesn't list DOMFissionEnabled = 1 in the crash annotations and the telemetry environment says fissionEnabled: false, so I guess this bug isn't related to Fission and you might want to change the whiteboard tag.

I assume that the report will soon be deleted, so here is some of its information:

MOZ_CRASH Reason (Sanitized): MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)

Top 10 frames of crashing thread:

0 	libpthread.so.0 	__futex_abstimed_wait_common64
1 	libpthread.so.0 	__pthread_cond_wait
2 	firefox-bin 	mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&) 	mozglue/misc/ConditionVariable_posix.cpp:108
3 	libxul.so 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1093
4 	libxul.so 	nsThreadManager::Shutdown() 	xpcom/threads/nsThreadManager.cpp:420
5 	libxul.so 	mozilla::ShutdownXPCOM(nsIServiceManager*) 	xpcom/build/XPCOMInit.cpp:655
6 	libxul.so 	ScopedXPCOMStartup::~ScopedXPCOMStartup() 	toolkit/xre/nsAppRunner.cpp:1674
7 	libxul.so 	XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:5582
8 	libxul.so 	XRE_main(int, char**, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:5624
9 	firefox-bin 	main 	browser/app/nsBrowserApp.cpp:351

Wayne Mery (:wsmwk)

Comment 7

•

3 years ago

Still very rare...

Thunderbird 101.0a1 shutdownhang | __futex_abstimed_wait_common64 bp-67ba2198-fbfe-41a0-82e8-680ce0220419
Firefox 101.0a1 shutdownhang | __futex_abstimed_wait_common64 bp-3f94d5fc-6a22-4b3a-97db-5f1aa0220430

Whiteboard: QA-not-reproducible → [QA-not-reproducible][tbird crash]

Nika Layzell [:nika] (ni? for response)

Comment 8

•

3 years ago

Unfortunately this stack is quite bad, as the __futex_abstimed_wait_common64 stack doesn't tell us anything about why the futex is being waited on. There are a bunch of different failures like bug 1782445, which also have bad signatures.

:jstutte, do we have a common bug we can/should be duping these to? We might also want to add these calls to the prefix list so that they're split into more useful hang stacks.
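As a rough illustration of the prefix-list idea raised above, the following Python sketch models how a signature could be assembled from a stack: frames on a prefix list are appended and the walk continues, while the first frame not on the list ends the signature. This is a simplified model and not the actual crash-stats signature generator; `build_signature` and the choice of which frames to treat as prefixes are assumptions made for the example. The frame names are taken from the stack in the description.

```python
def build_signature(frames, prefix_list, kind="shutdownhang"):
    """Build a ' | '-joined signature from a stack (hypothetical helper).

    Frames in prefix_list are kept and the walk continues to the next
    frame; the first frame that is not a prefix ends the signature.
    """
    parts = [kind]
    for frame in frames:
        parts.append(frame)
        if frame not in prefix_list:
            break  # first non-prefix frame closes the signature
    return " | ".join(parts)


# Frames 0-5 of the crashing thread from the description.
FRAMES = [
    "__futex_abstimed_wait_common64",
    "__pthread_cond_wait",
    "mozilla::detail::ConditionVariableImpl::wait",
    "nsThread::ProcessNextEvent",
    "NS_ProcessNextEvent",
    "nsThreadManager::Shutdown",
]

# Assumption: treat the generic wait/event-loop frames as prefixes.
GENERIC_WAIT_FRAMES = set(FRAMES[:5])

if __name__ == "__main__":
    print(build_signature(FRAMES, prefix_list=set()))
    print(build_signature(FRAMES, prefix_list=GENERIC_WAIT_FRAMES))
```

With an empty prefix list every such report collapses to the futex frame. Once the generic wait and event-loop frames are treated as prefixes, the signature extends to the first frame that says something about the caller.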

Severity: -- → S3

Flags: needinfo?(jstutte)

Jens Stutte [:jstutte]

Comment 9

•

3 years ago

The most helpful way to facet these is through the xpcom spin event loop stack, I think. Not sure if we can automate this meaningfully, though.

I did not check all reports, but the bug to use is probably bug 1505660, with a signature like

[@ shutdownhang | mozilla::SpinEventLoopUntil | nsThread::Shutdown | nsThreadManager::ShutdownNonMainThreads ]

:gsvelto, any ideas how we can prefix these better?

Flags: needinfo?(jstutte) → needinfo?(gsvelto)

Gabriele Svelto [:gsvelto]

Comment 10

•

3 years ago

If xpcom spin event loop stack is a better fit than the actual stack then we could use that for the crash signature for shutdown hangs. That is we could make this crash signature go from [@ shutdownhang | __futex_abstimed_wait_common64] to shutdownhang | default: CompositorThreadHolder::Shutdown and this one similarly turn into shutdownhang | default: ThreadEventTarget::Dispatch. Would this be better? What should we use for crashes where the annotation is empty like this one?
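To make the proposal concrete, here is a minimal Python sketch of deriving a shutdown hang signature from the spin event loop annotation. The function name `shutdownhang_signature` is hypothetical, and the fallback to the stack-based signature when the annotation is empty is purely an assumption for illustration: the question of what to use in that case is left open in this comment.

```python
def shutdownhang_signature(spin_event_loop_stack, stack_signature):
    """Prefer the spin event loop annotation over the raw stack frame.

    spin_event_loop_stack: annotation text, e.g.
        "default: CompositorThreadHolder::Shutdown" (may be empty or None).
    stack_signature: the frame-based signature, used here as a fallback.
        ASSUMPTION: this fallback is not something the thread decided on.
    """
    annotation = (spin_event_loop_stack or "").strip()
    if annotation:
        return f"shutdownhang | {annotation}"
    return f"shutdownhang | {stack_signature}"


if __name__ == "__main__":
    print(shutdownhang_signature(
        "default: CompositorThreadHolder::Shutdown",
        "__futex_abstimed_wait_common64",
    ))
    print(shutdownhang_signature("", "__futex_abstimed_wait_common64"))
```

Under this assumption, reports with an empty annotation would simply keep the current signature, so they would still need a separate answer.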

Flags: needinfo?(gsvelto)

Jens Stutte [:jstutte]

Comment 11

•

3 years ago

(In reply to Gabriele Svelto [:gsvelto] from comment #10)

That is we could make this crash signature go from [@ shutdownhang | __futex_abstimed_wait_common64] to shutdownhang | default: CompositorThreadHolder::Shutdown and this one similarly turn into shutdownhang | default: ThreadEventTarget::Dispatch. Would this be better? What should we use for crashes where the annotation is empty like this one?

I think we should take care to look only at reports that have nsThread::Shutdown on the stack here and make those point to bug 1505660. The rest are different cases, as you can also see from the shutdown phase in the MOZ_CRASH reason. Those with nsThread::Shutdown should all have a SpinEventLoopUntil on the stack. The rest need further analysis.

If we then want to use xpcom spin event loop stack we should revisit also some signatures that already arrive on bug 1505660. So probably the first step is to have a signature that makes it easier to assign them to bug 1505660 to have a common starting point for further analysis.
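The triage rule described in this comment could be sketched as follows. This is only an illustration under stated assumptions: `triage` is a hypothetical helper, frames are assumed to be plain function-name strings (optionally with an argument list), and the second return value just records whether the expected SpinEventLoopUntil frame is present.

```python
def function_name(frame):
    """Strip any argument list, e.g. 'nsThread::Shutdown()' -> 'nsThread::Shutdown'."""
    return frame.split("(", 1)[0].strip()


def triage(frames):
    """Return (bucket, has_spin_event_loop) for one report's stack.

    Reports with nsThread::Shutdown on the stack go to bug 1505660;
    everything else is left for further analysis.
    """
    names = [function_name(f) for f in frames]
    has_spin = any("SpinEventLoopUntil" in n for n in names)
    if "nsThread::Shutdown" in names:
        return ("bug 1505660", has_spin)
    return ("needs further analysis", has_spin)


if __name__ == "__main__":
    print(triage([
        "mozilla::SpinEventLoopUntil<T>",
        "nsThread::Shutdown()",
        "nsThreadManager::ShutdownNonMainThreads",
    ]))
    print(triage([
        "nsThread::ProcessNextEvent(bool, bool*)",
        "nsThreadManager::Shutdown()",
    ]))
```

The match is on the exact function name, so nsThreadManager::Shutdown alone does not route a report to bug 1505660.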

Gabriele Svelto [:gsvelto]

Comment 12

•

3 years ago

I can make some changes to "peel off" the crashes that should fall under bug 1505660 from this signature. Those changes would alter the other crashes under this signature too. Here are some examples of the new signatures:

0cd0c04a-33bd-49d0-9e0f-4a8bb0221211
[@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::layers::CompositorThreadHolder::Shutdown]
ee69c3db-1d9a-4ddc-9e2d-5436a0221116
[@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::ThreadEventTarget::Dispatch]
d29f54de-9b38-4d0a-bc05-768570221214
[@ shutdownhang | mozilla::DataStorage::Observe]
d575cb18-0625-4ea9-b03f-4c4950221214
[@ shutdownhang | mozilla::SpinEventLoopUntil<T> | nsThread::Shutdown | nsThreadManager::ShutdownNonMainThreads]

The last one would fall under bug 1505660. Would this work for you?

Jens Stutte [:jstutte]

Comment 13

•

3 years ago

This looks like a good improvement, thanks!

Gabriele Svelto [:gsvelto]

Updated

•

3 years ago

Depends on: 1806107

Gabriele Svelto [:gsvelto]

Comment 14

•

3 years ago

Now that bug 1806107 has landed, the signatures here should disappear and break away into separate ones, as I described in comment 12.

Gabriele Svelto [:gsvelto]

Comment 15

•

3 years ago

The patch in bug 1806107 was insufficient to clean up the signatures, so I'll file another bug.

Gabriele Svelto [:gsvelto]

Updated

•

3 years ago

Depends on: 1810519


Categories: (Core :: XPCOM, defect)

People: (Reporter: matt.fagnani, Unassigned)

Details: (Keywords: hang, Whiteboard: [QA-not-reproducible][tbird crash])

Security: (public)
