Closed Bug 1790983 Opened 2 years ago Closed 2 years ago

Crash in [@ shutdownhang | mozilla::OffTheBooksCondVar::Wait]

Categories

(Core :: XPCOM, defect)

Desktop
All
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox106 - wontfix

People

(Reporter: wsmwk, Unassigned)

References

Details

(Keywords: crash, Whiteboard: [tbird crash])

Crash Data

Possibly during/after an update of daily build. The crash reporter was there when I logged on to the computer.

Crash report: https://crash-stats.mozilla.org/report/index/058c0c51-c3dc-4bdb-8a3d-9dd2a0220915

MOZ_CRASH Reason: Shutdown hanging at step XPCOMShutdownThreads. Something is blocking the main-thread.

Top 10 frames of crashing thread:

0 ntdll.dll NtWaitForAlertByThreadId 
1 ntdll.dll RtlSleepConditionVariableSRW 
2 KERNELBASE.dll SleepConditionVariableSRW 
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:50
4 xul.dll mozilla::TaskController::GetRunnableForMTTask xpcom/threads/TaskController.cpp:586
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1139
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:465
7 xul.dll nsThread::Shutdown xpcom/threads/nsThread.cpp:900
8 xul.dll nsThreadManager::ShutdownNonMainThreads xpcom/threads/nsThreadManager.cpp:391
9 xul.dll mozilla::ShutdownXPCOM xpcom/build/XPCOMInit.cpp:613
OS: Windows 10 → All
Product: Thunderbird → Core
Hardware: Unspecified → Desktop

#8 crash for Firefox nightly

Whiteboard: [tbird crash]

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 desktop browser crashes on nightly

For more information, please visit auto_nag documentation.

Keywords: topcrash

Seen and referenced in bug 1768344.

Tracking given the volume on beta. Greg, could we get this investigated, please? Thanks.

Flags: needinfo?(ghess)

This signature is not great. I wonder if it used to show up as something else.

For beta crashes in the last week, 54% are hanging at XPCOMShutdownThreads, 20% at AppShutdownConfirmed, 11% at AppShutdownNetTeardown, 5% at AppShutdown.

I think this is just bug 1505660 with a new signature. (Well, also probably with some other signatures mixed in there.) The crashes there are completely gone in 106 beta.

Will, do you know if something happened that broke the shutdown hang signature improvement stuff (like bug 1402037)? Maybe just the inlining? Thanks.

Component: General → XPCOM
Flags: needinfo?(willkg)
See Also: → 1505660

(By "inlining", I mean how inlined functions can now show up in the signature.)

The crashes in bug 1682899 are also completely gone in 106 beta, so maybe the AppShutdownConfirmed hangs are that.

See Also: → 1682899

The first frame of interest that's not inlined in comment 0 is mozilla::TaskController::GetRunnableForMTTask(bool) and a few frames later mozilla::SpinEventLoopUntil(nsTSubstring<char> const&, nsThread::Shutdown::<lambda_7>&&, nsIThread*) so this would have probably fallen inside bug 1542485.

It seems that the signature changes are jumbling these signatures around. We might want to ignore, or at least skip over, mozilla::OffTheBooksCondVar::Wait(), given that it hides several different stacks.
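To illustrate why skipping a frame changes which bug a report lands in, here is a toy model of frame-based signature generation. It is not Socorro's actual implementation: the function, the skip and prefix sets, and the position of the inlined mozilla::OffTheBooksCondVar::Wait frame are all assumptions chosen to reproduce the two signatures named in this bug. The rule it models is: walk the stack from the top, drop frames on a skip list, keep appending frames on a prefix list, and stop at the first frame that is neither.

```python
# Toy model only: names and rule sets below are hypothetical, not Socorro's.

# Stack from comment 0, function names only. The inlined
# mozilla::OffTheBooksCondVar::Wait frame is not listed there; its position
# here is an assumption.
STACK = [
    "NtWaitForAlertByThreadId",
    "RtlSleepConditionVariableSRW",
    "SleepConditionVariableSRW",
    "mozilla::detail::ConditionVariableImpl::wait",
    "mozilla::OffTheBooksCondVar::Wait",
    "mozilla::TaskController::GetRunnableForMTTask",
    "nsThread::ProcessNextEvent",
    "NS_ProcessNextEvent",
    "nsThread::Shutdown",
    "nsThreadManager::ShutdownNonMainThreads",
    "mozilla::ShutdownXPCOM",
]

# Frames that carry no diagnostic value and are dropped (assumed list).
BASE_SKIP = {
    "NtWaitForAlertByThreadId",
    "RtlSleepConditionVariableSRW",
    "SleepConditionVariableSRW",
    "mozilla::detail::ConditionVariableImpl::wait",
    "nsThread::ProcessNextEvent",
    "NS_ProcessNextEvent",
}

# Frames that are kept, but after which the walk continues (assumed list).
PREFIX = {"nsThread::Shutdown"}


def generate_signature(frames, skip, prefix, hang_type="shutdownhang"):
    """Join the first meaningful frames of a stack into a signature."""
    parts = []
    for frame in frames:
        if frame in skip:
            continue
        parts.append(frame)
        if frame not in prefix:
            break  # first non-prefix frame ends the signature
    return " | ".join([hang_type] + parts)


# Without the extra skip rules, the inlined wait frame ends the signature.
before = generate_signature(STACK, BASE_SKIP, PREFIX)

# With the wait frame and GetRunnableForMTTask skipped, the walk reaches the
# shutdown frames that distinguish one hang from another.
after = generate_signature(
    STACK,
    BASE_SKIP
    | {
        "mozilla::OffTheBooksCondVar::Wait",
        "mozilla::TaskController::GetRunnableForMTTask",
    },
    PREFIX,
)

print(before)  # shutdownhang | mozilla::OffTheBooksCondVar::Wait
print(after)   # shutdownhang | nsThread::Shutdown | nsThreadManager::ShutdownNonMainThreads
```

In this model, any shutdown hang whose main thread is parked in a condition-variable wait collapses into the same signature until the wait frame is skipped, which matches the jumbling described above.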

Flags: needinfo?(ghess)

[Tracking Requested - why for this release]: This is more of an "untracking request". This looks like it is a jumble of old shutdown hang crashes being combined due to signature changes, so I don't think it needs to be tracked.

See Also: → 1542485
Flags: needinfo?(willkg)

Still seeing this on nightly updates:
bp-95e2ff16-7076-45f0-87a8-2f0720220930
bp-93e2cbf3-4ef2-4d16-82fb-e3eba0220928

Perhaps related to mail accounts which are prompting for a password but don't get logged in. I have removed them, and will see if that helps.

So aggregating by MOZ_CRASH_REASON over the last week yields:

| Rank | Reason | # | % | Example | Bug/where |
|---|---|---|---|---|---|
| 1 | Shutdown hanging at step XPCOMShutdownThreads. Something is blocking the main-thread. | 76 | 58.91 % | b8bf489a-6cbc-46c1-a6b3-b3d6d0221006 | Bug 1505660 |
| 2 | Shutdown hanging at step AppShutdownConfirmed. Something is blocking the main-thread. | 12 | 9.30 % | 7923fd06-56af-4118-9c97-271b70221006 | nsJSInspector::EnterNestedEventLoop (called from JS: xpcInspector.enterNestedEventLoop(this);) |
| 3 | Shutdown hanging at step AppShutdownNetTeardown. Something is blocking the main-thread. | 12 | 9.30 % | 808c0987-71fc-4652-8bd0-f65270221005 | Bug 1633342 |
| 4 | Shutdown hanging at step AppShutdown. Something is blocking the main-thread. | 10 | 7.75 % | 7dc5302e-e5bc-4ddb-858d-b35010221006 | CacheFileIOManager::Shutdown |
| 5 | Shutdown hanging at step XPCOMWillShutdown. Something is blocking the main-thread. | 7 | 5.43 % | 28d8203e-3053-4b5a-9d16-148af0221004 | Bug 1710018 |
| 6 | Shutdown hanging at step XPCOMShutdown. Something is blocking the main-thread. | 2 | 1.55 % | 5c37520d-6722-4834-ace5-8e8980220930 | CanvasManagerParent::Shutdown |
| 7 | Shutdown hanging at step AppShutdownTeardown. Something is blocking the main-thread. | 1 | 0.78 % | c9dfe588-bd1a-499f-96d0-f79ab0221006 | TaskController::GetRunnableForMTTask does not get a next event |

Can we adjust the signatures accordingly?
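For anyone wanting to repeat this kind of breakdown, the following is a minimal sketch of grouping reports by the shutdown step named in MOZ_CRASH_REASON. It does not query Crash Stats; the input is a plain list of reason strings rebuilt from the counts in this comment. The function names are hypothetical. The listed rows sum to 120, while the percentages imply a total of about 129, so the sketch assumes 9 further reports with some other reason.

```python
import re
from collections import Counter

# Matches the reason format quoted in this bug, e.g.
# "Shutdown hanging at step XPCOMShutdownThreads. Something is blocking ..."
STEP_RE = re.compile(r"^Shutdown hanging at step (\w+)\.")


def shutdown_step(reason):
    """Return the shutdown step named in a reason string, or None."""
    match = STEP_RE.match(reason)
    return match.group(1) if match else None


def aggregate(reasons):
    """Return (step, count, percent) rows, most frequent step first."""
    counts = Counter(shutdown_step(r) for r in reasons)
    total = sum(counts.values())
    return [
        (step, n, round(100 * n / total, 2))
        for step, n in counts.most_common()
    ]


# Counts copied from the comment above.
TABLE_COUNTS = {
    "XPCOMShutdownThreads": 76,
    "AppShutdownConfirmed": 12,
    "AppShutdownNetTeardown": 12,
    "AppShutdown": 10,
    "XPCOMWillShutdown": 7,
    "XPCOMShutdown": 2,
    "AppShutdownTeardown": 1,
}

reasons = [
    f"Shutdown hanging at step {step}. Something is blocking the main-thread."
    for step, n in TABLE_COUNTS.items()
    for _ in range(n)
]
# Assumption: 9 reports with an unlisted reason, inferred from the percentages.
reasons += ["(some other reason)"] * 9

rows = aggregate(reasons)
for step, n, pct in rows:
    print(f"{step or 'other'}: {n} ({pct} %)")
```

Grouping on the step rather than on the generated signature is what separates the hangs here, since they all share the same top frames.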

Flags: needinfo?(gsvelto)

(In reply to Gabriele Svelto [:gsvelto] from comment #9)

> The first frame of interest that's not inlined in comment 0 is mozilla::TaskController::GetRunnableForMTTask(bool) and a few frames later mozilla::SpinEventLoopUntil(nsTSubstring<char> const&, nsThread::Shutdown::<lambda_7>&&, nsIThread*) so this would have probably fallen inside bug 1542485.

No, the quota manager shutdown hangs have mozilla::SpinEventLoopUntil<mozilla::ProcessFailureBehavior::ReportToCaller, 'lambda at /builds/worker/checkouts/gecko/dom/quota/ActorsParent.cpp:2956:5'>(nsTSubstring<char> const&, mozilla::dom::quota::QuotaManager::Observer::Observe::<lambda_33>&&, nsIThread*) and keep arriving on bug 1542485, AFAICS. The signature from comment 0 falls into bug 1505660, as :mccr8 suspected in comment 6.

See Also: → 1794376

I filed bug 1794587 to adjust the signatures so they essentially go back to what they were before. In these crashes the addition of inlined functions to the signature provides no benefit; quite the opposite, in fact.

Depends on: 1794587
Flags: needinfo?(gsvelto)

I'm tempted to make the signature generation ignore mozilla::TaskController::GetRunnableForMTTask() as it doesn't seem to provide much value, it's just making our shutdown hang signatures longer. Jens what do you think about it? This would affect half a dozen bugs for which the signature would change.

Flags: needinfo?(jstutte)

(In reply to Gabriele Svelto [:gsvelto] from comment #16)

> I'm tempted to make the signature generation ignore mozilla::TaskController::GetRunnableForMTTask() as it doesn't seem to provide much value, it's just making our shutdown hang signatures longer. Jens what do you think about it? This would affect half a dozen bugs for which the signature would change.

Looks reasonable, it would be great if you can keep track of the needed adjustments to existing bug signatures.

Flags: needinfo?(jstutte)

(In reply to Jens Stutte [:jstutte] from comment #17)

> Looks reasonable, it would be great if you can keep track of the needed adjustments to existing bug signatures.

I will! I can test the signatures beforehand thanks to socorro-siggen so I can add new signatures to the affected bugs even before we roll the changes into Socorro.

I've reviewed most of the affected crashes and there's almost nothing that needs to be done: in most cases the signatures will go back to what they were before we introduced inlined functions support.

Crash Signature: [@ shutdownhang | mozilla::OffTheBooksCondVar::Wait] → [@ mozilla::OffTheBooksCondVar::Wait] [@ shutdownhang | mozilla::OffTheBooksCondVar::Wait]

The bug is linked to a topcrash signature, which matches the following criteria:

  • Top 20 desktop browser crashes on release (startup)
  • Top 20 desktop browser crashes on beta
  • Top 10 desktop browser crashes on nightly
  • Top 5 desktop browser crashes on Mac on beta
  • Top 5 desktop browser crashes on Windows on beta

For more information, please visit auto_nag documentation.

(In reply to Gabriele Svelto [:gsvelto] from comment #19)

> I've reviewed most of the affected crashes and there's almost nothing that needs to be done: in most cases the signatures will go back to what they were before we introduced inlined functions support.

IIUC, this change has yet to land, right? The crashes still seem to be arriving here. Thanks!

Flags: needinfo?(gsvelto)

I did a Socorro (crash ingestion pipeline and Crash Stats) prod deploy (bug #1797179) on 2022-10-24 which picked up signature changes. Looking at the graphs for both these signatures, there are no new reports since 2022-10-24, so I think we're done here.

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #22)

> I did a Socorro (crash ingestion pipeline and Crash Stats) prod deploy (bug #1797179) on 2022-10-24 which picked up signature changes. Looking at the graphs for both these signatures, there are no new reports since 2022-10-24, so I think we're done here.

Oh, sorry for the noise, I'll just wait then to see the stats on the other bugs spike again.

Flags: needinfo?(gsvelto)

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

(In reply to Jens Stutte [:jstutte] from comment #23)

> (In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #22)
>
> > I did a Socorro (crash ingestion pipeline and Crash Stats) prod deploy (bug #1797179) on 2022-10-24 which picked up signature changes. Looking at the graphs for both these signatures, there are no new reports since 2022-10-24, so I think we're done here.
>
> Oh, sorry for the noise, I'll just wait then to see the stats on the other bugs spike again.

I examined just a few bugs. The following have increased since 2022-10-24:

  • Firefox Bug 1710018 - Crash in [@ shutdownhang | mozilla::PreferencesWriter::Flush]
  • Thunderbird Bug 1524247 - Shutdown crash/hang due to endless wait loop when no password is entered | WaitForSingleObjectEx | WaitForSingleObject | _PR_MD_WAIT_CV | _PR_WaitCondVar | PR_WaitCondVar | mozilla::CondVar::Wait | nsEventQueue::GetEvent (but this is all version 91)

The following previously mentioned bugs did not increase:

  • Bug 1505660 - Crash in shutdownhang | nsThread::Shutdown | nsThreadManager::ShutdownNonMainThreads (but it did increase greatly in September-October)
  • Bug 1633342 - [meta] Crash in [mozilla::net::nsHttpConnectionMgr::Shutdown] and other net related places. Shutdown hang.

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

Since the crash volume is low (less than 5 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit auto_nag documentation.

Severity: S2 → S3
Keywords: topcrash

Signatures are back to normal since bug 1790983 landed, so let's close this.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED