Crash in shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown

RESOLVED FIXED in Firefox 63

Status

()

defect
--
critical
RESOLVED FIXED
a year ago
6 months ago

People

(Reporter: skywalker333, Assigned: gsvelto)

Tracking

(4 keywords)

45 Branch
mozilla64
x86_64
Windows
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 wontfix, firefox-esr60- wontfix, firefox60 wontfix, firefox61 wontfix, firefox62+ wontfix, firefox63+ fixed, firefox64 fixed)

Details

(Whiteboard: keep, crash signature)

Attachments

(1 attachment, 1 obsolete attachment)

Reporter

Description

a year ago
This bug was filed from the Socorro interface and is
report bp-0a0367ca-2875-4ff4-82d1-0df480180520.
=============================================================

Top 10 frames of crashing thread:

0 ntdll.dll NtWaitForAlertByThreadId 
1 ntdll.dll RtlSleepConditionVariableCS 
2 kernelbase.dll SleepConditionVariableCS 
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:58
4 xul.dll mozilla::CondVar::Wait xpcom/threads/CondVar.h:68
5 xul.dll mozilla::ThreadEventQueue<mozilla::PrioritizedEventQueue<mozilla::EventQueue> >::GetEvent xpcom/threads/ThreadEventQueue.cpp:155
6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:967
7 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:517
8 xul.dll nsThread::Shutdown xpcom/threads/nsThread.cpp:796
9 xul.dll nsThreadManager::Shutdown xpcom/threads/nsThreadManager.cpp:332

=============================================================

#3 Top Crasher in Release.
#7 Top Crasher in Beta.

Top Crashers for Firefox 60.0
Top 50 Crashing Signatures. 7 days ago

3 	2.37% 	0.44% 	shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown	992 	992 	0 	0 	1071 	0 	2017-09-27 

Top Crashers for Firefox 61.0b
Top 50 Crashing Signatures. 7 days ago

7 	1.63% 	-0.58% 	shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown	477 	477 	0 	0 	454 	0 	2017-09-27

Updated

a year ago
Crash Signature: [@ shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown] → [@ shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown] [@ shutdownhang | static bool mozilla::SpinEventLoopUntil<T> | nsThread::Shutdown | nsThreadManager::Shutdown]
Still a top crash in 62 beta 16, definitely in the top 5. It's also showing in very high volume on 61 release. 

Marissa is there anyone who can take a look at this issue? We could still take a patch in 62.

Updated

9 months ago
Assignee: nobody → erahm
All of the reports I've looked at have minidump stuff going. If you expand to view all threads we can see the "Minidump Writer" thread spinning a pretty big stack and repeatedly trying to write minidumps and waiting for them to finish:

> 3     mozglue.dll     mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&)       mozglue/misc/ConditionVariable_windows.cpp:58
> 4     xul.dll         mozilla::ThreadEventQueue<mozilla::EventQueue>::GetEvent(bool, mozilla::EventPriority*)         xpcom/threads/ThreadEventQueue.cpp:155
> 5     xul.dll         nsThread::ProcessNextEvent(bool, bool*)         xpcom/threads/nsThread.cpp:994
> 6     xul.dll         NS_ProcessNextEvent(nsIThread*, bool)   xpcom/threads/nsThreadUtils.cpp:519
> 7     xul.dll         static bool mozilla::SpinEventLoopUntil<1, <lambda_d0cce9a5ce1845d41bddf95b3b00d536> >(class mozilla::ThreadEventTarget::Dispatch::__l13::<lambda_d0cce9a5ce1845d41bddf95b3b00d536>*, class nsIThread*)         xpcom/threads/nsThreadUtils.h:324
> 8     xul.dll         mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int)       xpcom/threads/ThreadEventTarget.cpp:165
> 9     xul.dll         nsThread::Dispatch(already_AddRefed<nsIRunnable>, unsigned int)         xpcom/threads/nsThread.cpp:654
> 10    xul.dll         static void CrashReporter::NotifyDumpResult(bool, bool, class std::function<void __cdecl(bool)>*, class RefPtr<nsIThread>*)     toolkit/crashreporter/nsExceptionHandler.cpp:3796
> 11    xul.dll         static void CrashReporter::CreatePairedChildMinidumpAsync(void*, unsigned long, class nsTString<char>, class nsCOMPtr<nsIFile>, class nsIFile**, class std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, class std::function<void __cdecl(bool)>*, class RefPtr<nsIThread>*, bool)       toolkit/crashreporter/nsExceptionHandler.cpp:3859
> 12    xul.dll         static void <lambda_647dabe03738f3130114b2f535bbb481>::operator()()     toolkit/crashreporter/nsExceptionHandler.cpp:3924
> 13    xul.dll         mozilla::detail::RunnableFunction<std::function<void > >::Run()         xpcom/threads/nsThreadUtils.h:552
> 14    xul.dll         nsThread::ProcessNextEvent(bool, bool*)         xpcom/threads/nsThread.cpp:1090
> 15    xul.dll         NS_ProcessNextEvent(nsIThread*, bool)   xpcom/threads/nsThreadUtils.cpp:519
> 16    xul.dll         static bool mozilla::SpinEventLoopUntil<1, <lambda_d0cce9a5ce1845d41bddf95b3b00d536> >(class mozilla::ThreadEventTarget::Dispatch::__l13::<lambda_d0cce9a5ce1845d41bddf95b3b00d536>*, class nsIThread*)         xpcom/threads/nsThreadUtils.h:324
> 17    xul.dll         mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int)       xpcom/threads/ThreadEventTarget.cpp:165
> 18    xul.dll         nsThread::Dispatch(already_AddRefed<nsIRunnable>, unsigned int)         xpcom/threads/nsThread.cpp:654
> 19    xul.dll         static void CrashReporter::NotifyDumpResult(bool, bool, class std::function<void __cdecl(bool)>*, class RefPtr<nsIThread>*)     toolkit/crashreporter/nsExceptionHandler.cpp:3796
> 20    xul.dll         static void CrashReporter::CreatePairedChildMinidumpAsync(void*, unsigned long, class nsTString<char>, class nsCOMPtr<nsIFile>, class nsIFile**, class std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, class std::function<void __cdecl(bool)>*, class RefPtr<nsIThread>*, bool)       toolkit/crashreporter/nsExceptionHandler.cpp:3859
> 21    xul.dll         static void <lambda_647dabe03738f3130114b2f535bbb481>::operator()()     toolkit/crashreporter/nsExceptionHandler.cpp:3924
~                                                                                                                                                             

This seems related to bug 1469603, where I noted that we're intending to perform an async dispatch, but use `NS_DISPATCH_SYNC` when dispatching the runnable. My guess is that's never returning and we never shut down. :gsvelto, can you look into this?

[1] https://searchfox.org/mozilla-central/rev/2466b82b729765fb0a3ab62f812c1a96a7362478/toolkit/crashreporter/nsExceptionHandler.cpp#3836,3840
Flags: needinfo?(gsvelto)
Assignee

Comment 3

9 months ago
(In reply to Eric Rahm [:erahm] from comment #2)
> This seems related to bug 1469603, where I noted that we're intending to
> perform an async dispatch, but use `NS_DISPATCH_SYNC` when dispatching the
> runnable. My guess is that's never returning and we never shut down.
> :gsvelto, can you look into this?

Sadly enough I misunderstood your comment in that bug because I thought that if the event queue was empty we'd invoke the callback synchronously, I didn't notice that we're explicitly requesting synchronous dispatch. The lock held in [1] is definitely going to be still held in [2] because of synchronous dispatch. No matter how winding the path that goes there is, the callback is still the last step when generating a minidump so if it stops trying to acquire the lock the writer will never finish. Since the minidump writer thread's stack in the crashes have multiple instances of CrashReporter::CreatePairedChildMinidumpAsync() this hypothesis seems consistent with that of a long plugin hang.

I think I should be able to verify this with the tests we already have so I'll have a look tomorrow and report back.

[1] https://searchfox.org/mozilla-central/rev/5dbfd833bbb114afe758db4d4bdbc5b13bcc33ef/dom/plugins/ipc/PluginModuleParent.cpp#1104
[2] https://searchfox.org/mozilla-central/rev/5dbfd833bbb114afe758db4d4bdbc5b13bcc33ef/dom/plugins/ipc/PluginModuleParent.cpp#1126
Flags: needinfo?(gsvelto)
Assignee

Updated

9 months ago
Assignee: erahm → gsvelto
Status: NEW → ASSIGNED
Triaged and assigned.

Updated

9 months ago
Status: ASSIGNED → RESOLVED
Last Resolved: 9 months ago
Resolution: --- → FIXED

Updated

9 months ago
Status: RESOLVED → REOPENED
Flags: needinfo?(mmorris)
Resolution: FIXED → ---
Assignee

Comment 5

9 months ago
I've modified our plugin hang test in the hope of reproducing this consistently but couldn't. So my plan here is the following:

- Switch [1] to normal dispatch and leave the bug open. Hopefully this will be sufficient.

- If that doesn't fix it, manually revert the changes introduced by bug 1360308 and make this code synchronous again. While this might sound a little extreme the rationale is that those changes were designed to reduce hangs, but if they lead to crashes then it's preferable to stick with the hangs. Besides we intend to rip out all of this code and make minidump generation entirely OOP. The idea was to start this work after the next all hands so I believe we can live with a synchronous path here for a few more months.

[1] https://searchfox.org/mozilla-central/rev/71ef4447db179639be9eff4471f32a95423962d7/toolkit/crashreporter/nsExceptionHandler.cpp#3863
Assignee

Comment 8

9 months ago
Phabricator review ping
Flags: needinfo?(ted)
Comment on attachment 9002601 [details]
Bug 1463048 - Invoke the minidump generation callback asynchronsouly

Ted Mielczarek [:ted] [:ted.mielczarek] has approved the revision.
Attachment #9002601 - Flags: review+

Comment 10

9 months ago
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/af8353a3ecbc
Invoke the minidump generation callback asynchronsouly r=ted
Flags: needinfo?(ted)
Assignee

Comment 11

9 months ago
The tentative fix has landed but let's keep this open until we're sure that the crash rate goes down. It should be visible on the nightly channel soon enough.
Keywords: leave-open
Whiteboard: keep
Backed out changeset af8353a3ecbc (Bug 1463048) for causing Bug 1488827 a=backout

https://hg.mozilla.org/mozilla-central/rev/e401d1d666005bcf9ccf7fb9f9aa3256b9446544
Flags: needinfo?(gsvelto)
Depends on: 1488827
Assignee

Comment 14

9 months ago
This reverts the changes in bug 1360308, bug 1390143 and bug 1469603. Minidump
generation will now only happen on the main process' main thread which might
lead to hangs but is known to be fairly robust. Asynchronous generation proved
too brittle and enormously increased the complexity of this already
hard-to-read code.
Assignee

Comment 15

9 months ago
Comment on attachment 9002601 [details]
Bug 1463048 - Invoke the minidump generation callback asynchronsouly

Obsoleting this patch, I've prepared a new one that reverts all asynchronous minidump generation. It's proved to be too brittle and we've already spent too much time trying to get it working.
Attachment #9002601 - Attachment is obsolete: true
Flags: needinfo?(gsvelto)
Assignee

Comment 16

8 months ago
Review ping for the patch in attachment 9006899 [details]
Flags: needinfo?(ted)
Comment on attachment 9006899 [details]
Bug 1463048 - Remove asynchronous minidump generation

Ted Mielczarek [:ted] [:ted.mielczarek] has approved the revision.
Attachment #9006899 - Flags: review+
Flags: needinfo?(ted)
Assignee

Comment 18

8 months ago
I'm going through the final round of testing, hopefully it will land today.

Comment 19

8 months ago
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/588d96e4eec8
Remove asynchronous minidump generation r=ted
Assignee

Comment 21

8 months ago
There have been no new instances of this crash on nightly for builds including attachment 9006899 [details], but it might be too early to be 100% sure. In any case uplifting this patch should be relatively straightforward even though it's big because it just reverts older patches.
Assignee

Comment 22

8 months ago
There's been a handful of crashes with this signature but none of them are caused by the original problem. They can be told apart because among the minidump thread you won't find the "Minidump Writer" thread that was hanging. I think it's safe to close this as it's definitely fixed on nightly.

Updated

8 months ago
Duplicate of this bug: 1489119
Assignee

Comment 24

8 months ago
Still no more crashes with the original issue, this is fixed.
Status: REOPENED → RESOLVED
Last Resolved: 9 months ago8 months ago
Resolution: --- → FIXED
Reporter

Comment 25

8 months ago
[Tracking Requested - why for this release]:

There is still a high volume of crashes with this signature. It remains a top crasher. Requesting uplift for versions FF63 (Beta) and 60ESR.

(In reply to Gabriele Svelto [:gsvelto] from comment #21)
> There have been no new instances of this crash on nightly for builds
> including attachment 9006899 [details], but it might be too early to be 100%
> sure. In any case uplifting this patch should be relatively straightforward
> even though it's big because it just reverts older patches.



Top Crashers for Firefox 63.0b9
Top 50 Crashing Signatures. 7 days ago 

9 	1.1% 	new 	shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown	120 	120 	0 	0 	115 	0 	2017-09-27 
	

Top Crashers for Firefox 60.2.1esr
Top 50 Crashing Signatures. 7 days ago 

39 	0.31% 	0.09% 	shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown	59 	59 	0 	0 	58 	0 	2017-09-27 


Top Crashers for Firefox 62.0.2
Top 50 Crashing Signatures. 7 days ago 

7 	1.26% 	-0.29% 	shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown	1631 	1631 	0 	0 	1486 	0 	2017-09-27


Top Crashers for Firefox 52.9.0esr
Top 50 Crashing Signatures. 7 days ago 

49 	0.24% 	-0.01% 	shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown	167 	167 	0 	0 	101 	0 	2017-09-27

Signature report for shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown
Showing results from 7 days ago

Firefox 	64.0a1 	        3 	0.1% 	3    (Fixed)
Firefox 	63.0b9 	        119 	4.1% 	220  (Not Fixed)
Firefox 	62.0.2 	        1604 	55.5% 	902
Firefox 	62.0 	        310 	10.7% 	313
Firefox 	60.2.1esr 	60 	2.1% 	58
Firefox 	60.2.0esr 	40 	1.4% 	60
Firefox 	60.1.0esr 	11 	0.4% 	15
Firefox 	60.0.2 	        16 	0.6% 	13
Firefox 	60.0.1 	        15 	0.5% 	17
Firefox 	60.0.1esr 	8 	0.3% 	8
Firefox 	60.0esr 	4 	0.1% 	5
Firefox 	60.0.2esr 	2 	0.1% 	2
Firefox 	52.9.0esr 	165 	5.7% 	106
Firefox 	52.8.0esr 	9 	0.3% 	3
Firefox 	52.8.1esr 	3 	0.1% 	1
Firefox 	52.7.3esr 	1 	0.0% 	1
Firefox 	52.7.4esr 	1 	0.0% 	1
Firefox 	52.6.0esr 	4 	0.1% 	5
Firefox 	52.5.3esr 	3 	0.1% 	2
Firefox 	52.5.2esr 	1 	0.0% 	1
Firefox 	52.4.0esr 	1 	0.0% 	1
Firefox 	52.3.0esr 	2 	0.1% 	2
Firefox 	52.1.0esr 	4 	0.1% 	1
Firefox 	52.0.1esr 	1 	0.0% 	1
Firefox 	47.0.2esr 	10 	0.3% 	6
Firefox 	45.9.0esr 	13 	0.4% 	15
Firefox 	45.8.0esr 	20 	0.7% 	15
Firefox 	45.7.0esr 	10 	0.3% 	9
Firefox 	45.6.0esr 	9 	0.3% 	10
Firefox 	45.5.0esr 	2 	0.1% 	2
Firefox 	45.5.1esr 	1 	0.0% 	1
Status: RESOLVED → REOPENED
Keywords: top50
Resolution: FIXED → ---
Gabriele, can we uplift this patch to 63 beta ? Thanks
Flags: needinfo?(gsvelto)
Assignee

Comment 27

8 months ago
Sure, the patch applies cleanly so I'll ask for approval right away.
Flags: needinfo?(gsvelto)
Assignee

Comment 28

8 months ago
Comment on attachment 9006899 [details]
Bug 1463048 - Remove asynchronous minidump generation

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: Bug 1360308

User impact if declined: Firefox will crash on shutdown if a plugin process is hung

Is this code covered by automated tests?: No

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce: 

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): The patch is large but not risky because its been made by reversing three patches that were previously applied. It returns the affected code to its previous state which had been used for years and is well tested.

String changes made/needed: None
Attachment #9006899 - Flags: approval-mozilla-beta?
Comment on attachment 9006899 [details]
Bug 1463048 - Remove asynchronous minidump generation

Crash fix for a tracked 63 bug, uplift approved for 63 beta 12, thanks!
Attachment #9006899 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Updated

7 months ago
Depends on: 1498942
This shouldn't have been reopened.
Keywords: leave-open
Target Milestone: --- → mozilla64
Status: REOPENED → RESOLVED
Last Resolved: 8 months ago7 months ago
Resolution: --- → FIXED
What do you think of uplifting this to ESR60? The crash volume there seems significant but not terribly high.
Flags: needinfo?(gsvelto)
Assignee

Comment 33

6 months ago
I don't know if the patch applies cleanly there but I can check. The patch did not cause regressions save for bug 1498942. We've already confirmed the fix for that bug so if we uplift this one we'd also have to uplift that fix.
Flags: needinfo?(gsvelto)
That seems a bit scary. Let's let this wait till the next major ESR.
You need to log in before you can comment on or make changes to this bug.