Closed
Bug 1463048
Opened 7 years ago
Closed 7 years ago
Crash in shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown
Categories
(Core :: General, defect)
Tracking
()
People
(Reporter: skywalker333, Assigned: gsvelto)
References
Details
(4 keywords, Whiteboard: keep)
Crash Data
Attachments
(1 file, 1 obsolete file)
|
46 bytes,
text/x-phabricator-request
|
ted
:
review+
pascalc
:
approval-mozilla-beta+
|
Details | Review |
This bug was filed from the Socorro interface and is
report bp-0a0367ca-2875-4ff4-82d1-0df480180520.
=============================================================
Top 10 frames of crashing thread:
0 ntdll.dll NtWaitForAlertByThreadId
1 ntdll.dll RtlSleepConditionVariableCS
2 kernelbase.dll SleepConditionVariableCS
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:58
4 xul.dll mozilla::CondVar::Wait xpcom/threads/CondVar.h:68
5 xul.dll mozilla::ThreadEventQueue<mozilla::PrioritizedEventQueue<mozilla::EventQueue> >::GetEvent xpcom/threads/ThreadEventQueue.cpp:155
6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:967
7 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:517
8 xul.dll nsThread::Shutdown xpcom/threads/nsThread.cpp:796
9 xul.dll nsThreadManager::Shutdown xpcom/threads/nsThreadManager.cpp:332
=============================================================
#3 Top Crasher in Release.
#7 Top Crasher in Beta.
Top Crashers for Firefox 60.0
Top 50 Crashing Signatures. 7 days ago
3 2.37% 0.44% shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown 992 992 0 0 1071 0 2017-09-27
Top Crashers for Firefox 61.0b
Top 50 Crashing Signatures. 7 days ago
7 1.63% -0.58% shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown 477 477 0 0 454 0 2017-09-27
Updated•7 years ago
|
Crash Signature: [@ shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown] → [@ shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown]
[@ shutdownhang | static bool mozilla::SpinEventLoopUntil<T> | nsThread::Shutdown | nsThreadManager::Shutdown]
Comment 1•7 years ago
|
||
Still a top crash in 62 beta 16, definitely in the top 5. It's also showing in very high volume on 61 release.
Marissa is there anyone who can take a look at this issue? We could still take a patch in 62.
status-firefox63:
--- → affected
Flags: needinfo?(mmorris)
Updated•7 years ago
|
tracking-firefox62:
--- → +
tracking-firefox63:
--- → ?
Updated•7 years ago
|
Assignee: nobody → erahm
Comment 2•7 years ago
|
||
All of the reports I've looked at have minidump stuff going. If you expand to view all threads we can see the "Minidump Writer" thread spinning a pretty big stack and repeatedly trying to write minidumps and waiting for them to finish:
> 3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&) mozglue/misc/ConditionVariable_windows.cpp:58
> 4 xul.dll mozilla::ThreadEventQueue<mozilla::EventQueue>::GetEvent(bool, mozilla::EventPriority*) xpcom/threads/ThreadEventQueue.cpp:155
> 5 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:994
> 6 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:519
> 7 xul.dll static bool mozilla::SpinEventLoopUntil<1, <lambda_d0cce9a5ce1845d41bddf95b3b00d536> >(class mozilla::ThreadEventTarget::Dispatch::__l13::<lambda_d0cce9a5ce1845d41bddf95b3b00d536>*, class nsIThread*) xpcom/threads/nsThreadUtils.h:324
> 8 xul.dll mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/ThreadEventTarget.cpp:165
> 9 xul.dll nsThread::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/nsThread.cpp:654
> 10 xul.dll static void CrashReporter::NotifyDumpResult(bool, bool, class std::function<void __cdecl(bool)>*, class RefPtr<nsIThread>*) toolkit/crashreporter/nsExceptionHandler.cpp:3796
> 11 xul.dll static void CrashReporter::CreatePairedChildMinidumpAsync(void*, unsigned long, class nsTString<char>, class nsCOMPtr<nsIFile>, class nsIFile**, class std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, class std::function<void __cdecl(bool)>*, class RefPtr<nsIThread>*, bool) toolkit/crashreporter/nsExceptionHandler.cpp:3859
> 12 xul.dll static void <lambda_647dabe03738f3130114b2f535bbb481>::operator()() toolkit/crashreporter/nsExceptionHandler.cpp:3924
> 13 xul.dll mozilla::detail::RunnableFunction<std::function<void > >::Run() xpcom/threads/nsThreadUtils.h:552
> 14 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1090
> 15 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:519
> 16 xul.dll static bool mozilla::SpinEventLoopUntil<1, <lambda_d0cce9a5ce1845d41bddf95b3b00d536> >(class mozilla::ThreadEventTarget::Dispatch::__l13::<lambda_d0cce9a5ce1845d41bddf95b3b00d536>*, class nsIThread*) xpcom/threads/nsThreadUtils.h:324
> 17 xul.dll mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/ThreadEventTarget.cpp:165
> 18 xul.dll nsThread::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/nsThread.cpp:654
> 19 xul.dll static void CrashReporter::NotifyDumpResult(bool, bool, class std::function<void __cdecl(bool)>*, class RefPtr<nsIThread>*) toolkit/crashreporter/nsExceptionHandler.cpp:3796
> 20 xul.dll static void CrashReporter::CreatePairedChildMinidumpAsync(void*, unsigned long, class nsTString<char>, class nsCOMPtr<nsIFile>, class nsIFile**, class std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, class std::function<void __cdecl(bool)>*, class RefPtr<nsIThread>*, bool) toolkit/crashreporter/nsExceptionHandler.cpp:3859
> 21 xul.dll static void <lambda_647dabe03738f3130114b2f535bbb481>::operator()() toolkit/crashreporter/nsExceptionHandler.cpp:3924
~
This seems related to bug 1469603, where I noted that we're intending to perform an async dispatch, but use `NS_DISPATCH_SYNC` when dispatching the runnable. My guess is that's never returning and we never shut down. :gsvelto, can you look into this?
[1] https://searchfox.org/mozilla-central/rev/2466b82b729765fb0a3ab62f812c1a96a7362478/toolkit/crashreporter/nsExceptionHandler.cpp#3836,3840
Flags: needinfo?(gsvelto)
| Assignee | ||
Comment 3•7 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #2)
> This seems related to bug 1469603, where I noted that we're intending to
> perform an async dispatch, but use `NS_DISPATCH_SYNC` when dispatching the
> runnable. My guess is that's never returning and we never shut down.
> :gsvelto, can you look into this?
Sadly enough I misunderstood your comment in that bug because I thought that if the event queue was empty we'd invoke the callback synchronously, I didn't notice that we're explicitly requesting synchronous dispatch. The lock held in [1] is definitely going to be still held in [2] because of synchronous dispatch. No matter how winding the path that goes there is, the callback is still the last step when generating a minidump so if it stops trying to acquire the lock the writer will never finish. Since the minidump writer thread's stack in the crashes have multiple instances of CrashReporter::CreatePairedChildMinidumpAsync() this hypothesis seems consistent with that of a long plugin hang.
I think I should be able to verify this with the tests we already have so I'll have a look tomorrow and report back.
[1] https://searchfox.org/mozilla-central/rev/5dbfd833bbb114afe758db4d4bdbc5b13bcc33ef/dom/plugins/ipc/PluginModuleParent.cpp#1104
[2] https://searchfox.org/mozilla-central/rev/5dbfd833bbb114afe758db4d4bdbc5b13bcc33ef/dom/plugins/ipc/PluginModuleParent.cpp#1126
Flags: needinfo?(gsvelto)
| Assignee | ||
Updated•7 years ago
|
Assignee: erahm → gsvelto
Status: NEW → ASSIGNED
Comment 4•7 years ago
|
||
Triaged and assigned.
Updated•7 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
Status: RESOLVED → REOPENED
Flags: needinfo?(mmorris)
Resolution: FIXED → ---
| Assignee | ||
Comment 5•7 years ago
|
||
I've modified our plugin hang test in the hope of reproducing this consistently but couldn't. So my plan here is the following:
- Switch [1] to normal dispatch and leave the bug open. Hopefully this will be sufficient.
- If that doesn't fix it, manually revert the changes introduced by bug 1360308 and make this code synchronous again. While this might sound a little extreme the rationale is that those changes were designed to reduce hangs, but if they lead to crashes then it's preferable to stick with the hangs. Besides we intend to rip out all of this code and make minidump generation entirely OOP. The idea was to start this work after the next all hands so I believe we can live with a synchronous path here for a few more months.
[1] https://searchfox.org/mozilla-central/rev/71ef4447db179639be9eff4471f32a95423962d7/toolkit/crashreporter/nsExceptionHandler.cpp#3863
| Assignee | ||
Comment 6•7 years ago
|
||
Comment 7•7 years ago
|
||
Too late for 62 at this point.
Comment 9•7 years ago
|
||
Comment on attachment 9002601 [details]
Bug 1463048 - Invoke the minidump generation callback asynchronsouly
Ted Mielczarek [:ted] [:ted.mielczarek] has approved the revision.
Attachment #9002601 -
Flags: review+
Comment 10•7 years ago
|
||
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/af8353a3ecbc
Invoke the minidump generation callback asynchronsouly r=ted
Updated•7 years ago
|
Flags: needinfo?(ted)
| Assignee | ||
Comment 11•7 years ago
|
||
The tentative fix has landed but let's keep this open until we're sure that the crash rate goes down. It should be visible on the nightly channel soon enough.
Keywords: leave-open
Whiteboard: keep
Comment 12•7 years ago
|
||
| bugherder | ||
Comment 13•7 years ago
|
||
Backed out changeset af8353a3ecbc (Bug 1463048) for causing Bug 1488827 a=backout
https://hg.mozilla.org/mozilla-central/rev/e401d1d666005bcf9ccf7fb9f9aa3256b9446544
Flags: needinfo?(gsvelto)
| Assignee | ||
Comment 14•7 years ago
|
||
This reverts the changes in bug 1360308, bug 1390143 and bug 1469603. Minidump
generation will now only happen on the main process' main thread which might
lead to hangs but is known to be fairly robust. Asynchronous generation proved
too brittle and enormously increased the complexity of this already
hard-to-read code.
| Assignee | ||
Comment 15•7 years ago
|
||
Comment on attachment 9002601 [details]
Bug 1463048 - Invoke the minidump generation callback asynchronsouly
Obsoleting this patch, I've prepared a new one that reverts all asynchronous minidump generation. It's proved to be too brittle and we've already spent too much time trying to get it working.
Attachment #9002601 -
Attachment is obsolete: true
Flags: needinfo?(gsvelto)
| Assignee | ||
Comment 16•7 years ago
|
||
Review ping for the patch in attachment 9006899 [details]
Flags: needinfo?(ted)
Comment 17•7 years ago
|
||
Comment on attachment 9006899 [details]
Bug 1463048 - Remove asynchronous minidump generation
Ted Mielczarek [:ted] [:ted.mielczarek] has approved the revision.
Attachment #9006899 -
Flags: review+
Updated•7 years ago
|
Flags: needinfo?(ted)
| Assignee | ||
Comment 18•7 years ago
|
||
I'm going through the final round of testing, hopefully it will land today.
Comment 19•7 years ago
|
||
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/588d96e4eec8
Remove asynchronous minidump generation r=ted
Comment 20•7 years ago
|
||
| bugherder | ||
| Assignee | ||
Comment 21•7 years ago
|
||
There have been no new instances of this crash on nightly for builds including attachment 9006899 [details], but it might be too early to be 100% sure. In any case uplifting this patch should be relatively straightforward even though it's big because it just reverts older patches.
| Assignee | ||
Comment 22•7 years ago
|
||
There's been a handful of crashes with this signature but none of them are caused by the original problem. They can be told apart because among the minidump thread you won't find the "Minidump Writer" thread that was hanging. I think it's safe to close this as it's definitely fixed on nightly.
| Assignee | ||
Comment 24•7 years ago
|
||
Still no more crashes with the original issue, this is fixed.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
| Reporter | ||
Comment 25•7 years ago
|
||
[Tracking Requested - why for this release]:
There is still a high volume of crashes with this signature. It remains a top crasher. Requesting uplift for versions FF63 (Beta) and 60ESR.
(In reply to Gabriele Svelto [:gsvelto] from comment #21)
> There have been no new instances of this crash on nightly for builds
> including attachment 9006899 [details], but it might be too early to be 100%
> sure. In any case uplifting this patch should be relatively straightforward
> even though it's big because it just reverts older patches.
Top Crashers for Firefox 63.0b9
Top 50 Crashing Signatures. 7 days ago
9 1.1% new shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown 120 120 0 0 115 0 2017-09-27
Top Crashers for Firefox 60.2.1esr
Top 50 Crashing Signatures. 7 days ago
39 0.31% 0.09% shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown 59 59 0 0 58 0 2017-09-27
Top Crashers for Firefox 62.0.2
Top 50 Crashing Signatures. 7 days ago
7 1.26% -0.29% shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown 1631 1631 0 0 1486 0 2017-09-27
Top Crashers for Firefox 52.9.0esr
Top 50 Crashing Signatures. 7 days ago
49 0.24% -0.01% shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown 167 167 0 0 101 0 2017-09-27
Signature report for shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown
Showing results from 7 days ago
Firefox 64.0a1 3 0.1% 3 (Fixed)
Firefox 63.0b9 119 4.1% 220 (Not Fixed)
Firefox 62.0.2 1604 55.5% 902
Firefox 62.0 310 10.7% 313
Firefox 60.2.1esr 60 2.1% 58
Firefox 60.2.0esr 40 1.4% 60
Firefox 60.1.0esr 11 0.4% 15
Firefox 60.0.2 16 0.6% 13
Firefox 60.0.1 15 0.5% 17
Firefox 60.0.1esr 8 0.3% 8
Firefox 60.0esr 4 0.1% 5
Firefox 60.0.2esr 2 0.1% 2
Firefox 52.9.0esr 165 5.7% 106
Firefox 52.8.0esr 9 0.3% 3
Firefox 52.8.1esr 3 0.1% 1
Firefox 52.7.3esr 1 0.0% 1
Firefox 52.7.4esr 1 0.0% 1
Firefox 52.6.0esr 4 0.1% 5
Firefox 52.5.3esr 3 0.1% 2
Firefox 52.5.2esr 1 0.0% 1
Firefox 52.4.0esr 1 0.0% 1
Firefox 52.3.0esr 2 0.1% 2
Firefox 52.1.0esr 4 0.1% 1
Firefox 52.0.1esr 1 0.0% 1
Firefox 47.0.2esr 10 0.3% 6
Firefox 45.9.0esr 13 0.4% 15
Firefox 45.8.0esr 20 0.7% 15
Firefox 45.7.0esr 10 0.3% 9
Firefox 45.6.0esr 9 0.3% 10
Firefox 45.5.0esr 2 0.1% 2
Firefox 45.5.1esr 1 0.0% 1
Status: RESOLVED → REOPENED
status-firefox64:
--- → fixed
tracking-firefox-esr60:
--- → ?
Keywords: top50
Resolution: FIXED → ---
Comment 26•7 years ago
|
||
Gabriele, can we uplift this patch to 63 beta ? Thanks
Flags: needinfo?(gsvelto)
| Assignee | ||
Comment 27•7 years ago
|
||
Sure, the patch applies cleanly so I'll ask for approval right away.
Flags: needinfo?(gsvelto)
| Assignee | ||
Comment 28•7 years ago
|
||
Comment on attachment 9006899 [details]
Bug 1463048 - Remove asynchronous minidump generation
[Beta/Release Uplift Approval Request]
Feature/Bug causing the regression: Bug 1360308
User impact if declined: Firefox will crash on shutdown if a plugin process is hung
Is this code covered by automated tests?: No
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): The patch is large but not risky because its been made by reversing three patches that were previously applied. It returns the affected code to its previous state which had been used for years and is well tested.
String changes made/needed: None
Attachment #9006899 -
Flags: approval-mozilla-beta?
Comment 29•7 years ago
|
||
Comment on attachment 9006899 [details]
Bug 1463048 - Remove asynchronous minidump generation
Crash fix for a tracked 63 bug, uplift approved for 63 beta 12, thanks!
Attachment #9006899 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 30•7 years ago
|
||
| bugherder uplift | ||
Updated•7 years ago
|
Keywords: regression
Comment 31•7 years ago
|
||
This shouldn't have been reopened.
Keywords: leave-open
Target Milestone: --- → mozilla64
Updated•7 years ago
|
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Comment 32•7 years ago
|
||
What do you think of uplifting this to ESR60? The crash volume there seems significant but not terribly high.
Flags: needinfo?(gsvelto)
| Assignee | ||
Comment 33•7 years ago
|
||
I don't know if the patch applies cleanly there but I can check. The patch did not cause regressions save for bug 1498942. We've already confirmed the fix for that bug so if we uplift this one we'd also have to uplift that fix.
Flags: needinfo?(gsvelto)
Comment 34•7 years ago
|
||
That seems a bit scary. Let's let this wait till the next major ESR.
You need to log in
before you can comment on or make changes to this bug.
Description
•