Closed Bug 1463048 Opened 7 years ago Closed 7 years ago

Crash in shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown

Categories

(Core :: General, defect)

45 Branch
x86_64
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 - wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 + wontfix
firefox63 + fixed
firefox64 --- fixed

People

(Reporter: skywalker333, Assigned: gsvelto)

References

Details

(4 keywords, Whiteboard: keep)

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is report bp-0a0367ca-2875-4ff4-82d1-0df480180520. ============================================================= Top 10 frames of crashing thread: 0 ntdll.dll NtWaitForAlertByThreadId 1 ntdll.dll RtlSleepConditionVariableCS 2 kernelbase.dll SleepConditionVariableCS 3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:58 4 xul.dll mozilla::CondVar::Wait xpcom/threads/CondVar.h:68 5 xul.dll mozilla::ThreadEventQueue<mozilla::PrioritizedEventQueue<mozilla::EventQueue> >::GetEvent xpcom/threads/ThreadEventQueue.cpp:155 6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:967 7 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:517 8 xul.dll nsThread::Shutdown xpcom/threads/nsThread.cpp:796 9 xul.dll nsThreadManager::Shutdown xpcom/threads/nsThreadManager.cpp:332 ============================================================= #3 Top Crasher in Release. #7 Top Crasher in Beta. Top Crashers for Firefox 60.0 Top 50 Crashing Signatures. 7 days ago 3 2.37% 0.44% shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown 992 992 0 0 1071 0 2017-09-27 Top Crashers for Firefox 61.0b Top 50 Crashing Signatures. 7 days ago 7 1.63% -0.58% shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown 477 477 0 0 454 0 2017-09-27
Crash Signature: [@ shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown] → [@ shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown] [@ shutdownhang | static bool mozilla::SpinEventLoopUntil<T> | nsThread::Shutdown | nsThreadManager::Shutdown]
Still a top crash in 62 beta 16, definitely in the top 5. It's also showing in very high volume on 61 release. Marissa is there anyone who can take a look at this issue? We could still take a patch in 62.
Assignee: nobody → erahm
All of the reports I've looked at have minidump stuff going. If you expand to view all threads we can see the "Minidump Writer" thread spinning a pretty big stack and repeatedly trying to write minidumps and waiting for them to finish: > 3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&) mozglue/misc/ConditionVariable_windows.cpp:58 > 4 xul.dll mozilla::ThreadEventQueue<mozilla::EventQueue>::GetEvent(bool, mozilla::EventPriority*) xpcom/threads/ThreadEventQueue.cpp:155 > 5 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:994 > 6 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:519 > 7 xul.dll static bool mozilla::SpinEventLoopUntil<1, <lambda_d0cce9a5ce1845d41bddf95b3b00d536> >(class mozilla::ThreadEventTarget::Dispatch::__l13::<lambda_d0cce9a5ce1845d41bddf95b3b00d536>*, class nsIThread*) xpcom/threads/nsThreadUtils.h:324 > 8 xul.dll mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/ThreadEventTarget.cpp:165 > 9 xul.dll nsThread::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/nsThread.cpp:654 > 10 xul.dll static void CrashReporter::NotifyDumpResult(bool, bool, class std::function<void __cdecl(bool)>*, class RefPtr<nsIThread>*) toolkit/crashreporter/nsExceptionHandler.cpp:3796 > 11 xul.dll static void CrashReporter::CreatePairedChildMinidumpAsync(void*, unsigned long, class nsTString<char>, class nsCOMPtr<nsIFile>, class nsIFile**, class std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, class std::function<void __cdecl(bool)>*, class RefPtr<nsIThread>*, bool) toolkit/crashreporter/nsExceptionHandler.cpp:3859 > 12 xul.dll static void <lambda_647dabe03738f3130114b2f535bbb481>::operator()() toolkit/crashreporter/nsExceptionHandler.cpp:3924 > 13 xul.dll mozilla::detail::RunnableFunction<std::function<void > >::Run() xpcom/threads/nsThreadUtils.h:552 > 14 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1090 > 15 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:519 > 16 xul.dll static bool mozilla::SpinEventLoopUntil<1, <lambda_d0cce9a5ce1845d41bddf95b3b00d536> >(class mozilla::ThreadEventTarget::Dispatch::__l13::<lambda_d0cce9a5ce1845d41bddf95b3b00d536>*, class nsIThread*) xpcom/threads/nsThreadUtils.h:324 > 17 xul.dll mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/ThreadEventTarget.cpp:165 > 18 xul.dll nsThread::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/nsThread.cpp:654 > 19 xul.dll static void CrashReporter::NotifyDumpResult(bool, bool, class std::function<void __cdecl(bool)>*, class RefPtr<nsIThread>*) toolkit/crashreporter/nsExceptionHandler.cpp:3796 > 20 xul.dll static void CrashReporter::CreatePairedChildMinidumpAsync(void*, unsigned long, class nsTString<char>, class nsCOMPtr<nsIFile>, class nsIFile**, class std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, class std::function<void __cdecl(bool)>*, class RefPtr<nsIThread>*, bool) toolkit/crashreporter/nsExceptionHandler.cpp:3859 > 21 xul.dll static void <lambda_647dabe03738f3130114b2f535bbb481>::operator()() toolkit/crashreporter/nsExceptionHandler.cpp:3924 ~ This seems related to bug 1469603, where I noted that we're intending to perform an async dispatch, but use `NS_DISPATCH_SYNC` when dispatching the runnable. My guess is that's never returning and we never shut down. :gsvelto, can you look into this? [1] https://searchfox.org/mozilla-central/rev/2466b82b729765fb0a3ab62f812c1a96a7362478/toolkit/crashreporter/nsExceptionHandler.cpp#3836,3840
Flags: needinfo?(gsvelto)
(In reply to Eric Rahm [:erahm] from comment #2) > This seems related to bug 1469603, where I noted that we're intending to > perform an async dispatch, but use `NS_DISPATCH_SYNC` when dispatching the > runnable. My guess is that's never returning and we never shut down. > :gsvelto, can you look into this? Sadly enough I misunderstood your comment in that bug because I thought that if the event queue was empty we'd invoke the callback synchronously, I didn't notice that we're explicitly requesting synchronous dispatch. The lock held in [1] is definitely going to be still held in [2] because of synchronous dispatch. No matter how winding the path that goes there is, the callback is still the last step when generating a minidump so if it stops trying to acquire the lock the writer will never finish. Since the minidump writer thread's stack in the crashes have multiple instances of CrashReporter::CreatePairedChildMinidumpAsync() this hypothesis seems consistent with that of a long plugin hang. I think I should be able to verify this with the tests we already have so I'll have a look tomorrow and report back. [1] https://searchfox.org/mozilla-central/rev/5dbfd833bbb114afe758db4d4bdbc5b13bcc33ef/dom/plugins/ipc/PluginModuleParent.cpp#1104 [2] https://searchfox.org/mozilla-central/rev/5dbfd833bbb114afe758db4d4bdbc5b13bcc33ef/dom/plugins/ipc/PluginModuleParent.cpp#1126
Flags: needinfo?(gsvelto)
Assignee: erahm → gsvelto
Status: NEW → ASSIGNED
Triaged and assigned.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Flags: needinfo?(mmorris)
Resolution: FIXED → ---
I've modified our plugin hang test in the hope of reproducing this consistently but couldn't. So my plan here is the following: - Switch [1] to normal dispatch and leave the bug open. Hopefully this will be sufficient. - If that doesn't fix it, manually revert the changes introduced by bug 1360308 and make this code synchronous again. While this might sound a little extreme the rationale is that those changes were designed to reduce hangs, but if they lead to crashes then it's preferable to stick with the hangs. Besides we intend to rip out all of this code and make minidump generation entirely OOP. The idea was to start this work after the next all hands so I believe we can live with a synchronous path here for a few more months. [1] https://searchfox.org/mozilla-central/rev/71ef4447db179639be9eff4471f32a95423962d7/toolkit/crashreporter/nsExceptionHandler.cpp#3863
Phabricator review ping
Flags: needinfo?(ted)
Comment on attachment 9002601 [details] Bug 1463048 - Invoke the minidump generation callback asynchronsouly Ted Mielczarek [:ted] [:ted.mielczarek] has approved the revision.
Attachment #9002601 - Flags: review+
Pushed by gsvelto@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/af8353a3ecbc Invoke the minidump generation callback asynchronsouly r=ted
Flags: needinfo?(ted)
The tentative fix has landed but let's keep this open until we're sure that the crash rate goes down. It should be visible on the nightly channel soon enough.
Keywords: leave-open
Whiteboard: keep
Flags: needinfo?(gsvelto)
Depends on: 1488827
This reverts the changes in bug 1360308, bug 1390143 and bug 1469603. Minidump generation will now only happen on the main process' main thread which might lead to hangs but is known to be fairly robust. Asynchronous generation proved too brittle and enormously increased the complexity of this already hard-to-read code.
Comment on attachment 9002601 [details] Bug 1463048 - Invoke the minidump generation callback asynchronsouly Obsoleting this patch, I've prepared a new one that reverts all asynchronous minidump generation. It's proved to be too brittle and we've already spent too much time trying to get it working.
Attachment #9002601 - Attachment is obsolete: true
Flags: needinfo?(gsvelto)
Review ping for the patch in attachment 9006899 [details]
Flags: needinfo?(ted)
Comment on attachment 9006899 [details] Bug 1463048 - Remove asynchronous minidump generation Ted Mielczarek [:ted] [:ted.mielczarek] has approved the revision.
Attachment #9006899 - Flags: review+
Flags: needinfo?(ted)
I'm going through the final round of testing, hopefully it will land today.
Pushed by gsvelto@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/588d96e4eec8 Remove asynchronous minidump generation r=ted
There have been no new instances of this crash on nightly for builds including attachment 9006899 [details], but it might be too early to be 100% sure. In any case uplifting this patch should be relatively straightforward even though it's big because it just reverts older patches.
There's been a handful of crashes with this signature but none of them are caused by the original problem. They can be told apart because among the minidump thread you won't find the "Minidump Writer" thread that was hanging. I think it's safe to close this as it's definitely fixed on nightly.
Still no more crashes with the original issue, this is fixed.
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
[Tracking Requested - why for this release]: There is still a high volume of crashes with this signature. It remains a top crasher. Requesting uplift for versions FF63 (Beta) and 60ESR. (In reply to Gabriele Svelto [:gsvelto] from comment #21) > There have been no new instances of this crash on nightly for builds > including attachment 9006899 [details], but it might be too early to be 100% > sure. In any case uplifting this patch should be relatively straightforward > even though it's big because it just reverts older patches. Top Crashers for Firefox 63.0b9 Top 50 Crashing Signatures. 7 days ago 9 1.1% new shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown 120 120 0 0 115 0 2017-09-27 Top Crashers for Firefox 60.2.1esr Top 50 Crashing Signatures. 7 days ago 39 0.31% 0.09% shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown 59 59 0 0 58 0 2017-09-27 Top Crashers for Firefox 62.0.2 Top 50 Crashing Signatures. 7 days ago 7 1.26% -0.29% shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown 1631 1631 0 0 1486 0 2017-09-27 Top Crashers for Firefox 52.9.0esr Top 50 Crashing Signatures. 7 days ago 49 0.24% -0.01% shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown 167 167 0 0 101 0 2017-09-27 Signature report for shutdownhang | nsThread::Shutdown | nsThreadManager::Shutdown Showing results from 7 days ago Firefox 64.0a1 3 0.1% 3 (Fixed) Firefox 63.0b9 119 4.1% 220 (Not Fixed) Firefox 62.0.2 1604 55.5% 902 Firefox 62.0 310 10.7% 313 Firefox 60.2.1esr 60 2.1% 58 Firefox 60.2.0esr 40 1.4% 60 Firefox 60.1.0esr 11 0.4% 15 Firefox 60.0.2 16 0.6% 13 Firefox 60.0.1 15 0.5% 17 Firefox 60.0.1esr 8 0.3% 8 Firefox 60.0esr 4 0.1% 5 Firefox 60.0.2esr 2 0.1% 2 Firefox 52.9.0esr 165 5.7% 106 Firefox 52.8.0esr 9 0.3% 3 Firefox 52.8.1esr 3 0.1% 1 Firefox 52.7.3esr 1 0.0% 1 Firefox 52.7.4esr 1 0.0% 1 Firefox 52.6.0esr 4 0.1% 5 Firefox 52.5.3esr 3 0.1% 2 Firefox 52.5.2esr 1 0.0% 1 Firefox 52.4.0esr 1 0.0% 1 Firefox 52.3.0esr 2 0.1% 2 Firefox 52.1.0esr 4 0.1% 1 Firefox 52.0.1esr 1 0.0% 1 Firefox 47.0.2esr 10 0.3% 6 Firefox 45.9.0esr 13 0.4% 15 Firefox 45.8.0esr 20 0.7% 15 Firefox 45.7.0esr 10 0.3% 9 Firefox 45.6.0esr 9 0.3% 10 Firefox 45.5.0esr 2 0.1% 2 Firefox 45.5.1esr 1 0.0% 1
Status: RESOLVED → REOPENED
Keywords: top50
Resolution: FIXED → ---
Gabriele, can we uplift this patch to 63 beta ? Thanks
Flags: needinfo?(gsvelto)
Sure, the patch applies cleanly so I'll ask for approval right away.
Flags: needinfo?(gsvelto)
Comment on attachment 9006899 [details] Bug 1463048 - Remove asynchronous minidump generation [Beta/Release Uplift Approval Request] Feature/Bug causing the regression: Bug 1360308 User impact if declined: Firefox will crash on shutdown if a plugin process is hung Is this code covered by automated tests?: No Has the fix been verified in Nightly?: Yes Needs manual test from QE?: No If yes, steps to reproduce: List of other uplifts needed: None Risk to taking this patch: Low Why is the change risky/not risky? (and alternatives if risky): The patch is large but not risky because its been made by reversing three patches that were previously applied. It returns the affected code to its previous state which had been used for years and is well tested. String changes made/needed: None
Attachment #9006899 - Flags: approval-mozilla-beta?
Comment on attachment 9006899 [details] Bug 1463048 - Remove asynchronous minidump generation Crash fix for a tracked 63 bug, uplift approved for 63 beta 12, thanks!
Attachment #9006899 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1498942
This shouldn't have been reopened.
Keywords: leave-open
Target Milestone: --- → mozilla64
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
What do you think of uplifting this to ESR60? The crash volume there seems significant but not terribly high.
Flags: needinfo?(gsvelto)
I don't know if the patch applies cleanly there but I can check. The patch did not cause regressions save for bug 1498942. We've already confirmed the fix for that bug so if we uplift this one we'd also have to uplift that fix.
Flags: needinfo?(gsvelto)
That seems a bit scary. Let's let this wait till the next major ESR.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: