Closed Bug 1323084 Opened 4 years ago Closed 4 years ago
Crash in shutdownhang | PR
_MD _WAIT _CV | mozilla::Cond Var::Wait | ns Event Queue::Get Event | ns Thread::ns Chained Event Queue::Get Event
This bug was filed from the Socorro interface and is report bp-d4fc7a65-cd4e-4db2-95d1-8cc152161210. ============================================================= First occurrence of this was in Nightly 20161119030204. Since then it has occurred 57 times. Over the past week it's the #19 topcrash on Nightly; no sign of it on Aurora, which makes sense because the last merge date was November 14. This is the regression window for that Nightly: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=28e2a6dde76ab6ad4464a3662df1bd57af04398a&tochange=f09e137ead39230eaa94f47988ccce2cfcda4195 But this crash is infrequent enough that it doesn't happen every day, so it's possible the true cause landed a day or two before that. Honza, any ideas? The stack trace for the hanging thread contains network shutdown stuff -- though it varies a bit from crash to crash -- so it looks like it's related to networking.
this is a pr_poll shutdiown hang - which is certainly not new (but is usually not win 10) .. maybe the signature is different or something about it is different. dragana is the expert
Flags: needinfo?(mcmanus) → needinfo?(dd.mozilla)
We still do not have an easy way to analyze this shutdown hangs. The hangs here have may different causes, I had look at 20 crashes: https://crash-stats.mozilla.com/report/index/ba47ab2d-0034-4b0c-bf05-d93c22161209 and https://crash-stats.mozilla.com/report/index/b14d7053-c702-4486-82e7-012212161212 and https://crash-stats.mozilla.com/report/index/df190ed9-44e4-4ffd-afcf-0fc272161212 - hang in mozilla::dom::workers::RuntimeService::Cleanup(); the socketThread shutdown have not started yet. In the first and the third crash, there is a thread that is in mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) dom/workers/WorkerPrivate.cpp:4573, but I do not know what it is waiting for. In the second crash: 8 xul.dll js::gc::GCRuntime::gc(JSGCInvocationKind, JS::gcreason::Reason) js/src/jsgc.cpp:6398 9 xul.dll mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*, bool, bool) dom/workers/WorkerPrivate.cpp:6267 10 xul.dll `anonymous namespace'::GarbageCollectRunnable::WorkerRun dom/workers/WorkerPrivate.cpp:1394 11 xul.dll mozilla::dom::workers::WorkerRunnable::Run() https://crash-stats.mozilla.com/report/index/025116c3-a3d0-4f58-8142-7ef942161210 and https://crash-stats.mozilla.com/report/index/fa032da1-a037-4107-9f44-dce8c2161210 and https://crash-stats.mozilla.com/report/index/ebc28ddb-f9bf-4965-a9a3-af2442161212 - hangs in mozilla::dom::XMLHttpRequestMainThread::SendInternal(mozilla::dom::XMLHttpRequestMainThread::RequestBodyBase const*) dom/xhr/XMLHttpRequestMainThread.cpp:3011, but I am not sure what it is waiting for, maybe the request to finish, timer to expire etc. https://crash-stats.mozilla.com/report/index/186dcd92-8c38-4f54-b4df-7837a2161210 - js runs on the main thread https://crash-stats.mozilla.com/report/index/e5083587-2a6e-48d7-931e-463962161210 and https://crash-stats.mozilla.com/report/index/d4fc7a65-cd4e-4db2-95d1-8cc152161210 and https://crash-stats.mozilla.com/report/index/47dc423a-34ae-44b9-a936-104162161212 and https://crash-stats.mozilla.com/report/index/4ed8dcf3-d5ba-4102-a193-a5e1e2161212 - our normal poll hangs https://crash-stats.mozilla.com/report/index/fae718bb-0f61-4a77-b0fc-6c24a2161210 - This is waiting on socket thread to shutdown and socket thread is doing: Module Signature Source 0 ntdll.dll NtWaitForSingleObject 1 KERNELBASE.dll WaitForSingleObjectEx 2 nss3.dll PR_MD_WAIT_CV nsprpub/pr/src/md/windows/w95cv.c:248 3 nss3.dll PR_WaitCondVar nsprpub/pr/src/threads/combined/prucv.c:172 4 nss3.dll PR_WaitCondVar nsprpub/pr/src/threads/combined/prucv.c:525 5 nss3.dll PR_Wait nsprpub/pr/src/threads/prmon.c:298 6 xul.dll mozilla::ReentrantMonitor::Wait(unsigned int) obj-firefox/dist/include/mozilla/ReentrantMonitor.h:91 7 xul.dll nsThread::Init() xpcom/threads/nsThread.cpp:660 8 xul.dll nsThreadManager::NewThread(unsigned int, unsigned int, nsIThread**) xpcom/threads/nsThreadManager.cpp:260 9 xul.dll nsThreadPool::PutEvent(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/nsThreadPool.cpp:107 10 xul.dll nsThreadPool::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/nsThreadPool.cpp:275 11 xul.dll mozilla::net::nsStreamTransportService::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) netwerk/base/nsStreamTransportService.cpp:490 12 xul.dll nsIEventTarget::Dispatch(nsIRunnable*, unsigned int) obj-firefox/dist/include/nsIEventTarget.h:37 13 xul.dll nsAStreamCopier::PostContinuationEvent_Locked() xpcom/io/nsStreamUtils.cpp:476 14 xul.dll nsAStreamCopier::PostContinuationEvent() xpcom/io/nsStreamUtils.cpp:467 15 xul.dll nsAStreamCopier::OnOutputStreamReady(nsIAsyncOutputStream*) xpcom/io/nsStreamUtils.cpp:435 16 xul.dll mozilla::net::nsSocketOutputStream::OnSocketReady(nsresult) netwerk/base/nsSocketTransport2.cpp:552 17 xul.dll mozilla::net::nsSocketTransport::OnSocketEvent(unsigned int, nsresult, nsISupports*) netwerk/base/nsSocketTransport2.cpp:1845 18 xul.dll mozilla::net::nsSocketEvent::Run() netwerk/base/nsSocketTransport2.cpp:85 19 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1213 20 xul.dll mozilla::net::nsSocketTransportService::Run() netwerk/base/nsSocketTransportService2.cpp:939 https://crash-stats.mozilla.com/report/index/806797aa-be7f-4758-84f2-de4e82161211 and https://crash-stats.mozilla.com/report/index/aa60fdd0-7eb3-4522-96f8-0e8a62161211 - socketThread is already shutdown, We are waiting for the main thread to shutdown and I am not sure what the main thread is waiting for. https://crash-stats.mozilla.com/report/index/1ccd079f-dcf5-4272-8f03-1de1e2161212 - not sure where it hangs https://crash-stats.mozilla.com/report/index/769c61f2-b5d3-4878-96f9-2d5802161212 - hangs in trying to shutdown thread pool in 10 xul.dll nsThread::Shutdown() xpcom/threads/nsThread.cpp:983 11 xul.dll nsThreadPool::Shutdown() xpcom/threads/nsThreadPool.cpp:328 12 xul.dll mozilla::net::nsStreamTransportService::Observe(nsISupports*, char const*, char16_t const*) netwerk/base/nsStreamTransportService.cpp:556 ttps://crash-stats.mozilla.com/report/index/4241b3dc-fbaa-4611-9c21-248d32161212 - socket thread is waiting for: 10 xul.dll nsThread::Shutdown() xpcom/threads/nsThread.cpp:983 11 xul.dll nsThreadPool::Shutdown() xpcom/threads/nsThreadPool.cpp:328 12 xul.dll mozilla::psm::StopSSLServerCertVerificationThreads() security/manager/ssl/SSLServerCertVerification.cpp:207 https://crash-stats.mozilla.com/report/index/a94cf788-8027-49dc-868b-6dab32161212 - I am not sure, but it is not socketThread, it is already closed. https://crash-stats.mozilla.com/report/index/b5fe58f4-71bf-4a23-8420-173312161212 and https://crash-stats.mozilla.com/report/index/0cce1bcb-ce6d-493a-9039-817f32161212 - hang in SocketClose
6 crashes in comment #2 are know shutdown hangs cause by a hang in poll and socketClose. the others are different. I assume that the number of the other crashes rose because of a regression.
I have only looked at Windows 10 crashes.
Dragana, seems like you've got this. Feel free to re-triage as appropriate.
Ben, can you look at workers crash? I am not sure your are the right person. Nathan, did something change in the thread pool shutdown, we are hanging in it from a couple of different code paths? Thanks!
(In reply to Dragana Damjanovic [:dragana] from comment #6) > Nathan, did something change in the thread pool shutdown, we are hanging in > it from a couple of different code paths? We didn't change anything in thread pool shutdown recently AFAIK, but... (In reply to Nicholas Nethercote [:njn] from comment #0) > First occurrence of this was in Nightly 20161119030204. Since then it has > occurred 57 times. Over the past week it's the #19 topcrash on Nightly; no > sign of it on Aurora, which makes sense because the last merge date was > November 14. ...we did change some code around the internal event queues used by threads in bug 1306591, which landed on 2016-11-08. Crashes showing up a week and a half later seem kind of unlikely to be correlated with that, but it's possible, I suppose? We also changed some internal nsThread bits around for requestIdle callback (bug 1198381, landed 2016-10-30) and various followup bugs, though I don't think any of the followup bugs were that significant. I'm not sure if that makes things any clearer.
(In reply to Dragana Damjanovic [:dragana] from comment #2) > https://crash-stats.mozilla.com/report/index/ba47ab2d-0034-4b0c-bf05- > d93c22161209 and > https://crash-stats.mozilla.com/report/index/b14d7053-c702-4486-82e7- > 012212161212 and > https://crash-stats.mozilla.com/report/index/df190ed9-44e4-4ffd-afcf- > 0fc272161212 - hang in mozilla::dom::workers::RuntimeService::Cleanup(); the > socketThread shutdown have not started yet. In the first and the third > crash, there is a thread that is in > mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) > dom/workers/WorkerPrivate.cpp:4573, but I do not know what it is waiting > for. In the second crash: > 8 xul.dll js::gc::GCRuntime::gc(JSGCInvocationKind, JS::gcreason::Reason) > js/src/jsgc.cpp:6398 > 9 xul.dll > mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*, > bool, bool) dom/workers/WorkerPrivate.cpp:6267 > 10 xul.dll `anonymous namespace'::GarbageCollectRunnable::WorkerRun > dom/workers/WorkerPrivate.cpp:1394 > 11 xul.dll mozilla::dom::workers::WorkerRunnable::Run() > Andrea do you have any thoughts on the worker specifics in comment 2? It seems we can get into a case where the worker thread is stuck waiting for control runnables, but since the browser is shutting down we don't end up sending them the control runnable.
Flags: needinfo?(bkelly) → needinfo?(amarchesini)
See Also: → 1324856
https://crash-stats.mozilla.com/signature/?build_id=20170102004004&product=Firefox&release_channel=aurora&_sort=-date&platform=Windows&signature=shutdownhang%20%7C%20mozilla%3A%3ACondVar%3A%3AWait%20%7C%20nsEventQueue%3A%3AGetEvent%20%7C%20nsThread%3A%3AnsChainedEventQueue%3A%3AGetEvent&date=%3E%3D2017-01-02T00%3A00%3A00.000Z&date=%3C2017-01-04T13%3A23%3A00.000Z#aggregations on aurora looks similar to this one.
See Also: 1324856 →
Aurora has what appears to be a very similar version of this? https://crash-stats.mozilla.com/signature/?product=Firefox&signature=shutdownhang%20%7C%20mozilla%3A%3ACondVar%3A%3AWait%20%7C%20nsEventQueue%3A%3AGetEvent%20%7C%20nsThread%3A%3AnsChainedEventQueue%3A%3AGetEvent
It seems that the worker is waiting for something. The runnable is blocked, and the worker cannot be released. 3 xul.dll js::ConditionVariable::wait(js::LockGuard<js::Mutex>&) js/src/threading/windows/ConditionVariable.cpp:364 4 xul.dll js::GCHelperState::waitBackgroundSweepEnd() js/src/jsgc.cpp:3356 5 xul.dll js::gc::GCRuntime::incrementalCollectSlice(js::SliceBudget&, JS::gcreason::Reason, js::AutoLockForExclusiveAccess&) js/src/jsgc.cpp:5927 6 xul.dll js::gc::GCRuntime::gcCycle(bool, js::SliceBudget&, JS::gcreason::Reason) js/src/jsgc.cpp:6183 7 xul.dll js::gc::GCRuntime::collect(bool, js::SliceBudget, JS::gcreason::Reason) js/src/jsgc.cpp:6337 8 xul.dll js::gc::GCRuntime::gc(JSGCInvocationKind, JS::gcreason::Reason) js/src/jsgc.cpp:6398 9 xul.dll mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*, bool, bool) dom/workers/WorkerPrivate.cpp:6267
The crash no longer happens after build id 20170228030203. Shall we close it by same reason as bug 1329966?
probably yes - those shifting shutdownhang signatures are really getting confusing. the crash in comment #0 seems to be the same one as discussed in bug 1158189.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.