This bug was filed from the Socorro interface and is report bp-d4fc7a65-cd4e-4db2-95d1-8cc152161210. ============================================================= First occurrence of this was in Nightly 20161119030204. Since then it has occurred 57 times. Over the past week it's the #19 topcrash on Nightly; no sign of it on Aurora, which makes sense because the last merge date was November 14. This is the regression window for that Nightly: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=28e2a6dde76ab6ad4464a3662df1bd57af04398a&tochange=f09e137ead39230eaa94f47988ccce2cfcda4195 But this crash is infrequent enough that it doesn't happen every day, so it's possible the true cause landed a day or two before that. Honza, any ideas? The stack trace for the hanging thread contains network shutdown stuff -- though it varies a bit from crash to crash -- so it looks like it's related to networking.
this is a pr_poll shutdiown hang - which is certainly not new (but is usually not win 10) .. maybe the signature is different or something about it is different. dragana is the expert
We still do not have an easy way to analyze this shutdown hangs. The hangs here have may different causes, I had look at 20 crashes: https://crash-stats.mozilla.com/report/index/ba47ab2d-0034-4b0c-bf05-d93c22161209 and https://crash-stats.mozilla.com/report/index/b14d7053-c702-4486-82e7-012212161212 and https://crash-stats.mozilla.com/report/index/df190ed9-44e4-4ffd-afcf-0fc272161212 - hang in mozilla::dom::workers::RuntimeService::Cleanup(); the socketThread shutdown have not started yet. In the first and the third crash, there is a thread that is in mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) dom/workers/WorkerPrivate.cpp:4573, but I do not know what it is waiting for. In the second crash: 8 xul.dll js::gc::GCRuntime::gc(JSGCInvocationKind, JS::gcreason::Reason) js/src/jsgc.cpp:6398 9 xul.dll mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*, bool, bool) dom/workers/WorkerPrivate.cpp:6267 10 xul.dll `anonymous namespace'::GarbageCollectRunnable::WorkerRun dom/workers/WorkerPrivate.cpp:1394 11 xul.dll mozilla::dom::workers::WorkerRunnable::Run() https://crash-stats.mozilla.com/report/index/025116c3-a3d0-4f58-8142-7ef942161210 and https://crash-stats.mozilla.com/report/index/fa032da1-a037-4107-9f44-dce8c2161210 and https://crash-stats.mozilla.com/report/index/ebc28ddb-f9bf-4965-a9a3-af2442161212 - hangs in mozilla::dom::XMLHttpRequestMainThread::SendInternal(mozilla::dom::XMLHttpRequestMainThread::RequestBodyBase const*) dom/xhr/XMLHttpRequestMainThread.cpp:3011, but I am not sure what it is waiting for, maybe the request to finish, timer to expire etc. https://crash-stats.mozilla.com/report/index/186dcd92-8c38-4f54-b4df-7837a2161210 - js runs on the main thread https://crash-stats.mozilla.com/report/index/e5083587-2a6e-48d7-931e-463962161210 and https://crash-stats.mozilla.com/report/index/d4fc7a65-cd4e-4db2-95d1-8cc152161210 and https://crash-stats.mozilla.com/report/index/47dc423a-34ae-44b9-a936-104162161212 and https://crash-stats.mozilla.com/report/index/4ed8dcf3-d5ba-4102-a193-a5e1e2161212 - our normal poll hangs https://crash-stats.mozilla.com/report/index/fae718bb-0f61-4a77-b0fc-6c24a2161210 - This is waiting on socket thread to shutdown and socket thread is doing: Module Signature Source 0 ntdll.dll NtWaitForSingleObject 1 KERNELBASE.dll WaitForSingleObjectEx 2 nss3.dll PR_MD_WAIT_CV nsprpub/pr/src/md/windows/w95cv.c:248 3 nss3.dll PR_WaitCondVar nsprpub/pr/src/threads/combined/prucv.c:172 4 nss3.dll PR_WaitCondVar nsprpub/pr/src/threads/combined/prucv.c:525 5 nss3.dll PR_Wait nsprpub/pr/src/threads/prmon.c:298 6 xul.dll mozilla::ReentrantMonitor::Wait(unsigned int) obj-firefox/dist/include/mozilla/ReentrantMonitor.h:91 7 xul.dll nsThread::Init() xpcom/threads/nsThread.cpp:660 8 xul.dll nsThreadManager::NewThread(unsigned int, unsigned int, nsIThread**) xpcom/threads/nsThreadManager.cpp:260 9 xul.dll nsThreadPool::PutEvent(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/nsThreadPool.cpp:107 10 xul.dll nsThreadPool::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) xpcom/threads/nsThreadPool.cpp:275 11 xul.dll mozilla::net::nsStreamTransportService::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) netwerk/base/nsStreamTransportService.cpp:490 12 xul.dll nsIEventTarget::Dispatch(nsIRunnable*, unsigned int) obj-firefox/dist/include/nsIEventTarget.h:37 13 xul.dll nsAStreamCopier::PostContinuationEvent_Locked() xpcom/io/nsStreamUtils.cpp:476 14 xul.dll nsAStreamCopier::PostContinuationEvent() xpcom/io/nsStreamUtils.cpp:467 15 xul.dll nsAStreamCopier::OnOutputStreamReady(nsIAsyncOutputStream*) xpcom/io/nsStreamUtils.cpp:435 16 xul.dll mozilla::net::nsSocketOutputStream::OnSocketReady(nsresult) netwerk/base/nsSocketTransport2.cpp:552 17 xul.dll mozilla::net::nsSocketTransport::OnSocketEvent(unsigned int, nsresult, nsISupports*) netwerk/base/nsSocketTransport2.cpp:1845 18 xul.dll mozilla::net::nsSocketEvent::Run() netwerk/base/nsSocketTransport2.cpp:85 19 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1213 20 xul.dll mozilla::net::nsSocketTransportService::Run() netwerk/base/nsSocketTransportService2.cpp:939 https://crash-stats.mozilla.com/report/index/806797aa-be7f-4758-84f2-de4e82161211 and https://crash-stats.mozilla.com/report/index/aa60fdd0-7eb3-4522-96f8-0e8a62161211 - socketThread is already shutdown, We are waiting for the main thread to shutdown and I am not sure what the main thread is waiting for. https://crash-stats.mozilla.com/report/index/1ccd079f-dcf5-4272-8f03-1de1e2161212 - not sure where it hangs https://crash-stats.mozilla.com/report/index/769c61f2-b5d3-4878-96f9-2d5802161212 - hangs in trying to shutdown thread pool in 10 xul.dll nsThread::Shutdown() xpcom/threads/nsThread.cpp:983 11 xul.dll nsThreadPool::Shutdown() xpcom/threads/nsThreadPool.cpp:328 12 xul.dll mozilla::net::nsStreamTransportService::Observe(nsISupports*, char const*, char16_t const*) netwerk/base/nsStreamTransportService.cpp:556 ttps://crash-stats.mozilla.com/report/index/4241b3dc-fbaa-4611-9c21-248d32161212 - socket thread is waiting for: 10 xul.dll nsThread::Shutdown() xpcom/threads/nsThread.cpp:983 11 xul.dll nsThreadPool::Shutdown() xpcom/threads/nsThreadPool.cpp:328 12 xul.dll mozilla::psm::StopSSLServerCertVerificationThreads() security/manager/ssl/SSLServerCertVerification.cpp:207 https://crash-stats.mozilla.com/report/index/a94cf788-8027-49dc-868b-6dab32161212 - I am not sure, but it is not socketThread, it is already closed. https://crash-stats.mozilla.com/report/index/b5fe58f4-71bf-4a23-8420-173312161212 and https://crash-stats.mozilla.com/report/index/0cce1bcb-ce6d-493a-9039-817f32161212 - hang in SocketClose
6 crashes in comment #2 are know shutdown hangs cause by a hang in poll and socketClose. the others are different. I assume that the number of the other crashes rose because of a regression.
I have only looked at Windows 10 crashes.
Dragana, seems like you've got this. Feel free to re-triage as appropriate.
Ben, can you look at workers crash? I am not sure your are the right person. Nathan, did something change in the thread pool shutdown, we are hanging in it from a couple of different code paths? Thanks!
(In reply to Dragana Damjanovic [:dragana] from comment #6) > Nathan, did something change in the thread pool shutdown, we are hanging in > it from a couple of different code paths? We didn't change anything in thread pool shutdown recently AFAIK, but... (In reply to Nicholas Nethercote [:njn] from comment #0) > First occurrence of this was in Nightly 20161119030204. Since then it has > occurred 57 times. Over the past week it's the #19 topcrash on Nightly; no > sign of it on Aurora, which makes sense because the last merge date was > November 14. ...we did change some code around the internal event queues used by threads in bug 1306591, which landed on 2016-11-08. Crashes showing up a week and a half later seem kind of unlikely to be correlated with that, but it's possible, I suppose? We also changed some internal nsThread bits around for requestIdle callback (bug 1198381, landed 2016-10-30) and various followup bugs, though I don't think any of the followup bugs were that significant. I'm not sure if that makes things any clearer.
(In reply to Dragana Damjanovic [:dragana] from comment #2) > https://crash-stats.mozilla.com/report/index/ba47ab2d-0034-4b0c-bf05- > d93c22161209 and > https://crash-stats.mozilla.com/report/index/b14d7053-c702-4486-82e7- > 012212161212 and > https://crash-stats.mozilla.com/report/index/df190ed9-44e4-4ffd-afcf- > 0fc272161212 - hang in mozilla::dom::workers::RuntimeService::Cleanup(); the > socketThread shutdown have not started yet. In the first and the third > crash, there is a thread that is in > mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) > dom/workers/WorkerPrivate.cpp:4573, but I do not know what it is waiting > for. In the second crash: > 8 xul.dll js::gc::GCRuntime::gc(JSGCInvocationKind, JS::gcreason::Reason) > js/src/jsgc.cpp:6398 > 9 xul.dll > mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*, > bool, bool) dom/workers/WorkerPrivate.cpp:6267 > 10 xul.dll `anonymous namespace'::GarbageCollectRunnable::WorkerRun > dom/workers/WorkerPrivate.cpp:1394 > 11 xul.dll mozilla::dom::workers::WorkerRunnable::Run() > Andrea do you have any thoughts on the worker specifics in comment 2? It seems we can get into a case where the worker thread is stuck waiting for control runnables, but since the browser is shutting down we don't end up sending them the control runnable.
https://crash-stats.mozilla.com/signature/?build_id=20170102004004&product=Firefox&release_channel=aurora&_sort=-date&platform=Windows&signature=shutdownhang%20%7C%20mozilla%3A%3ACondVar%3A%3AWait%20%7C%20nsEventQueue%3A%3AGetEvent%20%7C%20nsThread%3A%3AnsChainedEventQueue%3A%3AGetEvent&date=%3E%3D2017-01-02T00%3A00%3A00.000Z&date=%3C2017-01-04T13%3A23%3A00.000Z#aggregations on aurora looks similar to this one.
Aurora has what appears to be a very similar version of this? https://crash-stats.mozilla.com/signature/?product=Firefox&signature=shutdownhang%20%7C%20mozilla%3A%3ACondVar%3A%3AWait%20%7C%20nsEventQueue%3A%3AGetEvent%20%7C%20nsThread%3A%3AnsChainedEventQueue%3A%3AGetEvent
It seems that the worker is waiting for something. The runnable is blocked, and the worker cannot be released. 3 xul.dll js::ConditionVariable::wait(js::LockGuard<js::Mutex>&) js/src/threading/windows/ConditionVariable.cpp:364 4 xul.dll js::GCHelperState::waitBackgroundSweepEnd() js/src/jsgc.cpp:3356 5 xul.dll js::gc::GCRuntime::incrementalCollectSlice(js::SliceBudget&, JS::gcreason::Reason, js::AutoLockForExclusiveAccess&) js/src/jsgc.cpp:5927 6 xul.dll js::gc::GCRuntime::gcCycle(bool, js::SliceBudget&, JS::gcreason::Reason) js/src/jsgc.cpp:6183 7 xul.dll js::gc::GCRuntime::collect(bool, js::SliceBudget, JS::gcreason::Reason) js/src/jsgc.cpp:6337 8 xul.dll js::gc::GCRuntime::gc(JSGCInvocationKind, JS::gcreason::Reason) js/src/jsgc.cpp:6398 9 xul.dll mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*, bool, bool) dom/workers/WorkerPrivate.cpp:6267
Too late for firefox 52, mass-wontfix.
The crash no longer happens after build id 20170228030203. Shall we close it by same reason as bug 1329966?
probably yes - those shifting shutdownhang signatures are really getting confusing. the crash in comment #0 seems to be the same one as discussed in bug 1158189.