Closed Bug 916531 Opened 6 years ago Closed 6 years ago
Firefox freezes randomly
STR:
1. Start Firefox with a clean profile
2. Open the following links in new tabs:
   http://www.tvguide.com/special/fall-preview/
   http://www.tvguide.com/special/Fall-Preview/photogallery/Returning-Shows-Where-We-Left-Off-1068390
   http://www.tvguide.com/special/Fall-Preview/photogallery/Must-Watch-New-Shows-1068492
   http://www.tvguide.com/special/Fall-Preview/photogallery/Familiar-Faces-in-New-Places-1068442
3. (Optional) Press Ctrl + Tab and hold
4. The browser freezes with no CPU usage

First bad: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1379053262/firefox-26.0a1.en-US.win32.installer.exe
Last good: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1379052662/firefox-26.0a1.en-US.win32.installer.exe
Regression range: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=b322938b37ef&tochange=47e05e8df03b
bp-be8f7607-d411-4f8c-a993-cdfe52130915
Stack captured while the browser was hung, using https://github.com/bsmedberg/crashfirefox-intentionally ( http://benjamin.smedbergs.us/crashfirefox.exe )
Yes, I can confirm this on Win7.
bp-af2cf564-def7-439f-8dd4-d52b52130915 another stack
I can't reproduce this on Linux. And it's *really* surprising (and worrying) that 47e05e8df03b would be the cause, since that patch just removed some unnecessary #include statements, which shouldn't change behaviour at all. Nonetheless, I've backed it out: https://hg.mozilla.org/integration/mozilla-inbound/rev/14a4dbb53c37 Please check again once this hits a new Nightly to confirm if this patch really was at fault. Thanks!
If bug 915482 caused the freeze/hang problems, I think patch #1 and/or #2 (not #4) should be backed out, because #3 and #4 landed after the 09-13 Nightly, and the freeze/hang problem started on the 09-13 Nightly. http://forums.mozillazine.org/viewtopic.php?p=13069511#p13069511 (see my post)
(In reply to Nicholas Nethercote [:njn] from comment #4)
> I can't reproduce this on Linux. And it's *really* surprising (and
> worrying) that 47e05e8df03b would be the cause, since that patch just
> removed some unnecessary #include statements, which shouldn't change
> behaviour at all.
>
> Nonetheless, I've backed it out:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/14a4dbb53c37
>
> Please check again once this hits a new Nightly to confirm if this patch
> really was at fault. Thanks!

I checked the latest mozilla-inbound build in http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1379289423/ (cset 14a4dbb53c37) and the problem is still reproducible. Maybe I should check the last good build again.
Using the Nightly from http://hg.mozilla.org/mozilla-central/rev/a4e9c9c9dbf9, I cannot reproduce the hang/freeze.
The bug seems harder to trigger in earlier versions. I managed to trigger it twice in:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1378977038/
and once in each of:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1378956008/
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1378955107/
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1378954690/
So the regression range in the Description is incorrect. We need better STR.
No longer blocks: 915482
Summary: Firefox freezes due to a regression in Bug 915482 → Firefox freezes randomly
I got a hang 2 times in the following tinderbox build (so, no symbol information).
bp-27ef076d-1c78-4155-8195-03b9c2130916
bp-f7d9436e-534e-445c-90cb-7ee022130916
bp-560c8f15-3868-4871-a2f7-6d7d72130916
http://hg.mozilla.org/integration/mozilla-inbound/rev/3ca22e239a1d
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 ID:20130911164259
err, s/2 times/3 times/
(In reply to Alice0775 White from comment #10)
> I got a hang in 2 times the following tinderbox build(so, no symbol
> information).
> http://hg.mozilla.org/integration/mozilla-inbound/rev/3ca22e239a1d

That's bug 906371, which is a much more likely culprit than bug 916531.
It seems that some race condition causes a deadlock.
I see AutoPauseWorkersForGC on the stack so this is probably a dupe of bug 916504.
The stack from comment 1 on thread 0 is:

 5 nss3.dll PR_WaitCondVar nsprpub/pr/src/threads/combined/prucv.c
 6 mozjs.dll js::AutoPauseWorkersForGC::AutoPauseWorkersForGC(JSRuntime *) js/src/jsworkers.cpp
 7 mozjs.dll GCCycle js/src/jsgc.cpp
 8 mozjs.dll Collect js/src/jsgc.cpp
 9 xul.dll nsAppShell::ProcessNextNativeEvent(bool) widget/windows/nsAppShell.cpp
10 mozjs.dll js::GCSlice(JSRuntime *,js::JSGCInvocationKind,JS::gcreason::Reason,__int64) js/src/jsgc.cpp
11 xul.dll nsJSContext::GarbageCollectNow(JS::gcreason::Reason,nsJSContext::IsIncremental,nsJSContext::IsCompartment,nsJSContext::IsShrinking,__int64) dom/base/nsJSEnvironment.cpp
12 xul.dll GCTimerFired(nsITimer *,void *) dom/base/nsJSEnvironment.cpp

The line linked to is the state.wait(WorkerThreadState::MAIN) call.

On thread 17, we have:

 6 nss3.dll PR_WaitCondVar nsprpub/pr/src/threads/combined/prucv.c
 7 mozjs.dll js::SourceCompressionTask::complete() js/src/jsworkers.cpp
 8 mozjs.dll js::frontend::CompileScript(js::ExclusiveContext *,js::LifoAlloc *,JS::Handle<JSObject *>,JS::Handle<JSScript *>,JS::CompileOptions const &,wchar_t const *,unsigned int,JSString *,unsigned int,js::SourceCompressionTask *) js/src/frontend/BytecodeCompiler.cpp
 9 mozjs.dll js::WorkerThread::handleParseWorkload(js::WorkerThreadState &) js/src/jsworkers.cpp
10 mozjs.dll js::WorkerThread::threadLoop() js/src/jsworkers.cpp

The line linked to is the state.wait(WorkerThreadState::MAIN) call. Both threads have an AutoLockWorkerThreadState on the stack.

And on threads 18 and 19 we have:

 5 nss3.dll PR_WaitCondVar nsprpub/pr/src/threads/combined/prucv.c
 6 mozjs.dll js::WorkerThread::threadLoop() js/src/jsworkers.cpp

and the linked-to line is pause()... and we're still holding an AutoLockWorkerThreadState there too.

Hoping someone understands how AutoLockWorkerThreadState is supposed to work. ;)
Depends on: 916504
There's actually another worker thread in comment 1's stack, thread 16, with this stack:

16|3|nss3.dll|_PR_MD_WAIT_CV|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/md/windows/w95cv.c:53d5e43e23cc|248|0xe
16|4|nss3.dll|_PR_WaitCondVar|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/threads/combined/prucv.c:53d5e43e23cc|172|0x1c
16|5|nss3.dll|PR_WaitCondVar|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/threads/combined/prucv.c:53d5e43e23cc|515|0xb
16|6|mozjs.dll|js::WorkerThreadState::wait(js::WorkerThreadState::CondVar,unsigned int)|hg:hg.mozilla.org/mozilla-central:js/src/jsworkers.cpp:53d5e43e23cc|435|0x20
16|7|mozjs.dll|js::WorkerThread::pause()|hg:hg.mozilla.org/mozilla-central:js/src/jsworkers.cpp:53d5e43e23cc|1015|0x8
16|8|mozjs.dll|js::SourceCompressionTask::MOZ_Z_compress()|hg:hg.mozilla.org/mozilla-central:js/src/jsscript.cpp:53d5e43e23cc|1187|0x15
16|9|mozjs.dll|js::WorkerThread::threadLoop()|hg:hg.mozilla.org/mozilla-central:js/src/jsworkers.cpp:53d5e43e23cc|944|0x3d
16|10|nss3.dll|_PR_NativeRunThread|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/threads/combined/pruthr.c:53d5e43e23cc|397|0x9
16|11|nss3.dll|pr_root|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/md/windows/w95thred.c:53d5e43e23cc|90|0xc
16|12|msvcr100.dll|_callthreadstartex|f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c|314|0x6
16|13|msvcr100.dll|_threadstartex|f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c|292|0x5

The problem here is, I think, one which Bill noticed while reviewing bug 906371 and which I forgot to apply when landing the patch. Multiple threads can now be waiting for worker threads to complete --- in this case, the main thread is waiting for worker threads to pause, and one worker thread (#17) handling a parse workload is waiting for another worker thread (#16) to finish its compression workload.
What I think happened is that #16 was the last thread to pause, and when trying to wake up the main thread it woke up thread #17 instead, which just resumed waiting. The AutoLockWorkerThreadState guards on the stacks are red herrings, as the lock is released while one thread is waiting for another. The fix here should be to just apply Bill's review comments from bug 906371 (:(). I'd like to remove the pausing functionality completely, but it seems better to get this fixed first and then rip that stuff out.
Attachment #805332 - Flags: review?(wmccloskey)
Comment on attachment 805332 [details] [diff] [review]
patch

Review of attachment 805332 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsworkers.h
@@ +134,5 @@
> # ifdef DEBUG
>  PRThread *lockOwner;
> # endif
>
> + /* Condvar to notify threads waiting for work that they may be able to make progress. */

Can you move these comments up so they go with the CONSUMER and PRODUCER decls? I'm guessing that's where people will actually look.
Attachment #805332 - Flags: review?(wmccloskey) → review+
Comment on attachment 805332 [details] [diff] [review]
patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 906371
User impact if declined: deadlocks
Testing completed (on m-c, etc.): on m-i
Risk to taking this patch (and alternatives if risky): low

Note that the patch in rev 4bcf9b261b94 also contains fixes for bug 916753 and bug 916504, which are similar deadlocks. This approval request is for the combined patch.
Attachment #805332 - Flags: approval-mozilla-aurora?
Attachment #805332 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 Verified as fixed on latest Aurora 26.0a2 (buildID: 20131001004005).
Verified with Nightly build 20131024030204 on Win 7.