Closed Bug 916531 Opened 6 years ago Closed 6 years ago

Firefox freezes randomly

Categories

(Core :: JavaScript Engine, defect, critical)

x86_64
Windows 7
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla27
Tracking Status
firefox25 --- unaffected
firefox26 + verified
firefox27 + verified

People

(Reporter: coolypf, Assigned: bhackett)

References

Details

(Keywords: hang)

Attachments

(1 file)

Blocks: 915482
bp-be8f7607-d411-4f8c-a993-cdfe52130915
Stack when browser hang up using https://github.com/bsmedberg/crashfirefox-intentionally (
 http://benjamin.smedbergs.us/crashfirefox.exe )
Severity: major → critical
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: hang
Yes I can confirm this on Win7
I can't reproduce this on Linux.  And it's *really* surprising (and worrying) that 47e05e8df03b would be the cause, since that patch just removed some unnecessary #include statements, which shouldn't change behaviour at all.

Nonetheless, I've backed it out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/14a4dbb53c37

Please check again once this hits a new Nightly to confirm if this patch really was at fault.  Thanks!
Flags: needinfo?(coolypf)
Not sure is this will help any, but the only hang I was able to catch in a debugger had the following (I have the Status4Ever extension; however, many hang that don't have the extension):

[JavaScript Error: "this._getters.statusOverlay is null" {file: "resource://status4evar/Status.jsm" line: 306}]
eax=00000000 ebx=778ce120 ecx=00000000 edx=00000000 esi=00000000 edi=008e2ef0
eip=7780deb4 esp=004ff164 ebp=004ff17c iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00200202
ntdll!NtTerminateProcess+0xc:
7780deb4 c20800          ret     8
if bug#915482 cause freeze/hang problems, i think patch #1 and/or #2 (not #4)should be back-out.
because #3 and 4 are landed after 0913 NIGHTLY.
freeze/hang problem start on 0913 NIGHTLY.

http://forums.mozillazine.org/viewtopic.php?p=13069511#p13069511 (see my post)
(In reply to Nicholas Nethercote [:njn] from comment #4)
> I can't reproduce this on Linux.  And it's *really* surprising (and
> worrying) that 47e05e8df03b would be the cause, since that patch just
> removed some unnecessary #include statements, which shouldn't change
> behaviour at all.
> 
> Nonetheless, I've backed it out:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/14a4dbb53c37
> 
> Please check again once this hits a new Nightly to confirm if this patch
> really was at fault.  Thanks!

I checked the latest mozilla-inbound build in http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1379289423/ (cset 14a4dbb53c37) and the problem is still reproducible.
Maybe I should check the last good build again.
Flags: needinfo?(coolypf)
Using the nightly from  http://hg.mozilla.org/mozilla-central/rev/a4e9c9c9dbf9
I cannot reproduce the hang/freeze
No longer blocks: 915482
Summary: Firefox freezes due to a regression in Bug 915482 → Firefox freezes randomly
I got a hang in 2 times the following tinderbox build(so, no symbol information).
bp-27ef076d-1c78-4155-8195-03b9c2130916
bp-f7d9436e-534e-445c-90cb-7ee022130916
bp-560c8f15-3868-4871-a2f7-6d7d72130916
http://hg.mozilla.org/integration/mozilla-inbound/rev/3ca22e239a1d
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 ID:20130911164259
err, s/2 times/3 times/
(In reply to Alice0775 White from comment #10)
> I got a hang in 2 times the following tinderbox build(so, no symbol
> information).
> http://hg.mozilla.org/integration/mozilla-inbound/rev/3ca22e239a1d

That's bug 906371, which is a much more likely culprit than bug 916531.
Blocks: 906371
It seems that some race condition causes dead lock.
Changing javascript.options.parallel_parsing to false 
fixes the hangs for me using the links in the bug
I see AutoPauseWorkersForGC on the stack so this is probably a dupe of bug 916504.
The stack from comment 1 on thread 0 is:

5 	nss3.dll 	PR_WaitCondVar 	nsprpub/pr/src/threads/combined/prucv.c
6 	mozjs.dll 	js::AutoPauseWorkersForGC::AutoPauseWorkersForGC(JSRuntime *) 	js/src/jsworkers.cpp
7 	mozjs.dll 	GCCycle 	js/src/jsgc.cpp
8 	mozjs.dll 	Collect 	js/src/jsgc.cpp
9 	xul.dll 	nsAppShell::ProcessNextNativeEvent(bool) 	widget/windows/nsAppShell.cpp
10 	mozjs.dll 	js::GCSlice(JSRuntime *,js::JSGCInvocationKind,JS::gcreason::Reason,__int64) 	js/src/jsgc.cpp
11 	xul.dll 	nsJSContext::GarbageCollectNow(JS::gcreason::Reason,nsJSContext::IsIncremental,nsJSContext::IsCompartment,nsJSContext::IsShrinking,__int64) 	dom/base/nsJSEnvironment.cpp
12      xul.dll 	GCTimerFired(nsITimer *,void *) 	dom/base/nsJSEnvironment.cpp

The line linked to is the state.wait(WorkerThreadState::MAIN) call.

On thread 17, we have:

6 	nss3.dll 	PR_WaitCondVar 	nsprpub/pr/src/threads/combined/prucv.c
7 	mozjs.dll 	js::SourceCompressionTask::complete() 	js/src/jsworkers.cpp
8 	mozjs.dll 	js::frontend::CompileScript(js::ExclusiveContext *,js::LifoAlloc *,JS::Handle<JSObject *>,JS::Handle<JSScript *>,JS::CompileOptions const &,wchar_t const *,unsigned int,JSString *,unsigned int,js::SourceCompressionTask *) 	js/src/frontend/BytecodeCompiler.cpp
9 	mozjs.dll 	js::WorkerThread::handleParseWorkload(js::WorkerThreadState &) 	js/src/jsworkers.cpp
10 	mozjs.dll 	js::WorkerThread::threadLoop() 	js/src/jsworkers.cpp

The line linked to is the state.wait(WorkerThreadState::MAIN) call.

Both threads have an AutoLockWorkerThreadState on the stack.

And on threads 18,19 we have:

5 	nss3.dll 	PR_WaitCondVar 	nsprpub/pr/src/threads/combined/prucv.c
6 	mozjs.dll 	js::WorkerThread::threadLoop() 	js/src/jsworkers.cpp

and the linked-to line is pause()... and we're still holding a AutoLockWorkerThreadState there too.

Hoping someone understands how AutoLockWorkerThreadState is supposed to work.  ;)
Depends on: 916504
Flags: needinfo?(bhackett1024)
Attached patch patchSplinter Review
There's actually another worker thread in comment 1's stack, thread 16, with this stack:

16|3|nss3.dll|_PR_MD_WAIT_CV|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/md/windows/w95cv.c:53d5e43e23cc|248|0xe 16|4|nss3.dll|_PR_WaitCondVar|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/threads/combined/prucv.c:53d5e43e23cc|172|0x1c 16|5|nss3.dll|PR_WaitCondVar|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/threads/combined/prucv.c:53d5e43e23cc|515|0xb 16|6|mozjs.dll|js::WorkerThreadState::wait(js::WorkerThreadState::CondVar,unsigned int)|hg:hg.mozilla.org/mozilla-central:js/src/jsworkers.cpp:53d5e43e23cc|435|0x20 16|7|mozjs.dll|js::WorkerThread::pause()|hg:hg.mozilla.org/mozilla-central:js/src/jsworkers.cpp:53d5e43e23cc|1015|0x8 16|8|mozjs.dll|js::SourceCompressionTask::MOZ_Z_compress()|hg:hg.mozilla.org/mozilla-central:js/src/jsscript.cpp:53d5e43e23cc|1187|0x15 16|9|mozjs.dll|js::WorkerThread::threadLoop()|hg:hg.mozilla.org/mozilla-central:js/src/jsworkers.cpp:53d5e43e23cc|944|0x3d 16|10|nss3.dll|_PR_NativeRunThread|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/threads/combined/pruthr.c:53d5e43e23cc|397|0x9 16|11|nss3.dll|pr_root|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/md/windows/w95thred.c:53d5e43e23cc|90|0xc 16|12|msvcr100.dll|_callthreadstartex|f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c|314|0x6 16|13|msvcr100.dll|_threadstartex|f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c|292|0x5

The problem here is I think one which Bill noticed while reviewing bug 906371 and which I forgot to apply when landing the patch.  Multiple threads can now be waiting for worker threads to complete --- in this case, the main thread is waiting for worker threads to pause, and one worker thread (#17) handling a parse workload is waiting for another worker thread (#16) to finish its compression workload.  What I think happened is that #16 was the last thread to pause, and when trying to wake up the main thread it woke up thread #17 instead, which just resumed waiting.

The presence of the AutoLockWorkerThreadState's are red herrings, as the lock is released when one thread is waiting for another.

The fix here should be to just apply Bill's review comments from bug 906371 (:().  I'd like to remove the pausing functionality completely but it seems better to get this fixed first and then rip that stuff out.
Attachment #805332 - Flags: review?(wmccloskey)
Flags: needinfo?(bhackett1024)
Comment on attachment 805332 [details] [diff] [review]
patch

Review of attachment 805332 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsworkers.h
@@ +134,5 @@
>  # ifdef DEBUG
>      PRThread *lockOwner;
>  # endif
>  
> +    /* Condvar to notify threads waiting for work that they may be able to make progress. */

Can you move these comments up so they go with the CONSUMER and PRODUCER decls? I'm guessing that's where people will actually look.
Attachment #805332 - Flags: review?(wmccloskey) → review+
Comment on attachment 805332 [details] [diff] [review]
patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 906371
User impact if declined: deadlocks
Testing completed (on m-c, etc.): on m-i
Risk to taking this patch (and alternatives if risky): low

Note that the patch in rev 4bcf9b261b94 also contains fixes for bugs 916753 and bug 916504, which are similar deadlocks.  This approval request is for the combined patch.
Attachment #805332 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/4bcf9b261b94
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Attachment #805332 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee: general → bhackett1024
Depends on: 918862
Keywords: verifyme
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0

Verified as fixed on latest Aurora 26.0a2 (buildID: 20131001004005).
verified with Nightly build 20131024030204 on Win 7
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.