916531 - Firefox freezes randomly

Reporter

Description

•

11 years ago

STR:
1. Start Firefox with clean profile
2. Open the following links in new tabs
    http://www.tvguide.com/special/fall-preview/
    http://www.tvguide.com/special/Fall-Preview/photogallery/Returning-Shows-Where-We-Left-Off-1068390
    http://www.tvguide.com/special/Fall-Preview/photogallery/Must-Watch-New-Shows-1068492
    http://www.tvguide.com/special/Fall-Preview/photogallery/Familiar-Faces-in-New-Places-1068442
3. (Optional) Press Ctrl + Tab and hold
4. The browser freezes with no CPU usage

First bad:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1379053262/firefox-26.0a1.en-US.win32.installer.exe
Last good:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1379052662/firefox-26.0a1.en-US.win32.installer.exe
Regression range:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=b322938b37ef&tochange=47e05e8df03b

Yuan Pengfei

Reporter

Updated

•

11 years ago

Blocks: 915482

Alice0775 White

Comment 1

•

11 years ago

bp-be8f7607-d411-4f8c-a993-cdfe52130915
Stack when browser hang up using https://github.com/bsmedberg/crashfirefox-intentionally (
 http://benjamin.smedbergs.us/crashfirefox.exe )

Severity: major → critical

Status: UNCONFIRMED → NEW

tracking-firefox26: --- → ?

Ever confirmed: true

Keywords: hang

M** A****

Comment 2

•

11 years ago

Yes I can confirm this on Win7

Alice0775 White

Comment 3

•

11 years ago

bp-af2cf564-def7-439f-8dd4-d52b52130915
another stack

Nicholas Nethercote [inactive]

Comment 4

•

11 years ago

I can't reproduce this on Linux.  And it's *really* surprising (and worrying) that 47e05e8df03b would be the cause, since that patch just removed some unnecessary #include statements, which shouldn't change behaviour at all.

Nonetheless, I've backed it out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/14a4dbb53c37

Please check again once this hits a new Nightly to confirm if this patch really was at fault.  Thanks!

Flags: needinfo?(coolypf)

IU

Comment 5

•

11 years ago

Not sure is this will help any, but the only hang I was able to catch in a debugger had the following (I have the Status4Ever extension; however, many hang that don't have the extension):

[JavaScript Error: "this._getters.statusOverlay is null" {file: "resource://status4evar/Status.jsm" line: 306}]
eax=00000000 ebx=778ce120 ecx=00000000 edx=00000000 esi=00000000 edi=008e2ef0
eip=7780deb4 esp=004ff164 ebp=004ff17c iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00200202
ntdll!NtTerminateProcess+0xc:
7780deb4 c20800          ret     8

pal-moz

Comment 6

•

11 years ago

if bug#915482 cause freeze/hang problems, i think patch #1 and/or #2 (not #4)should be back-out.
because #3 and 4 are landed after 0913 NIGHTLY.
freeze/hang problem start on 0913 NIGHTLY.

http://forums.mozillazine.org/viewtopic.php?p=13069511#p13069511 (see my post)

Yuan Pengfei

Reporter

Comment 7

•

11 years ago

(In reply to Nicholas Nethercote [:njn] from comment #4)
> I can't reproduce this on Linux.  And it's *really* surprising (and
> worrying) that 47e05e8df03b would be the cause, since that patch just
> removed some unnecessary #include statements, which shouldn't change
> behaviour at all.
> 
> Nonetheless, I've backed it out:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/14a4dbb53c37
> 
> Please check again once this hits a new Nightly to confirm if this patch
> really was at fault.  Thanks!

I checked the latest mozilla-inbound build in http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1379289423/ (cset 14a4dbb53c37) and the problem is still reproducible.
Maybe I should check the last good build again.

Flags: needinfo?(coolypf)

M** A****

Comment 8

•

11 years ago

Using the nightly from  http://hg.mozilla.org/mozilla-central/rev/a4e9c9c9dbf9
I cannot reproduce the hang/freeze

Yuan Pengfei

Reporter

Comment 9

•

11 years ago

The bug seems harder to trigger in earlier versions.

I managed to trigger it twice in:
 http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1378977038/
and once in:
 http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1378956008/
 http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1378955107/
 http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1378954690/

So the regression range in Description is incorrect.
We need better STR.

Yuan Pengfei

Reporter

Updated

•

11 years ago

No longer blocks: 915482

Summary: Firefox freezes due to a regression in Bug 915482 → Firefox freezes randomly

Alice0775 White

Comment 10

•

11 years ago

I got a hang in 2 times the following tinderbox build(so, no symbol information).
bp-27ef076d-1c78-4155-8195-03b9c2130916
bp-f7d9436e-534e-445c-90cb-7ee022130916
bp-560c8f15-3868-4871-a2f7-6d7d72130916
http://hg.mozilla.org/integration/mozilla-inbound/rev/3ca22e239a1d
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 ID:20130911164259

Alice0775 White

Comment 11

•

11 years ago

err, s/2 times/3 times/

Nicholas Nethercote [inactive]

Comment 12

•

11 years ago

(In reply to Alice0775 White from comment #10)
> I got a hang in 2 times the following tinderbox build(so, no symbol
> information).
> http://hg.mozilla.org/integration/mozilla-inbound/rev/3ca22e239a1d

That's bug 906371, which is a much more likely culprit than bug 916531.

Blocks: 906371

Yuan Pengfei

Reporter

Comment 13

•

11 years ago

It seems that some race condition causes dead lock.

M** A****

Comment 14

•

11 years ago

Changing javascript.options.parallel_parsing to false 
fixes the hangs for me using the links in the bug

Jan de Mooij [:jandem]

Comment 15

•

11 years ago

I see AutoPauseWorkersForGC on the stack so this is probably a dupe of bug 916504.

Boris Zbarsky [:bzbarsky]

Comment 16

•

11 years ago

The stack from comment 1 on thread 0 is:

5 	nss3.dll 	PR_WaitCondVar 	nsprpub/pr/src/threads/combined/prucv.c
6 	mozjs.dll 	js::AutoPauseWorkersForGC::AutoPauseWorkersForGC(JSRuntime *) 	js/src/jsworkers.cpp
7 	mozjs.dll 	GCCycle 	js/src/jsgc.cpp
8 	mozjs.dll 	Collect 	js/src/jsgc.cpp
9 	xul.dll 	nsAppShell::ProcessNextNativeEvent(bool) 	widget/windows/nsAppShell.cpp
10 	mozjs.dll 	js::GCSlice(JSRuntime *,js::JSGCInvocationKind,JS::gcreason::Reason,__int64) 	js/src/jsgc.cpp
11 	xul.dll 	nsJSContext::GarbageCollectNow(JS::gcreason::Reason,nsJSContext::IsIncremental,nsJSContext::IsCompartment,nsJSContext::IsShrinking,__int64) 	dom/base/nsJSEnvironment.cpp
12      xul.dll 	GCTimerFired(nsITimer *,void *) 	dom/base/nsJSEnvironment.cpp

The line linked to is the state.wait(WorkerThreadState::MAIN) call.

On thread 17, we have:

6 	nss3.dll 	PR_WaitCondVar 	nsprpub/pr/src/threads/combined/prucv.c
7 	mozjs.dll 	js::SourceCompressionTask::complete() 	js/src/jsworkers.cpp
8 	mozjs.dll 	js::frontend::CompileScript(js::ExclusiveContext *,js::LifoAlloc *,JS::Handle<JSObject *>,JS::Handle<JSScript *>,JS::CompileOptions const &,wchar_t const *,unsigned int,JSString *,unsigned int,js::SourceCompressionTask *) 	js/src/frontend/BytecodeCompiler.cpp
9 	mozjs.dll 	js::WorkerThread::handleParseWorkload(js::WorkerThreadState &) 	js/src/jsworkers.cpp
10 	mozjs.dll 	js::WorkerThread::threadLoop() 	js/src/jsworkers.cpp

The line linked to is the state.wait(WorkerThreadState::MAIN) call.

Both threads have an AutoLockWorkerThreadState on the stack.

And on threads 18,19 we have:

5 	nss3.dll 	PR_WaitCondVar 	nsprpub/pr/src/threads/combined/prucv.c
6 	mozjs.dll 	js::WorkerThread::threadLoop() 	js/src/jsworkers.cpp

and the linked-to line is pause()... and we're still holding a AutoLockWorkerThreadState there too.

Hoping someone understands how AutoLockWorkerThreadState is supposed to work.  ;)

Boris Zbarsky [:bzbarsky]

Updated

•

11 years ago

Depends on: 916504

Flags: needinfo?(bhackett1024)

Brian Hackett [Laid off!]

Assignee

Comment 17

•

11 years ago

Attached patch patch — Details — Splinter Review

There's actually another worker thread in comment 1's stack, thread 16, with this stack:

16|3|nss3.dll|_PR_MD_WAIT_CV|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/md/windows/w95cv.c:53d5e43e23cc|248|0xe 16|4|nss3.dll|_PR_WaitCondVar|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/threads/combined/prucv.c:53d5e43e23cc|172|0x1c 16|5|nss3.dll|PR_WaitCondVar|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/threads/combined/prucv.c:53d5e43e23cc|515|0xb 16|6|mozjs.dll|js::WorkerThreadState::wait(js::WorkerThreadState::CondVar,unsigned int)|hg:hg.mozilla.org/mozilla-central:js/src/jsworkers.cpp:53d5e43e23cc|435|0x20 16|7|mozjs.dll|js::WorkerThread::pause()|hg:hg.mozilla.org/mozilla-central:js/src/jsworkers.cpp:53d5e43e23cc|1015|0x8 16|8|mozjs.dll|js::SourceCompressionTask::MOZ_Z_compress()|hg:hg.mozilla.org/mozilla-central:js/src/jsscript.cpp:53d5e43e23cc|1187|0x15 16|9|mozjs.dll|js::WorkerThread::threadLoop()|hg:hg.mozilla.org/mozilla-central:js/src/jsworkers.cpp:53d5e43e23cc|944|0x3d 16|10|nss3.dll|_PR_NativeRunThread|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/threads/combined/pruthr.c:53d5e43e23cc|397|0x9 16|11|nss3.dll|pr_root|hg:hg.mozilla.org/mozilla-central:nsprpub/pr/src/md/windows/w95thred.c:53d5e43e23cc|90|0xc 16|12|msvcr100.dll|_callthreadstartex|f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c|314|0x6 16|13|msvcr100.dll|_threadstartex|f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c|292|0x5

The problem here is I think one which Bill noticed while reviewing bug 906371 and which I forgot to apply when landing the patch.  Multiple threads can now be waiting for worker threads to complete --- in this case, the main thread is waiting for worker threads to pause, and one worker thread (#17) handling a parse workload is waiting for another worker thread (#16) to finish its compression workload.  What I think happened is that #16 was the last thread to pause, and when trying to wake up the main thread it woke up thread #17 instead, which just resumed waiting.

The presence of the AutoLockWorkerThreadState's are red herrings, as the lock is released when one thread is waiting for another.

The fix here should be to just apply Bill's review comments from bug 906371 (:().  I'd like to remove the pausing functionality completely but it seems better to get this fixed first and then rip that stuff out.

Attachment #805332 - Flags: review?(wmccloskey)

Flags: needinfo?(bhackett1024)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 18

•

11 years ago

Comment on attachment 805332 [details] [diff] [review]
patch

Review of attachment 805332 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsworkers.h
@@ +134,5 @@
>  # ifdef DEBUG
>      PRThread *lockOwner;
>  # endif
>  
> +    /* Condvar to notify threads waiting for work that they may be able to make progress. */

Can you move these comments up so they go with the CONSUMER and PRODUCER decls? I'm guessing that's where people will actually look.

Attachment #805332 - Flags: review?(wmccloskey) → review+

Brian Hackett [Laid off!]

Assignee

Comment 19

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/4bcf9b261b94

Brian Hackett [Laid off!]

Assignee

Comment 20

•

11 years ago

Comment on attachment 805332 [details] [diff] [review]
patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 906371
User impact if declined: deadlocks
Testing completed (on m-c, etc.): on m-i
Risk to taking this patch (and alternatives if risky): low

Note that the patch in rev 4bcf9b261b94 also contains fixes for bugs 916753 and bug 916504, which are similar deadlocks.  This approval request is for the combined patch.

Attachment #805332 - Flags: approval-mozilla-aurora?

Ekanan Ketunuti

Comment 21

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/4bcf9b261b94

Status: NEW → RESOLVED

Closed: 11 years ago

status-firefox26: --- → affected

status-firefox27: --- → fixed

tracking-firefox27: --- → ?

Resolution: --- → FIXED

Target Milestone: --- → mozilla27

Carsten Book [:Tomcat]

Comment 22

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/fcdd366cd2f5

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

11 years ago

status-firefox25: --- → unaffected

tracking-firefox26: ? → +

tracking-firefox27: ? → +

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

11 years ago

Attachment #805332 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Tom S [:evilpie]

Updated

•

11 years ago

Assignee: general → bhackett1024

Ryan VanderMeulen [:RyanVM]

Comment 23

•

11 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/290e1e44e8b3

status-firefox26: affected → fixed

Matt Brubeck (:mbrubeck)

Updated

•

11 years ago

Depends on: 918862

Tracy Walker [:tracy]

Updated

•

11 years ago

Keywords: verifyme

Bogdan Maris, Desktop QA

Comment 24

•

11 years ago

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0

Verified as fixed on latest Aurora 26.0a2 (buildID: 20131001004005).

status-firefox26: fixed → verified

Tracy Walker [:tracy]

Comment 25

•

11 years ago

verified with Nightly build 20131024030204 on Win 7

Status: RESOLVED → VERIFIED

status-firefox27: fixed → verified

Keywords: verifyme