Bug 679551 - Workers: Deadlock in WorkerPrivate::BlockAndCollectRuntimeStats if worker is blocked (LastPass extension)
Status: VERIFIED FIXED
Whiteboard: [qa!]
Keywords: hang
Product: Core
Classification: Components
Component: DOM
Version: Trunk
Platform: All Mac OS X
Importance: -- normal with 2 votes
Target Milestone: mozilla9
Assigned To: Ben Turner (not reading bugmail, use the needinfo flag!)
: Andrew Overholt [:overholt]
 
Reported: 2011-08-16 16:08 PDT by Mr. Gecko
Modified: 2013-12-27 14:37 PST

Attachments
Analysis of Nightly while frozen. (70.57 KB, text/plain)
2011-08-16 16:08 PDT, Mr. Gecko
About:Config (4.92 KB, text/plain)
2011-08-19 04:03 PDT, Mr. Gecko
List of Extensions (552 bytes, text/plain)
2011-08-19 04:15 PDT, Mr. Gecko
Patch, v1 (5.51 KB, patch)
2011-08-22 01:14 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
Patch, v2 (21.08 KB, patch)
2011-08-31 10:36 PDT, Ben Turner (not reading bugmail, use the needinfo flag!)
mrbkap: review+
bugzilla: approval‑mozilla‑aurora+

Description Mr. Gecko 2011-08-16 16:08:11 PDT
Created attachment 553615
Analysis of Nightly while frozen.

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:5.0.1) Gecko/20100101 Firefox/5.0.1
Build ID: 20110707182747

Steps to reproduce:

Opened nightly


Actual results:

Loads pages then becomes unresponsive.

This may happen whenever I switch to another application and then come back to Nightly, but I'm not sure.


Expected results:

Nightly loads up and shows pages, never becomes unresponsive.
Comment 1 Tim (fmdeveloper) 2011-08-16 20:45:31 PDT
Please attach the contents from about:support as a text attachment to this report
Comment 2 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-08-17 07:58:59 PDT
Weird, a worker is using ctypes to call back into xpcom while it's blocked... Can you list your installed extensions?
Comment 3 Mr. Gecko 2011-08-19 04:03:19 PDT
Created attachment 554357
About:Config

I cannot get this once the browser is frozen, so I don't know whether it's a dynamic page that you need captured during the freeze. I am downloading the latest version of Nightly from https://nightly.mozilla.org/ now to see if that fixes it. If it does, I'll close this report.
Comment 4 Mr. Gecko 2011-08-19 04:15:06 PDT
Created attachment 554362
List of Extensions

Here is a list of the extensions I have. It still freezes.
Comment 5 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-08-19 09:32:20 PDT
(In reply to Mr. Gecko from comment #4)

I'm 90% sure it's LastPass. Try disabling it and see if you still freeze?
Comment 6 Mr. Gecko 2011-08-20 05:28:41 PDT
It is LastPass. However, extensions should not be able to freeze the browser. If an extension has a bug, the browser should notice that it is taking too long to execute something, stop the extension, and warn the user, just as it already does for scripts on websites.
Comment 7 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-08-20 12:19:26 PDT
Yeah, definitely, I was just making sure that I knew the cause.
Comment 8 Andy 2011-08-21 09:37:50 PDT
I think I need to get a debug build to confirm, but I saw a persistent issue with the browser hanging (0% CPU, unresponsive browser) on both Windows XP and Mac OS X 10.6 after Aurora updated from 7 to 8 on Friday.

The issue seemed to occur randomly when a page was loading, sometimes hanging during session restore, sometimes a little later as I was trying to browse. Disabling LastPass appears to have resolved the hangs on both systems.
Comment 9 Nicholas Nethercote [:njn] 2011-08-21 16:58:23 PDT
I can reproduce this on Linux.  Steps to reproduce:

- Create a new profile.
- Install LastPass v1.75.0 from https://lastpass.com/misc_download.php.
- Re-start Firefox as requested.
- When it re-starts and asks you to create an account, decline; an account is not necessary to reproduce the bug.
- Open about:memory.  The browser freezes; no errors are reported in the error console.

Here's the debug spew between start-up and freezing:

WARNING: 1 sort operation has occurred for the SQL statement '0x7ffff6d48808'.  See https://developer.mozilla.org/En/Storage/Warnings details.: file /home/njn/moz/mi9/storage/src/mozStoragePrivateHelpers.cpp, line 144
[New Thread 0x7fffe0ff9700 (LWP 12044)]
WARNING: Subdocument container has no content: file /home/njn/moz/mi9/layout/base/nsDocumentViewer.cpp, line 2402
WARNING: Subdocument container has no content: file /home/njn/moz/mi9/layout/base/nsDocumentViewer.cpp, line 2402
[New Thread 0x7fffe04ff700 (LWP 12045)]
WARNING: Subdocument container has no content: file /home/njn/moz/mi9/layout/base/nsDocumentViewer.cpp, line 2402
WARNING: Subdocument container has no content: file /home/njn/moz/mi9/layout/base/nsDocumentViewer.cpp, line 2402
WARNING: NS_ENSURE_SUCCESS(rv, 1) failed with result 0x80520012: file /home/njn/moz/mi9/parser/htmlparser/src/nsExpatDriver.cpp, line 711
JavaScript strict warning: file:///home/njn/.mozilla/firefox/ijfatw57.lastpass/extensions/support@lastpass.com/components/lastpass.js, line 1427: assignment to undeclared variable Hex
[New Thread 0x7fffdd14a700 (LWP 12047)]
[New Thread 0x7fffdcf49700 (LWP 12048)]
WARNING: NS_ENSURE_SUCCESS(rv, 1) failed with result 0x80520012: file /home/njn/moz/mi9/parser/htmlparser/src/nsExpatDriver.cpp, line 711
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111: file /home/njn/moz/mi9/content/base/src/nsFrameLoader.cpp, line 420
WARNING: Subdocument container has no frame: file /home/njn/moz/mi9/layout/base/nsDocumentViewer.cpp, line 2422
WARNING: Subdocument container has no frame: file /home/njn/moz/mi9/layout/base/nsDocumentViewer.cpp, line 2422
[New Thread 0x7fffd90ff700 (LWP 12049)]
JavaScript strict warning: file:///home/njn/moz/mi9/d64/dist/bin/components/nsSessionStore.js, line 370: reference to undefined property this._initialState.windows[0].sizemode
[New Thread 0x7fffd86ff700 (LWP 12050)]
[New Thread 0x7fffd7efe700 (LWP 12051)]
[New Thread 0x7fffd76fd700 (LWP 12052)]
pldhash: for the table at address 0x7fffddb64df0, the given entrySize of 168 definitely favors chaining over double hashing.
WARNING: OpenGL-accelerated layers are not supported on this system.: file /home/njn/moz/mi9/widget/src/xpwidgets/nsBaseWidget.cpp, line 852
[New Thread 0x7fffd6cff700 (LWP 12053)]
[New Thread 0x7fffd60f3700 (LWP 12054)]
pldhash: for the table at address 0x7fffde17f978, the given entrySize of 112 probably favors chaining over double hashing.
[New Thread 0x7fffd56ff700 (LWP 12055)]
[New Thread 0x7fffd4efe700 (LWP 12056)]
[New Thread 0x7fffd46fd700 (LWP 12057)]
WARNING: OpenGL-accelerated layers are not supported on this system.: file /home/njn/moz/mi9/widget/src/xpwidgets/nsBaseWidget.cpp, line 852
[New Thread 0x7fffd3aff700 (LWP 12058)]
WARNING: SQLite returned error code 1 , Storage will convert it to NS_ERROR_FAILURE: file /home/njn/moz/mi9/storage/src/mozStoragePrivateHelpers.cpp, line 113


And here's the stack when we freeze:

#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007ffff7eb310d in PR_WaitCondVar (cvar=0x7fffd2d23b00, timeout=4294967295)
    at /home/njn/moz/mi9/nsprpub/pr/src/pthreads/ptsynch.c:417
#2  0x00007ffff587f1f0 in mozilla::CondVar::Wait (this=0x7fffffff8420, interval=4294967295)
    at /home/njn/moz/mi9/d64/xpcom/build/BlockingResourceBase.cpp:372
#3  0x00007ffff4dd3d1c in mozilla::dom::workers::WorkerPrivate::BlockAndCollectRuntimeStats (
    this=0x7fffdd84d800, aData=0x7fffffff84b0)
    at /home/njn/moz/mi9/dom/workers/WorkerPrivate.cpp:2482
#4  0x00007ffff4dc9ae1 in (anonymous namespace)::WorkerMemoryReporter::CollectReports (
    this=0x7fffdd79ba90, aCallback=0x7fffd2d222c0, aClosure=0x0)
    at /home/njn/moz/mi9/dom/workers/RuntimeService.cpp:333
#5  0x00007ffff58f9961 in NS_InvokeByIndex_P (that=<value optimised out>, 
    methodIndex=<value optimised out>, paramCount=<value optimised out>, 
    params=<value optimised out>)
    at /home/njn/moz/mi9/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_unix.cpp:195
#6  0x00007ffff51ec629 in Invoke (ccx=<value optimised out>, mode=<value optimised out>)
    at /home/njn/moz/mi9/js/src/xpconnect/src/xpcwrappednative.cpp:3119
#7  Call (ccx=<value optimised out>, mode=<value optimised out>)
    at /home/njn/moz/mi9/js/src/xpconnect/src/xpcwrappednative.cpp:2373
#8  XPCWrappedNative::CallMethod (ccx=<value optimised out>, mode=<value optimised out>)
    at /home/njn/moz/mi9/js/src/xpconnect/src/xpcwrappednative.cpp:2337
#9  0x00007ffff51f361a in XPC_WN_CallMethod (cx=0x7fffdef26800, argc=2, vp=0x7fffe21fe298)
    at /home/njn/moz/mi9/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1599
#10 0x00007ffff5bfabf2 in js::CallJSNative (cx=0x7fffdef26800, 
    native=0x7ffff51f336a <XPC_WN_CallMethod(JSContext*, uintN, jsval*)>, 
    args=<value optimised out>) at /home/njn/moz/mi9/js/src/jscntxtinlines.h:281
#11 0x00007ffff5e1acde in CallCompiler::generateNativeStub (this=0x7fffffff9100)
    at /home/njn/moz/mi9/js/src/methodjit/MonoIC.cpp:815
#12 0x00007ffff5e14e8d in js::mjit::ic::NativeCall (f=<value optimised out>, ic=0x7fffd2d24430)
    at /home/njn/moz/mi9/js/src/methodjit/MonoIC.cpp:1033
#13 0x00007fffdfc43a1c in ?? ()
#14 0x00007fffdfc41066 in ?? ()
#15 0x00007fffd2d00000 in ?? ()
#16 0x00007fffffff9170 in ?? ()
#17 0x0000000000000000 in ?? ()

Looks like some kind of deadlock relating to condvars.

Interestingly, if about:memory loads on start-up (due to being open when you last closed the browser) it doesn't hang.  I guess LastPass hasn't quite finished setting things up at that point.
Comment 10 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-08-21 18:05:20 PDT
The only way I can think to fix this at the moment is to time out the memory stat collection and report no stats for blocked workers. We can't really fix this without an asynchronous memory reporter API.
Comment 11 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-08-22 01:14:40 PDT
Created attachment 554800
Patch, v1

This times out the memory reporter at 2 seconds... What do you guys think? Is that too long? If someone else writes an extension that tries to have a live memory meter that updates once every second then this could still suck.

If blocking for any substantial amount of time is completely out of the question then one other idea is to just never report memory usage for chrome workers that are using ctypes. That's lame too, but it would avoid the problem cases.
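
For illustration, the timeout idea can be sketched as follows. This is a minimal sketch and not the attached patch: the class WorkerStatsGate and its methods Publish and CollectWithTimeout are hypothetical names, and standard C++ primitives stand in for Gecko's own. The reporting thread waits a bounded time for the worker to publish its stats and gives up if the worker never reaches a point where it can answer.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the reporter/worker handshake. This is not the
// real WorkerPrivate class.
class WorkerStatsGate {
public:
    // Called on the worker thread once it reaches a safe point. A worker
    // that is stuck inside a blocking native call never gets here.
    void Publish(long bytesInUse) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mBytesInUse = bytesInUse;
            mReady = true;
        }
        mCondVar.notify_all();
    }

    // Called on the reporting thread. Waits at most `timeout` for the worker
    // to publish. Returns false (leaving *outBytes untouched) on timeout, so
    // the caller can report "no stats" instead of hanging.
    bool CollectWithTimeout(std::chrono::milliseconds timeout, long* outBytes) {
        assert(outBytes != nullptr);
        std::unique_lock<std::mutex> lock(mMutex);
        bool ready = mCondVar.wait_for(lock, timeout,
                                       [this] { return mReady; });
        if (!ready) {
            return false;
        }
        *outBytes = mBytesInUse;
        return true;
    }

private:
    std::mutex mMutex;
    std::condition_variable mCondVar;
    bool mReady = false;
    long mBytesInUse = 0;
};
```

With a 2 second timeout, every blocked worker would still cost the reporting thread up to 2 seconds per collection, which is the concern raised above about a memory meter that polls once a second.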
Comment 12 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-08-22 04:24:04 PDT
I think we should take this bandaid and implement an async reporting api if it becomes a real problem.
Comment 13 Stappel 2011-08-22 06:14:04 PDT
I don't know if it is the same bug, but I have the hang problem with LastPass too. It is on XP, and not just on about:memory: I get it on every page after some time. Just enabling LastPass and restarting will hang Firefox after a short time, around 30 seconds.

This started when the Aurora channel switched from 7.0 to 8.0 last week.

Just to provide more info.
Comment 14 Andrew Zitnay 2011-08-22 09:41:41 PDT
In Firefox versions 6.0 and later (where we no longer use binary XPCOM), we do indeed start a ChromeWorker that blocks in a js-ctypes binary function call the vast majority of the time.  It blocks via WaitForSingleObject() / SetEvent() in Windows, and sem_wait() / sem_post() in Mac and Linux.

This works perfectly across Windows, Mac, and Linux in Firefox 6.0 and 7.0, but does indeed appear to freeze in Firefox 8.0+.  So, I can only assume that this WorkerPrivate::BlockAndCollectRuntimeStats() functionality is new in Firefox 8.0.

Is it expected that we can't block in a ChromeWorker like this?  I ask because this was suggested to us by a Mozilla employee as an alternative solution to something we previously did in binary XPCOM that's no longer directly possible in js-ctypes.  Back in the good old binary XPCOM days, we started a thread in our binary XPCOM component that was then able to call back into JavaScript via nsIObserverService any time it needed to.  However, js-ctypes code can apparently only operate on the thread it was called from (I verified this by calling a js-ctypes function that started another thread which attempted to call back into JavaScript, and crashes ensued).

If anyone has a solution to calling back into JavaScript from an arbitrary thread at an arbitrary time that doesn't involve blocking in a ChromeWorker, I'll gladly implement it.
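
The blocking pattern described in this comment can be sketched roughly as below. This is an illustrative sketch only, not LastPass code: EventChannel, Post, WaitForEvent and RunWorkerLoop are hypothetical names, and a mutex plus condition variable stands in for the sem_wait() / sem_post() or WaitForSingleObject() / SetEvent() pair. The point is that the worker spends nearly all of its time parked inside WaitForEvent(), where it cannot respond to any other request.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical channel between native code and the worker thread.
class EventChannel {
public:
    // Native side: queue an event and wake the worker. This plays the role
    // of sem_post() / SetEvent().
    void Post(const std::string& event) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mEvents.push_back(event);
        }
        mCondVar.notify_one();
    }

    // Worker side: block until an event is available. This plays the role
    // of sem_wait() / WaitForSingleObject().
    std::string WaitForEvent() {
        std::unique_lock<std::mutex> lock(mMutex);
        mCondVar.wait(lock, [this] { return !mEvents.empty(); });
        std::string event = mEvents.front();
        mEvents.pop_front();
        return event;
    }

private:
    std::mutex mMutex;
    std::condition_variable mCondVar;
    std::deque<std::string> mEvents;
};

// Body of the worker: wait, handle, repeat, until a "shutdown" event
// arrives. Returns the events it handled, in order.
std::vector<std::string> RunWorkerLoop(EventChannel& channel) {
    std::vector<std::string> handled;
    for (;;) {
        std::string event = channel.WaitForEvent();
        if (event == "shutdown") {
            break;
        }
        // Stands in for handing the event back to JavaScript.
        handled.push_back(event);
    }
    return handled;
}
```

While the loop is inside WaitForEvent(), anything that needs the worker to pause and answer, such as a synchronous stats request, has to wait for the next event to arrive.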
Comment 15 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-08-22 09:43:08 PDT
(In reply to Andrew Zitnay from comment #14)
> Is it expected that we can't block in a ChromeWorker like this?

No, as I understand it that's one of the explicit use cases for these.  We need to fix this on our side.
Comment 16 Andrew Zitnay 2011-08-22 10:26:06 PDT
Do you expect this to be fixed before Firefox 8 is pushed to beta (which I guess would be September 27th, 5 weeks from tomorrow)?  I'd rather err on the side of not locking up the browsers of beta users if possible.
Comment 17 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-08-22 10:27:19 PDT
(In reply to Andrew Zitnay from comment #16)
> Do you expect this to be fixed before Firefox 8 is pushed to beta (which I
> guess would be September 27th, 5 weeks from tomorrow)?  I'd rather err on
> the side of not locking up the browsers of beta users if possible.

Definitely.
Comment 18 Andrew Zitnay 2011-08-28 19:10:31 PDT
It appears to me that this has been fixed in nightly, but not yet in aurora.  Any idea when the fix will make it to aurora?
Comment 19 Nicholas Nethercote [:njn] 2011-08-28 19:13:51 PDT
(In reply to Andrew Zitnay from comment #18)
> It appears to me that this has been fixed in nightly, but not yet in aurora.
> Any idea when the fix will make it to aurora?

The next Aurora uplift is September 27 (https://wiki.mozilla.org/RapidRelease/Calendar).  

But are you sure it's been fixed?  No code changes have landed that would fix it, AFAICT.
Comment 20 Andrew Zitnay 2011-08-28 19:16:13 PDT
I haven't looked at the code or anything...  I've just observed that the latest Firefox 9 no longer seems to immediately freeze when I go to about:memory, and the latest Firefox 8 still does.
Comment 21 Andrew Zitnay 2011-08-29 08:37:41 PDT
Never mind, Firefox 9 is still freezing on Windows 7.  Not sure why it wasn't freezing on Windows XP for me.
Comment 22 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-08-31 10:36:44 PDT
Created attachment 557234
Patch, v2

Instead of timing out we're just going to not report memory for ChromeWorkers that use ctypes.
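
As a rough illustration of this approach (a sketch under stated assumptions, not the contents of Patch v2): the types WorkerInfo and MemoryReport, their fields, and the functions CanCollectStats and CollectReports are all hypothetical. The reporter simply skips any chrome worker that uses ctypes, since such a worker may be parked in a native call indefinitely.

```cpp
#include <string>
#include <vector>

// Hypothetical description of a worker. Field names are illustrative only.
struct WorkerInfo {
    std::string name;
    bool isChromeWorker;
    bool usesCtypes;
    long bytesInUse;
};

struct MemoryReport {
    std::string name;
    long bytesInUse;
};

// A chrome worker that uses ctypes may be blocked in native code for an
// unbounded time, so it is never asked to pause and report.
bool CanCollectStats(const WorkerInfo& worker) {
    return !(worker.isChromeWorker && worker.usesCtypes);
}

// Build reports for every worker that can safely be asked for stats.
std::vector<MemoryReport> CollectReports(const std::vector<WorkerInfo>& workers) {
    std::vector<MemoryReport> reports;
    for (const WorkerInfo& worker : workers) {
        if (!CanCollectStats(worker)) {
            continue;  // no stats for this worker, but no hang either
        }
        reports.push_back(MemoryReport{worker.name, worker.bytesInUse});
    }
    return reports;
}
```

The trade-off is that memory used by the skipped workers goes unreported.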
Comment 23 Daniel Einspanjer [:dre] [:deinspanjer] 2011-09-01 06:51:52 PDT
(In reply to ben turner [:bent] from comment #22)
> Created attachment 557234
> Patch, v2
> 
> Instead of timing out we're just going to not report memory for
> ChromeWorkers that use ctypes.

Seems like a choice that could potentially cause us grief in the future.  Do we have a new bug open to implement an async API or otherwise eventually allow the resumption of memory statistics collection for this class of ChromeWorkers?
Comment 24 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-09-01 09:34:57 PDT
Yes, bug 673323, referenced in the patch comment too.
Comment 25 Jonas Sicking (:sicking) No longer reading bugmail consistently 2011-09-08 15:21:14 PDT
Comment on attachment 557234
Patch, v2

My recollection was that we weren't going to do anything here. Or at least that this patch wasn't the right approach. Please re-request review if you still think this patch is correct.
Comment 26 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-09-08 17:28:11 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/58d026601240
Comment 27 :Ehsan Akhgari 2011-09-08 19:06:31 PDT
Backed out as part of <http://hg.mozilla.org/integration/mozilla-inbound/rev/cc0753a23f8b> because of mochitest-3 crashes like this: <https://tbpl.mozilla.org/php/getParsedLog.php?id=6340142&full=1>
Comment 28 Andrew Zitnay 2011-09-14 06:13:09 PDT
This bug is still unresolved, with less than two weeks to go before it makes it into the beta channel.  That makes me quite nervous.  Will this bug still "definitely" be fixed by then?
Comment 29 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-09-14 07:57:44 PDT
Yep, it'll be fixed. We've just been a little distracted.

https://hg.mozilla.org/integration/mozilla-inbound/rev/03d57c393397
Comment 30 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-09-14 07:58:59 PDT
Comment on attachment 557234
Patch, v2

I think we need this on aurora, otherwise beta users will start hanging if they have the LastPass extension installed.
Comment 32 Andrew Zitnay 2011-09-19 06:23:54 PDT
Good work on the patch, looks great on Nightly.  Sorry to be a consistent nuisance, but will this indeed make it to Firefox 8.0 before it goes into beta next week?  I only ask because the latest Aurora still freezes, and I see "Target Milestone: --- ➔ mozilla9" above.
Comment 33 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-09-19 08:11:01 PDT
(In reply to Andrew Zitnay from comment #32)
> will this indeed make it to Firefox 8.0 before it goes into
> beta next week?

I hope so! Drivers will need to make the final decision, all the flags are set to get it on their radar.
Comment 34 Johnathan Nightingale [:johnath] 2011-09-22 14:34:00 PDT
Comment on attachment 557234
Patch, v2

Discussed in triage. This makes us really worried because it's non-trivial and we're late late late in the Aurora cycle. Approved for aurora landing so that we don't export all these hangs to beta, but it would be really helpful to offer QA as much as possible to get out in front of testing this.
Comment 35 Johnny Stenback (:jst, jst@mozilla.com) 2011-09-23 09:32:05 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/fd8ec1055118
Comment 36 Ioana (away) 2011-10-04 00:29:55 PDT
Verified fixed on:
Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0
Mozilla/5.0 (Windows NT 5.1; rv:9.0a2) Gecko/20111002 Firefox/9.0a2
Mozilla/5.0 (Windows NT 5.1; rv:10.0a1) Gecko/20111003 Firefox/10.0a1

Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0
Mozilla/5.0 (Windows NT 6.1; rv:9.0a2) Gecko/20111003 Firefox/9.0a2
Mozilla/5.0 (Windows NT 6.1; rv:10.0a1) Gecko/20111003 Firefox/10.0a1

Mozilla/5.0 (X11; Linux i686; rv:8.0) Gecko/20100101 Firefox/8.0
Mozilla/5.0 (X11; Linux i686; rv:9.0a2) Gecko/20111003 Firefox/9.0a2
Mozilla/5.0 (X11; Linux i686; rv:10.0a1) Gecko/20111003 Firefox/10.0a1

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0) Gecko/20100101 Firefox/8.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0a2) Gecko/20110929 Firefox/9.0a2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0a1) Gecko/20110927 Firefox/10.0a1

STR:
 1. Install LastPass (v1.75.0).
 2. Restart Firefox as requested.
 3. Open about:memory.
 4. Open several websites and use them (gmail.com, yahoo.com, etc.).

Everything worked fine. No hanging or freezing reproduced.
