Last Comment Bug 608021 - Crashes based in nsFrameMessageManager::ReceiveMessage [@ js::Interpret] [@ js::TraceRecorder::record_NativeCallComplete] [@ js_LookupPropertyWithFlags]
: Crashes based in nsFrameMessageManager::ReceiveMessage [@ js::Interpret] [@ j...
Status: RESOLVED FIXED
fennec-related-jscript-crashers
: crash, topcrash
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: 2.0 Branch
: All All
: -- normal (vote)
: ---
Assigned To: Olli Pettay [:smaug]
:
Mentors:
Depends on:
Blocks: 642175
  Show dependency treegraph
 
Reported: 2010-10-28 10:42 PDT by Josh Matthews [:jdm] (away until 9/3)
Modified: 2011-06-27 08:52 PDT (History)
18 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed
-
-


Attachments
WIP (5.34 KB, patch)
2011-06-12 04:36 PDT, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
WIP2 (9.24 KB, patch)
2011-06-13 04:22 PDT, Olli Pettay [:smaug]
jst: review+
asa: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Josh Matthews [:jdm] (away until 9/3) 2010-10-28 10:42:32 PDT
I've been seeing mysterious JS engine crashes that are all a few frames removed from nsFrameMessageManager::ReceiveMessage, which makes me suspicious.

http://crash-stats.mozilla.com/report/index/dd2e2b60-a87d-401d-bc6c-7c76d2101028

0 	libxul.so 	js::Interpret 	js/src/jsinterp.cpp:2760
1 	libxul.so 	js::Invoke 	js/src/jsinterp.cpp:637
2 	libxul.so 	js::ExternalInvoke 	js/src/jsinterp.cpp:855
3 	libxul.so 	JS_CallFunctionValue 	js/src/jsinterp.h:955
4 	libxul.so 	nsFrameMessageManager::ReceiveMessage 	content/base/src/nsFrameMessageManager.cpp:468
5 	libxul.so 	nsAsyncMessageToChild::Run 	content/base/src/nsFrameLoader.cpp:1812
6 	libxul.so 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:547
7 	libxul.so 	NS_ProcessNextEvent_P 	nsThreadUtils.cpp:250
8 	libxul.so 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:111
9 	libxul.so 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:220
10 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:512
11 	libxul.so 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:187
12 	libxul.so 	nsAppStartup::Run 	toolkit/components/startup/src/nsAppStartup.cpp:192
13 	libxul.so 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3684
14 	libxul.so 	GeckoStart 	toolkit/xre/nsAndroidStartup.cpp:131
15 	libc.so 	libc.so@0x11453 	
16 	libc.so 	libc.so@0x10f3f 	

http://crash-stats.mozilla.com/report/index/89f2c194-93ac-49f8-a565-f4e392101027

0 	libxul.so 	js::TraceRecorder::record_NativeCallComplete 	js/src/jstracer.cpp:8093
1 	libxul.so 	js::Interpret 	js/src/jstracer.h:981
2 	libxul.so 	js::Invoke 	js/src/jsinterp.cpp:638
3 	libxul.so 	js::ExternalInvoke 	js/src/jsinterp.cpp:856
4 	libxul.so 	JS_CallFunctionValue 	js/src/jsinterp.h:956
5 	libxul.so 	nsFrameMessageManager::ReceiveMessage 	content/base/src/nsFrameMessageManager.cpp:468
6 	libxul.so 	nsAsyncMessageToChild::Run 	content/base/src/nsFrameLoader.cpp:1812
7 	libxul.so 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:547
8 	libxul.so 	NS_ProcessNextEvent_P 	nsThreadUtils.cpp:250
9 	libxul.so 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:111
10 	libxul.so 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:220
11 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:512
12 	libxul.so 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:187
13 	libxul.so 	nsAppStartup::Run 	toolkit/components/startup/src/nsAppStartup.cpp:192
14 	libxul.so 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3684
15 	libxul.so 	GeckoStart 	toolkit/xre/nsAndroidStartup.cpp:131
16 	libc.so 	libc.so@0xfdd7 	
17 	libc.so 	libc.so@0xf8a3

http://crash-stats.mozilla.com/report/index/bd647b32-12aa-4a75-9083-f5b5b2101028

0 	libxul.so 	js::NewBuiltinClassInstance 	js/src/jsgc.h:629
1 	libxul.so 	js_ConsumeJSONText 	js/src/json.cpp:848
2 	libxul.so 	JS_ConsumeJSONText 	js/src/jsapi.cpp:5353
3 	libxul.so 	nsFrameMessageManager::ReceiveMessage 	content/base/src/nsFrameMessageManager.cpp:392
4 	libxul.so 	mozilla::dom::TabParent::ReceiveMessage 	jsapi.h:757
5 	libxul.so 	mozilla::dom::TabParent::RecvAsyncMessage 	dom/ipc/TabParent.cpp:286
6 	libxul.so 	mozilla::dom::PBrowserParent::OnMessageReceived 	PBrowserParent.cpp:573
7 	libxul.so 	mozilla::dom::PContentParent::OnMessageReceived 	PContentParent.cpp:463
8 	libxul.so 	mozilla::ipc::AsyncChannel::OnDispatchMessage 	ipc/glue/AsyncChannel.cpp:262
9 	libxul.so 	mozilla::ipc::RPCChannel::OnMaybeDequeueOne 	ipc/glue/RPCChannel.cpp:440
10 	libxul.so 	RunnableMethod<mozilla::ipc::RPCChannel, bool , Tuple0>::Run 	ipc/chromium/src/base/task.h:308
11 	libxul.so 	mozilla::ipc::RPCChannel::DequeueTask::Run 	RPCChannel.h:474
12 	libxul.so 	MessageLoop::RunTask 	ipc/chromium/src/base/message_loop.cc:344
13 	libxul.so 	MessageLoop::DeferOrRunPendingTask 	ipc/chromium/src/base/message_loop.cc:354
14 	libxul.so 	MessageLoop::DoWork 	ipc/chromium/src/base/message_loop.cc:451
15 	libxul.so 	mozilla::ipc::DoWorkRunnable::Run 	ipc/glue/MessagePump.cpp:71
16 	libxul.so 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:547
17 	libxul.so 	NS_InvokeByIndex_P 	xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp:191
18 	libxul.so 	XPCWrappedNative::CallMethod 	js/src/xpconnect/src/xpcwrappednative.cpp:3054
19 	libxul.so 	XPC_WN_CallMethod 	js/src/xpconnect/src/xpcwrappednativejsops.cpp:1631
20 	libxul.so 	js::Interpret 	js/src/jsinterp.cpp:4704
21 	libxul.so 	js::Execute 	js/src/jsinterpinlines.h:493
22 	libxul.so 	JS_ExecuteScript 	js/src/jsapi.cpp:4816
23 	libxul.so 	mozJSComponentLoader::GlobalForLocation 	js/src/xpconnect/loader/mozJSComponentLoader.cpp:1240
24 	libxul.so 	mozJSComponentLoader::ImportInto 	nsTHashtable.h:199
25 	libxul.so 	mozJSComponentLoader::Import 	js/src/xpconnect/loader/mozJSComponentLoader.cpp:1375
26 	libxul.so 	nsXPCComponents_Utils::Import 	js/src/xpconnect/src/xpccomponents.cpp:3779
27 	libxul.so 	NS_InvokeByIndex_P 	xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp:191
28 	libxul.so 	XPCWrappedNative::CallMethod 	js/src/xpconnect/src/xpcwrappednative.cpp:3054
29 	libxul.so 	XPC_WN_CallMethod 	js/src/xpconnect/src/xpcwrappednativejsops.cpp:1631
30 	libxul.so 	js::Interpret 	js/src/jsinterp.cpp:4704
31 	libxul.so 	js::Invoke 	js/src/jsinterp.cpp:638
32 	libxul.so 	js::ExternalGetOrSet 	js/src/jsinterp.cpp:856
33 	libxul.so 	js_GetPropertyHelper 	js/src/jsobj.cpp:4856
34 	libxul.so 	js::Interpret 	js/src/jsinterp.cpp:4114
35 	libxul.so 	js::Invoke 	js/src/jsinterp.cpp:638
36 	libxul.so 	js::ExternalInvoke 	js/src/jsinterp.cpp:856
37 	libxul.so 	JS_CallFunctionValue 	js/src/jsinterp.h:956
38 	libxul.so 	nsXPCWrappedJSClass::CallMethod 	js/src/xpconnect/src/xpcwrappedjsclass.cpp:1696
39 	libxul.so 	nsXPCWrappedJS::CallMethod 	js/src/xpconnect/src/xpcwrappedjs.cpp:572
40 	libxul.so 	PrepareAndDispatch 	xpcom/reflect/xptcall/src/md/unix/xptcstubs_arm.cpp:134
41 	libxul.so 	libxul.so@0x8fba3c 	
42 	libxul.so 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:429

Further crashes:

http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2010-10-28%2008%3A00%3A00&signature=js%3A%3AInterpret&version=Fennec%3A4.0b2pre
http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2010-10-28%2008%3A00%3A00&signature=js%3A%3ATraceRecorder%3A%3Arecord_NativeCallComplete&version=Fennec%3A4.0b2pre
http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2010-10-28%2007%3A00%3A00&signature=js%3A%3ANewBuiltinClassInstance&version=Fennec%3A4.0b2pre
Comment 1 Olli Pettay [:smaug] 2010-10-28 11:01:49 PDT
Nothing has changed in message manager for some time.
Is it possible that something in JS engine has regressed?
Comment 2 Brendan Eich [:brendan] 2010-10-28 11:43:47 PDT
Try cc'ing more JS people.

Has anyone reviewed the JS API usage in the message manager code? Such code in our experience needs guru-level code review...

/be
Comment 3 Olli Pettay [:smaug] 2010-10-28 11:50:53 PDT
JS exception handling in message manager needs still work, I believe.
And yes, it would be great to get someone very familiar with JS API to
look at the code.

But if the crashes are something new, perhaps there is some real
JS engine bug, or some change how to use API.
Comment 4 Josh Matthews [:jdm] (away until 9/3) 2010-11-07 20:46:36 PST
The combined signatures of js::Interpret, js::TraceRecord::record_NativeCallComplete and js_LookupPropertyWithFlags make this one of the higher-volume crashes for Fennec 4.0b2, and we continue to see these stacks in 4.0b3pre.  Seeing as this is also one of the few actionable stacks we possess in the top crashes, it would be wonderful for a knowledgeable JSAPI person to peruse the message manager's usage and make sure it's kosher.
Comment 5 Josh Matthews [:jdm] (away until 9/3) 2011-01-02 21:35:54 PST
js::Interpret crashes are the #7 topcrash for 4.0b3 at the moment.  It's interesting to note that every stack I look at has this signature:

0 	libxul.so 	js::Interpret 	js/src/jsinterp.cpp:2824
1 	libxul.so 	js::Invoke 	js/src/jsinterp.cpp:657
2 	libxul.so 	js::ExternalInvoke 	js/src/jsinterp.cpp:858
3 	libxul.so 	JS_CallFunctionValue 	js/src/jsinterp.h:962
4 	libxul.so 	nsFrameMessageManager::ReceiveMessage 	content/base/src/nsFrameMessageManager.cpp:459
5 	libxul.so 	nsAsyncMessageToChild::Run 	content/base/src/nsFrameLoader.cpp:1831

So it appears to be limited to situations in which we're using the message manager in non-remote situations, with uptimes varying from 0 to 1329.
Comment 6 Scoobidiver (away) 2011-03-31 09:37:19 PDT
It is #13 top crasher in Fennec 4.0.

In Fennec, stack traces now look like:
0 	libxul.so 	js::Interpret 	js/src/jscompartment.h:553
1 	libxul.so 	js::Invoke 	js/src/jsinterp.cpp:653
2 	libxul.so 	js::ExternalInvoke 	js/src/jsinterp.cpp:863
3 	libxul.so 	JS_CallFunctionValue 	js/src/jsapi.cpp:5173
4 	libxul.so 	nsFrameMessageManager::ReceiveMessage 	content/base/src/nsFrameMessageManager.cpp:463
5 	libxul.so 	mozilla::dom::TabChild::RecvAsyncMessage 	dom/ipc/TabChild.cpp:761
6 	libxul.so 	mozilla::dom::PBrowserChild::OnMessageReceived 	PBrowserChild.cpp:884
7 	libxul.so 	mozilla::dom::PContentChild::OnMessageReceived 	PContentChild.cpp:1133
8 	libxul.so 	mozilla::ipc::AsyncChannel::OnDispatchMessage 	ipc/glue/AsyncChannel.cpp:262
9 	libxul.so 	mozilla::ipc::RPCChannel::OnMaybeDequeueOne 	ipc/glue/RPCChannel.cpp:435
10 	libxul.so 	RunnableMethod<mozilla::ipc::RPCChannel, bool , Tuple0>::Run 	ipc/chromium/src/base/task.h:308
11 	libxul.so 	mozilla::ipc::RPCChannel::DequeueTask::Run 	RPCChannel.h:475
Comment 7 Cameron McCormack (:heycam) 2011-06-05 21:37:03 PDT
Running my patch queue (which makes some SpecialPowers-related changes) on Linux opt tryserver builds, I am hitting what I think might be this bug seemingly consistently, right at the end of the test run at about the time I would expect things to be winding up for a normal process exit.

http://tbpl.mozilla.org/?tree=Try&pusher=cmccormack@mozilla.com&rev=e0065ad9ad00
http://tinderbox.mozilla.org/showlog.cgi?log=Try/1307326012.1307326810.8802.gz
Comment 8 Olli Pettay [:smaug] 2011-06-09 04:32:29 PDT
Cameron, can you reproduce the crash easily?
If so, what are the steps to reproduce?
Comment 9 Josh Matthews [:jdm] (away until 9/3) 2011-06-09 08:17:15 PDT
I have been able to reproduce the crash with the following method:
1) perform each build step at http://tinderbox.mozilla.org/showlog.cgi?log=Try/1307324450.1307328061.12699.gz&fulltext=1
2) simply running the failing test command |python mochitest/runtests.py --appname=firefox/firefox-bin --utility-path=bin --extra-profile-file=bin/plugins --certificate-path=certs --autorun --close-when-done --console-level=INFO --symbols-path=symbols --setpref=dom.ipc.plugins.enabled=false --ipcplugins| makes the crash reproduce for me.
3) running --debugger=gdb makes the crash non-reproducible, of course
4) attaching gdb to the process near the end of the mochitest run makes it reproducible

Now I just need to figure out how to load the symbols.
Comment 10 Mark Finkle (:mfinkle) (use needinfo?) 2011-06-10 11:34:12 PDT
Update: #5 top crasher in Fennec 5 Betas
Comment 11 Cameron McCormack (:heycam) 2011-06-11 04:51:29 PDT
In case it's useful, here's another stack trace, this time on a Linux x86 debug build, and this time getting a bit further and hitting an assertion:

http://tinderbox.mozilla.org/showlog.cgi?log=Try/1307779731.1307783410.2819.gz#err0

Assertion failure: !regs_, at ../../../js/src/vm/Stack.cpp:376

 0  linux-gate.so + 0x424
 1  libxul.so!JS_Assert [jsutil.cpp : 89 + 0xb]
 2  libxul.so!js::ContextStack::~ContextStack [Stack.cpp : 376 + 0x28]
 3  libxul.so!JSContext::~JSContext [jscntxt.cpp : 1466 + 0x2d]
 4  libxul.so!js::Foreground::delete_<JSContext> [jsutil.h : 493 + 0x10]
 5  libxul.so!js_DestroyContext [jscntxt.cpp : 675 + 0xa]
 6  libxul.so!JS_DestroyContext [jsapi.cpp : 1031 + 0x12]
 7  libxul.so!XPCCallContext::~XPCCallContext [xpccallcontext.cpp : 402 + 0xd]
 8  libxul.so!XPC_WN_CallMethod [xpcwrappednativejsops.cpp : 1597 + 0xd]
 9  libxul.so!js::CallJSNative [jscntxtinlines.h : 277 + 0x18]
10  libxul.so!js::Interpret [jsinterp.cpp : 4676 + 0x2b]
11  libxul.so!js::RunScript [jsinterp.cpp : 613 + 0x19]
12  libxul.so!js::Invoke [jsinterp.cpp : 694 + 0x18]
13  libxul.so!js::ExternalInvoke [jsinterp.cpp : 816 + 0x19]
14  libxul.so!js::JSProxyHandler::call [jsproxy.cpp : 273 + 0x27]
15  libxul.so!JSWrapper::call [jswrapper.cpp : 250 + 0x69]
16  libxul.so!JSCrossCompartmentWrapper::call [jswrapper.cpp : 652 + 0x26]
17  libxul.so!js::JSProxy::call [jsproxy.cpp : 839 + 0x32]
18  libxul.so!js::proxy_Call [jsproxy.cpp : 1104 + 0x1f]
19  libxul.so!js::CallJSNative [jscntxtinlines.h : 277 + 0x18]
20  libxul.so!js::Invoke [jsinterp.cpp : 649 + 0x34]
21  libxul.so!js::Interpret [jsinterp.cpp : 4687 + 0x3e]
22  libxul.so!js::RunScript [jsinterp.cpp : 613 + 0x19]
23  libxul.so!js::Invoke [jsinterp.cpp : 694 + 0x18]
24  libxul.so!js::ExternalInvoke [jsinterp.cpp : 816 + 0x19]
25  libxul.so!JS_CallFunctionValue [jsapi.cpp : 5087 + 0x3c]
26  libxul.so!nsFrameMessageManager::ReceiveMessage [nsFrameMessageManager.cpp : 452 + 0x49]
27  libxul.so!nsAsyncMessageToChild::Run [nsFrameLoader.cpp : 1907 + 0x53]
28  libxul.so!nsThread::ProcessNextEvent [nsThread.cpp : 618 + 0x16]
29  libxul.so!NS_ProcessNextEvent_P [nsThreadUtils.cpp : 245 + 0x1f]
30  libxul.so!mozilla::ipc::MessagePump::Run [MessagePump.cpp : 110 + 0x15]
31  libxul.so!MessageLoop::RunInternal [message_loop.cc : 218 + 0x20]
32  libxul.so!MessageLoop::RunHandler [message_loop.cc : 202 + 0xa]
33  libxul.so!MessageLoop::Run [message_loop.cc : 176 + 0xa]
34  libxul.so!nsBaseAppShell::Run [nsBaseAppShell.cpp : 189 + 0xc]
35  libxul.so!nsAppStartup::Run [nsAppStartup.cpp : 222 + 0x19]
...
Comment 12 Luke Wagner [:luke] 2011-06-11 05:35:42 PDT
The assertion indicates that code is running on the stack which is part of the context being destroyed.  Since this all involves nsFrameMessageManager, my suspicion would be that nsFrameMessageManager is passing a 'cx' to JS_CallFunctionValue (frame 25) that gets used to do all the calls between frame 25 and the XPC call in frame 8.  Then, some content code (called by frame 8) calls ReleaseJSContext (there's only a few call-sites: http://mxr.mozilla.org/mozilla-central/ident?i=releaseJSContext) passing this same context used by frame 8 through 25, hence the assert.  Thus, I would look for data-flow paths whereby the 'cx' passed to frame 25 makes its way to ReleaseJSContext before control returns to frame 26.

I hope that is helpful.
Comment 13 Cameron McCormack (:heycam) 2011-06-11 21:00:34 PDT
Thanks Luke.  The patches I have applied (both for comment 7 and comment 11), modify the mochitest framework so that at the end of each test, inside TestRunner.testFinished, some extra communication is done with the chrome process:

  TestRunner.testFinished:
    * send an async "ping" message from content to chrome
    * wait for the async "pong" message to be received from chrome
    * inside the message listener for that "pong" message, do
        * send a couple of sync messages to chrome asking it to delete some
          crash dump files
        * call TestRunner.runNextTest()

At the time TestRunner.testFinished is called by each individual test, the chrome process might have got a plugin crash notification, which I get SpecialPowersObserver.js to send on with an async message to the content's SpecialPowers.js object.  The sending of the async ping/pong messages at the top of TestRunner.testFinished is done so that any crash notification message can be processed by the test harness before it moves on to the next test.

The behaviour of TestRunner.runNextTest when called after the last test has run is to immediately call nsIAppStartup.quit(eForceQuit) (when not in e10s mode).

I don't really know how shutdown works, but I am thinking that it must be within that quit() call that things are torn down while the message listener code is still on the stack.  If I wrap the contents of my "pong" listener inside a setTimeout(..., 0), so that the quit() is done without the message listener on the stack, then I don't get the assertion and crash.
Comment 14 Olli Pettay [:smaug] 2011-06-12 02:36:59 PDT
OK, I'll try to reproduce the bug and if I can, I'll take this.
Comment 15 Olli Pettay [:smaug] 2011-06-12 02:55:31 PDT
(In reply to comment #9)
> 1) perform each build step at
> http://tinderbox.mozilla.org/showlog.cgi?log=Try/1307324450.1307328061.12699.
> gz&fulltext=1
Not sure what "each build step"


> 2) simply running the failing test command |python mochitest/runtests.py
> --appname=firefox/firefox-bin --utility-path=bin
> --extra-profile-file=bin/plugins --certificate-path=certs --autorun
> --close-when-done --console-level=INFO --symbols-path=symbols
> --setpref=dom.ipc.plugins.enabled=false --ipcplugins| makes the crash
> reproduce for me.
Where should I run this?
I tried few obvious ones, and it didn't work
Comment 16 Olli Pettay [:smaug] 2011-06-12 02:57:28 PDT
And 
python runtests.py  --autorun --close-when-done  --setpref=dom.ipc.plugins.enabled=false --ipcplugins
didn't crash.
Comment 17 Olli Pettay [:smaug] 2011-06-12 04:36:36 PDT
Created attachment 538746 [details] [diff] [review]
WIP

Something like this might help, but since I don't know how to
reproduce this, I can't verify.
Comment 18 Josh Matthews [:jdm] (away until 9/3) 2011-06-12 07:41:22 PDT
(In reply to comment #15)
> (In reply to comment #9)
> > 1) perform each build step at
> > http://tinderbox.mozilla.org/showlog.cgi?log=Try/1307324450.1307328061.12699.
> > gz&fulltext=1
> Not sure what "each build step"

From the log:

======== BuildStep started ========
download
=== Output ===
wget --progress=dot:mega -N http://stage.mozilla.org/pub/mozilla.org/firefox/try-builds/cmccormack@mozilla.com-e0065ad9ad00/try-linux-debug/firefox-7.0a1.en-US.linux-i686.tar.bz2
 in dir /home/cltbld/talos-slave/test/build (timeout 1200 secs)

I still haven't been able to reproduce in my local build; only by downloading the try build and running the relevant steps. Of course, this is a linux crash, so we can't use the debug symbols provided since they only work on windows.
Comment 19 Cameron McCormack (:heycam) 2011-06-12 15:11:32 PDT
Olli, trying with your patch gives me this assertion not long after startup of a `make mochitest-ipcplugins` with my patch queue also applied.

###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file ../../../dist/include/nsCOMPtr.h, line 544
nsCOMPtr<nsIInProcessContentFrameMessageManager>::Assert_NoQueryNeeded() (/home/cam/mozilla/obj-linux64-dbg-gcc4.5/content/base/src/../../../dist/include/nsCOMPtr.h:543)
nsCOMPtr<nsIInProcessContentFrameMessageManager>::operator=(nsIInProcessContentFrameMessageManager*) (/home/cam/mozilla/obj-linux64-dbg-gcc4.5/content/base/src/../../../dist/include/nsCOMPtr.h:665)
nsFrameLoader::EnsureMessageManager() (/home/cam/mozilla/content/base/src/nsFrameLoader.cpp:2052)
nsFrameLoader::MaybeCreateDocShell() (/home/cam/mozilla/content/base/src/nsFrameLoader.cpp:1513)
nsFrameLoader::CheckURILoad(nsIURI*) (/home/cam/mozilla/content/base/src/nsFrameLoader.cpp:526)
nsFrameLoader::LoadURI(nsIURI*) (/home/cam/mozilla/content/base/src/nsFrameLoader.cpp:414)
...

Just to clarify, from the description in comment 13, do you think my use of the message manager is correct or not (i.e., should I not be calling nsIAppStartup.quit() inside a message listener)?

BTW, STR should be to apply the following patches on a Linux64 gcc4.5 debug build, and then to do a `make mochitest-ipcplugins`:

http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/bug-642175-mochitest-cleanup
http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/bug-642175-plugin-crash-cleanup-mochitests
http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/bug-642175-plugin-crash-cleanup-mochitest-fixes
Comment 20 Olli Pettay [:smaug] 2011-06-12 15:23:07 PDT
(In reply to comment #19)
> Olli, trying with your patch gives me this assertion not long after startup
> of a `make mochitest-ipcplugins` with my patch queue also applied.
Bah. Though the assertion may not be that bad, or easily fixable?
I assume the crash is still there?



> (i.e., should I not be calling
> nsIAppStartup.quit() inside a message listener)?
That sounds pretty evil. I wonder if we can really handle that...


> BTW, STR should be to apply the following patches on a Linux64 gcc4.5 debug
> build, and then to do a `make mochitest-ipcplugins`:
In top level objdir?


> 
> http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/
> bug-642175-mochitest-cleanup
> http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/
> bug-642175-plugin-crash-cleanup-mochitests
> http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/
> bug-642175-plugin-crash-cleanup-mochitest-fixes
Comment 21 Olli Pettay [:smaug] 2011-06-12 15:28:29 PDT
(In reply to comment #20)
> > (i.e., should I not be calling
> > nsIAppStartup.quit() inside a message listener)?
> That sounds pretty evil. I wonder if we can really handle that...
I guess I need to debug when the cx is being deleted during quit().
Comment 22 Cameron McCormack (:heycam) 2011-06-12 15:31:34 PDT
(In reply to comment #20)
> > Olli, trying with your patch gives me this assertion not long after startup
> > of a `make mochitest-ipcplugins` with my patch queue also applied.
> Bah. Though the assertion may not be that bad, or easily fixable?
> I assume the crash is still there?

I am too much of an XPCOM noob to know how to fix it, unfortunately.  The assertion happens early on and then crashes, so it doesn't get to the crash we care about.

> > (i.e., should I not be calling
> > nsIAppStartup.quit() inside a message listener)?
> That sounds pretty evil. I wonder if we can really handle that...

I am happy with my workaround of dispatching a runnable to do the quit() if it's decided that calling it inside the listener is evil and shouldn't be supported.  (Though this doesn't help with the crashes for which this bug was originally filed.)

> > BTW, STR should be to apply the following patches on a Linux64 gcc4.5 debug
> > build, and then to do a `make mochitest-ipcplugins`:
> In top level objdir?

Yep.
Comment 23 Josh Matthews [:jdm] (away until 9/3) 2011-06-12 16:27:51 PDT
With regards to Luke's hint, my money is on the frameloader being destroyed by content JS, which in turn calls Disconnect on the child message manager. This posts an event which calls nsFrameScriptExecutor::DestroyCx, which in turn can call ReleaseJSContext. It looks like there's machinery in nsInProcessTabChildGlobal to prevent this in the case of LoadFrameScript (see mLoadingScript) - could we extend this to work with receiving async messages as well?
Comment 24 Olli Pettay [:smaug] 2011-06-12 16:31:11 PDT
(In reply to comment #19)
> BTW, STR should be to apply the following patches on a Linux64 gcc4.5 debug
> build, and then to do a `make mochitest-ipcplugins`:
> 
> http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/
> bug-642175-mochitest-cleanup
> http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/
> bug-642175-plugin-crash-cleanup-mochitests
> http://hg.mozilla.org/users/cmccormack_mozilla.com/mq/raw-file/07e9d5fa54e7/
> bug-642175-plugin-crash-cleanup-mochitest-fixes

With the WIP patch and those patches make mochitest-ipcplugins 
to crash in ###!!! ABORT: file /home/smaug/mozilla/hg/mozilla/ipc/chromium/src/base/pickle.cc, line 98

And in fact, I get that crash even without WIP.

So, I still don't know how to reproduce this :(
Comment 25 Olli Pettay [:smaug] 2011-06-12 16:32:38 PDT
(In reply to comment #23)
> With regards to Luke's hint, my money is on the frameloader being destroyed
> by content JS, which in turn calls Disconnect on the child message manager.
> This posts an event which calls nsFrameScriptExecutor::DestroyCx, which in
> turn can call ReleaseJSContext. It looks like there's machinery in
> nsInProcessTabChildGlobal to prevent this in the case of LoadFrameScript
> (see mLoadingScript) - could we extend this to work with receiving async
> messages as well?
Did you look at the WIP? It is delaying DestroyCx
Comment 26 Josh Matthews [:jdm] (away until 9/3) 2011-06-12 18:17:54 PDT
Cameron, I couldn't reproduce the assertion you saw with Olli's patch. Granted, I'm on a Mac, but for the output you pasted, that shouldn't make a difference. Are you sure that you did a thorough rebuild?
Comment 27 Cameron McCormack (:heycam) 2011-06-12 20:54:00 PDT
You're right, I must not have.  I did a clean build and don't get that assertion now.  Now I still get a crash on shutdown, but slightly earlier I think:

#0  0x00002aaaabddf1b2 in nsFrameMessageManager::ReceiveMessage (this=0x5a5a5a5a5a5a5a5a, aTarget=0x2aaac34313d0, 
    aMessage=..., aSync=0, aJSON=..., aObjectsArray=0x2aaacd0554c8, aJSONRetVal=0x0, aContext=0x5a5a5a5a5a5a5a5a)
    at /home/cam/mozilla/content/base/src/nsFrameMessageManager.cpp:337
#1  0x00002aaaabddfd0d in nsFrameMessageManager::ReceiveMessage (this=0x2aaac34811d0, aTarget=0x2aaac34313d0, aMessage=..., 
    aSync=0, aJSON=..., aObjectsArray=0x2aaacd0554c8, aJSONRetVal=0x0, aContext=0x0)
    at /home/cam/mozilla/content/base/src/nsFrameMessageManager.cpp:467
#2  0x00002aaaabd54261 in nsAsyncMessageToChild::Run (this=0x2aaac656b580)
    at /home/cam/mozilla/content/base/src/nsFrameLoader.cpp:1908
#3  0x00002aaaacdd327c in nsThread::ProcessNextEvent (this=0x2aaab742dc60, mayWait=0, result=0x7fffffffd38c)
    at /home/cam/mozilla/xpcom/threads/nsThread.cpp:618
...
Comment 28 Olli Pettay [:smaug] 2011-06-13 02:50:19 PDT
That is interesting stack. "this=0x5a5a5a5a5a5a5a5a"

/me thinks..
Comment 29 Olli Pettay [:smaug] 2011-06-13 04:22:52 PDT
Created attachment 538855 [details] [diff] [review]
WIP2

Keep message managers alive long enough.
Would be great if someone could test this.
Comment 30 Cameron McCormack (:heycam) 2011-06-13 04:51:47 PDT
That seems to work for me!  I just pushed this to try to make sure: http://tbpl.mozilla.org/?tree=Try&rev=683c7ee50b20
Comment 31 Olli Pettay [:smaug] 2011-06-13 07:17:05 PDT
Comment on attachment 538855 [details] [diff] [review]
WIP2

So there are two parts here.
One is the keep message manager alive when calling
ReceiveMessage, the other is to make sure
cx isn't deleted when calling ReceiveMessage on tabchilds.
(In the chrome message manager the cx from chrome XUL is reused.)
Comment 32 Josh Matthews [:jdm] (away until 9/3) 2011-06-16 06:49:33 PDT
Johnny, do you think you'll be able to get to this review soon? I'm really hopeful that it'll squash one of the topcrashers for Fennec, and it deserves some baking time before we try to get it into Aurora.
Comment 33 Olli Pettay [:smaug] 2011-06-16 13:10:59 PDT
http://hg.mozilla.org/mozilla-central/rev/97bf954c6970

Marking fixed, but please reopen or file followups if this doesn't help
with the Fennec problems.
Comment 34 Josh Matthews [:jdm] (away until 9/3) 2011-06-21 07:07:32 PDT
Comment on attachment 538855 [details] [diff] [review]
WIP2

I have not seen any new crashes with stacks that look like this patch could be a culprit, which is a long way of saying that this patch seems to be, if not helping, at least not harming. It also has the potential to fix a top crasher for us, so I'd like to get it into aurora asap.
Comment 35 Cameron McCormack (:heycam) 2011-06-21 15:50:42 PDT
I found a few crashes on tinderbox this morning which seem to be the one I was having problems with in bug 642175 (quitting the browser from inside a message listener) now that that has landed, but it looks intermittent now.  I'm going to put back in my workaround in bug 666060.  I don't know if that means anything for the Fennec-related crashes.
Comment 36 Olli Pettay [:smaug] 2011-06-21 15:53:03 PDT
Do you have links to those crashes?
Could you file a followup bug, and post the links there?
Comment 37 Cameron McCormack (:heycam) 2011-06-21 16:00:20 PDT
Oh, sorry, I think I misread the logs.  It's not a crash in the message manager at all.  It is just a real, unexpected crash in a test.  Sorry for the noise.  (http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1308677954.1308682390.5623.gz&fulltext=1#err7 was one of the logs, btw.)
Comment 38 Johnathan Nightingale [:johnath] 2011-06-22 11:51:42 PDT
(In reply to comment #34)
> Comment on attachment 538855 [details] [diff] [review] [review]
> WIP2
> 
> I have not seen any new crashes with stacks that look like this patch could
> be a culprit, which is a long way of saying that this patch seems to be, if
> not helping, at least not harming. It also has the potential to fix a top
> crasher for us, so I'd like to get it into aurora asap.

How much of this patch impacts desktop/single-process firefox?
Comment 39 Olli Pettay [:smaug] 2011-06-22 11:55:41 PDT
The patch should be pretty safe. But the code is run in single-process Fx, at
least with InstallTrigger etc. which is implemented as Messagemanager script.
Comment 40 Josh Matthews [:jdm] (away until 9/3) 2011-06-24 14:37:37 PDT
http://hg.mozilla.org/releases/mozilla-aurora/rev/00b11a879d55

Note You need to log in before you can comment on or make changes to this bug.