Closed Bug 561457 Opened 14 years ago Closed 14 years ago

[e10s] Fennec: Content process crash on shutdown [@nsDocShellTreeOwner::RemoveChromeListeners]

Categories

(Core Graveyard :: Embedding: APIs, defect)

x86
Linux
defect
Not set
critical

Tracking

(blocking2.0 final+)

RESOLVED FIXED
Tracking Status
blocking2.0 --- final+

People

(Reporter: cjones, Unassigned)

(Keywords: crash)

Crash Data

Attachments

(1 file)

(Not sure whether this still belongs in IPC or in the more specific DOM component; please advise.)

(gdb) bt
#0  0x00007ff5a84b1f51 in nanosleep () from /lib/libc.so.6
#1  0x00007ff5a84b1da0 in __sleep (seconds=<value optimized out>) at ../sysdeps/unix/sysv/linux/sleep.c:138
#2  0x00007ff5a9dba693 in ah_crap_handler (signum=11) at /home/cjones/mozilla/electrolysis/toolkit/xre/nsSigHandlers.cpp:164
#3  0x00007ff5a9dba6de in child_ah_crap_handler (signum=11) at /home/cjones/mozilla/electrolysis/toolkit/xre/nsSigHandlers.cpp:177
#4  <signal handler called>
#5  0x00007ff5aad0dd65 in nsDocShellTreeOwner::RemoveChromeListeners (this=0x7ff59e5b9ab0) at /home/cjones/mozilla/electrolysis/embedding/browser/webBrowser/nsDocShellTreeOwner.cpp:917
#6  0x00007ff5aad0d1cc in nsDocShellTreeOwner::WebBrowser (this=0x7ff59e5b9ab0, aWebBrowser=0x0) at /home/cjones/mozilla/electrolysis/embedding/browser/webBrowser/nsDocShellTreeOwner.cpp:768
#7  0x00007ff5aad1434d in nsWebBrowser::InternalDestroy (this=0x7ff59e4b7500) at /home/cjones/mozilla/electrolysis/embedding/browser/webBrowser/nsWebBrowser.cpp:144
#8  0x00007ff5aad19a6e in nsWebBrowser::Destroy (this=0x7ff59e4b7500) at /home/cjones/mozilla/electrolysis/embedding/browser/webBrowser/nsWebBrowser.cpp:1248
#9  0x00007ff5ab14c1ca in mozilla::dom::TabChild::destroyWidget (this=0x7ff5a0319120) at /home/cjones/mozilla/electrolysis/dom/ipc/TabChild.cpp:361
#10 0x00007ff5ab14c66f in ~TabChild (this=0x7ff5a0319120, __in_chrg=<value optimized out>) at /home/cjones/mozilla/electrolysis/dom/ipc/TabChild.cpp:368
#11 0x00007ff5ab14bbe1 in mozilla::dom::TabChild::Release (this=0x7ff5a0319120) at /home/cjones/mozilla/electrolysis/dom/ipc/TabChild.cpp:140
#12 0x00007ff5ab1470de in mozilla::dom::ContentProcessChild::DeallocPIFrameEmbedding (this=0x7ff5a030dcb8, iframe=0x7ff5a0319120) at /home/cjones/mozilla/electrolysis/dom/ipc/ContentProcessChild.cpp:100
#13 0x00007ff5ab222634 in mozilla::dom::PContentProcessChild::RemoveManagee (this=0x7ff5a030dcb8, aProtocolId=7, aListener=0x7ff5a0319120) at PContentProcessChild.cpp:424
#14 0x00007ff5ab227081 in mozilla::dom::PIFrameEmbeddingChild::OnMessageReceived (this=0x7ff5a0319120, msg=...) at PIFrameEmbeddingChild.cpp:506
#15 0x00007ff5ab221b1b in mozilla::dom::PContentProcessChild::OnMessageReceived (this=0x7ff5a030dcb8, msg=...) at PContentProcessChild.cpp:178
#16 0x00007ff5ab1715c3 in mozilla::ipc::AsyncChannel::OnDispatchMessage (this=0x7ff5a030dcc8, msg=...) at /home/cjones/mozilla/electrolysis/ipc/glue/AsyncChannel.cpp:244
#17 0x00007ff5ab179ecc in mozilla::ipc::RPCChannel::OnMaybeDequeueOne (this=0x7ff5a030dcc8) at /home/cjones/mozilla/electrolysis/ipc/glue/RPCChannel.cpp:417
#18 0x00007ff5ab17fbc6 in DispatchToMethod<mozilla::ipc::RPCChannel, void (mozilla::ipc::RPCChannel::*)()> (obj=0x7ff5a030dcc8, method=0x7ff5ab179c40 <mozilla::ipc::RPCChannel::OnMaybeDequeueOne()>, arg=...) at /home/cjones/mozilla/electrolysis/ipc/chromium/src/base/tuple.h:383
#19 0x00007ff5ab17fa6e in RunnableMethod<mozilla::ipc::RPCChannel, void (mozilla::ipc::RPCChannel::*)(), Tuple0>::Run (this=0x7ff5a0307600) at /home/cjones/mozilla/electrolysis/ipc/chromium/src/base/task.h:307
#20 0x00007ff5ab17b7ad in mozilla::ipc::RPCChannel::RefCountedTask::Run (this=0x7ff5a031b440) at ../../dist/include/mozilla/ipc/RPCChannel.h:421
#21 0x00007ff5ab17b8b0 in mozilla::ipc::RPCChannel::DequeueTask::Run (this=0x7ff59a4f73a0) at ../../dist/include/mozilla/ipc/RPCChannel.h:446
#22 0x00007ff5ab377176 in MessageLoop::RunTask (this=0x7ff5a02fee20, task=0x7ff59a4f73a0) at /home/cjones/mozilla/electrolysis/ipc/chromium/src/base/message_loop.cc:336
#23 0x00007ff5ab3771e6 in MessageLoop::DeferOrRunPendingTask (this=0x7ff5a02fee20, pending_task=...) at /home/cjones/mozilla/electrolysis/ipc/chromium/src/base/message_loop.cc:344
#24 0x00007ff5ab3775e4 in MessageLoop::DoWork (this=0x7ff5a02fee20) at /home/cjones/mozilla/electrolysis/ipc/chromium/src/base/message_loop.cc:444
#25 0x00007ff5ab1774c9 in mozilla::ipc::DoWorkRunnable::Run (this=0x7ff5a0309340) at /home/cjones/mozilla/electrolysis/ipc/glue/MessagePump.cpp:75
#26 0x00007ff5ab2fc3ad in nsThread::ProcessNextEvent (this=0x7ff5a0353080, mayWait=1, result=0x7ff5a02fecbc) at /home/cjones/mozilla/electrolysis/xpcom/threads/nsThread.cpp:527
#27 0x00007ff5ab28ae50 in NS_ProcessNextEvent_P (thread=0x7ff5a0353080, mayWait=1) at nsThreadUtils.cpp:250
#28 0x00007ff5ab177927 in mozilla::ipc::MessagePump::Run (this=0x7ff5a0307700, aDelegate=0x7ff5a02fee20) at /home/cjones/mozilla/electrolysis/ipc/glue/MessagePump.cpp:142
#29 0x00007ff5ab376c81 in MessageLoop::RunInternal (this=0x7ff5a02fee20) at /home/cjones/mozilla/electrolysis/ipc/chromium/src/base/message_loop.cc:216
#30 0x00007ff5ab376c06 in MessageLoop::RunHandler (this=0x7ff5a02fee20) at /home/cjones/mozilla/electrolysis/ipc/chromium/src/base/message_loop.cc:199
#31 0x00007ff5ab376b97 in MessageLoop::Run (this=0x7ff5a02fee20) at /home/cjones/mozilla/electrolysis/ipc/chromium/src/base/message_loop.cc:173
#32 0x00007ff5ab39e23c in base::Thread::ThreadMain (this=0x7ff5a030dc10) at /home/cjones/mozilla/electrolysis/ipc/chromium/src/base/thread.cc:165
#33 0x00007ff5ab3d24b7 in ThreadFunc (closure=0x7ff5a030dc10) at /home/cjones/mozilla/electrolysis/ipc/chromium/src/base/platform_thread_posix.cc:26
#34 0x00007ff5ac77ba04 in start_thread (arg=<value optimized out>) at pthread_create.c:300
#35 0x00007ff5a84ed80d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#36 0x0000000000000000 in ?? ()
(gdb) f 5
#5  0x00007ff5aad0dd65 in nsDocShellTreeOwner::RemoveChromeListeners (this=0x7ff59e5b9ab0) at /home/cjones/mozilla/electrolysis/embedding/browser/webBrowser/nsDocShellTreeOwner.cpp:917
Current language:  auto
The current source language is "auto; currently c++".
(gdb) p piTarget
$1 = {
  mRawPtr = 0x0
}
Severity: normal → critical
Component: IPC → Embedding: APIs
Keywords: crash
QA Contact: ipc → apis
Attached patch: Bandaid
This patch avoids the crash (and a subsequent one), but I highly doubt it's the correct way to do so.  The problem is as follows:


> NS_IMETHODIMP nsWebBrowser::InternalDestroy()
> {
>
>    if (mInternalWidget) {
>      mInternalWidget->SetClientData(0);
>      mInternalWidget->Destroy();
>      mInternalWidget = nsnull; // Force release here.
>    }
> 
>    SetDocShell(nsnull);
>
>    if(mDocShellTreeOwner)
>       {
>       mDocShellTreeOwner->WebBrowser(nsnull);
>       NS_RELEASE(mDocShellTreeOwner);
>       }

SetDocShell(nsnull) clears the mDocShell member.  After that, mDocShellTreeOwner->WebBrowser(nsnull) calls nsDocShellTreeOwner::RemoveChromeListeners(), which in turn calls GetPIDOMEventTarget:


> static nsresult
> GetPIDOMEventTarget( nsWebBrowser* inBrowser, nsPIDOMEventTarget** aTarget)
> {
>   nsCOMPtr<nsIDOMWindow> domWindow;
>   inBrowser->GetContentDOMWindow(getter_AddRefs(domWindow));
>   NS_ENSURE_TRUE(domWindow, NS_ERROR_FAILURE);

inBrowser is the same browser from earlier in the call stack, and GetContentDOMWindow tries to use the mDocShell that was just cleared.  That call fails, so GetPIDOMEventTarget returns an error and leaves the target null (piTarget in the gdb output above), and RemoveChromeListeners() then dereferences it.

Once this crash is worked around, we get another from the NS_RELEASE(mDocShellTreeOwner) in nsWebBrowser::InternalDestroy().  The tree owner dies, and calls RemoveChromeListeners() in its destructor.  This ends up in GetPIDOMEventTarget once more, but this time passing the cleared mWebBrowser variable, so we get another NULL dereference at |inBrowser->GetContentDOMWindow(getter_AddRefs(domWindow));|
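The failure order described above can be modelled outside Gecko. What follows is only a minimal sketch with hypothetical stand-in types (`Browser`, `EventTarget`, `GetEventTarget`, `RemoveChromeListenersGuarded` are all invented for illustration and use `std::shared_ptr` rather than XPCOM). It shows one way to tolerate both states, a cleared docshell and a null browser, and is not necessarily what the attached patch does.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; these are NOT the real Gecko classes.
struct EventTarget {
    int listeners = 2;
    void RemoveListeners() { listeners = 0; }
};

struct Browser {
    // Plays the role of the target reachable through mDocShell.
    std::shared_ptr<EventTarget> docShellTarget = std::make_shared<EventTarget>();
    // Plays the role of SetDocShell(nsnull).
    void ClearDocShell() { docShellTarget.reset(); }
};

// Same shape as the quoted helper: on failure it reports an error
// and leaves the out-parameter null.
bool GetEventTarget(Browser* browser, std::shared_ptr<EventTarget>* out) {
    out->reset();
    if (!browser || !browser->docShellTarget) {
        return false;  // covers both the null browser and the cleared docshell
    }
    *out = browser->docShellTarget;
    return true;
}

// Guarded removal: returns true if listeners were removed,
// false if there was nothing left to remove them from.
bool RemoveChromeListenersGuarded(Browser* browser) {
    std::shared_ptr<EventTarget> target;
    if (!GetEventTarget(browser, &target) || !target) {
        return false;  // bail out instead of dereferencing null
    }
    target->RemoveListeners();
    return true;
}
```

In this model, calling `RemoveChromeListenersGuarded` after `ClearDocShell()`, or with a null browser, returns false instead of crashing. Whether silently skipping listener removal is acceptable in the real code is exactly the open question in this bug.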
We haven't made local changes to nsWebBrowser, right?  So this is just broken for all consumers right now?

We should just fix this mess to hold refs on the stack as needed.... and keep track of what objects are gone.
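As a rough illustration of the "hold refs on the stack and keep track of what is gone" idea, here is a sketch under stated assumptions: `Browser`, `TreeOwner`, `SetBrowser` and the `destroyed` flag are hypothetical names, and `std::shared_ptr` stands in for XPCOM refcounting. It is not the real nsWebBrowser code.

```cpp
#include <cassert>
#include <memory>

struct Browser;

// Hypothetical tree owner with a non-owning back-pointer to its browser.
struct TreeOwner {
    Browser* browser = nullptr;
    bool listenersRemoved = false;

    void RemoveListeners() { listenersRemoved = true; }

    // Detaching (b == nullptr) removes listeners only while a browser is still attached.
    void SetBrowser(Browser* b) {
        if (!b && browser) {
            RemoveListeners();
        }
        browser = b;
    }
};

struct Browser {
    std::shared_ptr<TreeOwner> treeOwner;
    bool destroyed = false;  // tracks that teardown already ran

    void InternalDestroy() {
        if (destroyed) {
            return;  // second call is a no-op
        }
        destroyed = true;

        // Strong reference on the stack: the owner stays alive for the
        // whole teardown even though the member is cleared first.
        std::shared_ptr<TreeOwner> owner = treeOwner;
        treeOwner.reset();

        if (owner) {
            owner->SetBrowser(nullptr);  // detach while the owner is guaranteed alive
        }
        // `owner` is released here, after all teardown calls have finished.
    }
};
```

The design point is ordering: the member is cleared first, the stack reference keeps the object valid for the remaining calls, and the flag makes repeated teardown harmless.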
Blocks: 559544
Blocks: 516521
blocking2.0: --- → ?
blocking2.0: ? → final+
Attachment #442090 - Flags: review?(Olli.Pettay)
Any progress with this bug? Is any help needed? I'm using the attached patch now, and it works fine.
Attachment #442090 - Flags: review?(Olli.Pettay) → review+
tracking-fennec: --- → ?
This continues to be a problem for people.  Technically we could check in the patch I wrote, but it was never intended for that purpose from my point of view.  If somebody else understands the code here, they're welcome to dive in and either do a correct patch or reassure me that my changes are actually beneficial/correct.  I don't really have the time right now to do the digging myself.
http://hg.mozilla.org/projects/electrolysis/rev/6dc80b2a7bc0

olli, could you take a look at comment 7?  You reviewed this patch, so I assume you are happy with the approach (and thus I pushed it to e10s).  Leaving this open until I hear back from you.
Fwiw, this makes Electrolysis Fennec builds very hard to use.
Summary: Fennec/e10s: Content process crash on shutdown [@nsDocShellTreeOwner::RemoveChromeListeners] → [e10s] Fennec: Content process crash on shutdown [@nsDocShellTreeOwner::RemoveChromeListeners]
mw22 - this fix is in.  you should see it in today's nightly build.
I still see this crash, using:
Mozilla/5.0 (X11; U; Linux armv7l; en-US; rv:2.0b2pre) Gecko/20100630 Namoroka/4.0b2pre Fennec/2.0a1pre

I downloaded that build from:
http://ftp.mozilla.org/pub/mozilla.org/mobile/nightly/latest-electrolysis-maemo5-gtk/fennec_2.0~a1~20100630041515_armel.deb

Am I doing something wrong?
When you say "this crash", do you actually mean this one?  I know of a couple shutdown problems which come and go - could I see a stack trace?
Sorry, I don't know how to get a stacktrace on the n900. I filed bug 576068 for the crashes that I'm seeing. I did send the crash reports to the Maemo crash reporter. Is that useful?
(In reply to comment #14)
> Sorry, I don't know how to get a stacktrace on the n900. I filed bug 576068 for
> the crashes that I'm seeing. I did send the crash reports to the Maemo crash
> reporter. Is that useful?

Yes. Can you link to the reports here?
It's not the Mozilla crash reporter that is coming up. I'm getting the Maemo crash reporter, which supposedly is sending these crashes to https://files.maemo.org/nitro/
But that site is password protected.
Should the Mozilla crash reporter have come up?

I tried again with e10s Fennec build from 2010-07-01, I'm still getting these crashes on the n900.
Ok, chinook builds don't crash, so I was able to type about:crashes and find out what stack traces these crashes consist of.

This is a typical stacktrace of the Fennec crashes that I seem to be getting all the time:
http://crash-stats.mozilla.com/report/index/bp-7a9a8d77-4406-4771-aaa6-ec6222100705
0  	libxul.so  	mozilla::ipc::RPCChannel::CxxStackFrame::CxxStackFrame  	 RPCChannel.h:220
1 	libxul.so 	mozilla::ipc::RPCChannel::Send 	ipc/glue/RPCChannel.cpp:141
2 	libxul.so 	mozilla::dom::PContentProcessParent::SendSetOffline 	PContentProcessParent.cpp:254
3 	libxul.so 	mozilla::dom::ContentProcessParent::Observe 	dom/ipc/ContentProcessParent.cpp:325
4 	libxul.so 	nsObserverList::NotifyObservers 	xpcom/ds/nsObserverList.cpp:130
5 	libxul.so 	nsObserverService::NotifyObservers 	xpcom/ds/nsObserverService.cpp:182
6 	libxul.so 	nsIOService::SetOffline 	netwerk/base/src/nsIOService.cpp:696
7 	libxul.so 	NS_InvokeByIndex_P 	xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp:190
8 	libxul.so 	XPCWrappedNative::CallMethod 	js/src/xpconnect/src/xpcwrappednative.cpp:3028
9 	libxul.so 	XPC_WN_GetterSetter 	js/src/xpconnect/src/xpcprivate.h:2558
10 	libmozjs.so 	js_Invoke 	js/src/jsinterp.cpp:654
11 	libmozjs.so 	js_InternalInvoke 	js/src/jsinterp.cpp:694
12 	libmozjs.so 	js_InternalGetOrSet 	js/src/jsinterp.cpp:730
13 	libmozjs.so 	JSScopeProperty::set 	js/src/jsscope.h:1019
14 	libmozjs.so 	js_NativeSet 	js/src/jsobj.cpp:4819
15 	libmozjs.so 	js_SetPropertyHelper 	js/src/jsobj.cpp:5222
16 	libmozjs.so 	js_Interpret 	js/src/jsops.cpp:1827
17 	libmozjs.so 	js_Invoke 	js/src/jsinterp.cpp:664
18 	libmozjs.so 	js_fun_apply 	js/src/jsfun.cpp:2062
19 	libmozjs.so 	js_Interpret 	js/src/jsops.cpp:2148
20 	libmozjs.so 	js_Invoke 	js/src/jsinterp.cpp:664
21 	libxul.so 	nsXPCWrappedJSClass::CallMethod 	js/src/xpconnect/src/xpcwrappedjsclass.cpp:1689
22 	libxul.so 	nsXPCWrappedJS::CallMethod 	js/src/xpconnect/src/xpcwrappedjs.cpp:570
23 	libxul.so 	PrepareAndDispatch 	xpcom/reflect/xptcall/src/md/unix/xptcstubs_arm.cpp:132
24 	libxul.so 	libxul.so@0xe61fab
And the crash reporter seems to be working again in the latest builds, btw.
Ok, the crash you're looking at is bug 570132 which is ready to be checked in by anybody; I just haven't had the time to do so.
Unfortunately, the fix for bug 570132 didn't fix the crashes that I'm seeing. Apparently, those breakpad reports were unrelated.

I'm seeing those crashes with the fremantle qt trunk builds of Fennec on the n900. I haven't seen them with the chinook trunk builds. The crash reporter doesn't come up, so I can't give crash reports. Only the Maemo crash reporter appears.
Martijn, can you move this problem to a new bug, in that case?  CC me and anybody else who might have ideas; to my knowledge all known shutdown crash bugs are resolved at this point.
I was CC-ed to bug 577740. I guess I'm suffering from that bug?
In any case, the crashes that I get are happening while loading a page or shortly after that (within one minute).
marking fixed.  it was pushed to e10s, merged into m-c, and smaug didn't really comment about the approach.  pinging him again.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Crash Signature: [@nsDocShellTreeOwner::RemoveChromeListeners]
tracking-fennec: ? → ---
Product: Core → Core Graveyard