Closed Bug 603680 Opened 14 years ago Closed 14 years ago

PBrowser:Destroy racing with PRenderFrameCtor crashes the chrome process [@ mozilla::layout::RenderFrameParent::GetLayerManager]

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Maemo
defect
Not set
normal

Tracking

(fennec2.0b3+)

VERIFIED FIXED
Tracking Status
fennec 2.0b3+ ---

People

(Reporter: nhirata, Assigned: cjones)

Details

(Keywords: crash)

Attachments

(1 file)

Mozilla/5.0 (Maemo;Linux armv71; rv:2.0b8pre)Gecko/20101012 Firefox/4.0b8pre Fennec/4.0b2pre

Not 100% sure of the repro steps.  This is as far as I remember:

1. open:
a) about:firstrun
b) about:home
c) http://www.amazon.com
d) http://bit.ly/akf4Fn
e) http://www.hulu.com
2. play around with the tabs, i.e. close a tab while a page is downloading, reopen it through the recently closed tab icon, then close other tabs and reopen them.

Expected: No crash
Actual: crash occurs http://crash-stats.mozilla.com/report/index/bp-f1e5a403-ad07-41f7-a9ad-511492101012

Note:
Fennec 4.0b2pre Crash Report [@ mozilla::layout::RenderFrameParent::GetLayerManager ]
ID: f1e5a403-ad07-41f7-a9ad-511492101012
Signature: mozilla::layout::RenderFrameParent::GetLayerManager


Signature	mozilla::layout::RenderFrameParent::GetLayerManager
UUID	f1e5a403-ad07-41f7-a9ad-511492101012
Time 	2010-10-12 10:54:27.98417
Uptime	140
Last Crash	7731 seconds (2.1 hours) before submission
Install Age	8073 seconds (2.2 hours) since version was first installed.
Product	Fennec
Version	4.0b2pre
Build ID	20101012024103
OS	Linux
OS Version	0.0.0 Linux 2.6.28-omap1 #1 PREEMPT Thu Apr 15 09:47:09 EEST 2010 armv7l
CPU	arm
CPU Info	
Crash Reason	SIGSEGV
Crash Address	0x10
User Comments	
Processor Notes 	
EMCheckCompatibility	False
Crashing Thread
Frame 	Module 	Signature 	Source
0 	libxul.so 	mozilla::layout::RenderFrameParent::GetLayerManager 	content/base/src/nsFrameLoader.h:215
1 	libxul.so 	mozilla::layout::RenderFrameParent::AllocPLayers 	layout/ipc/RenderFrameParent.cpp:250
2 	libxul.so 	mozilla::layout::PRenderFrameParent::OnMessageReceived 	PRenderFrameParent.cpp:103
3 	libxul.so 	mozilla::dom::PContentParent::OnMessageReceived 	PContentParent.cpp:463
4 	libxul.so 	mozilla::ipc::AsyncChannel::OnDispatchMessage 	ipc/glue/AsyncChannel.cpp:262
5 	libxul.so 	mozilla::ipc::RPCChannel::OnMaybeDequeueOne 	ipc/glue/RPCChannel.cpp:438
6 	libxul.so 	RunnableMethod<mozilla::ipc::RPCChannel,bool ,Tuple0>::Run 	ipc/chromium/src/base/tuple.h:383
7 	libxul.so 	mozilla::ipc::RPCChannel::DequeueTask::Run 	RPCChannel.h:449
8 	libxul.so 	MessageLoop::RunTask 	ipc/chromium/src/base/message_loop.cc:343
9 	libxul.so 	MessageLoop::DeferOrRunPendingTask 	ipc/chromium/src/base/message_loop.cc:351
10 	libxul.so 	MessageLoop::DoWork 	ipc/chromium/src/base/message_loop.cc:451
11 	libxul.so 	mozilla::ipc::DoWorkRunnable::Run 	ipc/glue/MessagePump.cpp:70
12 	libxul.so 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:547
13 	libxul.so 	NS_ProcessNextEvent_P 	nsThreadUtils.cpp:250
14 	libxul.so 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:134
15 	libxul.so 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:219
16 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:202
17 	libxul.so 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:180
18 	libxul.so 	nsAppStartup::Run 	toolkit/components/startup/src/nsAppStartup.cpp:191
19 	libxul.so 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3670
20 	fennec 	main 	mobile/app/nsBrowserApp.cpp:155
21 	libc-2.5.so 	libc-2.5.so@0x14973 	
22 	fennec 	Output 	mobile/app/nsBrowserApp.cpp:77
Summary: Crash may occur after opening the recently closed while other tabs are loading → Crash may occur after opening the recently closed while other tabs are loading [@ mozilla::layout::RenderFrameParent::GetLayerManager ]
I can't for the life of me reproduce this crash or the similar-looking one in bug 604249.  There are a few plausible hypotheses, but I'd have to get this in gdb to see what's up.  Will try later.
http://crash-stats.mozilla.com/report/index/ae2f9be3-a5b3-49b0-a755-09e2c2101014

Mozilla/5.0 (Maemo;Linux armv71; rv:2.0b8pre) Gecko/20101014 Firefox/4.0b8pre Fennec/4.0b2pre

I got another crash today with easier steps right after reboot of the Nokia:
1. launch fennec
2. open a new tab and go to www.gmail.com
3. log into gmail.com
4. while gmail.com is loading, open a new tab and load about:fennec

Note:
1. I was trying to reproduce a focus issue I hit when switching from reader.google.com to www.gmail.com after closing two local tabs in between.  In that bug, the content window kept showing reader.google.com even though I had switched focus to gmail.com in the tabs panel.
2. It may be Nokia-only.  I can't seem to repro this on Android.


Crashing Thread
Frame 	Module 	Signature 	Source
0 	libxul.so 	mozilla::layout::RenderFrameParent::GetLayerManager 	content/base/src/nsFrameLoader.h:215
1 	libxul.so 	mozilla::layout::RenderFrameParent::AllocPLayers 	layout/ipc/RenderFrameParent.cpp:297
2 	libxul.so 	mozilla::layout::PRenderFrameParent::OnMessageReceived 	PRenderFrameParent.cpp:103
3 	libxul.so 	mozilla::dom::PContentParent::OnMessageReceived 	PContentParent.cpp:463
4 	libxul.so 	mozilla::ipc::AsyncChannel::OnDispatchMessage 	ipc/glue/AsyncChannel.cpp:262
5 	libxul.so 	mozilla::ipc::RPCChannel::OnMaybeDequeueOne 	ipc/glue/RPCChannel.cpp:438
6 	libxul.so 	RunnableMethod<mozilla::ipc::RPCChannel,bool ,Tuple0>::Run 	ipc/chromium/src/base/tuple.h:383
7 	libxul.so 	mozilla::ipc::RPCChannel::DequeueTask::Run 	RPCChannel.h:449
8 	libxul.so 	MessageLoop::RunTask 	ipc/chromium/src/base/message_loop.cc:343
9 	libxul.so 	MessageLoop::DeferOrRunPendingTask 	ipc/chromium/src/base/message_loop.cc:351
10 	libxul.so 	MessageLoop::DoWork 	ipc/chromium/src/base/message_loop.cc:451
11 	libxul.so 	mozilla::ipc::DoWorkRunnable::Run 	ipc/glue/MessagePump.cpp:70
12 	libxul.so 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:547
13 	libxul.so 	NS_ProcessNextEvent_P 	nsThreadUtils.cpp:250
14 	libxul.so 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:110
15 	libxul.so 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:219
16 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:202
17 	libxul.so 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:180
18 	libxul.so 	nsAppStartup::Run 	toolkit/components/startup/src/nsAppStartup.cpp:191
19 	libxul.so 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3670
20 	fennec 	main 	mobile/app/nsBrowserApp.cpp:155
21 	libc-2.5.so 	libc-2.5.so@0x14973 	
22 	fennec 	Output 	mobile/app/nsBrowserApp.cpp:77
Finally caught this!  I used the steps in comment 2, but loaded the HTML5 spec.  Thanks, Naoki.

Relevant things

#6  0x00007f74fa33f4ac in mozilla::layout::RenderFrameParent::GetLayerManager (this=0x7f74e0225cf0) at /home/cjones/mozilla/mozilla-central/layout/ipc/RenderFrameParent.cpp:324
(gdb) p mFrameLoader
$2 = {
  mRawPtr = 0x0
}
(gdb) p this->mManager->mFrameElement
$9 = (class nsIDOMElement *) 0x0
(gdb) p *this->mManager
[snip]
    mState = mozilla::dom::PBrowser::__Null, 
[snip]
(gdb) ptarray this->mManager->mManagedPRenderFrameParent
[Thread 0x7f74dd94a710 (LWP 11517) exited]
elem[0]: $12 = (mozilla::layout::RenderFrameParent *) 0x7f74e0225cf0
nsTArray length = 1
nsTArray capacity = 1
Element type = class mozilla::layout::PRenderFrameParent *

All this implies
 - the RenderFrameParent is live, not ActorDestroy()d
 - it was created with a null nsFrameLoader o_O (the RenderFrameParent's ref is only nulled at ActorDestroy)
 - the TabParent is live, not ActorDestroy()d, though it has possibly already sent the Destroy message to the child (too bad PBrowser is stateless or we'd know for sure)

The invariant that RenderFrameParent assumes, and that is violated here, is that nsFrameLoader begat TabParent begat RenderFrameParent, so a RenderFrameParent always has an nsFrameLoader.  It doesn't in this crash.  This situation could arise if

 (1) TabParent is created, then nsFrameLoader::Destroy() happens concurrently with the child sending PRenderFrameConstructor, so that TabParent::mFrameElement is null when RenderFrameParent is created (sigh, PBrowser *really* needs explicit state)
 (2) At nsFrameLoader::TryRemoteBrowser(),
    nsCOMPtr<nsIDOMElement> element = do_QueryInterface(mOwnerContent);
    mRemoteBrowser->SetOwnerElement(element);
gives back a null element.

(1) seems more plausible on the surface, but I don't believe this could happen unless a new tab was opened and closed before PRenderFrameConstructor came through.  I didn't close any tabs when repro'ing.  This is a legitimate bug, but the odds of it happening in practice seem pretty slim to me.  If the frontend creates then destroys a <browser> in quick succession, that could trigger this.  Extensions could too.

So, maybe (2), but I have no earthly idea why such a thing would happen.  Luckily, we just need two hard asserts to find the culprit.
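For anyone following along, here is a toy model of the invariant and of how hypothesis (1) breaks it.  This is only a sketch: every type and function name below is a made-up stand-in, not the Gecko class of similar name, and the guarded accessor illustrates the failure mode rather than any proposed fix.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration; these are not the Gecko classes.
struct LayerManagerModel {};

struct FrameLoaderModel {
  LayerManagerModel manager;
  LayerManagerModel* GetLayerManager() { return &manager; }
};

struct TabParentModel {
  std::shared_ptr<FrameLoaderModel> frameLoader;
  // Models teardown clearing the tab's link back to its owner.
  void Destroy() { frameLoader.reset(); }
};

struct RenderFrameParentModel {
  std::shared_ptr<FrameLoaderModel> frameLoader;

  // Copies whatever the tab holds at construction time, null included.
  explicit RenderFrameParentModel(const TabParentModel& tab)
      : frameLoader(tab.frameLoader) {}

  // Guarded accessor: returns nullptr rather than dereferencing null.
  LayerManagerModel* GetLayerManager() const {
    return frameLoader ? frameLoader->GetLayerManager() : nullptr;
  }
};

// Normal order: the render frame is constructed while the tab is intact.
bool CtorBeforeDestroyHasLayerManager() {
  TabParentModel tab;
  tab.frameLoader = std::make_shared<FrameLoaderModel>();
  RenderFrameParentModel frame(tab);
  tab.Destroy();  // the frame keeps its own reference
  assert(tab.frameLoader == nullptr);
  return frame.GetLayerManager() != nullptr;
}

// Racy order from hypothesis (1): Destroy() wins, then the ctor is handled.
bool CtorAfterDestroyHasLayerManager() {
  TabParentModel tab;
  tab.frameLoader = std::make_shared<FrameLoaderModel>();
  tab.Destroy();
  RenderFrameParentModel frame(tab);  // constructed with a null loader
  return frame.GetLayerManager() != nullptr;
}
```

In the model, the first function returns true and the second returns false.  An unguarded accessor in the second case would dereference null, which fits the shape of the SIGSEGV at a small address reported above.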
###!!! ABORT: option (1): 'aFrameLoader', file /home/cjones/mozilla/mozilla-central/layout/ipc/RenderFrameParent.cpp, line 178

Very strange.  Digging.
During the sequence of events
 - Load HTML5 spec
 - Open new tab
 - Load about:fennec

something causes nsFrameLoader::Show to be called, which sends the PBrowser:Show message.  Then before the content process's PRenderFrameChildCtor message comes in, something causes nsFrameLoader::Destroy to be called, which calls TabParent::Destroy.  Again, this is a legitimate bug which we could hit if, say, a user were to open and close tabs while the content process were extremely (extremely) busy.  But, there's no close-tab above.  I suspect the fennec frontend is not behaving well here, by showing then throwing away a <browser>.
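To make the "PBrowser needs explicit state" point concrete, here is a rough sketch of what tracking state on the parent side could look like.  The state names, the event names, and the transition rules are all assumptions made up for this illustration; nothing here is taken from the real protocol definition.

```cpp
#include <cassert>

// Hypothetical states and events; the real PBrowser protocol is stateless.
enum class BrowserState { Created, Shown, Destroying };
enum class BrowserEvent { Show, Destroy, RenderFrameCtor };

struct BrowserStateModel {
  BrowserState state = BrowserState::Created;

  // Returns true if the event is legal in the current state.
  bool On(BrowserEvent event) {
    switch (event) {
      case BrowserEvent::Show:
        if (state != BrowserState::Created) return false;
        state = BrowserState::Shown;
        return true;
      case BrowserEvent::Destroy:
        if (state == BrowserState::Destroying) return false;
        state = BrowserState::Destroying;
        return true;
      case BrowserEvent::RenderFrameCtor:
        // Only meaningful while the browser is shown and not being torn down.
        return state == BrowserState::Shown;
    }
    return false;
  }
};

// The sequence seen in this bug: Show, then Destroy, then the late ctor.
bool RacySequenceRejectsCtor() {
  BrowserStateModel browser;
  const bool shown = browser.On(BrowserEvent::Show);
  const bool destroyed = browser.On(BrowserEvent::Destroy);
  const bool ctorAccepted = browser.On(BrowserEvent::RenderFrameCtor);
  return shown && destroyed && !ctorAccepted;
}

// The expected sequence: Show, then the ctor, then Destroy.
bool NormalSequenceAcceptsCtor() {
  BrowserStateModel browser;
  const bool shown = browser.On(BrowserEvent::Show);
  const bool ctorAccepted = browser.On(BrowserEvent::RenderFrameCtor);
  const bool destroyed = browser.On(BrowserEvent::Destroy);
  return shown && ctorAccepted && destroyed;
}
```

With state like this, the late constructor message becomes a detectable protocol condition instead of a surprise null deep inside the render frame code.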

But this is a platform bug, so it needs to be fixed.  Bug 589337 covers tidying up the Show/Hide interface, and IPDL |discard| covers language support for dealing with these racy Destroy/PRenderFrameParentCtor messages.  Until both of those land, this patch changes the crash into "###!!! [Parent][AsyncChannel] Error: Route error: message sent to unknown actor ID", with approximately the same overall effect that |discard| would have had.
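A generic sketch of why a late message turns into a route error rather than a crash once the target actor is no longer registered.  This models the idea only: the class, the enum, and the actor ID are invented for illustration, and it is not a description of what the attached patch or the real channel code does.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

enum class DispatchResult { Processed, RouteError };

// Hypothetical routing table: actor ID -> message handler.
class ChannelModel {
 public:
  using Handler = std::function<void(const std::string&)>;

  void Register(int32_t actorId, Handler handler) {
    routes_[actorId] = std::move(handler);
  }

  void Unregister(int32_t actorId) { routes_.erase(actorId); }

  // Messages for unknown actors are reported and dropped, never delivered.
  DispatchResult Dispatch(int32_t actorId, const std::string& message) {
    auto it = routes_.find(actorId);
    if (it == routes_.end()) {
      return DispatchResult::RouteError;
    }
    it->second(message);
    return DispatchResult::Processed;
  }

 private:
  std::unordered_map<int32_t, Handler> routes_;
};

// The parent side has torn down the actor before the child's message arrives.
DispatchResult LateCtorAfterDestroy() {
  ChannelModel channel;
  const int32_t kBrowserActorId = 7;  // arbitrary ID for the example
  int handled = 0;
  channel.Register(kBrowserActorId,
                   [&handled](const std::string&) { ++handled; });
  channel.Unregister(kBrowserActorId);
  const DispatchResult result =
      channel.Dispatch(kBrowserActorId, "PRenderFrameConstructor");
  assert(handled == 0);  // the stale handler never ran
  return result;
}
```

The late message is reported and dropped, so no handler ever runs against half-torn-down state.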
Assignee: nobody → jones.chris.g
Attachment #483709 - Flags: review?(benjamin)
Beta2 is a good target for this: safe patch that fixes a chrome-process crasher.
tracking-fennec: --- → ?
Filed bug 604863 on the unexpected <browser remote> life cycle.
Summary: Crash may occur after opening the recently closed while other tabs are loading [@ mozilla::layout::RenderFrameParent::GetLayerManager ] → PBrowser:Destroy racing with PRenderFrameCtor crashes the chrome process [ @mozilla::layout::RenderFrameParent::GetLayerManager]
Summary: PBrowser:Destroy racing with PRenderFrameCtor crashes the chrome process [ @mozilla::layout::RenderFrameParent::GetLayerManager] → PBrowser:Destroy racing with PRenderFrameCtor crashes the chrome process [@ mozilla::layout::RenderFrameParent::GetLayerManager]
Ran into this again.  This time the content-crashed dialog appeared at the same moment I closed the tab for the crashed content.

http://crash-stats.mozilla.com/report/pending/bp-1bba86fb-c098-42c8-9f40-5ac3a2101020
tracking-fennec: ? → 2.0b3+
Attachment #483709 - Flags: review?(benjamin) → review+
http://hg.mozilla.org/mozilla-central/rev/0b7bfb9bba5a
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
It no longer seems to crash.  Not sure whether azakai's work on niceness (bug 606574) will help with responsiveness while gmail is loading in another tab and you open a new tab.


Verified:
Mozilla/5.0 (Maemo;Linux armv71; rv:2.0b8pre) Gecko/20101029 Firefox/4.0b8pre Fennec/4.0b2pre
Status: RESOLVED → VERIFIED
Yes, adjusting priorities should help.  Too bad it just missed beta2.
Crash Signature: [@ mozilla::layout::RenderFrameParent::GetLayerManager]