Closed Bug 919878 Opened 7 years ago Closed 6 years ago

Removing a tab before ::Show() completes causes assertions and child process to terminate

Categories

(Core :: IPC, defect, P1)

All
Windows 7
defect

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: markh, Assigned: billm)

References

Details

Attachments

(1 file)

STR:
* Set browser.tabs.remote=true

* In a "browser" scratchpad, execute:
      gBrowser.selectedTab = gBrowser.addTab();
      gBrowser.removeCurrentTab();

In a debug build, the following will be seen on the console:

[Child 11280] WARNING: nsWindow::GetNativeData not implemented for this type: file o:/src/mozilla-git/central2/widget/xpwidgets/PuppetWidget.cpp, line 672
++DOCSHELL 03461140 == 2 [id = 2]
++DOMWINDOW == 3 (03744B30) [serial = 4] [outer = 00000000]
[Parent 10348] WARNING: No docshells for remote frames!: file o:/src/mozilla-git/central2/content/base/src/nsFrameLoader.cpp, line 571
<repeated a few times>
[Parent 10348] WARNING: Can't allocate graphics resources, aborting subprocess: file o:/src/mozilla-git/central2/dom/ipc/TabParent.cpp, line 1397
###!!! [Parent][SyncChannel] Error: Value error: message was deserialized, but contained an illegal value
###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
...
[Parent 10348] WARNING: RemoveObserver() called for unregistered observer: file o:/src/mozilla-git/central2/hal/Hal.cpp, line 205
[Parent 10348] WARNING: RemoveObserver() called for unregistered observer: file o:/src/mozilla-git/central2/hal/Hal.cpp, line 205
[Parent 10348] WARNING: RemoveObserver() called for unregistered observer: file o:/src/mozilla-git/central2/hal/Hal.cpp, line 205
[Parent 10348] WARNING: RemoveObserver() called for unregistered observer: file o:/src/mozilla-git/central2/hal/Hal.cpp, line 205
[Parent 10348] WARNING: RemoveObserver() called for unregistered observer: file o:/src/mozilla-git/central2/hal/Hal.cpp, line 205
[Parent 10348] WARNING: file o:/src/mozilla-git/central2/ipc/chromium/src/base/process_util_win.cc, line 184

The problem is, roughly:

* The parent creates the browser element and calls SendPBrowserConstructor to do the client side.
* TabParent::Show() is called - this sends an async message to the child.
* We then remove the browser element - XULElement::RemoveChildAt() is called, which ends up destroying the TabParent, and in the process calls TabParent::SetOwnerElement(null)
* The child process, acting on the ::Show() message, send a message back which ends up in TabParent::AllocPRenderFrameParent().  This sees frameLoader being null (due to the SetOwnerElement(null)) and hits NS_WARNING("Can't allocate graphics resources, aborting subprocess");
* This returns null, which we see as the "Error: Value error: message was deserialized, but contained an illegal value" and we terminate the child.
* A little later, we end up at process_util_win.cc, line 184, which is attempting to get the PID for the child, and fails.
* Later, and somewhat non-deterministically, we end up with a variety of failures and assertions about condition variables and or InvalidHandle exceptions as we try and close the process handle.  I haven't dug into these, but they start to appear within seconds (or up to a minute) of doing the above.  I suspect we are getting confused about the child state and/or not cleaning everything up correctly.

I'm marking this as blocking the crash-e10s bug (899758), as even though there isn't really a "crash" here, the parent reports it as one and about:tabcrashed may be displayed.
(In reply to Mark Hammond (:markh) from comment #0)
> * Later, and somewhat non-deterministically, we end up with a variety of
> failures and assertions about condition variables and or InvalidHandle
> exceptions as we try and close the process handle.

I've narrowed down the InvalidHandle exception while closing the process handle - see bug 920397.
Blocks: 924260
Priority: -- → P1
I can reproduce this. I'll see if I can figure it out.
Assignee: nobody → wmccloskey
Mark's comment tells pretty much what happens. The TabParent is asked to allocate a RenderFrameParent. It tries to find the FrameLoader and can't get one because the tab has already been closed by that time. So it returns NULL. IPDL takes that as a sign that it should kill the child process. We're not very good at killing the child, so that goes badly.

However, we shouldn't be killing the child in the first place. This patch splits RenderFrameParent to have a mostly empty constructor, with the main logic going in an Init method. The Init method returns a success/fail boolean to the child.
Attachment #832635 - Flags: review?(matt.woodrow)
Attachment #832635 - Flags: review?(matt.woodrow) → review+
https://hg.mozilla.org/mozilla-central/rev/998d2d5e743d
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
This appears to be the patch that was causing the shutdown hangs. I'll need to investigate more.
I did a try run with debugging info:
https://tbpl.mozilla.org/?tree=Try&rev=13892bdf30ec

It looks like the problem is triggered some time during the social tests. They create some extra processes, and during that time they exercise the code added in this patch. I see this message:

19:23:00     INFO -  [Parent 1191] WARNING: Can't allocate graphics resources. May already be shutting down.: file ../../../dom/ipc/TabParent.cpp, line 1515

Then we add some code that tries to wait until it gets a SIGCHLD from the child process created. It never comes, and we hang at shutdown waiting for it. I'll investigate more tomorrow, but I thought I'd post this in case somebody wants to look into it.
I'm still waiting for try results, but I think I tracked down the problem here. Bug 951908.
Depends on: 951908
https://hg.mozilla.org/mozilla-central/rev/d1548d802855
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Do we need this on any other branches?
Target Milestone: mozilla28 → mozilla29
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #12)
> Do we need this on any other branches?

While they are technically affected, I don't think that's necessary.
We do hit bug 960605 on Aurora, so I'd love if you'd reconsider for Fx28 at least :)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #14)
> We do hit bug 960605 on Aurora, so I'd love if you'd reconsider for Fx28 at
> least :)

Fair enough :)  Bill - any objections to nominating this for Aurora?
Flags: needinfo?(wmccloskey)
I think bug 951908 is the one we want for aurora. I've requested a backport for that. Bug 960605 has more details.
Flags: needinfo?(wmccloskey)
Depends on: 1003041
You need to log in before you can comment on or make changes to this bug.