Closed
Bug 902755
Opened 11 years ago
Closed 9 years ago
Deadlock in mozilla::ipc::GeckoChildProcessHost::LaunchAndWaitForProcessHandle after turning on new tab page thumbnails
Categories
(Core :: IPC, defect)
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
e10s | + | --- |
People
(Reporter: MattN, Assigned: adw)
References
Details
(Keywords: hang, Whiteboard: p=0)
Attachments
(3 files, 2 obsolete files)
87.81 KB,
text/plain
|
Details | |
56.68 KB,
image/png
|
Details | |
3.23 KB,
patch
|
Details | Diff | Splinter Review |
Steps I took in my everyday UX Nightly profile:
1) Open the new tab page with thumbnails turned off. I have had them off for many weeks.
2) Click the grid to turn on thumbnails
Expected result:
Thumbnails start loading from the background thumbnail process
Actual result:
Browser hangs/deadlocks
I haven't yet tried to reproduce this.
Some notes which may be relevant:
* I don't see a plugin process started for thumbnailing
* I had 2 plugin processes already running (Shockware Flash and Google Talk Plugin)
* Built from http://hg.mozilla.org/projects/ux/rev/eba98a54c087
* OS X 10.7.5
The following was in the OS X console:
8/7/13 6:51:04.912 PM [0x0-0x591591].org.mozilla.ux: [Parent 54023] WARNING: parent WaitForMessage() failed: 0x10004003 (ipc/rcv) timed out: file ../../../../ipc/glue/GeckoChildProcessHost.cpp, line 670
Reporter | ||
Comment 1•11 years ago
|
||
Comment 2•11 years ago
|
||
Not technically a "crash", but still worth blocking bug 899758 for.
Blocks: 899758
Comment 3•11 years ago
|
||
I suspect this is the Mac variant of bug 902790 (although that doesn't explain why the child process creation failed, but possibly does explain the hang after it fails)
Comment 4•11 years ago
|
||
Bug 902790 has fixed the issue for Windows/Linux - this bug remains to resolve a hang on Mac in the same situation.
The fix is to arrange so the block at http://mxr.mozilla.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#662 is only entered if |process| is non-zero, indicating the child was successfully created.
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → adw
Status: NEW → ASSIGNED
Assignee | ||
Comment 5•11 years ago
|
||
As Mark says, base::LaunchApp's success is checked once, right after the #if defined(OS_POSIX) section. The false branch of that #if exits immediately after calling LaunchApp and therefore ends up checking for success immediately, but the true branch does a bunch of stuff before reaching the check. So this patch checks for success immediately within both branches.
This fixes the hang, but in a debug build it reveals another, because this compositor-related assertion is hit and then Firefox hangs: http://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.cpp#1454 Without that assertion and the one a few lines down, no more hang.
Ben, Chris flagged you once for feedback in the similar bug 902790, so could you please review or pass this along?
Attachment #817413 -
Flags: review?(bent.mozilla)
Comment on attachment 817413 [details] [diff] [review]
check LaunchApp's success in both branches
Review of attachment 817413 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks for tackling this. I'm not sure about your assertion problem with the compositor, we'll need to nail that down before this gets landed I think.
::: ipc/glue/GeckoChildProcessHost.cpp
@@ +663,5 @@
> + if (!process) {
> + MonitorAutoLock lock(mMonitor);
> + mProcessState = PROCESS_ERROR;
> + lock.Notify();
> + return false;
I don't think you should bail out before calling CloseClientFileDescriptor below, that will probably leak.
But more specifically why not just make the MOZ_WIDGET_COCOA block check for a non-zero |process| rather than duplicate this code here? Then you'd get down to the common |mProcessState = PROCESS_ERROR;| below.
I'm also concerned about the other |return false| points in this MOZ_WIDGET_COCOA block... Shouldn't they set |mProcessState| too?
Attachment #817413 -
Flags: review?(bent.mozilla) → review-
Assignee | ||
Comment 7•11 years ago
|
||
Thanks, Ben.
(In reply to ben turner [:bent] (needinfo? encouraged) from comment #6)
> Thanks for tackling this. I'm not sure about your assertion problem with the
> compositor, we'll need to nail that down before this gets landed I think.
OK. According to the comment there, it's messaging the child to create the CompostitorChild, which of course fails when the child doesn't exist.
> I'm also concerned about the other |return false| points in this
> MOZ_WIDGET_COCOA block... Shouldn't they set |mProcessState| too?
I guess so? Handling this part made the patch a little more complex, as opposed to simply wrapping the Cocoa block in a conditional and falling through to the common mProcessState check. But certainly there are multiple ways to structure this, so I'm open to better alternatives than what's in this patch.
Attachment #817413 -
Attachment is obsolete: true
Attachment #820081 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 8•11 years ago
|
||
I filed bug 930838 for the failing ContentParent/CompositorChild assertions.
Comment on attachment 820081 [details] [diff] [review]
add SetErrorAndReturnFalse and use it in the Cocoa block
Review of attachment 820081 [details] [diff] [review]:
-----------------------------------------------------------------
::: ipc/glue/GeckoChildProcessHost.h
@@ +166,5 @@
> #endif
>
> void OpenPrivilegedHandle(base::ProcessId aPid);
>
> + bool SetErrorAndReturnFalse();
This is cute, but I think it should just be a void function called SetErrorState.
Then let's move it into a private block. And then I think you should be able to inline it (might have to use 'mozilla::MonitorAutoLock' but whatever).
Then just return false everywhere ;)
Attachment #820081 -
Flags: review?(bent.mozilla) → review+
Assignee | ||
Comment 10•11 years ago
|
||
Thanks! Thanks to Mark, too, for pointing out the fix.
https://tbpl.mozilla.org/?tree=Try&rev=13b28bcfd91d
Attachment #820081 -
Attachment is obsolete: true
Assignee | ||
Updated•11 years ago
|
Keywords: checkin-needed
Comment 11•11 years ago
|
||
Keywords: checkin-needed
Comment 12•11 years ago
|
||
This was backed out for causing mochitest-2 asserts on OSX debug.
https://hg.mozilla.org/integration/mozilla-inbound/rev/2a4a1d7f06a4
https://tbpl.mozilla.org/php/getParsedLog.php?id=29951559&tree=Mozilla-Inbound
Assignee | ||
Comment 13•11 years ago
|
||
Sigh, that's the compositor assertion being revealed.
What would have happened on those runs without this patch? Would they have hung or crashed? There are bugs concerning the two tests where the assertion was triggered, all of them being timeouts or "exited with code -20" on OS X:
test_browserElement_oop_ContextmenuEvents.html: bug 835927, bug 930449
test_browserElement_oop_CloseApp.html: bug 820412, bug 925200
I suspect that if we had run the same runs without the patch, failures on the same machines would have happened, but they would have been chalked up to these bugs.
Assignee | ||
Comment 14•11 years ago
|
||
Looks like bug 930838 is going to block this from landing, so marking the dependency.
Depends on: 930838
Assignee | ||
Updated•11 years ago
|
Blocks: fxdesktopbacklog
Whiteboard: [triage]
Updated•11 years ago
|
Whiteboard: [triage]
Updated•11 years ago
|
Whiteboard: p=0
Comment 15•11 years ago
|
||
Mass tracking-e10s flag change. Filter bugmail on "2be0fcce-e36a-4e2c-aa80-0e3d33eb5406".
tracking-e10s:
--- → +
Updated•11 years ago
|
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
Comment 17•9 years ago
|
||
Hi Matthew,
I cannot reproduce this on latest Aurora build (46.0a2-20160204004009) and latest Nightly build (47.0a1-20160204030229). I have set the "browser.newtabpage.enabled" to false, opened multiple pages on the browser and then re-enabled the preference. No hangs occurred while doing this and the previous visited pages were displayed in tiles.
I have test this on Mac OS X 10.7 (iMac) and 10.10 (MBP Retina 13").
Considering this, I am marking the issue as Resolved-Worksforme. If anyone can still reproduce this, feel free to reopen it providing more information.
Thanks,
Paul.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•