Mach port timing problem in OS X child process launching

RESOLVED FIXED in Firefox 62

Status

()

defect
P1
normal
RESOLVED FIXED
Last year
11 months ago

People

(Reporter: jld, Assigned: jld)

Tracking

60 Branch
mozilla63
Unspecified
macOS
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox61 wontfix, firefox62 fixed, firefox63 fixed)

Details

Attachments

(1 attachment)

When we launch child processes on OS X, the parent process creates and registers a Mach port so that the child can look it up and send to it to bootstrap Mach IPC between them.

However, the port isn't registered (by GeckoChildProcessHost) until after LaunchApp has returned, which I believe means that the child process could reach the point of trying to look it up before the parent has registered it, in which case it would fail.

Bug 1461459 comment #19 suggests that this is what's causing approximately all of our failures to launch child processes on Mac.
Assignee: nobody → jld
Priority: -- → P1
Duplicate of this bug: 1467331

Comment 3

Last year
mozreview-review
Comment on attachment 8989256 [details]
Bug 1469691 - Fix intermittent launch failure caused by registering a Mach port too late.

https://reviewboard.mozilla.org/r/254310/#review261146

Setting up before launching definitely makes sense, r=me.

::: ipc/glue/GeckoChildProcessHost.cpp:823
(Diff revision 1)
>    childArgv.push_back(childProcessType);
>  
> +# ifdef MOZ_WIDGET_COCOA
> +  // Register the listening port before launching the child, to ensure
> +  // that it's there when the child tries to look it up.
> +  ReceivePort parent_recv_port(mach_connection_name.c_str());

It's kind of a shame we can't error check here and not bother tryhing to launch.
Attachment #8989256 - Flags: review?(erahm) → review+

Comment 4

Last year
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/583f3742e6c9
Fix intermittent launch failure caused by registering a Mach port too late. r=erahm
(Reminding myself to request uplift, because this will cause mysteriously blank tabs on Beta/Release.)
Flags: needinfo?(jld)

Comment 6

Last year
bugherder
https://hg.mozilla.org/mozilla-central/rev/583f3742e6c9
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Comment on attachment 8989256 [details]
Bug 1469691 - Fix intermittent launch failure caused by registering a Mach port too late.

Approval Request Comment
[Feature/Bug causing the regression]: e10s (and more likely with e10s-multi)
[User impact if declined]: Users could sometimes see a blank tab when loading a page in a new tab.  (On Nightly we made it crash instead and it was frequent enough to be a top crash — see bug 1461459 comment #22 — so that should give a general idea of how prevalent it might be on beta/release.)  I think reloading the tab would be a workaround but I haven't verified that.
[Is this code covered by automated tests?]: Yes: anything that uses e10s on Mac.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No; this is an intermittent failure and I had to patch the source to make it happen reliably.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: The change is small (it just changes the order of two things we're already doing that don't directly interact) and it's gotten a lot of testing by this point.
[String changes made/needed]: None
Flags: needinfo?(jld)
Attachment #8989256 - Flags: approval-mozilla-beta?
Comment on attachment 8989256 [details]
Bug 1469691 - Fix intermittent launch failure caused by registering a Mach port too late.

This sounds difficult to verify but worth uplifting to beta 62 to fix an intermittent test failure/ blank tab problem for users. 
Should land for beta 9.
Attachment #8989256 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #7)
> [Is this code covered by automated tests?]: Yes: anything that uses e10s on
> Mac.
> [Needs manual test from QE? If yes, steps to reproduce]: No; this is an
> intermittent failure and I had to patch the source to make it happen
> reliably.

Marking as qe-verify- based on Jed's assessment for manual testing needs and due to the fact that it's already covered by automated tests.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.