Closed Bug 1469691 Opened 2 years ago Closed 2 years ago

Mach port timing problem in OS X child process launching

Categories

(Core :: IPC, defect, P1)

60 Branch
Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox61 --- wontfix
firefox62 --- fixed
firefox63 --- fixed

People

(Reporter: jld, Assigned: jld)

References

Details

Attachments

(1 file)

When we launch child processes on OS X, the parent process creates and registers a Mach port so that the child can look it up and send to it to bootstrap Mach IPC between them.

However, the port isn't registered (by GeckoChildProcessHost) until after LaunchApp has returned, which I believe means that the child process could reach the point of trying to look it up before the parent has registered it, in which case it would fail.

Bug 1461459 comment #19 suggests that this is what's causing approximately all of our failures to launch child processes on Mac.
Assignee: nobody → jld
Priority: -- → P1
Duplicate of this bug: 1467331
Comment on attachment 8989256 [details]
Bug 1469691 - Fix intermittent launch failure caused by registering a Mach port too late.

https://reviewboard.mozilla.org/r/254310/#review261146

Setting up before launching definitely makes sense, r=me.

::: ipc/glue/GeckoChildProcessHost.cpp:823
(Diff revision 1)
>    childArgv.push_back(childProcessType);
>  
> +# ifdef MOZ_WIDGET_COCOA
> +  // Register the listening port before launching the child, to ensure
> +  // that it's there when the child tries to look it up.
> +  ReceivePort parent_recv_port(mach_connection_name.c_str());

It's kind of a shame we can't error check here and not bother tryhing to launch.
Attachment #8989256 - Flags: review?(erahm) → review+
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/583f3742e6c9
Fix intermittent launch failure caused by registering a Mach port too late. r=erahm
(Reminding myself to request uplift, because this will cause mysteriously blank tabs on Beta/Release.)
Flags: needinfo?(jld)
https://hg.mozilla.org/mozilla-central/rev/583f3742e6c9
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
See Also: → 1473156
Comment on attachment 8989256 [details]
Bug 1469691 - Fix intermittent launch failure caused by registering a Mach port too late.

Approval Request Comment
[Feature/Bug causing the regression]: e10s (and more likely with e10s-multi)
[User impact if declined]: Users could sometimes see a blank tab when loading a page in a new tab.  (On Nightly we made it crash instead and it was frequent enough to be a top crash — see bug 1461459 comment #22 — so that should give a general idea of how prevalent it might be on beta/release.)  I think reloading the tab would be a workaround but I haven't verified that.
[Is this code covered by automated tests?]: Yes: anything that uses e10s on Mac.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No; this is an intermittent failure and I had to patch the source to make it happen reliably.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: The change is small (it just changes the order of two things we're already doing that don't directly interact) and it's gotten a lot of testing by this point.
[String changes made/needed]: None
Flags: needinfo?(jld)
Attachment #8989256 - Flags: approval-mozilla-beta?
Comment on attachment 8989256 [details]
Bug 1469691 - Fix intermittent launch failure caused by registering a Mach port too late.

This sounds difficult to verify but worth uplifting to beta 62 to fix an intermittent test failure/ blank tab problem for users. 
Should land for beta 9.
Attachment #8989256 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #7)
> [Is this code covered by automated tests?]: Yes: anything that uses e10s on
> Mac.
> [Needs manual test from QE? If yes, steps to reproduce]: No; this is an
> intermittent failure and I had to patch the source to make it happen
> reliably.

Marking as qe-verify- based on Jed's assessment for manual testing needs and due to the fact that it's already covered by automated tests.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.