Closed Bug 1406971 Opened 3 years ago Closed 3 years ago

Change nsProcess to not use NSPR on Unix

Categories

(Core :: XPCOM, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox58 --- fixed

People

(Reporter: jld, Assigned: jld)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

See bug 227246 for context — NSPR's child process handling interacts badly with anything else trying to create child processes on Unix, like Gecko IPC, and fixing this within NSPR is difficult.  Bug 227246 comment #8 encourages moving Gecko away from PR_CreateProcess, so I wrote a patch to do that for nsProcess.

This is a separate bug because it doesn't fix the problem with NSPR.  It also doesn't cover the use of PR_CreateProcess in nsAuthSambaNTLM.cpp, which seems to be the only other use of PR_CreateProcess in the Firefox tree that would be affected — there are others, but they're all either ifdef'ed to non-Unix or in standalone utilities that probably use only NSPR for child process.
*applause*
Priority: -- → P2
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8dc7c4a1ccb36ca54ab804b125ea4a9a8f7fe330

I'm assuming it's safe to use code from ipc/ in xpcom/ because there were already some minor uses of it, and this avoids some code duplication / wheel-reinvention.

Possible refinements that I didn't do:

1. Delete the not-Unix-or-Windows case (still using NSPR) as dead code

2. Merge the OS X case into regular Unix — bug 600711 comment #2 suggests it was needed when we were shipping Universal Binaries, but this is no longer the case.
Comment on attachment 8916667 [details]
Bug 1406971 - Change nsProcess to use LaunchApp from IPC, instead of NSPR, on Unix.

https://reviewboard.mozilla.org/r/187760/#review192886

Not thrilled about more dependencies, but as you say, we have them already and even nastier ones besides (e.g. xpcom/ depends on things in dom/).
Attachment #8916667 - Flags: review?(nfroyd) → review+
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #3)
> 2. Merge the OS X case into regular Unix — bug 600711 comment #2 suggests it
> was needed when we were shipping Universal Binaries, but this is no longer
> the case.

posix_spawnp() is not required anymore, but be wary of bug 1376567. I
removed one of the posix_spawnp() uses in the exception handler code only to
be bitten by that nasty bug. That seems to manifest itself only if the
parent process shuts down very quickly after we call fork() on Mac.
Unfortunately at least one of our uses of nsIProcess happens precisely on
shutdown when telemetry launches the pingsender to relay the shutdown ping.
If we could fix bug 1376567 there would be no reason to have different
code-paths for Linux and Mac, and both could just use fork().
(In reply to Gabriele Svelto [:gsvelto] from comment #6)
> (In reply to Jed Davis [:jld] (⏰UTC-6) from comment #3)
> > 2. Merge the OS X case into regular Unix — bug 600711 comment #2 suggests it
> > was needed when we were shipping Universal Binaries, but this is no longer
> > the case.
> 
> posix_spawnp() is not required anymore, but be wary of bug 1376567. I
> removed one of the posix_spawnp() uses in the exception handler code only to
> be bitten by that nasty bug. That seems to manifest itself only if the
> parent process shuts down very quickly after we call fork() on Mac.
> Unfortunately at least one of our uses of nsIProcess happens precisely on
> shutdown when telemetry launches the pingsender to relay the shutdown ping.
> If we could fix bug 1376567 there would be no reason to have different
> code-paths for Linux and Mac, and both could just use fork().

Thanks for the warning.  IPC LaunchApp already uses posix_spawn on OS X; this was originally to deal with Universal Binaries, but I'm not in favor of switching back to fork().  There are the weird deficiencies in OS X fork() that I don't know enough about Mach messaging to understand (like bug 1376567), and as of 10.8 posix_spawn gives us an elegant solution for bug 147659 in POSIX_SPAWN_CLOEXEC_DEFAULT (nonstandard Apple extension, so won't help other platforms).  Also, fork() has some inherent peformance costs for copy-on-write that posix_spawn can avoid.

https://crbug.com/179923 discusses all of this: Chromium had weird bugs around fork() and wanted to move away from it, they had to figure out to close superfluous fds (and didn't have a good answer until POSIX_SPAWN_CLOEXEC_DEFAULT; even with fork/exec they had to spam close() on every possible fd number), and there's even a comment where fork() is benchmarked.
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #7)
> Thanks for the warning.  IPC LaunchApp already uses posix_spawn on OS X;
> this was originally to deal with Universal Binaries, but I'm not in favor of
> switching back to fork().  There are the weird deficiencies in OS X fork()
> that I don't know enough about Mach messaging to understand (like bug
> 1376567), and as of 10.8 posix_spawn gives us an elegant solution for bug
> 147659 in POSIX_SPAWN_CLOEXEC_DEFAULT (nonstandard Apple extension, so won't
> help other platforms).  Also, fork() has some inherent peformance costs for
> copy-on-write that posix_spawn can avoid.
> 
> https://crbug.com/179923 discusses all of this: Chromium had weird bugs
> around fork() and wanted to move away from it, they had to figure out to
> close superfluous fds (and didn't have a good answer until
> POSIX_SPAWN_CLOEXEC_DEFAULT; even with fork/exec they had to spam close() on
> every possible fd number), and there's even a comment where fork() is
> benchmarked.

I think that settles it, if it's so problematic then there's no point in trying to adapt MacOS X code to use fork(). Instead if we want a single path for launching applications we might use posix_spawnp() on Linux too.
(In reply to Gabriele Svelto [:gsvelto] from comment #8)
> I think that settles it, if it's so problematic then there's no point in
> trying to adapt MacOS X code to use fork(). Instead if we want a single path
> for launching applications we might use posix_spawnp() on Linux too.

Unfortunately, that won't work without bringing back bug 147659, because POSIX_SPAWN_CLOEXEC_DEFAULT doesn't exist outside of OS X.
(I've adjusted the commit message to reflect the other bugs that karlt pointed out.)
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e0e042fe8d40
Change nsProcess to use LaunchApp from IPC, instead of NSPR, on Unix. r=froydnj
https://hg.mozilla.org/mozilla-central/rev/e0e042fe8d40
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Duplicate of this bug: 147659
You need to log in before you can comment on or make changes to this bug.