Closed Bug 1752638 Opened 4 years ago Closed 11 months ago

Fork server: remoting for child process exit handling

Categories

(Core :: IPC, task)

Desktop
Linux
task

Tracking

()

RESOLVED FIXED
140 Branch
Tracking Status
firefox140 --- fixed

People

(Reporter: jld, Assigned: jld)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 1 obsolete file)

Currently, the fork server ignores child processes exiting, and the parent process needs to guess whether the process exited by probing its pid, which can be wrong if the pid is reused (in the worst case, this could mean that we kill -9 an unrelated process). This also doesn't allow seeing the child process's exit status; see also bug 1752625.

Currently, the fork server's MiniTransceiver requires messages to alternate between the parent sending and the parent receiving. I'd suggest removing that, and instead handling synchronous calls by having the parent send the fork server one end of a socketpair or pipe for that message's reply, like the Linux sandbox broker and the crash reporter. The socket should also be changed from SOCK_STREAM to SOCK_SEQPACKET to allow sending atomically from any thread; note that it currently doesn't handle short writes anyway.

The resulting protocol would look like this:

  • [parent→forkserver, sync] LaunchApp; takes various parameters, returns pid
  • [parent→forkserver, sync] WaitPid; takes pid and whether to block, returns status or error
  • [forkserver→parent, async] SigChld; this will need to be connected to ProcessWatcher (which, after bug 1658072, will use the pipe-to-self trick in our code instead of libevent, so writing a byte to that pipe would work).

The remote waitpid being synchronous and done only at the parent's request (not automatically when the SIGCHLD is received) is subtle: before we call it, we know the pid is still in use and can't be reused; afterwards, we know it's not. This lets us (where “us” includes my ProcessWatcher rewrite) safely index child process information by pid and know that there can't be duplicates as long as we remove any information for a process before doing the waitpid.

For the SIGCHLD notifications, the forkserver socket will need to be connected to an event loop; the IPC I/O thread will work. Alternately, the write end of the ProcessWatcher signal pipe could be sent to the fork server process so it can wake up the ProcessWatcher directly; that's kind of hacky but might be simpler. Also, there's going to be some complication as far as when to waitpid a child directly vs. calling the fork server; one possibility is that, for a fork server child, waitpiding it directly will always fail with ECHILD (we know that the pid hasn't been freed for reuse yet), so it could just always try to wait directly and then fall back to messaging the fork server in that case.

Assignee: nobody → jld

We probably don't want to ship the forkserver in release before we fix this.

Severity: -- → S2
See Also: → 1888965

This patch has my half-finished work making some changes to how fork
server communications works. Before it used a stream socket and always
had matched pairs of synchronous request/reply with no interruptions.
This changes the main protocol to use a sequential-packet socket with
asynchronous messages in (eventually) both directions; ones that need
a reply (currently the fork request, later also the wait request) will
attach a socket for that, like what we do in the Linux sandbox and crash
reporter.

That also allows requests to be sent from any thread in the main process
without locking, although that will probably need threadsafe refcouting
for safety. The attached sockets can be stream sockets, e.g. so that
the arg/env strings could exceed the size of the socket buffer if
needed.

What is not done here is to actually implement the new messages.
Remoting waitid should be relatively straightforward, along similar
lines to how the fork request works after this patch. Sending SIGCHLD
back is more interesting: it's possible to sendmsg directly from the
signal handler, but the message would have to be pre-serialized, because
malloc is not safe there; it's also possible to use the pipe-to-self
idiom. In the main process, connecting those messages to the I/O thread
event loop will also need some work.

There is also some subtlety about child processes outliving the fork
server -- currently we intentionally shut it down while child processes
are still running, and it will always be possible for it to crash, so
we'll need some kind of best-effort fallback there.

There are also some FIXME comments in this patch.

And, integration with ProcessWatcher should be aware of D141309.

Hardware: Unspecified → Desktop
Duplicate of this bug: 1888965
Duplicate of this bug: 1918588
Attachment #9406045 - Attachment is obsolete: true

There's a simpler way to do this, pointed out by Nika on Matrix a while back. The simpler version is:

  • The fork server IPC protocol continues to have only sync messages from the parent, but probably with a mutex around the parent side to allow use from other threads.
  • There are now two RPC calls the fork server accepts: launch and waitpid
  • There's a separate pipe for SIGCHLD events: the fork server's signal handler writes a byte to it directly, and the parent process polls the other end for reading (like the current use of pipe-to-self, but pipe-to-other).

One potential issue here is that if we have n child processes in the process of exiting and then they all exit one by one, we'll make O(n^2) RPC calls to the fork server, which could be expensive. (But maybe not; the constant factor could be relatively low and this only counts child processes after the GeckoChildProcessHost is destroyed but before OS-level process exit.) Also, a remote waitpid request could block the calling thread (the I/O thread?) for several milliseconds while the fork server clones itself (unfortunately, Linux doesn't heavily optimize the case of forking a process with many memory mappings but very little changed since the last fork, possibly as a tradeoff against more common use cases). To avoid this, the advanced version of this idea adds:

  • The fork server keeps track of all running child processes; alternately, the parent process can register interest in them when ProcessWatcher starts watching.
  • On SIGCHLD, the fork server will waitid with WNOWAIT on each child process, and write the pid of each one that exited to the pipe; ProcessWatcher would be adjusted accordingly.
  • The actual waitpid to collect the process should still be a sync IPC, so that we don't have pids being reused when the parent process still thinks it's a running child process. This is important because we're trying to remove the potential for annoying edge cases like accidentally SIGKILLing the wrong process.

Basically this moves more of ProcessWatcher's logic into the fork server itself, and also duplicates some of the state, so it's more complicated and there are some subtleties.

Calling waitpid() is async to the status changes of the target process, it doesn't mean the process is still alive if waitpid() said it is alive. It is definitely dead if waitpid() said it is dead. With a table in the parent process to received dead pids from the fork server, checking the table semantically should equal to waitpid().

Reaping child processes by the fork server withou a RPC call from the parent can fork a new process with the same pid before the parent process reaping it virtually. However, with single thread at the fork server handling fork requests and SIGCHLD, the messages of reaping old processes and creating new processes could be in a promised order.

The idea is

  • adding a separate pipe to send reaped PIDs,
  • adding an atomic counter to count the number of pending SIGCHLD at the fork server,
  • calling waitpid in the main loop of the fork server if there are pending SIGCHLD, and
  • having a table to receive reaped PIDs in the parent process.

(In reply to Thinker Li [:sinker] from comment #6)

Calling waitpid() is async to the status changes of the target process, it doesn't mean the process is still alive if waitpid() said it is alive. It is definitely dead if waitpid() said it is dead. With a table in the parent process to received dead pids from the fork server, checking the table semantically should equal to waitpid().
I missed the point of sending SIGKILL after reaping.

Regarding sending SIGKILL to wrong processes, it should send a message to the fork server instead of call kill() directly.

On top of comment #6, ProcessWatcher should send a message to the fork server to kill the process by fork server.

Incoming messages to the ForkServiceChild are replies to its synchronous
requests to the fork server, read with blocking I/O, so handling the
reply can be done inline. And there will soon be several requests,
with different replies, so it doesn't make sense to have a single
OnMessageRecevied method like this.

This does a few things to ForkServiceChild that will be needed in future
patches:

  • protects the channel with a mutex, so that multiple threads can
    make calls to the fork server; note that the fork server itself is
    single-threaded so these are necessarily serialized

  • ForkServiceChild is now thread-safely refcounted, so that attempts to
    use it which race with a fork server crash/restart will fail safely

There are two parts to this:

  1. A new sync IPC call to the fork server which is basically just
    waitpid: it takes a pid and a flag for whether to block, and returns
    either the exit status, an error, or an indication that the process is
    still running. The code in the parent process which would previously
    hack around the fork server case with kill(pid, 0) now uses this
    instead.

  2. The fork server now has a SIGCHLD handler which writes to a pipe
    given at startup. Currently this is connected to the same pipe-to-self
    used in ProcessWatcher for the parents' own SIGCHLDs. Thus, the
    existing ProcessWatcher is mostly unmodified.

Measured locally, these remote waitpids take about 50µs, compared to
about 10µs for in-process. So, while it's possible for the callback
for SIGCHLD pipe reads to call this O(n) times when n child
processes are exiting at once, the time taken shouldn't be a significant
responsiveness problem; I measured 250µs max while closing a tab for a
popular website with a large number of out-of-process iframes. This can
be optimized later to batch the messages if needed.

The other potential performance issue is if a remote waitpid is blocked
behind a remote fork. However, this is still an improvement over the
non-forkserver status quo: if we clone the parent process, that blocks
all threads for, empirically, about 15ms; cloning the fork server is
about 2.5ms, and that would only block another thread if it happens to
race with another child process exiting and not every time. We also
prelaunch content processes during idle time, so this type of jank is
less likely to be user-visible in any case.

Importantly, the way this patch works, the waitpid is synchronous with
respect to the thread in the parent process making the requeste, so as
long as the fork server hasn't crashed then we can know that the process
with a given pid is in fact the process we expect it to be.

Because the fork server is used to remotely waitpid child processes,
and this can be done until very late in shutdown, this patch extends the
fork server's lifetime. This is a little complicated, because destroying a
GeckoChildProcessHost requires the IPC I/O thread to be acceptaing
tasks, and this is later than that. So, this adds a method to extract a
GeckoChildProcessHost's process handle (on Unix, the pid) and take
responsibility for handling it, and this is done for the fork server.

Pushed by jedavis@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b32f24f0aa7d Part 0: inline ForkServiceChild::OnMessageReceived. r=ipc-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/b196790ed141 Part 1: add thread safety to the fork server client. r=ipc-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/7c04f8c2c5a1 Part 2: remote `waitpid`/`SIGCHLD` to/from the fork server. r=ipc-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/8299aa1dbb2b Part 3: delay fork server shutdown until browser shutdown. r=ipc-reviewers,nika
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: