Move process launch off the I/O thread and into a thread pool

RESOLVED FIXED in Firefox 67

Status

()

defect
P1
normal
RESOLVED FIXED
9 months ago
a month ago

People

(Reporter: jld, Assigned: jld)

Tracking

(Regressed 1 bug)

unspecified
mozilla67
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox67 fixed)

Details

(Whiteboard: [qf:p2:responsiveness])

Attachments

(4 attachments)

Process launch can take nontrivial amounts of time, for reasons that aren't all under our control, which is why we're trying to move it off the main thread for responsiveness.  However, the parent process I/O thread mediates all IPDL communication with child processes, so blocking it is also not ideal.

It looks like what actually needs to run on the I/O thread is, at most, the creation of the channel; the launch itself should be able to run on any thread.

This doesn't need its own thread pool or dedicated threads, just a place to send runnables that isn't in contention with anything important, but giving this its own thread pool may be the easiest thing to do with the infrastructure we currently have.
Actually, on Windows we do need a dedicated thread, and specifically *a* thread: https://searchfox.org/mozilla-central/rev/43969de34f5d4b113133d090f024e5eed7a82af0/security/sandbox/chromium/sandbox/win/src/broker_services.cc#323

(Whereas on Mac, we'll probably want thread pool size >1 because of the Mach IPC setup.)
Also, AutoSetProfilerEnvVarsForChildProcess will break if there are overlapping launches on multiple threads (and in general it's preferred to pass strings to LaunchApp instead of side-effecting the parent process).
Whiteboard: [qf] → [qf:p1:f64]
We can directly set environment variables for the child process on
all platforms now, instead of changing the parent's environment and
inheriting the changes.  This simplifies memory management, but more
importantly it's necessary for thread safety to allow launching
processes from a thread pool.

Depends on D8944
Launching processes takes enough time that we should avoid blocking the
parent process's IPC I/O thread for it; it's less bad for responsiveness
than blocking the main thread, but it's not good.

On Windows we need to use a dedicated thread, because the sandbox isn't
thread-safe and it asserts that the same thread is used for every
launch.  Otherwise, a thread pool is used.

Depends on D8945

Updated

7 months ago
Priority: -- → P1

Comment 5

6 months ago
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8fa5e81ad801
Set profiler env vars in child processes without side-effecting the parent process. r=mstange
https://hg.mozilla.org/integration/autoland/rev/8b1f88d7bfeb
Move child process launch off the I/O thread. r=mccr8
This was backed out as part of the big async launch patch stack, and I believe it's responsible for both of the failures that were seen:

1. This means that the I/O thread can be woken up for OnChannel{Connected,Error} before the launch thread has finished the launch and set the state to PROCESS_CREATED.  I added some assertions about the state in bug 1446161, inspired by review comments, which caught this.

2. Web Replay launches content processes from the middleman process, which doesn't initialize SharedThreadPool.  This should be simple to fix.
Also, RegisterChildCrashAnnotationFileDescriptor isn't thread-safe.

https://phabricator.services.mozilla.com/D8946 needs re-review for the changes I made; I tried to set that in Phabricator but I don't know how well it worked.

Flags: needinfo?(continuation)
Flags: needinfo?(continuation)

Comment 10

4 months ago
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e818109ab4e1
Set profiler env vars in child processes without side-effecting the parent process. r=mstange
https://hg.mozilla.org/integration/autoland/rev/5a5e1801bb36
Fix thread-safety of crash reporter pid map. r=gsvelto

Comment 11

4 months ago
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a0cf88b1fe5b
Move child process launch off the I/O thread. r=mccr8

Comment 12

4 months ago
bugherder
Status: NEW → RESOLVED
Last Resolved: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Depends on: 1519145

Backed out the main patch (D8946) for causing bug 1519145.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Updated

4 months ago
Whiteboard: [qf:p1:f64] → [qf:p2:responsiveness]
Depends on: 1521003

In order to enable asynchronous launch, destruction of
GeckoChildProcessHost (and its subclasses) has to be delayed until after
launching (or anything else that might be made asynchronous in the
future) has completed, to prevent use-after-free. However, there are
other dependencies on process hosts always being destroyed on the I/O
thread, so refcounting would be difficult to use.

Instead, GeckoChildProcessHost now may not be destroyed directly, but
must go through a method that handles the scheduling.

There are also some minor cleanups to the affected headers (removed
duplicate access modifiers, and made PluginProcessParent final).

Depends on D18010

No longer blocks: 1521839

Comment 16

4 months ago
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/242953b250a5
Synchronize GeckoChildProcessHost destruction with launching. r=mccr8
https://hg.mozilla.org/integration/autoland/rev/073aa330041c
Move child process launch off the I/O thread. r=mccr8
Status: REOPENED → RESOLVED
Last Resolved: 4 months ago3 months ago
Resolution: --- → FIXED

Are you intending uplift to beta 66? I notice the target milestone is set to 66 here though this landed on 67 nightly. Maybe a little complicated for uplift though.

Flags: needinfo?(jld)

The flags probably got set like that because this got backed out twice and needed some time to fix. It's a risky change, and it's for a performance improvement that's probably not significant pre-Fission (and not well quantified in general), so no uplift.

Flags: needinfo?(jld)
Target Milestone: mozilla66 → mozilla67

Perfect, that's what I like to hear right now in beta. Thanks.

Updated

a month ago
Regressions: 1544412
You need to log in before you can comment on or make changes to this bug.