Closed Bug 774131 Opened 12 years ago Closed 12 years ago

Optimize process-launching code

Categories

(Core :: IPC, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla17
blocking-kilimanjaro +
blocking-basecamp +

People

(Reporter: cjones, Assigned: cjones)

References

Details

(Keywords: main-thread-io)

Attachments

(2 files)

As we re-discovered in bug 745148, our process-launching code is in a pretty sad state.  We have to initialize plugins synchronously, so it doesn't really affect them, but after bug 745148 we're going to have sadmaking for content processes.

The basic problem is that to share shmem (on win32), use PFoo::Open()/Bridge() (on win32), and kill subprocesses, we need the subprocess handle.  The first two are needed for ShareHandle(), which works different on POSIX, and doesn't requite the process handle.

On POSIX we can get the subprocess handle immediately on launch, after fork(), without having to block on any disk IO.  On win32, CreateProcess() does at least the win32 equivalent of stat(), so there's no way to get the HANDLE without disk IO (using public APIs).

There's an additional problem, which is that GeckoChildProcessHost::SyncLaunch() is additionally dumb, and doesn't just wait on fork()/CreateProcess(): it waits for an ack back from the subprocess, through its IPC transport.  That blocks SyncLaunch() not just on fork/createprocess, but also on image loading, dynamic linking, and XPCOM init.  Bah.

We should optimize this.  On POSIX we're golden, we can make SyncLaunch() almost free.  This is the important case for b2g.

On win32, we're kinda screwed, no way to make SyncLaunch() cheap.  But we can rejigger things so that we only wait on CreateProcess(), not on the subprocess ack.

I think we should probably fix this bug for b2g v1.
I'm not sure I understand. Why would b2g be using synclaunch at all? Wasn't the whole point of AsyncLaunch to allow content processes to start without blocking on any of that?
See bug 745148 comment 9.  It's not b2g-specific, it's required for content-process-and-omtc.
After async animations and gralloc direct texturing land, this will be the biggest user-visible jank when loading apps.  It's visible when loading apps on the Nexus S.

Need to profile to see what's slow, but XPCOM init is the most likely candidate.

The corollary to this is that, although we can hide the user-visible jank with async animations, this will be among the candidates for biggest app-startup latency, so we might need to look at the pre-fork approach for reducing that.
blocking-basecamp: --- → ?
blocking-kilimanjaro: --- → ?
blocking-basecamp: ? → +
blocking-kilimanjaro: ? → +
Here's a sampling of the amount of time we block loading various OOP apps in the parent process

I/Gecko   (  370): [PARENT] SyncLaunch() took 527.735 ms
I/Gecko   (  370): [PARENT] SyncLaunch() took 533.286 ms
I/Gecko   (  370): [PARENT] SyncLaunch() took 531.068 ms
I/Gecko   (  370): [PARENT] SyncLaunch() took 532.735 ms
I/Gecko   (  370): [PARENT] SyncLaunch() took 531.793 ms
I/Gecko   (  370): [PARENT] SyncLaunch() took 536.337 ms
I/Gecko   (  370): [PARENT] SyncLaunch() took 532.292 ms

Ouch! :(
Assignee: nobody → jones.chris.g
These times are again surprisingly consistent.  This one is representative

I/Gecko   (  628): LAUNCH [Child] ContentProcess::Init()
I/Gecko   (  537): LAUNCH [Parent] SyncLaunch() total 531.154 ms
I/Gecko   (  628): LAUNCH [Child]     mContent.Init() --- 16.2264 ms
I/Gecko   (  628): LAUNCH [Child]     mXREEmbed.Start() ---- 422.134 ms
I/Gecko   (  628): LAUNCH [Child]     mContent.InitXPCOM() --- 423.867 ms

The parent process is blocked on the child until after mContent.Init() (approximately), so even though we're taking ~400ms to initialize XPCOM, it's not on the critical path.  Something else in process startup is taking possibly up to 500ms.
The time is definitely spent between fork() and main() (as expected), but now more sophisticated profiling tools are needed.

I/Gecko   (  863): LAUNCH [Parent] Before fork(), time is 36 sec 321 ms
I/Gecko   (  964): LAUNCH [Child] In XRE_InitChildProcess(), time is 36 sec 859 ms
I/Gecko   (  964): LAUNCH [Child] ContentProcess::Init()
I/Gecko   (  863): LAUNCH [Parent] SyncLaunch() total 543.664 ms
I/Gecko   (  964): LAUNCH [Child]     mContent.Init() --- 17.982 ms
I/Gecko   (  964): LAUNCH [Child]     mXREEmbed.Start() ---- 418.869 ms
I/Gecko   (  964): LAUNCH [Child]     mContent.InitXPCOM() --- 426.522 ms

At any rate, this all means we'll shave 500ms of UI blocking off on app launch with the upcoming patch here (on POSIX), but there's more work after that ...
You're not using elfhack, are you?
I/Gecko   ( 1107): LAUNCH [Parent] Before fork(), time is 11 sec 960 ms
I/Gecko   ( 1107): LAUNCH [Parent] SyncLaunch() total 12.6 ms

\o/

Oops, but then

I/Gecko   ( 1107): [Parent 1107] ###!!! ABORT: Creating second IPC server for '' while first still exists: file /home/cjones/mozilla/inbound/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 106
(In reply to Mike Hommey [:glandium] from comment #8)
> You're not using elfhack, are you?

Nope.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #10)
> (In reply to Mike Hommey [:glandium] from comment #8)
> > You're not using elfhack, are you?
> 
> Nope.

I'd expect vast improvements from elfhack.
Comment on attachment 649562 [details] [diff] [review]
part 1: Refactor GeckoChildProcessHost to use a state enum and eagerly create a ProcessHandle

Review of attachment 649562 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/GeckoChildProcessHost.h
@@ +80,5 @@
> +  // This value must be accessed while holding mMonitor.
> +  enum {
> +    // This object has been constructed, but the OS process has not
> +    // yet.
> +    CONSTRUCTED,

Nit, make this = 0.
Attachment #649562 - Flags: review?(bent.mozilla) → review+
Comment on attachment 649564 [details] [diff] [review]
part 2: Add GeckoChildProcessHost::LaunchAndWaitForProcessHandle() to do that, use it in ContentParnet, and share more process-launching code

Review of attachment 649564 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great!
Attachment #649564 - Flags: review?(bent.mozilla) → review+
(In reply to ben turner [:bent] from comment #14)
> Comment on attachment 649562 [details] [diff] [review]
> part 1: Refactor GeckoChildProcessHost to use a state enum and eagerly
> create a ProcessHandle
> 
> Review of attachment 649562 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: ipc/glue/GeckoChildProcessHost.h
> @@ +80,5 @@
> > +  // This value must be accessed while holding mMonitor.
> > +  enum {
> > +    // This object has been constructed, but the OS process has not
> > +    // yet.
> > +    CONSTRUCTED,
> 
> Nit, make this = 0.

Done.

BTW, try told me I needed to rename ERROR -> PROCESS_ERROR (because windows.h), so I went ahead and renamed this enum value too for consistency.
https://hg.mozilla.org/mozilla-central/rev/ba6d6b6b58bc
https://hg.mozilla.org/mozilla-central/rev/b93b161b1ad4
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: