Last Comment Bug 774131 - Optimize process-launching code
: Optimize process-launching code
Status: RESOLVED FIXED
: main-thread-io
Product: Core
Classification: Components
Component: IPC (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla17
Assigned To: Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
:
: [PTO to Dec5] Bill McCloskey (:billm)
Mentors:
Depends on:
Blocks: b2g-e10s-work
  Show dependency treegraph
 
Reported: 2012-07-15 16:40 PDT by Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
Modified: 2012-08-08 09:31 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
+


Attachments
part 1: Refactor GeckoChildProcessHost to use a state enum and eagerly create a ProcessHandle (7.06 KB, patch)
2012-08-07 01:23 PDT, Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
bent.mozilla: review+
Details | Diff | Splinter Review
part 2: Add GeckoChildProcessHost::LaunchAndWaitForProcessHandle() to do that, use it in ContentParnet, and share more process-launching code (8.69 KB, patch)
2012-08-07 01:23 PDT, Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
bent.mozilla: review+
Details | Diff | Splinter Review

Description Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-15 16:40:01 PDT
As we re-discovered in bug 745148, our process-launching code is in a pretty sad state.  We have to initialize plugins synchronously, so it doesn't really affect them, but after bug 745148 we're going to have sadmaking for content processes.

The basic problem is that to share shmem (on win32), use PFoo::Open()/Bridge() (on win32), and kill subprocesses, we need the subprocess handle.  The first two are needed for ShareHandle(), which works different on POSIX, and doesn't requite the process handle.

On POSIX we can get the subprocess handle immediately on launch, after fork(), without having to block on any disk IO.  On win32, CreateProcess() does at least the win32 equivalent of stat(), so there's no way to get the HANDLE without disk IO (using public APIs).

There's an additional problem, which is that GeckoChildProcessHost::SyncLaunch() is additionally dumb, and doesn't just wait on fork()/CreateProcess(): it waits for an ack back from the subprocess, through its IPC transport.  That blocks SyncLaunch() not just on fork/createprocess, but also on image loading, dynamic linking, and XPCOM init.  Bah.

We should optimize this.  On POSIX we're golden, we can make SyncLaunch() almost free.  This is the important case for b2g.

On win32, we're kinda screwed, no way to make SyncLaunch() cheap.  But we can rejigger things so that we only wait on CreateProcess(), not on the subprocess ack.

I think we should probably fix this bug for b2g v1.
Comment 1 Benjamin Smedberg [:bsmedberg] 2012-07-15 17:36:54 PDT
I'm not sure I understand. Why would b2g be using synclaunch at all? Wasn't the whole point of AsyncLaunch to allow content processes to start without blocking on any of that?
Comment 2 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-15 17:43:31 PDT
See bug 745148 comment 9.  It's not b2g-specific, it's required for content-process-and-omtc.
Comment 3 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-15 17:43:45 PDT
*processes
Comment 4 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-20 00:49:54 PDT
After async animations and gralloc direct texturing land, this will be the biggest user-visible jank when loading apps.  It's visible when loading apps on the Nexus S.

Need to profile to see what's slow, but XPCOM init is the most likely candidate.

The corollary to this is that, although we can hide the user-visible jank with async animations, this will be among the candidates for biggest app-startup latency, so we might need to look at the pre-fork approach for reducing that.
Comment 5 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-06 22:00:18 PDT
Here's a sampling of the amount of time we block loading various OOP apps in the parent process

I/Gecko   (  370): [PARENT] SyncLaunch() took 527.735 ms
I/Gecko   (  370): [PARENT] SyncLaunch() took 533.286 ms
I/Gecko   (  370): [PARENT] SyncLaunch() took 531.068 ms
I/Gecko   (  370): [PARENT] SyncLaunch() took 532.735 ms
I/Gecko   (  370): [PARENT] SyncLaunch() took 531.793 ms
I/Gecko   (  370): [PARENT] SyncLaunch() took 536.337 ms
I/Gecko   (  370): [PARENT] SyncLaunch() took 532.292 ms

Ouch! :(
Comment 6 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-06 22:14:40 PDT
These times are again surprisingly consistent.  This one is representative

I/Gecko   (  628): LAUNCH [Child] ContentProcess::Init()
I/Gecko   (  537): LAUNCH [Parent] SyncLaunch() total 531.154 ms
I/Gecko   (  628): LAUNCH [Child]     mContent.Init() --- 16.2264 ms
I/Gecko   (  628): LAUNCH [Child]     mXREEmbed.Start() ---- 422.134 ms
I/Gecko   (  628): LAUNCH [Child]     mContent.InitXPCOM() --- 423.867 ms

The parent process is blocked on the child until after mContent.Init() (approximately), so even though we're taking ~400ms to initialize XPCOM, it's not on the critical path.  Something else in process startup is taking possibly up to 500ms.
Comment 7 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-06 22:46:19 PDT
The time is definitely spent between fork() and main() (as expected), but now more sophisticated profiling tools are needed.

I/Gecko   (  863): LAUNCH [Parent] Before fork(), time is 36 sec 321 ms
I/Gecko   (  964): LAUNCH [Child] In XRE_InitChildProcess(), time is 36 sec 859 ms
I/Gecko   (  964): LAUNCH [Child] ContentProcess::Init()
I/Gecko   (  863): LAUNCH [Parent] SyncLaunch() total 543.664 ms
I/Gecko   (  964): LAUNCH [Child]     mContent.Init() --- 17.982 ms
I/Gecko   (  964): LAUNCH [Child]     mXREEmbed.Start() ---- 418.869 ms
I/Gecko   (  964): LAUNCH [Child]     mContent.InitXPCOM() --- 426.522 ms

At any rate, this all means we'll shave 500ms of UI blocking off on app launch with the upcoming patch here (on POSIX), but there's more work after that ...
Comment 8 Mike Hommey [:glandium] 2012-08-06 23:16:56 PDT
You're not using elfhack, are you?
Comment 9 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-07 00:17:41 PDT
I/Gecko   ( 1107): LAUNCH [Parent] Before fork(), time is 11 sec 960 ms
I/Gecko   ( 1107): LAUNCH [Parent] SyncLaunch() total 12.6 ms

\o/

Oops, but then

I/Gecko   ( 1107): [Parent 1107] ###!!! ABORT: Creating second IPC server for '' while first still exists: file /home/cjones/mozilla/inbound/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 106
Comment 10 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-07 00:18:14 PDT
(In reply to Mike Hommey [:glandium] from comment #8)
> You're not using elfhack, are you?

Nope.
Comment 11 Mike Hommey [:glandium] 2012-08-07 00:27:59 PDT
(In reply to Chris Jones [:cjones] [:warhammer] from comment #10)
> (In reply to Mike Hommey [:glandium] from comment #8)
> > You're not using elfhack, are you?
> 
> Nope.

I'd expect vast improvements from elfhack.
Comment 12 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-07 01:23:12 PDT
Created attachment 649562 [details] [diff] [review]
part 1: Refactor GeckoChildProcessHost to use a state enum and eagerly create a ProcessHandle
Comment 13 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-07 01:23:43 PDT
Created attachment 649564 [details] [diff] [review]
part 2: Add GeckoChildProcessHost::LaunchAndWaitForProcessHandle() to do that, use it in ContentParnet, and share more process-launching code
Comment 14 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-08-07 15:28:08 PDT
Comment on attachment 649562 [details] [diff] [review]
part 1: Refactor GeckoChildProcessHost to use a state enum and eagerly create a ProcessHandle

Review of attachment 649562 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/GeckoChildProcessHost.h
@@ +80,5 @@
> +  // This value must be accessed while holding mMonitor.
> +  enum {
> +    // This object has been constructed, but the OS process has not
> +    // yet.
> +    CONSTRUCTED,

Nit, make this = 0.
Comment 15 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-08-07 15:33:00 PDT
Comment on attachment 649564 [details] [diff] [review]
part 2: Add GeckoChildProcessHost::LaunchAndWaitForProcessHandle() to do that, use it in ContentParnet, and share more process-launching code

Review of attachment 649564 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great!
Comment 16 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-07 16:07:59 PDT
(In reply to ben turner [:bent] from comment #14)
> Comment on attachment 649562 [details] [diff] [review]
> part 1: Refactor GeckoChildProcessHost to use a state enum and eagerly
> create a ProcessHandle
> 
> Review of attachment 649562 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: ipc/glue/GeckoChildProcessHost.h
> @@ +80,5 @@
> > +  // This value must be accessed while holding mMonitor.
> > +  enum {
> > +    // This object has been constructed, but the OS process has not
> > +    // yet.
> > +    CONSTRUCTED,
> 
> Nit, make this = 0.

Done.

BTW, try told me I needed to rename ERROR -> PROCESS_ERROR (because windows.h), so I went ahead and renamed this enum value too for consistency.

Note You need to log in before you can comment on or make changes to this bug.