Closed Bug 1348361 Opened 3 years ago Closed 2 years ago

Remove sync IPC when launching a child process

Categories

(Core :: IPC, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: ehsan, Assigned: Alex_Gaynor)

References

(Blocks 2 open bugs)

Details

(Keywords: perf, Whiteboard: [qf:f63][qf:p1][bhr])

Attachments

(3 files, 8 obsolete files)

59 bytes, text/x-review-board-request
jld
: review+
Details
59 bytes, text/x-review-board-request
jld
: review+
Details
59 bytes, text/x-review-board-request
jld
: review+
Details
https://perfht.ml/2mMU6Di

Bill, is this something we can fix?  And how hard is it?
Flags: needinfo?(wmccloskey)
(In reply to :Ehsan Akhgari from comment #0)
> https://perfht.ml/2mMU6Di
> 
> Bill, is this something we can fix?  And how hard is it?

Is this the right profile? The problem you were showing me earlier was with the call to LaunchAndWaitForProcessHandle (http://searchfox.org/mozilla-central/rev/006005beff40d377cfd2f69d3400633c5ff09127/dom/ipc/ContentParent.cpp#2013). This profile doesn't seem to have that method on the call stack from my quick look.
Yes, I think we can fix this. The reason we're waiting is to get a HANDLE or pid for the new process. That information isn't directly used right away. However, it is passed around to all sorts of other channels shortly after startup. This search pretty much turns up the places where we use this handle:

http://searchfox.org/mozilla-central/search?q=symbol:_ZNK7mozilla3ipc17IToplevelProtocol8OtherPidEv%2C_ZNK7mozilla3ipc9IProtocol8OtherPidEv&redirect=false

I think the best way to fix this would be to create some sort of ProcessHandlePromise abstraction. You could do something like SyncWaitOtherPid() and it would do the condvar wait thing if the process hasn't started. But most of the time we would just pass the promise around from one channel to another. The channels don't actually care about the handle--they just need it in case someone asks them for it.

We would also have to make the startup code more async, but I don't think that should be too hard. The GPU process code already works that way, so we can use it as a template:
http://searchfox.org/mozilla-central/rev/c2a60adfc7b16761cbbfcefa2093fa402ba1aa69/gfx/ipc/GPUProcessHost.cpp#44
It calls AsyncLaunch and then it waits for the OnChannelConnected method call before it opens the channel and runs does anything to the process.
Flags: needinfo?(wmccloskey)
To state the obvious, of course: the user is still going to have to wait for those 60ms or whatever before the content process can do anything. But at least the main process would be unblocked.
Is CreateProcess On Windows on this path too?  See https://bugs.llvm.org/show_bug.cgi?id=20253#c4.  There, an overhead of 60ms was measured for CreateProcess.  :(
Whiteboard: [qf:p1]
See Also: → 1350634
We call CreateProcess on a separate thread. Sadly, that thread is the Gecko I/O thread. Janking that is, in many ways, worse than janking the main thread. So we should fix that too.
In GeckoView on Android, we run the content process in an Android service [1] to enable Java VM support.
The services need to be defined at compile time in the manifest [2], which puts a limit on the amount of content services we can start (currently and ideally it is 1, due to increased overhead).

Due to the Gecko thread is blocking in LaunchAndWaitForProcessHandle, we run into deadlock situations with compositor creation [3] and it prevents us from deferring the service start to handle the shutdown/startup race of the service.

An async content process launch without Gecko thread blocking would allow us to handle the platform-specific restrictions without undesired workarounds.


[1] https://searchfox.org/mozilla-central/source/mobile/android/geckoview/src/main/java/org/mozilla/gecko/process/GeckoServiceChildProcess.java
[2] https://searchfox.org/mozilla-central/source/mobile/android/geckoview/src/main/AndroidManifest.xml#49
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1350924#c4
Taking a stab at this.
Assignee: nobody → spohl.mozilla.bugs
Status: NEW → ASSIGNED
Blocks: 1350924
Attached patch wip (obsolete) — Splinter Review
To quickly summarize: There appear to be three points that need to be addressed here (rephrasing comment 2 and comment 5):
1. Make startup code more async by replacing LaunchAndWaitForProcessHandle with AsyncLaunch in ContentParent::LaunchSubprocess, using the GPU process code as template.
2. Move the CreateProcess call away from the Gecko I/O thread to avoid janking it.
3. Introduce some form of ProcessHandlePromise abstraction to be used by IProtocol::OtherPid() instead of the actual PID while the subprocess is launching.

This patch is a wip to address the first point. The issue that I'm running into is that we currently seem to delay launching the subprocess until we need it, i.e. when we call TryRemoteBrowser[1]. We currently expect a sync response from these calls and immediately try to send messages to the subprocess, such as [2] and [3]. This seems to be a bit in contrast to comment 2, where we assumed that we didn't need to immediately send messages to the subprocess and that some form of ProcessHandlePromise would be sufficient.

My question is: Should we make TryRemoteBrowser async and change the callers such that they no longer expect sync responses? I have not checked if this is possible or how difficult this would be yet. Or, should we instead start using an async form of ContentParent::PreallocateProcess() and wait in TryRemoteBrowser should the subprocess not have launched yet?

Regarding point 2: What thread should we move the CreateProcess call to?


[1] https://dxr.mozilla.org/mozilla-central/search?q=%2Bcallers%3AnsFrameLoader%3A%3ATryRemoteBrowser%28%29
[2] https://dxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#622
[3] https://dxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#792-793
Flags: needinfo?(wmccloskey)
(In reply to Stephen A Pohl [:spohl] from comment #8)
> Created attachment 8858562 [details] [diff] [review]
> wip
> 
> To quickly summarize: There appear to be three points that need to be
> addressed here (rephrasing comment 2 and comment 5):
> 1. Make startup code more async by replacing LaunchAndWaitForProcessHandle
> with AsyncLaunch in ContentParent::LaunchSubprocess, using the GPU process
> code as template.
> 2. Move the CreateProcess call away from the Gecko I/O thread to avoid
> janking it.
> 3. Introduce some form of ProcessHandlePromise abstraction to be used by
> IProtocol::OtherPid() instead of the actual PID while the subprocess is
> launching.

Yes, this looks right.

> This patch is a wip to address the first point. The issue that I'm running
> into is that we currently seem to delay launching the subprocess until we
> need it, i.e. when we call TryRemoteBrowser[1]. We currently expect a sync
> response from these calls and immediately try to send messages to the
> subprocess, such as [2] and [3]. This seems to be a bit in contrast to
> comment 2, where we assumed that we didn't need to immediately send messages
> to the subprocess and that some form of ProcessHandlePromise would be
> sufficient.

Just to clarify: the messages you pointed out are not sync. They don't expect a response. The only thing the call returns is whether it succeeded or failed.

> My question is: Should we make TryRemoteBrowser async and change the callers
> such that they no longer expect sync responses? I have not checked if this
> is possible or how difficult this would be yet. Or, should we instead start
> using an async form of ContentParent::PreallocateProcess() and wait in
> TryRemoteBrowser should the subprocess not have launched yet?

I'm hoping we don't need to worry about those messages. As far as I can tell, we create the channel before we ever even launch the subprocess:
http://searchfox.org/mozilla-central/rev/214345204f1e7d97abb571b7992b6deedb5ff98f/ipc/glue/GeckoChildProcessHost.cpp#598

So I think we should be able to send messages on the channel before the new process has even started. However, the fact that you're needinfo'ing me on this suggests that you ran into a problem with that :-). Did something fail?

> Regarding point 2: What thread should we move the CreateProcess call to?

Maybe we could use the stream transport service? That's a generic thread to service blocking I/O, and I don't think that launching a process would be out of scope for it. It's not really my area though.
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #9)
> (In reply to Stephen A Pohl [:spohl] from comment #8)
> > This patch is a wip to address the first point. The issue that I'm running
> > into is that we currently seem to delay launching the subprocess until we
> > need it, i.e. when we call TryRemoteBrowser[1]. We currently expect a sync
> > response from these calls and immediately try to send messages to the
> > subprocess, such as [2] and [3]. This seems to be a bit in contrast to
> > comment 2, where we assumed that we didn't need to immediately send messages
> > to the subprocess and that some form of ProcessHandlePromise would be
> > sufficient.
> 
> Just to clarify: the messages you pointed out are not sync. They don't
> expect a response. The only thing the call returns is whether it succeeded
> or failed.

I worded this poorly. What I meant to say is that we currently launch the subprocess synchronously via TryRemoteBrowser and once this call completes, we currently expect to have a successfully launched subprocess to work with.

> > My question is: Should we make TryRemoteBrowser async and change the callers
> > such that they no longer expect sync responses? I have not checked if this
> > is possible or how difficult this would be yet. Or, should we instead start
> > using an async form of ContentParent::PreallocateProcess() and wait in
> > TryRemoteBrowser should the subprocess not have launched yet?
> 
> I'm hoping we don't need to worry about those messages. As far as I can
> tell, we create the channel before we ever even launch the subprocess:
> http://searchfox.org/mozilla-central/rev/
> 214345204f1e7d97abb571b7992b6deedb5ff98f/ipc/glue/GeckoChildProcessHost.
> cpp#598
> 
> So I think we should be able to send messages on the channel before the new
> process has even started. However, the fact that you're needinfo'ing me on
> this suggests that you ran into a problem with that :-). Did something fail?

With the patch as it stands (and using a local debug build) I'm running into an ASSERT_UNLESS_FUZZING in ContentProcessManager::AllocateTabId here:

  auto iter = mContentParentMap.find(aChildCpId);
  if (NS_WARN_IF(iter == mContentParentMap.end())) {
    ASSERT_UNLESS_FUZZING();
    return TabId(0);
  }

Callstack:
0   XUL                           	0x000000010f5b0486 mozilla::dom::ContentProcessManager::AllocateTabId(mozilla::dom::IdType<mozilla::dom::TabParent> const&, mozilla::dom::IPCTabContext const&, mozilla::dom::IdType<mozilla::dom::ContentParent> const&) + 742 (ContentProcessManager.cpp:147)
1   XUL                           	0x000000010f59de80 mozilla::dom::ContentParent::AllocateTabId(mozilla::dom::IdType<mozilla::dom::TabParent> const&, mozilla::dom::IPCTabContext const&, mozilla::dom::IdType<mozilla::dom::ContentParent> const&) + 80 (ContentParent.cpp:4186)
2   XUL                           	0x000000010f59f81f mozilla::dom::ContentParent::CreateBrowser(mozilla::dom::TabContext const&, mozilla::dom::Element*, mozilla::dom::ContentParent*, mozilla::dom::TabParent*) + 991 (ContentParent.cpp:1193)
3   XUL                           	0x000000010d4b52f9 nsFrameLoader::TryRemoteBrowser() + 2457 (nsFrameLoader.cpp:2961)
[...]

Allocating tab IDs is being reworked in bug 1337064. Disabling this assert just caused subsequent asserts to fail. I was trying to look past these asserts and see where we might ultimately fail because the existing code expected the subprocess to be successfully launched. It seemed like the code following the calls to TryRemoteBrowser would be that place because we're sending messages to the subprocess (see links in comment 8). I assumed that we would want to delay sending these messages until the subprocess is running, but it sounds like this should be possible based on your comment. I'll go back and work through the asserts to better understand what's left to do here.
Hi Stephen,

Were you blocked on bug 1337064 here?  Just wanted to let you know that bug has landed recently and seems like it stuck.  :-)
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #11)
> Hi Stephen,
> 
> Were you blocked on bug 1337064 here?  Just wanted to let you know that bug
> has landed recently and seems like it stuck.  :-)

Thanks for the heads up! I have been unexpectedly sidetracked by drag & drop regressions that were caused by the replacement of the native drag & drop API on OSX (bug 1235162). At this time, I have patches for 2 of the 4 regressions. I'm hoping to get back to this bug shortly.
Thanks Stephen!

FWIW this shows up in the BHR data, see https://people-mozilla.org/~mlayzell/bhr/20170429/all.html#1466.
Whiteboard: [qf:p1] → [qf:p1][bhr]
A quick note to say that I was able to clear my plate of drag & drop regressions and I'm back on this.
I've finally had time to look into this some more and here is what I'm running into. The wip patch (attachment 8858562 [details] [diff] [review]) launches the child process asynchronously. For the browser to be able to handle this, we need to make sure that any code that follows the previously synchronous launching of the child process can handle the fact that there may not be a running child process yet. This involves more than sending messages on the channel. For example, the first assert I'm running into with this patch applied is the following:

0   XUL                           	0x000000011780a135 mozilla::dom::ContentProcessManager::RegisterRemoteFrame(mozilla::dom::IdType<mozilla::dom::TabParent> const&, mozilla::dom::IdType<mozilla::dom::TabParent> const&, mozilla::dom::IPCTabContext const&, mozilla::dom::IdType<mozilla::dom::ContentParent> const&) + 741 (ContentProcessManager.cpp:146)
1   XUL                           	0x000000011780c1b5 mozilla::dom::ContentParent::CreateBrowser(mozilla::dom::TabContext const&, mozilla::dom::Element*, mozilla::dom::ContentParent*, mozilla::dom::TabParent*, unsigned long long) + 1253 (ContentParent.cpp:1210)
2   XUL                           	0x000000011574c0f7 nsFrameLoader::TryRemoteBrowser() + 2919 (nsFrameLoader.cpp:2986)
3   XUL                           	0x000000011574c4cb nsFrameLoader::ShowRemoteFrame(mozilla::gfx::IntSizeTyped<mozilla::ScreenPixel> const&, nsSubDocumentFrame*) + 187 (nsFrameLoader.cpp:1249)
4   XUL                           	0x000000011574f1fa nsFrameLoader::Show(int, int, int, int, nsSubDocumentFrame*) + 202 (nsFrameLoader.cpp:1103)
5   XUL                           	0x00000001184c695b nsSubDocumentFrame::ShowViewer() + 251 (nsSubDocumentFrame.cpp:185)
6   XUL                           	0x00000001184fba6c AsyncFrameInit::Run() + 124 (nsSubDocumentFrame.cpp:92)
7   XUL                           	0x000000011546a58c nsContentUtils::RemoveScriptBlocker() + 492 (nsContentUtils.cpp:5546)
[...]

This assert occurs because mContentParentMap in ContentProcessManager is still empty, because no child ID has been added to the map yet. It seems like we would need to delay processing this code until the process has launched, for example by replacing the call to nsFrameLoader::Show in nsSubDocumentFrame::ShowViewer with an asynchronous call with a callback that will set nsSubDocumentFrame's mDidCreateDoc once the child process has launched, or failed to launch asynchronously. However, this would need to be repeated for every single call to TryRemoteBrowser[1]. This quickly becomes hairy. For example, TryRemoteBrowser is also called by nsFrameLoader::GetLoadContext, which gets called in about 30 different places[2]. If we were to make GetLoadContext asynchronous, we would need to add callbacks for every single caller of GetLoadContext. Then, each callback would have to process any code that follows the call to GetLoadContext in the current, synchronous sequence.

:billm, am I missing something here? Right now, the effort to make this work seems extremely high.

[1] https://dxr.mozilla.org/mozilla-central/search?q=TryRemoteBrowser
[2] https://dxr.mozilla.org/mozilla-central/search?q=GetLoadContext(&redirect=false
Flags: needinfo?(wmccloskey)
You need to call AddContentProcess before TryRemoteBrowser is called. It looks like you've moved that into ContentParent::InitAfterConnect, but I don't think that's right. It should happen immediately after mSubprocess->Launch.

It seems like a lot of other stuff in InitAfterConnect needs to happen right away as well. For example, I think you need to call Open right away or else you won't be able to send anything over the IPC channel. For now you'll have to make up a value for aOtherPid until #3 is fixed from comment 8.
Flags: needinfo?(wmccloskey)
See Also: → 1373660
Addressed feedback. This seems to work well and as expected for me locally. I launched a try run[1], but I'm having difficulty reading the results. I seem to be hitting a number of failures, but all the failing tests that I've tried locally have been passing for me...

Note that I have not had to pass around pid promises (yet?), as was originally believed in comment 2. My observation was that the various inits etc. (InitInternal(), Init(), etc.) that occur right after the process launch seem to take up more time than actually launching the process. By the time OtherPid() was called for the first time, the process had launched and we had a valid pid to work with. Should we find that we now spend a lot of time waiting for OtherPid() to return (because we're waiting for a valid pid to become available), we could still change this to pass ProcessIdPromises around, instead of an actual pid.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=1147354b10c59e2d905b7c721ad1202f2606fd11
Attachment #8858562 - Attachment is obsolete: true
Attachment #8891816 - Flags: feedback?(wmccloskey)
Comment on attachment 8891816 [details] [diff] [review]
Part 1: Launch child processes asynchronously

Review of attachment 8891816 [details] [diff] [review]:
-----------------------------------------------------------------

This mostly looks pretty good. The lack of a Notify call could certainly explain some timeouts.

::: dom/ipc/ContentParent.cpp
@@ -2073,5 @@
> -
> -  Open(mSubprocess->GetChannel(), procId);
> -
> -#ifdef MOZ_CODE_COVERAGE
> -  Unused << SendShareCodeCoverageMutex(CodeCoverageHandler::Get()->GetMutexHandle(procId));

Can you just move this to ContentParent::OnChannelConnected? It seems to run at pretty much the same time as InitAfterConnect. Then you could get rid of InitAfterConnect.

::: dom/ipc/ContentProcessHost.h
@@ +20,5 @@
> +// case, is a ContentChild.
> +//
> +// ContentProcessHosts are allocated and managed by ContentProcessManager. For
> +// all intents and purposes it is a singleton, though more than one may be
> +// allocated at a time due to its shutdown being asynchronous.

The second paragraph of this comment is not correct. ContentProcessHosts are allocated by ContentParent, and they're not singletons. There is one per process.

::: ipc/glue/ProtocolUtils.cpp
@@ +612,5 @@
>  {
> +  if (mOtherPidPromise.state == base::ProcessIdPromise::eResolved) {
> +    return mOtherPidPromise.pid;
> +  }
> +  MonitorAutoLock lock(const_cast<Monitor&>(mMonitor));

Perhaps it would be best to declare the monitor as mutable.

@@ +629,2 @@
>  {
> +  mOtherPidPromise = {aOtherPid, aState};

I think you need to take mMonitor here and also call NotifyAll on it (although Notify is probably sufficient since only the main thread is likely to be waiting).

::: ipc/glue/ProtocolUtils.h
@@ +422,4 @@
>    private:
>      ProtocolId mProtocolId;
>      UniquePtr<Transport> mTrans;
> +    base::ProcessIdPromise mOtherPidPromise;

Rather than define this Promise data type in process.h, I think it would be better just to have two fields here for the pid and the status.
Attachment #8891816 - Flags: feedback?(wmccloskey) → feedback+
Thank you for the feedback! I've sent this to try and we seem to be in decent shape (all the failures seem to be known intermittents), but I would welcome a second look:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dbc4ced238741648c7b98bd06a90e1e635a7bf33
Attachment #8891816 - Attachment is obsolete: true
Attachment #8894265 - Flags: review?(wmccloskey)
Here is a startup profile on the reference hardware where starting the content process blocks the main thread for 555ms before first paint: https://perfht.ml/2wByIog (as seen on https://perfht.ml/2vIKyAO, that's about half the time between getting the profile lock and first paint).
(In reply to Florian Quèze [:florian] [:flo] from comment #21)
> Here is a startup profile on the reference hardware where starting the
> content process blocks the main thread for 555ms before first paint:
> https://perfht.ml/2wByIog (as seen on https://perfht.ml/2vIKyAO, that's
> about half the time between getting the profile lock and first paint).

Is it be possible to generate a startup profile with a build from the last try run to compare the results?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dbc4ced238741648c7b98bd06a90e1e635a7bf33
Flags: needinfo?(florian)
Comment on attachment 8894265 [details] [diff] [review]
Part 1: Launch child processes asynchronously

Review of attachment 8894265 [details] [diff] [review]:
-----------------------------------------------------------------

The big changes I'd like to see here are removal of OnChannelConnectedTask and OnChannelErrorTask. It's possible I've misunderstood something, though, so please let me know if there's a reason to keep them.

::: dom/ipc/ContentProcessHost.cpp
@@ +28,5 @@
> +
> +bool
> +ContentProcessHost::Launch(StringVector aExtraOpts)
> +{
> +  MOZ_ASSERT(mLaunchPhase == LaunchPhase::Unlaunched);

Aside from this assertion, mLaunchPhase is never read in any non-trivial way. Can you just replace this with a boolean saying whether we've called Launch?

@@ +44,5 @@
> +void
> +ContentProcessHost::OnChannelConnected(int32_t aPid)
> +{
> +  MOZ_ASSERT(!NS_IsMainThread());
> +  mContentParent->SetOtherProcessId(aPid);

Please comment that this will wake up the main thread if it is waiting on the process to launch.

@@ +52,5 @@
> +  RefPtr<Runnable> runnable;
> +  {
> +    MonitorAutoLock lock(mMonitor);
> +    runnable = mTaskFactory.NewRunnableMethod(
> +                 &ContentProcessHost::OnChannelConnectedTask);

I'd much prefer to remove all this main-thread stuff. It's error-prone, and the MessageChannel implementation already takes care of it (via mOnChannelConnectedTask).

To make this work, you would need to call RevokeAll in the destructor to ensure that this stuff doesn't run too late. (Relying on the TaskFactory destructor won't work because you don't hold mMonitor then.)

@@ +69,5 @@
> +  // thread-safe.
> +  RefPtr<Runnable> runnable;
> +  {
> +    MonitorAutoLock lock(mMonitor);
> +    runnable = mTaskFactory.NewRunnableMethod(

Same here.

@@ +83,5 @@
> +    mLaunchPhase = LaunchPhase::Complete;
> +  }
> +#if defined(ANDROID) || defined(LINUX)
> +  // Check nice preference
> +  int32_t nice = Preferences::GetInt("dom.ipc.content.nice", 0);

I think all this could stay where it is in ContentParent.

@@ +110,5 @@
> +#endif
> +#ifdef MOZ_CODE_COVERAGE
> +  base::ProcessId procId =
> +    base::GetProcId(mContentParent->Process()->GetChildProcessHandle());
> +  Unused << SendShareCodeCoverageMutex(

I don't see how this will work. SendShareCodeCoverageMutex is a method of ContentParent. Can we move that to ContentParent::OnChannelConnected?

@@ +121,5 @@
> +{
> +  if (mLaunchPhase == LaunchPhase::Waiting) {
> +    mLaunchPhase = LaunchPhase::Complete;
> +  }
> +  mContentParent->OnChannelError();

I think this will cause OnChannelError to run twice on ContentParent since MessageChannel already calls it:
http://searchfox.org/mozilla-central/rev/b52285fffc13f36eca6b47de735d4e4403b3859e/ipc/glue/MessageChannel.cpp#2555

::: ipc/glue/ProtocolUtils.cpp
@@ +609,5 @@
>  
>  base::ProcessId
>  IToplevelProtocol::OtherPid() const
>  {
> +  if (mOtherPidState == ProcessIdState::eReady) {

It's a race to read this without owning the lock. It's possible that mOtherPid is undefined even if mOtherPidState is eReady. You need a memory fence to ensure that they're consistent, and taking the lock will do that.

You should be able to just take out this bit and rely on |return mOtherPid;| below.

::: ipc/glue/ProtocolUtils.h
@@ +425,5 @@
>  
>      virtual already_AddRefed<nsIEventTarget>
>      GetActorEventTargetInternal(IProtocol* aActor);
>  
> +    mutable mozilla::Monitor mMonitor;

Please comment that this monitor protects mOtherPid and mOtherPidState. All other fields should be accessed only on the worker thread.
Attachment #8894265 - Flags: review?(wmccloskey) → review-
Attached patch Patch (obsolete) — Splinter Review
Addressed feedback with the exception below. New try run in comment 24.

(In reply to Bill McCloskey (:billm) from comment #23)
> Comment on attachment 8894265 [details] [diff] [review]
> Part 1: Launch child processes asynchronously
> 
> Review of attachment 8894265 [details] [diff] [review]:
> -----------------------------------------------------------------
> ::: dom/ipc/ContentProcessHost.cpp
> @@ +121,5 @@
> > +{
> > +  if (mLaunchPhase == LaunchPhase::Waiting) {
> > +    mLaunchPhase = LaunchPhase::Complete;
> > +  }
> > +  mContentParent->OnChannelError();
> 
> I think this will cause OnChannelError to run twice on ContentParent since
> MessageChannel already calls it:
> http://searchfox.org/mozilla-central/rev/
> b52285fffc13f36eca6b47de735d4e4403b3859e/ipc/glue/MessageChannel.cpp#2555

So, it looks like ContentParent::OnChannelError and ContentParent::OnChannelConnected were never called. This wasn't necessarily obvious because SetOtherProcessId() was also called from Open(). What used to be called, however, was GeckoChildProcessHost::OnChannelConnected and now ContentProcessHost::OnChannelConnected. This patch makes sure that we call both ContentParent::OnChannelError and ContentParent::OnChannelConnected when required.
Attachment #8891817 - Attachment is obsolete: true
Attachment #8894265 - Attachment is obsolete: true
Attachment #8895507 - Flags: review?(wmccloskey)
Not calling ContentParent::OnChannelConnected is a bug. I filed bug 1388920 for that. I'd like to fix that and then we can do this bug. The patch looks very close.
I(In reply to Stephen A Pohl [:spohl] from comment #22)
> (In reply to Florian Quèze [:florian] [:flo] from comment #21)
> > Here is a startup profile on the reference hardware where starting the
> > content process blocks the main thread for 555ms before first paint:
> > https://perfht.ml/2wByIog (as seen on https://perfht.ml/2vIKyAO, that's
> > about half the time between getting the profile lock and first paint).
> 
> Is it be possible to generate a startup profile with a build from the last
> try run to compare the results?

Here is a startup profile with the try build from comment 24: https://perfht.ml/2uv3McI
ContentParent::LaunchSubprocess still blocks the main thread for more than 100ms.

Unfortunately, these profiles aren't very useful as we don't have symbols in them for try builds.
Flags: needinfo?(florian)
(In reply to Florian Quèze [:florian] [:flo] from comment #27)
> I(In reply to Stephen A Pohl [:spohl] from comment #22)
> > (In reply to Florian Quèze [:florian] [:flo] from comment #21)
> > > Here is a startup profile on the reference hardware where starting the
> > > content process blocks the main thread for 555ms before first paint:
> > > https://perfht.ml/2wByIog (as seen on https://perfht.ml/2vIKyAO, that's
> > > about half the time between getting the profile lock and first paint).
> > 
> > Is it be possible to generate a startup profile with a build from the last
> > try run to compare the results?
> 
> Here is a startup profile with the try build from comment 24:
> https://perfht.ml/2uv3McI
> ContentParent::LaunchSubprocess still blocks the main thread for more than
> 100ms.
> 
> Unfortunately, these profiles aren't very useful as we don't have symbols in
> them for try builds.

This is good to know, thank you! Once the patch here lands we should take another look at this with symbols. There may be low hanging fruit, or we may determine that we will have to pass around some kind of pid promise after all. Either way, we will benefit from landing these changes in staggered form to allow for individual backouts if they became necessary.
Comment on attachment 8895507 [details] [diff] [review]
Patch

Review of attachment 8895507 [details] [diff] [review]:
-----------------------------------------------------------------

Very close. Let's just do one more round.

::: dom/ipc/ContentParent.cpp
@@ +1511,1 @@
>    SetOtherProcessId(pid);

This line should move back to ContentProcessHost::OnChannelConnected. And we should have an NS_IsMainThread() assertion in ContentParent::OnChannelConnected.

::: dom/ipc/ContentProcessHost.cpp
@@ +43,5 @@
> +{
> +  MOZ_ASSERT(!NS_IsMainThread());
> +
> +  mHasLaunched = true;
> +  mContentParent->OnChannelConnected(aPid);

Please remove this line.

@@ +53,5 @@
> +{
> +  MOZ_ASSERT(!NS_IsMainThread());
> +
> +  mHasLaunched = true;
> +  mContentParent->OnChannelError();

Same for this line. (If there's a separate problem with ContentParent::OnChannelError not being called, we should address that separately as well.)

::: dom/ipc/ContentProcessHost.h
@@ +41,5 @@
> +  DISALLOW_COPY_AND_ASSIGN(ContentProcessHost);
> +
> +  bool mHasLaunched;
> +
> +  ContentParent* mContentParent; // weak

Please remove this field as it will no longer be needed.
Attachment #8895507 - Flags: review?(wmccloskey)
Attached patch Patch (obsolete) — Splinter Review
Thank you! Addressed feedback, with one question below. New try run in comment 30.

(In reply to Bill McCloskey (:billm) from comment #29)
> Comment on attachment 8895507 [details] [diff] [review]
> Patch
> 
> Review of attachment 8895507 [details] [diff] [review]:
> -----------------------------------------------------------------
> ::: dom/ipc/ContentProcessHost.h
> @@ +41,5 @@
> > +  DISALLOW_COPY_AND_ASSIGN(ContentProcessHost);
> > +
> > +  bool mHasLaunched;
> > +
> > +  ContentParent* mContentParent; // weak
> 
> Please remove this field as it will no longer be needed.

I believe this is still needed to call SetOtherProcessId() in ContentProcessHost::OnChannelConnected. Did I miss something obvious?
Attachment #8895507 - Attachment is obsolete: true
Attachment #8897284 - Flags: review?(wmccloskey)
Comment on attachment 8897284 [details] [diff] [review]
Patch

Review of attachment 8897284 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, you're right. Sorry I missed that.
Attachment #8897284 - Flags: review?(wmccloskey) → review+
Attached patch Patch (obsolete) — Splinter Review
Thank you! Updated the patch for current trunk. Carrying over r+.
Attachment #8897284 - Attachment is obsolete: true
Attachment #8897451 - Flags: review+
Backout by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c8135aa1f750
Backed out changeset 21619c674c86 for failing mochitest test_group_zoom.html on Android and robocop tests. r=backout
This also caused a massive ts_paint regression on all desktop platforms, which went away when it was backed out.
(In reply to Kris Maglione [:kmag] from comment #37)
> This also caused a massive ts_paint regression on all desktop platforms,
> which went away when it was backed out.

What's the best way to access these results? Are there gains in other performance tests to compensate, or does this not look salvageable?
Flags: needinfo?(spohl.mozilla.bugs) → needinfo?(kmaglione+bmo)
I haven't looked at other numbers. I only noticed the ts_paint regression because I was watching those numbers for other reasons, but you can see the regression pretty clearly in these graphs:

https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-inbound,f305a43917d199e7114001ba9024835bd68530a9,1,1%5D&series=%5Bmozilla-inbound,e7caea89663f1e94f198311432ecbf89e9623a75,1,1%5D&series=%5Bmozilla-inbound,e394aab72917d169024558cbab33eb4e7e9504e1,1,1%5D
Flags: needinfo?(kmaglione+bmo)
Thanks! From what I've been able to gather so far, the perf regressions appear to be Linux-only. The question might boil down to whether or not the performance gains from launching child processes asynchronously (especially on Windows) outweigh the perf regressions on Linux.
Nope, they're pretty clear across all platforms. They just look a bit better on Windows because we have fewer Windows runs in that range, and most of the ones we do have are pgo.
This definitely needs more investigation. I can't think of any reason why this patch should make anything slower. OtherPid() got a little slower, but it shouldn't be on any hot paths. Either something weird is happening or we're measuring wrong.
Depends on: 1390606
I must not be reading the charts right, as I have only been able to determine a regression on Linux.

What's the best way to look into these performance regressions?

I've also been looking into the Android failures, but I'm currently blocked by bug 1390606.
The regression seems to be biggest on Linux, so it's probably best to start there. This page explains how to run Talos locally:
https://wiki.mozilla.org/Buildbot/Talos/Running
It's probably best to see if you can reproduce a difference locally.

If you can, you can play around with turning on or off difference parts of the patch and see how performance is affected. For example, you could add locking to OtherPid() without making startup async and see if the regression remains.
Linux ts_paint numbers are weird, for reasons I don't entirely understand. We tend to get the firstPaint timing event there after stuff like delayedStartup has finished a lot of the time, so the numbers fluctuate a lot more than they normally would elsewhere. But that might still be the easiest place to start.

On Windows, the regressions are more obvious in the sessionrestore tests (comment 40) and the ts_paint_webext test (which is where I first noticed it):

https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=mozilla-inbound,1500971,1,1&series=mozilla-inbound,1500748,1,1&series=mozilla-inbound,1314200,1,1&series=mozilla-inbound,1492653,1,1&highlightedRevisions=4af3a98934c4&highlightedRevisions=643363573c85

but you can still see them in the ordinary ts_paint values.

It might help to retrigger or backfill some more T-e10s-o jobs in that range, since there are so few of them for Windows, and the numbers are noisy. It's also probably worth noting that I landed a 30-50ms ts_paint improvement shortly before your patch landed, so comparing to the numbers from a few hours earlier won't be very reliable.
We had hoped this would make 57. Remaining work:

1) android test failure issues
2) talos performance regressions, which may be caused by talos vs. the changes here.

The first 57 merge is in a couple days, changes like this need bake time - this isn't going to make 57. Pushing back onto the back burner and unassigning. Stephen is moving on to more pressing and achievable tasks.
Assignee: spohl.mozilla.bugs → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(mconley)
Priority: P1 → P2
Changing it from qf:p1 to qf:p2 for post 57 work based on #comment47.
Whiteboard: [qf:p1][bhr] → [qf:p2][bhr]
Keywords: perf
Whiteboard: [qf:p2][bhr] → [perf:p3][bhr]
Whiteboard: [perf:p3][bhr] → [qf:p3][bhr]
I'm hoping this could be prioritized higher than p3 given how big of an impact this bug has.

See this startup profile https://perfht.ml/2CRu0KB captured on a Nightly from 2017-11-13 on the quantum reference hardware where creating content processes blocks the main thread for several hundread ms 3 times (with the longest time of the three lasting 840ms).
Whiteboard: [qf:p3][bhr] → [qf][bhr]
Working through this bug in QF triage.  Are we resuming trying to land this for at least 63?
Flags: needinfo?(spohl.mozilla.bugs)
Whiteboard: [qf][bhr] → [qf:f63][qf:p1][bhr]
I will not be able to return to this bug anytime soon and can't give a recommendation here.
Flags: needinfo?(spohl.mozilla.bugs)
Summary: sync IPC when launching a child process → Remove sync IPC when launching a child process
Creating the first content processes seems to block startup for about 14s in this profile: https://perfht.ml/2EAudCh

This is captured on my mother's computer, a 5 year old Dell laptop with an i3 CPU, and the most notable thing about it is probably Kaspersky scanning for viruses in a way that significantly delays process creations (likely doing main thread I/O to scan).

Running the same thing on a similar machine without antivirus software starts the process in < 100ms.

Given that we have lots of users with antivirus installed and who may be badly affected by this, is there any chance of finding someone to finish this for 61?
(In reply to Florian Quèze [:florian] from comment #52)
> Creating the first content processes seems to block startup for about 14s in
> this profile: https://perfht.ml/2EAudCh
> 
> This is captured on my mother's computer, a 5 year old Dell laptop with an
> i3 CPU, and the most notable thing about it is probably Kaspersky scanning
> for viruses in a way that significantly delays process creations (likely
> doing main thread I/O to scan).
> 
> Running the same thing on a similar machine without antivirus software
> starts the process in < 100ms.
> 
> Given that we have lots of users with antivirus installed and who may be
> badly affected by this, is there any chance of finding someone to finish
> this for 61?

Maybe. This is a tough engineering problem that needs additional effort. I'll find someone to pick this up and figure out what's left. I can't commit to a release quite yet.
(In reply to Florian Quèze [:florian] from comment #52)
> Creating the first content processes seems to block startup for about 14s in
> this profile: https://perfht.ml/2EAudCh

Recording the "Gecko_IOThread" thread might show in more detail where those seconds are going — that's the thread that does the actual process creation.  (At least on Linux and Mac — on Windows the sandbox might make things more complicated.)

I don't know how much the change proposed in this bug would help in cases like that — the browser UI could remain responsive instead of freezing, but if it takes 14s to start a content process then it'll take ≥14s until content is rendered.
Worth noting that we need this for GeckoView too. The synchronous process start creates problems for us due to how we launch child processes on Android (via Services).
(In reply to Jed Davis [:jld] (⏰UTC-7) from comment #54)

> I don't know how much the change proposed in this bug would help in cases
> like that — the browser UI could remain responsive instead of freezing, but
> if it takes 14s to start a content process then it'll take ≥14s until
> content is rendered.

The browser UI being displayed and responsive would be a huge improvement over the current situation (no browser window at all). Then I think we should create a second content process asynchronously as soon as we get a chance, so that it's ready for when the user opens a second tab.
This is also a major issue after startup with e10s multi. With the preallocated process manager, we can handle long process startup delay fairly cleanly as long as it doesn't block the parent process, since the new content process should generally already be available for us when we need it. If we need to block the parent process for multiple seconds each time we spawn a new content process, though, it becomes a major issue.

We also have the issue at startup of sometimes needing to spawn multiple content processes. Even in the normal case, we need to spawn both the first web content process and an extension content process. When restoring multiple tabs, though, we'll often need to spawn multiple web content processes, and if we need sync IPC for each one, we have to eat the startup delay multiple times in series rather than in parallel.
Attached patch Patch (obsolete) — Splinter Review
Updated for current trunk. Carrying over r+.
Attachment #8897451 - Attachment is obsolete: true
Attachment #8951518 - Flags: review+
Assignee: nobody → agaynor
Priority: P2 → P1
Depends on: 1440019
Did some measuring, the lock in OtherPid() is not the problem: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=f266c062de5b059a8828cdff071510a2eda12325&framework=1&showOnlyImportant=1&selectedTimeRange=172800

Will try factoring out various other parts of the patch and see what I can find.
So I've got some bad news, and I've got some bad news.

The bad news is, I still don't have a great understanding of why this slows down session restore by nearly 20%.

The bad news is, in trying to diagnose that, I've realized that InitInternal, which is called right after we spawn the process (https://searchfox.org/mozilla-central/source/dom/ipc/ContentParent.cpp#2062-2075) calls OtherPid, which needs to block until the process is spawned and the channel connected: https://searchfox.org/mozilla-central/source/dom/ipc/ContentParent.cpp#2349 This means this patch doesn't yet address the underlying issues we were originally trying to solve.

Having typed this out, I'm pretty sure I understand why this patch slows things down: The previous code only blocks on the new process having a HANDLE ready, whereas the new code blocks until a channel is connected to the child process.

If I'm right, this means I should try to find a way to fire the SetOtherProcessId as soon as we have the PID, instead of waiting for the channel to be connected.
So… with the benefit of hindsight, I had some information that would've helped here: a few months ago I was experimenting on Linux with having processes launched indirectly via a wrapper executable, for reasons, and I found that the wrong pid (either the wrapper process or 0, depending on how I altered GeckoChildProcessHost to try to handle the situation) had already escaped into Endpoint objects by the time the IPC ClientHello came in.  But people seemed confident about the earlier patch on this bug, so I thought maybe I just didn't understand the IPC core well enough.

In any case, if the main thread is just going to immediately block on getting the pid via OtherPid(), is this really helping?  Could we instead reconsider how Endpoints identify processes, so they don't need to depend on OS-level pids if they don't need them and/or don't force the evaluation of the pid until it's needed?
I'm currently measuring a patch that has this use "process handle available" instead of "channel initialized". Assuming that's successful, I think the next thing to play with will be deferring the creation of code in InitInternal() that needs OtherPid(). I don't know how hard that will be.
Depends on: 1445249
Comment on attachment 8958906 [details]
Bug 1348361 - Part 1 - Added locking to IToplevelProtocol's management of the peer's pid;

https://reviewboard.mozilla.org/r/221704/#review233598
Attachment #8958906 - Flags: review?(jld) → review+
Comment on attachment 8958907 [details]
Bug 1348361 - Part 2 - Introduce a GeckoChildProcessHost subclass, ContentProcessHost, that will be used for async process launches;

https://reviewboard.mozilla.org/r/221706/#review233602

::: commit-message-ae130:1
(Diff revision 1)
> +Bug 1348361 - Part 2 - Introduce a GeckoChildProcessHost that will be used for async process launches; r?jld

Nit: “…a GeckoChildProcessHost subclass that will…”?
Attachment #8958907 - Flags: review?(jld) → review+
Comment on attachment 8958908 [details]
Bug 1348361 - Part 3 - Do not block the thread when spawning a gecko child process;

https://reviewboard.mozilla.org/r/221708/#review233612

::: dom/ipc/ContentParent.cpp:1519
(Diff revision 1)
>  
>  void
>  ContentParent::OnChannelError()
>  {
> -  RefPtr<ContentParent> content(this);
>    PContentParent::OnChannelError();

Normally an unused RefPtr like this is because the rest of the method might (indirectly) drop the last reference and cause a use-after-free.  Are we sure that can't happen here?

(See also: https://searchfox.org/mozilla-central/search?q=deathgrip&case=false&regexp=false&path= )

::: dom/ipc/ContentParent.cpp:2082
(Diff revision 1)
>  
>    // Set a reply timeout for CPOWs.
>    SetReplyTimeoutMs(Preferences::GetInt("dom.ipc.cpow.timeout", 0));
>  
> +  // TODO: If OtherPid() is not called between mSubprocess->Launch() and this,
> +  // then we're not really measuring how long it took to spawn the process.

This should have a followup bug, if there isn't one yet.  The people who use this telemetry probably have ideas about whether they want to leave it here, move it to a later point like when the pid is received, or have separate histograms for both of those, and/or measure how long the I/O thread spends on launching, etc.

::: dom/ipc/ContentProcessHost.h:31
(Diff revision 1)
>    ContentProcessHost(ContentParent* aContentParent,
>                       bool aIsFileContent = false);
>    ~ContentProcessHost();
>  
> +  // Launch the subprocess asynchronously. On failure, false is returned.
> +  // Otherwise, true is returned, and the OnLaunchComplete listener callback

There doesn't seem to be anything in the code called OnLaunchComplete.  The GPU process patches added a comment like this (but no actual identifier with that name; I assume the callbacks changed during development and nobody noticed the comment was stale), and then the earlier patch in this bug seems to have copied it when adapting GPUProcessHost into ContentProcessHost.
Attachment #8958908 - Flags: review?(jld) → review+
Comment on attachment 8958908 [details]
Bug 1348361 - Part 3 - Do not block the thread when spawning a gecko child process;

https://reviewboard.mozilla.org/r/221708/#review233612

> Normally an unused RefPtr like this is because the rest of the method might (indirectly) drop the last reference and cause a use-after-free.  Are we sure that can't happen here?
> 
> (See also: https://searchfox.org/mozilla-central/search?q=deathgrip&case=false&regexp=false&path= )

This was inherted from :spohl's original patch, I'll check in with him why this change was made.

> This should have a followup bug, if there isn't one yet.  The people who use this telemetry probably have ideas about whether they want to leave it here, move it to a later point like when the pid is received, or have separate histograms for both of those, and/or measure how long the I/O thread spends on launching, etc.

Will file one, covering deferring the InitInternal work and noting that the telemetry needs to be reviewed.

> There doesn't seem to be anything in the code called OnLaunchComplete.  The GPU process patches added a comment like this (but no actual identifier with that name; I assume the callbacks changed during development and nobody noticed the comment was stale), and then the earlier patch in this bug seems to have copied it when adapting GPUProcessHost into ContentProcessHost.

The GPU comment is unrelated; will fix my comment and also file a bug updating the GPU comment: https://bugzilla.mozilla.org/show_bug.cgi?id=1445958
Blocks: 1446161
Comment on attachment 8958908 [details]
Bug 1348361 - Part 3 - Do not block the thread when spawning a gecko child process;

https://reviewboard.mozilla.org/r/221708/#review233612

> Will file one, covering deferring the InitInternal work and noting that the telemetry needs to be reviewed.

https://bugzilla.mozilla.org/show_bug.cgi?id=1446161
Please note the c-n does not include :spohl's original patch.
Keywords: checkin-needed
Attachment #8951518 - Attachment is obsolete: true
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dd3dc46a8d7d
Part 1 - Added locking to IToplevelProtocol's management of the peer's pid; r=jld
https://hg.mozilla.org/integration/autoland/rev/d29a6c667eba
Part 2 - Introduce a GeckoChildProcessHost subclass, ContentProcessHost, that will be used for async process launches; r=jld
https://hg.mozilla.org/integration/autoland/rev/0520f0545678
Part 3 - Do not block the thread when spawning a gecko child process; r=jld
Keywords: checkin-needed
Testing the first Nightly that includes the patches from this bug on the reference hardware, I got this startup profile: https://perfht.ml/2pn9CYf
It shows the main thread is blocked on mozilla::ipc::ProcessLink::Open when creating content processes. Was this expected to be fixed as part of this bug, or is it something for a follow-up?
Depends on: 1446900
Florian: Yes, this sets up the infrastructure, bug 1446161 is needed to really see the wins.
Blocks: 1461459
You need to log in before you can comment on or make changes to this bug.