Closed
Bug 1348361
Opened 3 years ago
Closed 2 years ago
Remove sync IPC when launching a child process
Categories
(Core :: IPC, enhancement, P1)
Core
IPC
P1
Tracking
()
RESOLVED
FIXED
mozilla61
Tracking | Status | |
---|---|---|
firefox61 | --- | fixed |
People
(Reporter: ehsan, Assigned: Alex_Gaynor)
References
(Blocks 2 open bugs)
Details
(Keywords: perf, Whiteboard: [qf:f63][qf:p1][bhr])
Attachments
(3 files, 8 obsolete files)
https://perfht.ml/2mMU6Di Bill, is this something we can fix? And how hard is it?
Flags: needinfo?(wmccloskey)
Comment 1•3 years ago
|
||
(In reply to :Ehsan Akhgari from comment #0) > https://perfht.ml/2mMU6Di > > Bill, is this something we can fix? And how hard is it? Is this the right profile? The problem you were showing me earlier was with the call to LaunchAndWaitForProcessHandle (http://searchfox.org/mozilla-central/rev/006005beff40d377cfd2f69d3400633c5ff09127/dom/ipc/ContentParent.cpp#2013). This profile doesn't seem to have that method on the call stack from my quick look.
Yes, I think we can fix this. The reason we're waiting is to get a HANDLE or pid for the new process. That information isn't directly used right away. However, it is passed around to all sorts of other channels shortly after startup. This search pretty much turns up the places where we use this handle: http://searchfox.org/mozilla-central/search?q=symbol:_ZNK7mozilla3ipc17IToplevelProtocol8OtherPidEv%2C_ZNK7mozilla3ipc9IProtocol8OtherPidEv&redirect=false I think the best way to fix this would be to create some sort of ProcessHandlePromise abstraction. You could do something like SyncWaitOtherPid() and it would do the condvar wait thing if the process hasn't started. But most of the time we would just pass the promise around from one channel to another. The channels don't actually care about the handle--they just need it in case someone asks them for it. We would also have to make the startup code more async, but I don't think that should be too hard. The GPU process code already works that way, so we can use it as a template: http://searchfox.org/mozilla-central/rev/c2a60adfc7b16761cbbfcefa2093fa402ba1aa69/gfx/ipc/GPUProcessHost.cpp#44 It calls AsyncLaunch and then it waits for the OnChannelConnected method call before it opens the channel and runs does anything to the process.
Flags: needinfo?(wmccloskey)
To state the obvious, of course: the user is still going to have to wait for those 60ms or whatever before the content process can do anything. But at least the main process would be unblocked.
Reporter | ||
Comment 4•3 years ago
|
||
Is CreateProcess On Windows on this path too? See https://bugs.llvm.org/show_bug.cgi?id=20253#c4. There, an overhead of 60ms was measured for CreateProcess. :(
Whiteboard: [qf:p1]
We call CreateProcess on a separate thread. Sadly, that thread is the Gecko I/O thread. Janking that is, in many ways, worse than janking the main thread. So we should fix that too.
Comment 6•3 years ago
|
||
In GeckoView on Android, we run the content process in an Android service [1] to enable Java VM support. The services need to be defined at compile time in the manifest [2], which puts a limit on the amount of content services we can start (currently and ideally it is 1, due to increased overhead). Due to the Gecko thread is blocking in LaunchAndWaitForProcessHandle, we run into deadlock situations with compositor creation [3] and it prevents us from deferring the service start to handle the shutdown/startup race of the service. An async content process launch without Gecko thread blocking would allow us to handle the platform-specific restrictions without undesired workarounds. [1] https://searchfox.org/mozilla-central/source/mobile/android/geckoview/src/main/java/org/mozilla/gecko/process/GeckoServiceChildProcess.java [2] https://searchfox.org/mozilla-central/source/mobile/android/geckoview/src/main/AndroidManifest.xml#49 [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1350924#c4
Blocks: geckoview_mvp
Comment 7•3 years ago
|
||
Taking a stab at this.
Assignee: nobody → spohl.mozilla.bugs
Status: NEW → ASSIGNED
Priority: -- → P1
Comment 8•3 years ago
|
||
To quickly summarize: There appear to be three points that need to be addressed here (rephrasing comment 2 and comment 5): 1. Make startup code more async by replacing LaunchAndWaitForProcessHandle with AsyncLaunch in ContentParent::LaunchSubprocess, using the GPU process code as template. 2. Move the CreateProcess call away from the Gecko I/O thread to avoid janking it. 3. Introduce some form of ProcessHandlePromise abstraction to be used by IProtocol::OtherPid() instead of the actual PID while the subprocess is launching. This patch is a wip to address the first point. The issue that I'm running into is that we currently seem to delay launching the subprocess until we need it, i.e. when we call TryRemoteBrowser[1]. We currently expect a sync response from these calls and immediately try to send messages to the subprocess, such as [2] and [3]. This seems to be a bit in contrast to comment 2, where we assumed that we didn't need to immediately send messages to the subprocess and that some form of ProcessHandlePromise would be sufficient. My question is: Should we make TryRemoteBrowser async and change the callers such that they no longer expect sync responses? I have not checked if this is possible or how difficult this would be yet. Or, should we instead start using an async form of ContentParent::PreallocateProcess() and wait in TryRemoteBrowser should the subprocess not have launched yet? Regarding point 2: What thread should we move the CreateProcess call to? [1] https://dxr.mozilla.org/mozilla-central/search?q=%2Bcallers%3AnsFrameLoader%3A%3ATryRemoteBrowser%28%29 [2] https://dxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#622 [3] https://dxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#792-793
Flags: needinfo?(wmccloskey)
(In reply to Stephen A Pohl [:spohl] from comment #8) > Created attachment 8858562 [details] [diff] [review] > wip > > To quickly summarize: There appear to be three points that need to be > addressed here (rephrasing comment 2 and comment 5): > 1. Make startup code more async by replacing LaunchAndWaitForProcessHandle > with AsyncLaunch in ContentParent::LaunchSubprocess, using the GPU process > code as template. > 2. Move the CreateProcess call away from the Gecko I/O thread to avoid > janking it. > 3. Introduce some form of ProcessHandlePromise abstraction to be used by > IProtocol::OtherPid() instead of the actual PID while the subprocess is > launching. Yes, this looks right. > This patch is a wip to address the first point. The issue that I'm running > into is that we currently seem to delay launching the subprocess until we > need it, i.e. when we call TryRemoteBrowser[1]. We currently expect a sync > response from these calls and immediately try to send messages to the > subprocess, such as [2] and [3]. This seems to be a bit in contrast to > comment 2, where we assumed that we didn't need to immediately send messages > to the subprocess and that some form of ProcessHandlePromise would be > sufficient. Just to clarify: the messages you pointed out are not sync. They don't expect a response. The only thing the call returns is whether it succeeded or failed. > My question is: Should we make TryRemoteBrowser async and change the callers > such that they no longer expect sync responses? I have not checked if this > is possible or how difficult this would be yet. Or, should we instead start > using an async form of ContentParent::PreallocateProcess() and wait in > TryRemoteBrowser should the subprocess not have launched yet? I'm hoping we don't need to worry about those messages. As far as I can tell, we create the channel before we ever even launch the subprocess: http://searchfox.org/mozilla-central/rev/214345204f1e7d97abb571b7992b6deedb5ff98f/ipc/glue/GeckoChildProcessHost.cpp#598 So I think we should be able to send messages on the channel before the new process has even started. However, the fact that you're needinfo'ing me on this suggests that you ran into a problem with that :-). Did something fail? > Regarding point 2: What thread should we move the CreateProcess call to? Maybe we could use the stream transport service? That's a generic thread to service blocking I/O, and I don't think that launching a process would be out of scope for it. It's not really my area though.
Flags: needinfo?(wmccloskey)
Comment 10•3 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #9) > (In reply to Stephen A Pohl [:spohl] from comment #8) > > This patch is a wip to address the first point. The issue that I'm running > > into is that we currently seem to delay launching the subprocess until we > > need it, i.e. when we call TryRemoteBrowser[1]. We currently expect a sync > > response from these calls and immediately try to send messages to the > > subprocess, such as [2] and [3]. This seems to be a bit in contrast to > > comment 2, where we assumed that we didn't need to immediately send messages > > to the subprocess and that some form of ProcessHandlePromise would be > > sufficient. > > Just to clarify: the messages you pointed out are not sync. They don't > expect a response. The only thing the call returns is whether it succeeded > or failed. I worded this poorly. What I meant to say is that we currently launch the subprocess synchronously via TryRemoteBrowser and once this call completes, we currently expect to have a successfully launched subprocess to work with. > > My question is: Should we make TryRemoteBrowser async and change the callers > > such that they no longer expect sync responses? I have not checked if this > > is possible or how difficult this would be yet. Or, should we instead start > > using an async form of ContentParent::PreallocateProcess() and wait in > > TryRemoteBrowser should the subprocess not have launched yet? > > I'm hoping we don't need to worry about those messages. As far as I can > tell, we create the channel before we ever even launch the subprocess: > http://searchfox.org/mozilla-central/rev/ > 214345204f1e7d97abb571b7992b6deedb5ff98f/ipc/glue/GeckoChildProcessHost. > cpp#598 > > So I think we should be able to send messages on the channel before the new > process has even started. However, the fact that you're needinfo'ing me on > this suggests that you ran into a problem with that :-). Did something fail? With the patch as it stands (and using a local debug build) I'm running into an ASSERT_UNLESS_FUZZING in ContentProcessManager::AllocateTabId here: auto iter = mContentParentMap.find(aChildCpId); if (NS_WARN_IF(iter == mContentParentMap.end())) { ASSERT_UNLESS_FUZZING(); return TabId(0); } Callstack: 0 XUL 0x000000010f5b0486 mozilla::dom::ContentProcessManager::AllocateTabId(mozilla::dom::IdType<mozilla::dom::TabParent> const&, mozilla::dom::IPCTabContext const&, mozilla::dom::IdType<mozilla::dom::ContentParent> const&) + 742 (ContentProcessManager.cpp:147) 1 XUL 0x000000010f59de80 mozilla::dom::ContentParent::AllocateTabId(mozilla::dom::IdType<mozilla::dom::TabParent> const&, mozilla::dom::IPCTabContext const&, mozilla::dom::IdType<mozilla::dom::ContentParent> const&) + 80 (ContentParent.cpp:4186) 2 XUL 0x000000010f59f81f mozilla::dom::ContentParent::CreateBrowser(mozilla::dom::TabContext const&, mozilla::dom::Element*, mozilla::dom::ContentParent*, mozilla::dom::TabParent*) + 991 (ContentParent.cpp:1193) 3 XUL 0x000000010d4b52f9 nsFrameLoader::TryRemoteBrowser() + 2457 (nsFrameLoader.cpp:2961) [...] Allocating tab IDs is being reworked in bug 1337064. Disabling this assert just caused subsequent asserts to fail. I was trying to look past these asserts and see where we might ultimately fail because the existing code expected the subprocess to be successfully launched. It seemed like the code following the calls to TryRemoteBrowser would be that place because we're sending messages to the subprocess (see links in comment 8). I assumed that we would want to delay sending these messages until the subprocess is running, but it sounds like this should be possible based on your comment. I'll go back and work through the asserts to better understand what's left to do here.
Reporter | ||
Comment 11•3 years ago
|
||
Hi Stephen, Were you blocked on bug 1337064 here? Just wanted to let you know that bug has landed recently and seems like it stuck. :-)
Comment 12•3 years ago
|
||
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #11) > Hi Stephen, > > Were you blocked on bug 1337064 here? Just wanted to let you know that bug > has landed recently and seems like it stuck. :-) Thanks for the heads up! I have been unexpectedly sidetracked by drag & drop regressions that were caused by the replacement of the native drag & drop API on OSX (bug 1235162). At this time, I have patches for 2 of the 4 regressions. I'm hoping to get back to this bug shortly.
Reporter | ||
Comment 13•3 years ago
|
||
Thanks Stephen! FWIW this shows up in the BHR data, see https://people-mozilla.org/~mlayzell/bhr/20170429/all.html#1466.
Whiteboard: [qf:p1] → [qf:p1][bhr]
Comment 14•3 years ago
|
||
A quick note to say that I was able to clear my plate of drag & drop regressions and I'm back on this.
Comment 15•3 years ago
|
||
I've finally had time to look into this some more and here is what I'm running into. The wip patch (attachment 8858562 [details] [diff] [review]) launches the child process asynchronously. For the browser to be able to handle this, we need to make sure that any code that follows the previously synchronous launching of the child process can handle the fact that there may not be a running child process yet. This involves more than sending messages on the channel. For example, the first assert I'm running into with this patch applied is the following: 0 XUL 0x000000011780a135 mozilla::dom::ContentProcessManager::RegisterRemoteFrame(mozilla::dom::IdType<mozilla::dom::TabParent> const&, mozilla::dom::IdType<mozilla::dom::TabParent> const&, mozilla::dom::IPCTabContext const&, mozilla::dom::IdType<mozilla::dom::ContentParent> const&) + 741 (ContentProcessManager.cpp:146) 1 XUL 0x000000011780c1b5 mozilla::dom::ContentParent::CreateBrowser(mozilla::dom::TabContext const&, mozilla::dom::Element*, mozilla::dom::ContentParent*, mozilla::dom::TabParent*, unsigned long long) + 1253 (ContentParent.cpp:1210) 2 XUL 0x000000011574c0f7 nsFrameLoader::TryRemoteBrowser() + 2919 (nsFrameLoader.cpp:2986) 3 XUL 0x000000011574c4cb nsFrameLoader::ShowRemoteFrame(mozilla::gfx::IntSizeTyped<mozilla::ScreenPixel> const&, nsSubDocumentFrame*) + 187 (nsFrameLoader.cpp:1249) 4 XUL 0x000000011574f1fa nsFrameLoader::Show(int, int, int, int, nsSubDocumentFrame*) + 202 (nsFrameLoader.cpp:1103) 5 XUL 0x00000001184c695b nsSubDocumentFrame::ShowViewer() + 251 (nsSubDocumentFrame.cpp:185) 6 XUL 0x00000001184fba6c AsyncFrameInit::Run() + 124 (nsSubDocumentFrame.cpp:92) 7 XUL 0x000000011546a58c nsContentUtils::RemoveScriptBlocker() + 492 (nsContentUtils.cpp:5546) [...] This assert occurs because mContentParentMap in ContentProcessManager is still empty, because no child ID has been added to the map yet. It seems like we would need to delay processing this code until the process has launched, for example by replacing the call to nsFrameLoader::Show in nsSubDocumentFrame::ShowViewer with an asynchronous call with a callback that will set nsSubDocumentFrame's mDidCreateDoc once the child process has launched, or failed to launch asynchronously. However, this would need to be repeated for every single call to TryRemoteBrowser[1]. This quickly becomes hairy. For example, TryRemoteBrowser is also called by nsFrameLoader::GetLoadContext, which gets called in about 30 different places[2]. If we were to make GetLoadContext asynchronous, we would need to add callbacks for every single caller of GetLoadContext. Then, each callback would have to process any code that follows the call to GetLoadContext in the current, synchronous sequence. :billm, am I missing something here? Right now, the effort to make this work seems extremely high. [1] https://dxr.mozilla.org/mozilla-central/search?q=TryRemoteBrowser [2] https://dxr.mozilla.org/mozilla-central/search?q=GetLoadContext(&redirect=false
Flags: needinfo?(wmccloskey)
You need to call AddContentProcess before TryRemoteBrowser is called. It looks like you've moved that into ContentParent::InitAfterConnect, but I don't think that's right. It should happen immediately after mSubprocess->Launch. It seems like a lot of other stuff in InitAfterConnect needs to happen right away as well. For example, I think you need to call Open right away or else you won't be able to send anything over the IPC channel. For now you'll have to make up a value for aOtherPid until #3 is fixed from comment 8.
Flags: needinfo?(wmccloskey)
Comment 17•2 years ago
|
||
Addressed feedback. This seems to work well and as expected for me locally. I launched a try run[1], but I'm having difficulty reading the results. I seem to be hitting a number of failures, but all the failing tests that I've tried locally have been passing for me... Note that I have not had to pass around pid promises (yet?), as was originally believed in comment 2. My observation was that the various inits etc. (InitInternal(), Init(), etc.) that occur right after the process launch seem to take up more time than actually launching the process. By the time OtherPid() was called for the first time, the process had launched and we had a valid pid to work with. Should we find that we now spend a lot of time waiting for OtherPid() to return (because we're waiting for a valid pid to become available), we could still change this to pass ProcessIdPromises around, instead of an actual pid. [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=1147354b10c59e2d905b7c721ad1202f2606fd11
Attachment #8858562 -
Attachment is obsolete: true
Attachment #8891816 -
Flags: feedback?(wmccloskey)
Comment 18•2 years ago
|
||
This fixes compile failures on Linux, as seen in this try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fc9cc0f88999973aebffa2623c7fd1b4e53c125b&selectedJob=119284082
Comment on attachment 8891816 [details] [diff] [review] Part 1: Launch child processes asynchronously Review of attachment 8891816 [details] [diff] [review]: ----------------------------------------------------------------- This mostly looks pretty good. The lack of a Notify call could certainly explain some timeouts. ::: dom/ipc/ContentParent.cpp @@ -2073,5 @@ > - > - Open(mSubprocess->GetChannel(), procId); > - > -#ifdef MOZ_CODE_COVERAGE > - Unused << SendShareCodeCoverageMutex(CodeCoverageHandler::Get()->GetMutexHandle(procId)); Can you just move this to ContentParent::OnChannelConnected? It seems to run at pretty much the same time as InitAfterConnect. Then you could get rid of InitAfterConnect. ::: dom/ipc/ContentProcessHost.h @@ +20,5 @@ > +// case, is a ContentChild. > +// > +// ContentProcessHosts are allocated and managed by ContentProcessManager. For > +// all intents and purposes it is a singleton, though more than one may be > +// allocated at a time due to its shutdown being asynchronous. The second paragraph of this comment is not correct. ContentProcessHosts are allocated by ContentParent, and they're not singletons. There is one per process. ::: ipc/glue/ProtocolUtils.cpp @@ +612,5 @@ > { > + if (mOtherPidPromise.state == base::ProcessIdPromise::eResolved) { > + return mOtherPidPromise.pid; > + } > + MonitorAutoLock lock(const_cast<Monitor&>(mMonitor)); Perhaps it would be best to declare the monitor as mutable. @@ +629,2 @@ > { > + mOtherPidPromise = {aOtherPid, aState}; I think you need to take mMonitor here and also call NotifyAll on it (although Notify is probably sufficient since only the main thread is likely to be waiting). ::: ipc/glue/ProtocolUtils.h @@ +422,4 @@ > private: > ProtocolId mProtocolId; > UniquePtr<Transport> mTrans; > + base::ProcessIdPromise mOtherPidPromise; Rather than define this Promise data type in process.h, I think it would be better just to have two fields here for the pid and the status.
Attachment #8891816 -
Flags: feedback?(wmccloskey) → feedback+
Comment 20•2 years ago
|
||
Thank you for the feedback! I've sent this to try and we seem to be in decent shape (all the failures seem to be known intermittents), but I would welcome a second look: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dbc4ced238741648c7b98bd06a90e1e635a7bf33
Attachment #8891816 -
Attachment is obsolete: true
Attachment #8894265 -
Flags: review?(wmccloskey)
Comment 21•2 years ago
|
||
Here is a startup profile on the reference hardware where starting the content process blocks the main thread for 555ms before first paint: https://perfht.ml/2wByIog (as seen on https://perfht.ml/2vIKyAO, that's about half the time between getting the profile lock and first paint).
Comment 22•2 years ago
|
||
(In reply to Florian Quèze [:florian] [:flo] from comment #21) > Here is a startup profile on the reference hardware where starting the > content process blocks the main thread for 555ms before first paint: > https://perfht.ml/2wByIog (as seen on https://perfht.ml/2vIKyAO, that's > about half the time between getting the profile lock and first paint). Is it be possible to generate a startup profile with a build from the last try run to compare the results? https://treeherder.mozilla.org/#/jobs?repo=try&revision=dbc4ced238741648c7b98bd06a90e1e635a7bf33
Flags: needinfo?(florian)
Comment on attachment 8894265 [details] [diff] [review] Part 1: Launch child processes asynchronously Review of attachment 8894265 [details] [diff] [review]: ----------------------------------------------------------------- The big changes I'd like to see here are removal of OnChannelConnectedTask and OnChannelErrorTask. It's possible I've misunderstood something, though, so please let me know if there's a reason to keep them. ::: dom/ipc/ContentProcessHost.cpp @@ +28,5 @@ > + > +bool > +ContentProcessHost::Launch(StringVector aExtraOpts) > +{ > + MOZ_ASSERT(mLaunchPhase == LaunchPhase::Unlaunched); Aside from this assertion, mLaunchPhase is never read in any non-trivial way. Can you just replace this with a boolean saying whether we've called Launch? @@ +44,5 @@ > +void > +ContentProcessHost::OnChannelConnected(int32_t aPid) > +{ > + MOZ_ASSERT(!NS_IsMainThread()); > + mContentParent->SetOtherProcessId(aPid); Please comment that this will wake up the main thread if it is waiting on the process to launch. @@ +52,5 @@ > + RefPtr<Runnable> runnable; > + { > + MonitorAutoLock lock(mMonitor); > + runnable = mTaskFactory.NewRunnableMethod( > + &ContentProcessHost::OnChannelConnectedTask); I'd much prefer to remove all this main-thread stuff. It's error-prone, and the MessageChannel implementation already takes care of it (via mOnChannelConnectedTask). To make this work, you would need to call RevokeAll in the destructor to ensure that this stuff doesn't run too late. (Relying on the TaskFactory destructor won't work because you don't hold mMonitor then.) @@ +69,5 @@ > + // thread-safe. > + RefPtr<Runnable> runnable; > + { > + MonitorAutoLock lock(mMonitor); > + runnable = mTaskFactory.NewRunnableMethod( Same here. @@ +83,5 @@ > + mLaunchPhase = LaunchPhase::Complete; > + } > +#if defined(ANDROID) || defined(LINUX) > + // Check nice preference > + int32_t nice = Preferences::GetInt("dom.ipc.content.nice", 0); I think all this could stay where it is in ContentParent. @@ +110,5 @@ > +#endif > +#ifdef MOZ_CODE_COVERAGE > + base::ProcessId procId = > + base::GetProcId(mContentParent->Process()->GetChildProcessHandle()); > + Unused << SendShareCodeCoverageMutex( I don't see how this will work. SendShareCodeCoverageMutex is a method of ContentParent. Can we move that to ContentParent::OnChannelConnected? @@ +121,5 @@ > +{ > + if (mLaunchPhase == LaunchPhase::Waiting) { > + mLaunchPhase = LaunchPhase::Complete; > + } > + mContentParent->OnChannelError(); I think this will cause OnChannelError to run twice on ContentParent since MessageChannel already calls it: http://searchfox.org/mozilla-central/rev/b52285fffc13f36eca6b47de735d4e4403b3859e/ipc/glue/MessageChannel.cpp#2555 ::: ipc/glue/ProtocolUtils.cpp @@ +609,5 @@ > > base::ProcessId > IToplevelProtocol::OtherPid() const > { > + if (mOtherPidState == ProcessIdState::eReady) { It's a race to read this without owning the lock. It's possible that mOtherPid is undefined even if mOtherPidState is eReady. You need a memory fence to ensure that they're consistent, and taking the lock will do that. You should be able to just take out this bit and rely on |return mOtherPid;| below. ::: ipc/glue/ProtocolUtils.h @@ +425,5 @@ > > virtual already_AddRefed<nsIEventTarget> > GetActorEventTargetInternal(IProtocol* aActor); > > + mutable mozilla::Monitor mMonitor; Please comment that this monitor protects mOtherPid and mOtherPidState. All other fields should be accessed only on the worker thread.
Attachment #8894265 -
Flags: review?(wmccloskey) → review-
Comment 24•2 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a0766829ffaa6a96a0cf1cb68281bdd6d6730ccc
Comment 25•2 years ago
|
||
Addressed feedback with the exception below. New try run in comment 24. (In reply to Bill McCloskey (:billm) from comment #23) > Comment on attachment 8894265 [details] [diff] [review] > Part 1: Launch child processes asynchronously > > Review of attachment 8894265 [details] [diff] [review]: > ----------------------------------------------------------------- > ::: dom/ipc/ContentProcessHost.cpp > @@ +121,5 @@ > > +{ > > + if (mLaunchPhase == LaunchPhase::Waiting) { > > + mLaunchPhase = LaunchPhase::Complete; > > + } > > + mContentParent->OnChannelError(); > > I think this will cause OnChannelError to run twice on ContentParent since > MessageChannel already calls it: > http://searchfox.org/mozilla-central/rev/ > b52285fffc13f36eca6b47de735d4e4403b3859e/ipc/glue/MessageChannel.cpp#2555 So, it looks like ContentParent::OnChannelError and ContentParent::OnChannelConnected were never called. This wasn't necessarily obvious because SetOtherProcessId() was also called from Open(). What used to be called, however, was GeckoChildProcessHost::OnChannelConnected and now ContentProcessHost::OnChannelConnected. This patch makes sure that we call both ContentParent::OnChannelError and ContentParent::OnChannelConnected when required.
Attachment #8891817 -
Attachment is obsolete: true
Attachment #8894265 -
Attachment is obsolete: true
Attachment #8895507 -
Flags: review?(wmccloskey)
Depends on: 1388920
Not calling ContentParent::OnChannelConnected is a bug. I filed bug 1388920 for that. I'd like to fix that and then we can do this bug. The patch looks very close.
Comment 27•2 years ago
|
||
I(In reply to Stephen A Pohl [:spohl] from comment #22) > (In reply to Florian Quèze [:florian] [:flo] from comment #21) > > Here is a startup profile on the reference hardware where starting the > > content process blocks the main thread for 555ms before first paint: > > https://perfht.ml/2wByIog (as seen on https://perfht.ml/2vIKyAO, that's > > about half the time between getting the profile lock and first paint). > > Is it be possible to generate a startup profile with a build from the last > try run to compare the results? Here is a startup profile with the try build from comment 24: https://perfht.ml/2uv3McI ContentParent::LaunchSubprocess still blocks the main thread for more than 100ms. Unfortunately, these profiles aren't very useful as we don't have symbols in them for try builds.
Flags: needinfo?(florian)
Comment 28•2 years ago
|
||
(In reply to Florian Quèze [:florian] [:flo] from comment #27) > I(In reply to Stephen A Pohl [:spohl] from comment #22) > > (In reply to Florian Quèze [:florian] [:flo] from comment #21) > > > Here is a startup profile on the reference hardware where starting the > > > content process blocks the main thread for 555ms before first paint: > > > https://perfht.ml/2wByIog (as seen on https://perfht.ml/2vIKyAO, that's > > > about half the time between getting the profile lock and first paint). > > > > Is it be possible to generate a startup profile with a build from the last > > try run to compare the results? > > Here is a startup profile with the try build from comment 24: > https://perfht.ml/2uv3McI > ContentParent::LaunchSubprocess still blocks the main thread for more than > 100ms. > > Unfortunately, these profiles aren't very useful as we don't have symbols in > them for try builds. This is good to know, thank you! Once the patch here lands we should take another look at this with symbols. There may be low hanging fruit, or we may determine that we will have to pass around some kind of pid promise after all. Either way, we will benefit from landing these changes in staggered form to allow for individual backouts if they became necessary.
Comment on attachment 8895507 [details] [diff] [review] Patch Review of attachment 8895507 [details] [diff] [review]: ----------------------------------------------------------------- Very close. Let's just do one more round. ::: dom/ipc/ContentParent.cpp @@ +1511,1 @@ > SetOtherProcessId(pid); This line should move back to ContentProcessHost::OnChannelConnected. And we should have an NS_IsMainThread() assertion in ContentParent::OnChannelConnected. ::: dom/ipc/ContentProcessHost.cpp @@ +43,5 @@ > +{ > + MOZ_ASSERT(!NS_IsMainThread()); > + > + mHasLaunched = true; > + mContentParent->OnChannelConnected(aPid); Please remove this line. @@ +53,5 @@ > +{ > + MOZ_ASSERT(!NS_IsMainThread()); > + > + mHasLaunched = true; > + mContentParent->OnChannelError(); Same for this line. (If there's a separate problem with ContentParent::OnChannelError not being called, we should address that separately as well.) ::: dom/ipc/ContentProcessHost.h @@ +41,5 @@ > + DISALLOW_COPY_AND_ASSIGN(ContentProcessHost); > + > + bool mHasLaunched; > + > + ContentParent* mContentParent; // weak Please remove this field as it will no longer be needed.
Attachment #8895507 -
Flags: review?(wmccloskey)
Comment 30•2 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=96c1fbf68d5c82e56d4dcf10d5b8f3c2b7e1985b
Comment 31•2 years ago
|
||
Thank you! Addressed feedback, with one question below. New try run in comment 30. (In reply to Bill McCloskey (:billm) from comment #29) > Comment on attachment 8895507 [details] [diff] [review] > Patch > > Review of attachment 8895507 [details] [diff] [review]: > ----------------------------------------------------------------- > ::: dom/ipc/ContentProcessHost.h > @@ +41,5 @@ > > + DISALLOW_COPY_AND_ASSIGN(ContentProcessHost); > > + > > + bool mHasLaunched; > > + > > + ContentParent* mContentParent; // weak > > Please remove this field as it will no longer be needed. I believe this is still needed to call SetOtherProcessId() in ContentProcessHost::OnChannelConnected. Did I miss something obvious?
Attachment #8895507 -
Attachment is obsolete: true
Attachment #8897284 -
Flags: review?(wmccloskey)
Comment on attachment 8897284 [details] [diff] [review] Patch Review of attachment 8897284 [details] [diff] [review]: ----------------------------------------------------------------- Yes, you're right. Sorry I missed that.
Attachment #8897284 -
Flags: review?(wmccloskey) → review+
Comment 33•2 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/21619c674c8688a74f93eb752667c6910b5f6563 Bug 1348361: Make launching of subprocesses async. r=billm
Comment 34•2 years ago
|
||
Thank you! Updated the patch for current trunk. Carrying over r+.
Attachment #8897284 -
Attachment is obsolete: true
Attachment #8897451 -
Flags: review+
![]() |
||
Comment 35•2 years ago
|
||
Backed out for failing mochitest test_group_zoom.html on Android and robocop tests: Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=21619c674c8688a74f93eb752667c6910b5f6563&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable Failure log mochitest (many failures): https://treeherder.mozilla.org/logviewer.html#?job_id=123287968&repo=mozilla-inbound Push which ran more tests (e.g. robocop): https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=4af3a98934c427226e095246a098008bc3474ee7&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Flags: needinfo?(spohl.mozilla.bugs)
Comment 36•2 years ago
|
||
Backout by archaeopteryx@coole-files.de: https://hg.mozilla.org/integration/mozilla-inbound/rev/c8135aa1f750 Backed out changeset 21619c674c86 for failing mochitest test_group_zoom.html on Android and robocop tests. r=backout
Comment 37•2 years ago
|
||
This also caused a massive ts_paint regression on all desktop platforms, which went away when it was backed out.
Comment 38•2 years ago
|
||
(In reply to Kris Maglione [:kmag] from comment #37) > This also caused a massive ts_paint regression on all desktop platforms, > which went away when it was backed out. What's the best way to access these results? Are there gains in other performance tests to compensate, or does this not look salvageable?
Flags: needinfo?(spohl.mozilla.bugs) → needinfo?(kmaglione+bmo)
Comment 39•2 years ago
|
||
I haven't looked at other numbers. I only noticed the ts_paint regression because I was watching those numbers for other reasons, but you can see the regression pretty clearly in these graphs: https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-inbound,f305a43917d199e7114001ba9024835bd68530a9,1,1%5D&series=%5Bmozilla-inbound,e7caea89663f1e94f198311432ecbf89e9623a75,1,1%5D&series=%5Bmozilla-inbound,e394aab72917d169024558cbab33eb4e7e9504e1,1,1%5D
Flags: needinfo?(kmaglione+bmo)
Comment 40•2 years ago
|
||
It looks like there was a pretty clear sessionrestore regression too, though: https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-inbound,66dbde001b68b274cd7b350641f995cc06c6e1a9,1,1%5D&series=%5Bmozilla-inbound,60bcc1b34705f14eaa3e6d8108a323cefc14cb61,1,1%5D&series=%5Bmozilla-inbound,fbd9aab82203e641b6a4c717bd7509504b1e1ad9,1,1%5D&series=%5Bmozilla-inbound,f662fda60fcfef71f97f3541aff009d2b5bb837e,1,1%5D&series=%5Bmozilla-inbound,639a8626a20b70264e8800e4d23f770a898408cc,1,1%5D&series=%5Bmozilla-inbound,ff7fe8783ea3a7d6feb2000be0142b1b11351206,1,1%5D
Comment 41•2 years ago
|
||
Thanks! From what I've been able to gather so far, the perf regressions appear to be Linux-only. The question might boil down to whether or not the performance gains from launching child processes asynchronously (especially on Windows) outweigh the perf regressions on Linux.
Comment 42•2 years ago
|
||
Nope, they're pretty clear across all platforms. They just look a bit better on Windows because we have fewer Windows runs in that range, and most of the ones we do have are pgo.
This definitely needs more investigation. I can't think of any reason why this patch should make anything slower. OtherPid() got a little slower, but it shouldn't be on any hot paths. Either something weird is happening or we're measuring wrong.
Comment 44•2 years ago
|
||
I must not be reading the charts right, as I have only been able to determine a regression on Linux. What's the best way to look into these performance regressions? I've also been looking into the Android failures, but I'm currently blocked by bug 1390606.
The regression seems to be biggest on Linux, so it's probably best to start there. This page explains how to run Talos locally: https://wiki.mozilla.org/Buildbot/Talos/Running It's probably best to see if you can reproduce a difference locally. If you can, you can play around with turning on or off difference parts of the patch and see how performance is affected. For example, you could add locking to OtherPid() without making startup async and see if the regression remains.
Comment 46•2 years ago
|
||
Linux ts_paint numbers are weird, for reasons I don't entirely understand. We tend to get the firstPaint timing event there after stuff like delayedStartup has finished a lot of the time, so the numbers fluctuate a lot more than they normally would elsewhere. But that might still be the easiest place to start. On Windows, the regressions are more obvious in the sessionrestore tests (comment 40) and the ts_paint_webext test (which is where I first noticed it): https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=mozilla-inbound,1500971,1,1&series=mozilla-inbound,1500748,1,1&series=mozilla-inbound,1314200,1,1&series=mozilla-inbound,1492653,1,1&highlightedRevisions=4af3a98934c4&highlightedRevisions=643363573c85 but you can still see them in the ordinary ts_paint values. It might help to retrigger or backfill some more T-e10s-o jobs in that range, since there are so few of them for Windows, and the numbers are noisy. It's also probably worth noting that I landed a 30-50ms ts_paint improvement shortly before your patch landed, so comparing to the numbers from a few hours earlier won't be very reliable.
Flags: needinfo?(mconley)
![]() |
||
Comment 47•2 years ago
|
||
We had hoped this would make 57. Remaining work: 1) android test failure issues 2) talos performance regressions, which may be caused by talos vs. the changes here. The first 57 merge is in a couple days, changes like this need bake time - this isn't going to make 57. Pushing back onto the back burner and unassigning. Stephen is moving on to more pressing and achievable tasks.
Assignee: spohl.mozilla.bugs → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(mconley)
Priority: P1 → P2
Comment 48•2 years ago
|
||
Changing it from qf:p1 to qf:p2 for post 57 work based on #comment47.
Whiteboard: [qf:p1][bhr] → [qf:p2][bhr]
Whiteboard: [qf:p2][bhr] → [perf:p3][bhr]
Whiteboard: [perf:p3][bhr] → [qf:p3][bhr]
Comment 49•2 years ago
|
||
I'm hoping this could be prioritized higher than p3 given how big of an impact this bug has. See this startup profile https://perfht.ml/2CRu0KB captured on a Nightly from 2017-11-13 on the quantum reference hardware where creating content processes blocks the main thread for several hundread ms 3 times (with the longest time of the three lasting 840ms).
Whiteboard: [qf:p3][bhr] → [qf][bhr]
Comment 50•2 years ago
|
||
Working through this bug in QF triage. Are we resuming trying to land this for at least 63?
Flags: needinfo?(spohl.mozilla.bugs)
Updated•2 years ago
|
Whiteboard: [qf][bhr] → [qf:f63][qf:p1][bhr]
Comment 51•2 years ago
|
||
I will not be able to return to this bug anytime soon and can't give a recommendation here.
Flags: needinfo?(spohl.mozilla.bugs)
![]() |
||
Updated•2 years ago
|
Summary: sync IPC when launching a child process → Remove sync IPC when launching a child process
Comment 52•2 years ago
|
||
Creating the first content processes seems to block startup for about 14s in this profile: https://perfht.ml/2EAudCh This is captured on my mother's computer, a 5 year old Dell laptop with an i3 CPU, and the most notable thing about it is probably Kaspersky scanning for viruses in a way that significantly delays process creations (likely doing main thread I/O to scan). Running the same thing on a similar machine without antivirus software starts the process in < 100ms. Given that we have lots of users with antivirus installed and who may be badly affected by this, is there any chance of finding someone to finish this for 61?
![]() |
||
Comment 53•2 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #52) > Creating the first content processes seems to block startup for about 14s in > this profile: https://perfht.ml/2EAudCh > > This is captured on my mother's computer, a 5 year old Dell laptop with an > i3 CPU, and the most notable thing about it is probably Kaspersky scanning > for viruses in a way that significantly delays process creations (likely > doing main thread I/O to scan). > > Running the same thing on a similar machine without antivirus software > starts the process in < 100ms. > > Given that we have lots of users with antivirus installed and who may be > badly affected by this, is there any chance of finding someone to finish > this for 61? Maybe. This is a tough engineering problem that needs additional effort. I'll find someone to pick this up and figure out what's left. I can't commit to a release quite yet.
Comment 54•2 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #52) > Creating the first content processes seems to block startup for about 14s in > this profile: https://perfht.ml/2EAudCh Recording the "Gecko_IOThread" thread might show in more detail where those seconds are going — that's the thread that does the actual process creation. (At least on Linux and Mac — on Windows the sandbox might make things more complicated.) I don't know how much the change proposed in this bug would help in cases like that — the browser UI could remain responsive instead of freezing, but if it takes 14s to start a content process then it'll take ≥14s until content is rendered.
Worth noting that we need this for GeckoView too. The synchronous process start creates problems for us due to how we launch child processes on Android (via Services).
Comment 56•2 years ago
|
||
(In reply to Jed Davis [:jld] (⏰UTC-7) from comment #54) > I don't know how much the change proposed in this bug would help in cases > like that — the browser UI could remain responsive instead of freezing, but > if it takes 14s to start a content process then it'll take ≥14s until > content is rendered. The browser UI being displayed and responsive would be a huge improvement over the current situation (no browser window at all). Then I think we should create a second content process asynchronously as soon as we get a chance, so that it's ready for when the user opens a second tab.
Comment 57•2 years ago
|
||
This is also a major issue after startup with e10s multi. With the preallocated process manager, we can handle long process startup delay fairly cleanly as long as it doesn't block the parent process, since the new content process should generally already be available for us when we need it. If we need to block the parent process for multiple seconds each time we spawn a new content process, though, it becomes a major issue. We also have the issue at startup of sometimes needing to spawn multiple content processes. Even in the normal case, we need to spawn both the first web content process and an extension content process. When restoring multiple tabs, though, we'll often need to spawn multiple web content processes, and if we need sync IPC for each one, we have to eat the startup delay multiple times in series rather than in parallel.
Comment 58•2 years ago
|
||
Updated for current trunk. Carrying over r+.
Attachment #8897451 -
Attachment is obsolete: true
Attachment #8951518 -
Flags: review+
Assignee | ||
Updated•2 years ago
|
Assignee: nobody → agaynor
Priority: P2 → P1
Assignee | ||
Comment 59•2 years ago
|
||
Did some measuring, the lock in OtherPid() is not the problem: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=f266c062de5b059a8828cdff071510a2eda12325&framework=1&showOnlyImportant=1&selectedTimeRange=172800 Will try factoring out various other parts of the patch and see what I can find.
Assignee | ||
Comment 60•2 years ago
|
||
So I've got some bad news, and I've got some bad news. The bad news is, I still don't have a great understanding of why this slows down session restore by nearly 20%. The bad news is, in trying to diagnose that, I've realized that InitInternal, which is called right after we spawn the process (https://searchfox.org/mozilla-central/source/dom/ipc/ContentParent.cpp#2062-2075) calls OtherPid, which needs to block until the process is spawned and the channel connected: https://searchfox.org/mozilla-central/source/dom/ipc/ContentParent.cpp#2349 This means this patch doesn't yet address the underlying issues we were originally trying to solve. Having typed this out, I'm pretty sure I understand why this patch slows things down: The previous code only blocks on the new process having a HANDLE ready, whereas the new code blocks until a channel is connected to the child process. If I'm right, this means I should try to find a way to fire the SetOtherProcessId as soon as we have the PID, instead of waiting for the channel to be connected.
Comment 61•2 years ago
|
||
So… with the benefit of hindsight, I had some information that would've helped here: a few months ago I was experimenting on Linux with having processes launched indirectly via a wrapper executable, for reasons, and I found that the wrong pid (either the wrapper process or 0, depending on how I altered GeckoChildProcessHost to try to handle the situation) had already escaped into Endpoint objects by the time the IPC ClientHello came in. But people seemed confident about the earlier patch on this bug, so I thought maybe I just didn't understand the IPC core well enough. In any case, if the main thread is just going to immediately block on getting the pid via OtherPid(), is this really helping? Could we instead reconsider how Endpoints identify processes, so they don't need to depend on OS-level pids if they don't need them and/or don't force the evaluation of the pid until it's needed?
Assignee | ||
Comment 62•2 years ago
|
||
I'm currently measuring a patch that has this use "process handle available" instead of "channel initialized". Assuming that's successful, I think the next thing to play with will be deferring the creation of code in InitInternal() that needs OtherPid(). I don't know how hard that will be.
Assignee | ||
Comment 63•2 years ago
|
||
Ok, definitely appears to have resolved the performance issues: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=f948da46f5d9b6597cd4079f2ee06c6f1d9e3f48&framework=1&filter=linux64&selectedTimeRange=172800 Looks like there are still android bugs, so those are up next, then I'll put the patches up for review: https://treeherder.mozilla.org/#/jobs?repo=try&revision=44c0f3598c8f402c40fec2a72bf7f58bc3e00454&group_state=expanded
Assignee | ||
Comment 64•2 years ago
|
||
Green tests: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0932e9bdf19ab483cca4b404e3cd18095204c269 !
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 68•2 years ago
|
||
mozreview-review |
Comment on attachment 8958906 [details] Bug 1348361 - Part 1 - Added locking to IToplevelProtocol's management of the peer's pid; https://reviewboard.mozilla.org/r/221704/#review233598
Attachment #8958906 -
Flags: review?(jld) → review+
Comment 69•2 years ago
|
||
mozreview-review |
Comment on attachment 8958907 [details] Bug 1348361 - Part 2 - Introduce a GeckoChildProcessHost subclass, ContentProcessHost, that will be used for async process launches; https://reviewboard.mozilla.org/r/221706/#review233602 ::: commit-message-ae130:1 (Diff revision 1) > +Bug 1348361 - Part 2 - Introduce a GeckoChildProcessHost that will be used for async process launches; r?jld Nit: “…a GeckoChildProcessHost subclass that will…”?
Attachment #8958907 -
Flags: review?(jld) → review+
Comment 70•2 years ago
|
||
mozreview-review |
Comment on attachment 8958908 [details] Bug 1348361 - Part 3 - Do not block the thread when spawning a gecko child process; https://reviewboard.mozilla.org/r/221708/#review233612 ::: dom/ipc/ContentParent.cpp:1519 (Diff revision 1) > > void > ContentParent::OnChannelError() > { > - RefPtr<ContentParent> content(this); > PContentParent::OnChannelError(); Normally an unused RefPtr like this is because the rest of the method might (indirectly) drop the last reference and cause a use-after-free. Are we sure that can't happen here? (See also: https://searchfox.org/mozilla-central/search?q=deathgrip&case=false®exp=false&path= ) ::: dom/ipc/ContentParent.cpp:2082 (Diff revision 1) > > // Set a reply timeout for CPOWs. > SetReplyTimeoutMs(Preferences::GetInt("dom.ipc.cpow.timeout", 0)); > > + // TODO: If OtherPid() is not called between mSubprocess->Launch() and this, > + // then we're not really measuring how long it took to spawn the process. This should have a followup bug, if there isn't one yet. The people who use this telemetry probably have ideas about whether they want to leave it here, move it to a later point like when the pid is received, or have separate histograms for both of those, and/or measure how long the I/O thread spends on launching, etc. ::: dom/ipc/ContentProcessHost.h:31 (Diff revision 1) > ContentProcessHost(ContentParent* aContentParent, > bool aIsFileContent = false); > ~ContentProcessHost(); > > + // Launch the subprocess asynchronously. On failure, false is returned. > + // Otherwise, true is returned, and the OnLaunchComplete listener callback There doesn't seem to be anything in the code called OnLaunchComplete. The GPU process patches added a comment like this (but no actual identifier with that name; I assume the callbacks changed during development and nobody noticed the comment was stale), and then the earlier patch in this bug seems to have copied it when adapting GPUProcessHost into ContentProcessHost.
Attachment #8958908 -
Flags: review?(jld) → review+
Assignee | ||
Comment 71•2 years ago
|
||
mozreview-review-reply |
Comment on attachment 8958908 [details] Bug 1348361 - Part 3 - Do not block the thread when spawning a gecko child process; https://reviewboard.mozilla.org/r/221708/#review233612 > Normally an unused RefPtr like this is because the rest of the method might (indirectly) drop the last reference and cause a use-after-free. Are we sure that can't happen here? > > (See also: https://searchfox.org/mozilla-central/search?q=deathgrip&case=false®exp=false&path= ) This was inherted from :spohl's original patch, I'll check in with him why this change was made. > This should have a followup bug, if there isn't one yet. The people who use this telemetry probably have ideas about whether they want to leave it here, move it to a later point like when the pid is received, or have separate histograms for both of those, and/or measure how long the I/O thread spends on launching, etc. Will file one, covering deferring the InitInternal work and noting that the telemetry needs to be reviewed. > There doesn't seem to be anything in the code called OnLaunchComplete. The GPU process patches added a comment like this (but no actual identifier with that name; I assume the callbacks changed during development and nobody noticed the comment was stale), and then the earlier patch in this bug seems to have copied it when adapting GPUProcessHost into ContentProcessHost. The GPU comment is unrelated; will fix my comment and also file a bug updating the GPU comment: https://bugzilla.mozilla.org/show_bug.cgi?id=1445958
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 75•2 years ago
|
||
mozreview-review-reply |
Comment on attachment 8958908 [details] Bug 1348361 - Part 3 - Do not block the thread when spawning a gecko child process; https://reviewboard.mozilla.org/r/221708/#review233612 > Will file one, covering deferring the InitInternal work and noting that the telemetry needs to be reviewed. https://bugzilla.mozilla.org/show_bug.cgi?id=1446161
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 79•2 years ago
|
||
Please note the c-n does not include :spohl's original patch.
Keywords: checkin-needed
Updated•2 years ago
|
Attachment #8951518 -
Attachment is obsolete: true
Comment 80•2 years ago
|
||
Pushed by dluca@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dd3dc46a8d7d Part 1 - Added locking to IToplevelProtocol's management of the peer's pid; r=jld https://hg.mozilla.org/integration/autoland/rev/d29a6c667eba Part 2 - Introduce a GeckoChildProcessHost subclass, ContentProcessHost, that will be used for async process launches; r=jld https://hg.mozilla.org/integration/autoland/rev/0520f0545678 Part 3 - Do not block the thread when spawning a gecko child process; r=jld
Keywords: checkin-needed
Comment 81•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/dd3dc46a8d7d https://hg.mozilla.org/mozilla-central/rev/d29a6c667eba https://hg.mozilla.org/mozilla-central/rev/0520f0545678
Status: NEW → RESOLVED
Closed: 2 years ago
status-firefox61:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Comment 82•2 years ago
|
||
Testing the first Nightly that includes the patches from this bug on the reference hardware, I got this startup profile: https://perfht.ml/2pn9CYf It shows the main thread is blocked on mozilla::ipc::ProcessLink::Open when creating content processes. Was this expected to be fixed as part of this bug, or is it something for a follow-up?
Assignee | ||
Comment 83•2 years ago
|
||
Florian: Yes, this sets up the infrastructure, bug 1446161 is needed to really see the wins.
You need to log in
before you can comment on or make changes to this bug.
Description
•