Long wait during content process startup before we start loading libxul
Categories
(GeckoView :: General, defect, P2)
Tracking
(Not tracked)
People
(Reporter: mstange, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [gv-perspective-work])
Attachments
(3 obsolete files)
Profile: https://share.firefox.dev/3NoB0Rj
When a content process is started during app link startup, there is a delay before libxul is loaded on the Gecko thread of the content process. In the profile linked above, the tab28 process has a 300ms gap between the Android UI thread work in that content process and the start of libxul loading.
Comment 1•1 year ago
This seems to be because we don't call mozilla::dom::ContentParent::CreateBrowser until this point. I believe CreateBrowser is what actually starts the process via ContentParent::GetNewOrUsedLaunchingBrowserProcess which eventually calls start over on the content process side.
Comment 2•1 year ago
The severity field is not set for this bug.
:owlish, could you have a look please?
For more information, please visit BugBot documentation.
Comment 3 (Reporter)•1 year ago
We have the PreallocatedProcessManager (which is currently disabled on Android, bug 1937836) to kick this initialization off before we get to ContentParent::CreateBrowser. But it seems like PreallocatedProcessManager doesn't actually have an API to start a preallocation before we ask for the first process. It also currently only kicks off preallocation during idle, and we're never idle during startup.
Once we fix this bug and the Gecko stuff in the content process starts running earlier, it should also avoid the synchronous wait on the parent process that I filed as bug 1958065.
And we should probably preallocate two processes rather than just one, one for the content process and one for the WebExtensions process, to fix bug 1958327 at the same time.
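The idle-only limitation described in this comment can be illustrated with a toy model. This is a minimal sketch, not Gecko code: `ToyPreallocator`, `OnIdle`, `PreallocateNow`, `Take`, and `FirstProcessDuringBusyStartup` are hypothetical names invented for the example, and the real PreallocatedProcessManager interface may look quite different.

```cpp
#include <string>

// Toy model only. The class and method names are hypothetical and do not
// mirror the real PreallocatedProcessManager interface.
class ToyPreallocator {
 public:
  // Idle callback: the only trigger for preallocation in the idle-only model.
  void OnIdle() {
    if (!mHasSpare) Launch("idle");
  }

  // Hypothetical eager entry point for starting a spare process before the
  // first request arrives.
  void PreallocateNow() {
    if (!mHasSpare) Launch("eager");
  }

  // Hands out a process and reports how that process was launched.
  std::string Take() {
    if (!mHasSpare) Launch("on-demand");
    mHasSpare = false;
    return mSpareOrigin;
  }

 private:
  void Launch(const std::string& origin) {
    mSpareOrigin = origin;
    mHasSpare = true;
  }

  bool mHasSpare = false;
  std::string mSpareOrigin;
};

// Simulates a startup that is never idle before the first process request.
std::string FirstProcessDuringBusyStartup(bool eager) {
  ToyPreallocator preallocator;
  if (eager) preallocator.PreallocateNow();
  // Startup work runs here without an idle period, so OnIdle() never fires.
  return preallocator.Take();
}
```

In this model, a busy startup without the eager call always ends up launching on demand, which is the "too late" case the bug is about.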
Comment 4 (Reporter)•10 months ago
Otherwise it only gets lazily initialized once we create the first browser element,
which is too late.
Comment 5 (Reporter)•10 months ago
This patch would be the straightforward way to preallocate a content process
early during startup. However, it has some unfortunate effects, so we should
not land it as-is.
The problem is that PreallocatedProcessManagerImpl::AllocateNow() calls
WaitForLaunchAsync, which installs a promise resolution handler on the
WhenProcessHandleReady() promise, and this handler calls LaunchSubprocessResolve.
And LaunchSubprocessResolve contains at least two spots which block
the main thread when called too early. To be precise:
- On Android, calling LaunchSubprocessResolve before the GPU process is connected will block in ContentParent::InitInternal -> gfxPlatform::BuildContentDeviceData -> GPUProcessManager::EnsureGPUReady: https://share.firefox.dev/4oTpyi5
- On macOS, calling LaunchSubprocessResolve before the InitFontList thread is done will block in ContentParent::InitInternal -> gfxPlatformMac::ReadSystemFontList -> ... -> gfxPlatformFontList::PlatformFontList: https://share.firefox.dev/45SF7Oe
As a result, this patch would make startup slower rather than faster.
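Why an early handler makes startup slower can be sketched with a one-line timing model. The function name and the millisecond values used with it are assumptions for illustration, not measurements from the profiles: the main thread blocks for however long the dependency (GPU process connection on Android, font list thread on macOS) still needs when the handler runs.

```cpp
#include <algorithm>

// Toy timing model; the name and any values passed in are illustrative only.
// dependencyReadyAtMs: when the GPU process / font list becomes ready.
// handlerRunsAtMs: when the launch-resolve handler runs on the main thread.
// Returns how long the main thread blocks inside the handler.
int MainThreadBlockedMs(int dependencyReadyAtMs, int handlerRunsAtMs) {
  return std::max(0, dependencyReadyAtMs - handlerRunsAtMs);
}
```

With made-up numbers, a handler that runs at 50ms against a dependency that is ready at 200ms blocks for 150ms, while the same handler at 250ms does not block at all.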
Comment 6 (Reporter)•10 months ago
Crucially, this avoids calling ContentParent::WaitForLaunchAsync, so
it avoids installing the promise resolution handler that calls
LaunchSubprocessResolve too early.
LaunchSubprocessResolve will still be called once the frameloader calls
ContentParent::CreateBrowser, but this happens late enough so that the
trouble spots are avoided - at least given the current timing on the machines
that I was testing on. Specifically, when ContentParent::CreateBrowser is
called, on macOS the InitFontList thread is more likely to have finished, and
on Android the GPU process is more likely to have advanced far enough that
the sync IPC call for GPUProcessManager::EnsureGPUReady is very short.
macOS: https://share.firefox.dev/3HTjoxy
Android: https://share.firefox.dev/3UIbj1E
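The ordering difference between the two approaches can be sketched as follows. This is a toy model under stated assumptions: `ToyContentParent`, `LaunchOnly`, `Resolve`, and `StartupOrder` are hypothetical names, and "gfx-ready" is just a marker for the point where the GPU process or font list is ready. It only shows event order and says nothing about real timings.

```cpp
#include <string>
#include <vector>

using Log = std::vector<std::string>;

// Toy stand-in for a content parent; method names are illustrative and are
// not Gecko's real API.
class ToyContentParent {
 public:
  explicit ToyContentParent(Log& log) : mLog(log) {}

  // Starts the child process without scheduling the resolve step.
  void LaunchOnly() { mLog.push_back("launch"); }

  // Stand-in for the resolve step that may block if it runs too early.
  void Resolve() {
    if (!mResolved) {
      mLog.push_back("resolve");
      mResolved = true;
    }
  }

  // Stand-in for the first browser creation: resolves if not done yet.
  void CreateBrowser() {
    Resolve();
    mLog.push_back("browser");
  }

 private:
  Log& mLog;
  bool mResolved = false;
};

// Records the event order for an early prelaunch, followed by the
// dependency becoming ready, followed by the first browser creation.
Log StartupOrder(bool resolveEarly) {
  Log log;
  ToyContentParent parent(log);
  parent.LaunchOnly();
  if (resolveEarly) parent.Resolve();
  log.push_back("gfx-ready");
  parent.CreateBrowser();
  return log;
}
```

With `resolveEarly` set, "resolve" lands before "gfx-ready", which corresponds to the blocking case from comment 5. Without it, the resolve step is deferred to browser creation and lands after the marker.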
Comment 7 (Reporter)•10 months ago
I've attached patches which reduce the time during Fenix startup between the creation of the Gecko thread in the first content process and the call to GeckoThread.initGeckoEnvironment. These patches depend on bug 1937836 - they only affect behavior if dom.ipc.processPrelaunch.enabled is true, because they use the PreallocatedProcessManager.
However, in the process of writing an explanation of why these patches are needed, I realized that they aren't all that useful at the moment.
My goals were:
1. Reduce the time that the parent blocks while it synchronously waits for the child process handle.
2. Make the content process available to accept data from the network earlier, by front-loading initialization work.
And yes, the patches achieve these goals, but they don't achieve a reduction of the applink startup time.
About goal 1: In bug 1958065 I was reporting "varying" times of blocking in the parent, sometimes around 30ms. I can't really reproduce those times anymore; I wonder if it was just unfortunate thread scheduling. It turns out that GeckoChildProcessHost::WaitForProcessHandle is really quite short in most cases, below 10ms. It only takes a long time if the native Android process hasn't been created yet. But for the first content process, we preallocate the native process on the Java side, so getting the process handle is quick for this first content process. (The WebExtension process is another matter, bug 1958327.)
About goal 2: Preallocating a process from the Gecko side doesn't actually front-load all that much work. Without LaunchSubprocessResolve, all it gets us is dlopen("libxul.so") and InitXPCOM, which altogether takes 50ms or less. And with LaunchSubprocessResolve, it also front-loads about 70ms of ContentChild initialization work. But there's not much point to front-loading content process work because the content process is not the bottleneck during applink startup. And what's worse, moving the CPU work earlier can take up CPU resources that the parent process needs for more critical work, such as drawing the URL bar or kicking off the network request.
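As a rough tally of the figures in this paragraph, the amount of content process work that preallocation moves earlier can be written down directly. The constant and function names are made up for the example; the two numbers are the approximate values quoted above.

```cpp
// Approximate figures quoted in the paragraph above; names are illustrative.
constexpr int kLoadLibxulAndInitXpcomMs = 50;  // dlopen("libxul.so") + InitXPCOM, "50ms or less"
constexpr int kContentChildInitMs = 70;        // extra work behind LaunchSubprocessResolve

// Approximate upper bound on content process work moved earlier.
constexpr int FrontLoadedMs(bool withResolve) {
  return kLoadLibxulAndInitXpcomMs + (withResolve ? kContentChildInitMs : 0);
}
```

That gives roughly 50ms without LaunchSubprocessResolve and roughly 120ms with it, none of which shortens applink startup if the content process is not on the critical path.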
For reference, here are a bunch of profiles I grabbed with various combinations of these patches:
Android profiles:
before these patches (but with dom.ipc.processPrelaunch.enabled set to true)
with early PreallocProcessManager initialization
with AllocateNow, blocking in ContentParent::LaunchSubprocessResolve
with immediate MakePreallocProcess() call, no sync blocking in ContentParent::LaunchSubprocessResolve
with immediate MakePreallocProcess() call and early gfxPlatform::Init
with immediate MakePreallocProcess() call and early gfxPlatform::Init and Java-side preallocation
macOS profiles:
before these patches
with early PreallocProcessManager initialization
with AllocateNow, blocking in ContentParent::LaunchSubprocessResolve
with immediate MakePreallocProcess() call, no sync blocking in ContentParent::LaunchSubprocessResolve
I will remove this bug from the applink-startup list. Instead, we should focus on launching the GPU process earlier (bug 1929365), and on launching a native Android process for the WebExtension process earlier (bug 1958327).
Comment 8 (Reporter)•10 months ago
CC'ing :nika and :jesup because they might find comment 5 interesting (the fact that the WaitForLaunchAsync call in PreallocatedProcessManagerImpl::AllocateNow() can slow down startup) - no action needed.
Comment 9•10 months ago
(In reply to Markus Stange [:mstange] from comment #7)
> ...
> My goals were:
> ...
> 2. Make the content process available to accept data from the network earlier, by front-loading initialization work.
> ...
> About goal 2: Preallocating a process from the Gecko side doesn't actually front-load all that much work. Without LaunchSubprocessResolve, all it gets us is dlopen("libxul.so") and InitXPCOM, which altogether takes 50ms or less. And with LaunchSubprocessResolve, it also front-loads about 70ms of ContentChild initialization work. But there's not much point to front-loading content process work because the content process is not the bottleneck during applink startup. And what's worse, moving the CPU work earlier can take up CPU resources that the parent process needs for more critical work, such as drawing the URL bar or kicking off the network request.
I have a better understanding now, thanks for the detailed write up.
And yes, in terms of networking, instead of front-loading the content process creation work, it seems much better to just speculatively initiate the connection to the applink host (bug 1956356) and let that establish while the parent and content process initialize as efficiently as possible.
Time to request start is a bit over 200ms on Fenix, P75, via the pageload event, so I would hope we can save most of that time.
Comment 10 (Reporter)•7 months ago
I've squashed these patches into the patch for prelaunching the webextension process, bug 1958327.