Isolate child process launching into a helper class
Categories
(Core :: IPC, task, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox70 | --- | fixed |
People
(Reporter: bholley, Assigned: bholley)
References
Details
Attachments
(6 files)
At present, all the launching takes place in methods on GeckoChildProcessHost. This means sharing state across the main thread, the IO thread, and the launcher pool, and no clear separation between state needed for launching versus steady-state operation.
Cleaning this up is the first step for bug 1562762.
Assignee | ||
Comment 1•6 years ago
|
||
Having to namespace these into GeckoChildProcessHost is annoying. The
|using| declarations help to some extent, but it's easier to just put
them in mozilla::ipc.
Assignee | ||
Comment 2•6 years ago
|
||
This is intended to facilitate future work to break up
PerformAsyncLaunch into asynchronous pieces. It also aims to fix some
raciness around access to state in the GeckoChildProcessHost instance.
There appears to be at least one data race in the existing code, in which
OpenPrivilegedHandle are run on both the launcher thread and the IO
thread without synchronization.
Assignee | ||
Comment 3•6 years ago
|
||
There's one failure on mac, which I finally managed to reproduce locally via an opt build, only when running xpcshell tests in parallel, e.g. ./mach xpcshell-test browser/extensions/formautofill/
Thankfully, this one has a few more useful symbols than what I saw on try:
Thread 0 (crashed)
0 XUL!InvalidArrayIndex_CRASH(unsigned long, unsigned long) [nsTArray.cpp : 27 + 0x1e]
rax = 0x0000000103d3e340 rdx = 0x0000000000000000
rcx = 0x0000000103d3dd48 rbx = 0x00007ffeec268bc0
rsi = 0x0000000000000022 rdi = 0x0000000000000000
rbp = 0x00007ffeec268ae0 rsp = 0x00007ffeec268ae0
r8 = 0x00007ffeec268838 r9 = 0x000000000000000a
r10 = 0x00007ffeec268ab0 r11 = 0xfffffffff5184701
r12 = 0x0000000103f35dd0 r13 = 0x0000000103f828c0
r14 = 0x0000000103f828c8 r15 = 0x0000000103f72020
rip = 0x000000010981638e
Found by: given as instruction pointer in context
1 XUL!mozilla::dom::ContentChild::RecvInitRendering(mozilla::ipc::Endpoint<mozilla::layers::PCompositorManagerChild>&&, mozilla::ipc::Endpoint<mozilla::layers::PImageBridgeChild>&&, mozilla::ipc::Endpoint<mozilla::gfx::PVRManagerChild>&&, mozilla::ipc::Endpoint<mozilla::PRemoteDecoderManagerChild>&&, nsTArray<unsigned int>&&) [ContentChild.cpp : 1483 + 0x9]
rbp = 0x00007ffeec268b20 rsp = 0x00007ffeec268af0
rip = 0x000000010bd642c5
I'll attach the full minidump output.
Assignee | ||
Comment 4•6 years ago
|
||
Comment 5•6 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #3)
1 XUL!mozilla::dom::ContentChild::RecvInitRendering(mozilla::ipc::Endpoint<mozilla::layers::PCompositorManagerChild>&&, mozilla::ipc::Endpoint<mozilla::layers::PImageBridgeChild>&&, mozilla::ipc::Endpoint<mozilla::gfx::PVRManagerChild>&&, mozilla::ipc::Endpoint<mozilla::PRemoteDecoderManagerChild>&&, nsTArray<unsigned int>&&) [ContentChild.cpp : 1483 + 0x9]
So that's here:
if (!CompositorManagerChild::Init(std::move(aCompositor), namespaces[0])) {
where namespaces
comes from GPUProcessManager::CreateContentBridges
in the parent process, and that would yield an empty array if any of it fails, and that failure is checked only by debug assertions.
Assignee | ||
Comment 6•6 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #5)
where
namespaces
comes from [GPUProcessManager::CreateContentBridges
][loc1] in the parent process, and that would yield an empty array if any of it fails, and that failure is checked only by debug assertions.
Yeah. So the opt-only nature if this is either (a) there's raciness around when CreateContentBridges is called and whether or not it succeeds, (b) the MOZ_ASSERT in the child process happens early enough that it triggers a relaunch when it's hit, or both.
In any case, I have enough to debug this now.
Assignee | ||
Comment 7•6 years ago
|
||
We end up bailing out here because the compositor thread isn't set up yet: https://searchfox.org/mozilla-central/rev/8a990595ce6d5ed07ace2d4d5d86cc69aec90bde/gfx/layers/ipc/CompositorManagerParent.cpp#62
So presumably there's a latent race here. I'll finish debugging tomorrow.
Assignee | ||
Comment 8•6 years ago
|
||
So this is the stack where CreateContentBridges fails. The key point is that we're in CompositorThreadHolder::Shutdown, having nulled out the compositor thread and spinning the event loop until shutdown is complete. ContentParent::InitInternal is being called slightly later with my patches (more promises -> more round-trips through the event loop), and therefore it happens after we've already started shutting the compositor thread down.
The Right Solution, I think, is to make ContentParent initialization fallible.
Assignee | ||
Comment 9•6 years ago
|
||
Depending on the timing of events, we can invoke InitInternal after
we've already torn down the compositor thread, which can make
CreateContentBridges fail, which in turn makes us unable to make a
correct InitRendering call. Implement the infrastructure to bail out in
this case.
Updated•6 years ago
|
Assignee | ||
Comment 10•6 years ago
|
||
This can happen if the parent starts shutting down before a child has
finished starting up.
The main thread case in the child has always just warned, and this MOZ_CRASH was
added recently to the dispatch case in [1]. This can happen if the
parent starts shutting down while the child is still initializing.
[1] https://hg.mozilla.org/mozilla-central/rev/bd8fd9a22dc08d63cec68de475182a1b8eeeca4b
Assignee | ||
Comment 11•6 years ago
|
||
And here's a try run showing mac xpcshell green across retriggers: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8e10b55fb38de49d70e92e607fad3b92ff9f192
Comment 12•6 years ago
|
||
Comment 13•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/073b4d39b0f3
https://hg.mozilla.org/mozilla-central/rev/ae890583cd76
https://hg.mozilla.org/mozilla-central/rev/6035908e789b
https://hg.mozilla.org/mozilla-central/rev/11d1d46d11fb
Description
•