Open Bug 1488990 Opened 6 years ago Updated 2 years ago

Do we handle content processes failing to advance to PROCESS_CONNECTED?

Categories

(Core :: DOM: Content Processes, enhancement, P2)

Unspecified
Windows
enhancement

Tracking

()

People

(Reporter: jld, Unassigned)

References

(Blocks 1 open bug)

Details

GeckoChildProcessHost has several notions of when the process is “launched”. The PROCESS_CREATED state is reached when there's an OS-level identifier for the process (pid, handle); PROCESS_CONNECTED comes later, when the IPC hello message is received.
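To make the distinction concrete, here is a minimal sketch of the launch states as a small state machine. It is a simplified, hypothetical model: the names `ProcessState`, `LaunchEvent`, `Advance`, `HasOsProcess` and `CanSendIpc` are illustrative and are not GeckoChildProcessHost's real members. Only the ordering (created, then connected, with error as the failure outcome) comes from the description above.

```cpp
// Hypothetical, simplified model of the launch states discussed in this bug.
// Not Gecko's real enum; only the CREATED -> CONNECTED / ERROR ordering is
// taken from the bug text.
enum class ProcessState {
  Creating,   // hypothetical: no OS-level identifier yet
  Created,    // PROCESS_CREATED: pid/handle exists
  Connected,  // PROCESS_CONNECTED: IPC hello message received
  Error       // PROCESS_ERROR: launch or connection failed
};

// Hypothetical events that drive the model.
enum class LaunchEvent { OsProcessCreated, HelloReceived, ChannelError };

// Returns the next state; events that don't apply leave the state unchanged.
constexpr ProcessState Advance(ProcessState s, LaunchEvent e) {
  if (s == ProcessState::Error) {
    return s;  // errors are terminal in this sketch
  }
  switch (e) {
    case LaunchEvent::OsProcessCreated:
      return s == ProcessState::Creating ? ProcessState::Created : s;
    case LaunchEvent::HelloReceived:
      return s == ProcessState::Created ? ProcessState::Connected : s;
    case LaunchEvent::ChannelError:
      return ProcessState::Error;
  }
  return s;
}

// "Launched" is ambiguous: having a pid does not imply a working channel.
constexpr bool HasOsProcess(ProcessState s) {
  return s == ProcessState::Created || s == ProcessState::Connected;
}
constexpr bool CanSendIpc(ProcessState s) {
  return s == ProcessState::Connected;
}
```

The gap the bug is about is the `Created` state: `HasOsProcess` is already true there, but `CanSendIpc` is not, and nothing in this model forces the process out of that state if the hello message never arrives.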

On Unix the channel is a socketpair, so if the child fails to start or crashes before sending the hello message, the parent gets EOF, goes to the PROCESS_ERROR state, fires error callbacks, and so on. On Windows it's a named pipe that the child process has to connect back to, and this is where there might be a problem: if the parent doesn't wait for PROCESS_CONNECTED with a timeout, would we ever detect that the process failed to start?
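A minimal sketch of what "waiting for PROCESS_CONNECTED with a timeout" means, using only the standard library. The class `LaunchWaiter` and its methods are hypothetical and are not Gecko's API. The assumption is that the IPC layer would call `OnHelloReceived()` or `OnChannelError()`. On Unix, EOF on the socketpair would trigger the error path. On Windows, a child that never connects to the named pipe may produce no event at all, so only the deadline ends the wait.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical parent-side waiter; not GeckoChildProcessHost's real interface.
class LaunchWaiter {
 public:
  enum class Result { Connected, Error, TimedOut };

  // Would be called by the IPC layer when the child's hello arrives.
  void OnHelloReceived() { Settle(State::Connected); }
  // Would be called on EOF / pipe failure (the Unix socketpair case).
  void OnChannelError() { Settle(State::Error); }

  // Blocks until the child connects, the channel fails, or the timeout expires.
  Result WaitForConnectedOrTimeout(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mMutex);
    bool settled = mCond.wait_for(
        lock, timeout, [this] { return mState != State::Pending; });
    if (!settled) {
      // No event arrived: treat the missed deadline as a launch failure so a
      // late hello can no longer flip this process to "connected".
      mState = State::Error;
      return Result::TimedOut;
    }
    return mState == State::Connected ? Result::Connected : Result::Error;
  }

 private:
  enum class State { Pending, Connected, Error };

  void Settle(State s) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      if (mState == State::Pending) {
        mState = s;  // first outcome wins
      }
    }
    mCond.notify_all();
  }

  std::mutex mMutex;
  std::condition_variable mCond;
  State mState = State::Pending;
};
```

Without the `wait_for` deadline, a Windows child stuck before connecting would leave the waiter pending forever, which is the failure mode this bug asks about.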

GMP processes use GeckoChildProcessHost::SyncLaunch; NPAPI and GPU processes use WaitUntilConnected, and VR processes use the OnChannelConnected callback. Content processes, however, use LaunchAndWaitForProcessHandle and don't appear to have any kind of timeout for reaching PROCESS_CONNECTED.

So it's possible that if a content process fails to start on Windows, the ContentParent might get stuck instead of reporting an error. I don't have a Windows development environment at the moment, so I haven't tested this yet; I found it while reading the code as part of bug 1446161.

On second thought, this probably belongs in DOM: Content Processes, because it's more or less specific to ContentParent.
Component: IPC → DOM: Content Processes
See Also: → 1165945
See Also: → 1517781

Bug 1471124 suggests that the answer is “no”. If I'm interpreting the data in that bug correctly, a content process is hanging indefinitely after it reaches PROCESS_CREATED and before it has even been exec'd (so it is definitely not getting to PROCESS_CONNECTED), and one of the observed results was that every nth tab would fail to display, where n is the number of content processes. This implies that a content process that never reaches PROCESS_CONNECTED is not only never timed out, but is also repeatedly sent requests to create new PBrowsers while in that state.
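The "every nth tab" symptom is what round-robin selection over the content-process pool would produce if a stuck process stays in the rotation. The sketch below assumes round-robin selection purely for illustration; `ContentProcess`, `ProcessPool` and `PickForNewTab` are hypothetical and do not describe ContentParent's actual process-selection logic. It contrasts a naive pick with one that skips processes not yet at PROCESS_CONNECTED.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical model of a content-process pool; not ContentParent's real code.
struct ContentProcess {
  int id;
  bool connected;  // true once the IPC hello has arrived (PROCESS_CONNECTED)
};

class ProcessPool {
 public:
  explicit ProcessPool(std::vector<ContentProcess> procs)
      : mProcs(std::move(procs)) {}

  // Round-robin selection (an assumption for this sketch). With
  // skipUnconnected == false, a process stuck before PROCESS_CONNECTED still
  // receives every n-th new tab, where n is the pool size.
  std::optional<int> PickForNewTab(bool skipUnconnected) {
    for (std::size_t tried = 0; tried < mProcs.size(); ++tried) {
      const ContentProcess& p = mProcs[mNext];
      mNext = (mNext + 1) % mProcs.size();
      if (!skipUnconnected || p.connected) {
        return p.id;
      }
    }
    // No usable process: the caller would need to launch one or report an error.
    return std::nullopt;
  }

 private:
  std::vector<ContentProcess> mProcs;
  std::size_t mNext = 0;
};
```

With a pool of three where process 2 is stuck, the naive pick returns 1, 2, 3, 1, ..., so one tab in three lands on the hung process; the filtered pick returns 1, 3, 1, 3, ... Skipping alone still leaves the stuck process alive, which is why a timeout on reaching PROCESS_CONNECTED is the more complete fix.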

See Also: → 1471124
Priority: -- → P2
See Also: → 1618904
See Also: → 1711143
See Also: → 1682520
Severity: normal → S3
Blocks: 1795821