Closed Bug 1165945 Opened 9 years ago Closed 1 year ago

Better handle a failure to start a content child process.

Categories

(Core :: DOM: Content Processes, defect, P3)

defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
e10s + ---

People

(Reporter: bobowen, Unassigned)

References

Details

Attachments

(1 file, 1 obsolete file)

On Windows, bug 1146874 fixed the problem where a failure to start a content process would cause the main process to crash.

Now we don't crash, but nothing gets rendered.

Should we handle this better, with retries or communication to the user, like when the content process crashes?

Should we be reporting this via the crash reporter?
Maybe the reporting should be common code in GeckoChildProcessHost.cpp
We should probably communicate (and capture with telemetry) this sort of child process start-up failure to the user before releasing e10s.
tracking-e10s: --- → ?
Bob seems like a printing to the console should suffice for now
In addition to logging to the console, can you add a telemetry probe so we can know if this is happening?
Assignee: nobody → jmathies
Priority: -- → P2
Flags: needinfo?(haftandilian)
Assignee: jmathies → haftandilian
Flags: needinfo?(haftandilian)
Bob, is this what you had in mind?
Attachment #8752907 - Flags: review?(bobowen.code)
Comment on attachment 8752907 [details] [diff] [review]
Adds a telemetry probe and console logging to record subprocess launch failures

Review of attachment 8752907 [details] [diff] [review]:
-----------------------------------------------------------------

This bug was originally about handling when the process fails and what happens after we return false from RunPerformAsyncLaunch.

I put some thoughts in the Description, but I'm not sure who decides how we should handle it and I don't think I'd be the right person to review it when we do.

A simple count like this of process type failures, is probably a good first probe, to see if there is a significant problem that we need to solve.
I think you have to get a data peer review for new probes (an f? to :bsmedberg is how the MDN page suggests you do this).

::: ipc/glue/GeckoChildProcessHost.cpp
@@ +564,5 @@
> +
> +  if (PerformAsyncLaunch(aExtraOpts, aArch)) {
> +    return true;
> +  } else {
> +    CHROMIUM_LOG(ERROR) << "Failed to launch " <<

Does this log to the console?
Attachment #8752907 - Flags: review?(bobowen.code)
(In reply to Bob Owen (:bobowen) from comment #5)
> Comment on attachment 8752907 [details] [diff] [review]
> Adds a telemetry probe and console logging to record subprocess launch
> failures
> 
> Review of attachment 8752907 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This bug was originally about handling when the process fails and what
> happens after we return false from RunPerformAsyncLaunch.
> 
> I put some thoughts in the Description, but I'm not sure who decides how we
> should handle it and I don't think I'd be the right person to review it when
> we do.

OK. I agree that a page like tab crashed would be ideal. Will look into that. Thanks.

> A simple count like this of process type failures, is probably a good first
> probe, to see if there is a significant problem that we need to solve.
> I think you have to get a data peer review for new probes (an f? to
> :bsmedberg is how the MDN page suggests you do this).

Wanted to check with you first, will get review from bsmedberg regarding the telemetry.

> ::: ipc/glue/GeckoChildProcessHost.cpp
> @@ +564,5 @@
> > +
> > +  if (PerformAsyncLaunch(aExtraOpts, aArch)) {
> > +    return true;
> > +  } else {
> > +    CHROMIUM_LOG(ERROR) << "Failed to launch " <<
> 
> Does this log to the console?

Yet to test on other platforms, but on OS X it does for debug and non-debug builds.
Comment on attachment 8752907 [details] [diff] [review]
Adds a telemetry probe and console logging to record subprocess launch failures

f? for data collection review.
Attachment #8752907 - Flags: feedback?(benjamin)
Comment on attachment 8752907 [details] [diff] [review]
Adds a telemetry probe and console logging to record subprocess launch failures

Why is n_values 16? Do we really expect there to be 16 process types in the future?

Please document in the histogram "description" what the values mean. Either directly, or reference the enumeration type used. data-review=me with that fixed.
Attachment #8752907 - Flags: feedback?(benjamin) → feedback+
Blocks: 1275430
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #8)
> Comment on attachment 8752907 [details] [diff] [review]
> Adds a telemetry probe and console logging to record subprocess launch
> failures
> 
> Why is n_values 16? Do we really expect there to be 16 process types in the
> future?

The GeckoProcessType macro goes from 0-5 now. It's conceivable that we might add more process types as we implement sandboxing. I've reduced this to 10 to leave room to add more.

> Please document in the histogram "description" what the values mean. Either
> directly, or reference the enumeration type used. data-review=me with that
> fixed.

Updated the histogram description to read "Counts the number of times launching a subprocess fails. Counts are by subprocess-type using the GeckoProcessType enum."
Attachment #8752907 - Attachment is obsolete: true
No longer blocks: 1275430
See Also: → 1275430
I'm splitting off the telemetry and logging to bug 1275430 so that this bug can be used to improve the user experience when a content process fails once we determine how frequently that happens. When I commented out the code to start a content process to test the telemetry code, Firefox hung at launch and no UI ever was displayed.
Assignee: haftandilian → nobody
Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 5 See Also bugs.
:jjalkanen, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jjalkanen)

GeckoChildProcessHost::AsyncLaunch has been largely reworked in the meantime. Which does not mean it cannot be improved, but I think this specific bug is obsolete.

Status: NEW → RESOLVED
Closed: 1 year ago
Flags: needinfo?(jjalkanen)
Resolution: --- → WONTFIX

From the original description of this bug, this is more of a UI problem: when the process launch fails, the user gets a blank page, instead of retrying or showing an error. So, I don't know if the internal rearchitecting of GeckoChildProcessHost is relevant here.

(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #14)

From the original description of this bug, this is more of a UI problem: when the process launch fails, the user gets a blank page, instead of retrying or showing an error. So, I don't know if the internal rearchitecting of GeckoChildProcessHost is relevant here.

OK, fair enough. I'd say, given that we added at least telemetry and log message here, the minimum requirement of giving "some" feedback has been fulfilled and we can keep this defect closed. But I filed bug 1795821 as an enhancement and made it depend on the bugs that were linked here with "see also" only. I hope that reflects the state better.

See Also: → 1795821

(In reply to Jens Stutte [:jstutte] from comment #15)

But I filed bug 1795821 as an enhancement and made it depend on the bugs that were linked here with "see also" only. I hope that reflects the state better.

That's a good summary of the issues as I understand them; thanks for filing it.

You need to log in before you can comment on or make changes to this bug.