Closed Bug 1819311 Opened 2 years ago Closed 2 years ago

Improve error reporting for process launches

Categories

(Core :: IPC, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
114 Branch
Tracking Status
firefox114 --- fixed

People

(Reporter: gerard-majax, Assigned: gerard-majax)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 2 obsolete files)

The rationale here is related to https://bugzilla.mozilla.org/show_bug.cgi?id=1789126#c22 ; tl;dr is:

  • we hit a reject
  • everything suggest this is because of GeckoChildProcessHost::Destroy that rejects when we are in the begining of starting a process
  • we have IsShutdown() guarding on our utility process manager
  • this depends on a boolean updated when xpcom-shutdown observer is delivered to us
  • the test reproducing are tests where I already spotted raciness around starting a process when shutting down

So my take is we enter a small window of possible race between the GeckoChildProcessHost::Destroy and the moment we receive the observer, hence we get rejected by Destroy but we still dont know we are in shutdown. Moving to AppShutdown would, I hope, avoid that.

I'm not convinced that this would do anything to improve the situation from https://bugzilla.mozilla.org/show_bug.cgi?id=1789126#c21. From looking at the stack you can see that we're not yet in proper XPCOM shutdown, so we can't have passed either xpcom-shutdown or xpcom-will-shutdown yet.

I think whatever failure is happening here is probably unrelated to shutdown. There are many different reasons why child process startup could fail (e.g. due to the binary being moved/replaced, among probably other things) so we should probably change the code to handle failure here, rather than diagnostic asserting.

Flags: needinfo?(lissyx+mozillians)
Flags: needinfo?(lissyx+mozillians)

So there's already some data collection on subprocess launch failures at https://searchfox.org/mozilla-central/rev/aa3ccd258b64abfd4c5ce56c1f512bc7f65b844c/ipc/glue/GeckoChildProcessHost.cpp#776-779 and it should cover what we want to look at, except maybe we lack of actual error message (but I could only see usages of LaunchError{} around, with LaunchError being an empty struct. I have however not been able to see the telemetry data so far.

Jens, on bug 1795821 you mention tracking better the kind error failure Which errors are propagated under which conditions by the async promise chain we have here and if there are recoverable conditions we might be able to distinguish and handle better or at least to notify to the user. So far looking at GeckoChildProcessHost I see mostly opaque LaunchError, is fixing this was what you implied ?

Flags: needinfo?(jstutte)

I think I wanted to say something on the lines of:

  1. check if we understand which conditions can lead to an error
  2. check if those have distinct error codes/messages
  3. verify if some of those are potentially avoidable (like checking something before launching)
  4. verify if some of those are potentially recoverable (like retrying)
  5. check then if all callers are checking well for error conditions and how they react and if we should tell something to the user (if we are not in shutdown)

So having something better than opaque LaunchError is definitely an improvement in that direction, as you would touch at least 1., 2. and a bit of 5. of the above list, yeah.

(this would alter the scope of this bug quite heavily, btw)

Flags: needinfo?(jstutte)
Summary: Use AppShutdown::IsInOrBeyond instead of sXPCOMShutdown + Observer on Utility process → Improve error reporting for process launches
Blocks: 1795821
Attachment #9320274 - Attachment description: Bug 1819311 - Use AppShutdown instead of xpcom-shutdown observer r?nika! → WIP: Bug 1819311 - Dont assert on process launch failure and collect errors
Attached file WIP: Bug 1819311 - WIP Linux (obsolete) —

Depends on D171226

Attached file WIP: Bug 1819311 - WIP macOS (obsolete) —

Depends on D172433

Attachment #9322752 - Attachment is obsolete: true
Attachment #9322751 - Attachment is obsolete: true
Severity: -- → S3
Priority: -- → P2
Attached file data review
Attachment #9326644 - Flags: data-review?(jrediger)
Attachment #9326644 - Flags: data-review?(chutten)
Attachment #9320274 - Attachment description: WIP: Bug 1819311 - Dont assert on process launch failure and collect errors → Bug 1819311 - Dont assert on process launch failure and collect errors r?nika!,bobowen!,#geckoview-reviewers!

Comment on attachment 9326644 [details]
data review

I'm not a data steward, but chutten is and can take care of it here.

Attachment #9326644 - Flags: data-review?(jrediger)
Attachment #9320274 - Attachment description: Bug 1819311 - Dont assert on process launch failure and collect errors r?nika!,bobowen!,#geckoview-reviewers! → Bug 1819311 - Collect errors on process launch r?nika!,bobowen!

Comment on attachment 9326644 [details]
data review

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection can be controlled through Firefox's Preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Alexandre Lissy is responsible.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does the data collection use a third-party collection tool?

No.


Result: datareview+

Attachment #9326644 - Flags: data-review?(chutten) → data-review+
Blocks: 1826795
Pushed by alissy@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/654ce2247015 Limit hard crash when utility process is not starting r=nika https://hg.mozilla.org/integration/autoland/rev/6ef392f0f69a Collect errors on process launch r=nika,geckoview-reviewers,owlish
Regressions: 1827894
Blocks: 1835307
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: