Closed Bug 1789421 Opened 2 years ago Closed 2 years ago

Do not record minidumps for content process shutdown hangs if we fail to kill the process

Categories

(Core :: DOM: Content Processes, enhancement)

Unspecified
All
enhancement

Tracking

()

RESOLVED FIXED
106 Branch
Tracking Status
firefox106 --- fixed

People

(Reporter: gsvelto, Assigned: gsvelto)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

The code that grabs a minidump of a hung content process during shutdown has two flaws which we should address:

  • The first one is that we still record a minidump even when we fail to kill the process. See this, if KillProcess() fails we ought to delete the minidump but we don't. This means that we often record hangs for cases when the process terminated on its own somehow (or crashed!) and it artificially inflates the volume in bug 1279293.
  • The second issue is that we send telemetry for hangs as if they were crashes. That's because FinalizeCrashReport() calls RecordCrash() (here). Duh! This is wrong as these aren't crashes and neither Gecko nor our Telemetry should record them as such.

(In reply to Gabriele Svelto [:gsvelto] from comment #0)

  • The first one is that we still record a minidump even when we fail to kill the process. See this, if KillProcess() fails we ought to delete the minidump but we don't. This means that we often record hangs for cases when the process terminated on its own somehow (or crashed!) and it artificially inflates the volume in bug 1279293.

Just to second this: This seems to happen in ~28% of the recorded hangs (I counted those without "IPC shutdown status" annotation and did not check all of them for empty minidumps, though).

(In reply to Gabriele Svelto [:gsvelto] from comment #0)

  • The second issue is that we send telemetry for hangs as if they were crashes. That's because FinalizeCrashReport() calls RecordCrash() (here). Duh! This is wrong as these aren't crashes and neither Gecko nor our Telemetry should record them as such.

Would we still see them in crash-stats as before? We have quite some instrumentation (like annotations) in place to analyze those hangs, and we pretty much rely on queries to crash-stats for those to be useful.

Yes, this change only affects telemetry and it means we're overcounting crashes by lumping hangs together with them.

Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Blocks: 1790334
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bff57b2e4b16
Generate crash reports for content process shutdown hangs only when we can confirm that the process did not shut down in time r=jstutte

The build failure is a dumb mistake on my part... but the test failure doesn't happen on my box, so odd.

Flags: needinfo?(gsvelto)

The test failure is also a dumb mistake on my part, and it does fail locally but it times out so I hadn't spotted the failure 🤦

Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/881add28e309
Generate crash reports for content process shutdown hangs only when we can confirm that the process did not shut down in time r=jstutte
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 106 Branch
Regressions: 1790562
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: