Do not record minidumps for content process shutdown hangs if we fail to kill the process
Categories
(Core :: DOM: Content Processes, enhancement)
Tracking
()
Tracking | Status | |
---|---|---|
firefox106 | --- | fixed |
People
(Reporter: gsvelto, Assigned: gsvelto)
References
(Blocks 2 open bugs)
Details
Attachments
(1 file)
The code that grabs a minidump of a hung content process during shutdown has two flaws which we should address:
- The first one is that we still record a minidump even when we fail to kill the process. See this, if
KillProcess()
fails we ought to delete the minidump but we don't. This means that we often record hangs for cases when the process terminated on its own somehow (or crashed!) and it artificially inflates the volume in bug 1279293. - The second issue is that we send telemetry for hangs as if they were crashes. That's because
FinalizeCrashReport()
callsRecordCrash()
(here). Duh! This is wrong as these aren't crashes and neither Gecko nor our Telemetry should record them as such.
Comment 1•2 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #0)
- The first one is that we still record a minidump even when we fail to kill the process. See this, if
KillProcess()
fails we ought to delete the minidump but we don't. This means that we often record hangs for cases when the process terminated on its own somehow (or crashed!) and it artificially inflates the volume in bug 1279293.
Just to second this: This seems to happen in ~28% of the recorded hangs (I counted those without "IPC shutdown status" annotation and did not check all of them for empty minidumps, though).
Comment 2•2 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #0)
- The second issue is that we send telemetry for hangs as if they were crashes. That's because
FinalizeCrashReport()
callsRecordCrash()
(here). Duh! This is wrong as these aren't crashes and neither Gecko nor our Telemetry should record them as such.
Would we still see them in crash-stats as before? We have quite some instrumentation (like annotations) in place to analyze those hangs, and we pretty much rely on queries to crash-stats for those to be useful.
Assignee | ||
Comment 3•2 years ago
|
||
Yes, this change only affects telemetry and it means we're overcounting crashes by lumping hangs together with them.
Assignee | ||
Comment 4•2 years ago
|
||
Updated•2 years ago
|
Pushed by gsvelto@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/bff57b2e4b16 Generate crash reports for content process shutdown hangs only when we can confirm that the process did not shut down in time r=jstutte
Comment 6•2 years ago
•
|
||
Backed out for causing build bustage.
Backout link: https://hg.mozilla.org/integration/autoland/rev/006e1e36a92e38075b716d1c1beea275dc06851e
Failure log: https://treeherder.mozilla.org/logviewer?job_id=390207315&repo=autoland&lineNumber=65237
There are also these failures:
https://treeherder.mozilla.org/logviewer?job_id=390209544&repo=autoland&lineNumber=6487
Comment 7•2 years ago
|
||
Assignee | ||
Comment 8•2 years ago
|
||
The build failure is a dumb mistake on my part... but the test failure doesn't happen on my box, so odd.
Assignee | ||
Comment 9•2 years ago
|
||
The test failure is also a dumb mistake on my part, and it does fail locally but it times out so I hadn't spotted the failure 🤦
Comment 10•2 years ago
|
||
Pushed by gsvelto@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/881add28e309 Generate crash reports for content process shutdown hangs only when we can confirm that the process did not shut down in time r=jstutte
Comment 11•2 years ago
|
||
bugherder |
Description
•