Closed
Bug 1499465
Opened 7 years ago
Closed 7 years ago
Considering stopping collection of crash reports when KillHard'ing from the shutdown timer
Categories
(Core :: DOM: Content Processes, enhancement)
Core
DOM: Content Processes
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox65 | --- | fixed |
People
(Reporter: mconley, Assigned: mconley)
References
Details
(Whiteboard: [fxperf:p1])
Attachments
(1 file)
We collect paired minidumps for a crash report anytime the parent process forcibly kills a content process. Sometimes, the parent KillHard's the content process because it returns false or otherwise unexpected values over IPC.
The other time we KillHard is when we request to shutdown a content process, and it hasn't shut itself down within 5 seconds.
We originally added the minidump gathering mechanism in bug 1068349.
Here's a profile where a KillHard from the shutdown kill timer is hanging the main thread in the parent process for what appears to be about _10 seconds_:
https://perfht.ml/2Enu2Lp
So gathering these things is not free, and I suspect this will only get worse as we ratchet up the number of default content processes (since presumably we'll be starting and shutting down more of them).
I propose that we skip the crash report gathering step when KillHard'ing due to the shutdown timeout. We've been gathering these shutdownkill crash reports for a while now (see bug 1219672), and I don't think we've really made many inroads into reducing them. Maybe we should just be satisfied with KillHard'ing them, and skip the crash report gathering altogether.
| Assignee | ||
Comment 2•7 years ago
|
||
For the record, this 10s hang was experienced on the Quantum 2018 reference hardware, which is pretty weak CPU-wise.
Comment 3•7 years ago
|
||
Do we have any mechanisms for gathering hang reports from content processes these days? Something like that would probably be a lot lighter weight than collecting a full crash report.
| Assignee | ||
Comment 4•7 years ago
|
||
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #3)
> Do we have any mechanisms for gathering hang reports from content processes
> these days? Something like that would probably be a lot lighter weight than
> collecting a full crash report.
We have BHR getting this sort of thing on Nightly.
Comment 5•7 years ago
|
||
We were just discussing making improvements to our child shutdown situation a release requirement for fission rollout. I think the hangs will show up in other ways (the prompt in bug 1493284 possibly?). The crash reports serve as a reminder we have an issue here.
I'd be ok with turning this off for release builds but I think we should keep reporting for dev builds.
ni'ing overholt for the fyi that we might do something here that changes current behavior.
Flags: needinfo?(jmathies) → needinfo?(overholt)
Comment 6•7 years ago
|
||
Thanks, Jim! Agreed on turning this off for release builds but I'm curious about the root causes.
Flags: needinfo?(overholt)
| Assignee | ||
Comment 7•7 years ago
|
||
If I'm reading the profile in comment 0 correctly, the majority of the time is spent getting the thread context for the parent process. So as an alternative to completely shutting it off, we could also consider not doing a paired minidump in the shutdownkill case, but just getting the minidump from the child (I can't see how the parent stack would be that useful - the stack will just show us running the shutdownkill callback).
Comment 8•7 years ago
|
||
Historically I think we captured paired minidumps to help diagnose issues where the child process was stuck waiting for a reply from the parent process and things like that. I don't know how useful that has actually been in practice. I think you should do whatever makes the end-user experience best since you're the ones doing investigation with the data produced here. If you're not using these crash reports, then they're probably not useful.
| Assignee | ||
Updated•7 years ago
|
Whiteboard: [qf] → [qf:p2][fxperf]
Comment 9•7 years ago
|
||
If this is just flipping a switch to turn these off (and/or removing code), this seems like a cheap way to improve people's experiences with this stuff, so let's make it p1.
Mike, do we need to add a pref for this or should we essentially just back out bug 1068349?
Flags: needinfo?(mconley)
Whiteboard: [qf:p2][fxperf] → [qf:p2][fxperf:p1]
| Assignee | ||
Comment 10•7 years ago
|
||
I think it's a matter of either adding a pref or a build time configuration to not run this chunk of code:
https://searchfox.org/mozilla-central/rev/7c848ac7630df5baf1314b0c03e015683599efb9/dom/ipc/ContentParent.cpp#3467-3485
Assignee: nobody → mconley
Flags: needinfo?(mconley)
| Assignee | ||
Comment 11•7 years ago
|
||
| Assignee | ||
Comment 12•7 years ago
|
||
| Assignee | ||
Updated•7 years ago
|
Attachment #9022761 -
Flags: review?(mconley)
| Assignee | ||
Updated•7 years ago
|
Attachment #9022761 -
Flags: review?(mconley)
Updated•7 years ago
|
Attachment #9022761 -
Attachment description: Bug 1499465 - Don't collect ShutdownKill crashes on Beta or Release. r?jld → Bug 1499465 - Don't collect KillHard crash reports on Beta or Release. r?jld
Updated•7 years ago
|
Attachment #9022761 -
Attachment description: Bug 1499465 - Don't collect KillHard crash reports on Beta or Release. r?jld → Bug 1499465 - Don't collect KillHard crashes on Beta or Release. r?jld,gsvelto
Comment 13•7 years ago
|
||
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b4754021fc2d
Don't collect KillHard crashes on Beta or Release. r=jld,gsvelto
Comment 14•7 years ago
|
||
| bugherder | ||
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox65:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Updated•3 years ago
|
Performance Impact: --- → P2
Whiteboard: [qf:p2][fxperf:p1] → [fxperf:p1]
You need to log in
before you can comment on or make changes to this bug.
Description
•