Closed Bug 1499465 Opened 7 years ago Closed 7 years ago

Considering stopping collection of crash reports when KillHard'ing from the shutdown timer

Categories

(Core :: DOM: Content Processes, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla65
Performance Impact medium
Tracking Status
firefox65 --- fixed

People

(Reporter: mconley, Assigned: mconley)

References

Details

(Whiteboard: [fxperf:p1])

Attachments

(1 file)

We collect paired minidumps for a crash report anytime the parent process forcibly kills a content process. Sometimes, the parent KillHard's the content process because it returns false or otherwise unexpected values over IPC. The other time we KillHard is when we request to shutdown a content process, and it hasn't shut itself down within 5 seconds. We originally added the minidump gathering mechanism in bug 1068349. Here's a profile where a KillHard from the shutdown kill timer is hanging the main thread in the parent process for what appears to be about _10 seconds_: https://perfht.ml/2Enu2Lp So gathering these things is not free, and I suspect this will only get worse as we ratchet up the number of default content processes (since presumably we'll be starting and shutting down more of them). I propose that we skip the crash report gathering step when KillHard'ing due to the shutdown timeout. We've been gathering these shutdownkill crash reports for a while now (see bug 1219672), and I don't think we've really made many inroads into reducing them. Maybe we should just be satisfied with KillHard'ing them, and skip the crash report gathering altogether.
What do you think, jimm?
Flags: needinfo?(jmathies)
For the record, this 10s hang was experienced on the Quantum 2018 reference hardware, which is pretty weak CPU-wise.
Do we have any mechanisms for gathering hang reports from content processes these days? Something like that would probably be a lot lighter weight than collecting a full crash report.
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #3) > Do we have any mechanisms for gathering hang reports from content processes > these days? Something like that would probably be a lot lighter weight than > collecting a full crash report. We have BHR getting this sort of thing on Nightly.
We were just discussing making improvements to our child shutdown situation a release requirement for fission rollout. I think the hangs will show up in other ways (the prompt in bug 1493284 possibly?). The crash reports serve as a reminder we have an issue here. I'd be ok with turning this off for release builds but I think we should keep reporting for dev builds. ni'ing overholt for the fyi that we might do something here that changes current behavior.
Flags: needinfo?(jmathies) → needinfo?(overholt)
Thanks, Jim! Agreed on turning this off for release builds but I'm curious about the root causes.
Flags: needinfo?(overholt)
If I'm reading the profile in comment 0 correctly, the majority of the time is spent getting the thread context for the parent process. So as an alternative to completely shutting it off, we could also consider not doing a paired minidump in the shutdownkill case, but just getting the minidump from the child (I can't see how the parent stack would be that useful - the stack will just show us running the shutdownkill callback).
Historically I think we captured paired minidumps to help diagnose issues where the child process was stuck waiting for a reply from the parent process and things like that. I don't know how useful that has actually been in practice. I think you should do whatever makes the end-user experience best since you're the ones doing investigation with the data produced here. If you're not using these crash reports, then they're probably not useful.
Whiteboard: [qf] → [qf:p2][fxperf]
If this is just flipping a switch to turn these off (and/or removing code), this seems like a cheap way to improve people's experiences with this stuff, so let's make it p1. Mike, do we need to add a pref for this or should we essentially just back out bug 1068349?
Flags: needinfo?(mconley)
Whiteboard: [qf:p2][fxperf] → [qf:p2][fxperf:p1]
I think it's a matter of either adding a pref or a build time configuration to not run this chunk of code: https://searchfox.org/mozilla-central/rev/7c848ac7630df5baf1314b0c03e015683599efb9/dom/ipc/ContentParent.cpp#3467-3485
Assignee: nobody → mconley
Flags: needinfo?(mconley)
Attachment #9022761 - Flags: review?(mconley)
Attachment #9022761 - Flags: review?(mconley)
Attachment #9022761 - Attachment description: Bug 1499465 - Don't collect ShutdownKill crashes on Beta or Release. r?jld → Bug 1499465 - Don't collect KillHard crash reports on Beta or Release. r?jld
Attachment #9022761 - Attachment description: Bug 1499465 - Don't collect KillHard crash reports on Beta or Release. r?jld → Bug 1499465 - Don't collect KillHard crashes on Beta or Release. r?jld,gsvelto
Pushed by mconley@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b4754021fc2d Don't collect KillHard crashes on Beta or Release. r=jld,gsvelto
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Performance Impact: --- → P2
Whiteboard: [qf:p2][fxperf:p1] → [fxperf:p1]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: