Add API to BHR for specifying a child process to generate a paired minidump upon hang
Categories
(Core :: XPCOM, enhancement)
People
(Reporter: bugzilla, Unassigned)
Details
From https://bugzilla.mozilla.org/show_bug.cgi?id=1419108#c23:
it would be nice if we had some kind of RAII mechanism to say, "Yo, hang reporter, if you get any hangs while I'm on the stack, take a paired minidump with this process, okay?"
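A minimal sketch of what such an RAII marker could look like, in standalone C++ rather than actual Gecko code. Every name here (`ProcessId`, `PairedMinidumpRegistry`, `Registry()`, `AutoPairedMinidumpTarget`) is hypothetical and not part of BHR; the sketch also assumes that markers are only created and destroyed on one thread, so that scopes nest strictly, while the hang reporter reads the registry from its own thread.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical process identifier; real code would use whatever handle
// the hang reporter already works with.
using ProcessId = int;

// Hypothetical registry that a hang reporter thread could consult when it
// detects a hang. The mutex guards reads from that thread against marker
// pushes and pops happening on the marked thread.
class PairedMinidumpRegistry {
 public:
  void Push(ProcessId aPid) {
    std::lock_guard<std::mutex> lock(mMutex);
    mTargets.push_back(aPid);
  }

  void Pop() {
    std::lock_guard<std::mutex> lock(mMutex);
    mTargets.pop_back();
  }

  // Returns true and writes the innermost marked process to *aOut if any
  // marker is currently on the stack; returns false otherwise.
  bool Current(ProcessId* aOut) {
    std::lock_guard<std::mutex> lock(mMutex);
    if (mTargets.empty()) {
      return false;
    }
    *aOut = mTargets.back();
    return true;
  }

 private:
  std::mutex mMutex;
  std::vector<ProcessId> mTargets;
};

inline PairedMinidumpRegistry& Registry() {
  static PairedMinidumpRegistry sRegistry;
  return sRegistry;
}

// The RAII marker: while an instance is alive, a hang should be reported
// with a paired minidump of the given child process.
class AutoPairedMinidumpTarget {
 public:
  explicit AutoPairedMinidumpTarget(ProcessId aPid) { Registry().Push(aPid); }
  ~AutoPairedMinidumpTarget() { Registry().Pop(); }

  AutoPairedMinidumpTarget(const AutoPairedMinidumpTarget&) = delete;
  AutoPairedMinidumpTarget& operator=(const AutoPairedMinidumpTarget&) = delete;
};
```

A caller would declare `AutoPairedMinidumpTarget marker(childPid);` at the top of a scope that blocks on the child, and the hang reporter would call `Registry().Current(&pid)` when it decides a hang has occurred. How the minidump is then captured is outside the scope of this sketch.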
Comment 1•6 years ago
This would potentially help with a high-volume a11y crash.
Updated•6 years ago
Comment 2•6 years ago
Taking this since I finally have some spare cycles.
Comment 3•6 years ago
I did a bit of preliminary analysis of what would be needed and I think I had underestimated the effort; there are a few moving parts here:
- An interface that tells the shutdown terminator which process to capture when a hang occurs, using some kind of RAII "hang marker" as per comment 0. This isn't hard; it just needs a bit of IPC to wire it up, but nothing major.
- In the terminator itself we'd need to grab a minidump of the target child process before crashing. This is well supported and isn't a real issue; the hang detector (which is a different piece of code than the shutdown terminator) already does it.
- The real problems begin once we have the minidump. Right now we have a mechanism to pair minidumps and send them in a single crash report; that's what we use when detecting content process & plugin hangs. However, that mechanism only works if the main process is alive and well. In our case we're also killing the main process, so we'd need to adapt it to pair the minidump with the one generated from the main process. That's tricky because we're not calling the crash reporter as with regular hangs; we're killing the main process. One way to do this might be to add an annotation with the child minidump so that the crash reporter client can pick it up and use it together with the main process dump.
- Assuming we can fix the above and have both the main process minidump and the child process minidump, and they're paired, we need to modify the crash reporter client to handle both. Right now the crash reporter client assumes there's only one minidump to be sent, the one that's passed by the exception handler. Assuming we have a way to find the child process minidump (for example from an annotation), we'd rename the browser minidump like we do for hangs and send them both. Socorro knows how to handle these cases.
The result should end up similar to regular hangs (see [1] for an example). The signature and stack trace on crash-stats will be those from the child process, but we'd prepend the shutdown hang crash reason to the signature. The main process dump would then be available in the "Raw" tab as the paired minidump.
Liz, Aaron, would this be acceptable? My main gripe is that this would change the crash signature of all existing shutdown hangs. On the flip side, the new signature might be more useful; I believe that with the current system we're often lumping different shutdown hangs under the same signature.
[1] https://crash-stats.mozilla.com/report/index/40e2fb1f-79b7-4e9c-b2b8-b8c3d0190306
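The annotation-based pairing described in this comment could be sketched roughly as follows. This is standalone C++, not the actual crash reporter client: the annotation key `PairedChildMinidump`, the `Upload` struct, the `PlanUploads` function, and the two form field names are all assumptions made for illustration. The only behaviour taken from the comment is that the child dump becomes the primary one and the main process dump is sent alongside it as the paired (browser) dump.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One file to attach to the crash report submission (hypothetical type).
struct Upload {
  std::string fieldName;  // form field the dump would be submitted under
  std::string path;       // path of the minidump on disk
};

// Hypothetical annotation key written by the dying main process to point
// at the child minidump it captured before crashing.
static const char kChildMinidumpAnnotation[] = "PairedChildMinidump";

// Decide which minidumps to send. Without the annotation, only the dump
// passed in by the exception handler is sent. With it, the child dump is
// treated as the primary dump and the main process dump as the paired one.
std::vector<Upload> PlanUploads(
    const std::string& aMainProcessDump,
    const std::map<std::string, std::string>& aAnnotations) {
  std::vector<Upload> uploads;

  auto it = aAnnotations.find(kChildMinidumpAnnotation);
  if (it == aAnnotations.end() || it->second.empty()) {
    uploads.push_back({"upload_file_minidump", aMainProcessDump});
    return uploads;
  }

  // Field names are assumed for illustration only.
  uploads.push_back({"upload_file_minidump", it->second});
  uploads.push_back({"upload_file_minidump_browser", aMainProcessDump});
  return uploads;
}
```

Keeping the decision in one pure function like this would make the single-dump and paired-dump cases easy to test separately from the actual submission code.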
Comment 4•6 years ago
Scratch most of my previous comment; there should be a better way to do this. We already have a way to grab a minidump when we kill a stuck content process [1]. There are two problems with using that. First, in this case the main process is also stuck, so it will never trigger that code (unless we do it explicitly). Second, we've explicitly disabled minidump generation during shutdown because it was leading to other shutdown hangs.
So maybe an easier way to do this is the following:
- Implement the RAII marker so that it flags the content process as "wants a minidump on shutdown".
- In the shutdown terminator code, explicitly kill all content processes in a way that uses ContentParent::KillHard() before killing the parent process. However, only the content process that has been marked will get a minidump, even if we're already shutting down.
If I could get this to work, I wouldn't have to worry about dealing with the paired minidumps. They would be assembled and detected correctly upon startup, giving the user the chance to send them. The biggest problem, which also applies to explicitly writing the minidumps as described in comment 3, is that all of this is happening on the shutdown terminator's thread. We cannot rely on the main thread, and I don't think any of this code is safe to call outside of the main thread, but the main thread's stuck, so we've got a chicken-and-egg problem.
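The decision logic of the two steps above can be sketched as follows. This is a standalone C++ model, not Gecko code: `ContentProcessInfo`, `KillAction`, and `PlanContentProcessKills` are hypothetical names, and the sketch deliberately covers only which processes get a minidump. It does not address the thread-safety problem just described.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bookkeeping for one live content process.
struct ContentProcessInfo {
  int pid;
  // Set while an RAII marker for this process is on the stack.
  bool wantsMinidumpOnShutdown;
};

// What the shutdown terminator would do to one content process.
struct KillAction {
  int pid;
  bool takeMinidump;
};

// Build the list of hard kills to perform before the parent process is
// terminated. Minidump generation stays disabled during shutdown except
// for processes that were explicitly marked.
std::vector<KillAction> PlanContentProcessKills(
    const std::vector<ContentProcessInfo>& aChildren) {
  std::vector<KillAction> plan;
  plan.reserve(aChildren.size());
  for (const ContentProcessInfo& child : aChildren) {
    plan.push_back({child.pid, child.wantsMinidumpOnShutdown});
  }
  return plan;
}
```

In this model the terminator would walk the returned plan, hard-kill each content process (capturing a minidump only where `takeMinidump` is set), and only then kill the parent.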
Comment 5•6 years ago
Another quick update: it seems that what I described above is not feasible because, in the scenario of bug 1419108, the main process' main thread is stuck and we rely on it at least partially for minidump generation. It might still be possible to generate a child process minidump from a separate thread (the actual minidump writer has its own thread), but I haven't figured out how to do it during shutdown yet.
Comment 6•6 years ago
Cleaning up flags for our releases in flight.
Comment 7•6 years ago
I've investigated all possible ways of doing this, but I couldn't find one that would work in this scenario (i.e. with the main process' main thread blocked). To make something like this work we would need fully out-of-process minidump generation and crash handling, which is something we'd like to implement but which will require significant time and effort. I'm unassigning myself for now as this is not feasible in a short timeframe.
Updated•3 years ago