Use the Windows Error Reporting API to generate minidumps for crashes which cannot be caught using Breakpad
Categories
(Toolkit :: Crash Reporting, enhancement)
Tracking
()
People
(Reporter: gsvelto, Unassigned)
References
(Blocks 1 open bug)
Details
As per title, this will need to leverage the Windows Error Reporting API to generate minidumps for crashes that cannot be caught with Breakpad. There are multiple parts to this work, which I'll file as separate bugs:
- A runtime exception module needs to be written in order to intercept the crashes
- The module needs to be registered by the installer/updater in the Windows registry in order to work
- Gecko must be modified to also register the new exception module at runtime
- The exception module needs to be wired up with the crash reporter client so that it can submit crash reports and restart Firefox in case of full browser crashes. This includes finding a way to generate the required metadata which we normally store in the .extra file
- The exception module needs to be able to talk to Firefox's main process to notify it when it has intercepted child process crashes that could not be handled by Breakpad
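To make the first item more concrete, here is a minimal Rust sketch of the entry point that WER calls in a runtime exception module. The callback name and parameter order follow Microsoft's documented `OutOfProcessExceptionEventCallback` signature. The type aliases, the opaque `WerRuntimeExceptionInformation` stand-in and the `write_minidump_for` helper are assumptions made so the sketch builds without any Windows bindings; it is not the module that was eventually written.

```rust
use std::ffi::c_void;

// Hand-written aliases mirroring the Windows typedefs (assumption: a real
// module would take these from a bindings crate instead).
pub type Hresult = i32;
pub type Bool = i32;

pub const S_OK: Hresult = 0;
pub const E_INVALIDARG: Hresult = 0x8007_0057_u32 as i32;

/// Opaque stand-in for WER_RUNTIME_EXCEPTION_INFORMATION. The real structure
/// carries the process and thread handles, the exception record and the
/// thread context; the sketch never looks inside it.
#[repr(C)]
pub struct WerRuntimeExceptionInformation {
    _opaque: [u8; 0],
}

/// Hypothetical helper: a real module would write the minidump for the
/// crashed process here. This placeholder only reports success.
fn write_minidump_for(_info: *const WerRuntimeExceptionInformation) -> bool {
    true
}

/// Entry point WER calls when a process that registered this module crashes.
/// A real DLL would also have to export this symbol under this exact name.
#[allow(non_snake_case)]
pub unsafe extern "system" fn OutOfProcessExceptionEventCallback(
    _context: *mut c_void,
    exception_information: *const WerRuntimeExceptionInformation,
    ownership_claimed: *mut Bool,
    _event_name: *mut u16,
    _event_name_len: *mut u32,
    _signature_count: *mut u32,
) -> Hresult {
    if exception_information.is_null() || ownership_claimed.is_null() {
        return E_INVALIDARG;
    }

    // The crashed process is suspended while WER runs this callback.
    let _written = write_minidump_for(exception_information);

    // Design choice of this sketch: do not claim the crash, so that WER
    // carries on with its default handling after the dump is written.
    unsafe {
        *ownership_claimed = 0;
    }
    S_OK
}
```

As far as I know, a real module also has to export the matching signature and debugger-launch callbacks, and it only takes effect once it is listed in the registry and registered at runtime through `WerRegisterRuntimeExceptionModule`, which is what the second and third items cover.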
Comment 1•4 years ago
I poked around a bit and I'm not 100% sure you can use this the way you want to. It seems to be primarily focused on allowing you to add some additional context to WER reports. In any event, here are some links I found:
- A sample C++ project using these APIs in a Microsoft GitHub repo
- A Chromium bug where a Google employee experimented with these APIs but found that they didn't work for the specific cases Chromium cared about.
- Assuming you were planning to implement this module in Rust, the winapi crate already has these APIs declared.
- A blog post that talks about experimenting with these APIs, including writing a minidump independently of WER.
Additionally, I ran into this MSDN article about enabling User-Mode dumps in WER, which discusses how you can configure registry keys to ask WER to write minidumps to a local path on disk, including configuring that on a per-application basis. If writing dumps from the runtime exception module doesn't prove feasible, setting these keys in the installer might be a way to still get minidumps from crashes that Firefox can't handle itself and upload them to crash-stats.
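As a rough illustration of the per-application setup, the Rust sketch below builds the `reg add` command lines an installer could run. The `LocalDumps` key and the `DumpFolder`, `DumpCount` and `DumpType` value names come from Microsoft's documentation on collecting user-mode dumps; the function name and the folder and count used in the example are made up for illustration.

```rust
/// Builds the `reg add` commands that ask WER to keep local minidumps for
/// one executable. Hypothetical helper; it only formats strings and does
/// not touch the registry itself.
pub fn local_dumps_commands(exe_name: &str, dump_folder: &str, dump_count: u32) -> Vec<String> {
    // Per-application settings live in a subkey named after the executable.
    let key = format!(
        r"HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\{}",
        exe_name
    );
    vec![
        // Directory where WER stores the dump files.
        format!(
            r#"reg add "{}" /v DumpFolder /t REG_EXPAND_SZ /d "{}" /f"#,
            key, dump_folder
        ),
        // Maximum number of dumps kept in that directory.
        format!(
            r#"reg add "{}" /v DumpCount /t REG_DWORD /d {} /f"#,
            key, dump_count
        ),
        // DumpType 1 requests a minidump (2 would request a full dump).
        format!(r#"reg add "{}" /v DumpType /t REG_DWORD /d 1 /f"#, key),
    ]
}
```

For example, `local_dumps_commands("firefox.exe", r"C:\CrashDumps", 10)` yields three commands targeting the `LocalDumps\firefox.exe` subkey. Writing under `HKLM` needs administrator rights, which is why this would have to happen in the installer.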
Reporter
Comment 2•4 years ago
(In reply to (not currently active) Ted Mielczarek from comment #1)
I poked around a bit and I'm not 100% sure you can use this the way you want to. It seems to be primarily focused on allowing you to add some additional context to WER reports.
I managed to coerce it into writing minidumps via the callbacks in the runtime exception module. I haven't tested every possible exception yet, but it looks quite promising and I hope it can capture most of the crashes that we're missing. A very nice side effect is that it all works out of the crashed process, and the process is suspended while you're in WER, so I'm almost hopeful that I'll be able to disable Breakpad's exception handler once I get it working.
Additionally, I ran into this MSDN article about enabling User-Mode dumps in WER, which discusses how you can configure registry keys to ask WER to write minidumps to a local path on disk, including configuring that on a per-application basis. If writing dumps from the runtime exception module doesn't prove feasible, setting these keys in the installer might be a way to still get minidumps from crashes that Firefox can't handle itself and upload them to crash-stats.
I've considered that, but only as a last resort. The problem is that there's nothing to inform you that the minidump was written so you'd have to poll the target directory after a process crashes hoping to find something.
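To show what that polling would involve, here is a minimal Rust sketch that scans a dump directory for `.dmp` files modified at or after a given time. The function name is hypothetical, and the approach assumes the caller recorded a timestamp before the child process died.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Returns the `.dmp` files in `dir` that were modified at or after `since`,
/// sorted by path. Hypothetical helper for the polling fallback.
pub fn dumps_written_since(dir: &Path, since: SystemTime) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // Only consider files with a .dmp extension, ignoring case.
        let is_dump = path
            .extension()
            .map_or(false, |ext| ext.eq_ignore_ascii_case("dmp"));
        if is_dump && entry.metadata()?.modified()? >= since {
            found.push(path);
        }
    }
    found.sort();
    Ok(found)
}
```

Even with a helper like this, the caller still has to guess how long to keep polling and which dump belongs to which crashed process, which is the weakness described above.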
Comment 3•4 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #0)
As per title, this will need to leverage the Windows Error Reporting API to generate minidumps for crashes that cannot be caught with Breakpad.
Can you say more about which types of crashes can't be caught by Breakpad? Are these just content process startup crashes, or are there other categories as well?
Comment 4•4 years ago
TL;DR: any exception raised via the RaiseFailFastException API.
Control-flow guard and /GS violations both raise exceptions using this API.
I'm not sure whether Gabriele is aware of any other cases...
Reporter
Comment 5•4 years ago
I don't know about specific classes of crashes besides the ones mentioned by Aaron but we know we're missing crashes from exceptions we're supposed to be able to catch. For example:
- We catch heap corruption exception crashes (bug 1633052) but looking at the Windows Error Reporting dashboards we're likely missing 90%+ of those
- We often don't catch stack overflow / stack smashing crashes, possibly because the stack is so borked - or there's so little space left - that even the exception handler can't do its job
- We don't catch many OOMs that originate from within Microsoft libraries (we basically catch OOM crashes only when VirtualAlloc() returns NULL)
I'm sure there are more; this is just what came up from looking at Microsoft's dashboards. Also, if the mechanism proves reliable, we might as well get rid of Breakpad's exception handler.
Reporter
Comment 6•3 years ago
This has been working fine for over a year, taking out the remaining bugs because they're nice-to-have but don't affect the core functionality and closing this bug.
Comment 7•3 years ago
This has been working fine for over a year, taking out the remaining bugs because they're nice-to-have but don't affect the core functionality and closing this bug.
Did you intend to close this or is there anything left?
Reporter
Comment 8•3 years ago
Yes, I wanted to close this but I forgot!