Closed Bug 1682507 Opened 3 years ago Closed 2 years ago

Use the Windows Error Reporting API to generate minidumps for crashes which cannot be caught using Breakpad

Categories: Toolkit :: Crash Reporting
Hardware: Unspecified
OS: Windows
Type: enhancement
Status: RESOLVED FIXED
People: Reporter: gsvelto; Assignee: Unassigned
References: Blocks 1 open bug

Details

As per the title, this will need to leverage the Windows Error Reporting API to generate minidumps for crashes that cannot be caught with Breakpad. There are multiple parts to this work, which I'll file as separate bugs:

  • A runtime exception module needs to be written in order to intercept the crashes
  • The module needs to be registered by the installer/updater in the Windows registry in order to work
  • Gecko must be modified to also register the new exception module at runtime
  • The exception module needs to be wired up with the crash reporter client so that it can submit crash reports and restart Firefox in case of full browser crashes. This includes finding a way to generate the required metadata which we normally store in the .extra file
  • The exception module needs to be able to talk to Firefox's main process to notify it when it has intercepted child process crashes that could not be handled by Breakpad
Depends on: 1682509
Depends on: 1682511
Depends on: 1682514
Depends on: 1682516
Depends on: 1682518
Depends on: 1682520

I poked around a bit and I'm not 100% sure you can use this the way you want to. It seems to be primarily focused on allowing you to add some additional context to WER reports. In any event, here are some links I found:

Additionally, I ran into this MSDN article about enabling User-Mode dumps in WER, which discusses how you can configure registry keys to ask WER to write minidumps to a local path on disk, including configuring that on a per-application basis. If writing dumps from the runtime exception module doesn't prove feasible, setting these keys in the installer might be a way to still get minidumps from crashes that Firefox can't handle itself and upload them to crash-stats.

(In reply to (not currently active) Ted Mielczarek from comment #1)

I poked around a bit and I'm not 100% sure you can use this the way you want to. It seems to be primarily focused on allowing you to add some additional context to WER reports.

I managed to coerce it into writing minidumps via the callbacks in the runtime exception module. I haven't tested every possible exception yet, but it looks quite promising and I hope it can capture most of the crashes that we're missing. A very nice side-effect is that it all runs outside of the crashed process, and while you're in WER the crashed process is suspended, so I'm hopeful that I'll be able to disable Breakpad's exception handler once I get it working.

Additionally, I ran into this MSDN article about enabling User-Mode dumps in WER, which discusses how you can configure registry keys to ask WER to write minidumps to a local path on disk, including configuring that on a per-application basis. If writing dumps from the runtime exception module doesn't prove feasible, setting these keys in the installer might be a way to still get minidumps from crashes that Firefox can't handle itself and upload them to crash-stats.

I've considered that, but only as a last resort. The problem is that nothing informs you that the minidump was written, so you'd have to poll the target directory after a process crashes, hoping to find something.

(In reply to Gabriele Svelto [:gsvelto] from comment #0)

As per title, this will need to leverage the Windows Error Reporting API to generate minidumps for crashes that cannot be caught with Breakpad.

Can you say more about which types of crashes can't be caught by breakpad? Are these just content process startup crashes, or are there other categories as well?

Flags: needinfo?(gsvelto)

TL;DR: any exception raised via the RaiseFailFastException API.

Control Flow Guard (CFG) and /GS violations both raise exceptions using this API.

I'm not sure whether Gabriele is aware of any other cases...

I don't know about specific classes of crashes besides the ones mentioned by Aaron, but we know we're missing crashes from exceptions we're supposed to be able to catch. For example:

  • We catch heap corruption exception crashes (bug 1633052), but judging from the Windows Error Reporting dashboards we're likely missing 90%+ of those
  • We often don't catch stack overflow / stack smashing crashes, possibly because the stack is so borked - or there's so little space left - that even the exception handler can't do its job
  • We don't catch many OOMs that are originating from within Microsoft libraries (we basically catch OOM crashes only when VirtualAlloc() returns NULL)

I'm sure there's more; this is just what came up from looking at Microsoft's dashboards. Also, if the mechanism proves reliable, we might as well get rid of Breakpad's exception handler.

Flags: needinfo?(gsvelto)
Depends on: 1696620
Summary: Using the Windows Error Reporting API to generate minidumps for crashes which cannot be caught using Breakpad → Use the Windows Error Reporting API to generate minidumps for crashes which cannot be caught using Breakpad
Depends on: 1697895
Blocks: 1696590
Depends on: 1705396
Depends on: 1711418
Depends on: 1703761

This has been working fine for over a year. I'm removing the remaining dependencies because they're nice-to-have but don't affect the core functionality, and closing this bug.

No longer depends on: 1682520, 1696620

Did you intend to close this or is there anything left?

Yes, I wanted to close this but I forgot!

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED