Open Bug 1822498 Opened 2 years ago Updated 6 months ago

When handling a main process crash make sure that further exceptions don't generate more minidumps

Categories

(Toolkit :: Crash Reporting, task)

task

Tracking

()

People

(Reporter: gsvelto, Unassigned)

References

(Blocks 1 open bug)

Details

As per title, this is a problem that's similar to bug 1434933 but trickier to handle. The fix for bug 1434933 prevents us from generating two crash reports from the same crashing child process but only works if:

  • The exception handler is set
  • Windows Error Reporting is not involved

The latter generates problems with the main process too (see bug 1542485 comment 34) as WER will generate minidumps even when the exception handler has been unset.

A robust solution to this problem requires us to atomically set a value indicating that a process is crashing and we're handling the crash. This value would need to be read by all paths that are capable of generating a crash report (so including WER) to prevent them from generating further reports from the same process.

There's another wrinkle to this: in Breakpad we don't support recursive exception handling as it would make things even more complicated. This means that if we have a bug in the crash reporting code and this bug will cause a crash while handling a previous one then we'll never see it. Since WER lives outside of Firefox processes it was capable of catching these crashes - which was useful in some occasions. Preventing it from catching crashes we're already handling would effectively kill this functionality.

... unless we limit its ability but only under some circumstances. Let's say we store the thread ID of the crashing thread when we start handling a crash. We could ignore crashes coming from other threads but still catch crashes coming from the crashing thread, because that would indicate a bug in the exception handler.

Blocks: qm-shutdown-hangs
No longer blocks: 1542485
Flags: needinfo?(jstutte)

After pondering a bit about this here's what we could do: atomically set a global when entering the exception handler, with the thread ID of the thread executing the exception handler. Once the crash report has been written, set the same global to a known value that indicate the crash was handled. The first part of this would be similar to how we use gEncounteredChildException (see here), in that it would enable us to guarantee only one thread entered the exception handler at a time. The second part would help us with WER: since bug 1831092 landed we could register this global as an annotation, so that WER can read it. WER would then emit a crash only if the exception handler was entered but didn't finish executing.

You are answering faster than I could ask... Are you going to pursue this path? Thanks

Flags: needinfo?(jstutte) → needinfo?(gsvelto)

Yeah, that's the plan. I'm currently working on bug 1620998 which is going to change quite a bit of things around the exception handler, so it sounds like a good time.

Flags: needinfo?(gsvelto)
You need to log in before you can comment on or make changes to this bug.