Closed Bug 1740078 Opened 3 years ago Closed 2 years ago

Hang in main process under CrashReporter::CreateMinidumpsAndPair

Categories

(Toolkit :: Crash Reporting, defect, P2)

x86_64
macOS

Tracking

()

RESOLVED FIXED
101 Branch
Tracking Status
firefox101 --- fixed

People

(Reporter: mccr8, Assigned: gsvelto)

Attachments

(2 files)

I was using Firefox Nightly on macOS today, and immediately after I clicked on a link in Twitter or something, Firefox started beachballing. It sat in that state for a while. Eventually I went into Activity Monitor to kill the process, and the main process was shown as not responding. I sampled the process, and the main thread appears to be sitting in CrashReporter::CreateMinidumpsAndPair(). When I killed the main process, it brought up a macOS crash report, which I submitted. I'll attach an excerpt from that. about:crashes doesn't show anything.

Flags: needinfo?(gsvelto)

This is a part of the macOS minidump generator which I hadn't looked into yet and, quite frankly, it's scary. It synchronizes the main thread requesting a minidump with the exception handler thread writing it by doing the following:

  1. The main thread locks a mutex; this should always succeed.
  2. The main thread sends a message over a Mach port to the exception handler thread; this is a non-blocking operation.
  3. The main thread locks the mutex from step 1 again; this should always block because the mutex is already locked.
  4. The exception handler thread receives the message and writes out a minidump.
  5. The exception handler thread unlocks the mutex.
  6. The main thread obtains the lock on the mutex and proceeds to retrieve the minidump.

This is rather scary but should be race-free - if the lock behaves. In the macOS crash report the exception handler thread is waiting on messages and the main thread is waiting on the lock, so we've somehow gotten into a situation where the two are deadlocked. We've already seen macOS locks misbehaving in unpleasant ways (cough bug 1676343 cough), so that could be part of the problem. Alternatively there might be a race elsewhere: a third thread, the crash generation thread, could be involved if a child process requested a minidump at the same time. I have no idea how the three threads would interact in that scenario, but I'll probably have to figure it out to get to the bottom of this.
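
For anyone trying to follow the six steps above, here is a minimal sketch of the handshake in Python. It is not the Breakpad code. It assumes a `threading.Lock` in place of the mutex (Python allows a lock to be released by a thread other than the one that acquired it, which is the property the protocol relies on) and a `queue.Queue` in place of the Mach port. The names `handshake`, `requests`, `request_minidump` and `exception_handler_thread` are made up for illustration, and the timeout on the second acquire is an addition of mine; the protocol as described waits forever.

```python
import queue
import threading

handshake = threading.Lock()   # stands in for the mutex from step 1
requests = queue.Queue()       # stands in for the Mach port
result = {}                    # where the handler leaves the "minidump"


def exception_handler_thread():
    # Step 4: receive the request and "write out" a minidump.
    request = requests.get()
    result["minidump"] = f"dump for {request}"
    # Step 5: unlock the mutex that the requesting thread locked.
    handshake.release()


def request_minidump(name, timeout=5.0):
    # Step 1: lock the mutex; it is free, so this succeeds immediately.
    handshake.acquire()
    # Step 2: send the request; this does not block.
    requests.put(name)
    # Step 3: lock again; this blocks until the handler releases in step 5.
    # The timeout is an assumption added so the sketch cannot hang forever.
    if not handshake.acquire(timeout=timeout):
        raise TimeoutError("handler never released the lock")
    # Step 6: we hold the lock again and can retrieve the minidump.
    handshake.release()  # leave the lock free for the next request
    return result.pop("minidump")


handler = threading.Thread(target=exception_handler_thread, daemon=True)
handler.start()
dump = request_minidump("main")
handler.join()
```

If the handler never gets to step 5, the requesting thread stays parked in step 3, which matches what the sample of the hung process shows.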

Severity: -- → S2
Flags: needinfo?(gsvelto)
Priority: -- → P2

My analysis in comment 1 was probably wrong. This is more likely to be caused by other threads in the main process being paused; see bug 1764230 comment 3. A potential quick fix would be to not allocate anything in CrashReporter::PairedDumpCallback().
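
A rough illustration of why allocating in the callback is risky while other threads are paused, under the assumption (mine, not confirmed in this bug) that a paused thread can be stopped while holding the allocator's internal lock. Everything here is hypothetical: `allocator_lock` is a stand-in for that internal lock, `worker` plays the paused thread, and the timeout only exists so the sketch terminates where the real callback would block indefinitely.

```python
import threading

allocator_lock = threading.Lock()  # hypothetical stand-in for the heap lock
paused = threading.Event()         # set once the worker is "suspended"
resume = threading.Event()         # set to let the worker continue

PREALLOCATED = bytearray(64)       # buffer set up before any dump is requested


def worker():
    # The worker is "suspended" in the middle of an allocation,
    # i.e. while it still holds the allocator lock.
    with allocator_lock:
        paused.set()
        resume.wait()


def allocating_callback(timeout=0.2):
    # Needs the allocator lock, which the suspended worker holds.
    # Without the timeout this would never return.
    if not allocator_lock.acquire(timeout=timeout):
        return False
    try:
        _scratch = bytearray(64)
        return True
    finally:
        allocator_lock.release()


def non_allocating_callback():
    # Only touches memory that was allocated ahead of time.
    PREALLOCATED[:4] = b"dump"
    return True


thread = threading.Thread(target=worker)
thread.start()
paused.wait()                          # worker now holds the lock
allocating_ok = allocating_callback()  # cannot make progress
preallocated_ok = non_allocating_callback()
resume.set()
thread.join()
```

The callback that sticks to preallocated memory completes regardless of what the paused thread was doing, which is the idea behind the quick fix.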

Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8f49a216e205
Do not allocate memory in Breakpad's minidump generation callbacks to prevent deadlocks r=KrisWright
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 101 Branch
