Closed Bug 1878428 Opened 4 months ago Closed 3 months ago

WriteBootargsStream invokes moz_xmalloc while other threads are suspended

Categories

(Toolkit :: Crash Reporting, defect)

defect

Tracking

()

RESOLVED FIXED
125 Branch
Tracking Status
firefox-esr115 125+ fixed
firefox122 --- wontfix
firefox123 --- wontfix
firefox124 --- wontfix
firefox125 --- fixed

People

(Reporter: nika, Assigned: gsvelto)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

I discovered this by encountering a full browser hang on macOS and sampling the browser. I discovered the browser is fully suspended with this stack on one of the threads:

    2215 Thread_21094859: Breakpad ExceptionHandler
      2215 thread_start  (in libsystem_pthread.dylib) + 8  [0x181373e3c]
        2215 _pthread_start  (in libsystem_pthread.dylib) + 136  [0x181379034]
          2215 google_breakpad::ExceptionHandler::WaitForMessage(void*)  (in XUL) + 320  [0x119ff6310]
            2215 google_breakpad::ExceptionHandler::WriteMinidumpWithException(int, long long, long long, __darwin_ucontext64*, unsigned int, unsigned int, bool, bool)  (in XUL) + 516  [0x11d55a6a4]
              2215 google_breakpad::MinidumpGenerator::Write(char const*)  (in XUL) + 264  [0x11d55a3d4]
                2215 google_breakpad::MinidumpGenerator::WriteBootargsStream(MDRawDirectory*)  (in XUL) + 200  [0x11d55bb10]
                  2215 moz_xmalloc  (in libmozglue.dylib) + 40  [0x100fdfc48]
                    2215 malloc  (in libmozglue.dylib) + 464  [0x100fb04c8]
                      2215 _os_unfair_lock_lock_slow  (in libsystem_platform.dylib) + 208  [0x1813a570c]
                        2215 __ulock_wait  (in libsystem_kernel.dylib) + 8  [0x18133a66c]

Another thread held the jemalloc lock, and was suspended, meaning the browser fully deadlocked.

I am guessing the malloc call is being invoked by https://searchfox.org/mozilla-central/rev/34ba7989ae53b9112d543ce39d350b79dc2a16b4/toolkit/crashreporter/breakpad-client/mac/handler/minidump_generator.cc#2051, which is attempting to create a std::vector while threads are suspended.

Set release status flags based on info from the regressing bug 1833680

:smichaud, since you are the author of the regressor, bug 1833680, could you take a look?

For more information, please visit BugBot documentation.

Flags: needinfo?(smichaud)

I can't for at least a week. I'm in the hospital, waiting for my right foot to heal enough for me to be able to walk on it. It was badly infected, and I had an operation. The prognosis is good, but I'm away from all my stuff until I get home. Someone else needs to take this bug if you want quicker action.

Flags: needinfo?(smichaud)

Gabriele, can we please get Priority/Severity settings applied to this report?

Flags: needinfo?(gsvelto)
Severity: -- → S3
Flags: needinfo?(gsvelto)

While investigating this I found at least another issue. We're doing an allocation here which we shouldn't be doing either; at least not with the regular allocator.

Assignee: nobody → gsvelto
Status: NEW → ASSIGNED

This removes all the memory allocations which we were doing while writing a
minidump of the crashed process from within the process itself. Under these
conditions all threads are stopped save for the minidump writer; so if another
thread owns a mutex guarding the memory allocator we could deadlock by doing
allocations.

This patch avoids allocating strings entirely and uses Breakpad's alternative
allocator to sidestep the issue where we cannot avoid allocations.

Set release status flags based on info from the regressing bug 1833680

Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/eeb994f89086
Prevent a deadlock while doing in-process minidump generation r=spohl
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 125 Branch

The patch landed in nightly and beta is affected.
:gsvelto, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox124 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(gsvelto)
Flags: needinfo?(gsvelto)

Comment on attachment 9380398 [details]
Bug 1878428 - Prevent a deadlock while doing in-process minidump generation r=spohl

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: This is an issue which could mask stability issues and prevent the user from reporting them, unfortunately this is a regression which we introduced directly into 115 so it's always been in ESR.
  • User impact if declined: This can lead to Firefox hanging completely in response to a crash.
  • Fix Landed on Version: 125
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): We only change the memory allocator used when retrieving some data from a crashed browser process, this code doesn't affect regular behavior.
Attachment #9380398 - Flags: approval-mozilla-esr115?

Comment on attachment 9380398 [details]
Bug 1878428 - Prevent a deadlock while doing in-process minidump generation r=spohl

Approved for 115.10esr.

Attachment #9380398 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: