Closed
Bug 1457501
Opened 6 years ago
Closed 6 years ago
Mac Crash deadlock triggered by CrashReporter::GetFlatThreadAnnotation() lock acquisition
Categories
(Toolkit :: Crash Reporting, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla61
Tracking | Status | |
---|---|---|
firefox61 | --- | fixed |
People
(Reporter: haik, Assigned: haik)
References
(Blocks 1 open bug)
Details
Attachments
(3 files)
While running the test testing/marionette/harness/marionette_harness/tests/unit/test_crash.py with a change to make the test wait indefinitely for the browser to cleanly exit (see attached), I hit test hangs where Firefox has exited but a content process is stuck in a hung state. This hang is a deadlock similar to bug 1395504, but in a breakpad callback. The problem is that in CrashReporter::GetFlatThreadAnnotation() which is run from the exception handler thread when all other threads have been suspended, we acquire a lock. Acquiring the lock triggers a memory allocation which blocks on a memory allocator lock. The lock is held by another thread which is suspended, hence the deadlock. And without the allocation, we could still deadlock if another thread held the mutex. So we shouldn't try to get the lock in exception context on Mac. This bug is specific to Mac because Windows and Linux implementations of breakpad work differently and don't suspend all threads while still trying to run in the child.
Here are some stacks from an instance of the problem.
Exception handler thread, blocked trying to allocate memory:
thread #5
frame #0 libsystem_kernel.dylib`syscall_thread_switch()
frame #1 libsystem_platform.dylib`_OSSpinLockLockYield()
frame #2 libmozglue.dylib`Mutex::Lock() at Mutex.h:66
frame #3 libmozglue.dylib`AutoLock<Mutex>::AutoLock() at Mutex.h:129
frame #4 libmozglue.dylib`AutoLock<Mutex>::AutoLock() at Mutex.h:127
frame #5 libmozglue.dylib`arena_t::MallocSmall() at mozjemalloc.cpp:2940
frame #6 libmozglue.dylib`arena_t::Malloc() at mozjemalloc.cpp:3000
frame #7 libmozglue.dylib`BaseAllocator::calloc() at mozjemalloc.cpp:4179
frame #8 libmozglue.dylib`Allocator<MozJemallocBase>::calloc() at malloc_decls.h:38
frame #9 libmozglue.dylib`Allocator<ReplaceMallocBase>::calloc() at malloc_decls.h:38
frame #10 libmozglue.dylib`::calloc(size_t, size_t)() at malloc_decls.h:38
frame #11 libnss3.dylib`PR_Calloc() at prmem.c:443
frame #12 libnss3.dylib`pt_AttachThread() at ptthread.c:265
frame #13 libnss3.dylib`PR_GetCurrentThread() at ptthread.c:622
frame #14 libnss3.dylib`PR_GetThreadPrivate() at prtpd.c:204
frame #15 XUL`mozilla::BlockingResourceBase::ResourceChainFront() at BlockingResourceBase.h:157
frame #16 XUL`mozilla::BlockingResourceBase::CheckAcquire() at BlockingResourceBase.cpp:279
frame #17 XUL`mozilla::OffTheBooksMutex::Lock() at BlockingResourceBase.cpp:384
frame #18 XUL`mozilla::StaticMutex::Lock() at StaticMutex.h:44
frame #19 XUL`mozilla::BaseAutoLock<mozilla::StaticMutex>::BaseAutoLock() at Mutex.h:166
frame #20 XUL`mozilla::BaseAutoLock<mozilla::StaticMutex>::BaseAutoLock() at Mutex.h:163
frame #21 XUL`CrashReporter::GetFlatThreadAnnotation() at ThreadAnnotation.cpp:235
frame #22 XUL`CrashReporter::PrepareChildExceptionTimeAnnotations() at nsExceptionHandler.cpp:1300
frame #23 XUL`CrashReporter::ChildFilter() at nsExceptionHandler.cpp:1426
frame #24 XUL`google_breakpad::ExceptionHandler::WriteMinidumpWithException() at exception_handler.cc:373
frame #25 XUL`google_breakpad::ExceptionHandler::WaitForMessage() at exception_handler.cc:573
frame #26 libsystem_pthread.dylib`_pthread_body()
frame #27 libsystem_pthread.dylib`_pthread_start()
frame #28 libsystem_pthread.dylib`thread_start()
Main thread, suspended by exception handler, holding allocator lock:
thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0 libmozglue.dylib`RedBlackTree<arena_chunk_map_t, ArenaRunTreeTrait>::TreeNode::Left() at rb.h:172
frame #1 libmozglue.dylib`RedBlackTree<arena_chunk_map_t, ArenaRunTreeTrait>::Remove() at rb.h:577
frame #2 libmozglue.dylib`RedBlackTree<arena_chunk_map_t, ArenaRunTreeTrait>::Remove() at rb.h:144
frame #3 libmozglue.dylib`arena_t::GetNonFullBinRun() at mozjemalloc.cpp:2790
frame #4 libmozglue.dylib`arena_t::MallocSmall() at mozjemalloc.cpp:2943
frame #5 libmozglue.dylib`arena_t::Malloc() at mozjemalloc.cpp:3000
frame #6 libmozglue.dylib`BaseAllocator::calloc() at mozjemalloc.cpp:4179
frame #7 libmozglue.dylib`Allocator<MozJemallocBase>::calloc() at malloc_decls.h:38
frame #8 libmozglue.dylib`Allocator<ReplaceMallocBase>::calloc() at malloc_decls.h:38
frame #9 libmozglue.dylib`::calloc(size_t, size_t)() at malloc_decls.h:38
frame #10 libmozglue.dylib`zone_calloc() at zone.c:143
frame #11 libsystem_malloc.dylib`malloc_zone_calloc()
frame #12 libsystem_malloc.dylib`calloc()
frame #13 ColorSync`CMMMemMgr::New()
frame #14 ColorSync`CMMBase::NewInternal()
frame #15 ColorSync`CMMProfile::MakeTag()
frame #16 ColorSync`CMMProfile::GetTag()
frame #17 ColorSync`CMMProfile::Usable()
frame #18 ColorSync`DoValidateProfile()
frame #19 ColorSync`AppleCMMValidateProfile()
frame #20 ColorSync`create()
frame #21 ColorSync`ColorSyncVerifySRGBData()
frame #22 CoreGraphics`CGColorSpaceCreateWithICCData()
frame #23 SkyLight`CGSColorSpaceRegistryCopyColorSpace()
frame #24 SkyLight`SLSCopyDisplayColorSpace()
frame #25 AppKit`-[NSCGSDisplay initWithDisplayID:flipOffset:]()
frame #26 AppKit`___NSCGSCreateDisplaysFromDisplayIDsUsingPredicate_block_invoke()
frame #27 AppKit`_NSCGSCreateArrayUsingBlock()
frame #28 AppKit`+[NSCGSDisplay uniqueDisplays]()
frame #29 AppKit`_NSScreenConfigurationUpdateSharedInfoForReason()
frame #30 AppKit`___NSScreenConfigurationEnsureInitialUpdateOccurred_block_invoke()
frame #31 libdispatch.dylib`_dispatch_client_callout()
frame #32 libdispatch.dylib`dispatch_once_f()
frame #33 AppKit`-[NSApplication(ScreenHandling) _registerForDisplayChangedNotifications]()
frame #34 AppKit`-[NSApplication init]()
frame #35 AppKit`+[NSApplication sharedApplication]()
frame #36 XUL`nsAppShell::Init() at nsAppShell.mm:314
frame #37 XUL`nsAppShellInit() at nsAppShellSingleton.h:45
frame #38 XUL`nsComponentManagerImpl::KnownModule::Load() at nsComponentManager.cpp:726
frame #39 XUL`nsFactoryEntry::GetFactory() at nsComponentManager.cpp:1748
frame #40 XUL`nsComponentManagerImpl::CreateInstance() at nsComponentManager.cpp:964
frame #41 XUL`nsComponentManagerImpl::GetService() at nsComponentManager.cpp:1209
frame #42 XUL`CallGetService() at nsComponentManagerUtils.cpp:56
frame #43 XUL`nsGetServiceByCID::operator()() at nsComponentManagerUtils.cpp:253
frame #44 XUL`nsCOMPtr<nsIAppShell>::assign_from_gs_cid() at nsCOMPtr.h:1198
frame #45 XUL`nsCOMPtr<nsIAppShell>::nsCOMPtr() at nsCOMPtr.h:559
frame #46 XUL`nsCOMPtr<nsIAppShell>::nsCOMPtr() at nsCOMPtr.h:556
frame #47 XUL`XRE_RunAppShell() at nsEmbedFunctions.cpp:850
frame #48 XUL`mozilla::ipc::MessagePumpForChildProcess::Run() at MessagePump.cpp:269
frame #49 XUL`MessageLoop::RunInternal() at message_loop.cc:326
frame #50 XUL`MessageLoop::RunHandler() at message_loop.cc:319
frame #51 XUL`MessageLoop::Run() at message_loop.cc:299
frame #52 XUL`XRE_InitChildProcess() at nsEmbedFunctions.cpp:719
frame #53 XUL`mozilla::BootstrapImpl::XRE_InitChildProcess() at Bootstrap.cpp:69
frame #54 plugin-container`content_process_main() at plugin-container.cpp:50
frame #55 plugin-container`main() at MozillaRuntimeMain.cpp:25
frame #56 plugin-container`start()
Assignee | ||
Updated•6 years ago
|
Assignee: nobody → haftandilian
Priority: -- → P1
Assignee | ||
Comment 1•6 years ago
|
||
Comment 2•6 years ago
|
||
More specifically: it's not safe to malloc in the exception handler, because one of the things that can cause a crash is heap corruption, so trying to malloc could cause us to crash again! It's unfortunate that `PR_GetCurrentThread` can malloc.
I wonder if we could change BlockingResourceBase to use a `static MOZ_THREAD_LOCAL(BlockingResourceBase*) sResourceAcqnChainFront` member instead of `PR_{Set,Get}ThreadPrivate`?
That would involve changing this:
https://dxr.mozilla.org/mozilla-central/rev/63a0e2f626febb98d87d2543955ab99a653654ff/xpcom/threads/BlockingResourceBase.h#297
To look more like this:
https://searchfox.org/mozilla-central/rev/78dbe34925f04975f16cb9a5d4938be714d41897/dom/script/ScriptSettings.cpp#29
and replacing the `PR_{Get,Set}CurrentThread` usage with `sResourceAcqnChainFront.{get,set}`.
Comment 3•6 years ago
|
||
I'm not super familiar with how our thread local stuff works, but we're using `MOZ_THREAD_LOCAL` in plenty of other places and it seems like the thing we need here.
Assignee | ||
Comment 4•6 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #3)
> I'm not super familiar with how our thread local stuff works, but we're
> using `MOZ_THREAD_LOCAL` in plenty of other places and it seems like the
> thing we need here.
Even if acquiring the StaticMutex didn't trigger allocations, it's still not safe to do in exception context because a suspended thread may hold the lock. That would cause deadlock because the exception thread would be waiting for a suspended thread to release a lock which would never happen.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 9•6 years ago
|
||
mozreview-review |
Comment on attachment 8971644 [details]
Bug 1457501 - Part 1 - Mac Crash deadlock triggered by CrashReporter::GetFlatThreadAnnotation() lock acquisition
https://reviewboard.mozilla.org/r/240414/#review246608
Looks good to me. While I don't like seeing more platform-specific code being added to these code paths I don't see an alternative method of fixing this.
Attachment #8971644 -
Flags: review?(gsvelto) → review+
Comment 10•6 years ago
|
||
mozreview-review |
Comment on attachment 8971645 [details]
Bug 1457501 - Part 2 - Remove unused type DeleteWithLock
https://reviewboard.mozilla.org/r/240416/#review246610
Nice catch, the code that used this was backed out and never reintroduced. LGTM.
Attachment #8971645 -
Flags: review?(gsvelto) → review+
Comment 11•6 years ago
|
||
Pushed by haftandilian@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bba71ee5fb46
Part 1 - Mac Crash deadlock triggered by CrashReporter::GetFlatThreadAnnotation() lock acquisition r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/911c930bc055
Part 2 - Remove unused type DeleteWithLock r=gsvelto
Comment 12•6 years ago
|
||
Backed out 2 changesets (bug 1457501) for build bustage. CLOSED TREE
Log:
https://treeherder.mozilla.org/logviewer.html#?job_id=176362508&repo=autoland&lineNumber=24665
[task 2018-04-30T22:45:34.310Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp:82:8: error: 'MacCrashReporterLock' does not name a type
[task 2018-04-30T22:45:34.310Z] 22:45:34 INFO - static MacCrashReporterLock sMutex;
[task 2018-04-30T22:45:34.310Z] 22:45:34 INFO - ^~~~~~~~~~~~~~~~~~~~
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp: In function 'void CrashReporter::{anonymous}::ThreadLocalDestructor(void*)':
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp:210:30: error: 'sMutex' was not declared in this scope
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - CrashReporterAutoLock lock(sMutex);
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - ^~~~~~
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp: In destructor 'CrashReporter::{anonymous}::ThreadAnnotationSpan::~ThreadAnnotationSpan()':
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp:222:3: error: 'sMutex' was not declared in this scope
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - sMutex.AssertCurrentThreadOwns();
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - ^~~~~~
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp: In function 'void CrashReporter::InitThreadAnnotation()':
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp:233:30: error: 'sMutex' was not declared in this scope
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - CrashReporterAutoLock lock(sMutex);
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - ^~~~~~
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp: In function 'void CrashReporter::SetCurrentThreadName(const char*)':
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp:258:30: error: 'sMutex' was not declared in this scope
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - CrashReporterAutoLock lock(sMutex);
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - ^~~~~~
[task 2018-04-30T22:45:34.311Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp: In function 'void CrashReporter::GetFlatThreadAnnotation(const std::function<void(const char*)>&, bool)':
[task 2018-04-30T22:45:34.312Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp:294:5: error: 'sMutex' was not declared in this scope
[task 2018-04-30T22:45:34.312Z] 22:45:34 INFO - sMutex.Lock();
[task 2018-04-30T22:45:34.312Z] 22:45:34 INFO - ^~~~~~
[task 2018-04-30T22:45:34.312Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp:305:5: error: 'sMutex' was not declared in this scope
[task 2018-04-30T22:45:34.312Z] 22:45:34 INFO - sMutex.Unlock();
[task 2018-04-30T22:45:34.312Z] 22:45:34 INFO - ^~~~~~
[task 2018-04-30T22:45:34.312Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp: In function 'void CrashReporter::ShutdownThreadAnnotation()':
[task 2018-04-30T22:45:34.312Z] 22:45:34 INFO - /builds/worker/workspace/build/src/toolkit/crashreporter/ThreadAnnotation.cpp:311:30: error: 'sMutex' was not declared in this scope
[task 2018-04-30T22:45:34.312Z] 22:45:34 INFO - CrashReporterAutoLock lock(sMutex);
[task 2018-04-30T22:45:34.312Z] 22:45:34 INFO - ^~~~~~
[task 2018-04-30T22:45:34.312Z] 22:45:34 INFO - At global scope:
[task 2018-04-30T22:45:34.312Z] 22:45:34 INFO - cc1plus: error: unrecognized command line option '-Wno-implicit-fallthrough' [-Werror]
[task 2018-04-30T22:45:34.313Z] 22:45:34 INFO - cc1plus: all warnings being treated as errors
[task 2018-04-30T22:45:34.313Z] 22:45:34 INFO - /builds/worker/workspace/build/src/config/rules.mk:1024: recipe for target 'Unified_cpp_crashreporter0.o' failed
[task 2018-04-30T22:45:34.313Z] 22:45:34 INFO - make[4]: *** [Unified_cpp_crashreporter0.o] Error 1
[task 2018-04-30T22:45:34.313Z] 22:45:34 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/toolkit/crashreporter'
[task 2018-04-30T22:45:34.313Z] 22:45:34 INFO - /builds/worker/workspace/build/src/config/recurse.mk:73: recipe for target 'toolkit/crashreporter/target' failed
[task 2018-04-30T22:45:34.313Z] 22:45:34 INFO - make[3]: *** [toolkit/crashreporter/target] Error 2
[task 2018-04-30T22:45:34.313Z] 22:45:34 INFO - make[3]: *** Waiting for unfinished jobs....
[task 2018-04-30T22:45:34.313Z] 22:45:34 INFO - make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/media/mtransport/third_party/nICEr/nicer_nicer'
[task 2018-04-30T22:45:34.313Z] 22:45:34 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/media/mtransport/third_party/nICEr/nicer_nicer'
Push with fails:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=911c930bc0558545ef3b49294b81fa3ec0237b9a
Backout:
https://hg.mozilla.org/integration/autoland/rev/072c882dccce90b2b2b3e13442dab17752855947
Flags: needinfo?(haftandilian)
Assignee | ||
Comment 13•6 years ago
|
||
(In reply to Dorel Luca [:dluca] from comment #12)
> Backed out 2 changesets (bug 1457501) for build bustage. CLOSED TREE
Sorry, this fails to compile on Linux due to a typo. The fix is below. I'll fix this and test the build on Linux/Mac/Windows and then re-land.
$ hg diff
diff --git a/toolkit/crashreporter/ThreadAnnotation.cpp b/toolkit/crashreporter/ThreadAnnotation.cpp
--- a/toolkit/crashreporter/ThreadAnnotation.cpp
+++ b/toolkit/crashreporter/ThreadAnnotation.cpp
@@ -74,17 +74,17 @@ typedef mozilla::BaseAutoLock<MacCrashRe
typedef MacCrashReporterLock CrashReporterLockType;
#else /* !XP_MACOSX */
// Use StaticMutex for locking
typedef StaticMutexAutoLock CrashReporterAutoLock;
typedef StaticMutex CrashReporterLockType;
#endif /* XP_MACOSX */
// Protects access to sInitialized and sThreadAnnotations.
-static MacCrashReporterLock sMutex;
+static CrashReporterLockType sMutex;
Flags: needinfo?(haftandilian)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 16•6 years ago
|
||
hg error in cmd: hg pull upstream: pulling from https://hg.mozilla.org/integration/autoland
searching for changes
abort: HTTP Error 500: Internal Server Error
Comment 17•6 years ago
|
||
Pushed by haftandilian@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c287079aaf7f
Part 1 - Mac Crash deadlock triggered by CrashReporter::GetFlatThreadAnnotation() lock acquisition r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/e4216d889836
Part 2 - Remove unused type DeleteWithLock r=gsvelto
Comment 18•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c287079aaf7f
https://hg.mozilla.org/mozilla-central/rev/e4216d889836
Status: NEW → RESOLVED
Closed: 6 years ago
status-firefox61:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in
before you can comment on or make changes to this bug.
Description
•