Open Bug 1408629 Opened 7 years ago Updated 2 years ago

Crash in shutdownhang | libsystem_kernel.dylib@0x1be7e

Categories

(Core :: XPCOM, defect)

Unspecified
macOS
defect

People

(Reporter: jseward, Unassigned)

Details

(Keywords: crash)

Crash Data

This bug was filed from the Socorro interface and is 
report bp-a14e2057-5361-42fe-aea9-75d700171013.
=============================================================

This is topcrash #13 in the OSX nightly 20171012105833.
Flags: needinfo?(nfroyd)
Uh.  Usually I can tell when something else is hanging in shutdown crashes, but nothing obviously jumps out.  Two threads look sort of suspicious.  The first is the crashing main thread, which is hanging during some sort of worker cleanup.  I guess it's waiting on thread 27:

0 	libsystem_kernel.dylib 	libsystem_kernel.dylib@0x1be7e 	
1 	libmozglue.dylib 	<name omitted> 	mozglue/misc/ConditionVariable_posix.cpp:118
2 	XUL 	mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) 	xpcom/threads/CondVar.h:68
3 	XUL 	(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() 	dom/workers/RuntimeService.cpp:2864
4 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1037
5 	XUL 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/threads/nsThreadUtils.cpp:524
6 	XUL 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:368
7 	XUL 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:326
8 	XUL 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:425
9 	libnss3.dylib 	_pt_root 	nsprpub/pr/src/pthreads/ptthread.c:216

and that thread just isn't getting the message?  Did we just fail to poke that thread appropriately or something?  ni baku for ideas there.

The other thread that looks sort of plausible is thread 23:

0 	libsystem_kernel.dylib 	libsystem_kernel.dylib@0x12e76 	
1 	XUL 	google_breakpad::ReceivePort::WaitForMessage(google_breakpad::MachReceiveMessage*, unsigned int) 	toolkit/crashreporter/google-breakpad/src/common/mac/MachIPC.mm:249
2 	XUL 	google_breakpad::CrashGenerationServer::WaitForOneMessage() 	toolkit/crashreporter/breakpad-client/mac/crash_generation/crash_generation_server.cc:102
3 	XUL 	google_breakpad::CrashGenerationServer::WaitForMessages(void*) 	toolkit/crashreporter/breakpad-client/mac/crash_generation/crash_generation_server.cc:96
Ø 4 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x36c0 	
Ø 5 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x356c 	
Ø 6 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x2c5c 	
7 	XUL 	XUL@0x359ce2f

(failure to symbolicate that XUL symbol doesn't seem good...)  I have no idea what's going on there, ni to Ted if his Breakpad knowledge can generate some insight there.
Flags: needinfo?(ted)
Flags: needinfo?(nfroyd)
Flags: needinfo?(amarchesini)
For the record: the missing libsystem_kernel.dylib symbols are because this is macOS 10.13, and the process I have set up to scrape system symbols for macOS only handles Apple's update packages from their update servers, which apparently don't include full major version updates like this one. I'll try to backfill those for sanity's sake, but they're not likely to be super interesting.

> The other thread that looks sort of plausible is thread 23:

That's the thread that waits for crash messages from child processes. It gets signaled and shut down in `OOPDeinit`, which gets called from `UnsetExceptionHandler` very late in shutdown. Shouldn't be a problem.

> (failure to symbolicate that XUL symbol doesn't seem good...)

I don't think it's a problem here, I think the stackwalker just walked off the end of the stack and found junk. Not having symbols for libsystem_pthread.dylib probably doesn't help.

I think your analysis is likely correct--there's a worker thread there that didn't get the message that it's supposed to shut down, and the main thread is hanging out waiting for it.
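
To make the suspected failure mode concrete, here is a minimal sketch in standard C++ (not the Gecko code) of a worker whose run loop sleeps on a condition variable until it is told to shut down. `SketchWorker`, `RequestShutdown`, and `ShutDownWorkerCleanly` are hypothetical names invented for illustration. If the flag were never set, or the notification never sent, the joining thread would wait forever, which is the shape of hang described above.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a worker thread; not WorkerPrivate.
class SketchWorker {
 public:
  SketchWorker() : thread_([this] { RunLoop(); }) {}

  ~SketchWorker() {
    // Never destroy a joinable std::thread: ask the loop to exit, then join.
    RequestShutdown();
    Join();
  }

  // Set the flag under the same mutex the run loop uses, then wake the loop.
  // Skipping either step is what leaves the loop asleep during shutdown.
  void RequestShutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shutdown_requested_ = true;
    }
    cv_.notify_one();
  }

  // Blocks until the run loop has returned.
  void Join() {
    if (thread_.joinable()) {
      thread_.join();
    }
  }

  bool Finished() {
    std::lock_guard<std::mutex> lock(mutex_);
    return finished_;
  }

 private:
  void RunLoop() {
    std::unique_lock<std::mutex> lock(mutex_);
    // The predicate is checked before sleeping, so a shutdown request that
    // arrives before the loop starts waiting is not lost.
    cv_.wait(lock, [this] { return shutdown_requested_; });
    finished_ = true;
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  bool shutdown_requested_ = false;
  bool finished_ = false;
  std::thread thread_;  // Declared last so the other members exist first.
};

// Plays the role of the main thread: request shutdown, then wait.
bool ShutDownWorkerCleanly() {
  SketchWorker worker;
  worker.RequestShutdown();
  worker.Join();
  return worker.Finished();
}
```

In this sketch the join returns only because `RequestShutdown` both sets the flag and notifies. Whether the real worker missed an equivalent signal is exactly the open question in this bug.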
Flags: needinfo?(ted)
Andrew - we need an assessment of whether this bug should be critical or not, and if :baku is too backed up, an alternate to investigate. Thanks!
Flags: needinfo?(overholt)
Looks like a null deref. The comments make me think this is QuotaManager-related because one mentions attempting to manually move their profile directory from one OS to another (PC->Mac so I doubt it's the .DS_STORE (?) problem Ehsan experienced) but others just mention it was a "restart firefox" situation (which obviously correlates with the shutdownhang summary here).

baku told me he'd take another look.
Flags: needinfo?(overholt)
(In reply to Andrew Overholt [:overholt] from comment #4)
> Looks like a null deref. The comments make me think this is
> QuotaManager-related because one mentions attempting to manually move their
> profile directory from one OS to another (PC->Mac so I doubt it's the
> .DS_STORE (?) problem Ehsan experienced) but others just mention it was a
> "restart firefox" situation (which obviously correlates with the
> shutdownhang summary here).

FTR: this is an intentional MOZ_CRASH triggered because shutdown took too long (the `shutdownhang` in the signature).
I recently landed a set of patches for bug 1405290. We should have more data about why workers block the shutdown.
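
As a rough illustration of the idea of a shutdown time budget, the following standard C++ sketch runs a shutdown step on another thread and reports whether it finished in time. `ShutdownFinishedWithin` and its parameters are hypothetical and do not reflect how Firefox implements its shutdown watchdog; a real watchdog would deliberately crash when the budget is exceeded, whereas this sketch only returns `false`.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Returns true if shutdown_step completed within budget, false otherwise.
// Hypothetical helper for illustration only.
bool ShutdownFinishedWithin(const std::function<void()>& shutdown_step,
                            std::chrono::milliseconds budget) {
  std::mutex mutex;
  std::condition_variable cv;
  bool done = false;

  // Run the (possibly slow) shutdown work on its own thread.
  std::thread worker([&] {
    shutdown_step();
    {
      std::lock_guard<std::mutex> lock(mutex);
      done = true;
    }
    cv.notify_one();
  });

  bool finished_in_time = false;
  {
    std::unique_lock<std::mutex> lock(mutex);
    // wait_for returns the predicate's value once it is true or the
    // budget has elapsed, whichever comes first.
    finished_in_time = cv.wait_for(lock, budget, [&] { return done; });
  }

  // A real watchdog would crash intentionally here when the budget is
  // exceeded. The sketch joins instead so the locals outlive the thread.
  worker.join();
  return finished_in_time;
}
```

A caller might pass a 300 ms step with a 10 ms budget to see the timeout path. The intentional crash in the real product is what produces the `shutdownhang` signature, so the crashing stack shows where the main thread was waiting rather than a memory error.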
Flags: needinfo?(amarchesini)
See Also: → 1405290
See Also: → 1437575
QA Whiteboard: qa-not-actionable

Since the crash volume is low (less than 5 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit auto_nag documentation.

Severity: critical → S3