Closed Bug 582466 Opened 9 years ago Closed 9 years ago
Crash in [@ pthread
_mutex _lock] called by mozilla::Time Stamp::Now on shutdown on OS X
I get this crash everytime I shutdown. http://crash-stats.mozilla.com/report/index/bp-b2bc4edb-61c7-44b9-bfe6-456f62100727
This is fallout from bug 558306. Looks like the timer thread is still running after TimeStamp::Shutdown has been called. So our lock is null, and we crash. But Shutdown is a static object's destructor; I'm a little surprised we call it before joining the TimerThread... Chris, any idea what's up with that?
blocking2.0: --- → ?
Component: General → XPCOM
QA Contact: general → xpcom
I don't see any code calling TimerThread::Shutdown(), where the join is supposed to happen.
Scratch that, I found its calling context in gdb. The nsThread::Shutdown code looks OK to me (assuming PR_JoinThread() is working correctly). I wonder if something is exit()ing the main thread, or perhaps raising an exception? That might cause static dtors to be run without joining the timer thread. I'm not sure how implicit joins are ordered wrt other at-exit() code on Mac. Jeff, it would help a lot if you could check whether we get to TimerThread::Shutdown().
cjones: if you load crash stats you can click to see other threads, Thread 0 Frame Module Signature [Expand] Source 0 XUL __tcf_2 js/src/xpconnect/wrappers/FilteringWrapper.cpp:149 1 libSystem.B.dylib __cxa_finalize 2 libSystem.B.dylib exit 3 AppKit -[NSApplication terminate:] 4 AppKit -[NSApplication _terminateSendShould:] 5 CoreFoundation __invoking___ 6 CoreFoundation -[NSInvocation invoke] 7 Foundation __NSFireTimer 8 CoreFoundation CFRunLoopRunSpecific 9 CoreFoundation CFRunLoopRunInMode 10 Foundation -[NSRunLoop runMode:beforeDate:] 11 XUL nsAppShell::ProcessNextNativeEvent widget/src/cocoa/nsAppShell.mm:665 12 XUL nsBaseAppShell::OnProcessNextEvent widget/src/xpwidgets/nsBaseAppShell.cpp:156 13 XUL nsAppShell::OnProcessNextEvent widget/src/cocoa/nsAppShell.mm:834 14 XUL nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:517 15 XUL NS_ProcessPendingEvents_P nsThreadUtils.cpp:200 16 XUL nsBaseAppShell::NativeEventCallback widget/src/xpwidgets/nsBaseAppShell.cpp:126 17 XUL nsAppShell::ProcessGeckoEvents widget/src/cocoa/nsAppShell.mm:394 18 CoreFoundation CFRunLoopRunSpecific 19 CoreFoundation CFRunLoopRunInMode 20 HIToolbox RunCurrentEventLoopInMode 21 HIToolbox ReceiveNextEventCommon 22 HIToolbox BlockUntilNextEventMatchingListInMode 23 AppKit _DPSNextEvent 24 AppKit -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] 25 AppKit -[NSApplication run] 26 XUL nsAppShell::Run widget/src/cocoa/nsAppShell.mm:747 27 XUL nsAppStartup::Run toolkit/components/startup/src/nsAppStartup.cpp:191 28 XUL XRE_main toolkit/xre/nsAppRunner.cpp:3625 29 firefox-bin main browser/app/nsBrowserApp.cpp:158 30 firefox-bin firefox-bin@0xbf5 31 @0x1 indeed we were being force quit. jeff: please provide steps to reproduce, namely how precisely do you "quit"? the use of the mouse, keyboard, menus, context menus, dock are quite significant.
Doy, I knew that. This doesn't look good; sez Apple ----- terminate: Terminates the receiver. - (void)terminate:(id)sender [snip] Discussion This method is typically invoked when the user chooses Quit or Exit from the application’s menu. When invoked, this method performs several steps to process the termination request. First, it asks the application’s document controller (if one exists) to save any unsaved changes in its documents. During this process, the document controller can cancel termination in response to input from the user. If the document controller does not cancel the operation, this method then calls the delegate’s applicationShouldTerminate: method. If applicationShouldTerminate: returns NSTerminateCancel, the termination process is aborted and control is handed back to the main event loop. If the method returns NSTerminateLater, the application runs its run loop in the NSModalPanelRunLoopMode mode until the replyToApplicationShouldTerminate: method is called with the value YES or NO. If the applicationShouldTerminate: method returns NSTerminateNow, this method posts a NSApplicationWillTerminateNotification notification to the default notification center. Do not bother to put final cleanup code in your application’s main() function—it will never be executed. If cleanup is necessary, perform that cleanup in the delegate’s applicationWillTerminate: method. ----- Maybe we should try to play games with NSTerminateCancel followed by our own shutdown code? Note that any "correct" fix will regress Tshutdown. The other option is just to go to back to leaking the timer lock. We were apparently doing this before Vlad changed the timer shutdown code to run from a static dtor.
fwiw, we have a dozen other bugs talking about this problem in case you want to read them, they're a trail of tears. what it basically amounts to is: * even though we knew we could not use static xpcom things, we really can't use static xpcom things or really static any-things
I haven't been able to reproduce this after I needed to
-[NSApplication terminate:] should only get called if you quit from the Dock menu. Is that what you were doing, Jeff?
Also, could you have been using non-standard settings? Particularly timer-related settings?
> Is that what you were doing, Jeff? My guess is the answer is no. Others have been seeing stacks with the following crucial part in them (including with Safari): 3 AppKit -[NSApplication terminate:] 4 AppKit -[NSApplication _terminateSendShould:] 5 CoreFoundation __invoking___ 6 CoreFoundation -[NSInvocation invoke] 7 Foundation __NSFireTimer 8 CoreFoundation __CFRunLoopRun 9 CoreFoundation CFRunLoopRunSpecific 10 CoreFoundation CFRunLoopRunInMode http://stackoverflow.com/questions/1094108/help-with-a-mac-crash-log-what-causes-nsapplication-terminatesendshould http://forums.macosxhints.com/archive/index.php/t-52129.html My hunch is this is an Apple bug -- a very old one.
Needless to say, there's probably no way we can work around this until/unless we can reproduce it.
One more thing, Jeff. If you ever see this again (even just once), look in the Console for any interesting error messages (particularly any Objective-C exceptions).
(Following up comment #10) Forget comment #10. The stack chunk I quoted appears even during "normal" uses of '-[NSApplication terminate:]'. I found this by breaking (in gdb) on '-[NSApplication terminate:]', then quitting from the Dock menu. One should test before one comments :-( I'd still like to know, Jeff, if the crashes you saw happened when you quit from the Dock menu.
Hi, hope this helps (if not please delete). I had this crash yesterday. http://crash-stats.mozilla.com/report/index/bp-3247721c-1c2d-4f08-bdd9-42f0f2100819 I shut down my MBP, when a message appeared saying Shutdown had been stopped by application Crash Reporter. Had 2 apptabs Gmail and a forum, normal tab google reader. About:support info @ http://pastebin.com/yUcQ5Fu2 Please email me if need more info.
(In reply to comment #14) The information you provide could be helpful. Thanks! By the way, I forgot to mention above that shutting down your computer (or logging out from it) while Firefox is running is another way (besides quitting from the Dock menu) to trigger a call to -[NSApplication terminate:].
I've just found a close to 100% reliable way to reproduce this bug in a recent Minefield nightly (on OS X 10.5.8). You also crash in FF 3.6.8, but Breakpad usually doesn't come up. 1) Start Minefield. 2) Choose Bookmarks : Bookmarks Toolbar : Latest Headlines. 3) Scroll all the way to the bottom of the Latest Headlines menu and choose "Open All in Tabs". (Doing this currently opens 77 tabs.) 4) Choose Quit from the Dock menu. After 5-10 seconds you'll crash and Breakpad will come up. Here are a couple of my crash reports: bp-8a815fad-85bb-4f02-8c77-3d7462100819 bp-0ed55c01-a762-4670-b980-697902100819
Assignee: nobody → smichaud
Status: NEW → ASSIGNED
Summary: Crash in [@ pthread_mutex_lock] called by mozilla::TimeStamp::Now on shutdown → Crash in [@ pthread_mutex_lock] called by mozilla::TimeStamp::Now on shutdown on OS X
Apparently NSApplication:terminate calls exit(). Does it shut down XPCOM first? Assuming that XPCOM was shut down correctly, that thread should not be running. If XPCOM was not shut down, that's the obvious problem here, and we should make sure it *is* shut down. Why can't we just unwind the stack normally, instead of calling exit() from within a deeply nested event handler?!
Assignee: smichaud → nobody
Component: XPCOM → Widget: Cocoa
QA Contact: xpcom → cocoa
The general problem (the difficulty of making [NSApplication terminate:] work together with XPCOM shutdown) is very old, and has come up several times before -- notably in bug 384284. The difficulty is that there's currently no solution to the general problem. So whenever this has come up before, we've been forced to hack around it ... or (if the bug wasn't bad enough or didn't happen too often) just to throw our hands up. The first thing that needs to be understood is that we don't call [NSApplication terminate:] -- the OS does (when someone quits from the Dock menu or shuts down (or logs out from) their computer while Firefox is running). So we have to somehow handle the call. And we can't simply ignore it -- otherwise (for example) quitting from the Dock menu wouldn't work at all. Calls (from the OS) to [NSApplication terminate:] are currently handled by code in toolkit/xre/MacApplicationDelegate.mm -- specifically in [MacApplicationDelegate applicationShouldTerminate:]. We currently (under certain circumstances) allow the user to abort the termination (by returning NSTerminateCancel), or we allow it to proceed (by first calling nsIAppStartup::Quit(nsIAppStartup::eForceQuit) and then returning NSTerminateNow). But XPCOM isn't completely (or "completely enough") shut down by the time nsIAppStartup::Quit() returns, and this is the root cause of all our troubles. To truly fix this problem we need either of the following: 1) A synchronous counterpart/variant of nsIAppStartup::Quit() 2) A callback that tells us the browser has finished shutting down (from which we could call [NSApplication replyToApplicationShouldTerminate:YES])
The previous comment oversimplifies one thing -- the OS calls various handlers (like applicationShouldTerminate:) *before* it calls [NSApplication terminate:]. What these handlers return determines when/if the OS calls [NSApplication terminate:]. By the time [NSApplication terminate:] actually is called, there's nothing we can do.
> 1) A synchronous counterpart/variant of nsIAppStartup::Quit() > > 2) A callback that tells us the browser has finished shutting down > (from which we could call [NSApplication > replyToApplicationShouldTerminate:YES]) The callback would need to be called on the main thread.
nsIAppStartup quit doesn't and can't synchronously shut down. It signals Firefox that it should shut down. What happens if applicationShouldTerminate returns NSTerminateLater? The goal here is for us to exit back out to the main event loop, which terminates when nsIAppStartup.Quit is called, and exit back out through XRE_main.
> What happens if applicationShouldTerminate returns NSTerminateLater? Then (presumably) the OS spins its event loops, waiting for us to call [NSApplication replyToApplicationTerminate:]. If we could get a callback from Gecko telling us XPCOM had finished shutting down (or was "close enough" to shutting down), we could then call [NSApplication replyToApplicationShouldTerminate:YES].
Could we just quit (by exiting from main) and avoid calling replyToApplicationTerminate at all?
> Could we just quit (by exiting from main) and avoid calling > replyToApplicationTerminate at all? I doubt it. And why should we? At the same point we'd be calling exit() we can call [NSApplication replyToApplicationShouldTerminate:YES], or call a callback which calls it.
>> Could we just quit (by exiting from main) and avoid calling >> replyToApplicationTerminate at all? > > I doubt it. I suspect this would lead to OS-level things not being cleaned up properly.
Why do you think that? Exiting from main is the normal way, and it doesn't show any obvious problems. Let's try it?
> Let's try it? If it does cause problems, they may not be obvious right away. What's your objection to the callback?
And needless to say, the OS may have saved up a bunch of tasks to run on the call to [NSApplication replyToApplicationShouldTerminate:YES], but not on exit().
> What's your objection to the callback? Are you worried that some threads may still be running after we exit from main()?
Not threads, no. But if calling replyToApplicationShouldTerminate could re-enter our code in any way, we'd be screwed. Unless we're absolutely certain it cannot, I think we're safer never calling it (and I really doubt that MacOS isn't perfectly capable of dealing with applications that exit: after all, they exit or crash on a regular basis).
OK, I just tried changing every occurrence of NSTerminateNow in [MacApplicationDelegate applicationShouldTerminate:] to NSTerminateLater. It doesn't work -- the browser hangs when you quit from the Dock menu. > But if calling replyToApplicationShouldTerminate could re-enter our > code in any way, we'd be screwed. It'd be easy enough to guarantee that no Gecko calls will be made. All we'd need to do is set a global variable (say in MacApplicationDelegate.mm) indicating whether or not we'd returned NSTerminateLater from applicationShouldTerminate:, then call replyToApplicationShouldTerminate:YES from the callback if this variable had been set.
> and I really doubt that MacOS isn't perfectly capable of dealing > with applications that exit: after all, they exit or crash on a > regular basis Under normal circumstances, sure. In fact this is how I assume we've been quitting Firefox from the main menu (or using Cmd+Q) for years. What's not normal is doing this after the OS has started taking steps to call [NSApplication terminate:].
>> But if calling replyToApplicationShouldTerminate could re-enter our >> code in any way, we'd be screwed. > > It'd be easy enough to guarantee that no Gecko calls will be made. > All we'd need to do is set a global variable (say in > MacApplicationDelegate.mm) indicating whether or not we'd returned > NSTerminateLater from applicationShouldTerminate:, then call > replyToApplicationShouldTerminate:YES from the callback if this > variable had been set. As for the call to replyToApplicationShouldTerminate *itself* causing our code to be re-entered -- yes, we'd need to be careful in nsAppShell's applicationWillTerminate: handler. For example, we could no longer call nsBaseAppShell::Exit() from there. But I'm quite sure we could make this handler work without ever calling Gecko.
At this point I'd appreciate someone's help: I have only the vaguest idea where the "XPCOM is now destroyed" callback should be called from. Please make suggestions.
anyone have suggestions? looks like this would fix the #7 top crash in 4.0b5. more reports at http://crash-stats.mozilla.com/report/list?signature=pthread_mutex_lock
> anyone have suggestions? I do, of course (the second alternative from comment #18): > 2) A callback that tells us the browser has finished shutting down > (from which we could call [NSApplication > replyToApplicationShouldTerminate:YES]) And in fact I suspect this is the only way we can fix the general problem (to make [NSApplication terminate:] get along with FF's shutdown procedure). But I haven't yet had a chance to start work on it. And, because implementing my suggestion may be somewhat complicated, we'd need to land it soon (in the next beta or the one after that), to reduce the chances that it wouldn't completely fix the problem. Josh and I would like to postpone dealing with this bug until after the 4.0 release (possibly in a 4.1 or 4.5 release). Yes, this is a 4.0b5 topcrasher (#7 on all platforms and #2 on the Mac). But it isn't entirely random -- it only happens when quitting from the Dock menu or shutting down (or logging out from) the OS while FF is running, and then only when you have lots of tabs open. So we could relnote it. And, because it only happens on quitting, nobody will lose work because of it. So I think it'd be possible to postpone dealing with this until FF 4.1 or 4.5.
(In reply to comment #36) > Josh and I would like to postpone dealing with this bug until after > the 4.0 release (possibly in a 4.1 or 4.5 release). When we discussed this I didn't know it was so high up in our top crash list, I thought the number of crashes was very small. In any case, after reading this, I don't think we should wait until after Firefox 4 to fix this.
yeah, something has made this worse in 4.0 checking --- pthread_mutex_lock 20100930-crashdata.csv found in: 4.0b6 3.6.10 4.0b4 4.0b3 3.6.8 3.5.13 4.0b5 3.6.6 3.6.3 3.6.9 4.0b7pre 3.6 3.6.7 3.5.9 3.5.7 3.5.5 3.5.1 3.0.19 3.0.1 3.0 release total-crashes pthread_mutex_lock crashes pct. all 336929 511 0.00151664 4.0b6 23602 366 0.0155072 3.6.10 195929 62 0.000316441 4.0b4 2075 16 0.00771084 4.0b3 1089 14 0.0128558 3.6.8 20011 13 0.000649643 3.5.13 16321 7 0.000428895 4.0b5 2759 5 0.00181225 3.6.6 5965 5 0.000838223 3.6.3 11144 5 0.000448672 3.6.9 4111 4 0.000972999 4.0b7pre2278 3 0.00131694
does the risk level of the possible changes here warrant changing Final+ to BetaN+ ? e.g. does it need a beta to shake out side effect?
> does the risk level of the possible changes here warrant changing Final+ to > BetaN+ ? e.g. does it need a beta to shake out side effect? Probably. If we're going to fix this in FF 4, this should be my highest priority (among Mozilla bugs).
also saw some feedback from beta Input and crash data that some users hitting this signature are doing so when "...Switching to tab groups view and it crashed" like: http://crash-stats.mozilla.com/report/index/37aa9157-5fe3-45bb-9910-527022101001 Firefox 4.0b6 Mac OS X 10.6.4 --bugs 452318,564089,582466 would that kind of action be a different bug?
> would that kind of action be a different bug? Probably not.
"...Switching to tab groups view, and switching between tab groups" might be a good test case for this bug then.
>> would that kind of action be a different bug? > > Probably not. That stack also has [NSApplication terminate:] on the main thread, so something must have happened to trigger an OS-forced shutdown sequence. I know that quitting from the Dock menu and logging out (or shutting down the OS) will trigger this. But I wouldn't be surprised if there are also other ways for it to happen. > "...Switching to tab groups view, and switching between tab groups" > might be a good test case for this bug then. I've already got a very good way to reproduce this bug from comment #16.
My callback idea didn't work out: Even with a callback, OS-initiated termination still ends up aborting the browser during the call to nsAppShell::Run() (more specifically during the call to [NSApp run] inside nsAppShell::Run()). This means that the call to nsAppStartup::Run() in XRE_Main() never returns, which in turn means that ScopedXPCOMStartup's destructor and NS_ShutdownXPCOM() never get called. So for a while all hope seemed lost. But then I discovered that (during OS-initiated termination) [NSApplication terminate:] doesn't get called until after [MacApplicationDelegate applicationShouldTerminate:] returns, and [NSApplication terminate:] doesn't do anything we need except to post an NSApplicationWillTerminateNotification to everyone (including ourselves) that's registered for it. Which leads to a very simple, elegant and robust solution -- hook [NSApplication terminate:] and replace it with a method that does exactly (and only) what we need! OS-initiated termination works this way ([NSApplication terminate:] is called after the OS posts NSApplicationShouldTerminateNotification) on all three OSes I tested (10.4.11, 10.5.8 and 10.6.4), in both 32-bit and 64-bit mode. So it seems very likely it will continue to work this way even in future major OS X releases (and almost certainly in future minor releases of OS X 10.6). I really regret I didn't think of this earlier :-( A tryserver build will follow in a few hours.
Attachment #483340 - Flags: review?(joshmoz)
Here's a tryserver build made with my patch from comment #45: http://firstname.lastname@example.org/tryserver-macosx64/firefox-4.0b8pre.en-US.mac64.dmg Please try it out! There were no non-spurious failures during any of the tryserver tests.
Comment on attachment 483340 [details] [diff] [review] Fix Landed on trunk: http://hg.mozilla.org/mozilla-central/rev/00e90d9720c8
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Just to pre-empt any doubts about the following failures in mochitests-1/5: test_bug557087-3.html | element should be focused - got [object HTMLBodyElement], expected [object HTMLButtonElement] (and so forth) These failures happen on both OS X (http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1287161410.1287163611.22993.gz) and Linux (http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1287161977.1287162653.19127.gz). So it can't have been caused by my patch (which is OS-X-only).
I've just realized that my patch will break Cocoa embedders (like Camino) on the trunk ... if there *are* any Cocoa embedders on the trunk. Smokey, does Camino have any builds from trunk code? The breakage is fairly major, though not critical -- the NSApplicationShouldTerminateNotification will no longer be sent, which means (for example) that the user will no longer be able to cancel a Quit when more than one tab is open. I already have a patch that should fix this problem (by not replacing the OS's [NSApplication terminate:] method when running in a Cocoa embedder). But first I'd like to have an idea of how urgent this problem is. (By the way, the underlying reason for the breakage is that Cocoa embedders call [NSApplication terminate:] themselves when they want to quit, and it behaves differently in this case (as opposed to when the OS calls it as part of a shutdown sequence initiated (say) from the Dock menu).)
> I already have a patch that should fix this problem (by not > replacing the OS's [NSApplication terminate:] method when running in > a Cocoa embedder). But first I'd like to have an idea of how urgent > this problem is. It'd also be nice to have a Camino trunk build to test my new patch in.
(In reply to comment #49) > I've just realized that my patch will break Cocoa embedders (like > Camino) on the trunk ... if there *are* any Cocoa embedders on the > trunk. > > Smokey, does Camino have any builds from trunk code? bsmedberg, maybe you could field this question? This kind of question in bugzilla is getting pretty awkward for us.
I've opened a new bug on the breakage in Cocoa embedders -- bug 604901.
You need to log in before you can comment on or make changes to this bug.