Closed Bug 422827 Opened 14 years ago Closed 14 years ago
crash on quit [@ libobjc
.A .dylib@0x146e8][@ ns Menu X::Remove All()]
2.94 KB, text/plain
3.12 KB, text/plain
12.63 KB, patch
|Details | Diff | Splinter Review|
2.99 KB, text/plain
Our #22 topcrash (#6 on Mac) in beta 4 happens at libobjc.A.dylib@0x146e8, but more specifically, has nsMenuX::RemoveAll() fairly regularly in stacks. There might be other crashes at libobjc.A.dylib@0x146e8 intermixed with this, but comments indicate a lot of users are crashing on quit. Sample stack from bp-018175d9-f072-11dc-b290-001a4bd43e5c: Frame Signature Source 0 libobjc.A.dylib@0x146e8 1 AppKit@0x3574d 2 AppKit@0x353d0 3 AppKit@0x3640b 4 nsMenuX::RemoveAll() mozilla/widget/src/cocoa/nsMenuX.mm:414 5 nsMenuX::~nsMenuX() mozilla/widget/src/cocoa/nsMenuX.mm:111 6 nsMenuX::Release() mozilla/widget/src/cocoa/nsMenuX.mm:87 7 ReleaseObjects(void*, void*) nsCOMArray.cpp:151 8 nsVoidArray::EnumerateForwards(int (*)(void*, void*), void*) nsVoidArray.cpp:678 9 nsCOMArray_base::Clear() nsCOMArray.cpp:158 10 nsMenuBarX::~nsMenuBarX() mozilla/widget/src/cocoa/nsMenuBarX.mm:217 11 nsMenuBarX::Release() mozilla/widget/src/cocoa/nsMenuBarX.mm:67 12 nsCOMPtr_base::~nsCOMPtr_base() nsCOMPtr.cpp:81 13 nsCocoaWindow::~nsCocoaWindow() mozilla/widget/src/cocoa/nsCocoaWindow.mm:542 14 nsBaseWidget::Release() mozilla/widget/src/xpwidgets/nsBaseWidget.cpp:69 15 nsCOMPtr_base::assign_with_AddRef(nsISupports*) nsCOMPtr.cpp:531 16 nsXULWindow::Destroy() mozilla/xpfe/appshell/src/nsXULWindow.cpp:713 17 nsWebShellWindow::Destroy() mozilla/xpfe/appshell/src/nsWebShellWindow.cpp:840 18 nsAppShellService::DestroyHiddenWindow() mozilla/xpfe/appshell/src/nsAppShellService.cpp:209 19 nsAppStartup::DestroyHiddenWindow() mozilla/toolkit/components/startup/src/nsAppStartup.cpp:160 20 ScopedXPCOMStartup::~ScopedXPCOMStartup() mozilla/toolkit/xre/nsAppRunner.cpp:898 21 XRE_main mozilla/toolkit/xre/nsAppRunner.cpp:3190 22 main mozilla/browser/app/nsBrowserApp.cpp:158 23 start crt.c:272 24 start 25 @0x1 Sample comments with the same stack: wasn't doing much. just quited, then used spaces to switch screens. also shut down Neo Office Was just quitting Firefox Program did not crash. I quit out of it. printed a PDF (with no physical printer defined) ... then quit out of Firefox gmail, using firebug, tried to close the browser and it crashed. firebug and gmail have issues, but both are so useful that i can't live without either of them crashed on quit with 5 tabs opened Closed Firefox. Two tabs were open. My guess is that this is a Mac-only crash on quit.
This stack is a bit screwy -- nsMenuX::RemoveAll() just returns NS_ERROR_NOT_IMPLEMENTED. But (allowing for some corruption) I'd guess that mMenusArray has gotten out of date, and contains some items that have already been destroyed.
Here's a stack trace, made in gdb, using a build with debug symbols. I'm going to try to find an STR.
STR: 1) Open Minefield. 2) Choose File : Print and then PDF : Save as PDF. 3) Choose Save 4) Choose Quit from the Minefield menu or do command-q. 5) Crash
This STR only "works" on OS X 10.5.2 (not on 10.4.11).
> printed a PDF (with no physical printer defined) ... then quit out of Firefox Bingo!
This is an old bug, and has a complex regression history. My STR from comment #3 "works" in the same way (with minor variations) back to the 2006-11-30-04-trunk Minefield nightly. Between the 2006-11-22-04-trunk Minefield nightly and the 2006-11-29-04-trunk nightly (inclusive), you get a crash under DeviceContextImpl::Release() immediately upon printing. The last nightly where printing works entirely normally (on OS X 10.5.2) is the 2006-11-21-07-trunk nightly. So what probably started the trouble was the patch for bug 323934 ("turn on cocoa-cairo by default"), possibly assisted by the patch for "fixing cocoa-cairo plugins" (bug number is given as 35732, which must be wrong). And what probably triggered/caused the current problem is a change (among other things) to nsMenuX.mm that has nothing specific to do with printing -- the patch for bug 351230.
Here's another gdb stack trace, showing that this kind of crash can be triggered by an Objective-C exception. My hunch is that the specific exception will vary.
I've got a revision of my STR from comment #3. I've found that it only works if you've already printed to PDF in the same location, and choose Replace to replace the previous file. 1) Open Minefield. 2) Choose File : Print and then PDF : Save as PDF. 3) Choose Save, then choose to replace the already existing file with the same name. 4) Choose Quit from the Minefield menu or do command-q. 5) Crash
Here's a fix for this bug (bug 422827) and bug 423669. Or more accurately, it's a workaround for an obscure Apple bug that's specific to Leopard (OS X 10.5.X). (For more details see the comments in the patch.) This patch is definitely hackish (because it digs into undocumented stuff). But there's no other way to get this particular problem fixed quickly. And the patch itself is low-risk: It's conceptually very simple, and will fail gracefully if Apple changes the "signatures" of the undocumented methods that it "hooks" (if that happens the patch will simply have no effect). I hope that Apple will eventually fix this problem. And when they do this patch can be backed out. But (as best I can tell) this bug doesn't effect any Apple products, so they don't have much incentive to be quick in their work. Here's a tryserver build made with this patch: https://build.mozilla.org/tryserver-builds/2008-03-20_13:firstname.lastname@example.orgemail@example.com
Comment on attachment 310834 [details] [diff] [review] Fix (workaround for an Apple bug) Oops, I made a dumb mistake in this patch. I'll post a new one (and a new tryserver build) shortly.
This new patch fixes a bad leak. Here's a tryserver build made with it: https://build.mozilla.org/tryserver-builds/2008-03-20_15:firstname.lastname@example.orgemail@example.com
(In reply to comment #11) > Created an attachment (id=310864) [details] > Fix rev1 (fix bad leak) > > This new patch fixes a bad leak. > > Here's a tryserver build made with it: Note: this tryserver build Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b5pre) Gecko/2008032015 Minefield/3.0b5pre has fixed my crash with a similar/same stack in bug 423669
Josh, I think we should consider bumping this bug to P1 and including my patch in beta5. This crash _really_ is a topcrasher, and I'm pretty sure there are lots of other ways to trigger it than the ones we've already found. Look at individual crash reports at the following two URLs: http://crash-stats.mozilla.com/report/list?range_unit=weeks&branch=1.9&range_value=2&signature=objc_msgSend http://crash-stats.mozilla.com/report/list?range_unit=weeks&branch=1.9&range_value=2&signature=libobjc.A.dylib%400x146e8
OK. I'll bump this to a P1, and we need to get josh to review this ASAP. I'll make sure he's aware of the issue (i.e., I'm dialing his cell right now).
Priority: P2 → P1
Target Milestone: --- → mozilla1.9beta5
OK. Josh will look at this tonight and get the review done as soon as possible. Note here to beltzner and josh, it's entirely possible this isn't a beta blocker, so feel free to bump this down if you disagree. I just want to get this on the radar of the beta if it is indeed a #6 top crasher with lots of possible entry points.
And... adding beltzner to cc. Beltzner, take a look at Comment #16.
> #6 top crasher Actually, if you count both URLs from comment #14 together (as you probably should), and if you discount the nsObjCExceptionLogAbort(NSException*) topcrasher (which can have many causes and may be tailing off), this is now the #1 Mac topcrasher.
Priority: P1 → P2
Target Milestone: mozilla1.9beta5 → ---
Oops, didn't mean to take this off the beta blocker list.
Priority: P2 → P1
Target Milestone: --- → mozilla1.9beta5
I started reviewing this, it is going to take me more time than I have tonight. I don't think this needs to be a beta blocker, moved to P2.
Priority: P1 → P2
Target Milestone: mozilla1.9beta5 → mozilla1.9
Even though this is (probably) the #1 Mac topcrasher, my patch should bake on the trunk for a few days before it gets into a "release". If we don't have those few days before beta5 (including at least one beta5 "RC"), I'd agree with Josh that this should be postponed until the first RC. (For all its apparent complexity, this is conceptually a very simple and well-isolated fix. It should be fine to land it in an RC.)
The call to super in nsMenuX_nsMenu_removeItem should come first. Otherwise there's a (small) possibility that aItem will have been deleted by the time the call is made. I'll post another tryserver build shortly.
Here's a tryserver build made with my rev2 patch: https://build.mozilla.org/tryserver-builds/2008-03-21_09:firstname.lastname@example.orgemail@example.com
Crashes always get "critical" severity.
Severity: major → critical
In an attempt to get a better idea of whether or not this is really our #1 Mac topcrasher, I searched for all 1.9-branch and platform-OS-X crashes with exactly "nsMenuX::RemoveAll()" in the top 10 stack frames, over the last month. I got a total of just under 4200 ... which tends to confirm what I said. http://crash-stats.mozilla.com/?do_query=1&branch=1.9&platform=mac&query_search=stack&query_type=exact&query=nsMenuX%3A%3ARemoveAll%28%29&date=&range_value=1&range_unit=months
Oops -- for comparison with the topcrasher stats at http://crash-stats.mozilla.com/topcrasher/bybranch/1.9 I really should have chosen "2 weeks" instead of "1 month". The total for 2 weeks is about 2650 -- impressive but still probably not the #1 topcrasher. http://crash-stats.mozilla.com/?do_query=1&branch=1.9&platform=mac&query_search=stack&query_type=exact&query=nsMenuX%3A%3ARemoveAll%28%29&date=&range_value=2&range_unit=weeks
+// Class for numerical-index keys in gShadowKeyEquivDB (each is the unsigned +// int equivalent of an NSMenuItem's address). Using this kind of key is more Needs a comment about what to do for 64-bit, or better yet, a 64-bit-safe solution. + gShadowKeyEquivDB = [[NSMutableDictionary alloc] init]; + if (!gShadowKeyEquivDB) Generally we assume those allocations succeed. + if ((mTables = [[NSMutableIndexSet alloc] init]) == nil) Same here. Also I don't really like comparisons against assignments, no reason not to just make it more clear with one extra line of code. Irrelevant here since that test can just go away. Can we move most of the additions to nsMenuX.mm to the bottom of the file? We need to file a bug with Apple about this, please paste the text of your filing here if you can.
I tried to make this completely 64-bit safe ... but wasn't able to. There's an NSNumber method that I'd have needed to use that's only available when building on OS X 10.5.X (or presumably with a 10.5 SDK). But I _did_ put in defintions for NSInteger and NSUInteger, which I "borrowed" the QTKit framework's QTKitDefines.h (the only place they're defined on 10.4). I've assumed that we'll only ever be building 32-bit binaries on 10.4, which I think is both reasonable and true. Please let me know, Josh, if you think otherwise. I chose not wrap these definitions in the ifdef you suggested, Josh (#if MAC_OSX_VERSION_MAX_ALLOWED <= MAC_OSX_VERSION_10_4) because I'm not entirely sure how that's supposed to work. (On the face of it, it seems to be intended for code that's not supposed to run on 10.5.)
Another way to trigger this problem (in current nightly builds): 1) Run Minefield and close all windows. 2) Do Command-o or File : Open File -- a native app-modal file picker will appear. 3) Do Command-h or Minefield : Hide Minefield. 4) Click on the Minefield doc icon -- Minefield will be unhidden, a new browser window will appear under the file picker (still displayed), and the file picker will lose focus to the browser window. 5) Click on the file picker's Cancel button. 6) Quit (Command-q or Minefield : Quit). 7) Crash.
(Following up comment #29) Forgot to mention that my patch also fixes this.
Tryserver build made with my current patch: https://build.mozilla.org/tryserver-builds/2008-03-25_13:firstname.lastname@example.orgemail@example.com
Using Steven's tryserver build in Comment 31 and following the STR in Comment 29, I do not crash. Build ID: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b5pre) Gecko/2008032504 Minefield/3.0b5pre.
Here's the Apple bug report that I'm about to post: For some time OS X has had an undocumented NSKeyEquivalents protocol of the NSMenu interface. There appear to problems with two of these methods on OS X 10.5.X (or with the code that calls them): + (void)_addItem:(NSMenuItem *)aItem toTable:(NSMapTable *)aTable + (void)_removeItem:(NSMenuItem *)aItem fromTable:(NSMapTable *)aTable There have been many cases (thousands) of crashes in Firefox 3 pre-release builds in the second of those two methods (_removeItem:fromTable:), when code called from _removeItem:fromTable: tries to access an NSMenuItem object that has already been deleted (sometimes (always?) when it tries to send a _setChangedFlags: message to that object). These crashes happen only on OS X 10.5.X (never on 10.4.X). And they stop happening when I "hook" these methods (via method swizzling) and add code to "retain" aItem on calls to _addItem:toTable:, and to "release" it on calls to _removeItem:fromTable:. That this simple change clears up the problem tells me that there are bugs in Apple's code -- either in these two methods, or in the code that calls these methods. Apple's implementations of _addItem:toTable: and _removeItem:fromTable: appear to assume that a call to the first (on a given NSMenuItem) will always be matched by a call to the second (on the same NSMenuItem before it gets deleted). So these methods don't retain and release the NSMenuItem. But this assumption isn't always true. The crashes in _removeItem:fromTable: in Firefox 3 pre-release builds (when it's called on a different NSMenuItem object) show that a NSMenuItem object can be "added", and then deleted, without ever having been "removed". This problem is being tracked in Mozilla.org's bug database at the following URL: https://bugzilla.mozilla.org/show_bug.cgi?id=422827
My report's been given radar #5820190.
Attachment #311648 - Flags: superreview?(vladimir) → superreview+
Landed on trunk: Checking in widget/src/cocoa/nsCocoaUtils.h; /cvsroot/mozilla/widget/src/cocoa/nsCocoaUtils.h,v <-- nsCocoaUtils.h new revision: 1.6; previous revision: 1.5 done Checking in widget/src/cocoa/nsMenuX.mm; /cvsroot/mozilla/widget/src/cocoa/nsMenuX.mm,v <-- nsMenuX.mm new revision: 1.65; previous revision: 1.64 done Checking in widget/src/cocoa/nsToolkit.h; /cvsroot/mozilla/widget/src/cocoa/nsToolkit.h,v <-- nsToolkit.h new revision: 1.18; previous revision: 1.17 done Checking in widget/src/cocoa/nsToolkit.mm; /cvsroot/mozilla/widget/src/cocoa/nsToolkit.mm,v <-- nsToolkit.mm new revision: 1.32; previous revision: 1.31 done
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Backed out because of a leak test failure ... which now appears to have been spurious. I'll wait for a while and try again.
Landed again on trunk. This time no leak test failure. Checking in widget/src/cocoa/nsCocoaUtils.h; /cvsroot/mozilla/widget/src/cocoa/nsCocoaUtils.h,v <-- nsCocoaUtils.h new revision: 1.8; previous revision: 1.7 done Checking in widget/src/cocoa/nsMenuX.mm; /cvsroot/mozilla/widget/src/cocoa/nsMenuX.mm,v <-- nsMenuX.mm new revision: 1.67; previous revision: 1.66 done Checking in widget/src/cocoa/nsToolkit.h; /cvsroot/mozilla/widget/src/cocoa/nsToolkit.h,v <-- nsToolkit.h new revision: 1.20; previous revision: 1.19 done Checking in widget/src/cocoa/nsToolkit.mm; /cvsroot/mozilla/widget/src/cocoa/nsToolkit.mm,v <-- nsToolkit.mm new revision: 1.34; previous revision: 1.33 done
Verified fixed using "my testcase" from Bug 425585 and Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9pre) Gecko/2008032804 Minefield/3.0pre Crashed with Beta 5 RC2 - does not crash with the Nightly --> Verified fixed
Status: RESOLVED → VERIFIED
Crash Signature: [@ libobjc.A.dylib@0x146e8] [@ nsMenuX::RemoveAll()]
You need to log in before you can comment on or make changes to this bug.