crash on shutdown when using Apple-Q to quit [@ AppKitMenuEventHandler] [@ objc_msgSend]

RESOLVED FIXED

Status

()

defect
--
critical
RESOLVED FIXED
12 years ago
8 years ago

People

(Reporter: jaas, Assigned: jag+mozilla)

Tracking

({crash, topcrash})

Trunk
x86
macOS
Points:
---
Bug Flags:
blocking1.9 +
in-litmus +

Firefox Tracking Flags

(Not tracked)

Details

(crash signature)

Attachments

(4 attachments, 2 obsolete attachments)

Posted file stack trace 1
No description provided.
Flags: blocking1.9+
neither Steven nor I were able to reproduce this (discussed on IRC).

Steven was using a nightly, on PPC and x86. I was using a build from a pull that was about an hour or so before I made this comment.
I hit this fairly often when I have Firefox automatically start and shut down a few hundred times in a row.
Summary: crash on shutdown when using Apple-Q to quit → crash on shutdown when using Apple-Q to quit [@ AppKitMenuEventHandler]
Does it make any difference if you automate quitting from the main
menu (as opposed to using Cmd-Q)?
I automate quitting using Bob Clary's goQuitApplication, which uses nsIAppStartup / nsIAppShellService.  It doesn't emulate a Cmd+Q or main menu click.
Oh, well.

My next question was whether you could automate starting and quitting
other apps (e.g. Safari).  It's notable that the entire crash log
above nextEventMatchingMask: contains only system calls.

(I believe the stack trace was made with a build that contains debug
symbols, and is therefore accurate.)
On the other hand, Safari (and other "pure Cocoa" apps) quit using
[NSApplication terminate:], while Minefield (mostly) doesn't.  And
Minefield's shutdown sequence is complex and fragile ...
(In reply to comment #6)
> On the other hand, Safari (and other "pure Cocoa" apps) quit using
> [NSApplication terminate:], while Minefield (mostly) doesn't.  And
> Minefield's shutdown sequence is complex and fragile ...

Yeah, I agree. I think capturing this in the debugger will answer a lot of questions.

Jesse, is that something you could do?
I see goQuitApplication has a home page
(http://wiki.mozilla.org/SoftwareTesting:Tools:BrowserShutdown).

When I get back from vacation (week after next) I'll try it with a
debugging build I've created that logs all nsRunnable events (by
class name) as they're processed.
> debugging build I've created that logs all nsRunnable events (by
> class name) as they're processed.

Colin (or Jesse), I create my "debugging build" with a gigantic patch
(about 140K) that touches just about every module and swiftly goes out
of date.  But if you'd like, I can inflict it upon you via email :-)
Flags: in-litmus?
I've just started experiencing crashes supposedly like this one a lot yesterday and today. It happens with just about any command-key combination (Cmd-T and Cmd-W). Some breakpad reports:

bp-c4018a70-41f5-11dc-98dd-001a4bd43e5c
bp-fc8ef63a-41f4-11dc-bc5a-001a4bd43ef6
bp-a2f90f91-41f2-11dc-91f9-001a4bd43e5c
bp-cf4d9bdc-41f0-11dc-8b3b-001a4bd43ed6

Unfortunately breakpad doesn't seem to record signature information for these (not sure why), so I can't generate aggregate reports to see when the crashes started.
Is it interesting (wrt breakpad not recording signature info) that the stack displayed there is being truncated at 17 levels? The stack Josh posted is much longer.
I mentioned this to josh on IRC, I noticed today's trunk build has crashed for me more than yesterday's. http://crash-stats.mozilla.com/report/index/b7315807-41f1-11dc-8588-001a4bd43e5c?date=2007-08-03-18 is the example of a common crash I am hitting using when entering text in a form field using  Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.9a8pre) Gecko/200708030808 Minefield/3.0a8pre.
I'll hazard a guess -- an NSMenu object has been released that
shouldn't have been.
Empirically I observe that first bringing up the history menu once guarantees I crash. It's quite possible that the cross-level connection trick I did for submenus is releasing at the wrong time.
Posted file anotehr stack?
Colin asked me to comment here. I'm getting a pretty similar stack very reliably when copying from the url bar with apple-C in this build (2007073004) but not in previous builds.

Example breakpad report, similarly truncated: http://crash-stats.mozilla.com/report/index/3ee53ff5-4212-11dc-b430-001a4bd43ed6
Stan, does your new menu code use a menu delegate?

If so, I'll bet it's the delegate that's being released prematurely
(or inappropriately).

Here's the basis for my hunch:

Set NSZombieEnabled at the Terminal prompt (export
NSZombieEnabled=YES) and then run Firefox (at the same Terminal
prompt).  Fiddling with the History menus and using Cmd-C at the
location bar (the url bar), in some order, causes the following to
appear in Terminal:

*** -[_NSZombie forward::] not overridden!

Now (from the same Terminal prompt) run Firefox/Minefield in gdb and
break on '-[_NSZombie forward::]' (which is what displays this
message).  You'll get a stack, the top of which looks like this:

#0  0x92cb0bf0 in -[_NSZombie forward::] ()
#1  0x90a450b0 in _objc_msgForward ()
#2  0x9389a7c0 in -[NSMenu _populate:] ()
#3  0x9389a3a4 in AppKitMenuEventHandler ()
#4  0x9329a934 in DispatchEventToHandlers ()
#5  0x9329a08c in SendEventToEventTargetInternal ()
#6  0x93299f08 in SendEventToEventTargetWithOptions ()
#7  0x933337b0 in SendMenuPopulate ()
#8  0x9334c70c in PopulateMenu ()
#9  0x9334c68c in Check1MenuForKeyEvent ()
#10 0x93369fc4 in CheckEachHMenuForKeyEvent ()
#11 0x932d3874 in ForEachInstalledHMenuDo ()
#12 0x9334bff0 in CheckMenusForKeyEvent ()
#13 0x9334bcb0 in _IsMenuKeyEvent ()
#14 0x9334ba34 in IsMenuKeyEvent ()
#15 0x9384058c in _NSGetMenuItemForCommandKeyEvent ()
#16 0x937a457c in _NSHandleCarbonMenuEvent ()
#17 0x937a1e64 in _DPSNextEvent ()
#18 0x937a17a8 in -[NSApplication
                   nextEventMatchingMask:untilDate:inMode:dequeue:] ()
...

What's going on here (I believe) is that the delegate has already been
deleted and turned into an _NSZombie object.  But it's NSMenu object
is still trying to send (i.e. "forward") messages to it.
I forgot to mention that comment #16's tests were done with the latest
Minefield nightly (2007-08-03-08-trunk).
No delegate fiddling, all I did was to retain the NSMenuItem for submenus and
use it to set up favicon for the submenu. Just as an experiment, I commented
out my release of the menu item, browser crash happens exactly (and as
consistently) as before.
Then your code didn't cause the problem.

But I'm pretty sure something is causing an NSMenu delegate to be
dealloced prematurely.

And there is code in nsMenuX::CreateMenuWithGeckoString() that sets a
menu delegate:

http://lxr.mozilla.org/mozilla/source/widget/src/cocoa/nsMenuX.mm#614
> But I'm pretty sure something is causing an NSMenu delegate to be
> dealloced prematurely.

Another possibility is that the delegate isn't being "removed" from
its NSMenu object at the right time (before the delegate is
dealloced).
Is it possible that someone forgot that objects don't retain their delegates, and was counting on the menu having an owning ref to the delegate?

That's a complete shot-in-the-dark suggestion. I've just seen that problem before when working with delegates.
Looking through the code in widget/src/cocoa/nsMenuX.mm, I think I see
the problem:

Each nsMenuX object has a MenuDelegate object created in its
constructor and released in its destructor (and stored in the
mMenuDelegate instance variable).  This mMenuDelegate object is set as
the delegate of the NSMenu object created and returned by
nsMenuX::CreateMenuWithGeckoString().  But the nsMenuX object has no
variable to keep track of this "myMenu" object, and so is unable to
remove its delegate before the delegate itself (mMenuDelegate) is
released (and dealloced) in the nsMenuX object's destructor.

It's possible that this is a very old problem, whose ill effects have
only just become visible.
Actually, the mMacMenu variable gets assigned to result of CreateMenuWithGeckoString.

I don't see how this would cause a problem, though -- mMacMenu is released before mMenuDelegate.
> Actually, the mMacMenu variable gets assigned to result of
> CreateMenuWithGeckoString.

Oops, I missed that.

But are you quite sure that nsMenuX::SetLabel() is the only place
nsMenuX::CreateMenuWithGeckoString() is called from?

(It is in code I pulled at the beginning of this week ... but I
haven't checked anything more recent.)
Oh, well.  I thought for sure I had it.

Can you think of any other kind of object (besides a delegate) that an
NSMenu object would "forward" messages to?
Fwiw, the crashes that Benjamin (comment 10) and Dave (comment 15) are seeing have been filed as bug 390773. On my side those started _after_ the 2007080222 build (that build worked pretty well). 
> Oh, well.  I thought for sure I had it.

I still have one nagging doubt:

I wonder if it would make any difference if (in the nsMenuX
destructor) you set mMacMenu's delegate to NULL before you released
it.
(In reply to comment #26)
> Oh, well.  I thought for sure I had it.
> 
> Can you think of any other kind of object (besides a delegate) that an
> NSMenu object would "forward" messages to?

It's trying to do an NSInvocation on something, I'm guessing (after taking a closer look at your stack).

The arguments it's trying to pass are probably pretty interesting here. FWIW I wasn't able to reproduce this originally, but I haven't tried your new steps. I'll try them soon.

(In reply to comment #28)
> > Oh, well.  I thought for sure I had it.
> 
> I still have one nagging doubt:
> 
> I wonder if it would make any difference if (in the nsMenuX
> destructor) you set mMacMenu's delegate to NULL before you released
> it.

mMenuDelegate is still around there though, I don't think it will help, but it's worth a shot.
Here's another thought:

I might have been mistaken about what kind of object has been turned
into an _NSZombie -- I've assumed it was an NSMenu delegate, but it
might (conceivably) have been an NSMenu.

(Sorry for the logorrhea, if I'd had the time (before my vacation) to
debug this on my own, I would have done it.)
>> Can you think of any other kind of object (besides a delegate) that
>> an NSMenu object would "forward" messages to?
>
> It's trying to do an NSInvocation on something, I'm guessing (after
> taking a closer look at your stack).

It _would_ be nice to know what parameters [_NSZombie forward::] is
being called with (this might help to guess what kind of object it
originally was).  The only way I can think of is to use poseAsClass on
the _NSZombie class itself ... but that's tricky with a class that's
not publicly defined and documented.
You mean you can't get at them in gdb when it breaks on [_NSZombie forward::]? Am I totally off my rocker here in thinking that you can?
> You mean you can't get at them in gdb when it breaks on [_NSZombie
> forward::]?

In principle you can, but ...

If you use class-dump on the Foundation binary of the Foundation
framework, you'll see the definition of the _NSZombie class, and of
it's forward:: method:

- (id)forward:(SEL)fp8:(void *)fp12;

This same method appears under several other classes (in the results
generated by class-dump), but in none of them is it documented.
However, I'm almost certain that the SEL argument is the selector
that's being forwarded to the object whose forward:: method is being
called.

I don't know what the void * argument is, but (I suspect) the really
crucial argument is the SEL one.

Do you know how to read a SEL argument from inside gdb?  It's not an
"object" (on which you could call print-object).
> I don't know what the void * argument is

Come to think of it, it's probably a pointer to an array of objects
that belong together with the SEL argument.
Try
[(NSObject *)(fp12[0]) description]

Maybe
[(NSObject *)fp12 description]

in case it's some sort of NSArray or something (???)

If only I could get this in the debugger, grr...
> You should be able to cast it to char*.

I have my doubts.

The following appears in /usr/include/objc/objc.h:

typedef struct objc_selector *SEL;

I can't find a definition of "struct objc_selector" anywhere in
/usr/include/objc.
Here's someone who claims Apple's definition of struct objc_selector
is

typedef struct objc_selector
{
        IMP             implementation;
        char            *name;
        char            *signature;
} _ objc_selector;

http://lists.macserve.net/pipermail/valentina/2006-August/017063.html
But here it's defined as follows (commented out in original):

/* struct _objc_selector {
     SEL sel_id;
     char *sel_type;
   }; */

http://gcc.gnu.org/viewcvs/branches/apple/trunk/gcc/objc/objc-act.c?view=markup
> Maybe
> [(NSObject *)fp12 description]

If I'm right about the void * argument (to forward::) I bet this would
work (for the first object in the "array").  I suspect this "array" is
nothing fancy -- just a bunch of consecutive "id"s (i.e. void *s),
probably not even NULL-terminated (since the SEL object determines how
many are expected).

Or since (as far as I know) you can't use Objective-C syntax in gdb,
you could do "print-object fp12".
(In reply to comment #19)
> Then your code didn't cause the problem.


Backing his patch out seemed to fix it for me. At least, I haven't crashed in awhile, and the history trick was producing reliable crashes before. Also, if as mentioned in comment #27, this is the same as bug 390773, then it is in the right regression range.
(In reply to comment #33)
<snip>
> Do you know how to read a SEL argument from inside gdb?  It's not an
> "object" (on which you could call print-object).
> 

Use NSStringFromSelector(SEL) and then print-object it in gdb. E.g.:

gdb> po NSStringFromSelector(fp12)
I can reliably reproduce this in SeaMonkey by going to Mail/News (an IMAP account), opening a message (some spam), then closing it and trying to delete it.

I set NSZombieEnables=YES and MallocStackLoggingNoCompact=1 before launching seamonkey-bin, I attached gdb, set breakpoint -[_NSZombie forward::], repeated those steps to hit the breakpoint, bt shows the same stack as in this bug, info registers gives me some interesting numbers in esi (0x37dc8460) and edi (0x37dc8480), which if I feed them to malloc-history give me the following:

jag:~ jag$ malloc_history 4955 0x37dc8460

Call [2] [arg=16]: thread_a000d000 |0x1 | start | _start | main | XRE_main | nsAppStartup::Run() | nsAppShell::Run() | -[NSApplication run] | -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] | _DPSNextEvent | BlockUntilNextEventMatchingListInMode | ReceiveNextEventCommon | RunCurrentEventLoopInMode | CFRunLoopRunInMode | CFRunLoopRunSpecific | __CFMachPortPerform | __NSFireMachPort | -[AppShellDelegate handlePortMessage:] | nsAppShell::ProcessGeckoEvents() | nsBaseAppShell::NativeEventCallback() | NS_ProcessPendingEvents_P(nsIThread*, unsigned) | nsThread::ProcessNextEvent(int, int*) | nsAppShell::OnProcessNextEvent(nsIThreadInternal*, int, unsigned) | nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, int, unsigned) | nsBaseAppShell::DoProcessNextNativeEvent(int) | nsAppShell::ProcessNextNativeEvent(int) | -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] | _DPSNextEvent | BlockUntilNextEventMatchingListInMode | ReceiveNextEventCommon | RunCurrentEventLoopInMode | CFRunLoopRunInMode | CFRunLoopRunSpecific | __NSFireDelayedPerform | -[AppShellDelegate runAppShell] | nsAppShell::Run() | nsBaseAppShell::Run() | NS_ProcessNextEvent_P(nsIThread*, int) | nsThread::ProcessNextEvent(int, int*) | nsInputStreamReadyEvent::Run() | nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) | nsInputStreamPump::OnStateStop() | nsBaseChannel::OnStopRequest(nsIRequest*, nsISupports*, unsigned) | nsLoadGroup::RemoveRequest(nsIRequest*, nsISupports*, unsigned) | nsDocLoader::OnStopRequest(nsIRequest*, nsISupports*, unsigned) | nsDocLoader::DocLoaderIsEmpty() | nsDocLoader::ChildDoneWithOnload(nsIDocumentLoader*) | nsDocLoader::DocLoaderIsEmpty() | nsDocLoader::doStopDocumentLoad(nsIRequest*, unsigned) | nsDocLoader::FireOnStateChange(nsIWebProgress*, nsIRequest*, int, unsigned) | nsWebShellWindow::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned, unsigned) | LoadNativeMenus(nsIDOMDocument*, nsIWidget*, nsIDocShell*) | nsMenuBarX::MenuConstruct(nsMenuEvent const&, nsIWidget*, void*, void*) | nsMenuX::Create(nsISupports*, nsAString_internal const&, nsAString_internal const&, nsIChangeManager*, nsIDocShell*, nsIContent*) | nsMenuX::MenuConstruct(nsMenuEvent const&, nsIWidget*, void*, void*) | nsMenuX::LoadSubMenu(nsIContent*) | nsCOMPtr<nsIMenu>::nsCOMPtr[in-charge](nsCOMPtr_helper const&) | nsCOMPtr<nsIMenu>::assign_from_helper(nsCOMPtr_helper const&, nsID const&) | nsCreateInstanceByCID::operator()(nsID const&, void**) const | CallCreateInstance(nsID const&, nsISupports*, nsID const&, void**) | nsComponentManagerImpl::CreateInstance(nsID const&, nsISupports*, nsID const&, void**) | nsGenericFactory::CreateInstance(nsISupports*, nsID const&, void**) | nsMenuXConstructor(nsISupports*, nsID const&, void**) | nsMenuX::nsMenuX[in-charge]() | +[NSObject alloc] | NSAllocateObject | _internal_class_createInstanceFromZone | malloc_zone_calloc 


jag:~ jag$ malloc_history 4955 0x37dc8480

Call [2] [arg=28]: thread_a000d000 |0x1 | start | _start | main | XRE_main | nsAppStartup::Run() | nsAppShell::Run() | -[NSApplication run] | -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] | _DPSNextEvent | BlockUntilNextEventMatchingListInMode | ReceiveNextEventCommon | RunCurrentEventLoopInMode | CFRunLoopRunInMode | CFRunLoopRunSpecific | __CFMachPortPerform | __NSFireMachPort | -[AppShellDelegate handlePortMessage:] | nsAppShell::ProcessGeckoEvents() | nsBaseAppShell::NativeEventCallback() | NS_ProcessPendingEvents_P(nsIThread*, unsigned) | nsThread::ProcessNextEvent(int, int*) | nsAppShell::OnProcessNextEvent(nsIThreadInternal*, int, unsigned) | nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, int, unsigned) | nsBaseAppShell::DoProcessNextNativeEvent(int) | nsAppShell::ProcessNextNativeEvent(int) | -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] | _DPSNextEvent | BlockUntilNextEventMatchingListInMode | ReceiveNextEventCommon | RunCurrentEventLoopInMode | CFRunLoopRunInMode | CFRunLoopRunSpecific | __NSFireDelayedPerform | -[AppShellDelegate runAppShell] | nsAppShell::Run() | nsBaseAppShell::Run() | NS_ProcessNextEvent_P(nsIThread*, int) | nsThread::ProcessNextEvent(int, int*) | nsInputStreamReadyEvent::Run() | nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) | nsInputStreamPump::OnStateStop() | nsBaseChannel::OnStopRequest(nsIRequest*, nsISupports*, unsigned) | nsLoadGroup::RemoveRequest(nsIRequest*, nsISupports*, unsigned) | nsDocLoader::OnStopRequest(nsIRequest*, nsISupports*, unsigned) | nsDocLoader::DocLoaderIsEmpty() | nsDocLoader::ChildDoneWithOnload(nsIDocumentLoader*) | nsDocLoader::DocLoaderIsEmpty() | nsDocLoader::doStopDocumentLoad(nsIRequest*, unsigned) | nsDocLoader::FireOnStateChange(nsIWebProgress*, nsIRequest*, int, unsigned) | nsWebShellWindow::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned, unsigned) | LoadNativeMenus(nsIDOMDocument*, nsIWidget*, nsIDocShell*) | nsMenuBarX::MenuConstruct(nsMenuEvent const&, nsIWidget*, void*, void*) | nsMenuX::Create(nsISupports*, nsAString_internal const&, nsAString_internal const&, nsIChangeManager*, nsIDocShell*, nsIContent*) | nsMenuX::MenuConstruct(nsMenuEvent const&, nsIWidget*, void*, void*) | nsMenuX::LoadSubMenu(nsIContent*) | nsMenuX::Create(nsISupports*, nsAString_internal const&, nsAString_internal const&, nsIChangeManager*, nsIDocShell*, nsIContent*) | nsMenuX::SetLabel(nsAString_internal const&) | nsMenuX::CreateMenuWithGeckoString(nsString&) | +[NSObject alloc] | NSAllocateObject | _internal_class_createInstanceFromZone | malloc_zone_calloc 

I'm not sure I have the right objects there, but it does seem like they are. Oh, and if I do print-object on the edi value I get this:

(gdb) print-object 0x37dc8480

Program received signal SIGPIPE, Broken pipe.
0x93666dff in -[NSMenu description] ()
The program being debugged was signaled while in a function called from GDB.
GDB has restored the context to what it was before the call.
To change this behavior use "set unwindonsignal off"
Evaluation of the expression containing the function (_NSPrintForDebugger) will be abandoned.


Which I guess confirms that there's a NSMenu object at the edi address, but I'm not sure if that'll help us figure out what the _NSZombie object once was.

I tried print-object NSStringFromSelector(fp12), but no luck: no symbol "fp12" in current context.
po NSStringFromSelector(SEL) gives: Could not find OSO file: ""

Does anyone know how to get the _NSZombie object's "this" (memory address) value? Apparently usually it shows up in the console message the _NSZombie prints, but as comment 16 shows, we only get "not overridden".


Anyway, since I could easily reproduce this I figured I'd do some binary-search backing out. With the patch for bug 379660 backed out, I don't crash anymore (or rather, I don't see the _NSZombie forward messages anymore :-)).

Stan, any new ideas on why your patch may cause crashes, or increase their occurrence?
> Use NSStringFromSelector(SEL) and then print-object it in gdb.

Thanks!  This works.

Since I'm on a PPC machine (which kindly puts each parameter in a
separate register), and since the SEL parameter I want to see is the
third parameter (after "self" and the called method's own selector), I
used

(gdb) po NSStringFromSelector($r5)

which returned "menuNeedsUpdate:"

This is a method of several classes in the AppKit framework, but none
of them are documented (you can tell by grepping through the output of
class-dump on the AppKit framework's AppKit binary).

So the object that was turned into an _NSZombie is neither an NSMenu
object nor an NSMenu delegate.  And this problem will be a lot more
complicated to debug than I first realized.

(I also now think I was wrong about the void * argument of _NSZombie's
- (id)forward:(SEL)fp8:(void *)fp12 method -- I now suspect it's a
pointer so some kind of structure.)

I'm off on my vacation.  Good luck!
Just thinking out loud... One of the things that's new is that nsMenuItemIconX objects now have a non-owning pointer to a NSMenuItem object. Could that NSMenuItem object be released before the nsMenuItemIconX object is deleted, giving it an opportunity to send it a message?

I hacked things up real quick so mNativeMenuItem would always be 0, and I still saw the _NSZombie forwards, so I guess it must be in other code.

hwaara pointed to http://www.borkware.com/quickies/single?id=306

So I set a breakpoint on _NSZombie forward:: again, and when it hit I did:

Breakpoint 1, 0x928d5870 in -[_NSZombie forward::] ()
(gdb) po *(int*)($ebp+8)

Breakpoint 1, 0x928d5870 in -[_NSZombie forward::] ()
The program being debugged stopped while in a function called from GDB.
When the function (_NSPrintForDebugger) is done executing, GDB will silently
stop (instead of continuing to evaluate the expression containing
the function call).
(gdb) p/x *(int*)($ebp+8)
$2 = 0x44f62160

And malloc-history for that address says:

jag:~ jag$ malloc_history 6589 0x44f62160

Call [2] [arg=16]: thread_a000d000 |0x1 | start | _start | main | XRE_main | nsAppStartup::Run() | nsAppShell::Run() | -[NSApplication run] | -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] | _DPSNextEvent | BlockUntilNextEventMatchingListInMode | ReceiveNextEventCommon | RunCurrentEventLoopInMode | CFRunLoopRunInMode | CFRunLoopRunSpecific | __CFMachPortPerform | __NSFireMachPort | -[AppShellDelegate handlePortMessage:] | nsAppShell::ProcessGeckoEvents() | nsBaseAppShell::NativeEventCallback() | NS_ProcessPendingEvents_P(nsIThread*, unsigned) | nsThread::ProcessNextEvent(int, int*) | nsAppShell::OnProcessNextEvent(nsIThreadInternal*, int, unsigned) | nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, int, unsigned) | nsBaseAppShell::DoProcessNextNativeEvent(int) | nsAppShell::ProcessNextNativeEvent(int) | -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] | _DPSNextEvent | BlockUntilNextEventMatchingListInMode | ReceiveNextEventCommon | RunCurrentEventLoopInMode | CFRunLoopRunInMode | CFRunLoopRunSpecific | __NSFireDelayedPerform | -[AppShellDelegate runAppShell] | nsAppShell::Run() | nsBaseAppShell::Run() | NS_ProcessNextEvent_P(nsIThread*, int) | nsThread::ProcessNextEvent(int, int*) | nsInputStreamReadyEvent::Run() | nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) | nsInputStreamPump::OnStateStop() | nsBaseChannel::OnStopRequest(nsIRequest*, nsISupports*, unsigned) | nsLoadGroup::RemoveRequest(nsIRequest*, nsISupports*, unsigned) | nsDocLoader::OnStopRequest(nsIRequest*, nsISupports*, unsigned) | nsDocLoader::DocLoaderIsEmpty() | nsDocLoader::ChildDoneWithOnload(nsIDocumentLoader*) | nsDocLoader::DocLoaderIsEmpty() | nsDocLoader::doStopDocumentLoad(nsIRequest*, unsigned) | nsDocLoader::FireOnStateChange(nsIWebProgress*, nsIRequest*, int, unsigned) | nsWebShellWindow::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned, unsigned) | LoadNativeMenus(nsIDOMDocument*, nsIWidget*, nsIDocShell*) | nsMenuBarX::MenuConstruct(nsMenuEvent const&, nsIWidget*, void*, void*) | nsMenuX::Create(nsISupports*, nsAString_internal const&, nsAString_internal const&, nsIChangeManager*, nsIDocShell*, nsIContent*) | nsMenuX::MenuConstruct(nsMenuEvent const&, nsIWidget*, void*, void*) | nsMenuX::LoadSubMenu(nsIContent*) | nsCOMPtr<nsIMenu>::nsCOMPtr[in-charge](nsCOMPtr_helper const&) | nsCOMPtr<nsIMenu>::assign_from_helper(nsCOMPtr_helper const&, nsID const&) | nsCreateInstanceByCID::operator()(nsID const&, void**) const | CallCreateInstance(nsID const&, nsISupports*, nsID const&, void**) | nsComponentManagerImpl::CreateInstance(nsID const&, nsISupports*, nsID const&, void**) | nsGenericFactory::CreateInstance(nsISupports*, nsID const&, void**) | nsMenuXConstructor(nsISupports*, nsID const&, void**) | nsMenuX::nsMenuX[in-charge]() | +[NSObject alloc] | NSAllocateObject | _internal_class_createInstanceFromZone | malloc_zone_calloc
So since the only big difference is when mNativeMenuItem gets released, it looks like having the native menu item around too long is causing the problem.
No, scratch that. This is the relevant part of the patch:

-  [newNativeMenuItem release];
   
   NSMenu* childMenu;
   if (aMenu->GetNativeData((void**)&childMenu) == NS_OK)
-    [newNativeMenuItem setSubmenu:childMenu];
+    [mNativeMenuItem setSubmenu:childMenu];
 
I guess sending the |setSubmenu| message to the released |newNativeMenuItem| would effectively be a no-op, where-as with the patch we do successfully set the sub-menu.
Comment on attachment 275310 [details] [diff] [review]
This part of bug 379660's patch causes the crash

To clarify, I made this patch from a checkout from before the checkin made for bug 379660, with just these changes applied. Without: no crash, with: crash.
(In reply to comment #47)
> No, scratch that. This is the relevant part of the patch:
...
> I guess sending the |setSubmenu| message to the released |newNativeMenuItem|
> would effectively be a no-op, where-as with the patch we do successfully set
> the sub-menu.

Actually, nevermind that. If (in my pre-bug 379660 tree) I merely move the |newNativeMenuItem release| call below the |setSubmenu| call, we're still good. So it is a lifetime issue.
If I move the |[mNativeMenuItem release];| from the destructor back to where we alloc the object, then things work ok again. We'll let mMacMenu own the object, and hold a weak reference/pointer to it in nsMenuX, which should still be valid by the time nsMenuItemIconX tries to get it.
Posted patch One way to fix it (obsolete) — Splinter Review
I don't know the Mac code well enough to guarantee that the mNativeMenuItem will be kept alive (through mMacMenu) longer than any nsMenuItemIconX object's SetupIcon() will need access to it, but this does seem to work.
We were leaking mNativeMenuItem if AddMenu() was called more than once on a nsMenuX. Not leaking them seems to have fixed the problem.
Assignee: nobody → jag
Attachment #275315 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #275323 - Flags: review?(stanshebs)
Comment on attachment 275323 [details] [diff] [review]
Or we simply don't leak mNativeMenuItem in subsequent AddMenu() calls

Nice find Peter! 

I'm gonna go ahead and review this, as I've been around IRC debugging it with jag.
Attachment #275323 - Flags: superreview?(joshmoz)
Attachment #275323 - Flags: review?(stanshebs)
Attachment #275323 - Flags: review+
The patch makes sense, I should have thought harder about the consequences of changing an object from ephemeral to long-lived.

Looking at this further, I'm not sure we really need a fresh new mNativeMenuItem every time we pass through AddMenu(), just need to create it on the first pass and reuse it subsequently. My one bit of apprehension is that one would have to then remove it from mMacMenu and re-add each time, in case its position has changed?
Maybe I'm missing something, but isn't the mNativeMenuItem created with the label for the |aMenu| passed in? |aMenu| is a sub-menu to the current menu object, and we're holding on to the sub-menu's associated mNativeMenuItem for a short while, just like our parent menu is holding on to ours for a bit, unless our parent is the menubar, of course.

This all seems like a lot more work than necessary. Why can't we do we do for nsMenuItemX? Create our own NSMenuItem in our ::Create method, and hold on to that in mNativeMenuItem, then have AddMenu() call GetNativeData() like AddMenuItem() does to get the NSMenuItem for |aMenu|?
While we're discussing ownership issues, I have some things I've seen while looking at the code:

* I think GetNativeData() shouldn't violate the "getter rule" of returning a retained object. This can create a lot of problems in the future if we're introducing more and more calls to it, as in comment 55. Explicit ownership is clearer.

* nsMenuItemIconX keeps a pointer to the nsMenuItemX's mNativeMenuItem, without retaining it. The only reason this pointer doesn't become invalid and crash us because nsMenuItemX owns the nsMenuItemIconX, but this is a bug prone assumption. I think if it really needs the reference, it should retain for clarity.

* If you only need to find the mNativeMenuItem sometimes, you could just tag it, and then fetch it on runtime (using itemWithTag:) to avoid all of these weak/strong ref issues.
This is much cleaner, and it seems to work. Not sure whether I could simply set mIsEnabled, so I went the safe route and did what we were doing. We could move the "checked" stuff for nsMenuItemXs into their ::Create() method too, btw.
Attachment #275327 - Flags: superreview?(joshmoz)
Attachment #275327 - Flags: review?(stanshebs)
Comment on attachment 275327 [details] [diff] [review]
Create mNativeMenuItem ourselves instead of having our parent menu do it for us (we're nsMenuXs).

Mind you, my Cocoa-fu is weak, so there's a good chance there's something wrong with this patch.
Could you post a patch with 8 lines of context? cvs dif -u8p should do it.

cbarrett@mozilla.com: please practice using bugzilla instead of abusing contributors, try this link:
https://bugzilla.mozilla.org/attachment.cgi?action=diff&id=275327&collapsed=&headers=1&context=30

30 is arbitrary, I tend to use file, but you could easily pick 8 or 10.
Timeless, I was only asking what has been requested of me in the past. When I've posted patches with 3 lines of context I've been asked to regenerate them with 8. I was saving him, and the reviewers, the trouble of asking for it later.

It's also in the instructions for how to generate a patch that you should use 8 lines.

I'm not abusing anyone, and I know about that feature of Bugzilla.
Attachment #275323 - Flags: superreview?(pavlov)
Attachment #275323 - Flags: superreview?(joshmoz)
Attachment #275323 - Flags: review+
Comment on attachment 275323 [details] [diff] [review]
Or we simply don't leak mNativeMenuItem in subsequent AddMenu() calls

sr=me, though it's scary that fixing that leak also fixes a crash...
Attachment #275323 - Flags: superreview?(pavlov) → superreview+
Colin: oops, sorry, I'll update my .cvsrc (or whatever it's called).

Vlad: yeah, it is. I suspect the leaked object ends up trying to call one of the objects we do free later.

While attachment 275323 [details] [diff] [review] makes a decent branch patch (if we need one), the real fix is something along the lines of attachment 275327 [details] [diff] [review].
jag - do you have cvs commit privs to check in your one-line leak patch? If so please do, otherwise let me know and I'll do it.
(In reply to comment #64)
> jag - do you have cvs commit privs to check in your one-line leak patch? If so
> please do, otherwise let me know and I'll do it.

I would hope so, considering he's a super-reviewer. :P
Summary: crash on shutdown when using Apple-Q to quit [@ AppKitMenuEventHandler] → crash on shutdown when using Apple-Q to quit [@ AppKitMenuEventHandler] [@ objc_msgSend]
Severity: normal → critical
Keywords: crash, topcrash
Version: unspecified → Trunk
I checked this in because this bug was harming me.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
I'd personally like it if we could check in the other patch, or at least move it to another bug to clean that sort of thing up. There are a lot of strange things going on in the menu code, and lifetime issues clearer would be beneficial -- especially since there's already been confusion about ownership and lifetime things!
that is the plan, might be a good idea to move it to another bug though
Thanks roc for checking this in.

I'll move the real fix out to a new bug.
Comment on attachment 275327 [details] [diff] [review]
Create mNativeMenuItem ourselves instead of having our parent menu do it for us (we're nsMenuXs).

See bug 391288 for follow-up work.
Attachment #275327 - Attachment is obsolete: true
Attachment #275327 - Flags: superreview?(joshmoz)
Attachment #275327 - Flags: review?(stanshebs)
Duplicate of this bug: 391020
https://litmus.mozilla.org/show_test.cgi?id=6948 was added to the Keyboard test suite a while back.
Flags: in-litmus? → in-litmus+
Crash Signature: [@ AppKitMenuEventHandler] [@ objc_msgSend]
You need to log in before you can comment on or make changes to this bug.