Crash in [@ nsMenuBarX::Paint]
Categories
(Core :: Widget: Cocoa, defect, P3)
Tracking
()
People
(Reporter: mccr8, Assigned: spohl)
References
Details
(Keywords: crash, regression, Whiteboard: [tbird crash] [STR in comment #61])
Crash Data
Attachments
(6 files, 4 obsolete files)
Crash report: https://crash-stats.mozilla.org/report/index/606ea0f7-5d04-41ee-bb7c-a43880220419
MOZ_CRASH Reason: MOZ_CRASH(Encountered unexpected Objective C exception)
Top 10 frames of crashing thread:
0 XUL nsMenuBarX::Paint widget/cocoa/nsMenuBarX.mm:431
1 XUL -[WindowDelegate windowDidResignMain:] widget/cocoa/nsCocoaWindow.mm:2886
2 CoreFoundation __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__
3 CoreFoundation ___CFXRegistrationPost_block_invoke
4 CoreFoundation _CFXRegistrationPost
5 CoreFoundation _CFXNotificationPost
6 Foundation -[NSNotificationCenter postNotificationName:object:userInfo:]
7 AppKit -[NSWindow _changeKeyAndMainLimitedOK:]
8 AppKit -[NSWindow _makeKeyRegardlessOfVisibility]
9 AppKit -[NSWindow makeKeyAndOrderFront:]
I hit this crash, in a build that I think has the patch from bug 1699936. My browser was lagging a lot while trying to switch "tabs" in crash stats, so I decided to restart the browser. While it was in the middle of shutting down, it hit this crash.
Comment 1•3 years ago
|
||
signature ranks #35 for Thunderbird 100 beta
bp-d7530b60-9a3f-4395-84a7-185290220421
Assignee | ||
Comment 2•3 years ago
|
||
This should have started to drop off now that bug 1699936 was backed out. Wayne, is this so?
Comment 3•3 years ago
|
||
Will have results in about two weeks, with stats from beta 101
Updated•3 years ago
|
Comment 4•3 years ago
|
||
FWIW a firefox 101.0b2 beta crash- May 3 build bp-8716a618-6155-4634-9c5a-806e90220506
Updated•3 years ago
|
Comment 5•3 years ago
|
||
(In reply to Stephen A Pohl [:spohl] from comment #2)
This should have started to drop off now that bug 1699936 was backed out. Wayne, is this so?
The backout was April 25. That change would pick up in beta 101?
Overall, for beta, average crash rate
Assignee | ||
Comment 6•3 years ago
|
||
Then this couldn't have been caused by bug 1699936...
Comment 7•3 years ago
|
||
The volume is pretty low, could we decrease the severity?
Assignee | ||
Updated•3 years ago
|
Comment 8•3 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 5 desktop browser crashes on Mac on beta (startup)
:spohl, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 9•3 years ago
•
|
||
(In reply to Marco Castelluccio [:marco] from comment #7)
The volume is pretty low, could we decrease the severity?
(In reply to Release mgmt bot [:suhaib / :marco/ :calixte] from comment #8)
:spohl, could you consider increasing the severity of this top-crash bug?
🤷♂️
Reporter | ||
Comment 10•3 years ago
|
||
A couple of crashes per build doesn't really seem like cause for alarm (comment 8 says this is top 5 for beta Mac startup crashes). Maybe there should be a lower threshold for crash volume for one of these alerts?
Comment 11•3 years ago
•
|
||
The release reports are throttled so the actual count is like 10x higher? And this is on a minor platform, which means that the chance per user of hitting this crash is proportionally greater? I understand we like S2 bugs to be gone from everyone's radar but I do think this is the kind of thing we actually want to look into...
Comment 12•3 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #10)
A couple of crashes per build doesn't really seem like cause for alarm (comment 8 says this is top 5 for beta Mac startup crashes). Maybe there should be a lower threshold for crash volume for one of these alerts?
Autonag implements the top crash criteria which in this specific case have the following thresholds: "If there's less than 5 crashes per week on a signature, that bug probably still doesn't qualify - same for crashes happening to only 2 or 3 installations."
Comment 13•3 years ago
|
||
https://support.mozilla.org/en-US/questions/1396794#answer-1549032 cites Thunderbird crash bp-32bd076c-494d-4eed-bd73-e2cba0221114
0 XUL nsMenuBarX::Paint()
1 XUL +[WindowDelegate paintMenubarForWindow:]
2 XUL -[WindowDelegate windowDidBecomeMain:]
3 CoreFoundation CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER
4 CoreFoundation ___CFXRegistrationPost_block_invoke
5 CoreFoundation _CFXRegistrationPost
6 CoreFoundation _CFXNotificationPost
7 Foundation -[NSNotificationCenter postNotificationName:object:userInfo:]
8 AppKit -[NSWindow _changeKeyAndMainLimitedOK:]
9 AppKit -[NSMenuWindowManagerWindow _restorePreviousKeyWindowFromSavedProperties]
10 AppKit -[NSMenuWindowManagerWindow _setVisible:]
11 AppKit -[NSWindow _doWindowWillBecomeHidden]
12 AppKit -[NSWindow _reallyDoOrderWindowOutRelativeTo:findKey:forCounter:force:isModal:]
Comment 14•3 years ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 15•2 years ago
|
||
Firefox crash rate of the past month is tripled compared to April.
And pct of crashes <1min uptime has increased to 67% for past month, up from 56% for April
Comment 16•2 years ago
|
||
example startup crash bp-0a329d6d-3dc9-44fd-acb4-4b0bf0230918
Assignee | ||
Comment 17•2 years ago
|
||
I am reworking our menu bar in bug 1808223 and this crash will hopefully be fixed at the same time. Marking as blocked by bug 1808223 for now.
Comment 18•2 years ago
|
||
New report: Thunderbird Support Forum: https://support.mozilla.org/en-US/questions/1428216
MAC OS X version 10.15.7 Catalina
Crash ID: bp-290856b9-84d7-4075-aada-dfe180231020
Info says using version 102.15.1
I'm going to suggest:
- switch off auto compact
- try manual compact on one folder at time.
- an upgrade to 115.3.3, but that's going to be the user choice. Just to see if problem continues.
Assignee | ||
Comment 19•2 years ago
|
||
The fix that ultimately landed in bug 1808223 will probably not make a difference in the crash rate here.
Assignee | ||
Comment 20•2 years ago
|
||
Updated•2 years ago
|
Assignee | ||
Comment 21•2 years ago
|
||
I will be landing the diagnostic patch next week. Adding leave-open keyword.
Comment 22•2 years ago
|
||
Comment 23•2 years ago
|
||
bugherder |
Comment 24•1 year ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:spohl, maybe it's time to close this bug?
For more information, please visit BugBot documentation.
Assignee | ||
Comment 25•1 year ago
|
||
The improved exception handling shows that this crash occurs when we're attempting to set NSApp.mainMenu
. The exception, "NSGenericException: *** Collection <__NSArrayM: 0x140c82f10> was mutated while being enumerated." indicates that macOS might be in the process of enumerating the menu items when we're attempting to set a new mainMenu
. Let's see if we can fix this with a simple mutex lock.
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 26•1 year ago
|
||
Comment 27•1 year ago
|
||
(In reply to Stephen A Pohl [:spohl] from comment #25)
The improved exception handling shows that this crash occurs when we're attempting to set
NSApp.mainMenu
. The exception, "NSGenericException: *** Collection <__NSArrayM: 0x140c82f10> was mutated while being enumerated." indicates that macOS might be in the process of enumerating the menu items when we're attempting to set a newmainMenu
. Let's see if we can fix this with a simple mutex lock.
Which crash report did you see this in?
Is another thread enumerating, or is the same thread enumerating? The stacks in the report should show us.
Assignee | ||
Comment 28•1 year ago
|
||
(In reply to Markus Stange [:mstange] from comment #27)
(In reply to Stephen A Pohl [:spohl] from comment #25)
The improved exception handling shows that this crash occurs when we're attempting to set
NSApp.mainMenu
. The exception, "NSGenericException: *** Collection <__NSArrayM: 0x140c82f10> was mutated while being enumerated." indicates that macOS might be in the process of enumerating the menu items when we're attempting to set a newmainMenu
. Let's see if we can fix this with a simple mutex lock.Which crash report did you see this in?
Is another thread enumerating, or is the same thread enumerating? The stacks in the report should show us.
For example in: https://crash-stats.mozilla.org/report/index/beaf09ca-957f-476c-a21a-47f840240618
The top-most frame points to the try/abort block that only has one statement, which is setting NSApp.mainMenu
.
Comment 30•10 months ago
•
|
||
This is a stretch, but it's possible the (final) fix for bug 1880582 will have an effect here, or even fix this bug's crashes.
Bug 1880582 involved "mutations" that it seemed could only happen as a result of thread contention. It turns out they were caused by run loop nesting. The same could be true here.
How to search for this bug's crashes on builds that contain the (final) fix for bug 1880582:
Assignee | ||
Comment 31•10 months ago
|
||
Thanks, Steven! I'll keep an eye on your search to see if the fix here is still necessary.
Comment 32•10 months ago
|
||
This bug's crashes have disappeared on 133.0b builds, all of which have the (latest) patch for bug 1880582. Given this bug's high crash volume, I think we can say this bug's crashes have been fixed (worked around). What do you think, Stephen?
Assignee | ||
Comment 33•10 months ago
|
||
Agreed! Thank you, Steven!
Closing as fixed by bug 1880582.
Updated•10 months ago
|
Comment 34•10 months ago
•
|
||
Unfortunately, the latest patch for bug 1880582 doesn't seem to have fixed this bug:
bp-be5d3964-cc2e-45a8-8fed-65e8a0241104
That the "mutations'" only easy explanation was run loop nesting made it seem likely. And run loop nesting may yet be involved here. But "early" deallocation of an NSWindow
object doesn't seem to be the trigger.
On the positive side, I noticed that none of this bug's crashes happen on macOS 14 or later. So even without a fix, they should diminish over time.
Updated•10 months ago
|
Updated•10 months ago
|
Comment 35•10 months ago
|
||
I think Stephen's patch from comment #26 is worth a try.
Comment 36•10 months ago
|
||
Comment 37•10 months ago
|
||
Backed out for causing build bustages @ nsCocoaWindow.mm
Backout link: https://hg.mozilla.org/integration/autoland/rev/f6fd362e92218f9ebd6c88348dd43a9a07f6c1e7
Comment 38•10 months ago
|
||
Didn't realize this one was assigned to me. Stephen is working this.
Comment 39•10 months ago
|
||
Assignee | ||
Updated•10 months ago
|
Comment 40•10 months ago
|
||
bugherder |
Updated•10 months ago
|
Comment 41•10 months ago
|
||
The patch landed in nightly and beta is affected.
:spohl, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox133
towontfix
.
For more information, please visit BugBot documentation.
Assignee | ||
Comment 42•10 months ago
|
||
Let's have this ride the trains to give us an opportunity to look out for any further crash reports and/or regressions.
Comment 43•9 months ago
|
||
Stephen's patch (from comment #26) hasn't fixed this bug's crashes :-(
Comment 44•9 months ago
|
||
How to search for this bug's crashes in builds that contain Stephen's patch from comment #26:
Comment 45•9 months ago
•
|
||
I've painstakingly symbolicated the top eight lines of the stack in the App Notes for bp-6e9e5bdb-0ac2-47c8-8166-947610241112. For the record, the dyld shared cache slide for this particular crash is 0x13d7f000
. I figured that out by comparing the known value for (dyld) start + 0x768
(on Intel macOS 13.7.1) to the one in the App Notes crash stack. (By the way, the offsets in Mozilla crash stacks are systematically off by 1.) The line just below these eight is the call to nsMenuBarX::Paint()
at the top of the crash report's stack. I used atos
, plus the output of dyld_shared_cache_util -list -vmaddr
to find which modules were involved from the bare addresses (minus the slide) in the original stack.
__exceptionPreprocess (in CoreFoundation) + 178
objc_exception_throw (in libobjc.A.dylib) + 48
__NSFastEnumerationMutationHandler (in CoreFoundation) + 135
-[NSCarbonMenuImpl _menuLostMainMenuStatus] (in AppKit) + 398
-[NSToolbar _notifyFamily_DidRemoveItemAtIndex:] (in AppKit) + 47
_NSLogOverriddenIdentifierForObject (in AppKit) + 196
-[NSMenuKEUniquer _coreAddItem:] (in AppKit) + 793
-[NSMenu setDelegate:] (in AppKit) + 48
Edit: I redid the symbolication using modules that I'd extracted (as files) from the shared library cache. The result is slightly different, but basically the same.
Edit: See comment #59 below.
Comment 46•9 months ago
|
||
I wonder if we could sidestep this issue by instead using a pattern of copy-modify-replace rather than trying to change the current mainMenu
at all. In other words, nsMenuBarX::Paint
modifies both mNativeMenu
and outgoingMenu
(both of which, I think, are retained pointers to NSApp.mainMenu
) and instead maybe we just make a copy of the menu, modify the copy, and then set the mainMenu to the modified copy -- possibly on a later event loop.
That is more complicated, and perhaps not well-reasoned. I'm also not sure if it would solve the problem if we are in a nested run loop as we think was happening in similar Bug 1880582. Just brainstorming, here...
Comment 47•9 months ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #45)
I've painstakingly symbolicated the top eight lines of the stack in the App Notes
FWIW, there's a tampermonkey script in bug 1692394 comment 5 but it's not working anymore because the symbolication request is blocked by CSP.
Comment hidden (obsolete) |
Comment 49•9 months ago
|
||
A really simple fix occurs to me.
Stephen, you patched nsCocoaWindow::SetMenuBar()
to defer a call to mMenuBar->Paint()
. But there's still a call to hiddenWindowMenuBar->Paint()
in -[WindowDelegate windowDidResignMain:(NSNotification*)aNotification]
. Maybe you should defer that call, too.
Comment 50•9 months ago
|
||
(Following up comment #45)
I can't make any sense of the symbolicated stack from comment #45. In particular, I can't reproduce the call to -[NSMenu setDelegate:]
. I'll keep playing around with my hook library, but it's going to be a lot of work. Hopefully my suggestion from comment #49 will obviate the need for this work.
Comment 51•9 months ago
|
||
I am able to reproduce the following, which makes a lot more sense as the top of this bug's crash stacks:
(Fri Nov 15 15:38:34 2024) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[6379] MainThread[0x1118d4280] Hook.mm: -[NSCarbonMenuImpl _menuLostMainMenuStatus]
(hook.dylib) PrintStackTrace() + 0x68
(hook.dylib) -[NSObject(NSCarbonMenuImplSwizzling) NSCarbonMenuImpl__menuLostMainMenuStatus] + 0x68
(AppKit) -[NSMenu _setMenuName:] + 0x27e
(AppKit) -[NSApplication setMainMenu:] + 0x15e
(AppKit) -[NSMenu _setMenuName:] + 0x405
(XUL) nsMenuBarX::Paint() + 0xad
...
Maybe the App Notes' stacks are corrupt. Or maybe (and I think this is more likely) there are problems with atos
.
Comment 52•9 months ago
•
|
||
-[NSCarbonMenuImpl _menuLostMainMenuStatus]
calls -[NSFastEnumeration countByEnumeratingWithState:objects:count:]
on the NSCarbonMenuImpl
object's array of menu items. I bet this is where the "mutation" exception is thrown.
Assignee | ||
Updated•9 months ago
|
Comment 53•9 months ago
•
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #51)
I am able to reproduce the following, which makes a lot more sense as the top of this bug's crash stacks:
(Fri Nov 15 15:38:34 2024) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[6379] MainThread[0x1118d4280] Hook.mm: -[NSCarbonMenuImpl _menuLostMainMenuStatus] (hook.dylib) PrintStackTrace() + 0x68 (hook.dylib) -[NSObject(NSCarbonMenuImplSwizzling) NSCarbonMenuImpl__menuLostMainMenuStatus] + 0x68 (AppKit) -[NSMenu _setMenuName:] + 0x27e (AppKit) -[NSApplication setMainMenu:] + 0x15e (AppKit) -[NSMenu _setMenuName:] + 0x405 (XUL) nsMenuBarX::Paint() + 0xad ...
Maybe the App Notes' stacks are corrupt. Or maybe (and I think this is more likely) there are problems with
atos
.
There should be a call to (AppKit) -[NSApplication setMainMenu:]
above (XUL) nsMenuBarX::Paint() + 0xad
and below (AppKit) -[NSMenu _setMenuName:] + 0x405
. The reason there isn't is because (AppKit) -[NSApplication setMainMenu:]
, at its end, jmp
s to (AppKit) -[NSMenu _setMenuName:]
. This puts the two calls at the same offset in the call stack, so that the latter overwrites the former.
Updated•9 months ago
|
Assignee | ||
Comment 54•9 months ago
|
||
Assignee | ||
Comment 55•9 months ago
•
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #49)
A really simple fix occurs to me.
Stephen, you patched
nsCocoaWindow::SetMenuBar()
to defer a call tomMenuBar->Paint()
. But there's still a call tohiddenWindowMenuBar->Paint()
in-[WindowDelegate windowDidResignMain:(NSNotification*)aNotification]
. Maybe you should defer that call, too.
Steven, your comment made me try to aggregate the crash reports on their proto signatures, and it turns out that there were a total of five different ways that this crash occurred. My patch addresses the remaining four.
Comment 56•9 months ago
|
||
Someone please look at the comments for this bug's crashes and report back here. I'm trying to figure out how run loop nesting might be involved.
Assignee | ||
Comment 57•9 months ago
|
||
- Several comments indicate crashing upon restart (which seemed to have been a manual close and start, so possibly session restore related?)
- some indicating opening "attachments" (presumably from webmail)
- at least one indicated rapid closing of many tabs/windows that were no longer needed
- one mentioned closing windows during session restore at startup
- some indicate crashes after restart to update
- some say that they are no longer able to open Firefox and see an immediate crash
I can dig for more.
Comment 58•9 months ago
•
|
||
Thanks. I was hoping for something about dragging. Otherwise I don't really know what I'm looking for ... at least not yet.
HookCase gives me the power to see all kinds of things that would otherwise be invisible. But sometimes that leads me down rabbit holes. I'll keep playing with my hook library for this bug. With luck something will turn up that isn't just another rabbit hole.
To this point I've discovered that this bug's "mutation" exceptions are thrown when changes are made to an array of NSMenuItem pointers (what's returned by -[NSMenu itemArray]
) during a call on one of these arrays to -[NSFastEnumeration countByEnumeratingWithState:objects:count:]
. See comment #52. I've also (I think) discovered a way to log all changes to such arrays. So far, though, I haven't seen any of these changes being made on a nested run loop. I'll keep looking.
Comment 59•9 months ago
•
|
||
(Following up comment #45)
I redid my manual symbolication without using atos
, and got a reasonable result. (Instead of using atos
I looked up addresses (minus the slide) in modules loaded into Hopper Disassembler.)
(CoreFoundation) __exceptionPreprocess + 0xe2
(libobjc.A.dylib) objc_exception_throw + 0x30
(CoreFoundation) __NSFastEnumerationMutationHandler + 0x97
(AppKit) -[NSCarbonMenuImpl _destroyMenuRefIfNotCheckedOut] + 0x104
(AppKit) -[NSCarbonMenuImpl _menuLostMainMenuStatus] + 0x1ed
(AppKit) -[NSMenu _setMenuName:] + 0x27e
(AppKit) -[NSApplication setMainMenu:] + 0x15e
(AppKit) -[NSMenu _setMenuName:] + 0x405
Comment 60•9 months ago
•
|
||
I'm still pretty sure run loop nesting comes into this bug, somehow. But, so far at least, my best efforts don't show me how.
In the meantime I've been looking into other stuff, and as best I can tell this bug's crashes happen in either of two circumstances:
- Two or more windows are open and Firefox has the app focus: Do something to focus a (Firefox) window that isn't focused. For example, click on it. Or close the focused window.
- Two or more windows are open and Firefox doesn't have the app focus: Do something to focus a (Firefox) window that wasn't focused when Firefox had the app focus. For example click on it -- which gives the app focus back to Firefox. Or click the red
close
button on the (formerly) focused window -- which doesn't give back the app focus.
The first has nsCocoaWindow::Show()
in the stack trace. The second has nsFocusManager::RaiseWindow()
in the stack trace, except when you close the window without returning the app focus to Firefox (in this case the stack trace has nsCocoaWindow::Show()
).
Comment 61•9 months ago
|
||
I have STR for this bug's crashes! The following works at least 50% of the time in today's mozilla-central nightly on macOS 13.7.1.
- Run Firefox and open 10 or more windows, then quit Firefox.
- Restart Firefox, and as soon as the Firefox menu starts being displayed but before any windows appear, browse quickly and continuously through the menu's components with the mouse held down -- that is, as if you're dragging the mouse through the menus. The windows will appear one by one as you're doing this. Keep going until all of them appear, or you crash.
bp-a8a58c65-2e0d-478c-bba5-750ca0241126
bp-79b95889-f9c2-4794-ac5e-bcab00241126
bp-5035e53e-8923-4a07-925c-4a8a00241126
Comment 62•9 months ago
|
||
Run loop nesting is not involved here. Rather, if -[NSCarbonMenuImpl _destroyMenuRefIfNotCheckedOut]
destroys a MenuRef
, it can change an array of NSMenuItem *
s while it's iterating through them. For example:
(Mon Nov 25 20:01:50 2024) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[5253] MainThread[0x1100ef280] Hook.mm: -[__NSArrayM insertObject:atIndex:]: self (
"<NSWindowRepresentingMenuItem: 0x13de60d60 Firefox Nightly>"
)
(hook.dylib) PrintStackTrace() + 0x68
(hook.dylib) -[NSObject(NSMutableArraySwizzling) __NSArrayM_insertObject:atIndex:] + 0x10d
(AppKit) _findMenuItemsForWindow + 0xb3
(AppKit) -[NSApplication(NSWindowsMenu) updateWindowsItem:] + 0xf6
(AppKit) -[NSWindow becomeKeyWindow] + 0x2df
(AppKit) -[NSWindow _changeKeyAndMainLimitedOK:] + 0x3e3
(AppKit) -[NSWindow makeKeyWindow] + 0xc1
(AppKit) -[NSMenuWindowManagerWindow _restorePreviousKeyWindowFromSavedProperties] + 0x46
(AppKit) -[NSMenuWindowManagerWindow _setVisible:] + 0x1a1
(AppKit) -[NSWindow _doWindowWillBecomeHidden] + 0x6b
(AppKit) -[NSWindow _reallyDoOrderWindowOutRelativeTo:] + 0xc0
(AppKit) -[NSWindow _reallyDoOrderWindow:] + 0x63
(AppKit) -[NSWindow _doOrderWindow:] + 0x127
(HIToolbox) OrderNSMenuWindowManager(void () block_pointer) + 0x9
(HIToolbox) ShowHidePlatformWindows + 0x176
(HIToolbox) _ShowHideWindows + 0x33d
(HIToolbox) ShowHide + 0x23
(HIToolbox) HideMenuWindow + 0x30
(HIToolbox) HideMenus(MenuSelectData*, __CFArray*) + 0x47
(HIToolbox) MenuChanged(MenuSelectData*, unsigned char, unsigned char) + 0x1e9
(HIToolbox) _CloseMenu + 0xe5
(AppKit) -[NSCarbonMenuImpl _destroyMenuRef] + 0x27
(AppKit) -[NSCarbonMenuImpl _destroyMenuRefIfNotCheckedOut] + 0x59
(hook.dylib) -[NSObject(NSCarbonMenuImplSwizzling) NSCarbonMenuImpl__destroyMenuRefIfNotCheckedOut] + 0xaa
(AppKit) -[NSCarbonMenuImpl _destroyMenuRefIfNotCheckedOut] + 0x12d
(hook.dylib) -[NSObject(NSCarbonMenuImplSwizzling) NSCarbonMenuImpl__destroyMenuRefIfNotCheckedOut] + 0xaa
(AppKit) -[NSCarbonMenuImpl _menuLostMainMenuStatus] + 0x1ed
(hook.dylib) -[NSObject(NSCarbonMenuImplSwizzling) NSCarbonMenuImpl__menuLostMainMenuStatus] + 0x73
(AppKit) -[NSMenu _setMenuName:] + 0x27e
(AppKit) -[NSApplication setMainMenu:] + 0x15e
(AppKit) -[NSMenu _setMenuName:] + 0x405
(XUL) nsMenuBarX::Paint() + 0xad
(XUL) -[WindowDelegate windowDidResignMain:] + 0x56
(CoreFoundation) __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ + 0x89
(CoreFoundation) ___CFXRegistrationPost_block_invoke + 0x58
(CoreFoundation) _CFXRegistrationPost + 0x218
(CoreFoundation) _CFXNotificationPost + 0x2d9
(Foundation) -[NSNotificationCenter postNotificationName:object:userInfo:] + 0x52
(AppKit) -[NSWindow _changeKeyAndMainLimitedOK:] + 0x413
(AppKit) -[NSWindow makeKeyWindow] + 0xc1
(AppKit) -[NSWindow _makeKeyRegardlessOfVisibility] + 0x4c
(AppKit) -[NSWindow makeKeyAndOrderFront:] + 0x1b
(XUL) nsCocoaWindow::Show(bool) + 0x1f0
(XUL) mozilla::AppWindow::SetVisibility(bool) + 0x81
(XUL) mozilla::AppWindow::OnChromeLoaded() + 0x4a
(XUL) mozilla::AppWindow::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned int, nsresult) + 0xc6
(XUL) non-virtual thunk to mozilla::AppWindow::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned int, nsresult) + 0xd
(XUL) nsDocLoader::DoFireOnStateChange(nsIWebProgress*, nsIRequest*, int&, nsresult) + 0x10a
(XUL) nsDocLoader::doStopDocumentLoad(nsIRequest*, nsresult) + 0x105
(XUL) nsDocLoader::DocLoaderIsEmpty(bool, mozilla::Maybe<nsresult> const&) + 0x24f
(XUL) nsDocLoader::NotifyDoneWithOnload(nsDocLoader*) + 0x4b
(XUL) nsDocLoader::DocLoaderIsEmpty(bool, mozilla::Maybe<nsresult> const&) + 0x25a
(XUL) nsDocLoader::OnStopRequest(nsIRequest*, nsresult) + 0x1c1
(XUL) nsDocShell::OnStopRequest(nsIRequest*, nsresult) + 0x5a
(XUL) mozilla::net::nsLoadGroup::NotifyRemovalObservers(nsIRequest*, nsresult) + 0xd1
(XUL) mozilla::net::nsLoadGroup::RemoveRequest(nsIRequest*, nsISupports*, nsresult) + 0x41
(XUL) mozilla::dom::Document::UnblockOnload(bool) + 0xbc
(XUL) mozilla::dom::Document::DispatchContentLoadedEvents() + 0x4b7
(XUL) mozilla::detail::RunnableMethodImpl<mozilla::dom::Document*, void (mozilla::dom::Document::*)(), true, (mozilla::RunnableKind)0>::Run() + 0x1f
(XUL) mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) + 0x9ba
(XUL) mozilla::detail::RunnableFunction<mozilla::TaskController::TaskController()::$_0>::Run() + 0x3e
(XUL) nsThread::ProcessNextEvent(bool, bool*) + 0x43a
(XUL) NS_ProcessPendingEvents(nsIThread*, unsigned int) + 0x7e
(XUL) nsBaseAppShell::NativeEventCallback() + 0x73
(XUL) nsAppShell::ProcessGeckoEvents(void*) + 0xa3
(CoreFoundation) __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 0x11
(CoreFoundation) __CFRunLoopDoSource0 + 0x9d
(CoreFoundation) __CFRunLoopDoSources0 + 0xd9
(CoreFoundation) __CFRunLoopRun + 0x394
(CoreFoundation) CFRunLoopRunSpecific + 0x230
(hook.dylib) Hooked_CFRunLoopRunSpecific(__CFRunLoop*, __CFString const*, double, unsigned char) + 0xd1
(HIToolbox) RunCurrentEventLoopInMode + 0x124
(hook.dylib) Hooked_RunCurrentEventLoopInMode(__CFString const*, signed char) + 0x2d
(HIToolbox) ReceiveNextEventCommon + 0xc7
(HIToolbox) _BlockUntilNextEventMatchingListInModeWithFilter + 0x40
(AppKit) _DPSNextEvent + 0x35a
(AppKit) -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 0x4be
(XUL) -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 0x7c
(AppKit) _NSHLTBMenuEventProc + 0xd5
(HIToolbox) IsUserStillTracking(MenuSelectData*, unsigned char*) + 0xc2
(HIToolbox) TrackMenuCommon(MenuSelectData&, unsigned char*, SelectionData*, MenuResult*, MenuResult*) + 0x550
(HIToolbox) MenuSelectCore(MenuData*, Point, double, unsigned int, OpaqueMenuRef**, unsigned short*) + 0x18d
(HIToolbox) _HandleMenuSelection2 + 0x1cb
(AppKit) _NSHandleCarbonMenuEvent + 0xd7
(AppKit) _DPSEventHandledByCarbon + 0x36
(AppKit) -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 0xb67
(XUL) -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 0x7c
(AppKit) -[NSApplication run] + 0x24a
(XUL) -[GeckoNSApplication run] + 0x42
(XUL) nsAppShell::Run() + 0x13e
(XUL) nsAppStartup::Run() + 0x3c
(XUL) XREMain::XRE_mainRun() + 0x751
(XUL) XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0x34d
(XUL) XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0xb1
(firefox) main + 0x22f
(dyld) start + 0x768
This is pretty clearly an Apple bug. I'll be looking for a workaround, but I won't have much time during the Thanksgiving holiday.
Comment hidden (obsolete) |
Comment 64•9 months ago
|
||
Here's the stack trace of a "mutation" that actually triggers one of this bug's crashes:
(Mon Nov 25 20:31:54 2024) /Applications/Firefox Nightly.app/Contents/MacOS/firefox[5395] MainThread[0x10ed95280] Hook.mm: -[__NSArrayM removeObjectsInRange:]: self (
"<GeckoNSMenuItem: 0x14e642510 Get Help, ke='Command-?'>",
"<GeckoNSMenuItem: 0x1480ef580 Get Help, ke='Command-?'>"
)
(hook.dylib) PrintStackTrace() + 0x68
(hook.dylib) -[NSObject(NSMutableArraySwizzling) __NSArrayM_removeObjectsInRange:] + 0xf4
(AppKit) -[NSMenuKEUniquer _coreRemoveItem:] + 0xc1
(AppKit) -[NSMenuKEUniquer removeItems:] + 0xdd
(AppKit) +[NSMenu(NSKeyEquivalents) _recursivelyUnregisterMenuForKeyEquivalentUniquing:] + 0x4d
(AppKit) -[NSMenu _setMenuName:] + 0x29d
(AppKit) -[NSApplication setMainMenu:] + 0x15e
(XUL) nsMenuBarX::Paint() + 0xad
(XUL) +[WindowDelegate paintMenubarForWindow:] + 0x72
(XUL) -[WindowDelegate windowDidBecomeMain:] + 0x6d
(CoreFoundation) __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ + 0x89
(CoreFoundation) ___CFXRegistrationPost_block_invoke + 0x58
(CoreFoundation) _CFXRegistrationPost + 0x218
(CoreFoundation) _CFXNotificationPost + 0x2d9
(Foundation) -[NSNotificationCenter postNotificationName:object:userInfo:] + 0x52
(AppKit) -[NSWindow _changeKeyAndMainLimitedOK:] + 0x422
(AppKit) -[NSWindow makeKeyWindow] + 0xc1
(AppKit) -[NSMenuWindowManagerWindow _restorePreviousKeyWindowFromSavedProperties] + 0x46
(AppKit) -[NSMenuWindowManagerWindow _setVisible:] + 0x1a1
(AppKit) -[NSWindow _doWindowWillBecomeHidden] + 0x6b
(AppKit) -[NSWindow _reallyDoOrderWindowOutRelativeTo:] + 0xc0
(AppKit) -[NSWindow _reallyDoOrderWindow:] + 0x63
(AppKit) -[NSWindow _doOrderWindow:] + 0x127
(HIToolbox) OrderNSMenuWindowManager(void () block_pointer) + 0x9
(HIToolbox) ShowHidePlatformWindows + 0x176
(HIToolbox) _ShowHideWindows + 0x33d
(HIToolbox) ShowHide + 0x23
(HIToolbox) HideMenuWindow + 0x30
(HIToolbox) HideMenus(MenuSelectData*, __CFArray*) + 0x47
(HIToolbox) MenuChanged(MenuSelectData*, unsigned char, unsigned char) + 0x1e9
(HIToolbox) _CloseMenu + 0xe5
(AppKit) -[NSCarbonMenuImpl _destroyMenuRef] + 0x27
(AppKit) -[NSCarbonMenuImpl _destroyMenuRefIfNotCheckedOut] + 0x59
(hook.dylib) -[NSObject(NSCarbonMenuImplSwizzling) NSCarbonMenuImpl__destroyMenuRefIfNotCheckedOut] + 0xaa
(AppKit) -[NSCarbonMenuImpl _destroyMenuRefIfNotCheckedOut] + 0x12d
(hook.dylib) -[NSObject(NSCarbonMenuImplSwizzling) NSCarbonMenuImpl__destroyMenuRefIfNotCheckedOut] + 0xaa
(AppKit) -[NSCarbonMenuImpl _menuLostMainMenuStatus] + 0x1ed
(hook.dylib) -[NSObject(NSCarbonMenuImplSwizzling) NSCarbonMenuImpl__menuLostMainMenuStatus] + 0x73
(AppKit) -[NSMenu _setMenuName:] + 0x27e
(AppKit) -[NSApplication setMainMenu:] + 0x15e
(AppKit) -[NSMenu _setMenuName:] + 0x405
(XUL) nsMenuBarX::Paint() + 0xad
(XUL) -[WindowDelegate windowDidResignMain:] + 0x56
(CoreFoundation) __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ + 0x89
(CoreFoundation) ___CFXRegistrationPost_block_invoke + 0x58
(CoreFoundation) _CFXRegistrationPost + 0x218
(CoreFoundation) _CFXNotificationPost + 0x2d9
(Foundation) -[NSNotificationCenter postNotificationName:object:userInfo:] + 0x52
(AppKit) -[NSWindow _changeKeyAndMainLimitedOK:] + 0x413
(AppKit) -[NSWindow makeKeyWindow] + 0xc1
(AppKit) -[NSWindow _makeKeyRegardlessOfVisibility] + 0x4c
(AppKit) -[NSWindow makeKeyAndOrderFront:] + 0x1b
(XUL) nsCocoaWindow::Show(bool) + 0x1f0
(XUL) mozilla::AppWindow::SetVisibility(bool) + 0x81
(XUL) mozilla::AppWindow::OnChromeLoaded() + 0x4a
(XUL) mozilla::AppWindow::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned int, nsresult) + 0xc6
(XUL) non-virtual thunk to mozilla::AppWindow::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned int, nsresult) + 0xd
(XUL) nsDocLoader::DoFireOnStateChange(nsIWebProgress*, nsIRequest*, int&, nsresult) + 0x10a
(XUL) nsDocLoader::doStopDocumentLoad(nsIRequest*, nsresult) + 0x105
(XUL) nsDocLoader::DocLoaderIsEmpty(bool, mozilla::Maybe<nsresult> const&) + 0x24f
(XUL) nsDocLoader::NotifyDoneWithOnload(nsDocLoader*) + 0x4b
(XUL) nsDocLoader::DocLoaderIsEmpty(bool, mozilla::Maybe<nsresult> const&) + 0x25a
(XUL) nsDocLoader::OnStopRequest(nsIRequest*, nsresult) + 0x1c1
(XUL) nsDocShell::OnStopRequest(nsIRequest*, nsresult) + 0x5a
(XUL) mozilla::net::nsLoadGroup::NotifyRemovalObservers(nsIRequest*, nsresult) + 0xd1
(XUL) mozilla::net::nsLoadGroup::RemoveRequest(nsIRequest*, nsISupports*, nsresult) + 0x41
(XUL) mozilla::dom::Document::UnblockOnload(bool) + 0xbc
(XUL) mozilla::dom::Document::DispatchContentLoadedEvents() + 0x4b7
(XUL) mozilla::detail::RunnableMethodImpl<mozilla::dom::Document*, void (mozilla::dom::Document::*)(), true, (mozilla::RunnableKind)0>::Run() + 0x1f
(XUL) mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) + 0x9ba
(XUL) mozilla::detail::RunnableFunction<mozilla::TaskController::TaskController()::$_0>::Run() + 0x3e
(XUL) nsThread::ProcessNextEvent(bool, bool*) + 0x43a
(XUL) NS_ProcessPendingEvents(nsIThread*, unsigned int) + 0x7e
(XUL) nsBaseAppShell::NativeEventCallback() + 0x73
(XUL) nsAppShell::ProcessGeckoEvents(void*) + 0xa3
(CoreFoundation) __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 0x11
(CoreFoundation) __CFRunLoopDoSource0 + 0x9d
(CoreFoundation) __CFRunLoopDoSources0 + 0xd9
(CoreFoundation) __CFRunLoopRun + 0x394
(CoreFoundation) CFRunLoopRunSpecific + 0x230
(hook.dylib) Hooked_CFRunLoopRunSpecific(__CFRunLoop*, __CFString const*, double, unsigned char) + 0xd1
(HIToolbox) RunCurrentEventLoopInMode + 0x124
(hook.dylib) Hooked_RunCurrentEventLoopInMode(__CFString const*, signed char) + 0x2d
(HIToolbox) ReceiveNextEventCommon + 0xc7
(HIToolbox) _BlockUntilNextEventMatchingListInModeWithFilter + 0x40
(AppKit) _DPSNextEvent + 0x35a
(AppKit) -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 0x4be
(XUL) -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 0x7c
(AppKit) _NSHLTBMenuEventProc + 0xd5
(HIToolbox) IsUserStillTracking(MenuSelectData*, unsigned char*) + 0xc2
(HIToolbox) TrackMenuCommon(MenuSelectData&, unsigned char*, SelectionData*, MenuResult*, MenuResult*) + 0x550
(HIToolbox) MenuSelectCore(MenuData*, Point, double, unsigned int, OpaqueMenuRef**, unsigned short*) + 0x18d
(HIToolbox) _HandleMenuSelection2 + 0x1cb
(AppKit) _NSHandleCarbonMenuEvent + 0xd7
(AppKit) _DPSEventHandledByCarbon + 0x36
(AppKit) -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 0xb67
(XUL) -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 0x7c
(AppKit) -[NSApplication run] + 0x24a
(XUL) -[GeckoNSApplication run] + 0x42
(XUL) nsAppShell::Run() + 0x13e
(XUL) nsAppStartup::Run() + 0x3c
(XUL) XREMain::XRE_mainRun() + 0x751
(XUL) XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0x34d
(XUL) XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0xb1
(firefox) main + 0x22f
(dyld) start + 0x768
Comment 65•9 months ago
|
||
My STR from comment #61 don't work in Safari or Chrome -- which may mean that this bug doesn't effect them.
Comment 66•9 months ago
|
||
__NSArrayM
is the concrete implementation (in the CoreFoundation
framework) of the NSMutableArray
class.
Comment 67•9 months ago
•
|
||
My STR from comment #61 don't work on macOS Sonoma (14.7.1) and Sequoia (15.1.1), because the Firefox menu doesn't appear until all the Firefox windows are displayed. This may be why this bug's crashes only happen on macOS Ventura and below. There may be a clue here about how to work around them.
Comment 68•9 months ago
•
|
||
I discovered (by hacking in my hook library) that delivering NSWindowDidResignMainNotification
and NSWindowDidBecomeMainNotification
notifications asynchronously (to WindowDelegate
) makes the crashes go away. I'll try to find out if there's a high level way to do this, suitable for use in Mozilla code.
Comment 69•9 months ago
•
|
||
For what it's worth, the following variant of Stephen's last two patches fixes this bug in my tests:
https://treeherder.mozilla.org/jobs?repo=try&revision=fd761af8f1f0450da26d4e912adeb1b23377c4c3
https://hg.mozilla.org/try/rev/d087c3bc96947e9088f0c46bc653858dfdcac7f8
I backed out his patch from comment #26 and used only part of his patch from comment #54. I tested with a local build, and I'll test again with this tryserver build when/if it finishes. Edit: The tryserver build also fixes this bug's crashes. I tested on macOS 13.7.1.
See comment #71 below for another variant.
Comment 70•9 months ago
|
||
Here's the hook library I've been testing with, as a patch on https://github.com/steven-michaud/HookCase/blob/v9.0.0/HookLibraryTemplate/hook.mm.
Comment 71•9 months ago
•
|
||
(Following up comment #69)
Here's another variant of Stephen's last two patches:
https://treeherder.mozilla.org/jobs?repo=try&revision=468e76cc2d076d987cc8065b5c4022f65619bed0
https://hg.mozilla.org/try/rev/dfef7703281fe2f9d370ac619070d02b7f27df43
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/TvCEPXixShyJsCPE4oPFqQ/runs/0/artifacts/public/build/target.dmg
This time I've guaranteed that geckoWindow
and hiddenWindowMenuBar
will still be alive in the "lambdas" to which they are passed. I tried doing this with NS_DispatchToCurrentThread(NS_NewRunnableFunction())
, but was stymied by bug 1421435. So I ended up using an Objective-C block instead.
As best I can tell std::function()
does nothing with the parameters it "captures". So we need to use AddRef()
and Release()
ourselves. But bug 1421435 prevents them from working in a std::function()
lambda. Apple's documentation on block usage is unclear, and partly just wrong. Local variables outside the block that are used in the block, and which don't have the __block
storage type, are copied by value. Looking at XUL
in a disassembler shows this happens with both geckoWindow
and hiddenWindowMenuBar
. So once again we must guarantee their survival ourselves. But Release()
works just fine in a block's lambda (or closure).
Comment 72•9 months ago
|
||
Steven, can you attach the patch you were trying which ran into bug 1421435? It should be possible to write the lambda in such a way that it captures a RefPtr by value, maybe with an extra RefPtr<nsCocoaWindow> self = this;
.
Comment 73•9 months ago
|
||
Markus, you can see my failed patches here:
https://treeherder.mozilla.org/jobs?repo=try&revision=4163567c1e5f1100109adbf2c65d742b56b23d7d
https://treeherder.mozilla.org/jobs?repo=try&revision=b166f4aeb35e071f09697f38cf91b310c6ac2e7b
https://treeherder.mozilla.org/jobs?repo=try&revision=17922e632521f24ebe4f3a66368b87d18d417943
None of them failed in local builds, which all worked fine. I guess the tryserver does extra "linting" (and probably also the servers that do "official" builds).
Comment 74•9 months ago
|
||
I would expect the following to work:
RefPtr<nsCocoaWindow> geckoWindow = [windowDelegate geckoWidget];
if (geckoWindow) {
// We dispatch this in order to prevent crashes when macOS is actively
// enumerating the menu items in `NSApp.mainMenu`. Use `geckoWindow`
// because with it we can handle (in the lambda) the case where
// `nsCocoaWindow::mWindow` has become nil.
NS_DispatchToCurrentThread(NS_NewRunnableFunction(
"PaintMenuBar", [geckoWindow]() mutable -> void {
// `-[[WindowDelegate paintMenubarForWindow:]` does nothing if
// nsCocoaWindow::GetCocoaWindow() returns nil.
[WindowDelegate paintMenubarForWindow:gw->GetCocoaWindow()];
}));
}
The lambda stores a field of type RefPtr<nsCocoaWindow>
, so when the lambda is done running and gets destroyed, the RefPtr's destructor calls Release.
Comment 75•9 months ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #73)
None of them failed in local builds, which all worked fine. I guess the tryserver does extra "linting" (and probably also the servers that do "official" builds).
I think this linting can be run locally with ./mach static-analysis
but I haven't tried it myself. https://firefox-source-docs.mozilla.org/code-quality/static-analysis/existing.html#mach-static-analysis
Comment 76•9 months ago
|
||
(In reply to Markus Stange [:mstange] from comment #74)
I would expect the following to work:
RefPtr<nsCocoaWindow> geckoWindow = [windowDelegate geckoWidget]; if (geckoWindow) { // We dispatch this in order to prevent crashes when macOS is actively // enumerating the menu items in `NSApp.mainMenu`. Use `geckoWindow` // because with it we can handle (in the lambda) the case where // `nsCocoaWindow::mWindow` has become nil. NS_DispatchToCurrentThread(NS_NewRunnableFunction( "PaintMenuBar", [geckoWindow]() mutable -> void { // `-[[WindowDelegate paintMenubarForWindow:]` does nothing if // nsCocoaWindow::GetCocoaWindow() returns nil. [WindowDelegate paintMenubarForWindow:gw->GetCocoaWindow()]; })); }
The lambda stores a field of type
RefPtr<nsCocoaWindow>
, so when the lambda is done running and gets destroyed, the RefPtr's destructor calls Release.
I'll try this tomorrow. I'll also check XUL
in the disassembler to make sure the lambda really does store (locally) a field of type RefPtr<nsCocoaWindow>
.
Comment 77•9 months ago
•
|
||
It's taken me several days to verify it, and to understand its internal plumbing, but you're right, Markus -- your solution "works" (or one very much like it). In the following I'll work through what really happens when the following code gets compiled and runs. I discovered this by looking at the (Intel) machine code and playing with my HookCase hook library. I also made the machine code more readable by doing a customized debug build (actually an opt build with __attribute__((noinline))
added to lots of functions). What follows is very strange. I assume the compiler must have generated the code that actually runs. There's no way it could have passed muster with even the most lenient reviewer.
Here's a block of code from -[WindowDelegate windowDidBecomeMain:]
:
WindowDelegate* windowDelegate = nil;
if ([[window delegate] isKindOfClass:[WindowDelegate class]]) {
windowDelegate = (WindowDelegate*)[window delegate];
}
printf("-[WindowDelegate windowDidBecomeMain:](1)\n");
RefPtr<nsCocoaWindow> geckoWindow = [windowDelegate geckoWidget];
printf("-[WindowDelegate windowDidBecomeMain:](2)\n");
if (geckoWindow) {
// We dispatch this in order to prevent crashes when macOS is actively
// enumerating the menu items in `NSApp.mainMenu`. Use `geckoWindow`
// because with it we can handle (in the lambda) the case where
// `nsCocoaWindow::mWindow` has become nil.
NS_DispatchToCurrentThread(
NS_NewRunnableFunction("PaintMenuBar", [gw = geckoWindow]() -> void {
// `-[WindowDelegate paintMenubarForWindow:]` does nothing if
// nsCocoaWindow::GetCocoaWindow() returns nil.
printf("-[WindowDelegate windowDidBecomeMain:] lambda(1)\n");
[WindowDelegate paintMenubarForWindow:gw->GetCocoaWindow()];
printf("-[WindowDelegate windowDidBecomeMain:] lambda(2)\n");
}));
}
NS_NewRunnableFunction()
is templated, and is defined as follows:
template <typename Function>
already_AddRefed<mozilla::Runnable> NS_NewRunnableFunction(
const char* aName, Function&& aFunction) {
// We store a non-reference in RunnableFunction, but still forward aFunction
// to move if possible.
return do_AddRef(new mozilla::detail::RunnableFunctionImpl<Function>(
aName, std::forward<Function>(aFunction)));
}
The (also templated) RunnableFunction
class is defined as follows:
template <typename StoredFunction>
class RunnableFunction : public Runnable {
public:
template <typename F>
explicit RunnableFunction(const char* aName, F&& aFunction)
: Runnable(aName), mFunction(std::forward<F>(aFunction)) {}
NS_IMETHOD Run() override {
static_assert(std::is_void_v<decltype(mFunction())>,
"The lambda must return void!");
mFunction();
return NS_OK;
}
private:
StoredFunction mFunction;
};
That they're templated means the compiler replaces them with functions suitable to a specific "type". The replacements only barely conform to the templates.
One of the questions I had going into this was how the capture block ([gw = geckoWindow]
) is interpreted. Is it read as code? The answer is "no". If it were, gw
would be a nsCocoaWindow*
. Instead it's still a RefPtr<nsCocoaWindow>
. The first thing the compiler does is make a copy of gw
(aka geckoWindow
) using a copy constructor (RefPtr<nsCocoaWindow>::RefPtr(RefPtr<nsCocoaWindow> const&)
), which also AddRef()
s the original nsCocoaWindow
object. This second RefPtr<nsCocoaWindow>
object gets passed as the only parameter to the specialized replacement for NS_NewRunnableFunction()
. Yes, there's only one parameter, which doesn't match the types of either of the two parameters in the template.
The specialized replacement for NS_NewRunnableFunction()
first allocates a RunnableFunction
object and initializes it with a constructor whose template form would be RunnableFunction::RunnableFunction(char const*, F&& aFunction)
. Once again, though, it passes only one parameter (RefPtr<nsCocoaWindow>
), which doesn't match either of the template's two parameters. This constructor already knows aName
("PaintMenuBar"
), which it stores in mName
. It feeds the RefPtr<nsCocoaWindow>
parameter to yet another constructor (RefPtr<nsCocoaWindow>::RefPtr(RefPtr<nsCocoaWindow> const&&)
), which makes another copy and forget
s the original, and doesn't do any AddRef()
s. The copy gets stored in the RunnableFunction
object's mFunction
field -- which of course is supposed to have entirely different contents.
Next the specialized replacement for NS_NewRunnableFunction()
creates a RefPtr<RunnableFunction>
object by calling RefPtr<RunnableFunction>::RefPtr(RunnableFunction*)
. Then it calls RefPtr<RunnableFunction>::forget()
and RefPtr<RunnableFunction>::~RefPtr()
, and returns the the RunnableFunction
object returned by forget()
.
NS_DispatchToCurrentThread()
dispatches the RunnableFunction
returned by NS_NewRunnableFunction()
to run later on the current thread (the main thread). RunnableFunction::Run()
does what you'd expect -- no weirdness here. Later RunnableFunction::~RunnableFunction()
gets called, which calls the destructor for the RunnableFunction
's copy of RefPtr<nsCocoaWindow>
, which calls Release()
on its mRawPtr
. Before this happens the original RefPtr<nsCocoaWindow>
(geckoWindow
) goes out of scope and gets destroyed, at which time it calls Release()
on its own mRawPtr
.
The calls to AddRef()
and Release()
on the original nsCocoaWindow
object are balanced -- two of each. So there are no UAFs of this object, and it doesn't get leaked ... at least not with the code I've discussed here.
At times like this I really miss C. Yes, C++ is more powerful than C, and usually more efficient. But C is like fresh, clean water compared to the sludge we see here. What's most irritating, though, is how hard it was it was to dig up the sludge. C is simple and transparent by comparison.
Followup in comment #80.
Comment 78•9 months ago
|
||
Here's my customized debugging build:
https://treeherder.mozilla.org/jobs?repo=try&revision=ca2431ed5bf7ec9b3626a14457636ee84e8b8f57
https://hg.mozilla.org/try/rev/ea3ab0ef60a0c8be1b9cbae5a55ba039effe1adc
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/bUpck_ThT2mwL7VmM2Fi0A/runs/1/artifacts/public/build/target.dmg
Comment 79•9 months ago
•
|
||
I've started a (non-debugging) tryserver build made according to my debugging patch:
https://treeherder.mozilla.org/jobs?repo=try&revision=bb9917dfba015e13537dc82c1861cd90382b741d
https://hg.mozilla.org/try/rev/6b25862ee41fa4e5a738c4d18622d28111b51294
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/VXndYpiSTWG328k6owmRPA/runs/0/artifacts/public/build/target.dmg
Comment 80•8 months ago
•
|
||
(Following up comment #77)
I've noticed that if the "function" passed to NS_NewRunnableFunction()
(in the original source code) has an empty capture block ([]
), nothing at all is passed to the specialized NS_NewRunnableFunction()
in the code generated by the compiler. The same is true if the capture block only contains non-object values. And if the capture block has more than one object value, still only one parameter is passed to the specialized NS_NewRunnableFunction()
-- but this time it's an array of object values. Only if an object value is a RefPtr<>
is a copy constructor called on it, with the copy stored in the returned RunnableFunction
object. Otherwise no copy is made, and it's stored as is.
So in the machine code that runs when NS_NewRunnableFunction()
is called, the template's parameter list (const char* aName, Function&& aFunction
) is never accurate. Though, to be fair, its template does describe how it should be called in source code.
I'm going to try to find the source for the code that performs these code generations. The first place to look, I suppose, is in the source code for LLVM's clang. But I have a nagging suspicion that it's Mozilla code doing this, after all. It knows about RefPtr<>
, and uses moz_xmalloc()
.
Any ideas, Markus?
Edit: There's evidence clang knows about RefPtr<>
, at least informally.
Comment 81•8 months ago
|
||
I don't share your distrust of the compiler - people store objects with destructors on lambdas in C++ all the time all over the world.
Also, if you have a RefPtr<nsCocoaWindow> geckoWindow
, then writing auto gw = geckoView;
will make gw
also be of type RefPtr<nsCocoaWindow>
.
This is really quite off topic for this bug, but I'll go into your other question too:
For answering questions about specific compiler code generation, I recommend making a reduced testcase and playing around on godbolt.org, in particular with godbolt's Opt Pipeline Viewer. I made a testcase which shows the elimination of the runnable name string argument: https://godbolt.org/z/1r5YnP9fK
You can see in the optimization panel that the %aName
argument of the templated RunnableFunction
constructor goes away during the DeadArgumentEliminationPass
.
I think the compiler can often optimize templated functions more aggressively than non-templated functions - including making their call ABI not match the original signature. Probably because it's easier to prove that the function can't be called from outside the compilation unit.
Comment 82•8 months ago
•
|
||
Thanks for your information, Markus.
You're quite wrong that my inquiry isn't relevant. You seemed to be saying RefPtr<T>
objects "just work" with lambdas. This turns out to be true, but there wasn't any good reason for thinking so. Now I've found one ... or at least have begun to do so.
I now agree that, in principle, code that generates code from templates needn't stick precisely to the template if that doesn't "cause trouble". I had never seen this before, and I'm afraid I over-reacted. What I say about C versus C++ still stands, though. C++ hides all kinds of things in the name of making your code "easier". That approach only stays "easy" until you need to look under the hood.
Comment 83•8 months ago
|
||
Here's a fuller version of Markus's testcase from comment #81. It pays close attention to what happens as a lambda "captures" its arguments.
And here's a Makefile to build it:
CC=/usr/bin/clang
lambda-testcase : lambda-testcase.cpp
$(CC) -o lambda-testcase -std=c++11 -O2 -fno-exceptions -lstdc++ lambda-testcase.cpp
clean :
rm lambda-testcase
This shows that compiler-generated code doesn't perform any special handling for raw pointers, even if they point to objects that have copy constructors. It just passes them by value, like it does with int
values. It does do special handling for RefPtr<>
and std::shared_ptr<>
objects, to keep their referents alive until the lambda has finished running.
This special handling is a bit fragile, though. If you define BROKEN
in my testcase it stops working for both kinds of smart pointers.
The reason clang
knows about RefPtr<>
is that WebKit also uses it.
I still haven't been able to find the code (in clang
or gcc
) that generates "lambda glue". Though I'm pretty sure it happens in the "Semantic Analysis" stage (after "Parsing" and before "Code Generation"), I haven't been able to find a way to log "Semantic Analysis".
Thanks, Markus, for showing me the https://godbolt.org/ site. It didn't help me crack this particular puzzle. But it seems very useful for all kinds of other things.
Comment 84•8 months ago
|
||
I haven't looked in detail but my guess is that you found the reason for why we have a std::remove_reference_t
here: https://searchfox.org/mozilla-central/rev/a5e04104f9c6cfd69f9c855f0b0f665afd444c74/xpcom/threads/nsThreadUtils.h#556-561
I didn't port that part over faithfully to the reduced testcase.
Comment 85•8 months ago
•
|
||
You're right Markus ... I think.
I added the following block to my testcase (also replacing its implementation of NS_NewRunnableFunction()
):
template <class T>
using remove_reference_t = typename std::remove_reference<T>::type;
template <class Function>
using RunnableFunctionImpl =
class RunnableFunction<remove_reference_t<Function>>;
template <typename Function>
already_AddRefed<RunnableFunction<Function>>
NS_NewRunnableFunction(const char* aName, Function&& aFunction) {
return do_AddRef(new RunnableFunctionImpl<Function>(
aName, std::forward<Function>(aFunction)));
}
With this change my testcase compiles (and runs) fine with BROKEN undefined. But with BROKEN defined I get the following compilation errors:
lambda-testcase.cpp:268:26: error: no matching constructor for initialization of 'RunnableFunctionImpl<(lambda at lambda-testcase.cpp:295:13) &>' (aka 'RunnableFunction<(lambda at lambda-testcase.cpp:295:13)>')
return do_AddRef(new RunnableFunctionImpl<Function>(
^
lambda-testcase.cpp:302:20: note: in instantiation of function template specialization 'NS_NewRunnableFunction<(lambda at lambda-testcase.cpp:295:13) &>' requested here
runnable = NS_NewRunnableFunction("MyString", lambda);
^
lambda-testcase.cpp:229:31: note: candidate constructor not viable: expects an rvalue for 2nd argument
__attribute__((noinline)) RunnableFunction(const char* aName, F&& aFunction)
^
lambda-testcase.cpp:228:8: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 2 were provided
struct RunnableFunction : public Runnable {
^
1 error generated.
make: *** [lambda-testcase] Error 1
The compilation errors are grotesquely misleading. But with this change, at least one way to break the compiler's "lambda glue" refuses to compile.
As I mentioned above in comment #80, the "lambda glue" replacement for the call to NS_NewRunnableFunction()
actually takes just one parameter -- an array of captured object pointers (smart or dumb). So the RunnableFunctionImpl()
rigamarole seems intended to allow only altered calls to NS_NewRunnableFunction()
to be compilable. I suspect that NS_NewRunnableFunction("MyString", lambda)
wasn't altered (when it was still compilable), and this is what broke the compiler's "lambda glue". Edit: I've confirmed this by looking at my "broken" testcase in Hopper Disassembler. The assembly code is much more readable there than at https://godbolt.org/.
I suspect that NS_NewRunnableFunction("MyString", lambda) wasn't altered
It was altered, but so that it took no parameters. So it still didn't match the signature of a "correctly" altered NS_NewRunnableFunction()
.
Comment 86•8 months ago
|
||
Comment 87•8 months ago
|
||
Comment 88•8 months ago
•
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #85)
As I mentioned above in comment #80, the "lambda glue" replacement for the call to
NS_NewRunnableFunction()
actually takes just one parameter -- an array of captured object pointers (smart or dumb). So theRunnableFunctionImpl()
rigamarole seems intended to allow only altered calls toNS_NewRunnableFunction()
to be compilable. I suspect thatNS_NewRunnableFunction("MyString", lambda)
wasn't altered (when it was still compilable), and this is what broke the compiler's "lambda glue". Edit: I've confirmed this by looking at my "broken" testcase in Hopper Disassembler. The assembly code is much more readable there than at https://godbolt.org/.I suspect that NS_NewRunnableFunction("MyString", lambda) wasn't altered
It was altered, but so that it took no parameters. So it still didn't match the signature of a "correctly" altered
NS_NewRunnableFunction()
.
This is wrong :-( The compiler errors happen at the "Parsing" stage -- before the "Semantic Analysis" stage. At this point the "lambda glue" hasn't yet been created, and NS_NewRunnableFunction()
still has two parameters. The "RunnableFunctionImpl()
rigamarole" is just a way to make the compiler fussier about the second parameter, aFunction
. Without it the compiler accepts both [T = (lambda at lambda-testcase.cpp:nnn:nn) &]
(the "broken" case) and [T = (lambda at lambda-testcase.cpp:nnn:nn) &&]
(the non-"broken" case). With it the compiler only accepts [T = (lambda at lambda-testcase.cpp:nnn:nn) &&]
.
And so the compiler errors aren't misleading.
Comment 89•8 months ago
|
||
Comment 90•8 months ago
|
||
Default output of lambda-testcase
:
Created RefCountedObject 1
Ran AddRef() on RefCountedObject 1, mRefCnt 1
RefPtr<[T = RefCountedObject *]>::RefPtr(T* aRef) [construct from raw pointer]
Created RefCountedObject 2
Created RefCountedObject 3
*** Creating lambda ***
Ran AddRef() on RefCountedObject 1, mRefCnt 2
RefPtr<[T = RefCountedObject *]>::RefPtr(const RefPtr<T>& aRefPtr) [copy constructor]
Running NS_NewRunnableFunction(): aFunction [T = (lambda at lambda-testcase.cpp:326:13) &&], RunnableFunctionImpl<Function> [T = RunnableFunction<(lambda at lambda-testcase.cpp:326:13)>]
Ran AddRef() on RefCountedObject 1, mRefCnt 3
RefPtr<[T = RefCountedObject *]>::RefPtr(const RefPtr<T>& aRefPtr) [copy constructor]
*** Created function ***
Ran AddRef() on [T = RunnableFunction<(lambda at lambda-testcase.cpp:326:13)> *], mRefCnt 1
RefPtr<[T = RunnableFunction<(lambda at lambda-testcase.cpp:326:13)> *]>::RefPtr(T* aRef) [construct from raw pointer]
RefPtr<[T = RunnableFunction<(lambda at lambda-testcase.cpp:326:13)> *]>::forget()
Deleting RefPtr<[T = RunnableFunction<(lambda at lambda-testcase.cpp:326:13)>]>
RefPtr<[T = RunnableFunction<(lambda at lambda-testcase.cpp:326:13)> *]>::RefPtr<T>& operator=(already_AddRefed<I>&& aRef) [move assignment]
Deleting RefPtr<[T = RefCountedObject *]>
Running Release() on RefCountedObject 1, mRefCnt 3
Deleting RefPtr<[T = RefCountedObject *]>
Running Release() on RefCountedObject 1, mRefCnt 2
*** Now outside the scope of the original RefPtr<RefCountedObject> ***
Ran AddRef() on [T = RunnableFunction<(lambda at lambda-testcase.cpp:326:13)> *], mRefCnt 2
RefPtr<[T = RunnableFunction<(lambda at lambda-testcase.cpp:326:13)> *]>::RefPtr(const RefPtr<T>& aRefPtr) [copy constructor]
*** Running lambda *** smartPtr [T = const RefPtr<RefCountedObject> &]([T = RefCountedObject *]:1), sharedPtr [T = const std::shared_ptr<RefCountedObject> &]([T = RefCountedObject *]:2), dumbPtr [T = RefCountedObject *const &]:3, answer 42
Deleting RefPtr<[T = RunnableFunction<(lambda at lambda-testcase.cpp:326:13)> *]>
Running Release() on [T = RunnableFunction<(lambda at lambda-testcase.cpp:326:13)> *], mRefCnt 2
Deleting RefPtr<[T = RunnableFunction<(lambda at lambda-testcase.cpp:326:13)> *]>
Running Release() on [T = RunnableFunction<(lambda at lambda-testcase.cpp:326:13)> *], mRefCnt 1
*** Deleting function ***
Deleting RefCountedObject 2
Deleting RefPtr<[T = RefCountedObject *]>
Running Release() on RefCountedObject 1, mRefCnt 1
Deleting RefCountedObject 1
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Comment 93•8 months ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #83)
I still haven't been able to find the code (in
clang
orgcc
) that generates "lambda glue". Though I'm pretty sure it happens in the "Semantic Analysis" stage (after "Parsing" and before "Code Generation"), I haven't been able to find a way to log "Semantic Analysis".
Turns out this is wrong, or at least not the full story. The AST ("Abstract Syntax Tree", generated using clang -Xclang -ast-dump -fsyntax-only -std=c++11 lambda-testcase.cpp
) has a LambdaExpr
object. But its NS_NewRunnableFunction()
still has the two parameters you see in the source code. The same is true of the IR ("Intermediate Representation", aka "LLVM assembly language") and the assembly language outputs (and of the machine code), but only if you leave optimization off.
So where does the final implementation of "lamda glue" happen? Probably somewhere in the optimization pipeline, during the "Code Generation" stage.
clang -S -emit-llvm -std=c++11 lambda-testcase.cpp
outputs IR without optimization.
I used clang -S -emit-llvm -std=c++11 -O2 -fno-exceptions lambda-testcase.cpp
to output IR with optimization.
clang -S -std=c++11 lambda-testcase.cpp
outputs assembly language without optimization.
I used clang -S -std=c++11 -O2 -fno-exceptions lambda-testcase.cpp
to output assembly language with optimization.
Comment 94•8 months ago
•
|
||
You can see all optimization stages on godbolt.org. Above the assembly view, click Add new and select Opt pipeline. (It's available if you've picked a recent clang compiler in the compiler dropdown.)
Then you can click through the list of passes in the optimization panel, and find which pass made the change you're interested in. The passes that applied have green text in the list.
Comment 95•8 months ago
|
||
Ah, with your testcase I can't get that panel to work - it always hits a gateway timeout. And running compiler-explorer locally appears to require Linux.
Comment 96•8 months ago
|
||
Comment 97•8 months ago
|
||
Comment 98•8 months ago
|
||
Comment 99•8 months ago
|
||
I finally figured out the best way to find my "lambda glue" is to look at binaries compiled without any optimization. The weirdest things I noted above (in comment #77 and comment #80) are actually due to ordinary optimizations, which have nothing to do (per se) with the lambda glue.
This file shows the alterations made for my testcase. They're actually quite reasonable. I still wish they hadn't been so hard to find. But it took me a while for me to realize that the optimizations were standing in my way, so it was partly my fault.
Comment 100•8 months ago
|
||
bugherder |
Comment 101•8 months ago
|
||
Comment 102•8 months ago
|
||
Objective-C has a well defined ABI for its implementation of blocks. But there doesn't seem to be an equivalent for C++11 lambdas.
However the clang and gcc ABIs (for example) seem to differ only in small details. I got the same output at https://godbolt.org/ compiling with both clang and gcc.
Comment 103•8 months ago
|
||
My STR from comment #61 doesn't "work" on today's mozilla-central nightly -- the first with Stephen's latest patch. It doesn't crash.
Updated•8 months ago
|
Comment 104•8 months ago
|
||
Here's how to search for this bug's crashes on builds that contain Stephen's latest patch. I don't expect to find any, but just in case ...
Comment 105•8 months ago
|
||
Perfherder has detected a talos performance change from push 1543feed065207594a549e9a0d2738e0c39e9d2c.
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
12% | twinopen ext+twinopen:twinopen.html | macosx1470-64-shippable | e10s fission stylo webrender-sw | 144.96 -> 127.23 |
11% | twinopen ext+twinopen:twinopen.html | macosx1470-64-shippable | e10s fission stylo webrender | 196.92 -> 176.07 |
8% | twinopen ext+twinopen:twinopen.html | macosx1470-64-shippable | e10s fission stylo webrender | 196.39 -> 180.68 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 43143
For more information on performance sheriffing please see our FAQ.
Updated•3 months ago
|
Description
•