Open Bug 1838947 Opened 2 years ago Updated 9 months ago

Crash in [@ _XMIGPostNotification] on macOS 14

Categories

(Core :: Disability Access APIs, defect)

Unspecified
macOS
defect

People

(Reporter: mccr8, Unassigned)

Details

(Keywords: crash, Whiteboard: [tbird crash])

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/6a0ec4f1-0286-49e1-a72b-b93e20230514

Reason: EXC_GUARD / GUARD_TYPE_MACH_PORT / 0x2000000300000000 / 0x0000000000000000

Top 10 frames of crashing thread:

0  libsystem_kernel.dylib  mach_msg2_trap  
1  libsystem_kernel.dylib  mach_msg2_internal  
2  libsystem_kernel.dylib  mach_msg_overwrite  
3  libsystem_kernel.dylib  mach_msg  
4  HIServices  _XMIGPostNotification  
5  HIServices  _AXUIElementPostNotificationWithInfo  
6  AppKit  _NSAccessibilityNotifyWithAXElement  
7  AppKit  _NSAccessibilityNotify  
8  AppKit  NSAccessibilityPostNotificationForObservedElementWithUserInfo  
9  AppKit  -[NSWindow _reallyDoOrderWindowAboveOrBelow:]  

These crashes aren't very high-volume. They look accessibility-related.

This older crash includes a URL and the comment "page down key", so maybe that will help.

OS: Unspecified → macOS

Weird. This is indeed a11y-related, but our a11y engine doesn't seem to be involved. This looks like the a11y object for the widget itself, which we don't really control apart from overriding some methods.

Eitan, any ideas here?

Severity: -- → S3
Flags: needinfo?(eitan)

I don't know whether they're related, but there are lots of crashes with XMIG in the proto signature. (To eliminate the huge numbers of crashes on the oldest versions of macOS, I've confined my search to macOS 11 and above.) Most of these seem to involve accessibility, though only at the OS level. The "MIG" part refers to the Mach Interface Generator, which indicates Mach RPC.

https://crash-stats.mozilla.org/search/?proto_signature=~XMIG&platform_version=%21%5E10.&platform=Mac%20OS%20X&date=%3E%3D2023-03-19T20%3A54%3A00.000Z&date=%3C2023-06-19T20%3A54%3A00.000Z&_facets=signature&_facets=platform_version&_facets=proto_signature&_facets=cpu_arch&_facets=address&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

See Also: → 1777962

Yeah, that all looks internal to AppKit. If we are at fault anywhere, it might be in the widget code, perhaps something to do with resizing or moving the window? It would be good to try to reproduce this, but I don't have a current macOS build.

Flags: needinfo?(eitan)

I've seen this a couple of times in Nightly over the last few days on macOS 14.0. The crash stats indicate that I might not be the only one.

Unfortunately, I don't have any helpful STRs for this. Firefox just crashes randomly. :/

At least one other application also has these crashes (on macOS 14.X):

https://github.com/mpv-player/mpv/issues/12632

Termination Reason: Namespace GUARD, Code 2305843022098595840

"Code" in hexadecimal is 0x2000000300000000.

Reason: EXC_GUARD / GUARD_TYPE_MACH_PORT / 0x2000000300000000 / 0x0000000000000000

According to the rust-minidump code, the last two fields are the crash dump's raw code and subcode. But the "reason" is only reported this way if the flavor (== (code >> 32) & 0x1fffffff) is invalid.

It's a bit of a stretch, but I think the weird code values here may be related to the weird "mach port" values in bug 1853703.

If you interpret the code (0x2000000300000000) as possibly having more than one "flavor" (as Breakpad does), the "reason" would be:

EXC_GUARD / GUARD_TYPE_MACH_PORT / (GUARD_EXC_DESTROY | GUARD_EXC_MOD_REFS) port name: 0 guard identifier: 0

Which may or may not make sense.

Edit: mach_port_guard_exception() is declared as follows in the XNU kernel code. Its reason parameter is what this bug calls the "flavor".

void
mach_port_guard_exception(mach_port_name_t        name,
                          __unused uint64_t       inguard,
                          uint64_t                portguard,
                          unsigned                reason)

It's never called with more than one reason. So it seems rust-minidump is right and Breakpad is wrong: GUARD_TYPE_MACH_PORT exceptions can never have more than one "flavor". So the "flavor" in this bug's crash reports is corrupt.

(In reply to Steven Michaud [:smichaud] (Retired) from comment #8)

> It's never called with more than one reason. So it seems rust-minidump is right and Breakpad is wrong: GUARD_TYPE_MACH_PORT exceptions can never have more than one "flavor". So the "flavor" in this bug's crash reports is corrupt.

Thanks for your analysis, Steven. It prompted me to look at Apple's sources, and here's what I found. In this stack, mach_port_guard_exception() is being called here, and it's passed kGUARD_EXC_INVALID_OPTIONS, a value I had never encountered before. That led me here, where more values have been declared since we added support to rust-minidump. I'll use this info to update it.

As for the bug, these are the only useful comments I could find:

- resizing the window
- Resizing window when Firefox crashed.
- Just minimized window…
- Crashed when trying to reposition the window using the Magnet application
- Resizing PiP video from YouTube.
- I was on primevideo watching a show, then i resized the window from the left

Given that one macOS version (14.0.0 23A344) accounts for more than 80% of the crashes on file, this might not be our fault, or at least not entirely.

You're right. The "reason" here should be:

EXC_GUARD / GUARD_TYPE_MACH_PORT / GUARD_EXC_INVALID_OPTIONS port name: 0 guard identifier: 0

I assumed (as Breakpad and rust-minidump also do) that the ExceptionCodeMacGuardMachPortFlavor enum's values are all bitflags.

So flavor isn't corrupt, and this bug probably isn't related to bug 1853703.

In my opinion this is almost certainly an Apple bug. Mozilla may be able to do something about it, but it'll be very difficult to figure out exactly what -- especially if (as is most likely) these crashes remain non-reproducible.

We can wait to see what happens with macOS 14.1. It seems the number of crashes has gone down, but it's only been out a week.

See Also: 1853703

Just for the record, I'm still seeing this crash in Firefox on macOS 14.2.1 (23C71), mostly when switching focus or moving the window.

However, this is indeed probably not directly related to Firefox, as I'm also seeing crashes in other applications, such as the Signal desktop app (based on Electron), which just crashed with a very similar stack trace. :/

Summary: Crash in [@ _XMIGPostNotification] → Crash in [@ _XMIGPostNotification] on macOS 14

Haik, can you open a bug on this with Apple?

Flags: needinfo?(haftandilian)

Thanks for the debugging, Steven.

Filed FB13559207.

Firefox and other apps are experiencing crashes in _XMIGPostNotification.
Many reports indicate the problem may be triggered by resizing a window.

Crashes are occurring on macOS 14 including the most recent
version 14.2.1 (as of Jan 24, 2024).

The problem is intermittent. See the Firefox stack trace below.

Crash reason:
EXC_GUARD / GUARD_TYPE_MACH_PORT / GUARD_EXC_INVALID_OPTIONS port name: 0 guard identifier: 0

Firefox bug report:
https://bugzilla.mozilla.org/show_bug.cgi?id=1838947

Crash report from another third-party app:
https://github.com/mpv-player/mpv/issues/12632
Crashes have also been reported in the Signal Electron app.

Firefox crash report:
https://crash-stats.mozilla.org/report/index/0a5f66d7-e97e-4a66-ba35-ce5ef0240117
(auto deletes 6 months after 2024-01-10)

Firefox stack trace:
mach_msg2_trap
mach_msg2_internal
mach_msg_overwrite
mach_msg
_XMIGPostNotification
_AXUIElementPostNotificationWithInfo
_NSAccessibilityNotifyWithAXElement
_NSAccessibilityNotify
NSAccessibilityPostNotificationForObservedElementWithUserInfo
-[NSCocoaMenuImpl viewDidDisappear]
-[NSContextMenuImpl viewDidDisappear]
-[NSPopupMenuWindow _finishClosing:]
__NSFireDelayedPerform
__CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__
__CFRunLoopDoTimer
__CFRunLoopDoTimers
__CFRunLoopRun
CFRunLoopRunSpecific
RunCurrentEventLoopInMode
ReceiveNextEventCommon
_BlockUntilNextEventMatchingListInModeWithFilter
_DPSNextEvent
-[NSApplication(NSEventRouting) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]
-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]  widget/cocoa/nsAppShell.mm:196
-[NSMenuTrackingSession startRunningMenuEventLoop:]
-[NSContextMenuTrackingSession startMonitoringEventsInMode:]
+[NSContextMenuImpl presentPopup:fromView:withContext:animated:]
_NSPopUpMenu
-[NSCocoaMenuImpl _popUpContextMenu:withEvent:forView:withFont:]
-[NSMenu _popUpContextMenu:withEvent:forView:withFont:]
-[MOZMenuOpeningCoordinator _runMenu]  widget/cocoa/MOZMenuOpeningCoordinator.mm:105
__NSFireDelayedPerform
__CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__
__CFRunLoopDoTimer
__CFRunLoopDoTimers
__CFRunLoopRun
CFRunLoopRunSpecific
RunCurrentEventLoopInMode
ReceiveNextEventCommon
_BlockUntilNextEventMatchingListInModeWithFilter
_DPSNextEvent
-[NSApplication(NSEventRouting) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]
-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]  widget/cocoa/nsAppShell.mm:196
-[NSApplication run]
-[GeckoNSApplication run]  widget/cocoa/nsAppShell.mm:174
nsAppShell::Run()  widget/cocoa/nsAppShell.mm:871
XUL  nsAppStartup::Run()  toolkit/components/startup/nsAppStartup.cpp:296
XUL  XREMain::XRE_mainRun()  toolkit/xre/nsAppRunner.cpp:5673
XUL  XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&)  toolkit/xre/nsAppRunner.cpp:5882
XUL  XRE_main(int, char**, mozilla::BootstrapConfig const&)  toolkit/xre/nsAppRunner.cpp:5938
do_main(int, char**, char**)  browser/app/nsBrowserApp.cpp:227
main  browser/app/nsBrowserApp.cpp:445

Flags: needinfo?(haftandilian)
Whiteboard: [tbird crash]
Crash Signature: [@ _XMIGPostNotification] → [@ _XMIGPostNotification] [@ HIServices@0x542f0 ]
Crash Signature: [@ _XMIGPostNotification] [@ HIServices@0x542f0 ] → [@ _XMIGPostNotification] [@ HIServices@0x542f0 ] [@ __abort_with_payload | abort_with_payload_wrapper_internal | abort_with_reason | _objc_fatalv ]