Open Bug 1801419 Opened 1 year ago Updated 6 days ago

[macOS 13 and 14] Crashes on macOS 13 involving Apple's Sidecar functionality

Categories

(Core :: Widget: Cocoa, defect, P3)

x86_64
macOS
defect

Tracking

()

115 Branch
Tracking Status
firefox-esr102 --- wontfix
firefox112 --- wontfix
firefox113 --- wontfix
firefox114 --- fixed
firefox115 --- fixed

People

(Reporter: smichaud, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tbird crash])

Crash Data

Attachments

(6 files)

These started appearing recently in small numbers. They happen on macOS 13 or 13.0.1. The proto signatures show they involve Apple's Sidecar functionality.

So far these have only happened on Intel Macs. But that may not continue to be true.

They don't show up for any of the macOS 13 betas, but it may just be that very few beta testers use Sidecar.

https://crash-stats.mozilla.org/search/?proto_signature=~SidecarDisplayIsSupportedMac&platform=Mac%20OS%20X&date=%3E%3D2022-10-18T21%3A56%3A00.000Z&date=%3C2022-11-18T21%3A56%3A00.000Z&_facets=signature&_facets=platform_version&_facets=cpu_arch&_facets=proto_signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-proto_signature

Also see comment #10 below.

Typical crash stack:

0  libsystem_platform.dylib  _platform_strlen   context
1  None  @0x00007ff7bd08d7ef   cfi
2  Foundation  +[NSString stringWithCString:encoding:]   frame_pointer
3  SidecarCore  SidecarDisplayIsSupportedMac   cfi
4  SidecarCore  updateDevices   cfi
5  AppKit  -[NSWindowSidecarMenuController reloadData]   cfi
6  AppKit  _NSWindowMenuUpdateSidecarItems   cfi
7  AppKit  __46-[NSMenu _performSidebandUpdatersPassingTest:]_block_invoke   cfi
8  AppKit  -[NSMenu _forEachCachedSidebandUpdaterDo:]   cfi
9  AppKit  -[NSMenu _performSidebandUpdatersPassingTest:]   cfi
10  AppKit  -[NSMenu _populateFromSidebandUpdatersOfSign:]   cfi
11  AppKit  -[NSMenu _populateWithEventRef:]   cfi
12  AppKit  -[NSCarbonMenuImpl _carbonPopulateEvent:handlerCallRef:]   cfi
13  AppKit  NSSLMMenuEventHandler   cfi
14  HIToolbox  DispatchEventToHandlers(EventTargetRec*, OpaqueEventRef*, HandlerCallRec*)   cfi
15  HIToolbox  SendEventToEventTargetInternal(OpaqueEventRef*, OpaqueEventTargetRef*, HandlerCallRec*)   cfi
16  HIToolbox  SendEventToEventTargetWithOptions   cfi
17  HIToolbox  SendMenuPopulate(MenuData*, OpaqueEventTargetRef*, unsigned int, double, unsigned int, OpaqueEventRef*, unsigned char, unsigned char*)   cfi
18  HIToolbox  PopulateMenu(MenuData*, OpaqueEventTargetRef*, CheckMenuData*, unsigned int, double)   cfi
19  HIToolbox  Check1MenuForKeyEvent(MenuData*, CheckMenuData*)   cfi
20  HIToolbox  CheckMenusForKeyEvent(MenuData*, CheckMenuData*)   cfi
21  HIToolbox  IsMatchingMenuKeyEvent(MenuData*, OpaqueEventRef*, unsigned int, MenuData**, unsigned short*)   cfi
22  HIToolbox  _IsMenuKeyEvent(MenuData*, OpaqueEventRef*, unsigned int, MenuData**, unsigned short*)   cfi
23  HIToolbox  IsMenuKeyEvent   cfi
24  AppKit  +[NSCarbonMenuImpl _menuItemWithKeyEquivalentMatchingEventRef:inMenu:includingDisabledItems:]   cfi
25  AppKit  _NSFindMenuItemMatchingCommandKeyEvent   cfi
26  AppKit  -[NSMenu performKeyEquivalent:]   cfi
27  XUL  -[GeckoNSMenu performKeyEquivalent:]  widget/cocoa/nsMenuBarX.mm:860  cfi
28  AppKit  routeKeyEquivalent   cfi
29  AppKit  -[NSApplication(NSEvent) sendEvent:]   cfi
30  XUL  -[GeckoNSApplication sendEvent:]  widget/cocoa/nsAppShell.mm:165  cfi
31  AppKit  -[NSApplication _handleEvent:]   cfi
32  AppKit  -[NSApplication run]   cfi
33  XUL  nsAppShell::Run()  widget/cocoa/nsAppShell.mm:801  cfi
34  XUL  nsAppStartup::Run()  toolkit/components/startup/nsAppStartup.cpp:295  cfi
35  XUL  XREMain::XRE_mainRun()  toolkit/xre/nsAppRunner.cpp:5736  cfi
36  XUL  XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&)  toolkit/xre/nsAppRunner.cpp:5929  cfi
37  XUL  XRE_main(int, char**, mozilla::BootstrapConfig const&)  toolkit/xre/nsAppRunner.cpp:5985  cfi
38  firefox  do_main(int, char**, char**)  browser/app/nsBrowserApp.cpp:226  inlined
38  firefox  main  browser/app/nsBrowserApp.cpp:430  cfi
39  None  @0x00007ff804d8130f
Crash Signature: [@ +[NSString stringWithCString:encoding:] ] [@ mig_strncpy_zerofill ]

Needless to say, this is probably an Apple bug.

Blocks: 1773708
Summary: Crashes on macOS 13 involving Apple's Sidecar functionality → [macOS 13] Crashes on macOS 13 involving Apple's Sidecar functionality

I found one of these on macOS 12.6.1:

bp-b39e65f9-434f-47af-8394-53a600221123

All the rest (including those with the new signature) are still on macOS 13.

Crash Signature: [@ +[NSString stringWithCString:encoding:] ] [@ mig_strncpy_zerofill ] → [@ +[NSString stringWithCString:encoding:] ] [@ mig_strncpy_zerofill ] [@ _mapStrHash ]
Severity: -- → S3
Priority: -- → P3

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 5 desktop browser crashes on Mac on beta

:spohl, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(spohl.mozilla.bugs)
Keywords: topcrash
Flags: needinfo?(spohl.mozilla.bugs)

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash
See Also: → 1748022
See Also: 1748022

This bug's crashes all seem to be triggered by using some kind of key combination(s). I don't have permission to access the comments. Could someone please check them? I've just found out that doing Cmd-t can trigger loading the SidecarUI and SidecarCore system modules, even on a (local) network which doesn't (so far as I know) have any machines that support Sidecar. So also look out for comments about "opening a tab".

Edit: Actually, it seems like any Cmd-key combination can trigger loading these modules.

I still can't reproduce this bug's crashes. But those at [@ +[NSString stringWithCString:encoding:] ], at least, are definitely an Apple bug. And I may have found a workaround for them:

defaults write com.apple.sidecar.display AllowAllDevices -bool true

As you can see from their addresses, the crashes at [@ +[NSString stringWithCString:encoding:] ] all happen at a page boundary, in the bool SidecarDisplayIsSupportedMac() function in /System/Library/PrivateFrameworks/SidecarCore.framework/SidecarCore, while the code is iterating over a buffer called char *SidecarDisplayIsSupportedMac.unsupportedModels[]. In Objective-C, the code looks something like this:

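bool retval = true;
// Presumably the model identifier of this Mac.
NSString *model = SidecarGetModelProperty();
for (int i = 0; i < unsupportedModels_length; ++i) {
  // strlen() is called from +[NSString stringWithCString:encoding:]; this is where the crashes happen.
  NSString *unsupported = [NSString stringWithCString:unsupportedModels[i] encoding:NSMacOSRomanStringEncoding];
  if ([model isEqualToString:unsupported]) {
    retval = false;
    break;
  }
}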

On Intel macOS 13 (and only there), one of the strings in SidecarDisplayIsSupportedMac.unsupportedModels[] (usually (always?) the last one) crosses a page boundary. That's where the [@ +[NSString stringWithCString:encoding:] ] crashes happen. None of these strings crosses a page boundary on Apple Silicon macOS 13, or on any variety of macOS 12 or 11. Like I said above, this can only be an Apple bug. The kernel should map in a new page when strlen() (called from +[NSString stringWithCString:encoding:]) crosses a page boundary. That it sometimes doesn't seems like a kernel bug.

The reason my workaround works (if it does) is that it makes SidecarDisplayIsSupportedMac() assume all models are supported. So it no longer needs to iterate over SidecarDisplayIsSupportedMac.unsupportedModels[]. It should be used with caution. I don't know what side effects it may have on machines in Apple's unsupported list.

It'd be really nice if Apple fixed this kernel bug. But they could also work around it by somehow ensuring that C strings never cross a page boundary. Or possibly by using strnlen() instead of strlen().
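To make the page-boundary idea concrete, here is a minimal sketch in C. It is not Apple's code and not part of any patch on this bug. The helper names cstring_crosses_page() and strnlen_within_page() are hypothetical, and the page size is passed in as a parameter (4096 bytes is assumed for Intel macOS). The first helper reports whether a C string spans two pages. The second shows how strnlen() can be bounded so that the scan never leaves the page the string starts on.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the NUL-terminated string s
 * (terminator included) spans more than one page, else 0.
 * page_size must be a power of two. */
int cstring_crosses_page(const char *s, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    uintptr_t first = (uintptr_t)s;
    uintptr_t last = first + strlen(s);  /* address of the terminating NUL */
    return (first / page_size) != (last / page_size);
}

/* Hypothetical helper: a bounded scan that never reads past the end of
 * the page s starts on. If no NUL is found on that page, the result
 * equals the number of bytes remaining on it. */
size_t strnlen_within_page(const char *s, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    size_t remaining = page_size - ((uintptr_t)s & (page_size - 1));
    return strnlen(s, remaining);
}
```

Note that a bounded scan like this only avoids touching the second page. It would truncate any string that legitimately crosses the boundary, so it is a mitigation sketch rather than a fix.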

Whiteboard: Possible workaround in comment 6

Mozilla's jemalloc is much more precise and efficient in its memory allocation (and deallocation). That may explain why this bug's crashes don't seem to happen in Chrome or Safari. Of course it could also be because Google and Apple hide their crash stats :-(

I briefly looked into this bug's crashes with the signature mig_strncpy_zerofill. I don't have a workaround. But they only happen in small numbers, and I don't see any for versions later than macOS 13.2.1. So with luck they're now extinct.

The string involved there is "IOService:/", which is passed to IORegistryEntryFromPath() and io_registry_entry_from_path() as the path parameter. That string doesn't cross a page boundary, and the bad access crashes happen at its very first character. It (like the strings in SidecarDisplayIsSupportedMac.unsupportedModels[]) is instantiated in the __cstring section of the __TEXT segment (of the SidecarCore framework). It's very weird that accessing anything in the __TEXT segment should trigger an unresolved page fault. But it's nonetheless true that the sections that contain actual code (__text and __stubs) are on different pages than the __cstring section. This would explain how these sections could be mapped in without the __cstring section having been mapped in. It's still a bug, though -- presumably the same kernel bug that causes the +[NSString stringWithCString:encoding:] crashes.

It occurred to me these crashes might be caused by the SidecarCore module having been unloaded. But a significant number of them have uptimes of 30 seconds or less, and I haven't seen any crash reports whose modules tab lists unloaded modules. So I think it's very unlikely the SidecarCore module has been unloaded.

Here's a search for crashes that, like this bug's, are failed memory accesses at addresses inside the dyld shared cache:

https://crash-stats.mozilla.org/search/?reason=~KERN_MEMORY_ERROR&address=%40.%2A7ff.%7B9%2C9%7D&platform=Mac%20OS%20X&date=%3E%3D2023-03-14T17%3A29%3A00.000Z&date=%3C2023-04-14T17%3A29%3A00.000Z&_facets=signature&_facets=platform_version&_facets=proto_signature&_facets=address&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform_version&_columns=address#facet-signature

The vast majority belong here, and show that the SidecarCore module is somehow "special". But there are also a few others. At some point I'll work through the list to find out which other modules are affected by Apple's presumed kernel bug.

Edit: These are all on Intel hardware. But that's because I searched for them using the address pattern they have on Intel boxes. The dyld shared cache addresses on Apple Silicon hardware aren't distinctive enough to do the same kind of search with "cpu arch" "has terms" "arm64".
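For anyone post-processing crash addresses outside of Socorro, the address filter in the search above (the hex digits 7ff followed by exactly nine more hex digits) can be approximated numerically. This is only a sketch: the function names are hypothetical, the check is slightly stricter than the regular expression (it requires 7ff to be the leading significant digits), and it matches a pattern rather than the real shared cache range, which has to be read from the cache file itself. The second helper tests whether an address sits exactly on a page boundary, with the page size passed in as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 if addr has the form 0x7ffXXXXXXXXX, that is,
 * 12 significant hex digits starting with 7ff. Dropping the low nine
 * hex digits (36 bits) must leave exactly 0x7ff. */
int matches_intel_search_pattern(uint64_t addr)
{
    return (addr >> 36) == 0x7ffu;
}

/* Hypothetical helper: 1 if addr is the first byte of a page.
 * page_size must be a power of two. */
int is_page_aligned(uint64_t addr, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr & (uint64_t)(page_size - 1)) == 0;
}
```

Addresses beginning with 0x7fff and having 12 significant digits also pass the first check, since they too start with 7ff.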

(Following up comment #5)

You know, it's odd that a module that belongs to the dyld shared cache doesn't get loaded into the firefox process along with its other constituent modules, like libobjc.A.dylib and AppKit.

It's just possible that Firefox could work around these crashes by explicitly linking against /System/Library/PrivateFrameworks/SidecarCore.framework/SidecarCore. dlopen() might not work. It's already being used by the AppKit framework (via _NSSoftLinkingLoadFramework()) to load /System/Library/PrivateFrameworks/SidecarUI.framework/SidecarUI, which itself is explicitly linked against SidecarCore (from the results of otool -Lv SidecarUI).

But in this context dlopen() is being called with mode == RTLD_FIRST. That might be the problem. If so, Firefox could work around these crashes by using RTLD_NOW, or perhaps RTLD_LAZY.

Whiteboard: Possible workaround in comment 6 → Possible workarounds in comment 6 and comment 12

I've created a patch that makes Firefox (and Thunderbird) explicitly link against the SidecarCore private framework. I don't know that it will fix this bug's crashes. But I think there's a good chance it might, and in itself the patch is completely harmless. macOS 10.12 and 10.13 don't have this framework, so I've used -weak_framework to generate the link.

This bug's crashes all happen when code (in SidecarCore or elsewhere) tries to access data in the SidecarCore framework, and the accesses fail with the "reason" set to EXC_BAD_ACCESS / KERN_MEMORY_ERROR. The entire framework has already been mapped into virtual memory -- otherwise the "reason" would be EXC_BAD_ACCESS / KERN_INVALID_ADDRESS. And it should all be backed by usable physical memory, so the kernel should just map it in when a page fault happens. But sometimes this doesn't happen, and the page fault is passed back to user space as a fatal error. The SidecarCore framework is in the dyld shared cache, and there's some indication these crashes are more likely in that case, and possibly only happen in that case. And this framework is loaded dynamically (on the first Cmd-key combination), which may also make the crashes more likely. In any case this is pretty clearly an Apple kernel bug. So there's nothing we can do about it directly.

But we might be able to find a workaround. I've started with the simplest and least invasive. These crashes are somewhat rare, especially on mozilla-central. So it will take a while to find out whether or not my patch works. We may not know for sure until it's spent a few weeks on a beta branch.

Assignee: nobody → smichaud
Status: NEW → ASSIGNED

I've made a tryserver build with my patch. Anyone here who sees these crashes, please try it out:

https://treeherder.mozilla.org/jobs?repo=try&revision=89c68ff8cb06a0db8011aa2b49f85d15939140ec
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/UZVLv-AnTNqmCLxWir8ALw/runs/0/artifacts/public/build/target.dmg

I've also done a full set of tests. There don't seem to be any non-spurious failures:

https://treeherder.mozilla.org/jobs?repo=try&revision=04758afa2a9e5158211fc5496433752d6d3cd066

Edit: Forgot to mention that after you download target.dmg (above), you need to run xattr -c target.dmg on it before you open it. Otherwise the Firefox Nightly app you install from it will be unusable.

(In reply to Steven Michaud [:smichaud] (Retired) from comment #13)

The SidecarCore framework is in the dyld shared cache, and there's some indication these crashes are more likely in that case, and possibly only happen in that case.

https://crash-stats.mozilla.org/search/?reason=~KERN_MEMORY_ERROR&platform_version=%5E11.&platform_version=%5E12.&platform_version=%5E13.&platform_version=%5E10.15&platform_version=%5E10.14&platform=Mac%20OS%20X&date=%3E%3D2023-03-13T17%3A39%3A00.000Z&date=%3C2023-04-13T17%3A39%3A00.000Z&_facets=signature&_facets=platform_version&_facets=proto_signature&_facets=address&_facets=cpu_arch&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform_version&_columns=address#facet-address

Edit: The addresses (in the list above) starting with 0x7fff and not containing 12 significant digits don't belong to the dyld shared cache. The easiest way to find out the dyld shared cache's (unslid) address range on a given machine and version of macOS is to run dyld_shared_cache_util -list -vmaddr [shared cache file] or dyld_shared_cache_util -map [shared cache file]. The only way to get this utility is to build it yourself. For recent versions of macOS, follow the instructions at bug 1661771 comment #24.

Edit: Oops, I was wrong about 0x7fff -- the dyld shared cache addresses in macOS 10.12 through 11 all start with this value.

https://crash-stats.mozilla.org/search/?reason=~KERN_MEMORY_ERROR&platform_version=%5E11.&platform_version=%5E12.&platform_version=%5E13.&platform_version=%5E10.15&platform_version=%5E10.14&address=%40.%2A7ff.%7B9%2C9%7D&platform=Mac%20OS%20X&date=%3E%3D2023-03-14T17%3A15%3A00.000Z&date=%3C2023-04-14T17%3A15%3A00.000Z&_facets=signature&_facets=platform_version&_facets=proto_signature&_facets=address&_facets=cpu_arch&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform_version&_columns=address#facet-address

Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criterion:

  • Top 5 desktop browser crashes on Mac on beta

For more information, please visit auto_nag documentation.

Keywords: topcrash
Attachment #9328403 - Attachment description: Bug 1801419 - Explicitly link SidecarCore framework. r=mstange → Bug 1801419 - Explicitly link SidecarCore framework. r=mac-reviewers
Pushed by smichaud@pobox.com:
https://hg.mozilla.org/integration/autoland/rev/900d7e4af889
Explicitly link SidecarCore framework. r=mac-reviewers,spohl
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → 114 Branch

Doesn't look like we're seeing any Nightly crash reports - want to nominate this for Beta uplift so we can see how it looks there?

Flags: needinfo?(smichaud)

Comment on attachment 9328403 [details]
Bug 1801419 - Explicitly link SidecarCore framework. r=mac-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: It will take longer to find out if my workaround really works. There are many more beta users than nightly users.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Any problems caused by this patch should be immediately apparent, but so far nothing has happened.
  • String changes made/needed:
  • Is Android affected?: No
Flags: needinfo?(smichaud)
Attachment #9328403 - Flags: approval-mozilla-beta?

Comment on attachment 9328403 [details]
Bug 1801419 - Explicitly link SidecarCore framework. r=mac-reviewers

Approved for 113.0b9, can't hurt to try!

Attachment #9328403 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

My workaround doesn't work :-(

bp-e3619595-2fc7-44ae-a8f0-20fb00230429
bp-d1e13224-e72f-4640-ad90-0f0cf0230429

Let me see what else I can come up with. In the meantime we can just leave the workaround in place -- it's not doing any harm.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 114 Branch → ---

For my own future reference:

In almost all of this bug's crash reports, SidecarCore and SidecarUI are the last modules loaded (or SidecarUI alone, for crashes in builds with my patch). But there are a few where this isn't true. Here are a couple of examples:

bp-b03b5dda-2823-4757-bc0f-e53e60230428
bp-58ac963c-885f-434b-b004-f14930230429

Above, in comment #6 and comment #8, I analyzed the crashes [@ +[NSString stringWithCString:encoding:] ] and [@ mig_strncpy_zerofill ]. Both happen when accessing C strings (or parts of them) in the SidecarCore framework's __TEXT segment's __cstring section.

Now I've looked at the [@ _mapStrHash ] crashes. They happen when accessing the beginning of another C string in the same section: "SidecarDisableDevices".

Edit: Oops, it's "SidecarTransferDelegate", and it's in the __TEXT segment's __objc_classname section (which is less than one page (4096 bytes) after the __cstring section).

Comment on attachment 9328403 [details]
Bug 1801419 - Explicitly link SidecarCore framework. r=mac-reviewers

Clearing the Beta approval on this to get it off the needs-uplift radar.

Attachment #9328403 - Flags: approval-mozilla-beta+

As best I can tell, I'm very unlikely to find a good workaround for this bug's crashes. I'll say more about this next week.

But I also think we should leave my failed workaround in place for at least a few months. There's a chance (albeit small) that it may reduce the crashes' frequency.

My workaround from comment #6 also doesn't work.

Whiteboard: Possible workarounds in comment 6 and comment 12
Status: REOPENED → NEW
Crash Signature: [@ +[NSString stringWithCString:encoding:] ] [@ mig_strncpy_zerofill ] [@ _mapStrHash ] → [@ +[NSString stringWithCString:encoding:] ] [@ mig_strncpy_zerofill ] [@ _mapStrHash ] [@ prepareMethodLists ]

As best I can tell, I'm very unlikely to find a good workaround for this bug's crashes.

I was a little too pessimistic. I've managed to write an alternative patch. I've also found a way to test my patches.

I'm still not able to reproduce this bug's crashes. But I've written a HookCase hook library that traces page faults on the SidecarCore framework's special C string sections in its __TEXT segment -- those that happen when you press a Cmd-key combination for the first time in a browser session. It works with any version of Firefox, and in fact with any macOS app (including Safari and Chrome).

None of these page faults is fatal (they don't trigger Apple's kernel bug). But they all match this bug's signatures and stack traces. So I have reason to believe that if I prevent them from happening on the first Cmd-key combination, I'll also prevent the crashes.

My new patch explicitly loads the SidecarCore framework as nsAppShell is starting up (in the parent process). It also triggers its initialization. Many of the C strings at which the crashes happen are part of the Objective-C class hierarchy -- class, method and member names. Initializing these classes, and making them part of the class hierarchy, seems to "pin" these strings in place. Afterwards they're rarely, if ever, paged out again.

The string that triggers the most numerous of these crashes ([@ +[NSString stringWithCString:encoding:] ]) isn't part of the Objective-C class hierarchy. So I had to make the initialization a little more aggressive than I might have hoped. It's still pretty minimal, though.
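As a generic illustration of faulting pages in ahead of time, here is a sketch in C that reads one byte from every page in a range. This is not the patch's code. The patch loads and initializes the framework, and how it reaches the strings outside the class hierarchy isn't shown here. The name touch_pages() and its parameters are hypothetical. Reading a byte only faults the page in once; it does not stop the kernel from paging it out again later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads one byte from every page overlapping
 * [start, start + len), so that each page is faulted in now rather than
 * on first real use. Returns the number of pages touched.
 * page_size must be a power of two. */
size_t touch_pages(const void *start, size_t len, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    if (len == 0) {
        return 0;
    }
    /* volatile keeps the compiler from optimizing the reads away */
    const volatile unsigned char *p = (const volatile unsigned char *)start;
    uintptr_t base = (uintptr_t)start;
    uintptr_t addr = base;
    uintptr_t end = base + len;
    size_t touched = 0;
    while (addr < end) {
        (void)p[addr - base];
        touched++;
        /* advance to the first byte of the next page */
        addr = (addr & ~(uintptr_t)(page_size - 1)) + page_size;
    }
    return touched;
}
```

If a read like this hit the kernel bug, the crash would simply move to this call site, which is the same risk noted for the patch.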

The danger with my patch, of course, is that it will just shift the crashes to when the patch's initialization code runs. That wouldn't make things any worse. But the patch should still bake on the trunk, and (if need be) on beta, for quite a while before it gets into a release.

I've done a tryserver build with my latest patch:

https://treeherder.mozilla.org/jobs?repo=try&revision=98530237f7caa5771a99ae8e6278cf5eec571f27
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/VZoqoiz5RrOmD-lz22L-sw/runs/0/artifacts/public/build/target.dmg

Edit: Forgot to mention that my new patch backs out my old patch.

Edit: For the tryserver build to work, you need to run xattr -c target.dmg after you download it.

Here's the hook library I tested with, as a patch on https://github.com/steven-michaud/HookCase/blob/v7.2.0/HookLibraryTemplate/hook.mm.

HookCase supports watchpoints. In their least invasive form, they just trace page faults that happen on a particular range of virtual memory.

Here's the output of my hook library on a mozilla-central build with no patches.

Here's the output of my hook library on a mozilla-central build with my old "explicitly link SidecarCore framework" patch. It's almost identical with the output for no patches. Only the stack trace at (libobjc.A.dylib) _mapStrHash(_NXMapTable*, void const*) + 0x7 is missing.

Here's the output of my hook library on a mozilla-central build with my new "explicitly load and initialize SidecarCore framework" patch. There aren't any traces of page faults.

All my tests, above, were done on the current version of macOS 13 -- 13.3.1 (a) build 22E772610a.

Using my hook library, I found that the current versions of Safari and Chrome both seem to be susceptible to this bug's crashes. I also found a bug report on Safari. It claims that only "unsupported Macs" are affected -- presumably ones that don't actually support macOS 13 (Ventura). I don't know if this is true, but it's an interesting possibility.

(In reply to Steven Michaud [:smichaud] (Retired) from comment #37)

Using my hook library, I found that the current versions of Safari and Chrome both seem to be susceptible to this bug's crashes. I also found a bug report on Safari. It claims that only "unsupported Macs" are affected -- presumably ones that don't actually support macOS 13 (Ventura). I don't know if this is true, but it's an interesting possibility.

There's something to this. I just looked at the four most recent crashes in the list generated by the search from comment #10. All are on Macs that are running Ventura but don't support it:

bp-ef4954e0-9102-402b-ad2a-b77fc0230505 iMac14,1
bp-9d47c8be-8f68-4aa0-a0c9-22af50230505 MacBookPro11,1
bp-8b981d16-aec7-418d-bc7c-303950230504 Macmini7,1
bp-f5187a63-2ae0-43ab-8c7b-f48de0230504 MacBookPro11,4

It's really too bad this kind of search can't be automated.

Edit: My guess is that OpenCore Legacy Patcher is running on all these machines.

I've got an old MacBook Pro (model id MacBookPro11,5) that's not supported by Apple for macOS 13 (Ventura), but is supported by OCLP (the OpenCore Legacy Patcher). I'm waiting on an order for an external USB SSD. Once it arrives I'll try installing Ventura on that machine and see whether or not I can use it to reproduce this bug's crashes. If I can, I'll also use it to test my latest patch.

I managed to use OCLP to install macOS 13.3.1 to my mid-2015 MacBook Pro (model id MacBookPro11,5) -- which Apple says doesn't support Ventura. I tested with its default settings and with Sidecar support disabled -- but neither way am I able to reproduce this bug's crashes. Moreover HookCase is incompatible with OCLP, at least for the time being. So I've been unable to learn much about how Sidecar's binaries operate in such a configuration.

I'll keep playing with this, but for the moment I'm stuck. My second patch is pretty clearly better than my first one, but I still can't be sure that it will work.

I managed to get HookCase working with OCLP. It shows me that the SidecarCore framework's initialization works almost identically in this environment to how it does on "native" macOS Ventura 13.3.1. But when Sidecar is disabled, I don't get watchpoint hits for the worst of the crash stacks ([@ +[NSString stringWithCString:encoding:] ]). So if anyone here is using OCLP and sees these crashes, you really can stop them by disabling Sidecar.

As with most things OCLP, it's not exactly clear from their docs how to do this. Here are the steps:

  1. Run the OpenCore Legacy Patcher app and choose "Settings", then "Misc Settings"
  2. Click on the drop-down list under "Feature Unlock Status" and choose either "Partially enabled (No Airplay/Sidecar)" or "Disabled".
  3. Return to the main menu and choose "Build and Install OpenCore". Follow the prompts.

I'm still not able to reproduce this bug's crashes, and probably never will be. My second patch is worth a try, though.

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash
Crash Signature: [@ +[NSString stringWithCString:encoding:] ] [@ mig_strncpy_zerofill ] [@ _mapStrHash ] [@ prepareMethodLists ] → [@ +[NSString stringWithCString:encoding:] ] [@ mig_strncpy_zerofill ] [@ _mapStrHash ] [@ prepareMethodLists ] [@ +[NSMethodSignature signatureWithObjCTypes:] ]
Pushed by smichaud@pobox.com:
https://hg.mozilla.org/integration/autoland/rev/42416f96e9b7
Explicitly load and initialize SidecarCore framework. r=mac-reviewers,spohl

Backed out for causing build bustage on nsAppShell.mm

Backout link

Push with failures

Failure log

Flags: needinfo?(smichaud)

My patch worked fine in a local build and on the tryserver.

I'll try to guess what changes I need to make, then resubmit my patch.

Flags: needinfo?(smichaud)

Adding #include <dlfcn.h> to nsAppShell.mm should do the trick, I'd think.

Attachment #9331646 - Attachment description: Bug 1801419 - Explicitly load and initialize SidecarCore framework. → WIP: Bug 1801419 - Explicitly load and initialize SidecarCore framework.

OK, Markus, that's what I've done. I noticed this include in several other widget/cocoa files, so I figured it was probably OK.

I'll wait for you or Stephen to review my change, then reland my patch.

Attachment #9331646 - Attachment description: WIP: Bug 1801419 - Explicitly load and initialize SidecarCore framework. → Bug 1801419 - Explicitly load and initialize SidecarCore framework.
Pushed by smichaud@pobox.com:
https://hg.mozilla.org/integration/autoland/rev/9b707876c054
Explicitly load and initialize SidecarCore framework. r=mac-reviewers,spohl
Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → 115 Branch

The patch landed in nightly and beta is affected.
:smichaud, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox114 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(smichaud)

Once again, I think it'd be worthwhile to know fairly soon whether or not my second patch works. So I'll be nominating it for beta approval.

Flags: needinfo?(smichaud)

Comment on attachment 9331646 [details]
Bug 1801419 - Explicitly load and initialize SidecarCore framework.

Beta/Release Uplift Approval Request

  • User impact if declined: It will take longer to find out whether or not my patch works. Beta has many more users than Nightly.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The risk is low. At worst, if the patch doesn't work, the crashes will continue as before.
  • String changes made/needed:
  • Is Android affected?: No
Attachment #9331646 - Flags: approval-mozilla-beta?

One crash has happened on a build with my second patch, which matches one of this bug's signatures:

bp-80f87013-5237-4bf1-a7d0-34f7f0230519

But it's not a SidecarCore crash, and so isn't relevant to this bug.

Edit: I took another look at this, and I was wrong -- it is relevant. The function from my second patch (PinSidecarCoreTextCStringSections()) is in the crash stack. So this bug's crashes may just end up being shifted from one place to another. I'll keep an eye on this.

Comment on attachment 9331646 [details]
Bug 1801419 - Explicitly load and initialize SidecarCore framework.

Approved for 114.0b7.

Attachment #9331646 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Things aren't looking good :-(

https://crash-stats.mozilla.org/search/?proto_signature=~PinSidecarCoreTextCStringSections&platform=Mac%20OS%20X&date=%3E%3D2023-05-15T19%3A04%3A00.000Z&date=%3C2023-05-22T19%3A04%3A00.000Z&_facets=signature&_facets=platform_version&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

There are now five crash reports with PinSidecarCoreTextCStringSections() in their proto signatures -- including three that I missed because of problems with the symbol server upload for macOS 13.4 (which came out a few days ago).

Looks like another failed patch. But it'll be at least a week before I know for sure.

Edit: By comparing its stacks with my hook library's output, it seems that the libsystem_kernel.dylib@0x2972 signature translates to mig_strncpy_zerofill.

Looks like another failed patch. But it'll be at least a week before I know for sure.

It hasn't been a week yet. But there's one thing I've already noticed: With no patch, and with my first patch, crashes with the [@ +[NSString stringWithCString:encoding:] ] signature are overwhelmingly the most common. With my second patch their proportion is far smaller -- currently just one out of ten. That could mean that my second patch is going to eliminate a lot of this bug's crashes.

Crash Signature: [@ +[NSString stringWithCString:encoding:] ] [@ mig_strncpy_zerofill ] [@ _mapStrHash ] [@ prepareMethodLists ] [@ +[NSMethodSignature signatureWithObjCTypes:] ] → [@ +[NSString stringWithCString:encoding:] ] [@ mig_strncpy_zerofill ] [@ _mapStrHash ] [@ prepareMethodLists ] [@ +[NSMethodSignature signatureWithObjCTypes:] ] [@ +[SidecarService minimumRapportVersion] ]

As best I can tell my second patch hasn't made any difference in the frequency of this bug's crashes. It hasn't made them more frequent, but it hasn't made them less so, either.

Here are the stats from my search in comment #56 over the last month. Where the signatures aren't symbolicated, I've folded them in with those that are. The crashes [@ +[NSString stringWithCString:encoding:] ] are once again overwhelmingly the most common.

[@ +[NSString stringWithCString:encoding:] ]   96
[@ mig_strncpy_zerofill ]                      17
[@ _mapStrHash ]                               13
[@ prepareMethodLists ]                         1
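
A minimal sketch of this kind of folding, assuming the signatures have been exported from the search as a plain list of strings. The alias table holds only the one mapping identified in this bug, and the sample list is hypothetical, not real crash data.

```python
from collections import Counter

# Known aliases: non-symbolicated signature -> symbolicated name.
# Only this one mapping comes from this bug; extend it as others are identified.
SIGNATURE_ALIASES = {
    "libsystem_kernel.dylib@0x2972": "mig_strncpy_zerofill",
}


def fold_signatures(signatures):
    """Count signatures, folding known raw aliases into their symbolicated names."""
    return Counter(SIGNATURE_ALIASES.get(sig, sig) for sig in signatures)


# Hypothetical sample, for illustration only.
sample = [
    "+[NSString stringWithCString:encoding:]",
    "libsystem_kernel.dylib@0x2972",
    "mig_strncpy_zerofill",
    "_mapStrHash",
]

for name, count in fold_signatures(sample).most_common():
    print(f"[@ {name} ]  {count}")
```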

Though my second patch has failed, I'm inclined to leave it in place, just to be able to watch the statistics going forward. Without more information there's nothing more I can do. But at some point Socorro will start collecting (and displaying) kernel boot args on macOS. That may be just the information we need.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → NEW
Regressions: 1841445

See bug 1841445 comment #2:

This isn't really a regression. The same crashes would have happened without my patch, but in a different place in the Mozilla code. Also, the crashes [@ libsystem_kernel.dylib@0x2972] are the same as those [@ mig_strncpy_zerofill ]. The non-symbolicated signature is for a macOS 13.5.0 beta. I only scrape symbols for release versions and betas of the next major release (currently macOS 14).

Here's a search, copied from bug 1801419 comment #56 and updated, which captures all that bug's crashes for the last month (in builds with my second patch), including those with non-symbolicated signatures:

https://crash-stats.mozilla.org/search/?proto_signature=~PinSidecarCoreTextCStringSections&platform=Mac%20OS%20X&date=%3E%3D2023-06-03T19%3A26%3A00.000Z&date=%3C2023-07-03T19%3A26%3A00.000Z&_facets=signature&_facets=platform_version&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

And here are these stats with the non-symbolicated signatures folded in with the symbolicated ones. Compare them to those from bug 1801419 comment #58.

[@ +[NSString stringWithCString:encoding:] ]  200
[@ mig_strncpy_zerofill ]                      39
[@ _mapStrHash ]                               12
[@ prepareMethodLists ]                         7

They do seem higher, but that has nothing to do with this bug report.
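
To compare the two sets of stats by proportion rather than by raw count, here is a minimal sketch. The counts are copied from the two tables in this bug; the variable and function names are just for illustration.

```python
# Counts copied from the two tables in this bug (earlier month, later month).
earlier = {
    "+[NSString stringWithCString:encoding:]": 96,
    "mig_strncpy_zerofill": 17,
    "_mapStrHash": 13,
    "prepareMethodLists": 1,
}
later = {
    "+[NSString stringWithCString:encoding:]": 200,
    "mig_strncpy_zerofill": 39,
    "_mapStrHash": 12,
    "prepareMethodLists": 7,
}


def shares(counts):
    """Return each signature's fraction of the total crash count."""
    total = sum(counts.values())
    return {sig: n / total for sig, n in counts.items()}


for label, counts in (("earlier", earlier), ("later", later)):
    print(f"{label}: {sum(counts.values())} crashes")
    for sig, share in shares(counts).items():
        print(f"  [@ {sig} ]  {share:.1%}")
```

By this measure the [@ +[NSString stringWithCString:encoding:] ] signature accounts for roughly three quarters of the crashes in both periods, even though the totals differ.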

Crash Signature: [@ +[NSString stringWithCString:encoding:] ] [@ mig_strncpy_zerofill ] [@ _mapStrHash ] [@ prepareMethodLists ] [@ +[NSMethodSignature signatureWithObjCTypes:] ] [@ +[SidecarService minimumRapportVersion] ] → [@ +[NSString stringWithCString:encoding:] ] [@ mig_strncpy_zerofill ] [@ _mapStrHash ] [@ prepareMethodLists ] [@ +[NSMethodSignature signatureWithObjCTypes:] ] [@ +[SidecarService minimumRapportVersion] ] [@ libsystem_platform.dylib@0xf48 ] [@ li…
No longer regressions: 1841445
See Also: → 1841445
Duplicate of this bug: 1841445
See Also: 1841445
Crash Signature: libsystem_kernel.dylib@0x2972 ] → libsystem_kernel.dylib@0x2972 ] [@ __CFStringHash]
Depends on: 1835881

Almost half of the crashes here have the following boot args: keepsyms=1 debug=0x100 ipc_control_port_options=0 -nokcmismatchpanic. Those flags always appear together. The rest have none (or none were reported in the crash).

Flags: needinfo?(gsvelto)

(In reply to Gabriele Svelto [:gsvelto] from comment #63)

Almost half of the crashes here have the following boot args: keepsyms=1 debug=0x100 ipc_control_port_options=0 -nokcmismatchpanic. Those flags always appear together. The rest have none (or none were reported in the crash).

These are standard settings for OCLP (OpenCore Legacy Patcher). So this bug's crashes probably aren't triggered by any mac boot arg, or combination thereof.

It's puzzling, though, that some crashes happen without any mac boot args. That could mean that this bug's crashes don't always involve OCLP.
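
For sorting crash reports by whether they carry this set of boot args, a minimal sketch follows. It assumes the boot args are available as one whitespace-separated string per report; the function name is hypothetical, and the four flags are the ones listed above.

```python
# The four boot args reported in this bug as always appearing together.
OCLP_BOOT_ARGS = frozenset({
    "keepsyms=1",
    "debug=0x100",
    "ipc_control_port_options=0",
    "-nokcmismatchpanic",
})


def has_oclp_boot_args(boot_args: str) -> bool:
    """Return True if all four flags are present in the boot-args string."""
    return OCLP_BOOT_ARGS <= set(boot_args.split())


print(has_oclp_boot_args(
    "keepsyms=1 debug=0x100 ipc_control_port_options=0 -nokcmismatchpanic"
))
print(has_oclp_boot_args(""))
```

A report with an empty string falls into the second group, which is why the crashes without boot args can't be attributed to OCLP on this evidence alone.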

I'm giving up here. My last hope was that the mac boot args in this bug's crash reports would indicate their cause. That didn't pan out.

Mozilla can either keep my (failed) patch or back it out. Either way the crash frequency should stay the same. The one advantage of keeping my patch is that it makes it easier to search for all this bug's crashes:

https://crash-stats.mozilla.org/search/?proto_signature=~PinSidecarCoreTextCStringSections&platform=Mac%20OS%20X&date=%3E%3D2023-11-20T21%3A21%3A00.000Z&date=%3C2024-02-20T21%3A21%3A00.000Z&_facets=signature&_facets=platform_version&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
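
For anyone who wants to rerun this search over a different date range, here is a minimal sketch that rebuilds the URL. The parameter names and values are taken from the link above; the function name and its arguments are hypothetical. It assumes Python 3.7 or later, where `~` is left unescaped by `urllib.parse.quote`.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://crash-stats.mozilla.org/search/"
COLUMNS = ("date", "signature", "product", "version", "build_id", "platform")


def build_search_url(start, end, marker="PinSidecarCoreTextCStringSections"):
    """Build a search URL for crashes whose proto signature contains `marker`.

    `start` and `end` are timestamps such as "2023-11-20T21:21:00.000Z".
    """
    params = [
        ("proto_signature", "~" + marker),  # "~" prefix as used in the link above
        ("platform", "Mac OS X"),
        ("date", ">=" + start),
        ("date", "<" + end),
        ("_facets", "signature"),
        ("_facets", "platform_version"),
        ("_sort", "-date"),
    ]
    params += [("_columns", column) for column in COLUMNS]
    # quote_via=quote encodes spaces as %20 rather than "+", matching the link.
    return BASE_URL + "?" + urlencode(params, quote_via=quote) + "#facet-signature"


print(build_search_url("2023-11-20T21:21:00.000Z", "2024-02-20T21:21:00.000Z"))
```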

Assignee: smichaud → nobody

(In reply to Steven Michaud [:smichaud] (Retired) from comment #64)

It's puzzling, though, that some crashes happen without any mac boot args. That could mean that this bug's crashes don't always involve OCLP.

There's also the possibility that we're failing to capture the boot args. I don't think we have a way to tell apart instances where the args are empty and instances where we couldn't read them.
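
As an illustration only, this is one way the two cases could be kept apart if the collector recorded a failed read separately from an empty value. This is not how the crash reporter works today; the names are hypothetical, and the sketch assumes a failed read can be passed along as `None`.

```python
from enum import Enum
from typing import Optional


class BootArgsState(Enum):
    UNREADABLE = "unreadable"  # the read itself failed
    EMPTY = "empty"            # the read succeeded, but no boot args were set
    PRESENT = "present"        # the read succeeded and returned boot args


def classify_boot_args(raw: Optional[str]) -> BootArgsState:
    """Classify a captured boot-args value; None stands for a failed read."""
    if raw is None:
        return BootArgsState.UNREADABLE
    if not raw.strip():
        return BootArgsState.EMPTY
    return BootArgsState.PRESENT


print(classify_boot_args(None))
print(classify_boot_args(""))
print(classify_boot_args("keepsyms=1 debug=0x100"))
```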

(In reply to Gabriele Svelto [:gsvelto] from comment #66)

(In reply to Steven Michaud [:smichaud] (Retired) from comment #64)

It's puzzling, though, that some crashes happen without any mac boot args. That could mean that this bug's crashes don't always involve OCLP.

There's also the possibility that we're failing to capture the boot args. I don't think we have a way to tell apart instances where the args are empty and instances where we couldn't read them.

It's odd that we'd fail 50% of the time to capture the boot args. But if that's what's happening, your patch for bug 1878428 could fix it.

Whiteboard: [tbird crash]
Summary: [macOS 13] Crashes on macOS 13 involving Apple's Sidecar functionality → [macOS 13 and 14] Crashes on macOS 13 involving Apple's Sidecar functionality