[macOS 13 and 14] Crashes on macOS 13 involving Apple's Sidecar functionality
Categories: Core :: Widget: Cocoa, defect, P3
People: Reporter: smichaud, Unassigned
References: Blocks 1 open bug
Details: Whiteboard: [tbird crash]
Attachments: 6 files
These started appearing recently in small numbers. They happen on macOS 13 or 13.0.1. The proto signatures show they involve Apple's Sidecar functionality.
So far these have only happened on Intel Macs. But that may not continue to be true.
They don't show up for any of the macOS 13 betas, but it may just be that very few beta testers use Sidecar.
Also see comment #10 below.
Typical crash stack:
0 libsystem_platform.dylib _platform_strlen context
1 None @0x00007ff7bd08d7ef cfi
2 Foundation +[NSString stringWithCString:encoding:] frame_pointer
3 SidecarCore SidecarDisplayIsSupportedMac cfi
4 SidecarCore updateDevices cfi
5 AppKit -[NSWindowSidecarMenuController reloadData] cfi
6 AppKit _NSWindowMenuUpdateSidecarItems cfi
7 AppKit __46-[NSMenu _performSidebandUpdatersPassingTest:]_block_invoke cfi
8 AppKit -[NSMenu _forEachCachedSidebandUpdaterDo:] cfi
9 AppKit -[NSMenu _performSidebandUpdatersPassingTest:] cfi
10 AppKit -[NSMenu _populateFromSidebandUpdatersOfSign:] cfi
11 AppKit -[NSMenu _populateWithEventRef:] cfi
12 AppKit -[NSCarbonMenuImpl _carbonPopulateEvent:handlerCallRef:] cfi
13 AppKit NSSLMMenuEventHandler cfi
14 HIToolbox DispatchEventToHandlers(EventTargetRec*, OpaqueEventRef*, HandlerCallRec*) cfi
15 HIToolbox SendEventToEventTargetInternal(OpaqueEventRef*, OpaqueEventTargetRef*, HandlerCallRec*) cfi
16 HIToolbox SendEventToEventTargetWithOptions cfi
17 HIToolbox SendMenuPopulate(MenuData*, OpaqueEventTargetRef*, unsigned int, double, unsigned int, OpaqueEventRef*, unsigned char, unsigned char*) cfi
18 HIToolbox PopulateMenu(MenuData*, OpaqueEventTargetRef*, CheckMenuData*, unsigned int, double) cfi
19 HIToolbox Check1MenuForKeyEvent(MenuData*, CheckMenuData*) cfi
20 HIToolbox CheckMenusForKeyEvent(MenuData*, CheckMenuData*) cfi
21 HIToolbox IsMatchingMenuKeyEvent(MenuData*, OpaqueEventRef*, unsigned int, MenuData**, unsigned short*) cfi
22 HIToolbox _IsMenuKeyEvent(MenuData*, OpaqueEventRef*, unsigned int, MenuData**, unsigned short*) cfi
23 HIToolbox IsMenuKeyEvent cfi
24 AppKit +[NSCarbonMenuImpl _menuItemWithKeyEquivalentMatchingEventRef:inMenu:includingDisabledItems:] cfi
25 AppKit _NSFindMenuItemMatchingCommandKeyEvent cfi
26 AppKit -[NSMenu performKeyEquivalent:] cfi
27 XUL -[GeckoNSMenu performKeyEquivalent:] widget/cocoa/nsMenuBarX.mm:860 cfi
28 AppKit routeKeyEquivalent cfi
29 AppKit -[NSApplication(NSEvent) sendEvent:] cfi
30 XUL -[GeckoNSApplication sendEvent:] widget/cocoa/nsAppShell.mm:165 cfi
31 AppKit -[NSApplication _handleEvent:] cfi
32 AppKit -[NSApplication run] cfi
33 XUL nsAppShell::Run() widget/cocoa/nsAppShell.mm:801 cfi
34 XUL nsAppStartup::Run() toolkit/components/startup/nsAppStartup.cpp:295 cfi
35 XUL XREMain::XRE_mainRun() toolkit/xre/nsAppRunner.cpp:5736 cfi
36 XUL XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) toolkit/xre/nsAppRunner.cpp:5929 cfi
37 XUL XRE_main(int, char**, mozilla::BootstrapConfig const&) toolkit/xre/nsAppRunner.cpp:5985 cfi
38 firefox do_main(int, char**, char**) browser/app/nsBrowserApp.cpp:226 inlined
38 firefox main browser/app/nsBrowserApp.cpp:430 cfi
39 None @0x00007ff804d8130f
Comment 1 • 3 years ago (Reporter)
Needless to say, this is probably an Apple bug.
Comment 2 • 3 years ago (Reporter)
I found one of these on macOS 12.6.1:
bp-b39e65f9-434f-47af-8394-53a600221123
All the rest (including those with the new signature) are still on macOS 13.
Comment 3 • 3 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 5 desktop browser crashes on Mac on beta
:spohl, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 4 • 3 years ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Comment 5 • 3 years ago (Reporter)
This bug's crashes all seem to be triggered by using some kind of key combination(s). I don't have permission to access the comments. Could someone please check them? I've just found out that doing Cmd-t can trigger loading the SidecarUI and SidecarCore system modules, even on a (local) network which doesn't (so far as I know) have any machines that support Sidecar. So also look out for comments about "opening a tab".
Edit: Actually, it seems like any Cmd-key combination can trigger loading these modules.
Comment 6 • 2 years ago (Reporter)
I still can't reproduce this bug's crashes. But those at [@ +[NSString stringWithCString:encoding:] ], at least, are definitely an Apple bug. And I may have found a workaround for them:
defaults write com.apple.sidecar.display AllowAllDevices -bool true
As you can see from their addresses, the crashes at [@ +[NSString stringWithCString:encoding:] ] all happen at a page boundary, in the bool SidecarDisplayIsSupportedMac() function in /System/Library/PrivateFrameworks/SidecarCore.framework/SidecarCore, while the code is iterating over a buffer called char *SidecarDisplayIsSupportedMac.unsupportedModels[]. In Objective-C, the code looks something like this:
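bool retval = true;
NSString *model = SidecarGetModelProperty();
// Compare this Mac's model against each entry in the unsupported list.
for (int i = 0; i < unsupportedModels_length; ++i) {
  // stringWithCString: calls strlen() on the C string; this is where the crashes happen.
  NSString *unsupported = [NSString stringWithCString:unsupportedModels[i] encoding:NSMacOSRomanStringEncoding];
  if ([model isEqualToString:unsupported]) {
    retval = false;
    break;
  }
}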
On Intel macOS 13 (and only there), one of the strings in SidecarDisplayIsSupportedMac.unsupportedModels[] (usually (always?) the last one) crosses a page boundary. That's where the [@ +[NSString stringWithCString:encoding:] ] crashes happen. None of these strings crosses a page boundary on Apple Silicon macOS 13, or on any variety of macOS 12 or 11. Like I said above, this can only be an Apple bug. The kernel should map in a new page when strlen() (called from +[NSString stringWithCString:encoding:]) crosses a page boundary. That it sometimes doesn't seems like a kernel bug.
The reason my workaround works (if it does) is that it makes SidecarDisplayIsSupportedMac() assume all models are supported. So it no longer needs to iterate over SidecarDisplayIsSupportedMac.unsupportedModels[]. It should be used with caution. I don't know what side effects it may have on machines in Apple's unsupported list.
It'd be really nice if Apple fixed this kernel bug. But they could also work around it by somehow ensuring that C strings never cross a page boundary. Or possibly by using strnlen() instead of strlen().
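To make the page-boundary condition in this comment concrete, here is a minimal Python sketch (not Apple's code) that checks whether a NUL-terminated C string stored at a given address spans more than one page. It assumes 4096-byte pages, and the string and addresses are made up for illustration; they are not taken from SidecarCore.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def page_index(address: int, page_size: int = PAGE_SIZE) -> int:
    """Return the index of the page that contains `address`."""
    return address // page_size


def crosses_page_boundary(start: int, text: bytes, page_size: int = PAGE_SIZE) -> bool:
    """True if the C string `text` stored at `start`, including its NUL, spans two or more pages."""
    last_byte = start + len(text)  # address of the terminating NUL
    return page_index(start, page_size) != page_index(last_byte, page_size)


# Hypothetical string and addresses, chosen only to exercise both outcomes.
example = b"ExampleModel1,1"
print(crosses_page_boundary(0x7FF8_1234_0FF8, example))  # starts 8 bytes before a page end
print(crosses_page_boundary(0x7FF8_1234_0F00, example))  # fits well inside one page
```

A strlen() call over the first string would have to read from two pages, which is the situation described above for the last entry of the unsupported-models list.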
Comment 7 • 2 years ago (Reporter)
Mozilla's jemalloc is much more precise and efficient in its memory allocation (and deallocation). That may explain why this bug's crashes don't seem to happen in Chrome or Safari. Of course it could also be because Google and Apple hide their crash stats :-(
Comment 8 • 2 years ago (Reporter)
I briefly looked into this bug's crashes with the signature mig_strncpy_zerofill. I don't have a workaround. But they only happen in small numbers, and I don't see any for versions later than macOS 13.2.1. So with luck they're now extinct.
The string involved there is "IOService:/", which is passed to IORegistryFromPath() and io_registry_entry_from_path() as the path parameter. That string doesn't cross a page boundary, and the bad access crashes happen at its very first character. It (like the strings in SidecarDisplayIsSupportedMac.unsupportedModels[]) is instantiated in the __cstring section of the __TEXT segment (of the SidecarCore framework). It's very weird that accessing anything in the __TEXT segment should trigger an unresolved page fault. But it's nonetheless true that the sections that contain actual code (__text and __stubs) are on different pages than the __cstring section. This would explain how these sections could be mapped in without the __cstring section having been mapped in. It's still a bug, though -- presumably the same kernel bug that causes the +[NSString stringWithCString:encoding:] crashes.
Comment 9 • 2 years ago (Reporter)
It occurred to me these crashes might be caused by the SidecarCore module having been unloaded. But a significant number of them have uptimes of 30 seconds or less, and I haven't seen any crash reports whose modules tab lists unloaded modules. So I think it's very unlikely the SidecarCore module has been unloaded.
Comment 10 • 2 years ago (Reporter)
A better search for crashes related to this bug:
Comment 11 • 2 years ago (Reporter)
Here's a search for crashes that, like this bug's, are failed memory accesses at addresses inside the dyld shared cache:
The vast majority belong here, and show that the SidecarCore module is somehow "special". But there are also a few others. At some point I'll work through the list to find out which other modules are affected by Apple's presumed kernel bug.
Edit: These are all on Intel hardware. But that's because I searched for them using their pattern on Intel boxes. The dyld shared cache addresses on Apple Silicon hardware aren't distinctive enough to do the same kind of search with the search terms "cpu arch", "has terms", "arm64".
Comment 12 • 2 years ago (Reporter)
(Following up comment #5)
You know, it's odd that a module that belongs to the dyld shared cache doesn't get loaded into the firefox process along with its other constituent modules, like libobjc.A.dylib and AppKit.
It's just possible that Firefox could work around these crashes by explicitly linking against /System/Library/PrivateFrameworks/SidecarCore.framework/SidecarCore. dlopen() might not work. It's already being used by the AppKit framework (via _NSSoftLinkingLoadFramework()) to load /System/Library/PrivateFrameworks/SidecarUI.framework/SidecarUI, which itself is explicitly linked against SidecarCore (from the results of otool -Lv SidecarUI).
But in this context dlopen() is being called with mode == RTLD_FIRST. That might be the problem. If so, Firefox could work around these crashes by using RTLD_NOW, or perhaps RTLD_LAZY.
Comment 13 • 2 years ago (Reporter)
I've created a patch that makes Firefox (and Thunderbird) explicitly link against the SidecarCore private framework. I don't know that it will fix this bug's crashes. But I think there's a good chance it might, and in itself the patch is completely harmless. macOS 10.12 and 10.13 don't have this framework, so I've used -weak_framework to generate the link.
This bug's crashes all happen when code (in SidecarCore or elsewhere) tries to access data in the SidecarCore framework, and the accesses fail with the "reason" set to EXC_BAD_ACCESS / KERN_MEMORY_ERROR. The entire framework has already been mapped into virtual memory -- otherwise the "reason" would be EXC_BAD_ACCESS / KERN_INVALID_ADDRESS. And it should all be backed by usable physical memory, so the kernel should just map it in when a page fault happens. But sometimes this doesn't happen, and the page fault is passed back to user space as a fatal error. The SidecarCore framework is in the dyld shared cache, and there's some indication these crashes are more likely in that case, and possibly only happen in that case. And this framework is loaded dynamically (on the first Cmd-key combination), which may also make the crashes more likely. In any case this is pretty clearly an Apple kernel bug. So there's nothing we can do about it directly.
But we might be able to find a workaround. I've started with the simplest and least invasive. These crashes are somewhat rare, especially on mozilla-central. So it will take a while to find out whether or not my patch works. We may not know for sure until it's spent a few weeks on a beta branch.
Comment 14 • 2 years ago (Reporter)
Comment 15 • 2 years ago (Reporter)
For almost all of this bug's crashes the "mac memory pressure" is "normal":
Comment 16 • 2 years ago (Reporter)
I've made a tryserver build with my patch. Anyone here who sees these crashes, please try it out:
https://treeherder.mozilla.org/jobs?repo=try&revision=89c68ff8cb06a0db8011aa2b49f85d15939140ec
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/UZVLv-AnTNqmCLxWir8ALw/runs/0/artifacts/public/build/target.dmg
I've also done a full set of tests. There don't seem to be any non-spurious failures:
https://treeherder.mozilla.org/jobs?repo=try&revision=04758afa2a9e5158211fc5496433752d6d3cd066
Edit: Forgot to mention that after you download target.dmg (above), you need to run xattr -c target.dmg on it before you open it. Otherwise the Firefox Nightly app you install from it will be unusable.
Comment 17 • 2 years ago (Reporter)
(In reply to Steven Michaud [:smichaud] (Retired) from comment #13)
The SidecarCore framework is in the dyld shared cache, and there's some indication these crashes are more likely in that case, and possibly only happen in that case.
Edit: The addresses (in the list above) starting with 0x7fff and not containing 12 significant digits don't belong to the dyld shared cache. The easiest way to find out the dyld shared cache's (unslid) address range on a given machine and version of macOS is to run dyld_shared_cache_util -list -vmaddr [shared cache file] or dyld_shared_cache_util -map [shared cache file]. The only way to get this utility is to build it yourself. For recent versions of macOS, follow the instructions at bug 1661771 comment #24.
Edit: Oops, I was wrong about 0x7fff -- the dyld shared cache addresses in macOS 10.12 through 11 all start with this value.
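Once the shared cache's address range is known (for example from the dyld_shared_cache_util output mentioned above), checking an address against it is simple arithmetic. The following Python sketch shows the idea. The base and size are placeholders, not values read from any real cache file, and the sketch ignores any ASLR slide. The two sample addresses are the unsymbolicated frame addresses from the stack in the description.

```python
def in_shared_cache(address: int, base: int, size: int) -> bool:
    """True if `address` lies in the half-open range [base, base + size)."""
    return base <= address < base + size


# Placeholder range for illustration only; take real values from the tool's output.
EXAMPLE_BASE = 0x7FF8_0000_0000
EXAMPLE_SIZE = 0x1_0000_0000

# Frames 39 and 1 of the crash stack in the description.
sample_addresses = [0x0000_7FF8_04D8_130F, 0x0000_7FF7_BD08_D7EF]
for address in sample_addresses:
    print(hex(address), in_shared_cache(address, EXAMPLE_BASE, EXAMPLE_SIZE))
```

With a real range substituted in, the same check could be applied to the crash addresses returned by a search.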
Comment 18 • 2 years ago
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criterion:
- Top 5 desktop browser crashes on Mac on beta
For more information, please visit auto_nag documentation.
Comment 19 • 2 years ago
Comment 20 • 2 years ago (bugherder)
Comment 21 • 2 years ago
Doesn't look like we're seeing any Nightly crash reports - want to nominate this for Beta uplift so we can see how it looks there?
Comment 22 • 2 years ago (Reporter)
Comment on attachment 9328403 [details]
Bug 1801419 - Explicitly link SidecarCore framework. r=mac-reviewers
Beta/Release Uplift Approval Request
- User impact if declined: It will take longer to find out if my workaround really works. There are many more beta users than nightly users.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Any problems caused by this patch should be immediately apparent, but so far nothing has happened.
- String changes made/needed:
- Is Android affected?: No
Comment 23 • 2 years ago
Comment on attachment 9328403 [details]
Bug 1801419 - Explicitly link SidecarCore framework. r=mac-reviewers
Approved for 113.0b9, can't hurt to try!
Comment 24 • 2 years ago (bugherder uplift)
Comment 25 • 2 years ago (Reporter)
My workaround doesn't work :-(
bp-e3619595-2fc7-44ae-a8f0-20fb00230429
bp-d1e13224-e72f-4640-ad90-0f0cf0230429
Let me see what else I can come up with. In the meantime we can just leave the workaround in place -- it's not doing any harm.
Comment 26 • 2 years ago (Reporter)
For my own future reference:
In almost all of this bug's crash reports, SidecarCore and SidecarUI are the last modules loaded (or SidecarUI alone, for crashes in builds with my patch). But there are a few where this isn't true. Here are a couple of examples:
bp-b03b5dda-2823-4757-bc0f-e53e60230428
bp-58ac963c-885f-434b-b004-f14930230429
Comment 27 • 2 years ago (Reporter)
Above, in comment #6 and comment #8, I analyzed the crashes [@ +[NSString stringWithCString:encoding:] ] and [@ mig_strncpy_zerofill ]. Both happen accessing C strings (or parts of them) in the SidecarCore framework's __TEXT segment's __cstring section.
Now I've looked at the [@ _mapStrHash ] crashes. They happen accessing the beginning of another C string in the same section: "SidecarDisableDevices".
Edit: Oops, it's "SidecarTransferDelegate", and it's in the __TEXT segment's __objc_classname section (which is less than one page (4096 bytes) after the __cstring section).
Comment 28 • 2 years ago
Comment on attachment 9328403 [details]
Bug 1801419 - Explicitly link SidecarCore framework. r=mac-reviewers
Clearing the Beta approval on this to get it off the needs-uplift radar.
Comment 29 • 2 years ago (Reporter)
As best I can tell, I'm very unlikely to find a good workaround for this bug's crashes. I'll say more about this next week.
But I also think we should leave my failed workaround in place for at least a few months. There's a chance (albeit small) that it may reduce the crashes' frequency.
My workaround from comment #6 also doesn't work.
Comment 30 • 2 years ago (Reporter)
As best I can tell, I'm very unlikely to find a good workaround for this bug's crashes.
I was a little too pessimistic. I've managed to write an alternative patch. I've also found a way to test my patches.
I'm still not able to reproduce this bug's crashes. But I've written a HookCase hook library that traces page faults on the SidecarCore framework's special C string sections in its __TEXT segment -- those that happen when you press a Cmd-key combination for the first time in a browser session. It works with any version of Firefox, and in fact with any macOS app (including Safari and Chrome).
None of these page faults is fatal (they don't trigger Apple's kernel bug). But they all match this bug's signatures and stack traces. So I have reason to believe that if I prevent them from happening on the first Cmd-key combination, I'll also prevent the crashes.
My new patch explicitly loads the SidecarCore framework as nsAppShell is starting up (in the parent process). It also triggers its initialization. Many of the C strings at which the crashes happen are part of the Objective-C class hierarchy -- class, method and member names. Initializing these classes, and making them part of the class hierarchy, seems to "pin" these strings in place. Afterwards they're rarely, if ever, paged out again.
The string that triggers the most numerous of these crashes ([@ +[NSString stringWithCString:encoding:] ]) isn't part of the Objective-C class hierarchy. So I had to make the initialization a little more aggressive than I might have hoped. It's still pretty minimal, though.
The danger with my patch, of course, is that it will just shift the crashes to when the patch's initialization code runs. That wouldn't make things any worse. But the patch should still bake on the trunk, and (if need be) on beta, for quite a while before it gets into a release.
I've done a tryserver build with my latest patch:
https://treeherder.mozilla.org/jobs?repo=try&revision=98530237f7caa5771a99ae8e6278cf5eec571f27
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/VZoqoiz5RrOmD-lz22L-sw/runs/0/artifacts/public/build/target.dmg
Edit: Forgot to mention that my new patch backs out my old patch.
Edit: For the tryserver build to work, you need to run xattr -c target.dmg after you download it.
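The "load early, tolerate absence" step of the second patch can be illustrated outside of Gecko. The sketch below is not the patch itself, which lives in nsAppShell.mm and also triggers the framework's initialization. It is a Python illustration using ctypes, and the helper name load_optional_library is hypothetical. It asks dlopen() to load the library with RTLD_NOW and returns None where the library can't be loaded.

```python
import ctypes
import os
from typing import Optional

# Path of the private framework binary, as given in comment #6.
SIDECAR_CORE = "/System/Library/PrivateFrameworks/SidecarCore.framework/SidecarCore"


def load_optional_library(path: str) -> Optional[ctypes.CDLL]:
    """dlopen() `path` with RTLD_NOW; return None if it cannot be loaded."""
    try:
        return ctypes.CDLL(path, mode=os.RTLD_NOW)
    except OSError:
        # Nothing to load at this path, so there is nothing to pin.
        return None


handle = load_optional_library(SIDECAR_CORE)
print("SidecarCore loaded" if handle is not None else "SidecarCore not available")
```

Loading alone corresponds roughly to what the first patch achieved by linking. Per this comment, it is the additional initialization that seems to keep the C strings resident.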
Comment 31 • 2 years ago (Reporter)
Here's the hook library I tested with, as a patch on https://github.com/steven-michaud/HookCase/blob/v7.2.0/HookLibraryTemplate/hook.mm.
HookCase supports watchpoints. In their least invasive form, they just trace page faults that happen on a particular range of virtual memory.
Comment 32 • 2 years ago (Reporter)
Here's the output of my hook library on a mozilla-central build with no patches.
Comment 33 • 2 years ago (Reporter)
Here's the output of my hook library on a mozilla-central build with my old "explicitly link SidecarCore framework" patch. It's almost identical with the output for no patches. Only the stack trace at (libobjc.A.dylib) _mapStrHash(_NXMapTable*, void const*) + 0x7 is missing.
Comment 34 • 2 years ago (Reporter)
Here's the output of my hook library on a mozilla-central build with my new "explicitly load and initialize SidecarCore framework" patch. There aren't any traces of page faults.
Comment 35 • 2 years ago (Reporter)
All my tests, above, were done on the current version of macOS 13 -- 13.3.1 (a) build 22E772610a.
Comment 36 • 2 years ago (Reporter)
Comment 37 • 2 years ago (Reporter)
Using my hook library, I found that the current versions of Safari and Chrome both seem to be susceptible to this bug's crashes. I also found a bug report on Safari. It claims that only "unsupported Macs" are affected -- presumably ones that don't actually support macOS 13 (Ventura). I don't know if this is true, but it's an interesting possibility.
Comment 38 • 2 years ago (Reporter)
(In reply to Steven Michaud [:smichaud] (Retired) from comment #37)
Using my hook library, I found that the current versions of Safari and Chrome both seem to be susceptible to this bug's crashes. I also found a bug report on Safari. It claims that only "unsupported Macs" are affected -- presumably ones that don't actually support macOS 13 (Ventura). I don't know if this is true, but it's an interesting possibility.
There's something to this. I just looked at the four most recent crashes in the list generated by search from comment #10. All are on Macs that are running Ventura but don't support it:
bp-ef4954e0-9102-402b-ad2a-b77fc0230505 iMac14,1
bp-9d47c8be-8f68-4aa0-a0c9-22af50230505 MacBookPro11,1
bp-8b981d16-aec7-418d-bc7c-303950230504 Macmini7,1
bp-f5187a63-2ae0-43ab-8c7b-f48de0230504 MacBookPro11,4
It's really too bad this kind of search can't be automated.
Edit: My guess is that OpenCore Legacy Patcher is running on all these machines.
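If the model identifier of each crash report were available as data, the manual check in this comment could be scripted along the following lines. This is only a sketch. The set of unsupported models contains just the four identifiers listed above and is not Apple's official list. The last report in the sample is invented, with a model Ventura does support, to show a non-match.

```python
# Model identifiers listed in this comment as running Ventura without Apple's support.
# Illustrative and incomplete; not Apple's official list.
UNSUPPORTED_MODELS = {"iMac14,1", "MacBookPro11,1", "Macmini7,1", "MacBookPro11,4"}


def reports_on_unsupported_macs(reports):
    """Return the crash IDs whose reported model is in UNSUPPORTED_MODELS."""
    return [crash_id for crash_id, model in reports if model in UNSUPPORTED_MODELS]


reports = [
    ("bp-ef4954e0-9102-402b-ad2a-b77fc0230505", "iMac14,1"),
    ("bp-9d47c8be-8f68-4aa0-a0c9-22af50230505", "MacBookPro11,1"),
    ("bp-8b981d16-aec7-418d-bc7c-303950230504", "Macmini7,1"),
    ("bp-f5187a63-2ae0-43ab-8c7b-f48de0230504", "MacBookPro11,4"),
    ("hypothetical-report", "MacBookPro16,1"),  # invented entry for contrast
]

matches = reports_on_unsupported_macs(reports)
print(len(matches), "of", len(reports), "reports are on unsupported Macs")
```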
Comment 39 • 2 years ago (Reporter)
I've got an old MacBook Pro (model id MacBookPro11,5) that's not supported by Apple for macOS 13 (Ventura), but is supported by OCLP (the OpenCore Legacy Patcher). I'm waiting on an order for an external USB SSD. Once it arrives I'll try installing Ventura on that machine and see whether or not I can use it to reproduce this bug's crashes. If I can, I'll also use it to test my latest patch.
Comment 40 • 2 years ago (Reporter)
I managed to use OCLP to install macOS 13.3.1 to my mid-2015 MacBook Pro (model id MacBookPro11,5) -- which Apple says doesn't support Ventura. I tested with its default settings and with Sidecar support disabled -- but neither way am I able to reproduce this bug's crashes. Moreover HookCase is incompatible with OCLP, at least for the time being. So I've been unable to learn much about how Sidecar's binaries operate in such a configuration.
I'll keep playing with this, but for the moment I'm stuck. My second patch is pretty clearly better than my first one, but I still can't be sure that it will work.
Comment 41 • 2 years ago (Reporter)
I managed to get HookCase working with OCLP. It shows me that the SidecarCore framework's initialization works almost identically in this environment to how it does on "native" macOS Ventura 13.3.1. But when Sidecar is disabled, I don't get watchpoint hits for the worst of the crash stacks ([@ +[NSString stringWithCString:encoding:] ]). So if anyone here is using OCLP and sees these crashes, you really can stop them by disabling Sidecar.
As with most things OCLP, it's not exactly clear from their docs how to do this. Here are the steps:
- Run the OpenCore Legacy Patcher app and choose "Settings", then "Misc Settings"
- Click on the drop-down list under "Feature Unlock Status" and choose either "Partially enabled (No Airplay/Sidecar)" or "Disabled".
- Return to the main menu and choose "Build and Install OpenCore". Follow the prompts.
I'm still not able to reproduce this bug's crashes, and probably never will be. My second patch is worth a try, though.
Comment 42 • 2 years ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Comment 43 • 2 years ago
Comment 44 • 2 years ago
Comment 45 • 2 years ago (Reporter)
My patch worked fine in a local build and on the tryserver.
I'll try to guess what changes I need to make, then resubmit my patch.
Comment 46 • 2 years ago
Adding #include <dlfcn.h> to nsAppShell.mm should do the trick, I'd think.
Comment 47 • 2 years ago (Reporter)
OK, Markus, that's what I've done. I noticed this in several other cocoa/widgets files, so I figured it was probably OK.
I'll wait for you or Stephen to review my change, then reland my patch.
Comment 48 • 2 years ago
Comment 49 • 2 years ago (bugherder)
Comment 50 • 2 years ago
The patch landed in nightly and beta is affected.
:smichaud, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox114 to wontfix.
For more information, please visit BugBot documentation.
Comment 51 • 2 years ago (Reporter)
Once again, I think it'd be worthwhile to know fairly soon whether or not my second patch works. So I'll be nominating it for beta approval.
Comment 52 • 2 years ago (Reporter)
Comment on attachment 9331646 [details]
Bug 1801419 - Explicitly load and initialize SidecarCore framework.
Beta/Release Uplift Approval Request
- User impact if declined: It will take longer to find out whether or not my patch works. Beta has many more users than Nightly.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The risk is low. At worst, if the patch doesn't work, the crashes will continue as before.
- String changes made/needed:
- Is Android affected?: No
Comment 53 • 2 years ago (Reporter)
One crash has happened on a build with my second patch, which matches one of this bug's signatures:
bp-80f87013-5237-4bf1-a7d0-34f7f0230519
But it's not a SidecarCore crash, and so isn't relevant to this bug.
Edit: I took another look at this, and I was wrong -- it is relevant. The function from my second patch (PinSidecarCoreTextCStringSections()) is in the crash stack. So this bug's crashes may just end up being shifted from one place to another. I'll keep an eye on this.
Comment 54 • 2 years ago
Comment on attachment 9331646 [details]
Bug 1801419 - Explicitly load and initialize SidecarCore framework.
Approved for 114.0b7.
Comment 55 • 2 years ago (bugherder uplift)
Comment 56 • 2 years ago (Reporter)
Things aren't looking good :-(
There are now five crash reports with PinSidecarCoreTextCStringSections() in their proto signatures -- including three that I missed because of problems with the symbol server upload for macOS 13.4 (which came out a few days ago).
Looks like another failed patch. But it'll be at least a week before I know for sure.
Edit: By comparing its stacks with my hook library's output, it seems that the libsystem_kernel.dylib@0x2972 signature translates to mig_strncpy_zerofill.
Comment 57 • 2 years ago (Reporter)
Looks like another failed patch. But it'll be at least a week before I know for sure.
It hasn't been a week yet. But there's one thing I've already noticed: With no patch, and with my first patch, crashes with the [@ +[NSString stringWithCString:encoding:] ] signature are overwhelmingly the most common. With my second patch their proportion is far smaller -- currently just one out of ten. That could mean that my second patch is going to eliminate a lot of this bug's crashes.
Comment 58 • 2 years ago (Reporter)
As best I can tell my second patch hasn't made any difference in the frequency of this bug's crashes. It hasn't made them more frequent, but it hasn't made them less so, either.
Here are the stats from my search in comment #56 over the last month. Where the signatures aren't symbolicated, I've folded them in with those that are. The crashes [@ +[NSString stringWithCString:encoding:] ] are once again overwhelmingly most common.
[@ +[NSString stringWithCString:encoding:] ] 96
[@ mig_strncpy_zerofill ] 17
[@ _mapStrHash ] 13
[@ prepareMethodLists ] 1
Though my second patch has failed, I'm inclined to leave it in place, just to be able to watch the statistics going forward. Without more information there's nothing more I can do. But at some point Socorro will start collecting (and displaying) kernel boot args on macOS. That may be just the information we need.
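Folding non-symbolicated signatures into their symbolicated equivalents before counting, as done for the stats in this comment, can be sketched in a few lines of Python. The only alias used is the one identified in comment #56 (libsystem_kernel.dylib@0x2972 translating to mig_strncpy_zerofill). The sample list is made up and is not real crash data.

```python
from collections import Counter

# Non-symbolicated signature -> symbolicated signature (from comment #56).
SIGNATURE_ALIASES = {"libsystem_kernel.dylib@0x2972": "mig_strncpy_zerofill"}


def fold_signatures(signatures, aliases=SIGNATURE_ALIASES):
    """Count crashes per signature after mapping each alias to its symbolicated name."""
    return Counter(aliases.get(signature, signature) for signature in signatures)


# Made-up sample, only to show the folding.
sample = [
    "mig_strncpy_zerofill",
    "libsystem_kernel.dylib@0x2972",
    "_mapStrHash",
    "libsystem_kernel.dylib@0x2972",
]
for signature, count in fold_signatures(sample).most_common():
    print(f"[@ {signature} ] {count}")
```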
Comment 59 • 2 years ago (Reporter)
This isn't really a regression. The same crashes would have happened without my patch, but in a different place in the Mozilla code. Also, the crashes [@ libsystem_kernel.dylib@0x2972 ] are the same as those [@ mig_strncpy_zerofill ]. The non-symbolicated signature is for a macOS 13.5.0 beta. I only scrape symbols for release versions and betas of the next major release (currently macOS 14).
Here's a search, copied from bug 1801419 comment #56 and updated, which captures all that bug's crashes for the last month (in builds with my second patch), including those with non-symbolicated signatures:
And here are these stats with the non-symbolicated signatures folded in with the symbolicated ones. Compare them to those from bug 1801419 comment #58.
[@ +[NSString stringWithCString:encoding:] ] 200
[@ mig_strncpy_zerofill ] 39
[@ _mapStrHash ] 12
[@ prepareMethodLists ] 7
They do seem higher, but that has nothing to do with this bug report.
Comment 61 • 2 years ago (Reporter)
Crashes with PinSidecarCoreTextCStringSections() on the stack have started showing up on the macOS 14 betas. I don't know if they're related, though -- they're mostly assertion failures, which muddies up their crash stacks.
Comment 62 • 2 years ago (Reporter)
Gabriele, when you get the chance, please look through the crash reports for this bug and find out which mac_boot_args (and their values) are present in every report (or almost every report).
Comment 63 • 2 years ago
Almost half of the crashes here have the following boot args: keepsyms=1 debug=0x100 ipc_control_port_options=0 -nokcmismatchpanic. Those flags always appear together. The rest have none (or none were reported in the crash).
Comment 64 • 2 years ago (Reporter)
(In reply to Gabriele Svelto [:gsvelto] from comment #63)
Almost half of the crashes here have the following boot args: keepsyms=1 debug=0x100 ipc_control_port_options=0 -nokcmismatchpanic. Those flags always appear together. The rest have none (or none were reported in the crash).
These are standard settings for OCLP. So this bug's crashes probably aren't triggered by any mac boot arg, or combination thereof.
It's puzzling, though, that some crashes happen without any mac boot args. That could mean that this bug's crashes don't always involve OCLP.
Comment 65 • 2 years ago (Reporter)
I'm giving up here. My last hope was that the mac boot args in this bug's crash reports would indicate their cause. That didn't pan out.
Mozilla can either keep my (failed) patch or back it out. Either way the crash frequency should stay the same. The one advantage of keeping my patch is that it makes it easier to search for all this bug's crashes:
Comment 66 • 2 years ago
(In reply to Steven Michaud [:smichaud] (Retired) from comment #64)
It's puzzling, though, that some crashes happen without any mac boot args. That could mean that this bug's crashes don't always involve OCLP.
There's also the possibility that we're failing to capture the boot args, I don't think we have a way to tell apart instances where the args are empty and instances where we couldn't read them.
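The distinction raised here, between boot args that are empty and boot args that couldn't be read, can be made explicit when classifying reports. The Python sketch below parses a boot-args string and compares it with the set from comment #63, which comment #64 describes as standard for OCLP. Representing "could not be read" as None is an assumption for illustration; it isn't how any existing crash report field is known to behave.

```python
from typing import Optional

# Boot args reported in comment #63; per comment #64 they are standard for OCLP.
OCLP_BOOT_ARGS = "keepsyms=1 debug=0x100 ipc_control_port_options=0 -nokcmismatchpanic"


def parse_boot_args(raw: str) -> dict:
    """Split a boot-args string into {name: value}; bare flags map to None."""
    parsed = {}
    for token in raw.split():
        name, has_value, value = token.partition("=")
        parsed[name] = value if has_value else None
    return parsed


def classify_boot_args(raw: Optional[str]) -> str:
    """Return 'not captured', 'oclp' or 'other' for one crash report."""
    if raw is None:  # assumption: None means the boot args could not be read
        return "not captured"
    found = parse_boot_args(raw)
    expected = parse_boot_args(OCLP_BOOT_ARGS)
    if all(name in found and found[name] == value for name, value in expected.items()):
        return "oclp"
    return "other"  # includes an empty string that was read successfully


for raw in (OCLP_BOOT_ARGS, "", None):
    print(repr(raw), "->", classify_boot_args(raw))
```

Keeping "not captured" separate from "other" is the point of the sketch: reports in the first group say nothing either way about whether OCLP is involved.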
Comment 67 • 2 years ago (Reporter)
(In reply to Gabriele Svelto [:gsvelto] from comment #66)
(In reply to Steven Michaud [:smichaud] (Retired) from comment #64)
It's puzzling, though, that some crashes happen without any mac boot args. That could mean that this bug's crashes don't always involve OCLP.
There's also the possibility that we're failing to capture the boot args, I don't think we have a way to tell apart instances where the args are empty and instances where we couldn't read them.
It's odd that we'd fail 50% of the time to capture the boot args. But if that's what's happening, your patch for bug 1878428 could fix it.
Comment 68 • 1 year ago (Reporter)
There are now a fair number of Sidecar crashes (all on macOS 14) that bypass the heart of my (failed) patch -- that don't have PinSidecarCoreTextCStringSections() on their stacks. I'm not entirely sure what this means, but I suspect these crashes don't involve OCLP. That might explain the "missing" boot args in Gabriele's comment #63.
Most of these "new" crashes have the signature [@ getMethodFromListArray<T> ]:
There are also quite a few with the signature [@ prepareMethodLists].