Symbolication not working for OS symbols in crash reports for macOS 10.16/11 (Big Sur)
Categories
(Toolkit :: Crash Reporting, defect)
People
(Reporter: smichaud, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
The symbol server now has OS symbols for Big Sur Beta 5 (build 20A5354i). I scraped them, and Marco Castelluccio uploaded them to the symbol server yesterday (Thursday 2020-08-27) around 17:00 UTC. But these symbols still aren't appearing in Beta 5 crash stacks:
The symbols really did make it to the symbol server. I checked by re-running the gathersymbols.py script from https://github.com/marco-c/breakpad-scrape-system-symbols. So the problem must have some other cause.
I'll be looking into this.
Reporter
Comment 1•4 years ago
Digging a little deeper, I noticed a few recent crash stacks that are properly symbolicated (on the main thread):
bp-86468c8b-958d-42f5-9943-079430200828
bp-cfeef208-e932-4b97-b581-e4dbd0200828
bp-47159860-2ce0-4f0d-916c-298bf0200828
And others that aren't (on secondary threads):
bp-5f621954-470a-4554-a331-dff9a0200828
bp-3b989639-2417-4411-a25a-50c7d0200828
Plus one where some OS symbols are symbolicated and others aren't (on a secondary thread):
bp-9d348b8a-6253-48c6-b066-1738a0200828
The plot thickens :-)
Reporter
Comment 2•4 years ago
I've figured this bug out. It's caused by a design flaw in macOS Big Sur. The problem is that many system files are no longer located in the OS's file system. Instead they're lumped together in a dynamic linker cache, where dump_syms can't find them. Others have already run into this:
https://github.com/vispy/vispy/issues/1885
So we're going to have to rewrite dump_syms :-(
I'll leave this bug open. Then, if nobody else beats me to it, at some point I'll do the required rewrite of dump_syms, either here or in another bug.
Reporter
Comment 3•4 years ago
I found the following document explaining how to extract binaries from the dynamic linker cache (which seems to be the same thing as the "dyld shared cache"):
https://lapcatsoftware.com/articles/bigsur.html
I haven't tried it yet. But as the article points out, this will be necessary for debugging problems in those binaries.
Reporter
Comment 4•4 years ago
(Following up comment #2)
> So we're going to have to rewrite dump_syms :-(
And as the Vispy issue points out, Python will also need a fix. When the fix comes, it won't be backported to Python 2.7 (which is EOL). So the gathersymbols.py script from https://github.com/marco-c/breakpad-scrape-system-symbols will also need to be rewritten to accommodate current Python versions. With luck this will just require fixing a bunch of what have become syntax errors. But in any case a fair amount of work will be required.
Reporter
Comment 5•4 years ago
Apparently the find_libraries problem has already been fixed in Python trunk:
https://bugs.python.org/issue41179
https://github.com/python/cpython/pull/21250#issuecomment-652441170
https://bugs.python.org/pull_request20400
https://github.com/python/cpython/pull/21250
Reporter
Comment 6•4 years ago
> Apparently the find_libraries problem has already been fixed in Python trunk:
Actually I'm no longer sure. Those links contain appalling levels of complexity and ambiguity. But the fixes they describe seem to have made it into Python 3.8.5. And full support for Big Sur is expected in Python 3.8.x or 3.9.0.
From https://docs.python.org/release/3.8.5/whatsnew/changelog.html:
"bpo-41100: Fix configure error when building on macOS 11. Note that the current Python release was released shortly after the first developer preview of macOS 11 (Big Sur); there are other known issues with building and running on the developer preview. Big Sur is expected to be fully supported in a future bugfix release of Python 3.8.x and with 3.9.0."
Reporter
Comment 7•4 years ago
Here's a patch to dyld-750.5 that I used to build dyld_shared_cache_util.
I started out with Jeff Johnson's instructions from comment #3, but I ended up using a different strategy. When he encountered a missing header file, he edited the code to no longer need it. Instead I found copies of almost all the missing header files in different distros on https://opensource.apple.com/.
The strategy I used was very simple. For every missing header file I performed the following search:
"missing_file.h" site:opensource.apple.com
Then I downloaded a copy of the most recent distro I could find that contained the missing header file.
I found _simple.h in Libc-583, libc_private.h in Libc-1272.250.1, and System/machine/cpu_capabilities.h and corecrypto/ccdigest.h and friends in xnu-6153.121.1. The only one I didn't find (not too surprisingly) was sandbox/private.h. But I was able to reconstruct the necessary defines by searching around on the web.
Since this patch contains all these "extra" header files, it's quite large. But it's still only sufficient to make dyld_shared_cache_util build successfully. I tried it on both macOS 10.15.6 and macOS 11 Beta 5. Build it by running the following command:
xcodebuild -target dyld_shared_cache_util
Edit: These instructions no longer work on Xcode 13 and up -- Xcode has gotten pickier. I've now updated my patch for the current dyld source distro. See comment #24 and comment #25 below.
Reporter
Comment 8•4 years ago
I forgot to mention that I had to tweak some of the "new" header files, to get them working in their new locations. But I didn't make any major changes.
To get around the problem that Xcode doesn't recognize bridgeos, I added the following define to both dyld.h and dyld_priv.h:
#define bridgeos watchos
This will probably make dyld_shared_cache_util not work properly on watchos. But I decided not to worry about that :-)
Comment 9•4 years ago
> So we're going to have to rewrite dump_syms :-(
I don't really understand what the link with dump_syms is here. dump_syms is just a tool to convert the info we have in lib+dbg files into Breakpad format.
Reporter
Comment 10•4 years ago
dump_syms is used by gathersymbols.py (from https://github.com/marco-c/breakpad-scrape-system-symbols), which is the only tool Mozilla has for the manual scraping of symbols. To work in that context on Big Sur, it will need to be able to pull a system library out of the dyld shared cache when it's not present in the file system. It can't do that at present. (I tested with both the dump_syms from Google Breakpad and the Rust one that's currently being used by the Mozilla build infrastructure.)
Manual scraping is important. Mozilla does have a system for automated scraping of symbols from Apple updates (minor updates), as they come out. But it misses a lot of symbols, so manual scraping is still needed. For the last year or so, I've been the one doing the scraping, each time Apple releases a new minor update for macOS 10.13, 10.14 or 10.15. I've been sending the symbols I scrape to Marco Castelluccio, who's been uploading them to the symbol server. Without this, the quality of crash stacks on https://crash-stats.mozilla.org/ would have been much worse.
I do understand that the problems discussed in this bug will take a while to fix properly. If nothing else, we need to wait until Python fully supports Big Sur. So in the meantime I've found a way to use the dyld_shared_cache_util utility to extract binaries from the dyld shared cache into directories that our current tools (gathersymbols.py, Python 2.7 and dump_syms) can scrape manually. I've sent a bunch of symbols to Marco, which he should be uploading soon to the symbol server.
Reporter
Comment 11•4 years ago
> I've sent a bunch of symbols to Marco, which he should be uploading soon to the symbol server.
Marco did this, and I double-checked that the symbols really are on the symbol server. But most Big Sur system binaries still aren't getting symbolicated at https://crash-stats.mozilla.org/ :-(
My guess is that it has something to do with GUIDs. But it will take a fair amount of digging to be sure.
I don't know how long this will take me.
Reporter
Comment 12•4 years ago
> My guess is that it has something to do with GUIDs.
I was right. When I use minidump_stackwalk -m on a minidump generated (from a Firefox crash) on Big Sur, all the binaries from the dyld shared cache have "null" GUIDs. So there's a bug in the Breakpad code that generates minidumps. When I find it, I'll open a new bug report on it.
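The check described in comment 12 can be sketched in Python. This is a minimal sketch, not the procedure actually used: it assumes the machine-readable output of minidump_stackwalk -m contains pipe-separated lines of the form Module|name|version|debug_file|debug_id|base|max|main, and that a "null" identifier shows up as an empty field or a run of zeros. The function name and the sample text are hypothetical, made up for illustration.

```python
def modules_with_null_ids(stackwalk_output):
    """Return names of modules whose debug identifier is empty or all zeros."""
    flagged = []
    for line in stackwalk_output.splitlines():
        fields = line.strip().split("|")
        # Assumed layout: Module|name|version|debug_file|debug_id|base|max|main
        if len(fields) < 5 or fields[0] != "Module":
            continue
        name, debug_id = fields[1], fields[4]
        # An empty string or a string of only "0" characters counts as null.
        if debug_id.strip("0") == "":
            flagged.append(name)
    return flagged


# Made-up sample in the assumed format; not taken from a real crash report.
SAMPLE = """\
OS|Mac OS X|11.0.0 20A5354i
Module|firefox|80.0.0.0|firefox|ABCDEF0123456789ABCDEF01234567890|0x100000000|0x100003fff|1
Module|libsystem_kernel.dylib||libsystem_kernel.dylib|000000000000000000000000000000000|0x7fff20300000|0x7fff2032ffff|0
"""

if __name__ == "__main__":
    print(modules_with_null_ids(SAMPLE))
```

A module flagged this way cannot be matched against the symbol server, since the lookup is keyed on the debug identifier.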
Reporter
Comment 13•4 years ago
I've opened bug 1662862.
Reporter
Comment 14•4 years ago
(Following up comment #12)
Actually I should have called them "UUIDs".
Comment 15•4 years ago
It seems that bug 1672505 may have played a part in this.
Comment 16•4 years ago
Steven, I intend to work on this in the immediate future. I will first switch the system-symbols-mac task to use the new dump_syms tool (and a new Debian image so we have modern tools). After that I'll update the script with the changes you suggested in comment 10. Can you share the code you used to pull the libraries out of the cache? Snippets are sufficient, just so that I get an idea of what needs to be changed.
And thanks for having done this work manually the last few months!
Reporter
Comment 17•4 years ago
I didn't write any code myself. Instead I built Apple's dyld_shared_cache_util, using Apple's source code and my patch from comment 7, and used that. That source code should, I hope, provide all the information you need. Note that since then, Apple has released a slightly more recent version. The only other clue I have is that Apple says dlopen() works on files that are in the dyld shared cache but not in the file system. I don't know whether there's a way to convert a dlopen() handle to a file descriptor.
> And thanks for having done this work manually the last few months!
You're most welcome! I'll keep doing my manual scraping as long as it's needed. Even on macOS 10.15 through 10.13, the automated system still doesn't find all the symbols it should.
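The dlopen() observation in comment 17 can be probed with a short Python sketch. It relies on ctypes.CDLL, which calls dlopen() on POSIX systems; the helper name probe_library and the two candidate paths are illustrative assumptions, not something taken from this bug. On a system where a library lives only in the dyld shared cache, one would expect the first value to be False and the second True, but this sketch makes no claim about any particular machine.

```python
import ctypes
import os


def probe_library(path):
    """Return (on_disk, loadable) for a library path.

    on_disk  -- whether the path exists in the file system
    loadable -- whether dlopen() (via ctypes.CDLL) accepts the path
    """
    on_disk = os.path.exists(path)
    try:
        ctypes.CDLL(path)  # dlopen() under the hood on POSIX systems
        loadable = True
    except OSError:
        loadable = False
    return on_disk, loadable


if __name__ == "__main__":
    # Illustrative paths only; adjust for the system being examined.
    for candidate in ("/usr/lib/libSystem.B.dylib", "/usr/lib/libobjc.A.dylib"):
        print(candidate, probe_library(candidate))
```

This only shows that a library can be loaded. It does not give dump_syms a file to read, which is the open question raised above.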
Reporter
Comment 18•4 years ago
Reporter
Comment 19•4 years ago
Gabriele, I now realize that I might have misunderstood your question. I took you to be asking how you'd rewrite dump_syms to process files that are in the dyld shared cache but not in the file system. But maybe instead you're asking how I've been using dyld_shared_cache_util in the interim, before dump_syms and gathersymbols.py have been updated to deal with macOS 11's design flaw.
That's very simple. I do the following by hand (I haven't bothered to write a script):
- Create a working directory (I call it symbols).
- Run gathersymbols.py as usual to scrape symbols from the libraries and frameworks that are still in the file system. You must use Python 2.7 for this.
  OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms
- In the working directory create directories named arm64e, x86_64 and x86_64h, then run the following commands:
  dyld_shared_cache_util -extract arm64e /System/Library/dyld/dyld_shared_cache_arm64e
  dyld_shared_cache_util -extract x86_64 /System/Library/dyld/dyld_shared_cache_x86_64
  dyld_shared_cache_util -extract x86_64h /System/Library/dyld/dyld_shared_cache_x86_64h
- Run gathersymbols.py on each of these directories as follows:
  OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms arm64e
  OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms x86_64
  OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms x86_64h
Comment 20•4 years ago
Thanks Steven, that's exactly what I needed! I'm working on making the system-symbols-mac task work again but for now I don't want to change the tools it uses, to minimize risk. If I can adapt it to scrape symbols using dyld_shared_cache_util then it's a very good stop-gap solution while we work on a proper fix.
I usually go for the proper fix straight away but ATM we're overworked and understaffed as well as being under pressure to release Firefox for macOS/AArch64, so we need this working ASAP. I just can't afford the luxury of sitting on it while I rewrite stuff :-(
Comment 21•4 years ago
More people running into this issue
Reporter
Comment 23•2 years ago
The instructions from comment #19 work fine on macOS 11 (Big Sur) and 12 (Monterey). But Apple made significant changes in macOS 13 (Ventura), of which the first beta was just released. Most notably, the location of the dyld_shared_cache files has changed, and you have to look for them on both Apple Silicon and Intel hardware. Only Intel machines have the x86_64h cache, and only Apple Silicon machines have the arm64e and x86_64 caches. More hoops to jump through :-(
New instructions for manual symbol scraping on macOS 13 (Ventura):
- Create a working directory (I call it symbols).
- Run gathersymbols.py as usual to scrape symbols from the libraries and frameworks that are still in the file system. You must use Python 2.7 for this.
  OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms
- On an Apple Silicon machine running macOS 13:
  dyld_shared_cache_util -extract arm64e /System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_arm64e
  dyld_shared_cache_util -extract x86_64 /System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_x86_64
- On an Intel machine running macOS 13:
  dyld_shared_cache_util -extract x86_64h /System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_x86_64h
- Run gathersymbols.py on each of these directories as follows:
  OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms arm64e
  OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms x86_64
  OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms x86_64h
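The difference between the macOS 11/12 and macOS 13 layouts can be summarized in a small lookup. This is a sketch built only from the paths and architecture split given in comment #19 and comment #23; the function name cache_files is hypothetical. For macOS 11 and 12 it simply returns all three caches named in comment #19 without checking that each file exists on a given machine.

```python
import posixpath

# Directory used in comment #19 (macOS 11 and 12).
LEGACY_CACHE_DIR = "/System/Library/dyld"
# Directory used in comment #23 (macOS 13).
CRYPTEX_CACHE_DIR = "/System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld"


def cache_files(macos_major, apple_silicon):
    """Map architecture name to the dyld shared cache path to extract."""
    if macos_major >= 13:
        # Per comment #23: arm64e and x86_64 on Apple Silicon, x86_64h on Intel.
        arches = ("arm64e", "x86_64") if apple_silicon else ("x86_64h",)
        base = CRYPTEX_CACHE_DIR
    else:
        # Per comment #19: all three caches in the older location.
        arches = ("arm64e", "x86_64", "x86_64h")
        base = LEGACY_CACHE_DIR
    return {arch: posixpath.join(base, "dyld_shared_cache_" + arch)
            for arch in arches}


if __name__ == "__main__":
    for major, silicon in ((12, False), (13, True), (13, False)):
        print(major, "Apple Silicon" if silicon else "Intel")
        for arch, path in cache_files(major, silicon).items():
            print("  dyld_shared_cache_util -extract", arch, path)
```

Whether later macOS releases keep the macOS 13 layout is not covered here; the `>= 13` branch is a guess in that respect.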
Reporter
Comment 24•2 years ago
(Following up comment #7)
I've created a new patch, for the (currently) latest dyld source distro (dyld-1042.1). I used the same strategy as outlined in comment #7.
I updated the "missing" headers to the newest versions I could find. Once again, I failed to find sandbox/private.h. But I reconstructed its defines exactly as I did the first time. I also couldn't find VersionMap.h, but I was able to figure out how to generate it:
Download AvailabilityVersions, then run make install_dyld_headers. VersionMap.h will be created somewhere under dst.
As previously, this patch only allows you to build dyld_shared_cache_util, as follows:
xcodebuild -target dyld_shared_cache_util
I've tested this on macOS 12.6.1 using Xcode 13 and 14, and on macOS 13 using Xcode 14.
Comment 25•2 years ago
Here's a much simpler way to get a working dyld_shared_cache_util: https://gist.github.com/mstange/7a642437b67ab7d8b0c68979ccd23a36
Compiling all of dyld doesn't make a big difference because it ends up calling into /usr/lib/dsc_extractor.bundle anyway, rather than using the code you compiled.
Reporter
Comment 26•2 years ago
I'll take a deep breath and look at it tomorrow :-)
Edit: Actually, the dyld_shared_cache_util I compiled doesn't call dyld_shared_cache_extract_dylibs_progress() in /usr/lib/dsc_extractor.bundle. I found this out using a HookCase hook library, and by looking at the machine code of what I compiled. But it does call its own internal dyld_shared_cache_extract_dylibs_progress(), which as best I can tell has exactly the same functionality. So yes, your code does provide a much simpler way to implement dyld_shared_cache_util.
Edit: Interestingly, my previous build of dyld_shared_cache_util (based on dyld-750.5) does call dyld_shared_cache_extract_dylibs_progress() in /usr/lib/dsc_extractor.bundle. Now I remember talking to you a year ago about how this saved it from failing after the macOS 12 upgrade changed the structure of the dyld cache files -- it called dyld_shared_cache_extract_dylibs_progress() in an up-to-date /usr/lib/dsc_extractor.bundle. So your method is not only simpler, it's also safer.
Comment 27•2 years ago
(In reply to Steven Michaud [:smichaud] (Retired) from comment #26)
> Edit: Actually, the dyld_shared_cache_util I compiled doesn't call dyld_shared_cache_extract_dylibs_progress() in /usr/lib/dsc_extractor.bundle.
> Edit: Interestingly, my previous build of dyld_shared_cache_util (based on dyld-750.5) does call dyld_shared_cache_extract_dylibs_progress() in /usr/lib/dsc_extractor.bundle
Oh, I see, that's interesting that they changed that.
About this bug: Do we have a plan for automating this? I'm guessing we still want dump_syms to be able to dump the shared cache itself, so that we can run it on Linux, right? I guess that's still just https://github.com/mozilla/dump_syms/issues/242 .
Comment 28•2 years ago
(In reply to Markus Stange [:mstange] from comment #27)
> About this bug: Do we have a plan for automating this? I'm guessing we still want dump_syms to be able to dump the shared cache itself, so that we can run it on Linux, right? I guess that's still just https://github.com/mozilla/dump_syms/issues/242 .
I don't think we need to. I've picked apart recent installers and we can extract all the libraries before they're put inside the shared cache. The only stumbling blocks are that we need some tweaks to the dmg command to unpack the packages for macOS 11+ and we need an unarchiver for Apple's YAA format. The latter should be fairly straightforward but the former might require more work. I found a collection of patches that work but they also introduce a lot of regressions so more work is needed. In the meantime all versions up to 10.15 are now being re-scraped automatically.
Comment 29•1 year ago
Here's a short summary of what needs to be done to move this forward:
- We need an expander for Apple Archive format (formerly YAA). The format is extremely simple (see here) and is basically just a not-invented-here variation of ar
- Once we have it we need to wire it up with the PackageSymbolDumper.py script. The script already knows how to recursively extract Apple updates but currently stops when we stumble upon YAA archives
- Once this is also wired up we need to verify that the version of dmg that we use in the task is capable of expanding dmg for macOS versions 11+ or replace it with something that can (like maybe dmgwiz though I haven't tested it yet)
- Last but not least we need to uncomment the lines with the macOS 11+ repositories in reposado. We also need to add new entries for macOS 14
Once we have everything in place we should be able to follow this chain to all the system libraries in Apple's update packages:
- Find the packages on Apple servers & download them
- Extract the Payload files out of their .pkg files (aka flat-packages aka xar)
- Unpack the Payload files (these are PBZX-compressed cpio files)
- Dig out the .zip files they contain with the actual archives
- Unpack the Payloads (which are split across multiple payload.xyz files but they're still PBZX-compressed cpio)
- Unpack the YAA archives inside, these contain the actual binaries and libraries
- Look for all the binaries and libraries and dump them with dump_syms
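A recursive unpacker following the chain above has to recognize which container it is looking at before choosing how to unpack it. The sketch below does that by sniffing leading magic bytes. It is an illustration only, not code from PackageSymbolDumper.py: the function names are hypothetical, the xar, zip and cpio signatures are well known, and the pbzx, YAA and Apple Archive signatures are assumptions based on community reverse engineering rather than an Apple specification.

```python
# Leading bytes for each container type in the chain. The pbzx, YAA and
# Apple Archive values are assumptions (reverse engineered, not from a spec).
MAGICS = (
    (b"xar!", "xar"),             # flat package (.pkg)
    (b"pbzx", "pbzx"),            # Payload wrapper around a cpio stream
    (b"PK\x03\x04", "zip"),
    (b"YAA1", "yaa"),             # assumed signature of the older YAA format
    (b"AA01", "apple-archive"),   # assumed signature of Apple Archive
    (b"070707", "cpio"),          # portable ASCII cpio header
    (b"070701", "cpio"),          # "new" ASCII cpio header
)


def classify(header):
    """Return a container name for the given leading bytes, or None."""
    for magic, kind in MAGICS:
        if header.startswith(magic):
            return kind
    return None


def classify_file(path):
    """Read the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return classify(handle.read(8))


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, classify_file(name))
```

Anything that comes back as None would be a candidate binary or library to hand to dump_syms, or simply a file to skip.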
Comment 30•1 year ago
BTW, as I was looking at this stuff I noticed that former mozillian Gregory Szorc wrote a bunch of packages to deal with these archives so we might use them instead, even though we already have almost everything available in Python (save for YAA).