Open Bug 1661771 Opened 4 years ago Updated 6 months ago

Symbolication not working for OS symbols in crash reports for macOS 10.16/11 (Big Sur)

Categories: Toolkit :: Crash Reporting, defect
Hardware: All
OS: macOS
People: Reporter: smichaud, Unassigned
References: Blocks 1 open bug
Attachments: 2 files

The symbol server now has OS symbols for Big Sur Beta 5 (build 20A5354i). I scraped them, and Marco Castelluccio uploaded them to the symbol server yesterday (Thursday 2020-08-27) around 17:00 UTC. But these symbols still aren't appearing in Beta 5 crash stacks:

https://crash-stats.mozilla.org/search/?platform_version=~20A5354i&date=%3E%3D2020-08-27T17%3A35%3A00.000Z&date=%3C2020-08-28T17%3A35%3A00.000Z&_facets=signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

The symbols really did make it to the symbol server. I checked by re-running the gathersymbols.py script from https://github.com/marco-c/breakpad-scrape-system-symbols. So the problem must have some other cause.

I'll be looking into this.

Digging a little deeper, I noticed a few recent crash stacks that are properly symbolicated (on the main thread):

bp-86468c8b-958d-42f5-9943-079430200828
bp-cfeef208-e932-4b97-b581-e4dbd0200828
bp-47159860-2ce0-4f0d-916c-298bf0200828

And others that aren't (on secondary threads):

bp-5f621954-470a-4554-a331-dff9a0200828
bp-3b989639-2417-4411-a25a-50c7d0200828

Plus one where some OS symbols are symbolicated and others aren't (on a secondary thread):

bp-9d348b8a-6253-48c6-b066-1738a0200828

The plot thickens :-)

I've figured this bug out. It's caused by a design flaw in macOS Big Sur. The problem is that many system files are no longer located in the OS's file system. Instead they're lumped together in a dynamic linker cache, where dump_syms can't find them. Others have already run into this:

https://github.com/vispy/vispy/issues/1885

So we're going to have to rewrite dump_syms :-(

I'll leave this bug open. Then, if nobody else beats me to it, at some point I'll do the required rewrite of dump_syms, either here or in another bug.

Blocks: 1654845
Blocks: 1655275
Blocks: 1588740

I found the following document explaining how to extract binaries from the dynamic linker cache (which seems to be the same thing as the "dyld shared cache"):

https://lapcatsoftware.com/articles/bigsur.html

I haven't tried it yet. But as the article points out, this will be necessary for debugging problems in those binaries.

(Following up comment #2)

> So we're going to have to rewrite dump_syms :-(

And as the Vispy article points out, Python will also need a fix. When the fix comes, it won't be backported to Python 2.7 (which is EOL). So the gathersymbols.py script from https://github.com/marco-c/breakpad-scrape-system-symbols will also need to be rewritten to accommodate current Python versions. With luck this will just require fixing a bunch of what have become syntax errors. But in any case a fair amount of work will be required.

Apparently the find_library problem has already been fixed in Python trunk:

Actually I'm no longer sure. Those links contain appalling levels of complexity and ambiguity. But the fixes they describe seem to have made it into Python 3.8.5. And full support for Big Sur is expected in Python 3.8.x or 3.9.0.

From https://docs.python.org/release/3.8.5/whatsnew/changelog.html:

"bpo-41100: Fix configure error when building on macOS 11. Note that the current Python release was released shortly after the first developer preview of macOS 11 (Big Sur); there are other known issues with building and running on the developer preview. Big Sur is expected to be fully supported in a future bugfix release of Python 3.8.x and with 3.9.0."

Here's a patch to dyld-750.5 that I used to build dyld_shared_cache_util.

I started out with Jeff Johnson's instructions from comment #3, but I ended up using a different strategy. When he encountered a missing header file, he edited the code to no longer need it. Instead I found copies of almost all the missing header files in different distros on https://opensource.apple.com/.

The strategy I used was very simple. For every missing header file I performed the following search:

    "missing_file.h" site:opensource.apple.com

Then I downloaded a copy of the most recent distro I could find that contained the missing header file.

I found _simple.h in Libc-583, libc_private.h in Libc-1272.250.1, and System/machine/cpu_capabilities.h and corecrypto/ccdigest.h and friends in xnu-6153.121.1. The only one I didn't find (not too surprisingly) was sandbox/private.h. But I was able to reconstruct the necessary defines by searching around on the web.

Since this patch contains all these "extra" header files, it's quite large. But it's still only sufficient to make dyld_shared_cache_util build successfully. I tried it on both macOS 10.15.6 and macOS 11 Beta 5. Build it by running the following command on the command line:

    xcodebuild -target dyld_shared_cache_util

Edit: These instructions no longer work on Xcode 13 and up -- Xcode has gotten pickier. I've now updated my patch for the current dyld source distro. See comment #24 and comment #25 below.

I forgot to mention that I had to tweak some of the "new" header files, to get them working in their new locations. But I didn't make any major changes.

To get around the problem that Xcode doesn't recognize bridgeos, I added the following define to both dyld.h and dyld_priv.h:

    #define bridgeos watchos

This will probably make dyld_shared_cache_util not work properly on watchos. But I decided not to worry about that :-)

> So we're going to have to rewrite dump_syms :-(

I don't really understand what the link with dump_syms is here.
dump_syms is just a tool that converts the info we have in lib+dbg files into Breakpad format.

dump_syms is used by gathersymbols.py (from https://github.com/marco-c/breakpad-scrape-system-symbols), which is the only tool Mozilla has for the manual scraping of symbols. To work in that context on Big Sur, it will need to be able to pull a system library out of the dyld shared cache when it's not present in the file system. It can't do that at present. (I tested with both the dump_syms from Google Breakpad and the Rust one that's currently being used by the Mozilla build infrastructure.)

Manual scraping is important. Mozilla does have a system for automated scraping of symbols from Apple updates (minor updates), as they come out. But it misses a lot of symbols, so manual scraping is still needed. For the last year or so, I've been the one doing the scraping, each time Apple releases a new minor update for macOS 10.13, 10.14 or 10.15. I've been sending the symbols I scrape to Marco Castelluccio, who's been uploading them to the symbol server. Without this, the quality of crash stacks on https://crash-stats.mozilla.org/ would have been much worse.

I do understand that the problems discussed in this bug will take a while to fix properly. If nothing else, we need to wait until Python fully supports Big Sur. So in the meantime I've found a way to use the dyld_shared_cache_util utility to extract binaries from the dyld shared cache into directories that our current tools (gathersymbols.py, Python 2.7 and dump_syms) can scrape manually. I've sent a bunch of symbols to Marco, which he should be uploading soon to the symbol server.

> I've sent a bunch of symbols to Marco, which he should be uploading soon to the symbol server.

Marco did this, and I double-checked that the symbols really are on the symbol server. But most Big Sur system binaries still aren't getting symbolicated at https://crash-stats.mozilla.org/ :-(

My guess is that it has something to do with GUIDs. But it will take a fair amount of digging to be sure.

I don't know how long this will take me.

> My guess is that it has something to do with GUIDs.

I was right. When I use minidump_stackwalk -m on a minidump generated (from a Firefox crash) on Big Sur, all the binaries from the dyld shared cache have "null" GUIDs. So there's a bug in the breakpad code that generates minidumps. When I find it, I'll open a new bug report on it.
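
To illustrate the check described above, here is a minimal sketch that scans minidump_stackwalk -m output for modules with a "null" identifier. It assumes the pipe-separated layout Module|name|version|debug file|debug id|base|max|main, and it assumes that a "null" GUID shows up as an empty or all-zero debug id. The function name and the sample data are invented for illustration.

```python
def modules_without_debug_id(machine_output):
    """Return the names of modules whose debug identifier is empty or all zeros."""
    suspect = []
    for line in machine_output.splitlines():
        fields = line.strip().split("|")
        # Assumed layout: Module|name|version|debug_file|debug_id|base|max|main
        if len(fields) < 5 or fields[0] != "Module":
            continue
        name, debug_id = fields[1], fields[4]
        # Stripping every "0" leaves an empty string for both "" and "000...0".
        if debug_id.strip("0") == "":
            suspect.append(name)
    return suspect


# Invented sample: one module with a real-looking id, one with an all-zero id.
SAMPLE = "\n".join([
    "OS|Mac OS X|11.0.0 20A5354i",
    "Module|XUL||XUL|4C4C44E155553144A1B2C3D4E5F607180|0x100000000|0x107ffffff|1",
    "Module|libsystem_kernel.dylib||libsystem_kernel.dylib|" + "0" * 33
    + "|0x7fff20000000|0x7fff2002ffff|0",
])

if __name__ == "__main__":
    for name in modules_without_debug_id(SAMPLE):
        print("no usable debug id:", name)
```

A module listed here cannot be matched to anything on the symbol server, however many symbols are uploaded.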

I've opened bug 1662862.

See Also: → 1662862

(Following up comment #12)

Actually I should have called them "UUIDs".

It seems that bug 1672505 may have played a part in this.

Steven, I intend to work on this in the immediate future. I will first switch the system-symbols-mac task to use the new dump_syms tool (and a new Debian image so we have modern tools). After that I'll change the script with the changes you suggested in comment 10. Can you share the code you used to pull the libraries out of the cache? Snippets are sufficient, just so that I get an idea of what needs to be changed.

And thanks for having done this work manually the last few months!

Flags: needinfo?(smichaud)

I didn't write any code myself. Instead I built Apple's dyld_shared_cache_util, using Apple's source code and my patch from comment 7, and used that. That source code should, I hope, provide all the information you need. Note that since then, Apple has released a slightly more recent version. The only other clue I have is that Apple says dlopen() works on files that are in the dyld shared cache but not in the file system. I don't know whether there's a way to convert a dlopen() handle to a file descriptor.
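
The dlopen() behavior mentioned above can be probed from Python with ctypes, whose CDLL constructor loads a library through the platform's loader. This is only a sketch: the helper name is invented, the loader parameter exists purely so the logic can be exercised without macOS, and it does nothing about the open question of turning a handle into a file descriptor.

```python
import ctypes
import os


def loadable_but_not_on_disk(path, loader=ctypes.CDLL):
    """Return True if `path` is absent from the file system yet still loads.

    On macOS 11 this is the expected signature of a library that lives only
    in the dyld shared cache. `loader` is injectable for testing.
    """
    if os.path.exists(path):
        return False  # an ordinary on-disk library, nothing special
    try:
        loader(path)
    except OSError:
        return False  # not on disk and not loadable either
    return True


if __name__ == "__main__":
    # Hypothetical candidate path; whether it is cache-only depends on the OS.
    print(loadable_but_not_on_disk("/usr/lib/libSystem.B.dylib"))
```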

> And thanks for having done this work manually the last few months!

You're most welcome! I'll keep doing my manual scraping as long as it's needed. Even on macOS 10.15 through 10.13, the automated system still doesn't find all the symbols it should.

Flags: needinfo?(smichaud)

Apple's dyld source code is browsable here and here.

Gabriele, I now realize that I might have misunderstood your question. I took you to be asking how you'd rewrite dump_syms to process files that are in the dyld shared cache but not in the file system. But maybe instead you're asking how I've been using dyld_shared_cache_util in the interim, before dump_syms and gathersymbols.py have been updated to deal with macOS 11's design flaw.

That's very simple. I do the following by hand (I haven't bothered to write a script):

  1. Create a working directory (I call it symbols).

  2. Run gathersymbols.py as usual to scrape symbols from the libraries and frameworks that are still in the file system. You must use Python 2.7 for this.

     OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms
    
  3. In the working directory create directories named arm64e, x86_64 and x86_64h, then run the following commands:

     dyld_shared_cache_util -extract arm64e /System/Library/dyld/dyld_shared_cache_arm64e
    
     dyld_shared_cache_util -extract x86_64 /System/Library/dyld/dyld_shared_cache_x86_64
    
     dyld_shared_cache_util -extract x86_64h /System/Library/dyld/dyld_shared_cache_x86_64h
    
  4. Run gathersymbols.py on each of these directories as follows:

     OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms arm64e
    
     OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms x86_64
    
     OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms x86_64h
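
The steps above were done by hand, but they could be scripted along the following lines. This is a sketch, not a tested tool: the function names are invented, the paths are the ones quoted in the steps, and "python" must resolve to Python 2.7 for gathersymbols.py even though the driver itself is written for Python 3.

```python
import os
import subprocess

GATHERSYMBOLS = ("/usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/"
                 "scrapesymbols/gathersymbols.py")
DUMP_SYMS = "/usr/local/bin/dump_syms"
CACHE_DIR = "/System/Library/dyld"  # macOS 11 location, as in step 3
ARCHES = ("arm64e", "x86_64", "x86_64h")


def scrape_commands(arches=ARCHES):
    """Return (argv, extra_env) pairs in the order of steps 2 to 4."""
    env = {"OBJC_DISABLE_INITIALIZE_FORK_SAFETY": "YES"}
    gather = ["python", GATHERSYMBOLS, "-v", DUMP_SYMS]
    commands = [(list(gather), env)]  # step 2: libraries still on disk
    for arch in arches:  # step 3: extract each shared cache
        cache = CACHE_DIR + "/dyld_shared_cache_" + arch
        commands.append((["dyld_shared_cache_util", "-extract", arch, cache], {}))
    for arch in arches:  # step 4: scrape each extracted tree
        commands.append((gather + [arch], env))
    return commands


def run_all(workdir="symbols"):
    """Step 1 plus execution: create the directories, then run every command."""
    for arch in ARCHES:
        os.makedirs(os.path.join(workdir, arch), exist_ok=True)
    for argv, extra_env in scrape_commands():
        subprocess.run(argv, cwd=workdir, env={**os.environ, **extra_env}, check=True)
```

Keeping the command list separate from run_all() makes it possible to print and review the commands before anything is executed.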
    

Thanks Steven, that's exactly what I needed! I'm working on making the system-symbols-mac task work again, but for now I don't want to change the tools it uses, to minimize risk. If I can adapt it to scrape symbols using dyld_shared_cache_util then it's a very good stop-gap solution while we work on a proper fix.

I usually go for the proper fix straight away but ATM we're overworked and understaffed as well as being under pressure to release Firefox for macOS/AArch64, so we need this working ASAP. I just can't afford the luxury of sitting on it while I rewrite stuff :-(

Blocks: 1648487
Severity: -- → S2

This depends on the changes I'm doing in bug 1709543.

Depends on: 1709543

The instructions from comment #19 work fine on macOS 11 (Big Sur) and 12 (Monterey). But Apple made significant changes in macOS 13 (Ventura), of which the first beta was just released. Most notably, the location of the dyld_shared_cache files has changed, and you have to look for them on both Apple Silicon and Intel hardware. Only Intel machines have the x86_64h cache, and only Apple Silicon machines have the arm64e and x86_64 caches. More hoops to jump through :-(

New instructions for manual symbol scraping on macOS 13 (Ventura):

  1. Create a working directory (I call it symbols).

  2. Run gathersymbols.py as usual to scrape symbols from the libraries and frameworks that are still in the file system. You must use Python 2.7 for this.

     OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms
    
  3. On an Apple Silicon machine running macOS 13:

     dyld_shared_cache_util -extract arm64e  /System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_arm64e
    
     dyld_shared_cache_util -extract x86_64  /System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_x86_64
    
  4. On an Intel machine running macOS 13:

     dyld_shared_cache_util -extract x86_64h  /System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_x86_64h
    
  5. Run gathersymbols.py on each of these directories as follows:

     OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms arm64e
    
     OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms x86_64
    
     OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python /usr/local/src/breakpad/marco-c/breakpad-scrape-system-symbols/scrapesymbols/gathersymbols.py -v /usr/local/bin/dump_syms x86_64h
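
The difference between the macOS 11/12 and macOS 13 instructions comes down to where the cache files live and which ones a given machine has. A small lookup sketch, with an invented function name, and with the assumption (not confirmed anywhere in this bug) that releases after 13 keep the Ventura layout:

```python
OLD_CACHE_DIR = "/System/Library/dyld"
NEW_CACHE_DIR = "/System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld"


def shared_cache_paths(macos_major, apple_silicon):
    """Map architecture name to dyld shared cache path for one machine.

    macos_major: 11, 12, 13, ... (earlier releases are not covered here)
    apple_silicon: True for Apple Silicon hardware, False for Intel
    """
    if macos_major < 11:
        raise ValueError("these instructions only cover macOS 11 and later")
    if macos_major >= 13:
        # Assumption: versions after 13 use the same layout as Ventura.
        base = NEW_CACHE_DIR
        arches = ("arm64e", "x86_64") if apple_silicon else ("x86_64h",)
    else:
        # macOS 11 and 12: the earlier instructions extract all three caches.
        base = OLD_CACHE_DIR
        arches = ("arm64e", "x86_64", "x86_64h")
    return {arch: base + "/dyld_shared_cache_" + arch for arch in arches}


if __name__ == "__main__":
    for arch, path in shared_cache_paths(13, apple_silicon=True).items():
        print("dyld_shared_cache_util -extract", arch, path)
```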
    
Severity: S2 → S3
OS: All → macOS
Hardware: x86_64 → All

(Following up comment #7)

I've created a new patch, for the (currently) latest dyld source distro (dyld-1042.1). I used the same strategy as outlined in comment #7.

I updated the "missing" headers to the newest versions I could find. Once again, I failed to find sandbox/private.h. But I reconstructed its defines exactly as I did the first time. I also couldn't find VersionMap.h, but I was able to figure out how to generate it:

Download AvailabilityVersions, then run make install_dyld_headers. VersionMap.h will be created somewhere under dst.

As previously, this patch only allows you to build dyld_shared_cache_util, as follows:

    xcodebuild -target dyld_shared_cache_util

I've tested this on macOS 12.6.1 using Xcode 13 and 14, and on macOS 13 using Xcode 14.

Attachment #9306123 - Attachment is patch: false

Here's a much simpler way to get a working dyld_shared_cache_util: https://gist.github.com/mstange/7a642437b67ab7d8b0c68979ccd23a36

Compiling all of dyld doesn't make a big difference because it ends up calling into /usr/lib/dsc_extractor.bundle anyway, rather than using the code you compiled.

I'll take a deep breath and look at it tomorrow :-)

Edit: Actually, the dyld_shared_cache_util I compiled doesn't call dyld_shared_cache_extract_dylibs_progress() in /usr/lib/dsc_extractor.bundle. I found this out using a HookCase hook library, and by looking at the machine code of what I compiled. But it does call its own internal dyld_shared_cache_extract_dylibs_progress(), which as best I can tell has exactly the same functionality. So yes, your code does provide a much simpler way to implement dyld_shared_cache_util.

Edit: Interestingly, my previous build of dyld_shared_cache_util (based on dyld-750.5) does call dyld_shared_cache_extract_dylibs_progress() in /usr/lib/dsc_extractor.bundle. Now I remember talking to you a year ago about how this saved it from failing after the macOS 12 upgrade changed the structure of the dyld cache files -- it called dyld_shared_cache_extract_dylibs_progress() in an up-to-date /usr/lib/dsc_extractor.bundle. So your method is not only simpler, it's also safer.

Attachment #9306123 - Attachment description: Patch to build dyld_shared_cache_util with XCode 14.1 → Patch to build dyld_shared_cache_util with XCode 13 and 14

(In reply to Steven Michaud [:smichaud] (Retired) from comment #26)

> Edit: Actually, the dyld_shared_cache_util I compiled doesn't call dyld_shared_cache_extract_dylibs_progress() in /usr/lib/dsc_extractor.bundle.
> Edit: Interestingly, my previous build of dyld_shared_cache_util (based on dyld-750.5) does call dyld_shared_cache_extract_dylibs_progress() in /usr/lib/dsc_extractor.bundle

Oh, I see, that's interesting that they changed that.

About this bug: Do we have a plan for automating this? I'm guessing we still want dump_syms to be able to dump the shared cache itself, so that we can run it on Linux, right? I guess that's still just https://github.com/mozilla/dump_syms/issues/242 .

(In reply to Markus Stange [:mstange] from comment #27)

> About this bug: Do we have a plan for automating this? I'm guessing we still want dump_syms to be able to dump the shared cache itself, so that we can run it on Linux, right? I guess that's still just https://github.com/mozilla/dump_syms/issues/242 .

I don't think we need to. I've picked apart recent installers and we can extract all the libraries before they're put inside the shared cache. The only stumbling blocks are that we need some tweaks to the dmg command to unpack the packages for macOS 11+ and we need an unarchiver for Apple's YAA format. The latter should be fairly straightforward but the former might require more work. I found a collection of patches that work but they also introduce a lot of regressions so more work is needed. In the meantime all versions up to 10.15 are now being re-scraped automatically.

Here's a short summary of what needs to be done to move this forward:

  • We need an expander for Apple Archive format (formerly YAA). The format is extremely simple (see here) and is basically just a not-invented-here variation of ar
  • Once we have it we need to wire it up with the PackageSymbolDumper.py script. The script already knows how to recursively extract Apple updates but currently stops when we stumble upon YAA archives
  • Once this is also wired up we need to verify that the version of dmg that we use in the task is capable of expanding dmg for macOS versions 11+ or replace it with something that can (like maybe dmgwiz though I haven't tested it yet)
  • Last but not least we need to uncomment the lines with the macOS 11+ repositories in reposado. We also need to add new entries for macOS 14

Once we have everything in place we should be able to follow this chain to all the system libraries in Apple's update packages:

  1. Find the packages on Apple servers & download them
  2. Extract the Payload files out of their .pkg files (aka flat-packages aka xar)
  3. Unpack the Payload files (these are PBZX-compressed cpio files)
  4. Dig out the .zip files they contain with the actual archives
  5. Unpack the Payloads (which are split across multiple payload.xyz files but they're still PBZX-compressed cpio)
  6. Unpack the YAA archives inside, these contain the actual binaries and libraries
  7. Look for all the binaries and libraries and dump them with dump_syms
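
For the last step of the chain, one way to find candidates for dump_syms in an unpacked tree is to check each file's first four bytes against the Mach-O magic numbers. This is a sketch with invented helper names, not what PackageSymbolDumper.py actually does. Note that the universal-binary magic is shared with Java class files, so a real tool would need a further check.

```python
import os

# First four bytes of Mach-O files as they appear on disk.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce",  # 32-bit Mach-O, big-endian byte order
    b"\xce\xfa\xed\xfe",  # 32-bit Mach-O, little-endian byte order
    b"\xfe\xed\xfa\xcf",  # 64-bit Mach-O, big-endian byte order
    b"\xcf\xfa\xed\xfe",  # 64-bit Mach-O, little-endian byte order
    b"\xca\xfe\xba\xbe",  # universal (fat) binary; also Java class files
    b"\xbe\xba\xfe\xca",  # universal (fat) binary, byte-swapped
}


def is_macho_header(first_bytes):
    """Return True if the given leading bytes start with a Mach-O magic."""
    return first_bytes[:4] in MACHO_MAGICS


def find_macho_files(root):
    """Yield paths under `root` whose header looks like Mach-O."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            try:
                with open(path, "rb") as handle:
                    header = handle.read(4)
            except OSError:
                continue  # unreadable file, ignore it
            if is_macho_header(header):
                yield path
```

Each path yielded could then be handed to dump_syms, one invocation per file.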

BTW, as I was looking at this stuff I noticed that former mozillian Gregory Szorc wrote a bunch of packages to deal with these archives so we might use them instead, even though we already have almost everything available in Python (save for YAA).
