Closed Bug 1662862 Opened 4 years ago Closed 4 years ago

Most system libraries have their debug ids (UUIDs) nulled out in macOS 11 (BigSur) minidumps

Categories

(Toolkit :: Crash Reporting, defect)

x86_64
macOS
defect

RESOLVED FIXED
82 Branch
Tracking Status
firefox-esr78 81+ fixed
firefox81 + fixed
firefox82 + fixed

People

(Reporter: smichaud, Assigned: smichaud)

Attachments

(2 files)

This bug is spun off from bug 1661771 comment #12.

Thanks to a design flaw in macOS 11, most system libraries have their debug ids (their UUIDs) zeroed out in minidumps created on that OS (when Firefox crashes). Because of this, none of these system libraries gets symbolicated in crash stacks on https://crash-stats.mozilla.org/.

The design flaw is that many system libraries no longer have separate copies in the filesystem. Instead they can only be found lumped together in the "dyld shared cache", where macOS has long stored copies of commonly used system binaries. (A copy of the dyld shared cache gets loaded into each process as it starts up.)

https://github.com/vispy/vispy/issues/1885
https://developer.apple.com/documentation/macos-release-notes/macos-big-sur-11-beta-release-notes

From the release notes:

"New in macOS Big Sur 11 beta, the system ships with a built-in dynamic linker cache of all system-provided libraries. As part of this change, copies of dynamic libraries are no longer present on the filesystem. Code that attempts to check for dynamic library presence by looking for a file at a path or enumerating a directory will fail. Instead, check for library presence by attempting to dlopen() the path, which will correctly check for the library in the cache. (62986286)."
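For illustration, here is a minimal sketch of the check the release notes recommend, written in Python rather than C. It relies on ctypes.CDLL, which calls dlopen() on POSIX systems. The helper name library_is_loadable is hypothetical, and note that a successful check loads the library into the calling process as a side effect.

```python
import ctypes


def library_is_loadable(path: str) -> bool:
    """Return True if the dynamic loader can open `path`.

    Hypothetical helper. On macOS 11 this is meant to succeed for
    libraries that live only in the dyld shared cache, where a plain
    filesystem existence check would report them as missing.
    """
    try:
        ctypes.CDLL(path)  # wraps dlopen() on POSIX systems
    except OSError:
        return False
    return True
```

A caller would use this in place of a file-existence test on the library's path.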

Apple seems to consider this not a bug but a feature. So third-party applications will probably just have to learn how to deal with it.

Another BigSur-specific issue described in bug 1661771 has already been partially addressed -- the problem of how to scrape symbols manually from macOS 11. But this bug's problem is a separate issue.

Here's the output of minidump_stackwalk -m on a minidump from a Firefox crash on macOS 11. Note all the modules that have a NULL debug id.
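To pick the affected modules out of that kind of output, something like the following sketch could be used. It assumes the machine-readable format emits one pipe-delimited "Module" record per module, with the debug id in the fifth field. That field layout is my reading of the format, not something stated in this bug, and the function name is hypothetical.

```python
def modules_with_null_debug_id(stackwalk_output: str) -> list:
    """List module names whose debug id consists only of zeros.

    Assumes records shaped like:
      Module|name|version|debug file|debug id|base|max|main
    """
    names = []
    for line in stackwalk_output.splitlines():
        fields = line.split("|")
        if len(fields) < 5 or fields[0] != "Module":
            continue  # not a module record
        debug_id = fields[4]
        if debug_id and set(debug_id) == {"0"}:
            names.append(fields[1])
    return names
```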

See Also: → 1661771

I've got a fix for this bug. But before starting the review process I want to run it through tests on the tryserver. Strangely, though, my attempt to do this failed. I assume it's some kind of temporary glitch. I successfully pushed to try twice yesterday and once this morning. I'll try again tomorrow.

Summary: Most system libraries have their debug ids (GUIDs) nulled out in macOS 11 (BigSur) minidumps → Most system libraries have their debug ids (UUIDs) nulled out in macOS 11 (BigSur) minidumps

I gave it one more shot and did manage to push my patch to try. I just needed to remove some of the newer cruft from my .hgrc file.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ed4f2efd094b65f68feb954698cb6f54c2f6340e

Here's a tryserver build that you can use to test my patch:

https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/av_7BBl8SoGZOmTjtBoF6Q/runs/0/artifacts/public/build/target.dmg

Just do kill -28 [pid] from a Terminal prompt to make it crash on demand. This works with either the main process (firefox) or one of the child processes (plugin-container). I find I get the best results (the quickest crashes) with the first "child" plugin-container process (the one with -childID 1 on its command line).

The reason BigSur minidumps have lots of nulled UUIDs (debug ids) is that breakpad code currently only looks at the filesystem to get a given system library's UUID -- it opens the file and reads its LC_UUID "command". But breakpad also contains code to read the LC_UUID "command" from a module (corresponding to the file) that's been loaded into memory (to run firefox or plugin-container). This code seems to work fine. My patch falls back to it if it's not possible to read the LC_UUID "command" from the actual file.
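The fallback order can be sketched as follows. This is not the breakpad patch (which is C++). It is a simplified Python illustration that handles only little-endian 64-bit Mach-O images. The names find_uuid and debug_id, and the two reader callables, are hypothetical. The constants MH_MAGIC_64 and LC_UUID are the values from Apple's mach-o/loader.h.

```python
import struct
import uuid

MH_MAGIC_64 = 0xFEEDFACF           # magic number of a 64-bit Mach-O image
LC_UUID = 0x1B                     # load command that carries the UUID
HEADER_64 = struct.Struct("<8I")   # mach_header_64: eight 32-bit fields
LOAD_CMD = struct.Struct("<2I")    # every load command starts with cmd, cmdsize
UUID_CMD_SIZE = LOAD_CMD.size + 16 # uuid_command: cmd, cmdsize, uuid[16]


def find_uuid(image: bytes):
    """Return the UUID from the LC_UUID load command, or None."""
    if len(image) < HEADER_64.size:
        return None
    magic, _, _, _, ncmds, sizeofcmds, _, _ = HEADER_64.unpack_from(image, 0)
    if magic != MH_MAGIC_64:
        return None
    offset = HEADER_64.size
    end = min(len(image), offset + sizeofcmds)
    for _ in range(ncmds):
        if offset + LOAD_CMD.size > end:
            break
        cmd, cmdsize = LOAD_CMD.unpack_from(image, offset)
        if cmdsize < LOAD_CMD.size:
            break  # malformed command; stop rather than loop forever
        if cmd == LC_UUID and offset + UUID_CMD_SIZE <= end:
            return uuid.UUID(bytes=image[offset + 8:offset + UUID_CMD_SIZE])
        offset += cmdsize
    return None


def debug_id(read_file, read_memory):
    """Try the on-disk file first, then the copy loaded in memory."""
    for source in (read_file, read_memory):
        try:
            data = source()
        except OSError:
            continue  # e.g. the file no longer exists on macOS 11
        found = find_uuid(data)
        if found is not None:
            return found
    return None
```

Here read_file and read_memory stand in for whatever returns the header bytes from each source. On older macOS versions the first source succeeds and the second is never consulted, which matches the intent that behavior there doesn't change.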

As my code comment points out, doing this for child process crashes (which happen in another process) might fail because the system library in question isn't loaded in the main process (only in the child process). But I think this is unlikely. The firefox and plugin-container processes basically use the same binaries (XUL and friends), so they very likely pull in exactly the same system libraries. And in any case occasional failure to get a system library's UUID is better than consistent failure to do so.

I don't know why current breakpad code doesn't look at modules loaded into memory for UUIDs, even when that would be appropriate -- when the crash is in the main process. Mozilla code's "blame" doesn't shed any light on the question. In my limited testing I didn't see any problems with the code that finds UUIDs in modules loaded in memory. We should keep an eye out for trouble, but the worst that can happen is that we'll sometimes fail to find a UUID on BigSur. My patch doesn't change how breakpad works on other versions of macOS, which still have separate copies of system files in the filesystem.

Assignee: nobody → smichaud
Status: NEW → ASSIGNED
Pushed by smichaud@pobox.com:
https://hg.mozilla.org/integration/autoland/rev/b0507b479019
Fall back to getting debug ids from modules in memory. r=gsvelto
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch

Comment on attachment 9173855 [details]
Bug 1662862 - Fall back to getting debug ids from modules in memory. r=gsvelto

Beta/Release Uplift Approval Request

  • User impact if declined: This bug might not be fixed in a Firefox release by the time Apple releases macOS 11. This would cause a large increase in the number of crash stacks at https://crash-stats.mozilla.org/ not being properly symbolicated, as Firefox users upgrade to macOS 11.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This bug's patch is a simple fix that only changes behavior on macOS 11.
  • String changes made/needed: None
Attachment #9173855 - Flags: approval-mozilla-beta?

Judging by past experience, macOS 11 is likely to be released in late September or early October:

macOS 10.15 release date: October 7, 2019
macOS 10.14 release date: September 24, 2018
macOS 10.13 release date: September 25, 2017

https://en.wikipedia.org/wiki/MacOS

Seems like something we'd want on ESR78 also.

Comment on attachment 9173855 [details]
Bug 1662862 - Fall back to getting debug ids from modules in memory. r=gsvelto

Approved for 81.0b9 and 78.3esr. Thanks for the patch, Steven!

Attachment #9173855 - Flags: approval-mozilla-esr78+
Attachment #9173855 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Regressions: 1676102