Closed Bug 1652865 Opened 5 years ago Closed 5 years ago

dump_syms doesn't produce symbol information for executables

Categories

(Toolkit :: Crash Reporting, defect)

All
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla80
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- wontfix
firefox78 --- wontfix
firefox79 --- wontfix
firefox80 --- fixed

People

(Reporter: froydnj, Assigned: froydnj)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

TL; DR: we handle cpusubtype information badly and it causes us to silently drop symbol information for executables.

I attempted to rebase my patch in bug 1576748 so that moving to the new dump_syms implementation would be easier on try -- failures would be swift and immediate, and we wouldn't have to worry about silently breaking crash reporting...which my initial push with the new dump_syms implementation for Mac did, and I might have wound up pushing it to central if I hadn't inspected more closely.

I thought I remember having things working with the patch -- it went through several rounds of being backed out the first time around, but pushing it to try met with errors like:

Unable to calculate UUID of mach-o binary /builds/worker/workspace/obj-build/dist/bin/firefox!

which is...uh, unexpected, because on a local OS X cross build that I have lying about, one can see that there is indeed a UUID load command:

froydnj@hawkeye:~/src/dump_syms$ ~/.mozbuild/clang/bin/llvm-objdump -p /opt/build/froydnj/build-macosx-cross/dist/bin/firefox |grep -B 1 -A 2 LC_UUID
Load command 8
     cmd LC_UUID
 cmdsize 24
    uuid 302C8E84-80E4-3587-94AA-8DA186A126CF

but for whatever reason breakpad isn't able to access it.

Looking a little further afield, our crashreporter symbols for Mac don't feature any symbols for any of the executables we produce during the build process; we only produce symbols for dynamic libraries. Presumably this is because (currently) we are silently erroring out on executables and we just don't notice.

So what is the difference, exactly, between what executables look like and dynamic libraries look like? The first clue lies in the file header, which for firefox looks like:

/opt/build/froydnj/build-macosx-cross/dist/bin/firefox:	file format Mach-O 64-bit x86-64

Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
MH_MAGIC_64  X86_64        ALL LIB64     EXECUTE    22       2256   NOUNDEFS DYLDLINK TWOLEVEL PIE

and for, to pick an example at random, libnss3.dylib:

/opt/build/froydnj/build-macosx-cross/dist/bin/libnss3.dylib:	file format Mach-O 64-bit x86-64

Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
MH_MAGIC_64  X86_64        ALL  0x00       DYLIB    19       2296   NOUNDEFS DYLDLINK TWOLEVEL NO_REEXPORTED_DYLIBS

This output is courtesy of llvm-objdump, and the code that prints that header comes from here:

https://github.com/llvm/llvm-project/blob/c3e6555616fd92e21b17fbc27f8c145c27890f1a/llvm/tools/llvm-objdump/MachODump.cpp#L8231

The cpusubtype field in the above output comes from consulting the cpusubtype field of the file header:

https://github.com/phracker/MacOSX-SDKs/blob/master/MacOSX10.11.sdk/usr/include/mach-o/loader.h#L50-L81

The corresponding code that prints ALL comes from here:

https://github.com/llvm/llvm-project/blob/c3e6555616fd92e21b17fbc27f8c145c27890f1a/llvm/tools/llvm-objdump/MachODump.cpp#L8257-L8270

Notice that we are masking off CPU_SUBTYPE_MASK to switch on the value; that's important, and we'll come back to it later.

Where is the caps field coming from? Well, the upper eight bits of the cpusubtype header field actually store various interesting things; you can see the llvm-objdump code consulting that field to print out LIB64 for the firefox binary:

https://github.com/llvm/llvm-project/blob/c3e6555616fd92e21b17fbc27f8c145c27890f1a/llvm/tools/llvm-objdump/MachODump.cpp#L8364-L8369

The LIB64 bit is defined as 64 bit libraries here:

https://github.com/phracker/MacOSX-SDKs/blob/master/MacOSX10.11.sdk/usr/include/mach/machine.h#L124-L128

Why, exactly, firefox gets a bit indicating 64-bit libraries and the actual dynamic library does not is a question better left unexplored, I think. (I guess this simplifies the lookup or loading process for dependent libraries of a binary inside userspace, perhaps?)

Anyway, those upper eight bits for cpu subtypes are really all that differs between firefox and libnss3.dylib...and they make a difference hereabouts:

https://searchfox.org/mozilla-central/rev/82c04b9cad5b98bdf682bd477f2b1e3071b004ad/toolkit/crashreporter/google-breakpad/src/common/mac/macho_walker.cc#159

because at that point, when we're walking over the headers for firefox, we have a header.cpusubtype field of 0x80000003 (CPU_SUBTYPE_LIB64 | CPU_SUBTYPE_X86_64_ALL) and the passed in cpu_substype originally came from the file header itself. So before that line, header.cpusubtype == cpu_subtype. But after that line, we've masked off the upper bits and we've guaranteed that we cannot pass this test:

https://searchfox.org/mozilla-central/rev/82c04b9cad5b98bdf682bd477f2b1e3071b004ad/toolkit/crashreporter/google-breakpad/src/common/mac/macho_walker.cc#163

And because we don't pass that test, the MachoWalker thinks that we can't hope to find any relevant load commands in the file, and ultimately fails to produce symbols.

I don't know how to convince treeherder to show me jobs from eight months ago, but I bet we'd see that we silently stopped collecting symbols for executables when bug 1371390 landed.

I don't know what the right fix is. If Steven's bug 1371390 comment 31 is correct, I think we should be masking with ~CPU_SUBTYPE_MASK at least here:

https://searchfox.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/common/mac/dump_syms.cc#290

because selected_object_file_->cpusubtype comes directly from the object itself, and possibly here:

https://searchfox.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/common/mac/dump_syms.cc#382-386

We might never actually hit that second case, though, because I think we explicitly pass in an arch.

The alternate fix is to declare that we just don't care about executable symbols -- and we don't want to have to store anything more than we have to -- and just filter out those files in symbolstore.py so that we never have to worry about dump_syms getting invalid input. WDYT?

Summary: cpusubtype filtering causes → dump_syms doesn't produce symbol information for executables

In bug 1371390 comment 31 I promised to open another bug about fixing the other references to mach_header.cpusubtype in Breakpad code. I'd forgotten about that. But I could do it, if there's any interest.

The cpusubtype fix seems to work:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf6d98fbda4a3017b1bfb2fd40410bd75fe30e09

The target.crashreporter-symbols.zip archives have symbols for the executables in them!

If we don't do this, we run into problems when walking over the Mach-O load
commands, where we do mask off cpusubtype bits.

Assignee: nobody → nfroyd
Status: NEW → ASSIGNED
Pushed by nfroyd@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4c4e40bc3dfc mask off cpusubtype bits before determining Mach-O identifiers; r=gsvelto
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: