Closed Bug 421534 Opened 16 years ago Closed 15 years ago

Mac: generate breakpad .sym from dwarf symbols instead of stabs


(Toolkit :: Crash Reporting, defect)

1.9.1 Branch
Not set





(Reporter: benjamin, Assigned: ted)



(Keywords: fixed1.9.1)


(3 files, 4 obsolete files)

Dwarf is the new good thing, and we've encountered bugs in GCC where building with stabs just won't work. So we should bite the bullet and get the ability to get crash-reporting data out of dwarf instead of stabs. Licensing is an issue if we're going to commit this upstream in breakpad, so use of GPL dwarf-reading libraries is not so good.

Shebs has been looking into this.
Blocks: 417045
Just a little status update - I'm now finding and opening the .o files from the main executable, and the output is the cumulative result. However, I still need to come up with the correct base address for each .o, since all the lines are jumbled together. (Also I regressed stabs handling, so need to start cleaning up my hacks so I have the base case to compare against.)
OK, my read-.o-files-recursively hack doesn't actually work, things aren't happening in the right order to get the not-relocated symbols in the .o files to be moved to their real addresses. So gotta do a little rewrite, basically cache the .o files until we're done with the main symbols, then suck in the .o symbols and match each one with the main program version. (This is what Apple GDB has to do, fortunately we don't have to also support partial symbol tables!)
I think we need this before we can switch Mac nightlies/releases to Leopard, or we'll have to take the hit of using -save-temps (which bloats out the build time and tree).
Flags: blocking1.9?
Here is an ultra-hacky first-draft version of Apple-style dwarf2 handling for breakpad. Its only virtues are it gets through assorted Mozilla executables and dylibs without dying (usually), and produces plausible-looking output. There are hordes of omissions and limitations.
Already marked as blocking a blocker...
Flags: blocking1.9? → blocking1.9+
Priority: -- → P1
Current status is that it's run as part of "make buildsymbols" on a regular trunk build, but there are several serious problems that will have to be resolved before it's an adequate replacement for stabs, such as the need to produce full pathes to source files, mistaken address offsets in separate .o files, and the need to get symbols from objects in .a archives. At least another week will be needed to fix all these.
Looks like we're not going to get bug 417045 regardless, and we don't have a week to do this, so minusing now. I really think we'll want this in the Moz 2 timeframe. Shebs: could you just post a patch of what you've got done, with some comments on what still needs to be finished? Hopefully you or someone else will be able to pick this up and finish it off later.
Flags: wanted-next+
Flags: blocking1.9-
Flags: blocking1.9+
I've got some half-written pieces to finish off while they're flying around in my head, can take a couple days on those and then put in upstream while moz trunk is frozen.
Target Milestone: --- → mozilla1.9beta5
Priority: P1 → --
Target Milestone: mozilla1.9beta5 → ---
Hey Stan any updates here? We'd like to do this switch quickly after the release of FF3...
I managed to create a terrible muddle in my code trying to handle rearrangement of functions in object files (not hypothetical, one of the things ICC does differently from GCC), and set it aside to work on blockers and dehydra. I'll bring it forward again.
FWIW, we just enabled breakpad symbols on mozilla-central builds and discovered that with -gstabs -gfull there is problems...there is little to no source information and Ted said something about a visibility issue. (For reference here's the crash report and the symbols:
OK, translation of relative addresses in object files is basically working, though the code is somewhat hacky and not yet efficient enough to use on the browser.
Attached patch Snapshot of dwarf support (obsolete) — Splinter Review
WIP version of changes, just as a checkpoint.
Attachment #309018 - Attachment is obsolete: true
Address translation is now producing completely correct results, and efficiently enough to use with "make buildsymbols". Unfortunately, dump_syms crashes on seven of the files, with what looks like memory corruption. There are still some discrepancies between stabs and dwarf dumps, though some are unavoidable, and other may be desirable; the stabs dump will show some inline bodies that the dwarf dump skips (this is a good thing because otherwise the .h file of the inline gets listed in the stack dump, and the calling function never appears).
The crashes are now fixed, and I've added the heuristic that recognizes a random symbol as a function if there is any line number info at all in its address range (the hybrid stabs/dwarf scheme used by OS X does not have N_FUN stabs giving a length, so we get "PUBLIC" instead of "FUNC" entries). Remaining key issues are to get info from .a archives, and to figure out what to do about multiple source files apparently given the same address.
Reading from archives is working, and the multiple source files at one address is a fundamental design flaw that affects stabs also (the existing code simply uses the last file it found, whether or not it's the right choice), so I'm punting on it. I'm now cleaning out the temporary hacks in my code and putting it into "Google style". Also I went to check that the dwarf bits worked OK in 64-bit, but it's not even recognizing the file as valid, so need to debug that.
This is the code I plan to submit for incorporation into Google's Breakpad sources. I started on 64-bit support, but it's going to need more before that is functional. I did regression-test with "make buildsymbols", and there is no effect on stabs, so no regression for the existing 10.4 builds.
Attachment #315623 - Attachment is obsolete: true
The patch is currently being reviewed by Mark Mentovai and others.
I applied this patch, built a whole universal Firefox with libmozjs.dylib from ICC, then ran buildsymbols on it, crashed the resulting Firefox, and tried running minidump_stackwalk on the minidump using the symbols. I didn't get a stack trace, because apparently the symbol reader is unhappy with having line number data for PUBLIC functions:
2008-05-15 09:35:50: ERROR: Found source line data without a function at /Users/luser/build/obj-firefox-icc/ppc/dist/crashreporter-symbols/2008051508//libmozjs.dylib/21FEC0F63F7344B9928E28074CF30AD50/libmozjs.dylib.sym:118

The offending line is the second line here:
PUBLIC c60 0 math_static_methods.0
c62 1f 5803 1
c81 5 5808 1

This happens for every symbol file it tries to load, so I don't get a real stack trace.
Hmm, there's a heuristic for attaching a line number to a random symbol, necessary because in the dwarf2 format symbols don't get line numbers directly like is done in stabs - but the heuristics requires address of symbol and line to be the same, and here we have symbol at 0xc60 but the first line is at 0xc62. Probably the compiler should have issued line number info for the function. I think it's safe to have the dumper change PUBLIC to a FUNC if line number info follows, but we have no way to guess at the line number, so the stack walker just needs to cope with that (which it should, since a stack dump missing only a couple frames of symbol info is still useful).
Right, I was probably misleading above, I got a partial stack trace, I just didn't get any symbol information, because the symbol file reader aborts when it hits an instance of this.
Oh, there's also this:
008-05-27 16:08:46: ERROR: Found source line data without a function at /Users/luser/build/obj-firefox-icc/ppc/dist/crashreporter-symbols/2008051508//XUL/8D61558EF045AFE2F6CA72CAACADF0340/XUL.sym:340034

That's the first line after the FILE lines, it just goes right into line number data, without a FUNC or a PUBLIC.
Is this still being worked on? Symbols are still eating up a _ton_ of space on our Mac m-c/actionmonkey/tracemonkey slaves.
This is still hurting really bad on mac m-c/actionmonkey/tracemonkey slaves. Any update here?
We have a contractor lined up to do some work, but I haven't had time to interface with him enough. Hopefully we'll get to that very soon.
we discovered over in bug 448616 that upgrading to Xcode 3.1 will let us build without -save-temps, so this issue is going to be less critical from a build-infra perspective.
It looks like breakpad got dwarf support added on Oct 8. Is there anything holding this up now?
Need to sync up with upstream Breakpad code, make some local changes to match the upstream changes, and then change the symbol build script to produce .dSYM files, as that's what the Breakpad DWARF code expects. Probably ~1 week worth of work. I plan to make that happen once I finish off a few other things.
Depends on: 464093
Depends on: 465329
Assignee: stanshebs → ted.mielczarek
Blocks: 464093
No longer depends on: 464093
Attachment #317448 - Attachment is obsolete: true
Blocks: 445090
Attached patch first pass (obsolete) — Splinter Review
Depends on the patches from bug 465329. I'm still testing the output of this. It looks sort of ok, but might have some issues.
This definitely has some bugs. Two I've found so far:
1) Doesn't support DW_AT_specification, so it completely misses some functions. (Although it has the line number data for them, it doesn't get the function name.)
2) Doesn't provide the full function name for functions.

I'll file these both upstream and fix them there, then sync up again.
I have patches for both issues, should get them reviewed and landed upstream very soon, then I'll some rs to get them in our repo, and we'll be set. With those two issues fixed and the patch here, I'm able to build symbols from dwarf debuginfo and get a valid stack out the other end.
This works great with the breakpad sync + new patches that need to land upstream. I was getting tripped up when I ran buildsymbols twice without cleaning up the .dSYM files in between, so I added some code to work around that situation. Shouldn't be needed in the usual case, but I know I'll hit this again in future testing, so I'll save myself some hassle.
Attachment #350821 - Attachment is obsolete: true
Attachment #351432 - Flags: review?(benjamin)
Attachment #351432 - Flags: review?(benjamin) → review+
Comment on attachment 351887 [details] [diff] [review]
patch for nightly/release mozconfigs [checked in]

I'm r+ing this so much!
Attachment #351887 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 351432 [details] [diff] [review]
with a little protection if you run buildsymbols twice [checked in]


bhearsum: close this bug when you push that mozconfig change, please.
Attachment #351432 - Attachment description: with a little protection if you run buildsymbols twice → with a little protection if you run buildsymbols twice [checked in]
Comment on attachment 351887 [details] [diff] [review]
patch for nightly/release mozconfigs [checked in]

changeset:   587:b76bbdd5af57
Attachment #351887 - Attachment description: patch for nightly/release mozconfigs → patch for nightly/release mozconfigs [checked in]
Closed: 15 years ago
Resolution: --- → FIXED
Depends on: 468494
Attachment #351432 - Flags: approval1.9.1?
Comment on attachment 351432 [details] [diff] [review]
with a little protection if you run buildsymbols twice [checked in]

See my long justification in bug 465329 comment 11.
Attachment #351432 - Flags: approval1.9.1? → approval1.9.1+
So, I've just symlinked mozilla-1.9.1 and tracemonkey to mozilla-central -- the only difference between these directories was stabs vs. dwarf. May as well symlink to reduce maintenance burden.
Attachment #354187 - Flags: review?(ted.mielczarek)
Attachment #354187 - Flags: review?(ted.mielczarek) → review+
Attachment #354187 - Attachment description: switch to dwarf on tracemonkey and mozilla-1.9.1 → [checked in] switch to dwarf on tracemonkey and mozilla-1.9.1
Target Milestone: --- → mozilla1.9.1b3
Version: unspecified → 1.9.1 Branch
Blocks: 394031
Target Milestone: mozilla1.9.1b3 → mozilla1.9.2a1
You need to log in before you can comment on or make changes to this bug.