Closed Bug 895470 Opened 11 years ago Closed 11 years ago

Buggy stack trace for crashes on abort in Mac OS X since 25.0

Categories

(Toolkit :: Crash Reporting, defect)

25 Branch
All
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla27
Tracking Status
firefox24 --- unaffected
firefox25 --- affected

People

(Reporter: scoobidiver, Assigned: benjamin)

Details

(Keywords: regression)

Attachments

(1 file)

*sigh* I just looked at a symbol file from the crash you linked in bug 844819, and I realized what the problem is.

Some of our code doesn't change very often (libmozalloc, for example), and as such it compiles to the same binary across different versions. We haven't historically worried about that, since it doesn't usually cause problems (the same binaries produce the same symbols). Our symbol upload script unzips symbols with "unzip -n", which tells it not to overwrite existing files:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/tools/upload_symbols.sh#44

TL;DR: we have a bad libmozalloc symbol file on the symbol server, and it won't be overwritten through normal symbol uploads.
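To illustrate why a bad file sticks around, here is a minimal Python sketch of "do not overwrite" extraction, the same policy "unzip -n" applies. It is not the upload script itself; the helper names (extract_no_clobber, make_zip, demo) and the debug ID "ABC0" are made up for the example.

```python
from __future__ import annotations

import io
import tempfile
import zipfile
from pathlib import Path


def extract_no_clobber(archive: zipfile.ZipFile, dest: Path) -> list[str]:
    """Extract only members missing under dest; return the names skipped."""
    skipped = []
    for info in archive.infolist():
        if info.is_dir():
            continue
        if (dest / info.filename).exists():
            # Same policy as "unzip -n": the file already on disk wins.
            skipped.append(info.filename)
            continue
        archive.extract(info, dest)
    return skipped


def make_zip(files: dict[str, str]) -> zipfile.ZipFile:
    """Build an in-memory zip archive from a name -> text mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    buf.seek(0)
    return zipfile.ZipFile(buf)


def demo() -> tuple[str, list[str]]:
    """Upload a broken symbol file, then a fixed one with the same debug ID."""
    # Hypothetical debug ID; real ones are 33 hex characters.
    name = "libmozalloc.dylib/ABC0/libmozalloc.dylib.sym"
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp)
        extract_no_clobber(make_zip({name: "broken CFI"}), dest)
        skipped = extract_no_clobber(make_zip({name: "fixed CFI"}), dest)
        return (dest / name).read_text(), skipped
```

In demo(), the second upload carries a fixed file under the same module and debug ID, but the broken copy is what stays on disk, which is the situation described above.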

Scoobidiver: thanks for looking into this.
See Also: → 884300
Ted, the symbols have the same debug ID but different contents? Remind me of the algorithm for calculating debug IDs for mac binaries...

Neither the current unzip -n behavior nor an overwrite behavior seems ideal. Is there a simple way (using buildbot logs perhaps?) to figure out how common this is and whether it affects certain binaries more often?
Flags: needinfo?(ted)
The contents are different because of a bug (bug 884300) in the symbol dumper that incorrectly wrote out CFI rules. The actual symbols are exactly the same, but the on-disk format was broken during the period we had that bug.

Modulo bugs, the current behavior has worked fine for a long time. There's only one other quirk here that bothers people, which is that links to source repos can be confusing (because you pick up a symbol file from a different release). The source should be equivalent if the binaries are equivalent, though, so that hasn't been a problem in practice.

I suspect this only shows up in things like mozalloc or NSPR where the source doesn't change frequently, so they can compile to the same binary.

We could probably audit for a list of broken symbols. The broken CFI rules look like:
STACK CFI INIT 940 a .cfa: 0x101e02c00 8 + .ra: 0x101e06150 -8 + ^
STACK CFI 941 $rbp: 0x101e06150 -16 + ^ .cfa: 0x101e02c00 16 +

whereas non-broken ones look like:
STACK CFI INIT 2203ff3 30 .cfa: $rsp 8 + .ra: .cfa -8 + ^
STACK CFI 2203ff4 $rbp: .cfa -16 + ^ .cfa: $rsp 16 +

Grepping for "CFI.*0x" is probably sufficient. We'd then want to find symbol files that are used by lots of builds (something like "grep <debug id> *.txt | wc -l") and re-dump them with a fixed dump_syms.
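A rough Python equivalent of the two greps, as a sketch only. It assumes the *.txt files are per-build symbol indexes with one symbol path per line, which this bug does not spell out; the function names are invented for the example. The sample lines are the ones quoted above.

```python
from __future__ import annotations

import re

# Same pattern as the suggested grep: a literal "0x" after "CFI" on a line.
BROKEN_CFI = re.compile(r"CFI.*0x")

BROKEN_SAMPLE = (
    "STACK CFI INIT 940 a .cfa: 0x101e02c00 8 + .ra: 0x101e06150 -8 + ^\n"
    "STACK CFI 941 $rbp: 0x101e06150 -16 + ^ .cfa: 0x101e02c00 16 +\n"
)
GOOD_SAMPLE = (
    "STACK CFI INIT 2203ff3 30 .cfa: $rsp 8 + .ra: .cfa -8 + ^\n"
    "STACK CFI 2203ff4 $rbp: .cfa -16 + ^ .cfa: $rsp 16 +\n"
)


def has_broken_cfi(sym_text: str) -> bool:
    """True if any line of a .sym file matches the broken-CFI pattern."""
    return any(BROKEN_CFI.search(line) for line in sym_text.splitlines())


def count_references(debug_id: str, index_texts: list[str]) -> int:
    """Count index lines mentioning debug_id, like grep <debug id> *.txt | wc -l.

    index_texts holds the contents of the index files (assumed format:
    one symbol path per line).
    """
    return sum(
        debug_id in line
        for text in index_texts
        for line in text.splitlines()
    )
```

Sorting the broken files by count_references would give the ones worth re-dumping first.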

In this specific case, we could just grab a libmozalloc.dylib.sym from a newer build that has the same debug ID and replace the one in symbols_ffx.
Flags: needinfo?(ted)
Oh, re: "Where do debug IDs come from"? We use .note.gnu.build-id if it's present on Linux, and LC_UUID on Mac if it's present. (Those are both present in our current binaries on their respective platforms.) These are both a hash of some parts of the binary, which is why they wind up identical.
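For reference, a small Python sketch of how a Mac debug ID maps to a path on the symbol server. It assumes the usual Breakpad convention as I understand it: the LC_UUID with dashes removed and uppercased, followed by an "age" field that is 0 on Mac. The function names are invented here, and the dashed UUID in the usage note is reconstructed from a path on the symbol server rather than read from a binary.

```python
import uuid
from pathlib import PurePosixPath


def mac_debug_id(lc_uuid: str, age: int = 0) -> str:
    """Turn an LC_UUID string into a Breakpad-style debug ID.

    Assumption: the ID is the 32 uppercase hex digits of the UUID followed
    by the age in hex (0 for Mac binaries).
    """
    return uuid.UUID(lc_uuid).hex.upper() + format(age, "X")


def symbol_path(root: str, module: str, debug_id: str) -> PurePosixPath:
    """Build <root>/<module>/<debug id>/<module>.sym, the layout used for Mac modules."""
    return PurePosixPath(root) / module / debug_id / (module + ".sym")
```

For example, mac_debug_id("40DC647F-9778-3928-AACD-6CC468B40426") gives 40DC647F97783928AACD6CC468B404260. Since the path depends only on the module name and that ID, two builds with byte-identical libmozalloc binaries land on the same file.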
Assignee: nobody → benjamin
I let this slip by accident, and it doesn't appear to be a problem any more. Scoobidiver, can you verify, or link me to the recent reports that show the problem?
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
I can reproduce this by crashing a Mac nightly with the testcase in bug 779424:
https://crash-stats.mozilla.com/report/index/ee15dd75-23e7-428d-8c4b-2b9032130908

http://hg.mozilla.org/mozilla-central/annotate/655ac375b1c7/gfx/thebes/gfxPlatform.cpp#l741

Bug 844819 is probably under-counted as a result of being spread out over many signatures.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Attached file mozalloc.todelete
OK, here are the files that need to be deleted. The correct version will be uploaded the next time a build reuses one of those GUIDs.
Removed!

[root@sp-admin01.phx1 895470]# cat files_to_remove.txt
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/40DC647F97783928AACD6CC468B404260/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/FBEB39CFA61D3A5C829FE52B47C8EC620/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/661E8C2444AC352AAF7106EF7C84C6C30/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/1FC811789DD6318BBF46CE272869404C0/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/D61D23A1135D305DA2B725827A2D05B90/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/32BE5E053EDC3FE89049578E650A08320/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/462CE2CA079B3656BC929DEF8E154E090/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/D859B005767F35F9A3EA0E6C294C16FF0/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/33BC19D88EB838BC905BB54F666CD78F0/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/3B0385F5B2A53244B6A77532FE34C94F0/libmozalloc.dylib.sym

[root@sp-admin01.phx1 895470]# for file in $(cat files_to_remove.txt); do ls $file; done
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/40DC647F97783928AACD6CC468B404260/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/FBEB39CFA61D3A5C829FE52B47C8EC620/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/661E8C2444AC352AAF7106EF7C84C6C30/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/1FC811789DD6318BBF46CE272869404C0/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/D61D23A1135D305DA2B725827A2D05B90/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/32BE5E053EDC3FE89049578E650A08320/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/462CE2CA079B3656BC929DEF8E154E090/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/D859B005767F35F9A3EA0E6C294C16FF0/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/33BC19D88EB838BC905BB54F666CD78F0/libmozalloc.dylib.sym
/mnt/socorro/symbols/symbols_ffx/libmozalloc.dylib/3B0385F5B2A53244B6A77532FE34C94F0/libmozalloc.dylib.sym

[root@sp-admin01.phx1 895470]# for file in $(cat files_to_remove.txt); do rm -rf $file; done
I get a good stack for bug 779424 now:
https://crash-stats.mozilla.com/report/index/a221964b-a31e-4dd8-aac8-bd4c02130922

The two top abort signatures for Nightly 27 look good, too.

Should we mark this as FIXED?
Yep. I don't think symbol backfill is necessary.
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27