Closed
Bug 787302
Opened 12 years ago
Closed 12 years ago
Most stacks are truncated (using minidump_stackwalk on a Mac Tinderbox debug build)
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
VERIFIED
FIXED
mozilla18
People
(Reporter: jruderman, Assigned: espindola)
References
Details
(Keywords: regression, Whiteboard: [fuzzblocker])
Attachments
(8 files, 1 obsolete file)
263.50 KB,
text/plain
|
Details | |
443 bytes,
text/plain
|
Details | |
273.23 KB,
application/octet-stream
|
Details | |
421 bytes,
text/plain
|
Details | |
1.42 KB,
patch
|
catlee
:
review+
|
Details | Diff | Splinter Review |
795 bytes,
patch
|
rail
:
review+
|
Details | Diff | Splinter Review |
3.39 KB,
patch
|
rail
:
review+
|
Details | Diff | Splinter Review |
3.87 KB,
patch
|
akeybl
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
This broke a little over a month ago. Maybe due to the switch to clang? My DOM fuzzer relies on these crash stacks for distinguishing known bugs from new bugs. For many crashes, the top item is sufficient, but there are some (e.g. bug 588237) that I need to ignore based on something in the middle of the stack. I'll attach some dumps and truncated-stacks that I got using the testcases in bug 588237 and bug 763828. They're all using the build and symbol files in https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64-debug/1346378090/ along with the minidump_stackwalk executable from http://hg.mozilla.org/build/tools/raw-file/default/breakpad/osx/minidump_stackwalk
Reporter | ||
Comment 1•12 years ago
|
||
Reporter | ||
Comment 2•12 years ago
|
||
Reporter | ||
Comment 3•12 years ago
|
||
Reporter | ||
Comment 4•12 years ago
|
||
Reporter | ||
Comment 5•12 years ago
|
||
> https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla- > central-macosx64-debug/1346378090/ Copied to http://www.squarefree.com/bug787302/build_1346378090/ in case it gets deleted.
Comment 6•12 years ago
|
||
Okay, so the fact that the second frame in both stacks is "Found by: stack scanning" is telling. That means that we don't have usable unwind information for frame 0, so we're resorting to scanning.
Assignee | ||
Comment 7•12 years ago
|
||
Locally I get a full stack, but it is found by stack scanning. The first function in the stack is: __Z14mozalloc_abortPKc: 0000000000000b40 pushq %rbp 0000000000000b41 movq %rsp, %rbp 0000000000000b44 pushq %rbx 0000000000000b45 pushq %rax 0000000000000b46 movq 1219(%rip), %rbx 0000000000000b4d movq (%rbx), %rsi 0000000000000b50 callq 0xbc4 ## symbol stub for: _fputs 0000000000000b55 movq (%rbx), %rsi 0000000000000b58 leaq 241(%rip), %rdi ## literal pool for: 0000000000000b5f callq 0xbc4 ## symbol stub for: _fputs 0000000000000b64 movl $123, 0 So we should be able to just follow rsp, rbp. I will try to figure out what went wrong.
Status: NEW → ASSIGNED
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → respindola
Comment 8•12 years ago
|
||
It's possible the minidump_stackwalk the tinderbox machines are using is old enough that it doesn't know how to use frame pointers to walk an x86-64 stack.
Assignee | ||
Comment 9•12 years ago
|
||
(In reply to Ted Mielczarek [:ted] from comment #8) > It's possible the minidump_stackwalk the tinderbox machines are using is old > enough that it doesn't know how to use frame pointers to walk an x86-64 > stack. Do you know where can I check the version (and update it if needed)?
Assignee | ||
Comment 10•12 years ago
|
||
Looks like we have two problems 1) the 64 bit unwinder (even the current svn one) will not try rsp/rbp. 2) clang and breakpad are not agreeing on the cfi I will try to fix 2 first.
Comment 11•12 years ago
|
||
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #10) > Looks like we have two problems > > 1) the 64 bit unwinder (even the current svn one) will not try rsp/rbp. Oh huh, I would have sworn I implemented that at some point. It wouldn't be hard to add. > 2) clang and breakpad are not agreeing on the cfi In this case there definitely ought to be usable CFI here since we have full symbol information. Thanks for looking into it!
Assignee | ||
Comment 12•12 years ago
|
||
I just downloaded https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64-debug/1346378090/firefox-18.0a1.en-US.mac64.crashreporter-symbols.zip and: $ unzip -L firefox-18.0a1.en-US.mac64.crashreporter-symbols.zip $ find . -name *.sym | xargs grep STACK ./XUL/83BAA1821A40335FB1C01A83683E86240/XUL.sym:FUNC 1255f80 84 0 JSD_ASSERT_VALID_STACK_FRAME ./libnspr4.dylib/3FECC3934B0F385B90FDFC52F3AF1C0A0/libnspr4.dylib.sym:FUNC a830 34 0 _PR_RELEASE_LOCK_STACK I will debug why we are missing STACK entries tomorrow.
Comment 13•12 years ago
|
||
That is really exceptionally bad. So bad that I would backout clang on Firefox 17 if we don't have an easy fix.
Assignee | ||
Comment 14•12 years ago
|
||
(In reply to Ted Mielczarek [:ted] from comment #13) > That is really exceptionally bad. So bad that I would backout clang on > Firefox 17 if we don't have an easy fix. I just noticed that running dump_syms on a .dylib produces a dump with no line numbers, but with stack information. Running it on the .dylib.dSYM produces a dump without stack information but with line numbers. Our build process uses the .dylib.dSYM, which explains the problem. Do you now if we changed anything in this area?
Assignee | ||
Comment 15•12 years ago
|
||
A summary of what is going on: * on x86_64, gcc produces both eh_frame and debug_frame, clang produces only eh_frame * LOAD segments are not copied to .dSYM. From macho_reader.cc: // Mach-O files in .dSYM bundles have the contents of the loaded // segments removed, and their file offsets and file sizes zeroed // out. However, the sections within those segments still have // non-zero sizes. There's no reason to call MisplacedSectionData in // this case; the caller may just need the section's load // address. But do set the contents' limits to NULL, for safety. The correct fix is to change dump_syms to look at both the .dylib and the .dSYM. The easy fix is to patch clang to produce debug_frame too. I am doing the easy fix, ted, can I assign you a bug about fixing dump_syms and updating it?
Comment 16•12 years ago
|
||
Yes, I'm not sure I'll get to it right away though.
Assignee | ||
Comment 17•12 years ago
|
||
I tested this locally and we do get STACK lines in the output of "make buildsyms". Sending for catlee review since rail is away.
Attachment #660951 -
Flags: review?(catlee)
Comment 18•12 years ago
|
||
Comment on attachment 660951 [details] [diff] [review] patch clang to always produce debug_frame Review of attachment 660951 [details] [diff] [review]: ----------------------------------------------------------------- <rubberstamp>
Attachment #660951 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 19•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/436ae8b59f87
Whiteboard: [fuzzblocker] → [fuzzblocker], [leave-open]
Reporter | ||
Comment 21•12 years ago
|
||
I'm still getting truncated stacks with the crash in bug 588237 using https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64-debug/1347968942/ built from http://hg.mozilla.org/mozilla-central/rev/e4757379b99a
Assignee | ||
Comment 22•12 years ago
|
||
(In reply to Jesse Ruderman from comment #21) > I'm still getting truncated stacks with the crash in bug 588237 using > https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla- > central-macosx64-debug/1347968942/ > built from > http://hg.mozilla.org/mozilla-central/rev/e4757379b99a Yes, sorry. I had to install centos5 vms as we update clang on linux and OS X in sync. I am just finishing up the last clang build.
Assignee | ||
Comment 23•12 years ago
|
||
Try at: https://tbpl.mozilla.org/?tree=Try&rev=5bd058fc51b1
Attachment #662222 -
Flags: review?(rail)
Updated•12 years ago
|
Attachment #662222 -
Flags: review?(rail) → review+
Assignee | ||
Comment 24•12 years ago
|
||
Attachment #662265 -
Flags: review?(rail)
Assignee | ||
Comment 25•12 years ago
|
||
now with OS X patched. https://tbpl.mozilla.org/?tree=Try&rev=10cb43ddedc3
Attachment #662222 -
Attachment is obsolete: true
Attachment #662282 -
Flags: review?(rail)
Updated•12 years ago
|
Attachment #662282 -
Flags: review?(rail) → review+
Updated•12 years ago
|
Attachment #662265 -
Flags: review?(rail) → review+
Assignee | ||
Comment 26•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/20f683bc1ce7
Assignee | ||
Comment 27•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/35b2c35d3243
Assignee | ||
Comment 28•12 years ago
|
||
The build on try had STACK lines. I have just requested a clobber on inbound, so the next inbound build should be good too.
Whiteboard: [fuzzblocker], [leave-open] → [fuzzblocker]
Assignee | ||
Comment 29•12 years ago
|
||
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-macosx64-debug/1348002836/firefox-18.0a1.en-US.mac64.crashreporter-symbols.zip has STACK lines.
Comment 30•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/20f683bc1ce7 https://hg.mozilla.org/mozilla-central/rev/35b2c35d3243
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Reporter | ||
Comment 31•12 years ago
|
||
Much better! The stacks still aren't getting all the way to main, but they're getting far enough for my scripts to work.
Blocks: 733905
Status: RESOLVED → VERIFIED
Assignee | ||
Comment 32•12 years ago
|
||
(In reply to Jesse Ruderman from comment #31) > Much better! The stacks still aren't getting all the way to main, but > they're getting far enough for my scripts to work. Where are they stopping? Can you report a bug about that?
Reporter | ||
Comment 33•12 years ago
|
||
I filed bug 793109
Assignee | ||
Comment 34•12 years ago
|
||
[Approval Request Comment] Bug caused by (feature/regressing bug #): 787302 User impact if declined: Harder to debug other bugs because of broken stacks. Testing completed (on m-c, etc.): Risk to taking this patch (and alternatives if risky): compiler bug, but low risk since it has been in use on m-c. We could backport just the fix, but we would then have a new version of clang to keep track of. String or UUID changes made by this patch:
Attachment #663536 -
Flags: approval-mozilla-aurora?
Comment 35•12 years ago
|
||
Comment on attachment 663536 [details] [diff] [review] Backport 787568 and 787302 to aurora [Triage Comment] Early enough in the cycle that we can take a fix like this in support of debugging/fuzzing.
Attachment #663536 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 36•12 years ago
|
||
Setting checkin-needed so we don't forget this for mozilla-aurora.
Keywords: checkin-needed
Comment 37•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/b88c6a990b0f Rafael, my apologies for your name getting a bit messed up. You can thank Windows' crappy command line for that. That said, you can avoid possible problems like that in the future by making sure your patches have the necessary commit information in them when you post them. https://developer.mozilla.org/en-US/docs/Creating_a_patch_that_can_be_checked_in
Comment 38•12 years ago
|
||
Note that in the future if you're branch-landing patches that have already landed on mozilla-central you can use "hg transplant" instead and just reuse the existing commit metadata: https://blog.mozilla.org/ted/2009/12/02/easy-branch-landing-of-patches/
Comment 39•12 years ago
|
||
I normally do transplant, but not when there's a branch-specific patch posted to a bug...
Comment 40•12 years ago
|
||
Ah, missed that. sorry!
Assignee | ||
Comment 41•12 years ago
|
||
Thanks everyone. The aurora patch was a squashed version of the patches that went to inbound.
You need to log in
before you can comment on or make changes to this bug.
Description
•