Closed Bug 571578 Opened 14 years ago Closed 14 years ago

minidump_stackwalk for x86_64 doesn't scan the stack for return address, so it fails when symbols are incomplete

Categories

(Toolkit :: Crash Reporting, defect)

x86_64
Linux
defect
Not set
normal

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: jruderman, Assigned: ted)

> Thread 2
>  0  libpthread-2.11.1.so + 0xb85c
>     rbx = 0x66f2a710   r12 = 0x00000001   r13 = 0x00000001   r14 = 0x00000000
>     r15 = 0x00000000   rip = 0x7966985c   rsp = 0x66f29cc8   rbp = 0x66f29d20
>
> Thread 3
>  0  libpthread-2.11.1.so + 0xbbc9
>     rbx = 0x66729710   r12 = 0x00000009   r13 = 0x66728cc0   r14 = 0x00000000
>     r15 = 0x00000000   rip = 0x79669bc9   rsp = 0x66728c28   rbp = 0x66728cf0
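
For anyone reading these frames: "module + offset" is just the instruction pointer minus the address where the module is loaded. A quick sketch of that arithmetic, using the rip values and offsets exactly as printed above (they look truncated to 32 bits in this output; the helper name is only for illustration):

```python
# Frame 0 of each thread, as printed in the dump above: (rip, offset into libpthread).
frames = {
    "Thread 2": (0x7966985C, 0xB85C),
    "Thread 3": (0x79669BC9, 0xBBC9),
}


def module_base(rip, offset):
    """Recover the module load address implied by a 'module + offset' frame."""
    return rip - offset


bases = {name: module_base(rip, off) for name, (rip, off) in frames.items()}
for name, base in bases.items():
    print(f"{name}: libpthread base = {base:#x}")
```

Both threads give the same base, which is what you'd expect for two frames in the same library. Each thread has only this one frame because the walker has no way to get from frame 0 to its caller.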

<ted> currently, the x86-64 stackwalker doesn't have any heuristics like the x86 one does, where it can scan the stack for return addresses
<ted> so it gives up and goes home
<ted> without symbols we can't walk the stack on x86-64, because we need the CFI to walk it, and there's no fallback
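
To make the "scan the stack for return addresses" idea concrete, here is a minimal sketch of that kind of heuristic. It is not Breakpad's code: the function names, the 30-word scan limit, the module table, and the stack contents are all assumptions made up for illustration. The idea is to read pointer-sized words upward from the stack pointer and accept the first one that points into a loaded module as the caller's return address.

```python
import struct

WORD = 8  # pointer size on x86-64, in bytes


def in_known_module(addr, modules):
    """Return the name of the module whose [base, base + size) holds addr, else None."""
    for name, base, size in modules:
        if base <= addr < base + size:
            return name
    return None


def scan_for_return_address(stack_bytes, stack_base, start_sp, modules, max_words=30):
    """Scan upward from start_sp for a word that looks like a code address.

    stack_bytes is the captured stack memory and stack_base is the address of
    its first byte. Returns (caller_ip, caller_sp), or None if nothing
    plausible is found within max_words words.
    """
    sp = start_sp
    for _ in range(max_words):
        offset = sp - stack_base
        if offset < 0 or offset + WORD > len(stack_bytes):
            return None  # ran off the captured stack memory
        (candidate,) = struct.unpack_from("<Q", stack_bytes, offset)
        if in_known_module(candidate, modules) is not None:
            # The caller's stack pointer sits just above the return address.
            return candidate, sp + WORD
        sp += WORD
    return None


# Hypothetical data: one module and four stack words starting at rsp.
modules = [("firefox-bin", 0x400000, 0x10000)]
stack_base = 0x66F29CC8
stack = struct.pack("<4Q", 0x1, 0x66F29D20, 0x400123, 0x0)

result = scan_for_return_address(stack, stack_base, stack_base, modules)
print(result and tuple(hex(v) for v in result))
```

A scan like this can be fooled by stale code pointers left on the stack, so frames recovered this way are less trustworthy than frames recovered from CFI.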

The result is that I never get useful stacks for non-main threads, and often don't get useful stacks even for the main thread.

This doesn't block me personally because I can use 32-bit builds.
For x86_64, does it make sense to unwind on the client, given that it has access to the CFI?  AFAIK, it doesn't need symbols to get return addresses because the CFI is always(?) there.
That would involve completely rearchitecting how Breakpad works.
Although we might be able to figure out some sort of clever hack where the client code uses the unwind data and inserts some extra metadata into the dump to give the stackwalker a hand.
Adding the scanning heuristic shouldn't be that hard.  If anyone wants to try it, find me on #breakpad and ask for help.
This is going to be more important since we're shipping 64-bit Linux and OS X binaries for Firefox 4. Any stack that starts out in an unknown module (like a system library) is going to stop there. Also, we don't have symbols for most Linux/OS X system libraries, so we'll see this a lot. For example, I'm testing OS X plugin hang reporting, and all my stacks look like:
http://crash-stats.mozilla.com/report/index/bp-ec518ff9-396b-4089-a477-56e742100824
Summary: minidump_stackwalk for x86_64 doesn't scan the stack for return address, so it fails when symbols are imcomplete → minidump_stackwalk for x86_64 doesn't scan the stack for return address, so it fails when symbols are incomplete
blocking2.0: --- → ?
Seems like we need some solution for this in Firefox 4, or we risk not understanding an increasing percentage of our crash data.
Needs an owner; blocking for the scanning heuristic.
blocking2.0: ? → betaN+
Assignee: nobody → ted.mielczarek
Status: NEW → ASSIGNED
Patches are up for review:
http://breakpad.appspot.com/205001 (has r+ jimb already)
http://breakpad.appspot.com/206001

Should be able to get these landed upstream tomorrow, then get Socorro updated fairly soon after that.
Depends on: 601114
Blocks: 601117
Looks good:
http://crash-stats.mozilla.com/report/index/bp-3bd4810d-79da-478e-99bc-9e5572101006
(click "Show/hide other threads" to see more useful stacks)
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
(In reply to comment #12)
> Looks good:
> http://crash-stats.mozilla.com/report/index/bp-3bd4810d-79da-478e-99bc-9e5572101006
> (click "Show/hide other threads" to see more useful stacks)

That crash (and, I see, this bug) is on Linux.

I had a crash today on Mac OS X, and it still doesn't have a stack trace:
http://crash-stats.mozilla.com/report/index/bp-b0c7ac9d-83aa-42e4-aee6-2adb12101010

Perhaps bug 601312 isn't a duplicate of this after all?