frame pointer-based unwinding for aarch64 is not so great
Categories
(Toolkit :: Crash Reporting, defect, P3)
Tracking
()
People
(Reporter: froydnj, Unassigned)
References
(Blocks 1 open bug)
Details
The current stackwalker for aarch64 on windows does frame-pointer based unwinding. This kind of unwinding is currently busted because we're not actually compiling everything with frame pointers (bug 1525630), but even once that gets fixed, there are still bugs.
Quick terminology note: fp/lr/sp are the frame pointer, link register, and stack pointer, respectively, and are aliases for the actual hardware registers x29/x30/x31.
Your typical stack frame with frame pointers on aarch64 is going to look like:
caller's sp ---> +----------------------------------+
| caller's frame pointer (8 bytes) |
+----------------------------------+
| return address (8 bytes) |
fp ---> +----------------------------------+
| |
| ........................ |
| |
sp ---> +----------------------------------+
The return address slot in the above diagram is going to be the value of lr on entry to the function.
The current unwinding code in breakpad lives here:
https://chromium.googlesource.com/breakpad/breakpad/+/master/src/processor/stackwalker_arm64.cc#189
https://searchfox.org/mozilla-central/rev/490ab7f9b84570573a49d7fa018673ce0d5ddf22/toolkit/crashreporter/google-breakpad/src/processor/stackwalker_arm64.cc#189-218
This currently matches what we have in m-c and what socorro is using to process crash dumps.
This bit parses out the two words for the return address and the previous frame pointer:
That part is fine, but the dodgy part comes in how we're going to fill in the previous frame:
The previous fp and sp are fine, those make sense. However, the assignment for pc in the previous frame is (mostly) incorrect: we should be using the return address that we saved to the stack. Instead we're using lr, which means that if lr is ever modified by a function, i.e.:
- it decides to call another function, modifying lr; or
- it decides to use lr for some other purpose (temporary values)
we're going to compute a completely bogus pc for our previous frame.
The code in breakpad is more correct for certain kinds of functions, e.g. most crashes will feature a bunch of threads like:
Thread 67
0 ntdll.dll!NtWaitForSingleObject + 0x4
and at that point, it's guaranteed that we can't have set up fp for our new function, so fp actually points to the frame for the caller of NtWaitForSingleObject, and if we used the recovered lr of the frame (as we should be doing), rather than the value of lr (as we're currently doing), we'd skip a frame. But I think that's preferable to unwinding off into the weeds if you get a bogus lr.
I can't tell whether this makes any difference on our current crashes, because our current crashes don't have frame pointers and I haven't been able to make a build with frame pointers give me a minidump that I can twiddle with. But I'm reasonable sure it's going to make a difference on crashes we get in the future.
Comment 1•5 years ago
|
||
For comparison's sake I built a little Breakpad test program I have for aarch64-linux and ran it under qemu to generate a minidump. I tarred up the binary, symbols, and minidump here:
https://drive.google.com/file/d/1eqxS5wk4ivksqcDAxEw2ho6RTNqjbjXZ/view?usp=sharing
minidump_stackwalk seems to be able to use frame pointers to unwind if I don't give it the symbols to use, although it loses the last frame, and the frame pointer values don't match up with the ones it recovers when unwinding with symbols.
Without symbols I get:
Thread 0 (crashed)
0 test_crash + 0x23f0
x0 = 0x7ffffffff0dea000 x1 = 0x0000000000000001
x2 = 0x0000000000000000 x3 = 0x0000000000000000
x4 = 0x0000aaaab2980408 x5 = 0x0000000000000000
x6 = 0x0000000000000001 x7 = 0x0000aaaab9736010
x8 = 0x0000000000000000 x9 = 0x0000ffffa2c0b400
x10 = 0x0000000000000040 x11 = 0x0000000000000000
x12 = 0x0000000000000002 x13 = 0x0000000000000000
x14 = 0x0000000000000000 x15 = 0x0000000000000001
x16 = 0x0000aaaab297fe20 x17 = 0x0000ffffa2bc3558
x18 = 0x000000000000000f x19 = 0x0000aaaab296a5f8
x20 = 0x0000000000000000 x21 = 0x0000aaaab295a2ac
x22 = 0x0000000000000000 x23 = 0x0000000000000000
x24 = 0x0000000000000000 x25 = 0x0000000000000000
x26 = 0x0000000000000000 x27 = 0x0000000000000000
x28 = 0x0000000000000000 fp = 0x0000ffffca04b590
lr = 0x0000aaaab295a4c4 sp = 0x0000ffffca04b580
pc = 0x0000aaaab295a3f0
Found by: given as instruction pointer in context
1 test_crash + 0x24c0
fp = 0x0000ffffca04b750 lr = 0x0000ffffa28aed24
sp = 0x0000ffffca04b5a0 pc = 0x0000aaaab295a4c4
Found by: previous frame's frame pointer
2 libc-2.28.so + 0x20d20
fp = 0x0000000000000000 lr = 0x0000aaaab295a2e0
sp = 0x0000ffffca04b760 pc = 0x0000ffffa28aed24
Found by: previous frame's frame pointer
With symbols I get:
Thread 0 (crashed)
0 test_crash!do_crash(int volatile*) + 0x10
x0 = 0x7ffffffff0dea000 x1 = 0x0000000000000001
x2 = 0x0000000000000000 x3 = 0x0000000000000000
x4 = 0x0000aaaab2980408 x5 = 0x0000000000000000
x6 = 0x0000000000000001 x7 = 0x0000aaaab9736010
x8 = 0x0000000000000000 x9 = 0x0000ffffa2c0b400
x10 = 0x0000000000000040 x11 = 0x0000000000000000
x12 = 0x0000000000000002 x13 = 0x0000000000000000
x14 = 0x0000000000000000 x15 = 0x0000000000000001
x16 = 0x0000aaaab297fe20 x17 = 0x0000ffffa2bc3558
x18 = 0x000000000000000f x19 = 0x0000aaaab296a5f8
x20 = 0x0000000000000000 x21 = 0x0000aaaab295a2ac
x22 = 0x0000000000000000 x23 = 0x0000000000000000
x24 = 0x0000000000000000 x25 = 0x0000000000000000
x26 = 0x0000000000000000 x27 = 0x0000000000000000
x28 = 0x0000000000000000 fp = 0x0000ffffca04b590
lr = 0x0000aaaab295a4c4 sp = 0x0000ffffca04b580
pc = 0x0000aaaab295a3f0
Found by: given as instruction pointer in context
1 test_crash!main + 0xc0
x19 = 0x0000aaaab296a5f8 x20 = 0x0000000000000000
x21 = 0x0000aaaab295a2ac x22 = 0x0000000000000000
x23 = 0x0000000000000000 x24 = 0x0000000000000000
x25 = 0x0000000000000000 x26 = 0x0000000000000000
x27 = 0x0000000000000000 x28 = 0x0000000000000000
fp = 0x0000ffffca04b590 sp = 0x0000ffffca04b590
pc = 0x0000aaaab295a4c4
Found by: call frame info
2 libc-2.28.so!__libc_start_main + 0xe0
x19 = 0x0000aaaab296a5f8 x20 = 0x0000000000000000
x21 = 0x0000aaaab295a2ac x22 = 0x0000000000000000
x23 = 0x0000000000000000 x24 = 0x0000000000000000
x25 = 0x0000000000000000 x26 = 0x0000000000000000
x27 = 0x0000000000000000 x28 = 0x0000000000000000
fp = 0x0000ffffca04b750 sp = 0x0000ffffca04b750
pc = 0x0000ffffa28aed24
Found by: call frame info
3 test_crash!_start + 0x30
x19 = 0x0000000000000000 x20 = 0x0000000000000000
x21 = 0x0000aaaab295a2ac x22 = 0x0000000000000000
x23 = 0x0000000000000000 x24 = 0x0000000000000000
x25 = 0x0000000000000000 x26 = 0x0000000000000000
x27 = 0x0000000000000000 x28 = 0x0000000000000000
fp = 0x0000000000000000 sp = 0x0000ffffca04b890
pc = 0x0000aaaab295a2e0
Found by: call frame info
Updated•4 years ago
|
Updated•2 years ago
|
Description
•