Closed Bug 1673755 Opened 4 years ago Closed 3 years ago

Frame pointer unwinding on arm64 reports incorrect return addresses in arm64e system libraries due to pointer authentication

Categories

(Core :: Gecko Profiler, defect, P2)

ARM64
macOS
defect

Tracking

()

RESOLVED FIXED
91 Branch
Tracking Status
firefox91 --- fixed

People

(Reporter: mstange, Assigned: mstange)

References

Details

Attachments

(1 file, 1 obsolete file)

Profile: https://share.firefox.dev/3e0GGOR

Our frame pointer stack walking seems to report unexpected return addresses on macOS arm64 in some parts of the stack. It looks like the bottom 32bits of those addresses are correct, but the high 32bits are unexpectedly non-zero.

Examples from this profile:

  • 0x8940800103bccb08 should be 0x103bccb08
  • 0xe35580018cafe884 should be 0x18cafe884

The profile shows hex addresses instead of function names in two different cases: (1) If it knows which library an address was in, but doesn't have symbols for that library, or (2) if it doesn't know which library an address was in.
This bug is concerned only with (2). You can differentiate (1) from (2) by checking the call tree row for a library name after the address. For example, "0x18b0b178c AppKit" is a case of (1) (so not relevant here), because we know the address was in AppKit. And "0xd57000018dfe57b8" is a case of (2) - this address is outside any of the library address ranges, so it's likely a wrong address.

In this profile there are a number of occurrences of (2) in the code that is called by -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]. Those stack frames should be from the libraries HIToolbox and CoreFoundation. But because the addresses are wrong, we cannot associate those stack frames with the correct libraries.

The address ranges for HIToolbox and CoreFoundation in this profile are:

  • HIToolbox: 0x18cacd000 - 0x18cdd5000
  • CoreFoundation: 0x18df65000 - 0x18e3fe000

You can get the address ranges by executing the following in the web console on the profiler:

var lib = filteredThread.libs.find(l => l.name.includes('HIToolbox'));
`${lib.name}: 0x${lib.start.toString(16)} - 0x${lib.end.toString(16)}`
Severity: -- → S3
Priority: -- → P2

Here's a comment I wrote for myself about how frame pointer stack walking works:

    // Do a frame pointer stack walk. Code that is compiled with frame pointers
    // has the following function prologues and epilogues:
    //
    // Function prologue:
    // pushq  %rbp
    // movq   %rsp, %rbp
    //
    // Function epilogue:
    // popq   %rbp
    // ret
    //
    // Functions are called with callq; callq pushes the return address onto the stack.
    // When a function reaches its end, ret pops the return address from the stack and jumps to it.
    // So when a function is called, we have the following stack layout:
    //
    //                                                                     [... rest of the stack]
    //                                                                     ^ rsp           ^ rbp
    //     callq some_function
    //                                                   [return address]  [... rest of the stack]
    //                                                   ^ rsp                             ^ rbp
    //     pushq %rbp
    //                         [caller's frame pointer]  [return address]  [... rest of the stack]
    //                         ^ rsp                                                       ^ rbp
    //     movq %rsp, %rbp
    //                         [caller's frame pointer]  [return address]  [... rest of the stack]
    //                         ^ rsp, rbp
    //     <other instructions>
    //       [... more stack]  [caller's frame pointer]  [return address]  [... rest of the stack]
    //       ^ rsp             ^ rbp
    //
    // So: *rbp is the caller's frame pointer, and *(rbp + 8) is the return address.
    //
    // Or, in other words, the following linked list is built up on the stack:
    // #[repr(C)]
    // struct CallFrameInfo {
    //     previous: *const CallFrameInfo,
    //     return_address: *const c_void,
    // }
    // and rbp is a *const CallFrameInfo.

The above is obviously x86_64 specific.

Summary: Frame pointer unwinding on arm64 reports incorrect return addresses in some system libraries, unexpected values in the high 32 bits (tagged pointers?) → Frame pointer unwinding on arm64 reports incorrect return addresses in some system libraries, unexpected values in the high 32 bits (pointer authentication?)

HIToolbox and CoreFoundation are arm64e binaries, and the affected functions start with the pacibsp instruction.

https://developer.apple.com/documentation/security/preparing_your_app_to_work_with_pointer_authentication?language=objc says:

For example, a crash reporting library that examines the stack contents needs to strip the PAC out of return addresses. The Apple Clang compiler provides utilities in the ptrauth.h header file — like the ptrauth_strip macro — to help with these kinds of tasks.

Reading ptrauth.h, it seems like calling ptrauth_strip(framePointer, ptrauth_key_frame_pointer) and ptrauth_strip(returnAddress, ptrauth_key_return_address) in the right places should do the trick.

I don't know if this requires that we build as arm64e, though...

__PTRAUTH_INTRINSICS__ does not seem to be defined in our code (probably because the compiler is building for arm64 and not arm64e?), so ptrauth_strip is unlikely to work.
The Big Sur SDK adds arm_thread_state64_ptrauth_strip, which might work.

(In reply to Markus Stange [:mstange] from comment #6)

The Big Sur SDK adds arm_thread_state64_ptrauth_strip, which might work.

No, that just calls through to ptrauth_strip.

Breakpad computes its own address mask:

    // ARM64 supports storing pointer authentication codes in the upper bits of
    // a pointer. Make a best guess at the range of valid addresses based on the
    // range of loaded modules.
    const CodeModule *high_module =
        modules->GetModuleAtSequence(modules->module_count() - 1);
    uint64_t mask = high_module->base_address() + high_module->size();
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    mask |= mask >> 32;
    address_range_mask_ = mask;

(In reply to Markus Stange [:mstange] from comment #5)

I don't know if this requires that we build as arm64e, though...

At the moment, the ABI for arm64e is not stable, so it's not advised to build as arm64e.

Probably-naive thought: Since we're using these decorated addresses to nearly-match addresses in libraries (i.e., we don't actually use them to jump somewhere, or even read memory), it seems fine to me if we did best-effort masking (I'm guessing like breakpad does, as you said in comment 8?) Of course, if there are blessed APIs that we can use and that do the correct job, we should try to use them eventually.

On a side note, if stack-walking already works, and that's the only issue here, this sounds like a good thing overall for getting the profiler to fully work on aarm64, right? Thank you for working on this!

(In reply to Gerald Squelart [:gerald] (he/him) from comment #10)

Probably-naive thought: Since we're using these decorated addresses to nearly-match addresses in libraries (i.e., we don't actually use them to jump somewhere, or even read memory), it seems fine to me if we did best-effort masking (I'm guessing like breakpad does, as you said in comment 8?) Of course, if there are blessed APIs that we can use and that do the correct job, we should try to use them eventually.

I agree, let's try to do something like what Breakpad does. One correction: It seems that, at least theoretically, the framepointer itself can also be an authenticated pointer, and that's a pointer that we do de-reference, to find the next call frame in the stack. But since we know that it should be pointing into stack memory, and we already know where the stack ends, masking off extra bits from the framepointer is easy.

On a side note, if stack-walking already works, and that's the only issue here, this sounds like a good thing overall for getting the profiler to fully work on aarm64, right? Thank you for working on this!

Yes, I was surprised too! See bug 1648662 comment 2.

Group: mozilla-employee-confidential
Summary: Frame pointer unwinding on arm64 reports incorrect return addresses in some system libraries, unexpected values in the high 32 bits (pointer authentication?) → Frame pointer unwinding on arm64 reports incorrect return addresses in arm64e system libraries due to pointer authentication

The easiest fix here would be to pick a fixed mask that works today, and change it once it stops working. 24 bits hash + 40 bits pointer seems to work well. The 24+40 split is also mentioned on https://lwn.net/Articles/718888/ and in various other resources: https://www.google.com/search?q=arm64e+pointer+authentication+40+24+bits

Assignee: nobody → mstange.moz
Status: NEW → ASSIGNED
Attachment #9184137 - Attachment is obsolete: true

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:mstange, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(mstange.moz)
Flags: needinfo?(gsquelart)

I'm happy for it to land whenever possible. I'll let Markus, as the author, decide when to land it (maybe there's a reason for the delay?), after the Mozilla Wellness week + potential holidays...

Flags: needinfo?(gsquelart)
Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/1536bb082ac5
Strip pointer authentication hashes during stack walking on macOS arm64. r=gerald

There was one comment left to address, and I was on PTO. I've updated the patch and landed it.

Flags: needinfo?(mstange.moz)
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 91 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: