Closed Bug 1906827 Opened 1 year ago Closed 9 days ago

Failed to hook GetFileAttributesW from kernel32.dll in Windows ARM devices

Categories

(Core :: DLL Services, defect)

ARM
Windows
defect

Tracking

()

VERIFIED FIXED
146 Branch
Tracking Status
firefox128 --- wontfix
firefox129 --- wontfix
firefox146 --- verified

People

(Reporter: bmaris, Assigned: gstoll)

References

Details

Attachments

(4 files)

Found in

  • Firefox 129.0b1

Affected versions

  • Firefox 129.0b1

Tested platforms

  • Affected platforms: Windows 10 ARM and Windows 11 ARM
  • Unaffected platforms: Windows 10 (non ARM), Windows 11 (non ARM), macOS and Ubuntu 22.04

Steps to reproduce

  1. Download cppunittest.tests
  2. Run TestDLLInterceptor.exe

Expected result

  • All tests PASS successfully.

Actual result

  • Failed to hook GetFileAttributesW from kernel32.dll

Regression range

  • Not sure if this is a regression or not, I got the same error with 128.0b1, did not find any tests in archive.mozilla.org for 127beta or later, not sure if the file containing the tests are different for each build or not.

Additional notes

  • I also tried
  • Input from CMD:
TEST-UNEXPECTED-FAIL | WindowsDllInterceptor | Failed to hook GetFileAttributesW from kernel32.dll
    First 16 bytes of function:
    30 04 00 90 10 СА 44 F9 00 02 1F D6 00 00 00 00
  • I also noticed that there are is no TestDllInterceptorCrossProcess.exe file for ARM, not sure if this is expected or not.

I removed the affected flag from 130 since running the TestDLLInterceptor.exe with latest Nightly It only ran 6 tests and all of them passed except one that had the following error A wrong detour errorcode was set on detour error. Anyway I did not see GetFileAttributesW in the terminal for Nightly.

Component: mozglue → DLL Services

This may be higher than S3 severity if this is pointing to our DLL blocking not functioning as expected on ARM64 devices.

Summary: Failed to hook GetFileAttributesW from kernel32.dll in Windows11 ARM device → Failed to hook GetFileAttributesW from kernel32.dll in Windows ARM devices

I can't find a way to run this test on treeherder on ARM64, which is unfortunate.

If I'm doing this right, the instructions we're failing to patch are:

0x0000000000000000:  30 04 00 90    adrp x16, #0x84000
0x0000000000000004:  10 CA 44 F9    ldr  x16, [x16, #0x990]
0x0000000000000008:  00 02 1F D6    br   x16

Looking at the code, it seems like we should be decoding the adrp instruction correctly, so I'm not sure why this would fail...

Do we actually detour GetFileAttributesW anywhere outside tests?

Anyway, we always print all 16 bytes because that's where we will encounter issues, and the failure isn't necessarily with the first instruction. The code states:

    // The number of bytes required to facilitate a detour depends on the
    // proximity of the hook function to the target function. In the best case,
    // we can branch within +/- 128MB of the current location, requiring only
    // 4 bytes. In the worst case, we need 16 bytes to load an absolute address
    // into a register and then branch to it.

And so it is likely that all four instructions need to go through the loop below:

    while (origBytes.GetOffset() < bytesRequiredFromDecode) {
      uintptr_t curPC = origBytes.GetCurrentAbsolute();
      uint32_t curInst = origBytes.ReadNextInstruction();

      Result<arm64::LoadOrBranch, arm64::PCRelCheckError> pcRelInfo =
          arm64::CheckForPCRel(curPC, curInst);
      if (pcRelInfo.isErr()) {
        if (pcRelInfo.unwrapErr() ==
            arm64::PCRelCheckError::InstructionNotPCRel) {
          // Instruction is not PC-relative, we can just copy it verbatim
          tramp.WriteInstruction(curInst);
          continue;
        }

        // At this point we have determined that there is no decoder available
        // for the current, PC-relative, instruction.

        // origBytes is now pointing one instruction past the one that we
        // need the trampoline to jump back to.
        if (!origBytes.BackUpOneInstruction()) {
          return;
        }

        break;
      }

      // We need to load an absolute address into a particular register
      tramp.WriteLoadLiteral(pcRelInfo.inspect().mAbsAddress,
                             pcRelInfo.inspect().mDestReg);
    }

I don't understand everything in that code, but if I replicate our logic with a Python script:

patterns = [
    (0x9F000000, 0x10000000, 0),
    (0x9F000000, 0x90000000, 1),
    (0xFF000000, 0x58000000, 0),
    (0x3B000000, 0x18000000, 0),
    (0x7C000000, 0x14000000, 0),
    (0xFE000000, 0x54000000, 0),
    (0x7E000000, 0x34000000, 0),
    (0x7E000000, 0x36000000, 0),
    (0xFE000000, 0xD6000000, 0) 
]

shcode = b"\x30\x04\x00\x90\x10\xCA\x44\xF9\x00\x02\x1F\xD6\x00\x00\x00\x00"
import struct
codes = struct.unpack("<IIII", shcode)

for code in codes:
    for (i, (mask, value, is_implemented)) in enumerate(patterns):
        if (code & mask) == value:
            print("PC-rel", i, is_implemented)
            break
    else:
        print("Not PC-rel")

I get:

PC-rel 1 1
Not PC-rel
PC-rel 8 0
Not PC-rel

And so:

  • the first instruction is recognized as a PC-relative instruction that we know how to handle (index 1 -> ADRP);
  • the second instruction is recognized as non-PC-relative so easy to deal with;
  • the third instruction is recognized as a PC-relative instruction that we don't know how to handle (index 8 -> Unconditional branch (reg));
  • the fourth instruction is recognized as non-PC-relative so easy to deal with.

So I would assume that the problem lies with the third instruction here (Unconditional branch (reg) being associated with nullptr). And, while I agree with our code that this is indeed an unconditional branch instruction using a register value, I am not sure why our code categorizes these instructions as PC-relative instructions. This feels like a miscategorization, and in this case we should be able to solve the problem by simply removing the Unconditional branch (reg) line.

You're right, I don't think we hook GetFileAttributesW anywhere (outside of tests).

Ah, thanks - I had looked at the other instructions, but I figured the branch was obviously not PC-relative so I didn't run through the code there :-)

This seems like a pretty minor thing but it feels like we might as well fix it. I'll put up a patch later. Thanks!

The previous check was overly broad and would detect BR instructions, which
are an unconditional branch and aren't PC-relative.

I think it was trying to detect BLR instructions, which do an unconditional
branch but also set a register to a value that depends on PC. We should be
detecting these.

Based on the ARM64 documentation:

I think this condition should properly catch BLR family instructions but
not BR family instructions.

Assignee: nobody → gstoll
Status: NEW → ASSIGNED
Attachment #9520574 - Attachment description: Bug 1906827 - correctly handle BR instructions for patching in ARM64 r=yjuglaret! → Bug 1906827 part 1 - correctly handle BR instructions for patching in ARM64 r=yjuglaret!

Windows on AArch64 is a tier 1 target, and we'd like to run more (i.e.
some) tests there. I don't know if there's a plan to run everything on
AArch64, but this is an easy way to get some coverage, and as the bug
mentions some of these tests have been failing for a while on AArch64.

Pushed by gstoll@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/803b433de9ce https://hg.mozilla.org/integration/autoland/rev/f03cf5ad5366 part 1 - correctly handle BR instructions for patching in ARM64 r=yjuglaret,win-reviewers https://github.com/mozilla-firefox/firefox/commit/ad7e8a051e7d https://hg.mozilla.org/integration/autoland/rev/2728861d8989 part 2 - add tests and make all cppunittests pass on ARM64 r=yjuglaret https://github.com/mozilla-firefox/firefox/commit/a22f55826395 https://hg.mozilla.org/integration/autoland/rev/be30a3c36f16 part 3 - make cppunittests run on AArch64 r=ci-and-tooling,taskgraph-reviewers,jmaher,yjuglaret
Pushed by agoloman@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/eea9eb667177 https://hg.mozilla.org/integration/autoland/rev/fd052373445c Revert "Bug 1906827 part 3 - make cppunittests run on AArch64 r=ci-and-tooling,taskgraph-reviewers,jmaher,yjuglaret" for causing snap failures.
Pushed by agoloman@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/748f63a92260 https://hg.mozilla.org/integration/autoland/rev/f946fc52b418 part 1 - correctly handle BR instructions for patching in ARM64 r=yjuglaret,win-reviewers https://github.com/mozilla-firefox/firefox/commit/2cacd6b77836 https://hg.mozilla.org/integration/autoland/rev/5bc77c39f61b part 2 - add tests and make all cppunittests pass on ARM64 r=yjuglaret https://github.com/mozilla-firefox/firefox/commit/1059f0540149 https://hg.mozilla.org/integration/autoland/rev/6386ea87c58d part 3 - make cppunittests run on AArch64 r=ci-and-tooling,taskgraph-reviewers,jmaher,yjuglaret
Status: ASSIGNED → RESOLVED
Closed: 9 days ago
Resolution: --- → FIXED
Target Milestone: --- → 146 Branch

Looks fixed to me using latest Nightly 146.0a1 build from today and I'll mark this as verified for 146, but unfortunately there is another test that failed for me on my Windows 11 ARM but interestingly enough not on Windows 10 ARM (different devices but the same laptop model with same configurations), TrackPopupMenu.

I logged bug 1997530 for it.

Status: RESOLVED → VERIFIED
QA Whiteboard: [qa-ver-done-c146/b145]
Regressed by: 1997530
No longer regressed by: 1997530
See Also: → 1997530
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: