1843991 - [ARM64] Incorrect trap site metadata for Wasm-GC access

P1umer

Reporter

Description

•

2 years ago

Attached file poc1.js — Details

Steps to reproduce:

The vulnerability exists in the WASM engine. The configuration for building is as follows:

ac_add_options --enable-project=js
ac_add_options --enable-optimize
ac_add_options --disable-debug
ac_add_options --enable-debug-symbols
ac_add_options --disable-tests
ac_add_options --disable-jemalloc

Commit: dab12ef62d97d9e2b3a44472e6355f4ff8017bfd

Flags:
--wasm-function-references
--wasm-gc
--wasm-compiler=optimizing

Actual results:

This is the full crash log:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x40)
    frame #0: 0x0000107532a70cd4
->  0x107532a70cd4: ldr    x9, [x11, #0x40]
    0x107532a70cd8: cmp    x23, x9
    0x107532a70cdc: b.eq   0x107532a70d20
    0x107532a70ce0: str    x23, [x28, #0x8]
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x40)
  * frame #0: 0x0000107532a70cd4
    frame #1: 0x0000107532a71c80
    frame #2: 0x00000001010d008c js`js::wasm::Instance::callExport(this=<unavailable>, cx=<unavailable>, funcIndex=<unavailable>, args=<unavailable>, level=<unavailable>) at WasmInstance.cpp:2620:10 [opt]
    frame #3: 0x00000001010fed10 js`WasmCall(cx=0x0000000103808200, argc=<unavailable>, vp=<unavailable>) at WasmJS.cpp:2080:19 [opt]
    frame #4: 0x00000001000bf1ac js`js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) [inlined] CallJSNative(cx=0x0000000103808200, native=(js`WasmCall(JSContext*, unsigned int, JS::Value*) at WasmJS.cpp:2074), reason=<unavailable>, args=0x000000016fdfe840) at Interpreter.cpp:486:13 [opt]
    frame #5: 0x00000001000bf054 js`js::InternalCallOrConstruct(cx=0x0000000103808200, args=0x000000016fdfe840, construct=<unavailable>, reason=<unavailable>) at Interpreter.cpp:580:12 [opt]
    frame #6: 0x00000001000d30ec js`js::Interpret(JSContext*, js::RunState&) [inlined] InternalCall(cx=0x0000000103808200, args=0x000000016fdfe840, reason=<unavailable>) at Interpreter.cpp:647:10 [opt]
    frame #7: 0x00000001000d30dc js`js::Interpret(JSContext*, js::RunState&) [inlined] js::CallFromStack(cx=0x0000000103808200, args=0x000000016fdfe840, reason=<unavailable>) at Interpreter.cpp:652:10 [opt]
    frame #8: 0x00000001000d30dc js`js::Interpret(cx=0x0000000103808200, state=0x000000016fdfe980) at Interpreter.cpp:3395:16 [opt]
    frame #9: 0x00000001000be678 js`js::RunScript(JSContext*, js::RunState&) [inlined] MaybeEnterInterpreterTrampoline(cx=0x0000000103808200, state=0x000000016fdfe980) at Interpreter.cpp:400:10 [opt]
    frame #10: 0x00000001000be654 js`js::RunScript(cx=<unavailable>, state=0x000000016fdfe980) at Interpreter.cpp:458:13 [opt]
    frame #11: 0x00000001000c16f8 js`js::Execute(JSContext*, JS::Handle<JSScript*>, JS::Handle<JSObject*>, JS::MutableHandle<JS::Value>) [inlined] js::ExecuteKernel(cx=0x0000000103808200, script=JS::HandleScript @ x22, envChainArg=JS::HandleObject @ x21, evalInFrame=(ptr_ = 0), result=JS::MutableHandleValue @ x20) at Interpreter.cpp:845:13 [opt]
    frame #12: 0x00000001000c15e0 js`js::Execute(cx=0x0000000103808200, script=JS::HandleScript @ x22, envChain=JS::HandleObject @ x21, rval=JS::MutableHandleValue @ x20) at Interpreter.cpp:877:10 [opt]
    frame #13: 0x00000001001e067c js`ExecuteScript(cx=<unavailable>, envChain=<unavailable>, script=<unavailable>, rval=<unavailable>) at CompilationAndEvaluation.cpp:493:10 [opt] [artificial]
    frame #14: 0x00000001001e0808 js`JS_ExecuteScript(cx=<unavailable>, scriptArg=<unavailable>) at CompilationAndEvaluation.cpp:517:10 [opt]
    frame #15: 0x0000000100060a18 js`RunFile(cx=0x0000000103808200, filename="1.txt", file=<unavailable>, compileMethod=<unavailable>, compileOnly=false, fullParse=false) at js.cpp:1104:10 [opt]
    frame #16: 0x000000010006056c js`Process(cx=0x0000000103808200, filename="1.txt", forceTTY=false, kind=FileScript) at js.cpp:1684:14 [opt]
    frame #17: 0x0000000100012c2c js`Shell(JSContext*, js::cli::OptionParser*) at js.cpp:10746:10 [opt]
    frame #18: 0x0000000100012bcc js`Shell(cx=0x0000000103808200, op=0x000000016fdff0f0) at js.cpp:10970:12 [opt]
    frame #19: 0x0000000100009048 js`main(argc=<unavailable>, argv=<unavailable>) at js.cpp:11402:12 [opt]
    frame #20: 0x0000000182b83f28 dyld`start + 2236

P1umer

Reporter

Comment 1

•

2 years ago

The vulnerability exists in the Javascript engine under the ARM64 architecture, so it needs to be reproduced in the gecko in ARM64.

P1umer

Reporter

Updated

•

2 years ago

Group: core-security

Andrew McCreight [:mccr8]

Updated

•

2 years ago

Group: core-security → javascript-core-security

Ryan Hunt [:rhunt]

Comment 2

•

2 years ago

Julian, can you triage this?

Assignee: nobody → jseward

Andrew McCreight [:mccr8]

Comment 3

•

2 years ago

The revision in comment 0 appears to be a git hash from gecko-dev. The hg revision is here.

In the future, please use an hg revision, or at least specify where you are getting your revision from if you are using some git thing. Thanks.

Daniel Veditz [:dveditz]

Comment 4

•

2 years ago

wasm-gc is not enabled in shipping builds so I'll call the Firefox status "disabled"

status-firefox117: --- → disabled

P1umer

Reporter

Comment 5

•

2 years ago

(In reply to Daniel Veditz [:dveditz] from comment #4)

wasm-gc is not enabled in shipping builds so I'll call the Firefox status "disabled"

I just checked that this flag "wasm_gc" can be manually enabled in the nightly build version.

P1umer

Reporter

Comment 6

•

2 years ago

(In reply to Andrew McCreight [:mccr8] from comment #3)

The revision in comment 0 appears to be a git hash from gecko-dev. The hg revision is here.

In the future, please use an hg revision, or at least specify where you are getting your revision from if you are using some git thing. Thanks.

Sorry for that, I will add this info in the future report.

Julian Seward [:jseward]

Assignee

Comment 7

•

2 years ago

Reproduced on aarch64-linux (Debian 12), Cortex A72, with
../src/configure --disable-debug --enable-optimize="-g -O2" --disable-jemalloc --disable-tests --disable-shared-js --disable-sysroot
./dist/bin/js --wasm-function-references --wasm-gc --wasm-compiler=ion bug1843991_testcase.js

=> 0x00001f0866050cd4:  ldr     x9, [x11, #64]
   0x00001f0866050cd8:  cmp     x23, x9
   0x00001f0866050cdc:  b.eq    0x1f0866050d20  // b.none
   0x00001f0866050ce0:  str     x23, [x28, #8]

Julian Seward [:jseward]

Assignee

Comment 8

•

2 years ago

Initial observations:

is arm64 specific. Reproduces on arm64 (both native and simulator) but not on arm32, x86 or x86_64
requires a no-assertions build (--disable-debug); but any optimisation level is ok
requires --wasm-compiler=ion; =baseline is unaffected

The test case appears to do a (wasm-gc) null pointer dereference, which is caught by our runtime as we'd expect. Except -- it is not caught in the case of this bug; so it shows up as a segfault that takes out the process. I wonder if the table(s) that track instructions that might segfault as part of a wasm-gc null check, are incomplete or incorrect in this case.

BugBot [:suhaib / :marco/ :calixte]

Comment 9

•

2 years ago

The bug has a release status flag that shows some version of Firefox is affected, thus it will be considered confirmed.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Julian Seward [:jseward]

Assignee

Comment 10

•

2 years ago

(In reply to Julian Seward [:jseward] from comment #8)

I wonder if the table(s) that track instructions that might segfault as part of a wasm-gc null check, are incomplete or incorrect in this case.

Initial investigations confirm that is the case. In this example, the faulting instruction

=> 0x00001f0866050cd4:  ldr     x9, [x11, #64]

immediately follows a constant pool. It seems likely that the associated wasm::TrapSite was created without regard for whether a constant pool was inserted, and hence the recorded trap site refers to the first byte of the constant pool and not to the load instruction. Hence the resulting segfault (because x11 == 0) is not converted into a "RuntimeError: dereferencing null pointer". Instead it takes out the entire process.

The offending piece of code is in js/src/jit/MacroAssembler.cpp, MacroAssembler::wasmCallRef:

  append(wasm::Trap::NullPointerDereference,
         wasm::TrapSite(currentOffset(), trapOffset));
  loadPtr(Address(calleeFnObj, instanceSlotOffset), newInstanceTemp);

currentOffset() produces the offset before anything emitted by loadPtr(..), but the latter call can emit a constant pool before the load proper. Hence the wrong offset is recorded.

TL;DR:

Two approaches to fix this have been proposed:
- Use AutoForbidPoolsAndNops to disable pool creation before these insns. I believe this will not work reliably, for reasons explained in comment 11.
- Change loadPtr et al to return the buffer offset of the load insn. This requires reaching far down into the MacroAssembler/VIXL machinery, and will affect all targets, not just arm64. But it can lead to a reliable fix.
This problem could have occurred on any target that uses constant pools; we have to fix it for all targets.
This problem [of correctly registering faulting memory instructions] is a potential reliability hazard for us. Both constant pools and wasm-gc insns that fault, are relatively rare, which makes it difficult to test any fix properly. For this reason I would propose additionally to add assertions that run at the end of wasm compilation. These should inspect the instruction for each recorded trapsite, to ensure (1) it is not in a constant pool, and (2) it actually is (or probably is) a load or store instruction.
This problem is similar to one we previously had with stackmaps for wasm: it proved difficult to reliably get correct buffer offsets for the call insn to be associated with each stackmap. This was solved the same way as proposed here: get the call-generating methods to return the offsets, and add the abovementioned assertions.
Hence, fixing this could be quite some work.
I am inclined to think this bug is not much of a security hazard. The effect is that wasm null-object accesses that should be caught are not caught, hence taking out the entire process. On that basis, it is at best a DOS hazard.

Julian Seward [:jseward]

Assignee

Comment 11

•

2 years ago

(In reply to Julian Seward [:jseward] from comment #10)

Use AutoForbidPoolsAndNops to disable pool creation before these insns. I believe this will not work reliably, [..]

AutoForbidPoolsAndNops solves part of the problem -- that of constant pools -- but not the whole problem.

The offset that we need is that of the first instruction in the sequence that might segfault -- that's the key concept here. However, at least for arm64, even if we avoid constant pools, we cannot just record the buffer offset at the start or end of the load/store sequence generated by calling loadPtr. This is because the assembler can place non-load/store instructions both at the start and end of the sequence. We see this in MacroAssembler::LoadStoreMacro in js/src/jit/arm64/vixl/MacroAssembler-vixl.cpp:

before the load/store(s):

   Mov(temp, addr.offset());
   LoadStore(rt, MemOperand(addr.base(), temp), op);

after the load/store(s):

   LoadStore(rt, MemOperand(addr.base()), op);
   Add(addr.base(), addr.base(), Operand(offset));

Hence a proper fix must involve LoadStore(..) returning the offset of the load/store itself. And that plumbing has to go all the way down to the bottom of the assembler in order to avoid constant pools -- it exists partially already.

This leads to potential confusion, because there are already many assembler methods of the form

   BufferOffset MacroAssembler::fooble(..)

and those BufferOffsets are interpreted to mean "the buffer offset of the start of the sequence generated by the call". Now we are proposing to add similar looking method signatures, but with the BufferOffset interpreted to mean "the buffer offset of the first possibly segfaulting instruction in the sequence". Allowing these two different concepts to be confused seems to me to be asking for trouble.

Therefore I'd propose we have two different types, or (better?) template it, so as to give BufferOffset<OfStartOfSequence> and BufferOffset<OfFirstFaultingInsnInSequence>.

Julian Seward [:jseward]

Assignee

Comment 12

•

2 years ago

Bug 1843631 is the same problem, except on 32-bit x86.

P1umer

Reporter

Updated

•

2 years ago

Flags: sec-bounty?

Andrew McCreight [:mccr8]

Comment 13

•

2 years ago

Null derefs aren't sec issues, so I'm unhiding this.

Group: javascript-core-security

Andrew McCreight [:mccr8]

Updated

•

2 years ago

Comment 14

•

2 years ago

I am not completely sure if adding a whole new return type across as many functions is worth the problem being solved.
Could this be solved by making a variant of currentOffset function which assert that the previous/next instruction is among the set of faulting instructions?

Julian Seward [:jseward]

Assignee

Updated

•

1 years ago

Depends on: 1846474

Ryan Hunt [:rhunt]

Updated

•

1 years ago

Severity: -- → S2

Priority: -- → P1

Summary: [WASM][ARM64] SEGV during WasmCall → [ARM64] Incorrect trap site metadata for Wasm-GC access

Julian Seward [:jseward]

Assignee

Comment 15

•

1 year ago

Closing this on the basis that it has been fixed by the landing of bug
1846474. Unfortunately I can no longer run the test case to verify that,
since the test case uses the pre-final wasm-gc instruction encoding, and SM
recently changed to the final encoding.

If the bug still exists, please feel free to re-open. @P1umer, thank you for
finding and reporting this.

Status: NEW → RESOLVED

Closed: 1 year ago

Duplicate of bug: 1846474

Resolution: --- → DUPLICATE

Daniel Veditz [:dveditz]

Updated

•

1 year ago

Flags: sec-bounty? → sec-bounty-

David Lawrence [:dkl]

Updated

•

8 months ago

Keywords: reporter-external