Closed Bug 1707774 (CVE-2021-29981) Opened 3 years ago Closed 3 years ago

Live range splitting can lead to conflicting assignments (was: Assertion failure: *def->output() == alloc, at jit/RegisterAllocator.cpp:257)

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

All
macOS
defect

Tracking

()

RESOLVED FIXED
91 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox88 --- unaffected
firefox89 --- disabled
firefox90 --- wontfix
firefox91 --- fixed

People

(Reporter: gkw, Assigned: lth)

References

(Regression)

Details

(4 keywords, Whiteboard: [sec-survey][post-critsmash-triage][adv-main91+])

Attachments

(4 files, 5 obsolete files)

Attached file testcase.wasm

This is testcase.wrapper, testcase.wasm is attached.

new WebAssembly.Instance(new WebAssembly.Module(read(scriptArgs[0], "binary")));
Process 36979 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000100977330 js-dbg-64-darwin-arm64-15253a3168d2`_OUTLINED_FUNCTION_0 [opt]
js-dbg-64-darwin-arm64-15253a3168d2`_OUTLINED_FUNCTION_0:
->  0x100977330 <+0>: str    w19, [x8]
    0x100977334 <+4>: b      0x101406e94               ; symbol stub for: abort

js-dbg-64-darwin-arm64-15253a3168d2`_OUTLINED_FUNCTION_1:
    0x100977338 <+0>: mov    w19, #0x476
    0x10097733c <+4>: mov    w2, #0x476
Target 0: (js-dbg-64-darwin-arm64-15253a3168d2) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000100977330 js-dbg-64-darwin-arm64-15253a3168d2`_OUTLINED_FUNCTION_0 [opt]
    frame #1: 0x000000010132dafc js-dbg-64-darwin-arm64-15253a3168d2`js::jit::AllocationIntegrityState::checkIntegrity(js::jit::LBlock*, js::jit::LInstruction*, unsigned int, js::jit::LAllocation) (.cold.17) at RegisterAllocator.cpp:39:19 [opt]
    frame #2: 0x000000010096d280 js-dbg-64-darwin-arm64-15253a3168d2`js::jit::AllocationIntegrityState::checkIntegrity(this=0x000000016fdf9a98, block=0x00000001060ebc28, ins=<unavailable>, vreg=7, alloc=(bits_ = 27)) at RegisterAllocator.cpp:257:9 [opt]
    frame #3: 0x000000010096bf80 js-dbg-64-darwin-arm64-15253a3168d2`js::jit::AllocationIntegrityState::check(this=0x000000016fdf9a98) at RegisterAllocator.cpp:207:14 [opt]
    frame #4: 0x00000001008ab05c js-dbg-64-darwin-arm64-15253a3168d2`js::jit::GenerateLIR(mir=0x000000016fdf9ef0) at Ion.cpp:1536:26 [opt]
    frame #5: 0x00000001009fe26c js-dbg-64-darwin-arm64-15253a3168d2`js::wasm::IonCompileFunctions(moduleEnv=0x000000016fdfda90, compilerEnv=<unavailable>, lifo=0x0000000105785820, inputs=<unavailable>, code=<unavailable>, error=<unavailable>) at WasmIonCompile.cpp:5760:23 [opt]
    frame #6: 0x00000001009e8c6c js-dbg-64-darwin-arm64-15253a3168d2`ExecuteCompileTask(task=0x0000000105785800, error=0x000000016fdfde50) at WasmGenerator.cpp:781:16 [opt]
    frame #7: 0x00000001009e9610 js-dbg-64-darwin-arm64-15253a3168d2`js::wasm::ModuleGenerator::finishFuncDefs() [inlined] js::wasm::ModuleGenerator::locallyCompileCurrentTask(this=0x000000016fdfc8d0) at WasmGenerator.cpp:842:8 [opt]
    frame #8: 0x00000001009e9608 js-dbg-64-darwin-arm64-15253a3168d2`js::wasm::ModuleGenerator::finishFuncDefs(this=0x000000016fdfc8d0) at WasmGenerator.cpp:980 [opt]
    frame #9: 0x00000001009c33b0 js-dbg-64-darwin-arm64-15253a3168d2`bool DecodeCodeSection<js::wasm::Decoder>(env=0x000000016fdfda90, d=0x000000016fdfc890, mg=0x000000016fdfc8d0) at WasmCompile.cpp:686:13 [opt]
    frame #10: 0x00000001009c304c js-dbg-64-darwin-arm64-15253a3168d2`js::wasm::CompileBuffer(args=<unavailable>, bytecode=<unavailable>, error=<unavailable>, warnings=<unavailable>, listener=<unavailable>) at WasmCompile.cpp:708:8 [opt]
    frame #11: 0x0000000100a1375c js-dbg-64-darwin-arm64-15253a3168d2`js::WasmModuleObject::construct(cx=0x0000000105743000, argc=<unavailable>, vp=<unavailable>) at WasmJS.cpp:1517:7 [opt]
    frame #12: 0x0000000100073eb0 js-dbg-64-darwin-arm64-15253a3168d2`CallJSNative(cx=0x0000000105743000, native=(js-dbg-64-darwin-arm64-15253a3168d2`js::WasmModuleObject::construct(JSContext*, unsigned int, JS::Value*) at WasmJS.cpp:1483), reason=<unavailable>, args=0x000000016fdfe198)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) at Interpreter.cpp:437:13 [opt]
    frame #13: 0x000000010007c5f4 js-dbg-64-darwin-arm64-15253a3168d2`CallJSNativeConstructor(cx=0x0000000105743000, native=(js-dbg-64-darwin-arm64-15253a3168d2`js::WasmModuleObject::construct(JSContext*, unsigned int, JS::Value*) at WasmJS.cpp:1483), args=0x000000016fdfe198)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) at Interpreter.cpp:453:8 [opt]
    frame #14: 0x00000001000748b0 js-dbg-64-darwin-arm64-15253a3168d2`InternalConstruct(cx=<unavailable>, args=0x000000016fdfe198) at Interpreter.cpp:626:14 [opt]
    frame #15: 0x00000001000745d8 js-dbg-64-darwin-arm64-15253a3168d2`js::ConstructFromStack(cx=<unavailable>, args=<unavailable>) at Interpreter.cpp:672:10 [opt] [artificial]
    frame #16: 0x000000010006b900 js-dbg-64-darwin-arm64-15253a3168d2`Interpret(cx=<unavailable>, state=0x000000016fdfe630) at Interpreter.cpp:3238:16 [opt]
    frame #17: 0x00000001000648f0 js-dbg-64-darwin-arm64-15253a3168d2`js::RunScript(cx=0x0000000105743000, state=0x000000016fdfe630) at Interpreter.cpp:406:13 [opt]
    frame #18: 0x00000001000750e4 js-dbg-64-darwin-arm64-15253a3168d2`js::ExecuteKernel(cx=0x0000000105743000, script=<unavailable>, envChainArg=JS::HandleObject @ x21, newTargetValue=<unavailable>, evalInFrame=(ptr_ = 0), result=JS::MutableHandleValue @ 0x000000016fdfe6a0) at Interpreter.cpp:776:13 [opt]
    frame #19: 0x0000000100075320 js-dbg-64-darwin-arm64-15253a3168d2`js::Execute(cx=0x0000000105743000, script=JS::HandleScript @ x21, envChain=JS::HandleObject @ x20, rval=JS::MutableHandleValue @ x19) at Interpreter.cpp:808:10 [opt]
    frame #20: 0x000000010015de94 js-dbg-64-darwin-arm64-15253a3168d2`ExecuteScript(cx=0x0000000105743000, envChain=JS::HandleObject @ 0x000000016fdfe780, script=JS::HandleScript @ 0x000000016fdfe778, rval=JS::MutableHandleValue @ x19) at CompilationAndEvaluation.cpp:491:10 [opt]
    frame #21: 0x000000010015e068 js-dbg-64-darwin-arm64-15253a3168d2`JS_ExecuteScript(cx=0x0000000105743000, scriptArg=JS::HandleScript @ x19) at CompilationAndEvaluation.cpp:515:10 [opt]
    frame #22: 0x00000001000463cc js-dbg-64-darwin-arm64-15253a3168d2`RunFile(cx=0x0000000105743000, filename=<unavailable>, file=<unavailable>, compileMethod=<unavailable>, compileOnly=false) at js.cpp:1065:10 [opt]
    frame #23: 0x0000000100045d68 js-dbg-64-darwin-arm64-15253a3168d2`Process(cx=0x0000000105743000, filename="testcase.wrapper", forceTTY=false, kind=FileScript) at js.cpp:1656:14 [opt]
    frame #24: 0x00000001000103b4 js-dbg-64-darwin-arm64-15253a3168d2`Shell(JSContext*, js::cli::OptionParser*, char**) at js.cpp:10934:10 [opt]
    frame #25: 0x0000000100010360 js-dbg-64-darwin-arm64-15253a3168d2`Shell(cx=<unavailable>, op=0x000000016fdff270, envp=<unavailable>) at js.cpp:11658 [opt]
    frame #26: 0x000000010000a908 js-dbg-64-darwin-arm64-15253a3168d2`main(argc=<unavailable>, argv=<unavailable>, envp=<unavailable>) at js.cpp:12613:12 [opt]
    frame #27: 0x000000019b1c5f34 libdyld.dylib`start + 4
The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/b3dfcb849dd2
user:        Julian Seward
date:        Wed Apr 14 07:43:27 2021 +0000
summary:     Bug 1686626 - Enable Ion by default for wasm on AArch64.  r=lth.

Run with --fuzzing-safe --no-threads --no-baseline --no-ion testcase.wrapper testcase.wasm, compile with AUTOCONF=/usr/local/Cellar/autoconf213/2.13/bin/autoconf213 AR=ar sh ./configure --enable-debug --with-ccache --enable-gczeal --enable-debug-symbols --disable-tests, tested on m-c rev 15253a3168d2.

Not sure if this is s-s, I'd leave it to Julian/Lars.

Flags: sec-bounty?
Flags: needinfo?(jseward)

I'll take a look.

Flags: needinfo?(jseward)

Simplified tc, run in a debug-enabled build without any special flags:

new WebAssembly.Module(wasmTextToBinary(`
(module
  (func (result i64)
    (local i32 i64)
    (if (global.get 0)
        (return (i64.const 7)))
    loop
      (if (global.get 0)
          (return (i64.const 8)))
    end
    (if (global.get 0)
        (return (i64.const 6)))
    (local.set 1 (i64.mul (local.get 1) (local.get 1)))
    (if (global.get 0)
        (return (local.get 1)))
    (global.set 0 (local.get 0))
    (i64.add (i64.load (local.get 0)) (local.get 1)))
  (memory (;0;) 1)
  (global (;0;) (mut i32) (i32.const 2)))
`))

None of the remaining expressions appear removable, and in the last if, the value returned can't be replaced by a constant.

Also fails on an ARM64 simulator build, M1 hardware not required. But does not fail on x64, so this is something about how the JIT is configured / used.

On ARM64,

 void LIRGeneratorARM64::lowerForMulInt64(LMulI64* ins, MMul* mir,
                                          MDefinition* lhs, MDefinition* rhs) {
   ins->setInt64Operand(0, useInt64RegisterAtStart(lhs));
   ins->setInt64Operand(INT64_PIECES, (lhs != rhs)
                                         ? useInt64Register(rhs)
                                         : useInt64RegisterAtStart(rhs));
   defineInt64ReuseInput(ins, mir, 0);
 }

However, while lhs == rhs in the input, the input is also a constant (local 1 starts out as zero and is not updated at that point), and apparently useInt64RegisterAtStart does not work well with constant inputs. The allocator produces this mess:

[RegAlloc]   Block 7 [successor 8] [successor 9]
[RegAlloc]     [28,29 Integer64] [def v6<g>:x1]
[RegAlloc]     [MoveGroup] [x1 -> x3]
[RegAlloc]     [30,31 Integer64] [def v7<g>:x1]
[RegAlloc]     [MoveGroup] [x3 -> x1] [x3 -> x2]
[RegAlloc]     [32,33 MulI64] [def v8<g>:x1] [use v6:r x1] [use v7:r x1]
[RegAlloc]     [MoveGroup] [x1 -> x3]
[RegAlloc]     [34,35 TestIAndBranch] [use v2:r x0]

If we guard against this s.t. we use useInt64Register also if rhs is constant:

diff --git a/js/src/jit/arm64/Lowering-arm64.cpp b/js/src/jit/arm64/Lowering-arm64.cpp
--- a/js/src/jit/arm64/Lowering-arm64.cpp
+++ b/js/src/jit/arm64/Lowering-arm64.cpp
@@ -153,7 +153,7 @@ void LIRGeneratorARM64::lowerForALUInt64
 void LIRGeneratorARM64::lowerForMulInt64(LMulI64* ins, MMul* mir,
                                          MDefinition* lhs, MDefinition* rhs) {
   ins->setInt64Operand(0, useInt64RegisterAtStart(lhs));
-  ins->setInt64Operand(INT64_PIECES, (lhs != rhs)
+  ins->setInt64Operand(INT64_PIECES, (lhs != rhs || rhs->isConstant())
                                          ? useInt64Register(rhs)
                                          : useInt64RegisterAtStart(rhs));
   defineInt64ReuseInput(ins, mir, 0);

then the allocation is sane:

[RegAlloc]   Block 7 [successor 8] [successor 9]
[RegAlloc]     [32,33 Integer64] [def v8<g>:x0]
[RegAlloc]     [34,35 Integer64] [def v9<g>:x2]
[RegAlloc]     [36,37 MulI64] [def v10<g>:x0] [use v8:r x0] [use v9:r x2]
[RegAlloc]     [MoveGroup] [x0 -> x3]
[RegAlloc]     [38,39 TestIAndBranch] [use v4:r x1]

though not actually good, obviously.

But I think that what we want is not that guard but something deeper in the regalloc that handles the constant case for us by forcing the constant, if present, into a register. Presumably there's a divergence here between the 32-bit and the 64-bit case, that we've just not come across before.

MIPS64 does not have this problem for multiply, as it uses useInt64OrConstant(AtStart) here, we're running into this on ARM64 only because we (possibly I) were too lazy to implement support for multiply-by-constant earlier.

MWasmStore could be similarly afflicted on ARM64 and MIPS64 because it has a useInt64RegisterAtStart for an rhs operand that could plausibly be a constant. There appear to be no other uses of useInt64RegisterAtStart in non-lhs contexts.

It could however look like it's the handling of v6 (the lhs for the multiply) that is the problem:

[RegAlloc] Allocating  v6 [29,33) (def) v6:r?@32 ## v8 [33,48) (def) v8:x0@37 v8:r@47 [priority 19] [weight 473]
[RegAlloc]   Requirement x0, due to use at 37
[RegAlloc]   x0 collides with  v2 [5,8) x0 (def) v2:r@7 ## v2 [12,18) x0 v2:r@17 ## v2 [22,24) x0 v2:r@23 ## v2 [28,36) x0 v2:r@35 [weight 526]
[RegAlloc]   bundle does not contain hot code
[RegAlloc]   bundle is defined by a register
[RegAlloc]   bundle has no register uses
[RegAlloc]     splitting bundle  v6 [29,33) (def) ## v8 [33,48) (def) into:
[RegAlloc]        v6 [29,30) (def)
[RegAlloc]        v6 [32,33) v6:r?@32
[RegAlloc]        v8 [33,34) (def)
[RegAlloc]        v8 [36,38) v8:x0@37
[RegAlloc]        v8 [46,48) v8:r@47
[RegAlloc]        v6 [30,33) ## v8 [34,48)

This split has two overlapping bundles for v6, [32,33) and [30,33) ## [34,48), where the first is allocated to x2 and the second is allocated to x3, based on subsequent spew. This could account for the assertion.

Blocks: 1687626
Severity: -- → S2
Priority: -- → P1

Set release status flags based on info from the regressing bug 1686626

Feature is nightly-only until FF90.

(In reply to Lars T Hansen [:lth] from comment #8)

Feature is nightly-only until FF90.

We prefer "disabled" rather than "unaffected". "unaffected" means the bug doesn't exist, in which case we wouldn't have testcases that "work".

Lars didn't directly answer the question, but misusing JIT registers sounds ungood for security. Please correct if wrong.

Group: core-security → javascript-core-security

Nomenclature correction seen; thanks.

Re severity, I think we'll have to wait and see, if it turns out to be only a problem with the i64 multiplication code or even with how the regalloc handles 64-bit values then this is relatively benign, but if it's a problem with the register allocator's general splitting logic then it's likely bad.

Slightly simplified further, it gets rid of the spurious i32 local and makes the result not depend on the value loaded from memory, it becomes clear that a number of things are for their effect on the shape of the compile-time flow graph or on run-time control flow:

new WebAssembly.Module(wasmTextToBinary(`
(module
  (func (result i64)
    (local $n i64)                ;; zero
    (if (global.get $flag)        ;; do not
        (return (i64.const 7)))   ;;  touch $n
    loop                          ;;   but
      (if (global.get $flag)      ;;    introduce
          (return (i64.const 8))) ;;     confusing
    end                           ;;      control
    (if (global.get $flag)        ;;       flow
        (return (i64.const 6)))   ;;        constructs
    (local.set $n                 ;; $n = zero, but this is
      (i64.mul (local.get $n)     ;;  not known because i64.mul
               (local.get $n)))   ;;   is not constant folded
    (if (global.get $flag)        ;; conditionally
        (return (local.get $n)))  ;;  read $n
    (global.set $flag             ;; introduce
      (i32.const 1))              ;;  global effect
    (i64.load (i32.const 48))     ;; maybe traps
    drop                          ;;  but ignore result
    (local.get $n))               ;; read $n
  (memory 1)
  (global $flag (mut i32) (i32.const 2)))
`))

Odds and ends:

Replacing loop with block has the effect of removing an intermediate Block in the flow graph that contains only a Goto. In turn this means that v6 is given higher priority than v2 (the value read from the global) during allocation and so no splitting is triggered for v6 to resolve the fact that x0 is busy (for v2) yet needed for v6 (for the result of the function); hence the block version succeeds.

Presumably, loop ensures that v2 gets higher priority because it is presumed to be used more (because, it's in a loop).

FWIW, loop has a secondary effect: it introduces an interrupt check. Not obvious that this matters for anything.

Getting rid of loop by using v2 more, and get rid of the setting of $flag. I'm starting to suspect that the effect of the load is simply to make x0 be allocated to something other than $n in the function tail.

new WebAssembly.Module(wasmTextToBinary(`
(module
  (func (result i64)
    (local $n i64)                ;; zero
    (if (global.get $flag)        ;; give $flag high priority;
        (return (i64.const 7)))   ;;  do not touch $n
    (if (global.get $flag)
        (return (i64.const 8)))
    (if (global.get $flag)
        (return (i64.const 9)))
    (if (global.get $flag)
        (return (i64.const 10)))
    (if (global.get $flag)
        (return (i64.const 11)))
    (if (global.get $flag)
        (return (i64.const 12)))
    (local.set $n                 ;; $n = zero, but this is
      (i64.mul (local.get $n)     ;;  not known because i64.mul
               (local.get $n)))   ;;   is not constant folded
    (if (global.get $flag)        ;; conditionally
        (return (local.get $n)))  ;;  read $n
    (i64.load (i32.const 48))     ;; maybe traps
    drop                          ;;  but ignore result
    (local.get $n))               ;; read $n
  (memory 1)
  (global $flag (mut i32) (i32.const 2)))
`))

Assigning to Julian on the preliminary assumption that this is a problem with range splitting.

Assignee: nobody → jseward
Status: NEW → ASSIGNED

Here are two possible fixes for discussion. They seem to work in that they
don't break jit-tests, and they don't fail on the above case. Both fixes
require implementing multiply-by-constant in CodeGenerator::visitMulI64, which
I have done, and seems wise anyway.

-- Proposal 1 ------------------------

void LIRGeneratorARM64::lowerForMulInt64(LMulI64* ins, MMul* mir,
                                         MDefinition* lhs, MDefinition* rhs) {
  ins->setInt64Operand(0, useInt64RegisterAtStart(lhs));
  ins->setInt64Operand(INT64_PIECES, useInt64RegisterOrConstant(rhs));
  defineInt64ReuseInput(ins, mir, 0);
}

This forces the left operand into a register, and the right operand into
either a different register or into a constant. For the test case in comment 13
that produces

[Codegen] # instruction Integer64
[Codegen] [7] d2800001        mov     x1, #0x0
[Codegen] # instruction MulI64
[Codegen] [7] d2800010        mov     x16, #0x0
[Codegen] [7] 9b107c21        mul     x1, x1, x16
[Codegen] # instruction MoveGroup
[Codegen] [7] aa0103e2        mov     x2, x1

and for a modified version of it, in which the operand is made non-constant by
adding this line

(local.set $n (i64.extend_i32_u (global.get $flag)))

it produces

[Codegen] # instruction MoveGroup
[Codegen] [7] aa0103e2        mov     x2, x1
[Codegen] # instruction MulI64
[Codegen] [7] 9b017c42        mul     x2, x2, x1
[Codegen] # instruction MoveGroup
[Codegen] [7] aa0203e1        mov     x1, x2

-- Proposal 2 ------------------------

This is the same as Proposal 1 except that we special-case multiplication of
the same non-constant value by itself:

  ins->setInt64Operand(0, useInt64RegisterAtStart(lhs));
  ins->setInt64Operand(INT64_PIECES,
                       (lhs == rhs && !rhs->isConstant())
                         ? useInt64RegisterAtStart(rhs)
                         : useInt64RegisterOrConstant(rhs));
  defineInt64ReuseInput(ins, mir, 0);

For the comment 13 test case unmodified, the result is unchanged:

[Codegen] # instruction Integer64
[Codegen] [7] d2800001        mov     x1, #0x0
[Codegen] # instruction MulI64
[Codegen] [7] d2800010        mov     x16, #0x0
[Codegen] [7] 9b107c21        mul     x1, x1, x16
[Codegen] # instruction MoveGroup
[Codegen] [7] aa0103e2        mov     x2, x1

For the non-constant case we avoid forcing the right-hand operand into
different register, hence giving just

[Codegen] # instruction MulI64
[Codegen] [7] 9b007c00        mul     x0, x0, x0

Would either of the fixes in comment 15 be acceptable? If so, which is
preferred? My inner conservative engineer would tend towards Proposal 1 on
the basis that Proposal 2 adds a path for what sounds like a rare occurrence
-- squaring of unknown integers -- and hence constitutes a
verification/testing hazard. (If this was floating point I'd think
differently, since squaring numbers in FP is very common).

I continue to investigate why the original failure happened at all, since that
seems important to understand.

I agree Proposal 1 is preferable, but I don't think this issue is so urgent that we have to land a temporary fix quite yet - it's a Nightly-only issue after all, so far as we know, and even if it's not, the fix would apply only to Nightly, being Ion-on-ARM64 specific. IMO it is more important to find out what's going on here.

(If we were required to land a temp fix for ARM64, I might even suggest comment 4 as the simpler one, but that's a discussion for later. There are other, probably even simpler, solutions.)

Another point of reference: changing LIRGeneratorARM64::lowerForMulInt64 to
using exactly the same scheme as lowerForALUInt64 just above (literally,
copy-n-paste), together with handling constant RHS args in
CodeGenerator::visitMulI64, also works. In some sense this is a less risky
fix, because we're reusing the same scheme as lowerForALUInt64 and that is
surely a well-trodden path.

Fallback fix, which, per comment 18, we can likely deploy at low risk,
should nothing better come along.

Despite investigation I have not been able to determine the root cause of the
failure. Here's what I have so far. First, a smaller testcase:

new WebAssembly.Module(wasmTextToBinary(`
(module
  (func (result i64)
    (local $n i64)                ;; zero
    (if (i32.add (global.get $flag)
                 (i32.add (global.get $flag)
                          (global.get $flag)))  ;; give flag high prio
        (return (i64.const 8)))
    (local.set $n                 ;; $n = zero, but this is
      (i64.mul (local.get $n)     ;;  not known because i64.mul
               (local.get $n)))   ;;   is not constant folded
    (if (global.get $flag)        ;; conditionally
        (return (local.get $n)))  ;;  read $n
    (i64.load (i32.const 48))     ;; maybe traps
    drop                          ;;  but ignore result
    (local.get $n))               ;; read $n
  (memory 1)
  (global $flag (mut i32) (i32.const 2)))
`))

The relevant block is this (with debug printing modified per bug 1710025):

[RegAlloc]   Block 2 [successor 3] [successor 4]
[RegAlloc]     16-17 Integer64 [def v6<g>]
[RegAlloc]     18-19 Integer64 [def v7<g>]
[RegAlloc]     20-21 MulI64 [def v8<g>:tied(0)] [use v6:A] [use v7:R]
[RegAlloc]     22-23 TestIAndBranch [use v2:R]

The assertion failure happens because the output of insn 18-19 is given
conflicting allocations x1 and x3.

Working forward through the pipeline, the interesting values are v6, v7
and v8. Live range analysis for them appears correct:

[RegAlloc] Live ranges by virtual register (after grouping/queueing regs):
[RegAlloc]   v6 17-20 { 17_def 20_v6:A }
[RegAlloc]   v7 19-20 { 19_def 20_v7:R }
[RegAlloc]   v8 21-31 { 21_def 25_v8:F:x0 31_v8:F:x0 }

v6 and v8 are determined to be non-overlapping and hence are put in the
same bundle.

[RegAlloc] Live ranges by bundle (after grouping/queueing regs):
[RegAlloc]   v6 17-20 { 17_def 20_v6:A } ## v8 21-31 { 21_def 25_v8:F:x0 31_v8:F:x0 }
[RegAlloc]   v7 19-20 { 19_def 20_v7:R }

Subsequently the v6 / v8 bundle is split:

[RegAlloc]     splitting bundle  v6 17-20 { 17_def } ## v8 21-31 { 21_def } into:
[RegAlloc]        v6 17-17 { 17_def }
[RegAlloc]        v6 20-20 { 20_v6:A }
[RegAlloc]        v8 21-21 { 21_def }
[RegAlloc]        v8 24-25 { 25_v8:F:x0 }
[RegAlloc]        v8 30-31 { 31_v8:F:x0 }
[RegAlloc]        v6 18-20 { } ## v8 22-31 { }

viz: v6 is defined at 17, is used at 20, and is live but with no uses or
defs for 18, 19 and 20. As observed in comment 6, it's strange that this
final bundle has v6 live at 20, since that overlaps the singleton use at
20-20.

It's also strange that the v8 ranges after the split -- 21-21, 24-25 and
30-31 -- don't cover it's pre-split range 21-31. I don't know if that's
intended or not.

In any case, the incorrect allocation is soon visible in the output:

[RegAlloc]   Block 2 [successor 3] [successor 4]
[RegAlloc]     16-17 Integer64 [def v6<g>:x1]
[RegAlloc]     MoveGroup [x1 -> x3]
[RegAlloc]     18-19 Integer64 [def v7<g>:x1]
[RegAlloc]     MoveGroup [x3 -> x1] [x3 -> x2]
[RegAlloc]     20-21 MulI64 [def v8<g>:x1] [use v6:R x1] [use v7:R x1]

16-17 writes the first of the multiply's args to x1, and that is copied into
x3. Then 18-19 writes the second arg to x1, but that is then overwritten
by the subsequent MoveGroup.

Attached patch experiment.patch (obsolete) — Splinter Review

The problem is not limited to arm64 and it is not limited to 64-bit values.

Attached is a simple change that makes the 32-bit multiply on arm64 and arm use the same codegen strategy as the arm64 64-bit multiply (no constant folding + twice useRegisterAtStart + defineReuseInput). My reduced test case from comment 11, converted to 32-bit in the obvious way, triggers the assert on both arm64 and arm.

(On x86 and x64 multiply is weird because it uses fixed registers, so it's not straightforward to test there.)

Referring to the incorrect output in comment 20, viz:

[RegAlloc]   Block 2 [successor 3] [successor 4]
[RegAlloc]     16-17 Integer64 [def v6<g>:x1]
[RegAlloc]     MoveGroup [x1 -> x3]
[RegAlloc]     18-19 Integer64 [def v7<g>:x1]
[RegAlloc]     MoveGroup [x3 -> x1] [x3 -> x2]
[RegAlloc]     20-21 MulI64 [def v8<g>:x1] [use v6:R x1] [use v7:R x1]

the question I keep coming back to is: why does the second MoveGroup
overwrite x1? This is clearly wrong and this is also what the original
*def->output() == alloc assertion failure detected. In this case, for 18-19,
v7 has been allocated to x1 (that's what *def->output() tells us), but
alloc, which was also originally x1, got changed by the MoveGroup to
x3, hence causing the assertion to fail.

So I'm inclined to look into where the MoveGroup came from and why
it overwrites x1.

I think the MoveGroup is not the problem per se. The question is, why is v6 allocated to x1 on line 20? The backtrace of allocations earlier in the run gives a different picture of this, with allocations that are seemingly perfectly consistent: v6 -> x1 for 16-17, v6 -> x3 for 18-19, v6 -> x2 for 20-21, v7 -> x1 for 18-20, v8 to x1 for 21 ->. The x3 assignment comes in via spill logic and is forced because v7 becomes allocated to x1. Observe that this assignment explains the initial x1 -> x3 move and the x3 -> x2 move. But then something happens, and suddenly v6 -> x1 on 20 contrary to the reported allocations, and i believe that this is what precipitates the x3 -> x1 move in the preceding MoveGroup.

So what is that "something"? Clearly the MUST_REUSE constraint ties v6 and v8 together. So the x3 -> x1 move can't be avoided (except by allocating a different register to v8). But this means that v7 cannot be allocated to x1. The problem only arises if both the input values to the multiply are the same, so there's definitely a problem related to that. The input value need not be zero, I've tested this, but it looks like it needs to be a constant value known to the compiler. I can add 3 to $n at the beginning of the function and I see the same failure (but if I do so conditionally there is no failure). The constants can be defined with two different names, $m and $n. They can both be non-zero but must have the same value for the failure to occur.

More and more this looks to me to be a problem about shared structure somewhere. The LIR nodes that are the lhs and rhs inputs to the multiply operation are the same node in memory, so when defineReuseInput updates one it presumably updates the other... including the virtual register...

I believe I have a smoking gun. Consider the patch on comment 21. Set a breakpoint in lowerMulI and then step. Before the call to useRegisterAtStart(lhs), lhs->virtualRegister() is zero. After the call it is 9 (because in the test case I use this is v9; it corresponds to v6 in Julian's test case and the discussion above). After the call to useRegisterAtStart(rhs) [sic], lhs->virtualRegister() is 10 (what is v7 above). Now, there are two separate LIR nodes here with separate virtual registers, but when we call defineReuseInput for the lhs we use lhs->virtualRegister() which should be 9 (which is what is recorded in the LIR node) but is 10 (because it's recorded in the MIR node). It's hard to pin down precisely what goes wrong but it stands to reason that this will not end well, and it is consistent with symptoms, notably comment 23.

Looking through our code base for 'lhs == rhs' and 'lhs != rhs' to look for trouble spots, I find few candidates. All are a result of the regalloc requirement that if the lhs is useAtStart then the rhs must be useAtStart too if lhs == rhs. I'm unable as of yet to determine whether these uses are benign or not.

In a sense, it seems to me that the bug is that when the input nodes are the same, we should not be required or allowed to specify a use for the rhs because its use is already specified.

To wit, this code for multiply lowering:

  if (lhs == rhs) {
    LUse def = useRegisterAtStart(lhs);
    lir->setOperand(0, def);
    lir->setOperand(1, def);
  } else {
    lir->setOperand(1, useRegister(rhs));
  }
  defineReuseInput(lir, mul, 0);

passes the regalloc and generates good code.

Recording a thought: It's probably an important factor here that the constant is regenerated twice in the block because of emitAtUses. We have what looks like one MIR node but it turns into two LIR nodes, strictly because of emitAtUses. This happens for almost nothing else. We should consider whether this is a factor in this pattern, it would explain why we're not seeing more breakage.

(In reply to Lars T Hansen [:lth] from comment #24)

[…] but when we call defineReuseInput for the lhs we use lhs->virtualRegister() which should be 9 (which is what is recorded in the LIR node) but is 10 (because it's recorded in the MIR node). […]

Could it be related to the commutative aspect of the multiplication? We have so code in the Lowering which attempts to swap the operands based on some properties.

No, I think this is actually a really obscure bug in how emitAtUses combines with reuseInput. If all the following conditions are met:

  • a MIR node gets two input operands that are the same MIR node
  • this shared input is an integer constant, so EmittedAtUses is true for it
  • the lowering calls useRegisterAtStart (not useRegisterOrConstantAtStart) twice, one for "each" operand, lhs before rhs
  • the lowering uses defineReuseInput on the lhs

then the MIR node gets two LIR nodes because the emitAtUses generates a new LIR node for each call to useRegisterAtStart, each with a vreg. The vreg is recorded in the MIR node after the LIR node has been created, by define(). But the MIR node can only hold one of the vregs and will hold only the rhs vreg (because the LIR node for the rhs was created last). Now the compiler can become confused because its vreg reflects the vreg of the rhs, not the vreg of the lhs.

Assignee: jseward → lhansen

I'm still working on tracking down precisely what happens, but the clobbering of v6 in line 20-21 in comment 22 happens at the bottom of the MUST_REUSE_INPUT case of reifyAllocations. This general nest of code is also responsible for inserting the x3 -> x1 move. Initially, the allocation is set up correctly at the top of the inner use loop: v6 is allocated to x2 where the previous input move left it. The allocation for v6 is then overwritten by the MUST_REUSE_INPUT logic, as it should be, and a move is created.

It could look like the register allocator has gotten itself into a bad state by the time we get to this point. We must have v8==v6 by constraint, and v8 has been allocated to x1, which it can be because v7 (also allocated to x1) is dead by the time v8 is defined. But this leads to the seemingly impossible situation where both v6 and v7 must be in x1 at the start of the operation. (Of course it's not quite impossible since these represent the same value and could share a register, but I don't think that that's something the allocator recognizes as possible.) The allocator could fix that problem by moving v7 to another register, but the real issue seems to be that v7 was allowed to be allocated to x1 at all. But whether this is a problem with, say, the splitting logic not taking input reuse into account, or a problem stemming from the two LIR nodes for the constants sharing a single MIR node and hence being confused about their virtual register assignments, I don't know yet.

With a reuse of input as output at some ins, input is marked as live through inputOf(ins) and output is marked as live starting from outputOf(ins), but there is nothing to link the two directly. Their ranges are merged, but can later be split again as a result of allocation failure (this is what happens here), disconnecting input from output. This lack of connection allows unrelated values whose lifetimes end at inputOf(ins) to be allocated to the same register as output, getting in the way of the need for input to have that register, and making the assignment in reifyAllocations() that updates the allocation of input unsound. This is a general register allocation bug, not specific to arm64.

It's not obvious how to fix this. Attempts to extend the live range of output to start at inputOf(ins) or the live range of input to end at outputOf(ins) run into various problems with invariants in the allocator.

It's also not appealing to add a loop in reifyAllocations() to look for conflicts among the inputs and move conflicting non-reused values out of the way, though it would probably be sound to do so, as any conflicting values' lifetimes should end at inputOf(ins).

My best guess for how to attempt a fix for this is to not split the bundle in BacktrackingAllocator::splitAt() at uses that are being reused by the using instruction. There is already a fairly complicated guard for when to allow splitting that special-cases FIXED registers; this seems somehow analogous.

I'm going to do the following:

  • upgrade this to a JIT bug and change the title to reflect my current understanding
  • add a patch here with some code and a test case that allows the bug to be demonstrated reliably
  • offer a workaround patch for ARM64 only and land that, unblocking ARM64 Ion shipping

I may not be able to expedite a general fix for this bug so I may unassign it from myself after fixing the ARM64 case; we'll see.

Component: Javascript: WebAssembly → JavaScript Engine: JIT
Summary: Assertion failure: *def->output() == alloc, at jit/RegisterAllocator.cpp:257 → Live range splitting can lead to conflicting assignments (was: Assertion failure: *def->output() == alloc, at jit/RegisterAllocator.cpp:257)

This has a simplified test case derived from the original, and some very minor tweaks to Ion and arm64 lowering to reproduce the original problem in a controlled setting using only 32-bit operands. Basically, we disable constant folding for MulI and force the lowering of MulI to useRegisterAtStart for both its input operands.

To use this, apply the patch, then make a build with --enable-debug --enable-simulator=arm64 (or run natively on arm64 hardware), then run the shell as .../js --wasm-compiler=ion .../jit-test/tests/wasm/regress/bug1707774.js. The assert should be triggered. Running with IONFLAGS=regalloc it will be possible to observe the confused register use in the MulI instruction: both inputs, which are distinct, use the same register.

I believe there is nothing special about using MulI here or about being on arm64 or indeed about using wasm code, it is just a clean and localized way to demonstrate the problem, and meshes with the discussion earlier on the bug.

Attachment #9221486 - Attachment is obsolete: true
Keywords: leave-open
Attached file Bug 1707774 - Fix. r?jseward (obsolete) —

This demonstrates a problem in live range splitting leading to a confused register
assignment; an assertion fails in debug builds on arm64. The problem is however
cross-platform in the presence of the right use directives + defineReuseInput,
and is not specific to int64.

Depends on D115679

Attachment #9222926 - Attachment is obsolete: true

The proposed fix (comment 25, comment 33) does not work because it violates other invariants of the code generator, specifically, reusing the definition as in comment 25 leads to a null pointer in the MIR node's rhs. Since the eventual fate of mul64 on arm64 is to not reuse its input register, I will instead offer a fix for that on bug 1710024 and let that be the narrow fix for the arm64 problem originally reported here.

No longer blocks bug 1687626 because a workaround has landed on bug 1710024.

No longer blocks: 1687626

There is a previously documented constraint in the register allocator, see a comment at https://searchfox.org/mozilla-central/source/js/src/jit/x86-shared/Lowering-x86-shared.cpp#927, that speaks to this case (though obviously it's not documented in the register allocator...). Quoting:

  // The rhs is tricky due to register allocator restrictions:
  //  - if lhs == rhs and lhs is AtStart then rhs must be AtStart too
  //  - if lhs != rhs and lhs is AtStart then rhs must not be AtStart,
  //    this appears to have something to do with risk of the rhs
  //    being clobbered.  Anyway it doesn't matter much, since the
  //    liveness of rhs will not prevent the lhs register to be reused
  //    for the output.

This applies only to the defineReuseInput case; normal define is not affected I believe.

When this constraint is violated, for example, by making the code at the above location use useRegisterAtStart for both its arguments even when lhs != rhs, then the same regalloc assertion that gave rise to the present bug is triggered.

What is happening in our present use case is that the MIR node is the same for the lhs and rhs, yet gives rise to different LIR nodes (due to it being a constant with emitAtuses). Thus in cases where we end up testing lhs == rhs to determine whether to use useRegisterAtStart or useRegister, we will go wrong if lhs == rhs and it is an emitAtUses constant, since in this case we should be using useRegister because there are two different LIR nodes.

Since the regalloc catches this bug in debug builds at least, it may be that the best "fix" (it does not actually fix the problem, it just prevents is from happening) is to instead change the few spots in the code that may have this problem because they're testing lhs == rhs or lhs != rhs. The remaining spots all appear to be in in Lowering-x86-shared.cpp. The best fix is probably to package up the comparison into a predicate that can take all this complexity into account.

I have considered whether we might add an additional assert somewhere but I'm not sure what that would look like. Nodes can legitimately reuse multiple inputs (as when doing reuseInput on a 64-bit value on a 32-bit platform). It may be that a test would create a map from uses to definitions for every instruction and trip an assert when the number of useRegisterAtStart exceeds the number of reused inputs.

In instances where an MDefinition* can map to more than one LIR node,
a simple pointer test is insufficient to avoid breaking the regalloc
invariant that if one LIR node argument to a multi-operand operation
is reused for the result, then other distinct LIR node arguments
cannot be used atStart. (This invariant is implicit: when it is
broken, an assert will sometimes fire in when the regalloc attempts to
verify that its allocation is coherent.)

This patch abstracts the equality test into a predicate, uses the
predicate everywhere relevant, and makes the predicate take into
account the one situation where it is known that the MIR node maps to
two LIR nodes: when it is an emitAtUses constant. The patch also
documents the more complicated condition.

(This commit message may be elided for landing, depending on whether
we think this is a problem that can be exploited. There are no known
failures resulting from failing to obey the invariant.)

Comment on attachment 9224423 [details]
Bug 1707774 - More subtle equality on MDefinition* during lowering. r=nbp!,jandem!,jseward!

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Likely very tricky. The current patch fixes no actual bugs, see comment 37; it just creates a framework that helps prevent us from running into those bugs. To get to an exploit from this, an attacker would have to understand that what the patch does is prevent a register allocation confusion from occuring in very obscure cases, and from that, construct a wasm program that exploits that confusion with a test case that is more complicated than we've seen. (The test case that started this bug is almost certainly benign, in that it causes the confusion of two integer-valued registers that contain the same value. An assertion catches that in a debug build only.) The patch is copiously commented but the comments don't point directly at this possible confusion. If need be, I can remove comments and commit msg and land later, but I feel this is not necessary.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: All - it's an old problem stemming from a weird interaction between the register allocator and the optimizer
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: I don't think we should do that, but in principle patches for older branches could be derived fairly straightforwardly from this one.
  • How likely is this patch to cause regressions; how much testing does it need?: It is not likely to cause regressions; it guards against a very, very obscure problem and otherwise is no functional change from current code.
Attachment #9224423 - Flags: sec-approval?

If you don't think we need to uplift; I'm okay with that given the description; but in that case I'd like to land it under a public bug. Is that doable?

Flags: needinfo?(lhansen)

Yes, I can create a reasonable public placeholder bug for it and land under that, no problem.

Flags: needinfo?(lhansen)

Comment on attachment 9224423 [details]
Bug 1707774 - More subtle equality on MDefinition* during lowering. r=nbp!,jandem!,jseward!

Revision D116464 was moved to bug 1715203. Setting attachment 9224423 [details] to obsolete.

Attachment #9224423 - Attachment is obsolete: true
Attachment #9224423 - Flags: sec-approval?

Re the sec-bounty, a brief summary. While all the patches that address the bug have landed in the open, the original TC uncovered a previously unrecognized weakness in lowering/register allocation that could lead to extremely obscure but deterministic register confusion failures in the field (exploitability is unknown). And while the TC was for the ARM64 compiler, which is shipping only in FF90, said failures, if discovered, would affect x86/x64 on all branches.

Addressed by patches on bug 1715203 and (earlier) bug 1710024.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Group: javascript-core-security → core-security-release
Target Milestone: --- → 91 Branch
Attachment #9222927 - Attachment is obsolete: true
Flags: sec-bounty? → sec-bounty+

As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.

Please visit this google form to reply.

Flags: needinfo?(lhansen)
Whiteboard: [sec-survey]
Flags: needinfo?(lhansen)
Flags: qe-verify-
Whiteboard: [sec-survey] → [sec-survey][post-critsmash-triage]
Whiteboard: [sec-survey][post-critsmash-triage] → [sec-survey][post-critsmash-triage][adv-main90+]
Whiteboard: [sec-survey][post-critsmash-triage][adv-main90+] → [sec-survey][post-critsmash-triage][adv-main91+]
Alias: CVE-2021-29981
Attached file advisory.txt
Attachment #9235152 - Attachment is obsolete: true

(Nit for :tjr) "led lead" -> "led"

Flags: needinfo?(tom)

Thanks! I've corrected that in the staged yml file.

Flags: needinfo?(tom)
Group: core-security-release
Has Regression Range: --- → yes
Keywords: regression
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: