Closed Bug 2042331 Opened 12 days ago Closed 8 days ago

arm64 v128.any_true returns 0 for non-zero vectors whose 64-bit halves are additive inverses

Categories

(Core :: JavaScript: WebAssembly, defect, P1)

defect

Tracking

()

RESOLVED FIXED
153 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox-esr140 --- wontfix
firefox151 --- wontfix
firefox152 --- wontfix
firefox153 --- fixed

People

(Reporter: jandem, Assigned: yury)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file)

Summary

On arm64, MacroAssembler::anyTrueSimd128 (MacroAssembler-arm64-inl.h:3598) and the matching fused reduce-and-branch path in CodeGenerator::visitWasmReduceAndBranchSimd128 (CodeGenerator-arm64.cpp:4149-4155) implement v128.any_true by reducing the two 64-bit halves of the input with scalar ADDP and testing the sum against zero:

void MacroAssembler::anyTrueSimd128(FloatRegister src, Register dest_) {
  ScratchSimd128Scope scratch_(*this);
  ARMFPRegister scratch(Simd1D(scratch_));
  ARMRegister dest(dest_, 64);
  Addp(scratch, Simd2D(src));   // scratch.D = src.D[0] + src.D[1] (mod 2^64)
  Umov(dest, scratch, 0);
  Cmp(dest, Operand(0));
  Cset(dest, Assembler::NonZero);
}

v128.any_true must return 1 iff any bit of the 128-bit operand is set — i.e. (src.D[0] | src.D[1]) != 0. Addition isn't OR: the sum is 0 whenever the two halves are additive inverses (src.D[1] ≡ -src.D[0] mod 2^64), even though both halves — and hence the vector — are non-zero. The function then returns 0 when the correct answer is 1.

Inputs that hit the cancellation include i64x2(0x8000000000000000, 0x8000000000000000) (0x80…00 + 0x80…00 = 2^64 ≡ 0) and i64x2(1, -1).

The fused (if (v128.any_true …)) / br_if / select codegen in CodeGenerator-arm64.cpp:4149-4155 duplicates the same Addp-then-zero-test, so those paths mis-branch for the same inputs.

Cross-tier: affects both --wasm-compiler=baseline and --wasm-compiler=ion (both call anyTrueSimd128). Ion's constant-folding path is correct (MIR-wasm.cpp:692-693 uses !c.isZeroBits()), so the bug only manifests when the v128 input is not a compile-time constant.

Why x86/x64 is correct: uses vptest(src, src) (MacroAssembler-x86-shared-inl.h:1666), whose ZF reflects src == 0 exactly.

Why all_true on arm64 is unaffected by the same Addp reduction: the allTrueIntNxK family (MacroAssembler-arm64-inl.h:3610+) first runs Cmeq …, 0, producing a mask whose bytes are exclusively 0x00 or 0xFF. For any non-zero 0x00/0xFF byte pattern x (interpreted as a 64-bit integer), -x has at least one 0x01 byte and therefore isn't in the same 0x00/0xFF byte-pattern set — so the sum of two such halves can't wrap to zero unless both halves are zero. any_true skips that normalization and feeds raw user bytes straight into Addp, which is where the cancellation becomes reachable.

Test case

Verified locally: x64 build returns the correct 1/111; arm64 build returns 0/222 for inputs (A) and (B), under both --wasm-compiler=ion and --wasm-compiler=baseline. The control input (C, no cancellation) returns correctly on both.

function build() {
  let bin = wasmTextToBinary(`
    (module
      (memory (export "mem") 1 1)
      (func (export "anytrue") (result i32)
        (v128.any_true (v128.load (i32.const 0))))
      (func (export "anytrue_if") (result i32)
        (if (result i32) (v128.any_true (v128.load (i32.const 0)))
          (then (i32.const 111)) (else (i32.const 222)))))`);
  return new WebAssembly.Instance(new WebAssembly.Module(bin));
}
let ins = build();
let mem = new Uint8Array(ins.exports.mem.buffer);
function setBytes(b) { for (let i = 0; i < 16; i++) mem[i] = b[i]; }

// (A) i64x2 = [0x80…00, 0x80…00];  sum == 2^64 ≡ 0.
setBytes([0,0,0,0,0,0,0,0x80, 0,0,0,0,0,0,0,0x80]);
assertEq(ins.exports.anytrue(),    1);    // arm64: 0
assertEq(ins.exports.anytrue_if(), 111);  // arm64: 222  (mis-branched)

// (B) i64x2 = [1, -1].
setBytes([1,0,0,0,0,0,0,0, 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff]);
assertEq(ins.exports.anytrue(), 1);       // arm64: 0

// (C) control — no cancellation.
setBytes([1,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0]);
assertEq(ins.exports.anytrue(), 1);       // arm64: 1 (works)

Suggested fix

Normalize the input to a byte mask of 0x00/0xFF before reducing, so the existing Addp-then-zero-test sits in the same constrained value set the all_true family already relies on. Cmtst(scratch.16B, src.16B, src.16B) writes 0xFF per byte where the byte is non-zero and 0x00 elsewhere; for any non-zero such pattern x, -x has a 0x01 byte and so isn't in the same set, which means the two 64-bit halves cannot cancel mod 2^64.

Apply the same change at both sites — MacroAssembler::anyTrueSimd128 (MacroAssembler-arm64-inl.h:3598) and the duplicated reduction in CodeGenerator::visitWasmReduceAndBranchSimd128 (CodeGenerator-arm64.cpp:4149-4155).

A one-instruction-shorter alternative is Umaxv across the four i32 lanes (result is zero iff the whole 128-bit vector is zero), but I'd recommend the Cmtst+Addp shape for parity with the existing allTrueInt* family — same scratch flow, same reduce-and-test tail, easier to keep in sync if anyone touches the family later.

NI Yury because SIMD.

Flags: needinfo?(ydelendik)

Set release status flags based on info from the regressing bug 1609381

Severity: -- → S3
Priority: -- → P1

Suggested fix is good. (just to note, v8 is doing CMEQ tmp, src, 0)

Flags: needinfo?(ydelendik)
Assignee: nobody → ydelendik
Status: NEW → ASSIGNED
Attachment #9591177 - Attachment description: Bug 2042331 - Fix arm64 v128.any_true using Cmtst to prevent additive cancellation r?jseward → Bug 2042331 - Fix arm64 v128.any_true using Umaxv to prevent additive cancellation r?jseward
Status: ASSIGNED → RESOLVED
Closed: 8 days ago
Resolution: --- → FIXED
Target Milestone: --- → 153 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: