arm64 v128.any_true returns 0 for non-zero vectors whose 64-bit halves are additive inverses
Categories
(Core :: JavaScript: WebAssembly, defect, P1)
Tracking
()
People
(Reporter: jandem, Assigned: yury)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: regression)
Attachments
(1 file)
Summary
On arm64, MacroAssembler::anyTrueSimd128 (MacroAssembler-arm64-inl.h:3598) and the matching fused reduce-and-branch path in CodeGenerator::visitWasmReduceAndBranchSimd128 (CodeGenerator-arm64.cpp:4149-4155) implement v128.any_true by reducing the two 64-bit halves of the input with scalar ADDP and testing the sum against zero:
void MacroAssembler::anyTrueSimd128(FloatRegister src, Register dest_) {
ScratchSimd128Scope scratch_(*this);
ARMFPRegister scratch(Simd1D(scratch_));
ARMRegister dest(dest_, 64);
Addp(scratch, Simd2D(src)); // scratch.D = src.D[0] + src.D[1] (mod 2^64)
Umov(dest, scratch, 0);
Cmp(dest, Operand(0));
Cset(dest, Assembler::NonZero);
}
v128.any_true must return 1 iff any bit of the 128-bit operand is set — i.e. (src.D[0] | src.D[1]) != 0. Addition isn't OR: the sum is 0 whenever the two halves are additive inverses (src.D[1] ≡ -src.D[0] mod 2^64), even though both halves — and hence the vector — are non-zero. The function then returns 0 when the correct answer is 1.
Inputs that hit the cancellation include i64x2(0x8000000000000000, 0x8000000000000000) (0x80…00 + 0x80…00 = 2^64 ≡ 0) and i64x2(1, -1).
The fused (if (v128.any_true …)) / br_if / select codegen in CodeGenerator-arm64.cpp:4149-4155 duplicates the same Addp-then-zero-test, so those paths mis-branch for the same inputs.
Cross-tier: affects both --wasm-compiler=baseline and --wasm-compiler=ion (both call anyTrueSimd128). Ion's constant-folding path is correct (MIR-wasm.cpp:692-693 uses !c.isZeroBits()), so the bug only manifests when the v128 input is not a compile-time constant.
Why x86/x64 is correct: uses vptest(src, src) (MacroAssembler-x86-shared-inl.h:1666), whose ZF reflects src == 0 exactly.
Why all_true on arm64 is unaffected by the same Addp reduction: the allTrueIntNxK family (MacroAssembler-arm64-inl.h:3610+) first runs Cmeq …, 0, producing a mask whose bytes are exclusively 0x00 or 0xFF. For any non-zero 0x00/0xFF byte pattern x (interpreted as a 64-bit integer), -x has at least one 0x01 byte and therefore isn't in the same 0x00/0xFF byte-pattern set — so the sum of two such halves can't wrap to zero unless both halves are zero. any_true skips that normalization and feeds raw user bytes straight into Addp, which is where the cancellation becomes reachable.
Test case
Verified locally: x64 build returns the correct 1/111; arm64 build returns 0/222 for inputs (A) and (B), under both --wasm-compiler=ion and --wasm-compiler=baseline. The control input (C, no cancellation) returns correctly on both.
function build() {
let bin = wasmTextToBinary(`
(module
(memory (export "mem") 1 1)
(func (export "anytrue") (result i32)
(v128.any_true (v128.load (i32.const 0))))
(func (export "anytrue_if") (result i32)
(if (result i32) (v128.any_true (v128.load (i32.const 0)))
(then (i32.const 111)) (else (i32.const 222)))))`);
return new WebAssembly.Instance(new WebAssembly.Module(bin));
}
let ins = build();
let mem = new Uint8Array(ins.exports.mem.buffer);
function setBytes(b) { for (let i = 0; i < 16; i++) mem[i] = b[i]; }
// (A) i64x2 = [0x80…00, 0x80…00]; sum == 2^64 ≡ 0.
setBytes([0,0,0,0,0,0,0,0x80, 0,0,0,0,0,0,0,0x80]);
assertEq(ins.exports.anytrue(), 1); // arm64: 0
assertEq(ins.exports.anytrue_if(), 111); // arm64: 222 (mis-branched)
// (B) i64x2 = [1, -1].
setBytes([1,0,0,0,0,0,0,0, 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff]);
assertEq(ins.exports.anytrue(), 1); // arm64: 0
// (C) control — no cancellation.
setBytes([1,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0]);
assertEq(ins.exports.anytrue(), 1); // arm64: 1 (works)
Suggested fix
Normalize the input to a byte mask of 0x00/0xFF before reducing, so the existing Addp-then-zero-test sits in the same constrained value set the all_true family already relies on. Cmtst(scratch.16B, src.16B, src.16B) writes 0xFF per byte where the byte is non-zero and 0x00 elsewhere; for any non-zero such pattern x, -x has a 0x01 byte and so isn't in the same set, which means the two 64-bit halves cannot cancel mod 2^64.
Apply the same change at both sites — MacroAssembler::anyTrueSimd128 (MacroAssembler-arm64-inl.h:3598) and the duplicated reduction in CodeGenerator::visitWasmReduceAndBranchSimd128 (CodeGenerator-arm64.cpp:4149-4155).
A one-instruction-shorter alternative is Umaxv across the four i32 lanes (result is zero iff the whole 128-bit vector is zero), but I'd recommend the Cmtst+Addp shape for parity with the existing allTrueInt* family — same scratch flow, same reduce-and-test tail, easier to keep in sync if anyone touches the family later.
Comment 2•12 days ago
|
||
Set release status flags based on info from the regressing bug 1609381
Updated•12 days ago
|
Updated•12 days ago
|
| Assignee | ||
Comment 3•12 days ago
|
||
Suggested fix is good. (just to note, v8 is doing CMEQ tmp, src, 0)
| Assignee | ||
Updated•10 days ago
|
| Assignee | ||
Comment 4•10 days ago
|
||
Updated•10 days ago
|
Updated•9 days ago
|
Comment 6•8 days ago
|
||
| bugherder | ||
Description
•