Closed Bug 1364215 Opened 7 years ago Closed 7 years ago

Bad JIT results from SIMD allTrue() and anyTrue() on 8x16 and 16x8 types

Categories

(Core :: JavaScript Engine: JIT, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: jolesen, Assigned: jolesen)

References

Details

Attachments

(1 file)

The boolean SIMD functions allTrue() and anyTrue() produce incorrect results for the SIMD.Bool8x16 and SIMD.Bool16x8 types when they are JIT-compiled.

Test case from bug 1336991:

---
var a = SIMD.Bool8x16.splat(true);
for (var i = 0; i < 2; i++) {
  a = SIMD.Bool8x16.replaceLane(a, i, false);
  assertEq(SIMD.Bool8x16.allTrue(a), false);
}
---

The implementation in CodeGeneratorX86Shared::visitSimdAllTrue() uses vmovmskps which only looks at 4 lanes. We need to use vpmovmskb for the 8-lane and 16-lane types.
The movmskps SSE instruction only transfers 4 bits from the xmm
register. This work for Bool32x4 and Bool64x2 vectors, but it misses
lanes of the Bool16x8 and Bool8x16 types.

Use a pmovmskb SSE2 instruction instead which transfers 16 byte sign
bits from the xmm register. This lets us resolve even Bool8x16 lanes
correctly.

We know that the input vector is a boolean type, so each lane is known
to be either 0 or -1. There is no harm in checking too many bits of the
types with lanes wider than 8 bits. It won't affect the result.
Attachment #8866999 - Flags: review?(bbouvier)
Comment on attachment 8866999 [details] [diff] [review]
Use pmovmskb for allTrue and anyTrue.

Review of attachment 8866999 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the patch, looks good!

::: js/src/jit-test/tests/SIMD/anyall.js
@@ +31,5 @@
> +        all(SIMD.Bool16x8, 8)
> +        any(SIMD.Bool16x8, 8)
> +        all(SIMD.Bool8x16, 16)
> +        any(SIMD.Bool8x16, 16)
> +    } while (!inIon());

I'm not too sure about the usage of inIon(), especially when ion is explicitly disabled, or in CGC builds which are painfully slow. Can you turn this into a classic for loop that ends at 4 or 5 times the ion warmup trigger?

::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
@@ +2963,5 @@
>      FloatRegister input = ToFloatRegister(ins->input());
>      Register output = ToRegister(ins->output());
>  
> +    // We know that the input lanes are boolean, so they are either 0 or -1.
> +    // The all-true vector has all 128 bits set, no matter the lane geometry.

(It'd be nice to generate debug-only code that ensures that the lanes is actually all-0 or all-1, but it would probably make tests much slower in debug mode)
Attachment #8866999 - Flags: review?(bbouvier) → review+
Pushed by jolesen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d9353a6d3d1a
Use pmovmskb for allTrue and anyTrue. r=bbouvier
https://hg.mozilla.org/mozilla-central/rev/d9353a6d3d1a
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: