Closed Bug 1044256 Opened 10 years ago Closed 10 years ago

SIMD backend: implement unary arithmetic operations

Categories

(Core :: JavaScript Engine: JIT, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla35

People

(Reporter: ivan, Assigned: ivan)

References

Details

Attachments

(3 files, 6 obsolete files)

SIMD.float32x4:
float32x4 abs(float32x4 a)
float32x4 neg(float32x4 a)
float32x4 sqrt(float32x4 a)
float32x4 reciprocalSqrt(float32x4 a)
float32x4 reciprocal(float32x4 a)

SIMD.int32x4:
int32x4 neg(int32x4 a)
Attached patch bug1044256.txt (obsolete) — Splinter Review
WIP patch: still needs to add the macro assembler for Float32x4.abs: absps, Float32x4.neg: negps, and Int32x4.neg: pnegd which are implemented using xorps and andps
Attachment #8462862 - Flags: feedback?(benj)
Attachment #8462862 - Flags: feedback?(sunfish)
Comment on attachment 8462862 [details] [diff] [review]
bug1044256.txt

Review of attachment 8462862 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/assembler/assembler/X86Assembler.h
@@ +2484,5 @@
> +        m_formatter.twoByteOp(OP2_RCPPS_VpsWps, (RegisterID)dst, address);
> +    }
> +
> +    void rsqrtps_rr(XMMRegisterID src, XMMRegisterID dst){
> +        spew("rcpps      %s, %s",

copy+pasto in the spew string here and a few more below.

::: js/src/jit/MIR.h
@@ +1312,5 @@
> +        reciprocal,
> +        reciprocalSqrt,
> +        sqrt
> +    };
> +  

Trailing whitespace.

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2185,5 @@
> +        masm.packedReciprocalFloat32x4(rhs, lhs);
> +        return true;
> +      case MSimdUnaryArith::reciprocalSqrt:
> +        masm.packedReciprocalSqrtFloat32x4(rhs, lhs);
> +        return true;

This uses the rcp and rsqrt instructions, which are approximation instructions. According to Intel's manual, of rcp, "The relative error for this approximation is: |Relative Error| ≤ 1.5 ∗ 2 −12". I don't think we want to expose these low-precision approximations directly in the JS API, so we'll need to use the Newton-Raphson method to refine the results.
Attachment #8462862 - Flags: feedback?(sunfish)
Comment on attachment 8462862 [details] [diff] [review]
bug1044256.txt

Review of attachment 8462862 [details] [diff] [review]:
-----------------------------------------------------------------

Nice start.

::: js/src/jit/Lowering.cpp
@@ +3642,5 @@
>      MOZ_ASSUME_UNREACHABLE("Unknown SIMD kind when getting lane");
>      return false;
>  }
>  
> +bool 

nit: trailing whitespace

@@ +3658,5 @@
> +        LSimdUnaryArithFx4 *lir = new(alloc()) LSimdUnaryArithFx4(op);
> +        return define(lir, ins);
> +    }
> +
> +    MOZ_ASSUME_UNREACHABLE("Unknown SIMD kind when comparing values");

nit: copy pasto of another error string here

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2160,5 @@
> +
> +    MSimdUnaryArith::Operation op = ins->operation();
> +    switch(op){
> +      case MSimdUnaryArith::neg:
> +        //TODO after 'not' has been implemented

nit: TODO is a copy pasto as well.

@@ +2162,5 @@
> +    switch(op){
> +      case MSimdUnaryArith::neg:
> +        //TODO after 'not' has been implemented
> +        return true;
> +      case MSimdUnaryArith::abs:

As far as I can see, int32x4 only have neg in the SIMD.js polyfill so far. Could you add assertions in MIR.h that if the type is MIRType_Int32x4, then the op must neg?
Attachment #8462862 - Flags: feedback?(benj)
Attached file Implement unary arithmetic operations. (obsolete) —
Rebase. Have not integrated the feedback yet.
Attachment #8462862 - Attachment is obsolete: true
Depends on: 1059529
Attached file Implement unary arithmetic operations. (obsolete) —
Rebase after bug 1025127 (the comparisons) and before bug 1021716 (shuffle).
Attachment #8479813 - Attachment is obsolete: true
Just reordered things a little for consistency.
Attachment #8480271 - Attachment is obsolete: true
Attachment #8480402 - Attachment is patch: true
Rebase. Some of the feedback has been addressed.
Attachment #8480402 - Attachment is obsolete: true
Some of these operations (neg and abs) appear to require the loading of a constant before a binary bitwise operation. It would probably be better to separate the loading of such constants into a separate MIR operation so that they can be hoisted etc. So some of these would be transformed into binary operations when generating the MIR.
Attached patch bug1044256.patch (obsolete) — Splinter Review
Rebased and completed.

About remarks:
- SIMD spec committee seems fine to have reciprocal and reciprocalSqrt be approximations, so we just use them in this patch.
- Having MIR constants getting out of the instructions would be beneficial (e.g. for not we use a SimdConstant which -1 in all lanes), but it seems weird to add the constraint at instantiation time that the second argument of a MIR node (say an hypothetical MSimdNot) be a constant with the precise given value (here, MSimdConstant(-1) or MSimdSplat(-1)). Perhaps this could be done with a separate optimization phase though, as this would be probably beneficial to all operations that use constants at the codegen level. The issue is not too grave, as constants (simd and non-simd) are already shared in pools, at least on x86 and x64.
Attachment #8481213 - Attachment is obsolete: true
Attachment #8499589 - Flags: review?(sunfish)
Attached patch bug1044256.patchSplinter Review
Interdiff: in codegen, the input has to be an Operand, not a FloatRegister.

See other comments above.
Attachment #8499589 - Attachment is obsolete: true
Attachment #8499589 - Flags: review?(sunfish)
Attachment #8499591 - Flags: review?(sunfish)
Odin integration.
Attachment #8499593 - Flags: review?(luke)
Comment on attachment 8499593 [details] [diff] [review]
bug1044256-2.patch

Review of attachment 8499593 [details] [diff] [review]:
-----------------------------------------------------------------

Way to be
Attachment #8499593 - Flags: review?(luke) → review+
Comment on attachment 8499591 [details] [diff] [review]
bug1044256.patch

Review of attachment 8499591 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/LIR-Common.h
@@ +313,5 @@
> +// Unary SIMD arithmetic operation on a SIMD operand
> +class LSimdUnaryArith : public LInstructionHelper<1, 1, 0>
> +{
> +  public:
> +    LSimdUnaryArith(const LAllocation &in) {

nit: Add an explicit keyword.

@@ +324,5 @@
> +
> +// Unary SIMD arithmetic operation on a Int32x4 operand
> +class LSimdUnaryArithIx4 : public LSimdUnaryArith
> +{
> +    public:

nit: indent 'public:' at 2 spaces.

@@ +326,5 @@
> +class LSimdUnaryArithIx4 : public LSimdUnaryArith
> +{
> +    public:
> +    LIR_HEADER(SimdUnaryArithIx4);
> +    LSimdUnaryArithIx4(const LAllocation &in) : LSimdUnaryArith(in) {}

another explicit keyword

@@ +332,5 @@
> +
> +// Unary SIMD arithmetic operation on a Float32x4 operand
> +class LSimdUnaryArithFx4 : public LSimdUnaryArith
> +{
> +    public:

another 2-space indent

@@ +334,5 @@
> +class LSimdUnaryArithFx4 : public LSimdUnaryArith
> +{
> +    public:
> +    LIR_HEADER(SimdUnaryArithFx4);
> +    LSimdUnaryArithFx4(const LAllocation &in) : LSimdUnaryArith(in) {}

another explicit :)

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2539,5 @@
> +        // In int32 arithmetic, -x === ~x + 1 === ~x - 1
> +        masm.loadConstantInt32x4(allOnes, ScratchSimdReg);
> +        masm.moveAlignedInt32x4(ScratchSimdReg, out);
> +        masm.bitwiseXorX4(in, out);
> +        masm.packedSubInt32(Operand(ScratchSimdReg), out);

This looks correct (aside from a missing '-' in the comment), but it'd be simpler and faster to load zeros into |out| and then subtract |in| from it.

@@ +2567,5 @@
> +    int32_t maskSignBit = FloatingPoint<float>::Base::kSignBit;
> +    static const SimdConstant signBitMasks =
> +        SimdConstant::CreateX4(maskSignBit, maskSignBit, maskSignBit, maskSignBit);
> +    static const SimdConstant undefSignBitMasks =
> +        SimdConstant::CreateX4(~maskSignBit, ~maskSignBit, ~maskSignBit, ~maskSignBit);

In CodeGeneratorX86Shared::visitAbsF, we construct the all-ones-except-the-sign-bit value with SpecificNaN<float>(0, FloatingPoint<float>::kSignificandBits. I guess it's a matter of taste which is cleaner, but it'd be nice to do the same thing in both places (or perhaps even add utility routines for this). Using that expression, and -0.0f for the sign-bit, would allow you to use float values here, which would allow you to use loadConstantFloat32x4 instead of loadConstantInt32x4 below, which is faster on platforms with domain-crossing penalties.

It'd also be nice to add a SimdConstant::CreateSplatX4(s) or so for situations like this.

@@ +2586,5 @@
> +      case MSimdUnaryArith::reciprocal:
> +        masm.packedReciprocalFloat32x4(in, out);
> +        return true;
> +      case MSimdUnaryArith::reciprocalSqrt:
> +        masm.packedReciprocalSqrtFloat32x4(in, out);

rcp and rsqrt are only approximation instructions, and we haven't yet figured out what we want the spec to say for reciprocal and reciprocalSqrt [0]. I had previously assumed that we wouldn't want to have such approximation semantics in the spec, but I've since learned that this may not be as bad as I previously thought, so this seems ok for now.

I guess our strategy for such things now is just mentioning them in bug 1068028 and adding corresponding comments here.

[0] https://github.com/johnmccutchan/ecmascript_simd/issues/71
Attachment #8499591 - Flags: review?(sunfish) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #9)
> - Having MIR constants getting out of the instructions would be beneficial
> (e.g. for not we use a SimdConstant which -1 in all lanes), but it seems
> weird to add the constraint at instantiation time that the second argument
> of a MIR node (say an hypothetical MSimdNot) be a constant with the precise
> given value (here, MSimdConstant(-1) or MSimdSplat(-1)). Perhaps this could
> be done with a separate optimization phase though, as this would be probably
> beneficial to all operations that use constants at the codegen level. The
> issue is not too grave, as constants (simd and non-simd) are already shared
> in pools, at least on x86 and x64.

I think Douglas was suggesting that 'not' be represented in MIR as an MSimdConstant and a plain MSimdBinaryBitwise xor. Then normal LICM and GVN would see them. It's definitely worth investigating at some point, though my gut feeling is that we're ok without this for now.
Thanks for the reviews, added SimdConstant::SplatX4(int32_t/float v) by the way, and a comment in the masm about the precision of rcpps and rsqrtps.

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/2e15d016544a
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/ab230e36399b
r+ over irc by nbp. Asan detected that the hash was actually using bits above the limits of the SimdConstant's array, nice!
Attachment #8501723 - Flags: review+
You need to log in before you can comment on or make changes to this bug.