Closed Bug 1025475 Opened 10 years ago Closed 10 years ago

SIMD backend: implement Int32x4 / Float32x4 constructors

Categories

(Core :: JavaScript Engine: JIT, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla34

People

(Reporter: bbouvier, Assigned: bbouvier)

References

Details

Attachments

(4 files, 4 obsolete files)

int32x4(a,b,c,d)
float32x4(a,b,c,d)

I'll probably add MSimdConstant, for when a, b, c, and d are constant literals.
Attached patch 1 - Constructors (obsolete) — Splinter Review
This implements Int32x4(a,b,c,d) and Float32x4(a,b,c,d), where inputs are arbitrary MIR nodes.

Code generation does the following:
- reserve enough stack space for storing a SIMD value (as of now, 4 * 32 bits)
- move each input in a 32-bits stack slot
- (aligned) load into the destination register (should be sane, as SP is aligned from the function prologue)
- free stack

An improvement in the future could be to have this stack space reserved per function, every time we see we have at least N SimdValueX4 in the MIRGraph, where N is a constant >= 2.
Attachment #8459652 - Flags: feedback?(sunfish)
Attached patch 2 - Constants (obsolete) — Splinter Review
asm.js needs SIMD literals, for variable initializers in functions. E.g.
function f() {
  var x = int32x4(1,2,3,4);
}

So we need a way to have the equivalent of js::Value, for storing SIMD constants. Typed objects (being SIMD objects polyfills in the interpreter) are not convenient because they are JSObject and thus involve GC. So I've created an equivalent of constant js::Value for SIMD. This is called SimdConstant in this patch, and there is an equivalent MSimdConstant.
Attachment #8459656 - Flags: feedback?(sunfish)
Attachment #8459657 - Flags: feedback?(sunfish)
(for the record, splat, which can also be folded from SimdValueX4, will be implemented in another bug)
Comment on attachment 8459656 [details] [diff] [review]
2 - Constants

Review of attachment 8459656 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/assembler/assembler/X86Assembler.h
@@ +2917,5 @@
> +    }
> +#else
> +    void movdqa_mr(const void* address, XMMRegisterID dst)
> +    {
> +        spew("movss      %p, %s",

copy+pasto in the string?

@@ +4006,5 @@
> +        void int32x4Constant(const int32_t s[4])
> +        {
> +            m_buffer.ensureSpace(4 * sizeof(int32_t));
> +            for (size_t i = 0; i < 4; ++i)
> +                m_buffer.putIntUnchecked(s[i]);

It's odd that we can call floatConstant below, but there's no corresponding method for int32 yet. Could you add one, so that the code here in int32x4Constant can just call int32Constant?

::: js/src/jit/IonTypes.h
@@ +299,5 @@
> +            return false;
> +        // Comparing as one kind is enough.
> +        for (size_t i = 0; i < 4; ++i) {
> +            if (u.i32x4[i] != rhs.u.i32x4[i])
> +                return false;

I like this, but it is technically a violation of the C++ union rules. I suggest calling memcmp instead.

::: js/src/jit/shared/MacroAssembler-x86-shared.h
@@ +649,5 @@
> +        static SimdConstant zero(0,0,0,0);
> +        if (v == zero) {
> +            pxor(dest, dest);
> +            return true;
> +        }

Another value that's nice and easy to inline here is (-1,-1,-1,-1) since you can do that with just pcmpeqw(reg, reg) (Agner Optimizing Assembly table 13.10).

::: js/src/jit/x64/MacroAssembler-x64.cpp
@@ +157,5 @@
>          bind(&flt.uses);
>          masm.floatConstant(flt.value);
>      }
>  
> +    // SIMD memory values must be 16-bits aligned

s/bits/bytes/. Actually, it may be better to avoid saying the number here at all, as it may change. How about just saying that SIMD memory must be suitably aligned (here and in the x86 version too).

::: js/src/jit/x86/MacroAssembler-x86.cpp
@@ +157,5 @@
> +    SimdData *f4 = getSimdData(v);
> +    if (!f4)
> +        return;
> +    JS_ASSERT(f4->type() == SimdConstant::Float32x4);
> +    masm.movdqa_mr(reinterpret_cast<const void *>(f4->uses.prev()), dest.code());

This should be movaps instead of movdqa. Functionally equivalent, but avoids a domain-crossing stall on some micro-architectures.
Attachment #8459656 - Flags: feedback?(sunfish) → feedback+
Comment on attachment 8459652 [details] [diff] [review]
1 - Constructors

Review of attachment 8459652 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +1294,5 @@
> +    bool congruentTo(const MDefinition *ins) const {
> +        if (!ins->isSimdValueX4())
> +            return false;
> +        if (ins->type() != type())
> +            return false;

congruentIfOperandsEqual already checks that the op() and type() match, so these two checks are redundant.
Attachment #8459652 - Flags: feedback?(sunfish) → feedback+
Comment on attachment 8459656 [details] [diff] [review]
2 - Constants

Review of attachment 8459656 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +1300,5 @@
>      }
>  };
>  
> +// A constant SIMD value.
> +class MSimdConstant : public MNullaryInstruction

Looking at the third patch which folds to this made me question this a little.

The alternative to having an MSimdConstant class is to have lowering do the pattern-matching to recognize an MSimdValueX4 with all-constant operands in visitMSimdValueX4 and lower it to LInt32x4 or LFloat32x4 right there.

Offhand, the advantage of doing this in lowering is that we avoid creating as many MIRNodes, though perhaps it's not that big an advantage. The advantage of using MSimdConstant is that it would make it more convenient for us to do more pattern-matching and constant folding of SIMD values. This won't matter much for emscripten-generated code (which should already have all the constant-folding done already), but it may be nice for hand-written JS code. I wonder if there are other considerations.
Comment on attachment 8459656 [details] [diff] [review]
2 - Constants

Review of attachment 8459656 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +1300,5 @@
>      }
>  };
>  
> +// A constant SIMD value.
> +class MSimdConstant : public MNullaryInstruction

Thinking about this a little more, I now think MSimdConstant is fine.

The common case for MSimdConstant will be when all the operands are immediate constants, so if we want to reduce memory usage, we can have IonBuilder and/or AsmJS.cpp detect this pattern and create MSimdConstants directly instead of creating a bunch of nodes which will be folded away. We also don't need to do this right away, so the current patches are fine for now.
Comment on attachment 8459657 [details] [diff] [review]
3 - Fold SimdValue into SimdConstant

Review of attachment 8459657 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.cpp
@@ +593,5 @@
> +{
> +    MIRType scalarType = SIMDTypeToScalarType(type());
> +    for (size_t i = 0; i < 4; ++i) {
> +        MDefinition *op = getOperand(i);
> +        if (!op->isConstant() || op->type() != scalarType)

MSimdValueX4's constructor asserts that its operands have the right type, so it shouldn't be necessary to check it again here. An assert for the types wouldn't be unwelcome though, of course.

@@ +603,5 @@
> +      case MIRType_Int32x4: {
> +        int32_t a[4];
> +        for (size_t i = 0; i < 4; ++i)
> +            a[i] = getOperand(i)->toConstant()->value().toInt32();
> +        cst = SimdConstant(a[0], a[1], a[2], a[3]);

It might be nice to add a constructor to SimdConstant to allow |a| to be passed in directly here. That would also make it a little easier to generalize to SIMD types with different numbers of elements.

@@ +610,5 @@
> +      case MIRType_Float32x4: {
> +        float a[4];
> +        for (size_t i = 0; i < 4; ++i)
> +            a[i] = getOperand(i)->toConstant()->value().toNumber();
> +        cst = SimdConstant(a[0], a[1], a[2], a[3]);

as above
Attachment #8459657 - Flags: feedback?(sunfish) → feedback+
Thanks for the great feedback! pcmpeqw was already present in the Assembler-x86-shared, so that's bonus. I've added fill{Int,Float}32(x, y, z, w) functions to SimdConstant, and two constructors that take array inputs.

(In reply to Dan Gohman [:sunfish] from comment #8)
> Comment on attachment 8459656 [details] [diff] [review]
> 2 - Constants
> 
> Review of attachment 8459656 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> The common case for MSimdConstant will be when all the operands are
> immediate constants, so if we want to reduce memory usage, we can have
> IonBuilder and/or AsmJS.cpp detect this pattern and create MSimdConstants
> directly instead of creating a bunch of nodes which will be folded away. We
> also don't need to do this right away, so the current patches are fine for
> now.

Actually, I am doing this pattern matching (partially at least) in my Odin SIMD patch, for variable initializers and globals, so that's already useful. The folding was just shown as a use of the MSimdConstant in this bug.
Attached patch 1 - ConstructorsSplinter Review
Flipping to r?
Attachment #8459652 - Attachment is obsolete: true
Attachment #8469351 - Flags: review?(sunfish)
Attached patch 2 - SimdConstant (obsolete) — Splinter Review
Attachment #8469354 - Flags: review?(sunfish)
Attached patch 2 - SimdConstantSplinter Review
oops
Attachment #8459656 - Attachment is obsolete: true
Attachment #8469354 - Attachment is obsolete: true
Attachment #8469354 - Flags: review?(sunfish)
Attachment #8469355 - Flags: review?(sunfish)
Attachment #8469356 - Flags: review?(sunfish)
Attachment #8459657 - Attachment is obsolete: true
Attachment #8469357 - Flags: review?(sunfish)
Comment on attachment 8469351 [details] [diff] [review]
1 - Constructors

Review of attachment 8469351 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2076,5 @@
> +        for (size_t i = 0; i < 4; ++i) {
> +            FloatRegister r = ToFloatRegister(ins->getOperand(i));
> +            masm.storeFloat32(r, Address(StackPointer, i * sizeof(float)));
> +        }
> +        masm.loadAlignedFloat32x4(Address(StackPointer, 0), output);

Another bonus opportunity here :). We can do this sequence with 3 unpcklps instructions and no stack space, though we'll need temporaries to be able to write intermediate values, and possibly copying:

  unpcklps %xmm3, %xmm1   # %xmm1 = %xmm1.x,%xmm3.x,%xmm1.y,%xmm3.y
  unpcklps %xmm2, %xmm0   # %xmm0 = %xmm0.x,%xmm2.x,%xmm0.y,%xmm2.y
  unpcklps %xmm1, %xmm0   # %xmm0 = %xmm0.x,%xmm1.x,%xmm0.y,%xmm1.y

We can theoretically do the same thing for int32x4 with punpckldq, though it's less nice there because we need to move everything into xmm registers first.

I'm also totally ok with just adding a comment about this for now and coming back to it later.
Attachment #8469351 - Flags: review?(sunfish) → review+
Comment on attachment 8469355 [details] [diff] [review]
2 - SimdConstant

Review of attachment 8469355 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/IonTypes.h
@@ +241,5 @@
> +    };
> +
> +  private:
> +    bool defined_;
> +    Type type_;

Since we have our own Type enum, we could perhaps add an Undefined type, and then we wouldn't need a separate defined_ flag.

Also, if this class is default-constructed, these fields aren't initialized anywhere. There should be a default constructor which initialized these.

@@ +273,5 @@
> +        SimdConstant cst;
> +        cst.fillInt32x4(x, y, z, w);
> +        return cst;
> +    }
> +    static SimdConstant Create(int32_t *array) {

Would it make sense to rename this to CreateX4? It feels awkward to pass in a plain pointer with no explicit mention of how many elements will be read from the pointer.

@@ +283,5 @@
> +        SimdConstant cst;
> +        cst.fillFloat32x4(x, y, z, w);
> +        return cst;
> +    }
> +    static SimdConstant Create(float *array) {

Same as above.

@@ +324,5 @@
> +    // SimdConstant is a HashPolicy
> +    typedef SimdConstant Lookup;
> +    static HashNumber hash(const SimdConstant &val) {
> +        const int32_t *a = val.u.i32x4;
> +        return mozilla::HashGeneric(a[0], a[1], a[2], a[3]);

This is contrary to the C++ union rules when val is actually holding a float32x4. I recommend using HashBytes here instead.
Attachment #8469355 - Flags: review?(sunfish) → review+
Comment on attachment 8469356 [details] [diff] [review]
3 - MSimdConstant

Review of attachment 8469356 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/LIR-Common.h
@@ +326,5 @@
>  
> +// Constant SIMD int32x4
> +class LInt32x4 : public LInstructionHelper<1, 0, 0>
> +{
> +    SimdConstant v_;

It would save some memory here to use mir()->value() instead of storing an extra copy here.

@@ +330,5 @@
> +    SimdConstant v_;
> +  public:
> +    LIR_HEADER(Int32x4);
> +
> +    LInt32x4(const SimdConstant &v) : v_(v) {}

This constructor should get the explicit keyword.

@@ +331,5 @@
> +  public:
> +    LIR_HEADER(Int32x4);
> +
> +    LInt32x4(const SimdConstant &v) : v_(v) {}
> +    SimdConstant getValue() const { return v_; }

SimdConstant is big enough that it makes sense to return a const& here.

@@ +337,5 @@
> +
> +// Constant SIMD float32x4
> +class LFloat32x4 : public LInstructionHelper<1, 0, 0>
> +{
> +    SimdConstant v_;

ditto

@@ +341,5 @@
> +    SimdConstant v_;
> +  public:
> +    LIR_HEADER(Float32x4);
> +
> +    LFloat32x4(const SimdConstant &v) : v_(v) {}

ditto

@@ +342,5 @@
> +  public:
> +    LIR_HEADER(Float32x4);
> +
> +    LFloat32x4(const SimdConstant &v) : v_(v) {}
> +    SimdConstant getValue() const { return v_; }

ditto

::: js/src/jit/shared/MacroAssembler-x86-shared.h
@@ +662,5 @@
>      }
>  
> +    bool maybeInlineInt32x4(const SimdConstant &v, const FloatRegister &dest) {
> +        static SimdConstant zero = SimdConstant::Create(0, 0, 0, 0);
> +        static SimdConstant minusOne = SimdConstant::Create(-1, -1, -1, -1);

These variables can be const, which helps people auditing code for non-const statics.

@@ +674,5 @@
> +        }
> +        return false;
> +    }
> +    bool maybeInlineFloat32x4(const SimdConstant &v, const FloatRegister &dest) {
> +        static SimdConstant zero = SimdConstant::Create(0.f, 0.f, 0.f, 0.f);

Another const.

@@ +675,5 @@
> +        return false;
> +    }
> +    bool maybeInlineFloat32x4(const SimdConstant &v, const FloatRegister &dest) {
> +        static SimdConstant zero = SimdConstant::Create(0.f, 0.f, 0.f, 0.f);
> +        if (v == zero) {

This == works because SimdConstant's operator== uses memcmp, rather than elementwise == because -0.0 == 0.0. A brief comment about this would be good.

::: js/src/jit/x64/MacroAssembler-x64.cpp
@@ +79,5 @@
>      masm.setNextJump(j, prev);
>  }
>  
> +MacroAssemblerX64::SimdData *
> +MacroAssemblerX64::lookupOrAddSimdConstant(const SimdConstant &v)

The corresponding function on x86 is named getSimdData. I'm ok with either name, but it'd be nice to use the same name for both, unless there's some subtle difference I'm not seeing. If the code is actually the same, it'd be nice to move it to shared/MacroAssembler-x86-shared.*
Attachment #8469356 - Flags: review?(sunfish) → review+
Attachment #8469357 - Flags: review?(sunfish) → review+
(In reply to Dan Gohman [:sunfish] from comment #16)
>   unpcklps %xmm3, %xmm1   # %xmm1 = %xmm1.x,%xmm3.x,%xmm1.y,%xmm3.y
>   unpcklps %xmm2, %xmm0   # %xmm0 = %xmm0.x,%xmm2.x,%xmm0.y,%xmm2.y
>   unpcklps %xmm1, %xmm0   # %xmm0 = %xmm0.x,%xmm1.x,%xmm0.y,%xmm1.y
> 
> We can theoretically do the same thing for int32x4 with punpckldq, though
> it's less nice there because we need to move everything into xmm registers
> first.

Also with SSE4.1, int32x4 construction can be done nicely with pinsrd:

  movd   %edi, %xmm0
  pinsrd $1, %esi, %xmm0
  pinsrd $2, %edx, %xmm0
  pinsrd $3, %ecx, %xmm0

Again, not imperative to do now.
See question below regarding the default constructor.

(In reply to Dan Gohman [:sunfish] from comment #17)
> Comment on attachment 8469355 [details] [diff] [review]
> 2 - SimdConstant
> 
> Review of attachment 8469355 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/IonTypes.h
> @@ +241,5 @@
> > +    };
> > +
> > +  private:
> > +    bool defined_;
> > +    Type type_;
> 
> Since we have our own Type enum, we could perhaps add an Undefined type, and
> then we wouldn't need a separate defined_ flag.
Good idea. That makes sizeof(SimdConstant) == Simd128DataSize, which is good.

> 
> Also, if this class is default-constructed, these fields aren't initialized
> anywhere. There should be a default constructor which initialized these.
That actually cannot be done, and that's why we have the CreateX4 functions. If we have a user-defined ctor, we cannot include SimdConstant into unions, as it has a non trivial ctor. C++11 allows to create default ctors, but all our targeted platforms don't support it (B2G simulators in particular complained about it, during a try build i made a few weeks ago). I will add a comment to make it clearer. Is there anything else we can do here?

> 
> @@ +273,5 @@
> > +        SimdConstant cst;
> > +        cst.fillInt32x4(x, y, z, w);
> > +        return cst;
> > +    }
> > +    static SimdConstant Create(int32_t *array) {
> 
> Would it make sense to rename this to CreateX4? It feels awkward to pass in
> a plain pointer with no explicit mention of how many elements will be read
> from the pointer.
I named it CreateX4 everywhere, for consistency purposes. Thanks.

> 
> @@ +324,5 @@
> > +    // SimdConstant is a HashPolicy
> > +    typedef SimdConstant Lookup;
> > +    static HashNumber hash(const SimdConstant &val) {
> > +        const int32_t *a = val.u.i32x4;
> > +        return mozilla::HashGeneric(a[0], a[1], a[2], a[3]);
> 
> This is contrary to the C++ union rules when val is actually holding a
> float32x4. I recommend using HashBytes here instead.
Thanks, I didn't know this existed! Fixed.
Flags: needinfo?(sunfish)
(In reply to Benjamin Bouvier [:bbouvier] from comment #20)
> > Also, if this class is default-constructed, these fields aren't initialized
> > anywhere. There should be a default constructor which initialized these.
> That actually cannot be done, and that's why we have the CreateX4 functions.
> If we have a user-defined ctor, we cannot include SimdConstant into unions,
> as it has a non trivial ctor. C++11 allows to create default ctors, but all
> our targeted platforms don't support it (B2G simulators in particular
> complained about it, during a try build i made a few weeks ago). I will add
> a comment to make it clearer. Is there anything else we can do here?

Oh, I think I forgot about this when I saw the defined_ flag, because I though that that meant that we always knew if we were defined or not, but then I noticed that it wasn't actually initialized. I don't know of anything else we can easily do here at present, so the code is fine, with a comment :).
Flags: needinfo?(sunfish)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: