Closed Bug 1025127 Opened 10 years ago Closed 10 years ago

SIMD backend: implement Comparisons

Categories

(Core :: JavaScript Engine: JIT, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla34

People

(Reporter: ivan, Assigned: ivan)

References

Details

Attachments

(1 file, 7 obsolete files)

SIMD.float32x4:
int32x4 greaterThan(float32x4 a, float32x4 b)
int32x4 greaterThanOrEqual(float32x4 a, float32x4 b)
int32x4 lessThan(float32x4 a, float32x4 b)
int32x4 lessThanOrEqual(float32x4 a, float32x4 b)
int32x4 equal(float32x4 a, float32x4 b)
int32x4 notEqual(float32x4 a, float32x4 b)

SIMD.int32x4:
int32x4 greaterThan(int32x4 a, int32x4 b)
int32x4 lessThan(int32x4 a, int32x4 b)
int32x4 equal(int32x4 a, int32x4 b)
Blocks: 1023404
Assignee: nobody → ivan
Just saw #ionmonkey scrollback, gave you editbugs/canconfirm permissions, so you should be able to assign yourself to bugs, mark them as confirmed, etc.  Use these powers wisely!  :-)
Attached patch bug1025127.txt (obsolete) — Splinter Review
Attachment #8462269 - Flags: review?(sunfish)
Comment on attachment 8462269 [details] [diff] [review]
bug1025127.txt

Review of attachment 8462269 [details] [diff] [review]:
-----------------------------------------------------------------

(drive by review, as I was curious :)) Looks pretty good to me! Thanks for doing this.

::: js/src/assembler/assembler/X86Assembler.h
@@ +328,5 @@
>          OP2_MOVZX_GvEb      = 0xB6,
>          OP2_MOVZX_GvEw      = 0xB7,
>          OP2_XADD_EvGv       = 0xC1,
>          OP2_PEXTRW_GdUdIb   = 0xC5,
> +        OP2_CMPPS_VsdWsd    = 0xC2,

OP2_CMPPS_VpsWps (see also http://ref.x86asm.net/geek.html - I plan to add this link at the top of this list in a future separate bug)

::: js/src/jit/MIR.h
@@ +1304,5 @@
>  };
>  
> +
> +// Applies the with operation to the input, creating a copy of the
> +// vector with 1 lane changed

Hmm this comment seems to apply for your other patch :)

@@ +1323,5 @@
> +
> +    MSimdBinaryComp(MDefinition *left, MDefinition *right, Operation op, MIRType type)
> +      : MBinaryInstruction(left, right), operation_(op)
> +    {
> +        JS_ASSERT(IsSIMDType(type));

oops, forgot to upload my new patch in which IsSIMDType got renamed into IsSimdType.

@@ +1342,5 @@
> +    AliasSet getAliasSet() const {
> +        return AliasSet::None();
> +    }
> +
> +    Operation operation() const { return operation_;}

can you add a space after the semicolon please?

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2151,5 @@
>      return true;
>  }
>  
>  bool
> +CodeGeneratorX86Shared::visitSimdBinaryCompIx4(LSimdBinaryCompIx4 *ins) 

nit: trailing space

@@ +2163,5 @@
> +      case MSimdBinaryComp::greaterThan:
> +        masm.packedGreaterThanInt32x4(rhs, lhs);
> +        return true;
> +      case MSimdBinaryComp::lessThan:
> +        masm.packedGreaterThanInt32x4(ToOperand(ins->lhs()), ToFloarRegister(ins->rhs()));

nit: you can reuse rhs and lhs here.

@@ +2169,5 @@
> +      case MSimdBinaryComp::equal:
> +        masm.packedEqualInt32x4(rhs, lhs);
> +        return true;
> +      default:
> +        MOZ_ASSUME_UNREACHABLE("unexpected SIMD op");

I would:
1) put the other cases explicitly here, with a break;
2) move the MOZ_ASSUME_UNREACHABLE outside the switch
3) not put a default case
This way, if we ever add any other enum options, compilers will complain and emit warnings saying that a case is not handled in an enum. Up to you though, it's mainly a question of style.

@@ +2174,5 @@
> +    }
> +}
> +
> +bool
> +CodeGeneratorX86Shared::visitSimdBinaryCompFx4(LSimdBinaryCompFx4 *ins) 

nit: trailing space

@@ +2200,5 @@
> +        return true;
> +      case MSimdBinaryComp::greaterThanOrEqual:
> +        masm.packedCompareFloat32x4(rhs, lhs, 0x5);
> +        return true;
> +      default:

See comment above regarding the default case and warning.
Attachment #8462269 - Flags: feedback+
(In reply to Benjamin Bouvier [:bbouvier] from comment #3)
> @@ +2163,5 @@
> > +      case MSimdBinaryComp::greaterThan:
> > +        masm.packedGreaterThanInt32x4(rhs, lhs);
> > +        return true;
> > +      case MSimdBinaryComp::lessThan:
> > +        masm.packedGreaterThanInt32x4(ToOperand(ins->lhs()), ToFloarRegister(ins->rhs()));
> 
> nit: you can reuse rhs and lhs here.

As Ivan told me on IRC, we're using packedGreaterThanInt32x4 here as there's no SSE instruction for lessThan for int32x4, so this comment can be forgotten (modulo ToFloarRegister => ToFloatRegister). I think rhs may be in something else than a register (e.g. on the stack, at a memory address), and thus ToFloatRegister will fail in this case.
Attached patch bug1025127.txt (obsolete) — Splinter Review
Thanks for the quick review! I made the changes noted except the one for the rhs possibly being on the stack or memory address which would cause ToFloatRegister to fail. Will need to look into this more.
Attachment #8462269 - Attachment is obsolete: true
Attachment #8462269 - Flags: review?(sunfish)
Attachment #8462457 - Flags: review?(sunfish)
Attachment #8462457 - Flags: review?(benj)
Comment on attachment 8462457 [details] [diff] [review]
bug1025127.txt

Review of attachment 8462457 [details] [diff] [review]:
-----------------------------------------------------------------

Looks even better.

::: js/src/jit/MIR.h
@@ +1343,5 @@
> +        return AliasSet::None();
> +    }
> +
> +    Operation operation() const { return operation_; }
> +

nit: blank line

@@ +1346,5 @@
> +    Operation operation() const { return operation_; }
> +
> +};
> +
> +

nit: blank line

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2158,5 @@
> +    Operand rhs = ToOperand(ins->rhs());
> +    JS_ASSERT(ToFloatRegister(ins->output()) == lhs);
> +
> +    MSimdBinaryComp::Operation op = ins->operation();
> +    switch(op){

nit:
switch (op) {

@@ +2163,5 @@
> +      case MSimdBinaryComp::greaterThan:
> +        masm.packedGreaterThanInt32x4(rhs, lhs);
> +        return true;
> +      case MSimdBinaryComp::lessThan:
> +        masm.packedGreaterThanInt32x4(ToOperand(ins->lhs()), ToFloatRegister(ins->rhs()));

You can just move rhs on the ScratchFloatRegister and then use the same method?

@@ +2184,5 @@
> +    Operand rhs = ToOperand(ins->rhs());
> +    JS_ASSERT(ToFloatRegister(ins->output()) == lhs);
> +
> +    MSimdBinaryComp::Operation op = ins->operation();
> +    switch(op){

ditto

@@ +2201,5 @@
> +      case MSimdBinaryComp::greaterThan:
> +        masm.packedCompareFloat32x4(rhs, lhs, 0x6);
> +        return true;
> +      case MSimdBinaryComp::greaterThanOrEqual:
> +        masm.packedCompareFloat32x4(rhs, lhs, 0x5);

Can you put this one above greaterThan, so that the opcode series gets monotonic? (where have you been, 0x3?)
Attachment #8462457 - Flags: review?(benj) → feedback+
Comment on attachment 8462457 [details] [diff] [review]
bug1025127.txt

Review of attachment 8462457 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/LIR-Common.h
@@ +218,5 @@
>      SIMDLane laneW() const { return laneW_; }
>  };
>  
> +// Binary SIMD comparison operation between two SIMD operands
> +class LSimdBinaryComp: public LInstructionHelper<1, 2, 0>

This class can override the extraName() function to make spew output a little easier to read.

@@ +220,5 @@
>  
> +// Binary SIMD comparison operation between two SIMD operands
> +class LSimdBinaryComp: public LInstructionHelper<1, 2, 0>
> +{
> +    MSimdBinaryComp::Operation operation_;

An alternative to storing the operation_ in the class here is to implement this class's operation() as mir()->operation().

@@ +223,5 @@
> +{
> +    MSimdBinaryComp::Operation operation_;
> +
> +  public:
> +    LSimdBinaryComp(MSimdBinaryComp::Operation op) : operation_(op) {}

This constuctor should be protected, since it's only supposed to be called by its subclasses.

::: js/src/jit/MIR.h
@@ +1304,5 @@
>  };
>  
> +
> +// Applies the with operation to the input, creating a copy of the
> +// vector with 1 lane changed

This comment appears to be a copy+pasto.

The comment for this class should describe the return value, since there are many flavors of Simd comparisons and they primarily differ in their return value. Does it return a vector or does it reduce to a single boolean? If a vector, is true indicated by 1 or by all-ones or by an unspecified non-zero value? I think I know the answers here, but these are the questions I'm likely to have if I don't :-).

@@ +1327,5 @@
> +        JS_ASSERT(IsSIMDType(type));
> +        JS_ASSERT(left->type() == right->type());
> +        JS_ASSERT(left->type() == type);
> +        setResultType(type);
> +        setMovable();

equal and notEqual can get setCommutative too.

::: js/src/jit/shared/MacroAssembler-x86-shared.h
@@ +480,5 @@
>      }
> +    void packedEqualInt32x4(const Operand &src, FloatRegister dest) {
> +        pcmpeqd(src, dest);
> +    }
> +    void packedGreaterThanInt32x4(const Operand &src, FloatRegister dest){

Nit: here and elsewhere, put a space between ){
Attachment #8462457 - Flags: review?(sunfish) → review+
Rebased. Carry forward the r+.

Have not looked at the feedback, just rebased.
Attachment #8462457 - Attachment is obsolete: true
Attachment #8479718 - Flags: review+
Flags: needinfo?(benj)
Attached patch Patch with addressed comments (obsolete) — Splinter Review
Addressed all comments. Needs integration and testing in the Odin patch. (ideally, this should land before Shuffle and other ops, as it's higher priority)
Attachment #8479718 - Attachment is obsolete: true
Attachment #8480015 - Flags: review+
Flags: needinfo?(benj)
Attachment #8479718 - Attachment is patch: true
* Some of the LIR methods were still passing around the op, breaking compilation. Just removed them as they are no longer needed.

* visitSimdBinaryCompIx4: the lessThan case would not compile, and it looks like a lessThanOrEqual so moved.

* Tried to fix a ScratchFloat32Reg reference and move. I guess this works because the float32 move uses movaps, but this is probably something to clean up someday.
Attachment #8480015 - Attachment is obsolete: true
Attachment #8480367 - Flags: review+
Flags: needinfo?(benj)
Attached patch New patch (obsolete) — Splinter Review
Enough things have changed so that we need another quick review:

- SSE prefix were missing for int32x4 BaseAssembler's functions
- int32x4.lessThan codegen has changed (needed to take into account that rhs can be in other places than a register)
- MSimdBinaryComp unconditionnally returns MIRType_int32x4, so we need to add an enum to know the type of the compare (as done in MCompare).

This patch has been tested on basic inputs in Odin and seems to work well.
Attachment #8480367 - Attachment is obsolete: true
Attachment #8480491 - Flags: review?(sunfish)
Flags: needinfo?(benj)
Comment on attachment 8480491 [details] [diff] [review]
New patch

Review of attachment 8480491 [details] [diff] [review]:
-----------------------------------------------------------------

Overall looks good, I have several comments, and another review round would be good.

::: js/src/jit/Lowering.cpp
@@ +3729,5 @@
>  
>  bool
> +LIRGenerator::visitSimdBinaryComp(MSimdBinaryComp *ins)
> +{
> +    JS_ASSERT(ins->type() == MIRType_Int32x4);

Please use MOZ_ASSERT rather than JS_ASSERT for new code.

::: js/src/jit/MIR.h
@@ +1413,5 @@
> +        return AliasSet::None();
> +    }
> +
> +    Operation operation() const { return operation_; }
> +    CompareType compareType() const { return compareType_; }

Please implement a congruentTo function for MBinarySimdComp.

::: js/src/jit/shared/Assembler-x86-shared.h
@@ +1544,5 @@
> +          default:
> +            MOZ_CRASH("unexpected operand kind");
> +        }
> +    }
> +    void cmpps(const Operand &src, FloatRegister dest, uint8_t order) {

Instead of having a generic uint8_t for the immediate, let's add an enum to BaseAssembler-x86-shared.h and use the enum values in BaseAssembler-x86-shared.h and here. For example, see the RoundingMode enum.

::: js/src/jit/shared/BaseAssembler-x86-shared.h
@@ +2583,5 @@
> +        m_formatter.prefix(PRE_SSE_66);
> +        m_formatter.twoByteOp(OP2_PCMPGTD_VdqWdq, (RegisterID)dst, address);
> +    }
> +
> +    void cmpps_rr(XMMRegisterID src, XMMRegisterID dst, uint8_t order){

Style nit: put the { on its own line (I know this style isn't consistently applied across the codebase, but it's nice to follow the local style).

@@ +2590,5 @@
> +        m_formatter.twoByteOp(OP2_CMPPS_VpsWps, (RegisterID)dst, (RegisterID)src);
> +        m_formatter.immediate8(order);
> +    }
> +
> +    void cmpps_mr(int offset, RegisterID base, XMMRegisterID dst, uint8_t order){

ditto

@@ +2597,5 @@
> +        m_formatter.twoByteOp(OP2_CMPPS_VpsWps, (RegisterID)dst, base, offset);
> +        m_formatter.immediate8(order);
> +    }
> +
> +    void cmpps_mr(const void* address, XMMRegisterID dst, uint8_t order){

ditto

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2196,5 @@
> +CodeGeneratorX86Shared::visitSimdBinaryCompIx4(LSimdBinaryCompIx4 *ins)
> +{
> +    FloatRegister lhs = ToFloatRegister(ins->lhs());
> +    Operand rhs = ToOperand(ins->rhs());
> +    JS_ASSERT(ToFloatRegister(ins->output()) == lhs);

Please use MOZ_ASSERT instead of JS_ASSERT for new code.

@@ +2199,5 @@
> +    Operand rhs = ToOperand(ins->rhs());
> +    JS_ASSERT(ToFloatRegister(ins->output()) == lhs);
> +
> +    MSimdBinaryComp::Operation op = ins->operation();
> +    switch (op){

Style nit: put a space between ) and {.

@@ +2215,5 @@
> +            masm.loadAlignedInt32x4(rhs, ScratchSimdReg);
> +
> +        // scr := scr > lhs (i.e. lhs < rhs)
> +        masm.packedGreaterThanInt32x4(ToOperand(ins->lhs()), ScratchSimdReg);
> +        masm.moveAlignedInt32x4(ScratchFloat32Reg, lhs);

Hmm, yeah. I guess we could improve this by doing custom lowering instead of just using lowerForFPU, and tell the register allocator that in this case it's the rhs that's tied to the output register. However, I think what you have here is fine for now, though please mention this opportunity in a comment.

@@ +2220,5 @@
> +        return true;
> +      case MSimdBinaryComp::notEqual:
> +      case MSimdBinaryComp::greaterThanOrEqual:
> +      case MSimdBinaryComp::lessThanOrEqual:
> +        break;

It'd be nice to put a separate "not yet implemented" assert here rather than following through to "unexpected SIMD op".

@@ +2230,5 @@
> +CodeGeneratorX86Shared::visitSimdBinaryCompFx4(LSimdBinaryCompFx4 *ins)
> +{
> +    FloatRegister lhs = ToFloatRegister(ins->lhs());
> +    Operand rhs = ToOperand(ins->rhs());
> +    JS_ASSERT(ToFloatRegister(ins->output()) == lhs);

MOZ_ASSERT here too.

@@ +2233,5 @@
> +    Operand rhs = ToOperand(ins->rhs());
> +    JS_ASSERT(ToFloatRegister(ins->output()) == lhs);
> +
> +    MSimdBinaryComp::Operation op = ins->operation();
> +    switch (op){

style nit ditto

::: js/src/jit/shared/MacroAssembler-x86-shared.h
@@ +517,5 @@
>          movups(src, Operand(dest));
>      }
> +    void packedCompareFloat32x4(const Operand &src, FloatRegister dest, uint8_t order) {
> +        cmpps(src, dest, order);
> +    }

I guess the purpose of these wrappers is to provide a (somewhat) target-independent API, however the uint8_t argument is x86-specific (the same applies if changed to an enum defined in X86Assembler, following my comment above). I think we should use the existing DoubleCondition enum in this API, and translate it to the appropriate x86 value for the cmpps call.
Attachment #8480491 - Flags: review?(sunfish)
Attached patch New patch (obsolete) — Splinter Review
(In reply to Dan Gohman [:sunfish] from comment #12)
> @@ +2220,5 @@
> > +        return true;
> > +      case MSimdBinaryComp::notEqual:
> > +      case MSimdBinaryComp::greaterThanOrEqual:
> > +      case MSimdBinaryComp::lessThanOrEqual:
> > +        break;
> 
> It'd be nice to put a separate "not yet implemented" assert here rather than
> following through to "unexpected SIMD op".

These are operations are not part of the spec (thus they won't actually be implemented)

> 
> ::: js/src/jit/shared/MacroAssembler-x86-shared.h
> @@ +517,5 @@
> >          movups(src, Operand(dest));
> >      }
> > +    void packedCompareFloat32x4(const Operand &src, FloatRegister dest, uint8_t order) {
> > +        cmpps(src, dest, order);
> > +    }
> 
> I guess the purpose of these wrappers is to provide a (somewhat)
> target-independent API, however the uint8_t argument is x86-specific (the
> same applies if changed to an enum defined in X86Assembler, following my
> comment above). I think we should use the existing DoubleCondition enum in
> this API, and translate it to the appropriate x86 value for the cmpps call.

I didn't add the enum for the uint8_t in the float32x4 compares because of this comment (which I'm not clear how to implement). Since we are currently landing support for x86 only, should I go ahead and just add the enum and add a comment to change it later to this? or someone else can implement this.
Attachment #8480491 - Attachment is obsolete: true
Attachment #8480832 - Flags: review?(sunfish)
Comment on attachment 8480832 [details] [diff] [review]
New patch

Review of attachment 8480832 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, looks good, r+, with a few more minor comments.

(In reply to Ivan Jibaja from comment #13)
> Created attachment 8480832 [details] [diff] [review]
> New patch
> 
> (In reply to Dan Gohman [:sunfish] from comment #12)
> > @@ +2220,5 @@
> > > +        return true;
> > > +      case MSimdBinaryComp::notEqual:
> > > +      case MSimdBinaryComp::greaterThanOrEqual:
> > > +      case MSimdBinaryComp::lessThanOrEqual:
> > > +        break;
> > 
> > It'd be nice to put a separate "not yet implemented" assert here rather than
> > following through to "unexpected SIMD op".
> 
> These are operations are not part of the spec (thus they won't actually be
> implemented)

Ah, ok. Let's add a comment about that.

> 
> > 
> > ::: js/src/jit/shared/MacroAssembler-x86-shared.h
> > @@ +517,5 @@
> > >          movups(src, Operand(dest));
> > >      }
> > > +    void packedCompareFloat32x4(const Operand &src, FloatRegister dest, uint8_t order) {
> > > +        cmpps(src, dest, order);
> > > +    }
> > 
> > I guess the purpose of these wrappers is to provide a (somewhat)
> > target-independent API, however the uint8_t argument is x86-specific (the
> > same applies if changed to an enum defined in X86Assembler, following my
> > comment above). I think we should use the existing DoubleCondition enum in
> > this API, and translate it to the appropriate x86 value for the cmpps call.
> 
> I didn't add the enum for the uint8_t in the float32x4 compares because of
> this comment (which I'm not clear how to implement). Since we are currently
> landing support for x86 only, should I go ahead and just add the enum and
> add a comment to change it later to this? or someone else can implement this.

Actually, for now, let's just omit the packedCompareFloat32x4 wrapper and call cmpps directly in CodeGenerator-x86-shared.cpp. The thing I want to avoid is mixing in target-specific details with APIs that use the target-independent naming scheme.

::: js/src/jit/MIR.h
@@ +1417,5 @@
> +    CompareType compareType() const { return compareType_; }
> +    bool congruentTo(const MDefinition *ins) const {
> +        if (!binaryCongruentTo(ins))
> +            return false;
> +        return operation_ == ins->toSimdBinaryComp()->operation();

It looks like this function is missing a closing brace.
Attachment #8480832 - Flags: review?(sunfish) → review+
Address reviewer feedback. Carry forward r+.
Attachment #8480832 - Attachment is obsolete: true
Attachment #8481187 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/5fa26de0a04f
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: