Status

()

defect
RESOLVED FIXED
5 years ago
a year ago

People

(Reporter: ivan, Assigned: bbouvier)

Tracking

Trunk
x86_64
All
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(5 attachments, 3 obsolete attachments)

(Reporter)

Description

5 years ago
int32x4 withX(integer x)
int32x4 withY(integer y)
int32x4 withZ(integer z)
int32x4 withW(integer w)

float32x4 withX(double x)
float32x4 withY(double y)
float32x4 withZ(double z)
float32x4 withW(double w)
(Reporter)

Updated

5 years ago
Blocks: 1023404
Assignee: nobody → ivan
For consistency with the new naming in bug 1021716 (ExtractElement), it seems code for these should probably use InsertElement for its names.
(Assignee)

Comment 2

5 years ago
Assignee: ivan → benj
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #8490156 - Flags: review?(sunfish)
(Assignee)

Comment 3

5 years ago
Posted patch 1) SIMD with (obsolete) — Splinter Review
Naive strategy:
- spill the entire SIMD register datum onto the stack
- replace the value on the stack
- read the entire SIMD register back

I just discovered INSERTPS for float32x4 if we have SSE4.1, so expect another patch soon :)
Attachment #8490173 - Flags: review?(sunfish)
(Assignee)

Comment 4

5 years ago
Attachment #8490174 - Flags: review?(luke)
Attachment #8490156 - Flags: review?(sunfish) → review+
Comment on attachment 8490174 [details] [diff] [review]
2) Odin support

Review of attachment 8490174 [details] [diff] [review]:
-----------------------------------------------------------------

Nice.

::: js/src/asmjs/AsmJSValidate.cpp
@@ +4847,5 @@
> +  public:
> +    explicit CheckSimdVectorScalarArgs(Type t) : formalType_(t) {}
> +
> +    bool operator()(FunctionCompiler &f, ParseNode *arg, unsigned argIndex, Type actualType,
> +                    MDefinition **argDef) const

Pre-existing: with the make-float32x4-less-coercive patch, do we still need to pass argDef?
Attachment #8490174 - Flags: review?(luke) → review+
Comment on attachment 8490173 [details] [diff] [review]
1) SIMD with

Review of attachment 8490173 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/Lowering.cpp
@@ +3725,5 @@
> +{
> +    JS_ASSERT(IsSimdType(ins->type()));
> +
> +    LUse vec = useRegisterAtStart(ins->vector());
> +    LUse val = useRegisterAtStart(ins->value());

As with the other patch, this should be useRegister, since it's the non-reused input in a defineReuse instruction.

::: js/src/jit/MIR.h
@@ +1413,5 @@
> +// Replaces the datum in the given lane by a scalar value of the same type.
> +class MSimdInsertElement : public MBinaryInstruction
> +{
> +  protected:
> +    SimdLane lane_;

This could be private rather than protected, since subclasses use the lane() accessor.

::: js/src/jit/shared/Assembler-x86-shared.h
@@ +529,4 @@
>      // register-to-register form has different semantics (it doesn't clobber
>      // the whole output register) and isn't needed currently.
> +    // The same comment should be applied to movss, if not for moving the lowest
> +    // lane of a float32 register to a float32x4 register.

Instead of saying that a comment should be applied elsewhere, let's just put the comment there (by movss) :-).

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2220,5 @@
>      SimdLane lane = ins->lane();
>      if (lane == LaneX) {
>          // The value we want to extract is in the low double-word
>          if (input != output)
> +            masm.movss(input, output);

For a SIMD extract, movss is suboptimal because we don't care about the high elements, so we'd prefer to clobber them and break dependencies on earlier instructions. moveFloat32 uses movaps, which is what we want here.

@@ +2241,5 @@
> +    MOZ_ASSERT(vector == output); // defineReuseInput(0)
> +
> +    unsigned component = 0;
> +    switch (ins->lane()) {
> +      case SimdLane::LaneX: break;

It might be nice to have a comment somewhere around here mentioning that we can't use movd because it zeros the other elements.

@@ +2245,5 @@
> +      case SimdLane::LaneX: break;
> +      case SimdLane::LaneY: component = 1; break;
> +      case SimdLane::LaneZ: component = 2; break;
> +      case SimdLane::LaneW: component = 3; break;
> +    }

The SimdLane enum values are the same as the component values here, and they seem unlikely to change. Can we just do "unsigned component = (unsigned)ins->lane()"?

@@ +2256,5 @@
> +    masm.reserveStack(Simd128DataSize);
> +    masm.storeAlignedInt32x4(vector, Address(StackPointer, 0));
> +    masm.store32(value, Address(StackPointer, component * sizeof(int32_t)));
> +    masm.loadAlignedInt32x4(Address(StackPointer, 0), output);
> +    masm.freeStack(Simd128DataSize);

This is doable without using memory by using a temporary register and shuffle instructions, but it's not urgent to do this now.

@@ +2285,5 @@
> +    masm.reserveStack(Simd128DataSize);
> +    masm.storeAlignedFloat32x4(vector, Address(StackPointer, 0));
> +    masm.storeFloat32(value, Address(StackPointer, component * sizeof(int32_t)));
> +    masm.loadAlignedFloat32x4(Address(StackPointer, 0), output);
> +    masm.freeStack(Simd128DataSize);

The patch to make this use insertps on SSE4.1 is greatly anticipated :-).

On pre-SSE4.1 this is doable with temporaries and shuffles, but it's not urgent.
Attachment #8490173 - Flags: review?(sunfish)
(Assignee)

Comment 7

5 years ago
Posted patch 1) SIMD with, v2 (obsolete) — Splinter Review
Attachment #8491425 - Flags: review?(sunfish)
(Assignee)

Updated

5 years ago
Attachment #8490173 - Attachment is obsolete: true
(Assignee)

Comment 8

5 years ago
Changes: slightly different control flow in float32 codegen, to also use the unsigned(ins->lane()) way.
Attachment #8491425 - Attachment is obsolete: true
Attachment #8491425 - Flags: review?(sunfish)
Attachment #8491433 - Flags: review?(sunfish)
(Assignee)

Comment 9

5 years ago
So I've tried a simple API to create the mask. As it's pretty specific to insertps, and thus to SSE4.1, I've put the method in Assembler-x86-shared.cpp (hesitated between this one and the MacroAssembler).
Attachment #8491450 - Flags: review?(sunfish)
Comment on attachment 8491450 [details] [diff] [review]
3) Add INSERTPS and use it whenever possible

Review of attachment 8491450 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/Assembler-x86-shared.h
@@ +1969,5 @@
> +        unsigned ret = 0;
> +        if (zero) {
> +            for (size_t i = 0; i < 4; i++)
> +                ret |= unsigned(zero[i]) << i;
> +        }

The zero argument isn't used anywhere yet, but I imagine its use would look something like this:

  insertpsMask(src, dst, (bool[]){ true, false, true, false });

or maybe:

  bool zero[4] = {};
  zero[LaneX] = true;
  zero[LaneY] = true;
  insertpsMask(src, dst, zero);

I guess it's a matter of preference, but I'd find an API like this more readable:

  insertpsMask(src, dst, (1 << LaneX) | (1 << LaneZ));

As a bonus, it'd be simpler to implement. Feel free to disagree if you feel strongly about it though.
Attachment #8491450 - Flags: review?(sunfish) → review+
Comment on attachment 8491433 [details] [diff] [review]
1) SIMD with, v2.1

Review of attachment 8491433 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2239,5 @@
> +    Register value = ToRegister(ins->value());
> +    FloatRegister output = ToFloatRegister(ins->output());
> +    MOZ_ASSERT(vector == output); // defineReuseInput(0)
> +
> +    unsigned component = unsigned(ins->lane());

Style nit: put a blank line here, as a multi-line comment looks prettier with a blank line before it.

@@ +2262,5 @@
> +{
> +    FloatRegister vector = ToFloatRegister(ins->vector());
> +    FloatRegister value = ToFloatRegister(ins->value());
> +    FloatRegister output = ToFloatRegister(ins->output());
> +    MOZ_ASSERT(vector == output); // defineReuseInput

The comment should be defineReuseInput(0), or the comment in visitSimdInsertElementI should be just defineReuseInput, so that they're not spuriously different.
Attachment #8491433 - Flags: review?(sunfish) → review+
(Assignee)

Comment 13

5 years ago
Also, I will try to see whether I can do withFlag{X,Y,Z,W} from the With instructions. We can generate it at the MIR level directly, sharing some constants and allowing optimizations in some easy cases.

withFlagX(vector, value) is equivalent to the following:

var truthy = !!value;
var tmp = truthy ? 0xFFFFFFFF : 0x0;
return SIMD.int32x4.withX(vector, tmp);
Keywords: leave-open
(Assignee)

Comment 14

5 years ago
Posted patch 4) withFlag (obsolete) — Splinter Review
As expected, it could be done during MIR generation.
Attachment #8492190 - Flags: review?(luke)
I think I'm missing some of the backstory of withFlag*.  Is there not a way to implement withFlag* in SIMD ops that avoids a branch (and thus a new MIR/LIR node to express withFlag directly)?  What would Ion do for withFlag*?
(Assignee)

Comment 17

5 years ago
(In reply to Luke Wagner [:luke] from comment #15)
> I think I'm missing some of the backstory of withFlag*.  Is there not a way
> to implement withFlag* in SIMD ops that avoids a branch (and thus a new
> MIR/LIR node to express withFlag directly)?  What would Ion do for withFlag*?

From my look at the Intel x86 dev manual (and according to the two SIMD implementations I have access to), there is no such instruction. Dan, can you please confirm? If such an instruction isn't available, it might be also a good start for a spec discussion (what is the use case?).

For Ion, we could do the same: generate the MIR instructions by hand (like in this patch), and have fallible ToInt32x4 / ToInt32 for the arguments.
Flags: needinfo?(sunfish)
(In reply to Benjamin Bouvier [:bbouvier] from comment #17)
> (In reply to Luke Wagner [:luke] from comment #15)
> > I think I'm missing some of the backstory of withFlag*.  Is there not a way
> > to implement withFlag* in SIMD ops that avoids a branch (and thus a new
> > MIR/LIR node to express withFlag directly)?  What would Ion do for withFlag*?
> 
> From my look at the Intel x86 dev manual (and according to the two SIMD
> implementations I have access to), there is no such instruction. Dan, can
> you please confirm? If such an instruction isn't available, it might be also
> a good start for a spec discussion (what is the use case?).

For MIR for now, could we do -1+ToInt32(Not(x)), or perhaps better 0-ToInt32(Not(Not(x))?

That ought to get us pretty close to x86 code like this:

    xorl  %eax, %eax
    testl %edi, %edi
    setne %al
    negl  %eax

Another possible sequence is this:

    negl %eax
    sbbl %ecx, %ecx

but Agner suggests that sbb is not a dependence-breaking instruction on at least Core2 and Nehalem, so this may be slower.

> For Ion, we could do the same: generate the MIR instructions by hand (like
> in this patch), and have fallible ToInt32x4 / ToInt32 for the arguments.

I don't have enough context to comment on this at present.
Flags: needinfo?(sunfish)
(Assignee)

Comment 19

5 years ago
Ha, that's indeed better. I've filed [0] on the ecmascript_simd repo, to discuss the interest of such a function.

The generated code now looks like this (there is room for improvement here):
   0x7ffff7ff5024:	test   %edi,%edi
   0x7ffff7ff5026:	sete   %al
   0x7ffff7ff5029:	movzbl %al,%eax
   0x7ffff7ff502c:	test   %eax,%eax
   0x7ffff7ff502e:	sete   %al
   0x7ffff7ff5031:	movzbl %al,%eax
   0x7ffff7ff5034:	xor    %ecx,%ecx
   0x7ffff7ff5036:	sub    %eax,%ecx

[0] https://github.com/johnmccutchan/ecmascript_simd/issues/68
(Assignee)

Comment 20

5 years ago
Posted patch 4) withFlagSplinter Review
Considering the size of the patch, it seems acceptable to have it in the tree, even if the instructions get discarded by the spec people later.
Attachment #8492190 - Attachment is obsolete: true
Attachment #8492190 - Flags: review?(luke)
Attachment #8493626 - Flags: review?(luke)
Well, I guess the question I'm getting at is: why is this a SIMD op (in the spec) if it doesn't translate into a SIMD instruction?  The two reasons I can think of are: there is a NEON SIMD op; this is a common operation and we can generate more efficient code than you could using the other primitives.  Is either of these the case?
(Assignee)

Comment 23

5 years ago
Yeah, I just wanted to implement the equivalent of the current spec in asm.js, but if we're waiting on an answer here, it could take some time. I'll open a new bug if needs be.
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
(Assignee)

Comment 24

5 years ago
Comment on attachment 8493626 [details] [diff] [review]
4) withFlag

Cancelling review for now.
Attachment #8493626 - Flags: review?(luke)
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.