Closed Bug 1025100 Opened 10 years ago Closed 10 years ago

SIMD backend: implement With

Categories

(Core :: JavaScript Engine: JIT, defect)

x86_64
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: ivan, Assigned: bbouvier)

References

Details

Attachments

(5 files, 3 obsolete files)

int32x4 withX(integer x) int32x4 withY(integer y) int32x4 withZ(integer z) int32x4 withW(integer w) float32x4 withX(double x) float32x4 withY(double y) float32x4 withZ(double z) float32x4 withW(double w)
Blocks: 1023404
Assignee: nobody → ivan
For consistency with the new naming in bug 1021716 (ExtractElement), it seems code for these should probably use InsertElement for its names.
Assignee: ivan → benj
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #8490156 - Flags: review?(sunfish)
Attached patch 1) SIMD with (obsolete) — Splinter Review
Naive strategy: - spill the entire SIMD register datum onto the stack - replace the value on the stack - read the entire SIMD register back I just discovered INSERTPS for float32x4 if we have SSE4.1, so expect another patch soon :)
Attachment #8490173 - Flags: review?(sunfish)
Attached patch 2) Odin supportSplinter Review
Attachment #8490174 - Flags: review?(luke)
Attachment #8490156 - Flags: review?(sunfish) → review+
Comment on attachment 8490174 [details] [diff] [review] 2) Odin support Review of attachment 8490174 [details] [diff] [review]: ----------------------------------------------------------------- Nice. ::: js/src/asmjs/AsmJSValidate.cpp @@ +4847,5 @@ > + public: > + explicit CheckSimdVectorScalarArgs(Type t) : formalType_(t) {} > + > + bool operator()(FunctionCompiler &f, ParseNode *arg, unsigned argIndex, Type actualType, > + MDefinition **argDef) const Pre-existing: with the make-float32x4-less-coercive patch, do we still need to pass argDef?
Attachment #8490174 - Flags: review?(luke) → review+
Comment on attachment 8490173 [details] [diff] [review] 1) SIMD with Review of attachment 8490173 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/Lowering.cpp @@ +3725,5 @@ > +{ > + JS_ASSERT(IsSimdType(ins->type())); > + > + LUse vec = useRegisterAtStart(ins->vector()); > + LUse val = useRegisterAtStart(ins->value()); As with the other patch, this should be useRegister, since it's the non-reused input in a defineReuse instruction. ::: js/src/jit/MIR.h @@ +1413,5 @@ > +// Replaces the datum in the given lane by a scalar value of the same type. > +class MSimdInsertElement : public MBinaryInstruction > +{ > + protected: > + SimdLane lane_; This could be private rather than protected, since subclasses use the lane() accessor. ::: js/src/jit/shared/Assembler-x86-shared.h @@ +529,4 @@ > // register-to-register form has different semantics (it doesn't clobber > // the whole output register) and isn't needed currently. > + // The same comment should be applied to movss, if not for moving the lowest > + // lane of a float32 register to a float32x4 register. Instead of saying that a comment should be applied elsewhere, let's just put the comment there (by movss) :-). ::: js/src/jit/shared/CodeGenerator-x86-shared.cpp @@ +2220,5 @@ > SimdLane lane = ins->lane(); > if (lane == LaneX) { > // The value we want to extract is in the low double-word > if (input != output) > + masm.movss(input, output); For a SIMD extract, movss is suboptimal because we don't care about the high elements, so we'd prefer to clobber them and break dependencies on earlier instructions. moveFloat32 uses movaps, which is what we want here. @@ +2241,5 @@ > + MOZ_ASSERT(vector == output); // defineReuseInput(0) > + > + unsigned component = 0; > + switch (ins->lane()) { > + case SimdLane::LaneX: break; It might be nice to have a comment somewhere around here mentioning that we can't use movd because it zeros the other elements. @@ +2245,5 @@ > + case SimdLane::LaneX: break; > + case SimdLane::LaneY: component = 1; break; > + case SimdLane::LaneZ: component = 2; break; > + case SimdLane::LaneW: component = 3; break; > + } The SimdLane enum values are the same as the component values here, and they seem unlikely to change. Can we just do "unsigned component = (unsigned)ins->lane()"? @@ +2256,5 @@ > + masm.reserveStack(Simd128DataSize); > + masm.storeAlignedInt32x4(vector, Address(StackPointer, 0)); > + masm.store32(value, Address(StackPointer, component * sizeof(int32_t))); > + masm.loadAlignedInt32x4(Address(StackPointer, 0), output); > + masm.freeStack(Simd128DataSize); This is doable without using memory by using a temporary register and shuffle instructions, but it's not urgent to do this now. @@ +2285,5 @@ > + masm.reserveStack(Simd128DataSize); > + masm.storeAlignedFloat32x4(vector, Address(StackPointer, 0)); > + masm.storeFloat32(value, Address(StackPointer, component * sizeof(int32_t))); > + masm.loadAlignedFloat32x4(Address(StackPointer, 0), output); > + masm.freeStack(Simd128DataSize); The patch to make this use insertps on SSE4.1 is greatly anticipated :-). On pre-SSE4.1 this is doable with temporaries and shuffles, but it's not urgent.
Attachment #8490173 - Flags: review?(sunfish)
Attached patch 1) SIMD with, v2 (obsolete) — Splinter Review
Attachment #8491425 - Flags: review?(sunfish)
Attachment #8490173 - Attachment is obsolete: true
Changes: slightly different control flow in float32 codegen, to also use the unsigned(ins->lane()) way.
Attachment #8491425 - Attachment is obsolete: true
Attachment #8491425 - Flags: review?(sunfish)
Attachment #8491433 - Flags: review?(sunfish)
So I've tried a simple API to create the mask. As it's pretty specific to insertps, and thus to SSE4.1, I've put the method in Assembler-x86-shared.cpp (hesitated between this one and the MacroAssembler).
Attachment #8491450 - Flags: review?(sunfish)
Comment on attachment 8491450 [details] [diff] [review] 3) Add INSERTPS and use it whenever possible Review of attachment 8491450 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/shared/Assembler-x86-shared.h @@ +1969,5 @@ > + unsigned ret = 0; > + if (zero) { > + for (size_t i = 0; i < 4; i++) > + ret |= unsigned(zero[i]) << i; > + } The zero argument isn't used anywhere yet, but I imagine its use would look something like this: insertpsMask(src, dst, (bool[]){ true, false, true, false }); or maybe: bool zero[4] = {}; zero[LaneX] = true; zero[LaneY] = true; insertpsMask(src, dst, zero); I guess it's a matter of preference, but I'd find an API like this more readable: insertpsMask(src, dst, (1 << LaneX) | (1 << LaneZ)); As a bonus, it'd be simpler to implement. Feel free to disagree if you feel strongly about it though.
Attachment #8491450 - Flags: review?(sunfish) → review+
Comment on attachment 8491433 [details] [diff] [review] 1) SIMD with, v2.1 Review of attachment 8491433 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/shared/CodeGenerator-x86-shared.cpp @@ +2239,5 @@ > + Register value = ToRegister(ins->value()); > + FloatRegister output = ToFloatRegister(ins->output()); > + MOZ_ASSERT(vector == output); // defineReuseInput(0) > + > + unsigned component = unsigned(ins->lane()); Style nit: put a blank line here, as a multi-line comment looks prettier with a blank line before it. @@ +2262,5 @@ > +{ > + FloatRegister vector = ToFloatRegister(ins->vector()); > + FloatRegister value = ToFloatRegister(ins->value()); > + FloatRegister output = ToFloatRegister(ins->output()); > + MOZ_ASSERT(vector == output); // defineReuseInput The comment should be defineReuseInput(0), or the comment in visitSimdInsertElementI should be just defineReuseInput, so that they're not spuriously different.
Attachment #8491433 - Flags: review?(sunfish) → review+
Also, I will try to see whether I can do withFlag{X,Y,Z,W} from the With instructions. We can generate it at the MIR level directly, sharing some constants and allowing optimizations in some easy cases. withFlagX(vector, value) is equivalent to the following: var truthy = !!value; var tmp = truthy ? 0xFFFFFFFF : 0x0; return SIMD.int32x4.withX(vector, tmp);
Keywords: leave-open
Attached patch 4) withFlag (obsolete) — Splinter Review
As expected, it could be done during MIR generation.
Attachment #8492190 - Flags: review?(luke)
I think I'm missing some of the backstory of withFlag*. Is there not a way to implement withFlag* in SIMD ops that avoids a branch (and thus a new MIR/LIR node to express withFlag directly)? What would Ion do for withFlag*?
(In reply to Luke Wagner [:luke] from comment #15) > I think I'm missing some of the backstory of withFlag*. Is there not a way > to implement withFlag* in SIMD ops that avoids a branch (and thus a new > MIR/LIR node to express withFlag directly)? What would Ion do for withFlag*? From my look at the Intel x86 dev manual (and according to the two SIMD implementations I have access to), there is no such instruction. Dan, can you please confirm? If such an instruction isn't available, it might be also a good start for a spec discussion (what is the use case?). For Ion, we could do the same: generate the MIR instructions by hand (like in this patch), and have fallible ToInt32x4 / ToInt32 for the arguments.
Flags: needinfo?(sunfish)
(In reply to Benjamin Bouvier [:bbouvier] from comment #17) > (In reply to Luke Wagner [:luke] from comment #15) > > I think I'm missing some of the backstory of withFlag*. Is there not a way > > to implement withFlag* in SIMD ops that avoids a branch (and thus a new > > MIR/LIR node to express withFlag directly)? What would Ion do for withFlag*? > > From my look at the Intel x86 dev manual (and according to the two SIMD > implementations I have access to), there is no such instruction. Dan, can > you please confirm? If such an instruction isn't available, it might be also > a good start for a spec discussion (what is the use case?). For MIR for now, could we do -1+ToInt32(Not(x)), or perhaps better 0-ToInt32(Not(Not(x))? That ought to get us pretty close to x86 code like this: xorl %eax, %eax testl %edi, %edi setne %al negl %eax Another possible sequence is this: negl %eax sbbl %ecx, %ecx but Agner suggests that sbb is not a dependence-breaking instruction on at least Core2 and Nehalem, so this may be slower. > For Ion, we could do the same: generate the MIR instructions by hand (like > in this patch), and have fallible ToInt32x4 / ToInt32 for the arguments. I don't have enough context to comment on this at present.
Flags: needinfo?(sunfish)
Ha, that's indeed better. I've filed [0] on the ecmascript_simd repo, to discuss the interest of such a function. The generated code now looks like this (there is room for improvement here): 0x7ffff7ff5024: test %edi,%edi 0x7ffff7ff5026: sete %al 0x7ffff7ff5029: movzbl %al,%eax 0x7ffff7ff502c: test %eax,%eax 0x7ffff7ff502e: sete %al 0x7ffff7ff5031: movzbl %al,%eax 0x7ffff7ff5034: xor %ecx,%ecx 0x7ffff7ff5036: sub %eax,%ecx [0] https://github.com/johnmccutchan/ecmascript_simd/issues/68
Attached patch 4) withFlagSplinter Review
Considering the size of the patch, it seems acceptable to have it in the tree, even if the instructions get discarded by the spec people later.
Attachment #8492190 - Attachment is obsolete: true
Attachment #8492190 - Flags: review?(luke)
Attachment #8493626 - Flags: review?(luke)
Well, I guess the question I'm getting at is: why is this a SIMD op (in the spec) if it doesn't translate into a SIMD instruction? The two reasons I can think of are: there is a NEON SIMD op; this is a common operation and we can generate more efficient code than you could using the other primitives. Is either of these the case?
Yeah, I just wanted to implement the equivalent of the current spec in asm.js, but if we're waiting on an answer here, it could take some time. I'll open a new bug if needs be.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment on attachment 8493626 [details] [diff] [review] 4) withFlag Cancelling review for now.
Attachment #8493626 - Flags: review?(luke)
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: