Closed
Bug 1025100
Opened 10 years ago
Closed 10 years ago
SIMD backend: implement With
Categories
(Core :: JavaScript Engine: JIT, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: ivan, Assigned: bbouvier)
References
Details
Attachments
(5 files, 3 obsolete files)
3.35 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
14.08 KB,
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
15.18 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
6.51 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
11.92 KB,
patch
|
Details | Diff | Splinter Review |
int32x4 withX(integer x)
int32x4 withY(integer y)
int32x4 withZ(integer z)
int32x4 withW(integer w)
float32x4 withX(double x)
float32x4 withY(double y)
float32x4 withZ(double z)
float32x4 withW(double w)
Updated•10 years ago
|
Assignee: nobody → ivan
Comment 1•10 years ago
|
||
For consistency with the new naming in bug 1021716 (ExtractElement), it seems code for these should probably use InsertElement for its names.
Assignee | ||
Comment 2•10 years ago
|
||
Assignee: ivan → benj
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #8490156 -
Flags: review?(sunfish)
Assignee | ||
Comment 3•10 years ago
|
||
Naive strategy:
- spill the entire SIMD register datum onto the stack
- replace the value on the stack
- read the entire SIMD register back
I just discovered INSERTPS for float32x4 if we have SSE4.1, so expect another patch soon :)
Attachment #8490173 -
Flags: review?(sunfish)
Assignee | ||
Comment 4•10 years ago
|
||
Attachment #8490174 -
Flags: review?(luke)
Updated•10 years ago
|
Attachment #8490156 -
Flags: review?(sunfish) → review+
Comment 5•10 years ago
|
||
Comment on attachment 8490174 [details] [diff] [review]
2) Odin support
Review of attachment 8490174 [details] [diff] [review]:
-----------------------------------------------------------------
Nice.
::: js/src/asmjs/AsmJSValidate.cpp
@@ +4847,5 @@
> + public:
> + explicit CheckSimdVectorScalarArgs(Type t) : formalType_(t) {}
> +
> + bool operator()(FunctionCompiler &f, ParseNode *arg, unsigned argIndex, Type actualType,
> + MDefinition **argDef) const
Pre-existing: with the make-float32x4-less-coercive patch, do we still need to pass argDef?
Attachment #8490174 -
Flags: review?(luke) → review+
Comment 6•10 years ago
|
||
Comment on attachment 8490173 [details] [diff] [review]
1) SIMD with
Review of attachment 8490173 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/Lowering.cpp
@@ +3725,5 @@
> +{
> + JS_ASSERT(IsSimdType(ins->type()));
> +
> + LUse vec = useRegisterAtStart(ins->vector());
> + LUse val = useRegisterAtStart(ins->value());
As with the other patch, this should be useRegister, since it's the non-reused input in a defineReuse instruction.
::: js/src/jit/MIR.h
@@ +1413,5 @@
> +// Replaces the datum in the given lane by a scalar value of the same type.
> +class MSimdInsertElement : public MBinaryInstruction
> +{
> + protected:
> + SimdLane lane_;
This could be private rather than protected, since subclasses use the lane() accessor.
::: js/src/jit/shared/Assembler-x86-shared.h
@@ +529,4 @@
> // register-to-register form has different semantics (it doesn't clobber
> // the whole output register) and isn't needed currently.
> + // The same comment should be applied to movss, if not for moving the lowest
> + // lane of a float32 register to a float32x4 register.
Instead of saying that a comment should be applied elsewhere, let's just put the comment there (by movss) :-).
::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2220,5 @@
> SimdLane lane = ins->lane();
> if (lane == LaneX) {
> // The value we want to extract is in the low double-word
> if (input != output)
> + masm.movss(input, output);
For a SIMD extract, movss is suboptimal because we don't care about the high elements, so we'd prefer to clobber them and break dependencies on earlier instructions. moveFloat32 uses movaps, which is what we want here.
@@ +2241,5 @@
> + MOZ_ASSERT(vector == output); // defineReuseInput(0)
> +
> + unsigned component = 0;
> + switch (ins->lane()) {
> + case SimdLane::LaneX: break;
It might be nice to have a comment somewhere around here mentioning that we can't use movd because it zeros the other elements.
@@ +2245,5 @@
> + case SimdLane::LaneX: break;
> + case SimdLane::LaneY: component = 1; break;
> + case SimdLane::LaneZ: component = 2; break;
> + case SimdLane::LaneW: component = 3; break;
> + }
The SimdLane enum values are the same as the component values here, and they seem unlikely to change. Can we just do "unsigned component = (unsigned)ins->lane()"?
@@ +2256,5 @@
> + masm.reserveStack(Simd128DataSize);
> + masm.storeAlignedInt32x4(vector, Address(StackPointer, 0));
> + masm.store32(value, Address(StackPointer, component * sizeof(int32_t)));
> + masm.loadAlignedInt32x4(Address(StackPointer, 0), output);
> + masm.freeStack(Simd128DataSize);
This is doable without using memory by using a temporary register and shuffle instructions, but it's not urgent to do this now.
@@ +2285,5 @@
> + masm.reserveStack(Simd128DataSize);
> + masm.storeAlignedFloat32x4(vector, Address(StackPointer, 0));
> + masm.storeFloat32(value, Address(StackPointer, component * sizeof(int32_t)));
> + masm.loadAlignedFloat32x4(Address(StackPointer, 0), output);
> + masm.freeStack(Simd128DataSize);
The patch to make this use insertps on SSE4.1 is greatly anticipated :-).
On pre-SSE4.1 this is doable with temporaries and shuffles, but it's not urgent.
Attachment #8490173 -
Flags: review?(sunfish)
Assignee | ||
Comment 7•10 years ago
|
||
Attachment #8491425 -
Flags: review?(sunfish)
Assignee | ||
Updated•10 years ago
|
Attachment #8490173 -
Attachment is obsolete: true
Assignee | ||
Comment 8•10 years ago
|
||
Changes: slightly different control flow in float32 codegen, to also use the unsigned(ins->lane()) way.
Attachment #8491425 -
Attachment is obsolete: true
Attachment #8491425 -
Flags: review?(sunfish)
Attachment #8491433 -
Flags: review?(sunfish)
Assignee | ||
Comment 9•10 years ago
|
||
So I've tried a simple API to create the mask. As it's pretty specific to insertps, and thus to SSE4.1, I've put the method in Assembler-x86-shared.cpp (hesitated between this one and the MacroAssembler).
Attachment #8491450 -
Flags: review?(sunfish)
Comment 10•10 years ago
|
||
Comment on attachment 8491450 [details] [diff] [review]
3) Add INSERTPS and use it whenever possible
Review of attachment 8491450 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/shared/Assembler-x86-shared.h
@@ +1969,5 @@
> + unsigned ret = 0;
> + if (zero) {
> + for (size_t i = 0; i < 4; i++)
> + ret |= unsigned(zero[i]) << i;
> + }
The zero argument isn't used anywhere yet, but I imagine its use would look something like this:
insertpsMask(src, dst, (bool[]){ true, false, true, false });
or maybe:
bool zero[4] = {};
zero[LaneX] = true;
zero[LaneY] = true;
insertpsMask(src, dst, zero);
I guess it's a matter of preference, but I'd find an API like this more readable:
insertpsMask(src, dst, (1 << LaneX) | (1 << LaneZ));
As a bonus, it'd be simpler to implement. Feel free to disagree if you feel strongly about it though.
Attachment #8491450 -
Flags: review?(sunfish) → review+
Comment 11•10 years ago
|
||
Comment on attachment 8491433 [details] [diff] [review]
1) SIMD with, v2.1
Review of attachment 8491433 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2239,5 @@
> + Register value = ToRegister(ins->value());
> + FloatRegister output = ToFloatRegister(ins->output());
> + MOZ_ASSERT(vector == output); // defineReuseInput(0)
> +
> + unsigned component = unsigned(ins->lane());
Style nit: put a blank line here, as a multi-line comment looks prettier with a blank line before it.
@@ +2262,5 @@
> +{
> + FloatRegister vector = ToFloatRegister(ins->vector());
> + FloatRegister value = ToFloatRegister(ins->value());
> + FloatRegister output = ToFloatRegister(ins->output());
> + MOZ_ASSERT(vector == output); // defineReuseInput
The comment should be defineReuseInput(0), or the comment in visitSimdInsertElementI should be just defineReuseInput, so that they're not spuriously different.
Attachment #8491433 -
Flags: review?(sunfish) → review+
Assignee | ||
Comment 12•10 years ago
|
||
I prefer the unsigned way as well in insertpsMask signature, thanks!
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/70698c31fca1
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/d418a4d0f8d5
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/02e8c6942c85
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/4df302f6b719
Assignee | ||
Comment 13•10 years ago
|
||
Also, I will try to see whether I can do withFlag{X,Y,Z,W} from the With instructions. We can generate it at the MIR level directly, sharing some constants and allowing optimizations in some easy cases.
withFlagX(vector, value) is equivalent to the following:
var truthy = !!value;
var tmp = truthy ? 0xFFFFFFFF : 0x0;
return SIMD.int32x4.withX(vector, tmp);
Keywords: leave-open
Assignee | ||
Comment 14•10 years ago
|
||
As expected, it could be done during MIR generation.
Attachment #8492190 -
Flags: review?(luke)
Comment 15•10 years ago
|
||
I think I'm missing some of the backstory of withFlag*. Is there not a way to implement withFlag* in SIMD ops that avoids a branch (and thus a new MIR/LIR node to express withFlag directly)? What would Ion do for withFlag*?
Comment 16•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/70698c31fca1
https://hg.mozilla.org/mozilla-central/rev/d418a4d0f8d5
https://hg.mozilla.org/mozilla-central/rev/02e8c6942c85
https://hg.mozilla.org/mozilla-central/rev/4df302f6b719
Flags: in-testsuite+
Assignee | ||
Comment 17•10 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #15)
> I think I'm missing some of the backstory of withFlag*. Is there not a way
> to implement withFlag* in SIMD ops that avoids a branch (and thus a new
> MIR/LIR node to express withFlag directly)? What would Ion do for withFlag*?
From my look at the Intel x86 dev manual (and according to the two SIMD implementations I have access to), there is no such instruction. Dan, can you please confirm? If such an instruction isn't available, it might be also a good start for a spec discussion (what is the use case?).
For Ion, we could do the same: generate the MIR instructions by hand (like in this patch), and have fallible ToInt32x4 / ToInt32 for the arguments.
Flags: needinfo?(sunfish)
Comment 18•10 years ago
|
||
(In reply to Benjamin Bouvier [:bbouvier] from comment #17)
> (In reply to Luke Wagner [:luke] from comment #15)
> > I think I'm missing some of the backstory of withFlag*. Is there not a way
> > to implement withFlag* in SIMD ops that avoids a branch (and thus a new
> > MIR/LIR node to express withFlag directly)? What would Ion do for withFlag*?
>
> From my look at the Intel x86 dev manual (and according to the two SIMD
> implementations I have access to), there is no such instruction. Dan, can
> you please confirm? If such an instruction isn't available, it might be also
> a good start for a spec discussion (what is the use case?).
For MIR for now, could we do -1+ToInt32(Not(x)), or perhaps better 0-ToInt32(Not(Not(x))?
That ought to get us pretty close to x86 code like this:
xorl %eax, %eax
testl %edi, %edi
setne %al
negl %eax
Another possible sequence is this:
negl %eax
sbbl %ecx, %ecx
but Agner suggests that sbb is not a dependence-breaking instruction on at least Core2 and Nehalem, so this may be slower.
> For Ion, we could do the same: generate the MIR instructions by hand (like
> in this patch), and have fallible ToInt32x4 / ToInt32 for the arguments.
I don't have enough context to comment on this at present.
Flags: needinfo?(sunfish)
Assignee | ||
Comment 19•10 years ago
|
||
Ha, that's indeed better. I've filed [0] on the ecmascript_simd repo, to discuss the interest of such a function.
The generated code now looks like this (there is room for improvement here):
0x7ffff7ff5024: test %edi,%edi
0x7ffff7ff5026: sete %al
0x7ffff7ff5029: movzbl %al,%eax
0x7ffff7ff502c: test %eax,%eax
0x7ffff7ff502e: sete %al
0x7ffff7ff5031: movzbl %al,%eax
0x7ffff7ff5034: xor %ecx,%ecx
0x7ffff7ff5036: sub %eax,%ecx
[0] https://github.com/johnmccutchan/ecmascript_simd/issues/68
Assignee | ||
Comment 20•10 years ago
|
||
Considering the size of the patch, it seems acceptable to have it in the tree, even if the instructions get discarded by the spec people later.
Attachment #8492190 -
Attachment is obsolete: true
Attachment #8492190 -
Flags: review?(luke)
Attachment #8493626 -
Flags: review?(luke)
Comment 21•10 years ago
|
||
Well, I guess the question I'm getting at is: why is this a SIMD op (in the spec) if it doesn't translate into a SIMD instruction? The two reasons I can think of are: there is a NEON SIMD op; this is a common operation and we can generate more efficient code than you could using the other primitives. Is either of these the case?
Comment 22•10 years ago
|
||
(I guess we'll wait on an answer to https://github.com/johnmccutchan/ecmascript_simd/issues/68 ?)
Assignee | ||
Comment 23•10 years ago
|
||
Yeah, I just wanted to implement the equivalent of the current spec in asm.js, but if we're waiting on an answer here, it could take some time. I'll open a new bug if needs be.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 24•10 years ago
|
||
Comment on attachment 8493626 [details] [diff] [review]
4) withFlag
Cancelling review for now.
Attachment #8493626 -
Flags: review?(luke)
Comment 25•7 years ago
|
||
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in
before you can comment on or make changes to this bug.
Description
•