Closed Bug 1135039 Opened 9 years ago Closed 9 years ago

SIMD: Generalize shuffles/swizzles for Ion

Categories

(Core :: JavaScript Engine: JIT, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: bbouvier, Assigned: bbouvier)

References

Details

Attachments

(7 files, 3 obsolete files)

Currently, our SIMD shuffles / swizzles MIR nodes expect constant patterns (masks?), so they need to generalized before being inlinable within Ion.
Assignee: nobody → benj
Attached patch 1.polyfill.patch (obsolete) — Splinter Review
This will make the interpreter behave more like the polyfill: if the lane index is out of bounds ([0..3] for float32x4/int32x4, [0..1] for float64x2), just load the coerced undefined value in it.  Might be not intended in the polyfill (see also [0]), but it's actually pretty convenient to implement in the JITs...

[0] https://github.com/johnmccutchan/ecmascript_simd/issues/111#issuecomment-75968636
Attachment #8569307 - Flags: review?(sunfish)
Attached patch swizzle.patch (obsolete) — Splinter Review
This patch implements a generalized swizzle for int32x4 and float32x4, from inlining to codegen on x86. Implementation does the following:
- reserve stack space for 2 vectors
- spill the input vector onto the stack
- load each value addressed by the lane index into a temp register
- spill this temp register at its final position in the output vector (onto the stack)
Not really efficient (with sse4.1, we could probably insert the value directly into the lane), but It Works (tm).
Attachment #8569310 - Flags: feedback?(sunfish)
Comment on attachment 8569310 [details] [diff] [review]
swizzle.patch

Review of attachment 8569310 [details] [diff] [review]:
-----------------------------------------------------------------

Nice and straight-forward.

::: js/src/jit-test/tests/SIMD/swizzle.js
@@ +37,5 @@
> +        var r = SIMD.int32x4.swizzle(i4, i % 4, (i + 1) % 4, (i + 2) % 4, (i + 3) % 4);
> +        assertEqX4(r, compI[i % 4]);
> +
> +        // Out of bounds variables lanes
> +        assertEqX4(SIMD.int32x4.swizzle(i4, i + 4, 3, 2, 1), [0, 4, 3, 2]);

It'd be good to test negative lane indices too.

::: js/src/jit/MIR.h
@@ +1821,5 @@
>  
>      ALLOW_CLONE(MSimdSwizzle)
>  };
>  
> +class MSimdGeneralSwizzle :

Please add a brief comment here describing what a "general" swizzle is.

::: js/src/jit/TypePolicy.h
@@ +331,5 @@
>  };
>  
> +class SimdSwizzlePolicy MOZ_FINAL : public TypePolicy
> +{
> +    public:

Style nit: this should be at 2 spaces according to SpiderMonkey style

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2363,5 @@
> +{
> +    FloatRegister input = ToFloatRegister(ins->base());
> +    Register temp = ToRegister(ins->temp());
> +
> +    masm.reserveStack(Simd128DataSize * 2);

It might be nice to have a comment somewhere around here mentioning that this isn't going to generate amazingly fast code, but that it's ok because it's expected that users will use constant swizzle indices.

@@ +2374,5 @@
> +        Label loadZero, go, join;
> +        masm.cmp32(lane, Imm32(0));
> +        masm.j(Assembler::LessThan, &loadZero);
> +        masm.cmp32(lane, Imm32(4));
> +        masm.j(Assembler::LessThan, &go);

An unsigned comparison would allow this to be done in a single branch. Not that performance is a big concern here, but it would make the code a little simpler.

@@ +2416,5 @@
> +        Label loadNaN, go, join;
> +        masm.cmp32(lane, Imm32(0));
> +        masm.j(Assembler::LessThan, &loadNaN);
> +        masm.cmp32(lane, Imm32(4));
> +        masm.j(Assembler::LessThan, &go);

Ditto about unsigned comparison.

@@ +2419,5 @@
> +        masm.cmp32(lane, Imm32(4));
> +        masm.j(Assembler::LessThan, &go);
> +
> +        masm.bind(&loadNaN);
> +        masm.loadConstantFloat32(0.f, ScratchFloat32Reg);

This is loading zero, rather than NaN.
Attachment #8569310 - Flags: feedback?(sunfish) → feedback+
Comment on attachment 8569307 [details] [diff] [review]
1.polyfill.patch

Review of attachment 8569307 [details] [diff] [review]:
-----------------------------------------------------------------

The current polyfill behavior isn't really intentional. I'm inclined to wait for resolution on the above-mentioned #111 before making changes here, but I could be convinced otherwise.
Attachment #8569307 - Flags: review?(sunfish)
Attached patch swizzle.patchSplinter Review
Thanks for the feedback! Switching to review, even if it will introduce new differences between the JIT and the interpreter. My hope is to get this landed by next week, so that SIMD in Ion can be demo'ed at GDC (otherwise, it's fine if it's not, because we can just have a try-build and demo it at GDC, but well, having it on nightlies would make more impact!).
Attachment #8569310 - Attachment is obsolete: true
Attachment #8569813 - Flags: review?(sunfish)
Note to self, I'll also implement shuffle later, as it's not used in any of the SIMD demos...
Keywords: leave-open
Comment on attachment 8569813 [details] [diff] [review]
swizzle.patch

Review of attachment 8569813 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.cpp
@@ +933,5 @@
> +MSimdGeneralSwizzle::foldsTo(TempAllocator &alloc)
> +{
> +    int32_t lanes[4];
> +    for (size_t i = 0; i < 4; i++) {
> +        if (!lane(i)->isConstant() || !lane(i)->type() == MIRType_Int32)

The right side there should be lane(i)->type() != MIRType_Int32.

@@ +936,5 @@
> +    for (size_t i = 0; i < 4; i++) {
> +        if (!lane(i)->isConstant() || !lane(i)->type() == MIRType_Int32)
> +            return this;
> +        lanes[i] = lane(i)->toConstant()->value().toInt32();
> +        if (lanes[i] < 0 || lanes[i] > 4)

The right side here should be >= 4.
Attachment #8569813 - Flags: review?(sunfish) → review+
Thanks for the review! Unfortunately, I realized I can't land it without the interpreter patch, because on ARM where we don't have SIMD in Ion, the code stays in the interpreter and thus the test doesn't pass (difference of behavior between engines... it's pretty bad).

Do you prefer me to rework the patch so as to throw when a lane index is out of bounds, or should we check in the first patch (at least, temporarily, until the spec polyfill gets updated)?
Flags: needinfo?(sunfish)
(In reply to Benjamin Bouvier [:bbouvier] from comment #8)
> Do you prefer me to rework the patch so as to throw when a lane index is out
> of bounds, or should we check in the first patch (at least, temporarily,
> until the spec polyfill gets updated)?

Throwing when a lane out of bounds sounds to me like desirable semantics. I just commented on issue #111 to this effect. I'd be ok implementing that here for now, and may even hope that we can keep it that way.
Flags: needinfo?(sunfish)
Attached patch 2.jit.patchSplinter Review
And the patch for swizzle in the JITs (note that i somehow forgot to add inlining of SIMD.float32x4.swizzle in the previous patch, shame on me!).
Attachment #8569307 - Attachment is obsolete: true
Attachment #8573222 - Flags: review?(sunfish)
Attached patch 3.shuffle.patch (obsolete) — Splinter Review
So while looking at the codegen for swizzle, i realized that the same codegen could be used for shuffles, and for other vector types as well, so I've tried to generalize the MIR node as much as possible. Codegen is ready to support int8x16 and int16x8 and float64x2.

This means that LSimdShuffleBase becomes variadic. If the approach looks fine, I could create a LVariadicInstruction and have LSimdShuffleBase inherit from it, so that LVariadicInstruction can be reused for other operations...

Another solution was to just have a fixed amount of operands, which would be an upper bounds, but that would mean this upper bound would need to be 18 (for SIMD.int8x16.swizzle). Thoughts?
Attachment #8573350 - Flags: feedback?(sunfish)
Attachment #8573350 - Flags: feedback?(jdemooij)
Comment on attachment 8573350 [details] [diff] [review]
3.shuffle.patch

Review of attachment 8573350 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry, I'm overloaded with reviews and needinfos these weeks, so I'll leave this to sunfish. He's also more familiar with the SIMD instructions here.
Attachment #8573350 - Flags: feedback?(jdemooij)
Attachment #8573222 - Flags: review?(sunfish) → review+
Comment on attachment 8573218 [details] [diff] [review]
1.throw-oob.patch

Review of attachment 8573218 [details] [diff] [review]:
-----------------------------------------------------------------

yay!
Attachment #8573218 - Flags: review?(sunfish) → review+
Also WinXP jit-test timeouts. Looking at your last Try push, all of these issues were present there as well. Please verify that you're green on Try before pushing in the future.
Flags: needinfo?(benj)
Comment on attachment 8573350 [details] [diff] [review]
3.shuffle.patch

Review of attachment 8573350 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Benjamin Bouvier [:bbouvier] from comment #14)
> Created attachment 8573350 [details] [diff] [review]
> 3.shuffle.patch
> 
> So while looking at the codegen for swizzle, i realized that the same
> codegen could be used for shuffles, and for other vector types as well, so
> I've tried to generalize the MIR node as much as possible. Codegen is ready
> to support int8x16 and int16x8 and float64x2.

Great!

> This means that LSimdShuffleBase becomes variadic. If the approach looks
> fine, I could create a LVariadicInstruction and have LSimdShuffleBase
> inherit from it, so that LVariadicInstruction can be reused for other
> operations...

Creating an LVariadicInstruction sounds nice, to keep the mundane operand-accessor kind of code somewhat separate from the interesting operation-specific code, even if we only have one subclass of LVariadicInstruction.

> Another solution was to just have a fixed amount of operands, which would be
> an upper bounds, but that would mean this upper bound would need to be 18
> (for SIMD.int8x16.swizzle). Thoughts?

I think variadic things are fine here. We mainly want to keep the code simple, since this is code that we hope won't be used much anyway.

::: js/src/jit/LIR-Common.h
@@ +376,5 @@
>  
> +class LSimdGeneralSwizzleBase : public LInstruction
> +{
> +    LDefinition def_;
> +    mozilla::Vector<LAllocation> operands_;

You can use the FixedList class here instead of Vector.
Attachment #8573350 - Flags: feedback?(sunfish) → feedback+
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #19)
> Also WinXP jit-test timeouts. Looking at your last Try push, all of these
> issues were present there as well. Please verify that you're green on Try
> before pushing in the future.

Sorry for being an idiot... I've somehow lost track of this try push, so there are a few things I need to enhance in my bugzilla workflow...

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9dcc46796f42
Here we go!
Attachment #8573350 - Attachment is obsolete: true
Attachment #8576657 - Flags: review?(sunfish)
Attached patch 4. ShuffleSplinter Review
Attachment #8576658 - Flags: review?(sunfish)
Totally not mandatory as well
Attachment #8576661 - Flags: review?(sunfish)
And a try build for this new set of patches as well...
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9161b0d10d3f
Comment on attachment 8576658 [details] [diff] [review]
4. Shuffle

Review of attachment 8576658 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/LIR-Common.h
@@ +376,5 @@
>  
> +class LSimdGeneralShuffleBase : public LVariadicInstruction<1, 1>
> +{
> +  public:
> +    LSimdGeneralShuffleBase(const LDefinition &temp) {

(fixed locally: made this one and the two other ones below explicit)
Comment on attachment 8576657 [details] [diff] [review]
3. LVariadicInstruction

Review of attachment 8576657 [details] [diff] [review]:
-----------------------------------------------------------------

Cool!
Attachment #8576657 - Flags: review?(sunfish) → review+
Comment on attachment 8576658 [details] [diff] [review]
4. Shuffle

Review of attachment 8576658 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

::: js/src/jit/MIR.cpp
@@ +950,5 @@
>  
>  MDefinition *
> +MSimdGeneralShuffle::foldsTo(TempAllocator &alloc)
> +{
> +    mozilla::Vector<uint32_t> lanes;

It'd be slightly nicer to use FixedList here, since we know the size up front.

@@ +963,2 @@
>              return this;
> +        lanes.infallibleAppend(uint32_t(temp));

so this becomes lanes[i] = uint32_t(temp);
Attachment #8576658 - Flags: review?(sunfish) → review+
Comment on attachment 8576659 [details] [diff] [review]
5. bonus: Factor LInstructionHelper/LVariadicInstruction code

Review of attachment 8576659 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. The details namespace pattern is probably something we should use more :).
Attachment #8576659 - Flags: review?(sunfish) → review+
Attachment #8576661 - Flags: review?(sunfish) → review+
Flags: needinfo?(benj)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: