Closed Bug 1096684 Opened 5 years ago Closed 5 years ago

Optimize SIMD shuffle and swizzle sequences with SSE3 and SSE4.1

Categories

(Core :: JavaScript Engine: JIT, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: sunfish, Assigned: sunfish)

References

Details

Attachments

(4 files)

SSE3 and SSE4.1 provide movsldup, movshdup, insertps, blendps, and pextrd, which are advantageous in SIMD shuffle, insert, and extract sequences.
Attached patch extraname.patchSplinter Review
This patch implements extraName() for the LIR extract and insert element classes, for more readable debug output.
Attachment #8520298 - Flags: review?(benj)
This patch optimizes shuffles with insertps and blendps.
Attachment #8520301 - Flags: review?(benj)
Attached patch pextrd.patchSplinter Review
This patch adds the pextrd instruction and optimizes extract-element and boxDouble on x86 (where unboxDouble already uses pinsrd).
Attachment #8520303 - Flags: review?(benj)
This patch adds movsldup and movshdup and uses them for swizzles.
Attachment #8520305 - Flags: review?(benj)
Comment on attachment 8520298 [details] [diff] [review]
extraname.patch

Review of attachment 8520298 [details] [diff] [review]:
-----------------------------------------------------------------

Cool.

::: js/src/jit/LIR-Common.h
@@ +162,5 @@
> +          case LaneX: return "lane x";
> +          case LaneY: return "lane y";
> +          case LaneZ: return "lane z";
> +          case LaneW: return "lane w";
> +          default: return "unknown lane";

can you please remove the default case, such that the compiler complains if one adds a member to SimdLane and forgets to add the lane name here?

@@ +211,5 @@
> +          case LaneX: return "lane x";
> +          case LaneY: return "lane y";
> +          case LaneZ: return "lane z";
> +          case LaneW: return "lane w";
> +          default: return "unknown lane";

ditto
Attachment #8520298 - Flags: review?(benj) → review+
Comment on attachment 8520301 [details] [diff] [review]
insertps-blendps.patch

Review of attachment 8520301 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, r+ with tests.

::: js/src/jit/shared/BaseAssembler-x86-shared.h
@@ +379,5 @@
>  
>      typedef enum {
>          OP3_ROUNDSS_VsdWsd  = 0x0A,
>          OP3_ROUNDSD_VsdWsd  = 0x0B,
> +        OP3_BLENDPS_VpdWpdIb = 0x0C,

nit: should be OP3_BLENDPS_VpsWpsIb as (legit) operands are packed singles

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2476,5 @@
>      MOZ_ASSERT(numLanesFromLHS < 4);
>  
> +    // If all values stay in their lane, this is a blend.
> +    if (AssemblerX86Shared::HasSSE41()) {
> +        if (x % 4 == 0 && y % 4 == 1 && z % 4 == 2 && w % 4 == 3) {

Could you add tests in jit-tests/tests/asm.js/testSIMD.js for this case and the ones below, below the line commented as:

// Impl-specific special cases for shuffle (case and swapped case)
Attachment #8520301 - Flags: review?(benj) → review+
Comment on attachment 8520303 [details] [diff] [review]
pextrd.patch

Review of attachment 8520303 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: js/src/jit/shared/BaseAssembler-x86-shared.h
@@ +3728,5 @@
>  
> +    void pextrd_irr(unsigned lane, XMMRegisterID src, RegisterID dst)
> +    {
> +        MOZ_ASSERT(lane < 4);
> +        spew("pextrd     $%u, %s, %s", lane, nameFPReg(src), nameIReg(4, dst));

could be $%x for symmetry with pinsrd_irr

@@ +3737,5 @@
> +
> +    void pextrd_imr(unsigned lane, XMMRegisterID src, int offset, RegisterID base)
> +    {
> +        MOZ_ASSERT(lane < 4);
> +        spew("pextrd     $%u, %s, %s0x%x(%s)", lane, nameFPReg(src),

ditto
Attachment #8520303 - Flags: review?(benj) → review+
Comment on attachment 8520305 [details] [diff] [review]
movsldup-movshdup.patch

Review of attachment 8520305 [details] [diff] [review]:
-----------------------------------------------------------------

Cool!

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2416,5 @@
>      uint32_t z = ins->laneZ();
>      uint32_t w = ins->laneW();
>  
> +    if (AssemblerX86Shared::HasSSE3()) {
> +        if (ins->lanesMatch(0, 0, 2, 2)) {

Can you add tests for these two specific patterns in testSIMD.js as well please?
Attachment #8520305 - Flags: review?(benj) → review+
Depends on: 1167025
You need to log in before you can comment on or make changes to this bug.