Closed Bug 1060789 Opened 11 years ago Closed 11 years ago

Odin SIMD: Implement splat

Tracking

Status:

RESOLVED FIXED

Milestone:

mozilla35

People

(Reporter: dougc, Assigned: dougc)

Attachments

(3 files, 3 obsolete files)

Implement splat. 11 years ago Douglas Crosher [:dougc] 14.16 KB, patch, sunfish: feedback+
Add 'splat' backend support and optimize constructors to use this. 11 years ago Douglas Crosher [:dougc] 19.32 KB, patch, sunfish: review+
Add 'splat' backend support and optimize constructors to use this. 11 years ago Douglas Crosher [:dougc] 19.35 KB, patch, dougc: review+
bug1060789-4.patch 11 years ago Benjamin Bouvier [:bbouvier] (inactive) 15.54 KB, patch, bbouvier: review+
bug1060789-odin.patch 11 years ago Benjamin Bouvier [:bbouvier] (inactive) 3.95 KB, patch, bbouvier: review+
bug1060789-splat-support.patch 11 years ago Benjamin Bouvier [:bbouvier] (inactive) 13.92 KB, patch, luke: review+

Douglas Crosher [:dougc]

Assignee

Description

•

11 years ago

No description provided.

Douglas Crosher [:dougc]

Assignee

Updated

•

11 years ago

Blocks: 1023404

Douglas Crosher [:dougc]

Assignee

Comment 1

•

11 years ago

Attached patch: Implement splat. (obsolete)

This patch optimizes a SIMD x4 constructor whose lane arguments are all the same to use the shufps instruction. A challenge in using this instruction for splat is that the source and destination need to be in the same register. For a float32x4 the patch uses defineReuseInput() for this, and I hope this will work as intended given that the input is a float32 and the output a float32x4. It was not clear whether a new MIR class should be created, or SimdValueX4 reused with all arguments the same. The patch does not yet define splat; it just optimizes the constructor. Anyway, this was adequate to confirm that this optimization gives a good performance improvement for the Flappy Bird demo. The SIMD version is now 75% fast that the non-SIMD version.
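To make the register constraint concrete, here is a minimal, portable sketch of what a shufps-based splat does. It emulates the lane selection of shufps in plain C++ rather than using the real instruction or the JIT's MacroAssembler; the names `Float32x4`, `emulateShufps` and `splatFloat32` are illustrative only and do not come from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative stand-in for a 128-bit XMM register holding four floats.
using Float32x4 = std::array<float, 4>;

// Emulates `shufps imm8, src, dst` (AT&T operand order). The two low result
// lanes are selected from the old value of dst, the two high lanes from src.
// Since dst is read as well as written, a splat needs src and dst to be the
// same register.
inline void emulateShufps(uint8_t imm8, const Float32x4& src, Float32x4& dst) {
    // Copy both inputs first so the function is correct when src aliases dst.
    const Float32x4 oldSrc = src;
    const Float32x4 oldDst = dst;
    dst[0] = oldDst[(imm8 >> 0) & 3];
    dst[1] = oldDst[(imm8 >> 2) & 3];
    dst[2] = oldSrc[(imm8 >> 4) & 3];
    dst[3] = oldSrc[(imm8 >> 6) & 3];
}

// Splat: the scalar sits in lane 0 and an immediate of 0 copies lane 0 into
// every lane. The upper lanes are zeroed here only to make the example
// deterministic; their initial contents do not affect the result.
inline Float32x4 splatFloat32(float x) {
    Float32x4 reg = {x, 0.0f, 0.0f, 0.0f};
    assert(&reg == &reg);        // source and destination are one register
    emulateShufps(0, reg, reg);  // shufps $0x0, %xmmN, %xmmN
    return reg;
}
```

If the source and destination were different registers, lanes 0 and 1 would come from whatever the destination previously held, which is why the lowering asks the register allocator to reuse the input.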

Attachment #8481797 - Flags: feedback?(sunfish)

Dan Gohman [:sunfish]

Comment 2

•

11 years ago

Comment on attachment 8481797: Implement splat.

Review of attachment 8481797:
-----------------------------------------------------------------

This looks reasonable to me, off the top of my head. A dedicated class for Splat, rather than making it a special case of MSimdValueX4, sounds good, as I think splats will be common enough.

::: js/src/jit/Lowering.cpp
@@ +3700,5 @@
> +    switch (ins->type()) {
> +      case MIRType_Int32x4:
> +        return define(lir, ins);
> +      case MIRType_Float32x4:
> +        return defineReuseInput(lir, ins, 0);

The defineReuseInput here looks right for x86/x64 using shufps, so the only problem is that this is target-independent code. I think splats will have to be lowered in */Lowering-*.cpp.

Attachment #8481797 - Flags: feedback?(sunfish) → feedback+

Douglas Crosher [:dougc]

Assignee

Comment 3

•

11 years ago

Attached patch: Add 'splat' backend support and optimize constructors to use this. (obsolete)

Thank you for the quick feedback.

* Moved the Lowering into the backends.
* Added a couple of tests.

Shall implement the Odin 'splat' support in a follow-up patch. Optimizing the constructor is enough for the demos, and we need to land this asap.

Assignee: nobody → dtc-moz

Attachment #8481797 - Attachment is obsolete: true

Attachment #8481923 - Flags: review?(sunfish)

Douglas Crosher [:dougc]

Assignee

Comment 4

•

11 years ago

Please leave open to add the Odin 'splat' support.

Keywords: leave-open

Dan Gohman [:sunfish]

Comment 5

•

11 years ago

Comment on attachment 8481923: Add 'splat' backend support and optimize constructors to use this.

Review of attachment 8481923:
-----------------------------------------------------------------

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2165,5 @@
> +    switch (mir->type()) {
> +      case MIRType_Int32x4: {
> +        Register r = ToRegister(ins->getOperand(0));
> +        masm.movd(r, output);
> +        masm.shufps(0, output, output);

int32x4 splat should use pshufd instead of shufps to avoid a domain crossing penalty: pshufd(0, output, output).

@@ +2170,5 @@
> +        break;
> +      }
> +      case MIRType_Float32x4: {
> +        FloatRegister r = ToFloatRegister(ins->getOperand(0));
> +        masm.shufps(0, r, output);

You can assert r == output here, to sanity-check that the defineReuseInput did its job.
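For readers unfamiliar with the integer path, the following is a small portable sketch of the movd + pshufd sequence suggested in the review. It models only the lane semantics of the two instructions in plain C++; it does not model the execution-domain penalty that motivates the change, and the names `Int32x4`, `emulateMovd`, `emulatePshufd` and `splatInt32` are hypothetical, not taken from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative stand-in for a 128-bit XMM register holding four int32 lanes.
using Int32x4 = std::array<int32_t, 4>;

// movd from a general-purpose register to an XMM register writes lane 0
// and zeroes the remaining lanes.
inline Int32x4 emulateMovd(int32_t gpr) {
    return {gpr, 0, 0, 0};
}

// pshufd selects every result lane from the single source operand, using
// two bits of the immediate per lane. Unlike shufps, the destination is not
// also an input, and the instruction belongs to the integer SIMD domain.
inline Int32x4 emulatePshufd(uint8_t imm8, const Int32x4& src) {
    return {src[(imm8 >> 0) & 3], src[(imm8 >> 2) & 3],
            src[(imm8 >> 4) & 3], src[(imm8 >> 6) & 3]};
}

// Splat of an int32: move the scalar into lane 0, then broadcast lane 0.
inline Int32x4 splatInt32(int32_t x) {
    Int32x4 output = emulateMovd(x);     // movd   r, output
    output = emulatePshufd(0, output);   // pshufd $0, output, output
    assert(output[0] == x);              // cheap sanity check on the result
    return output;
}
```

The float path keeps shufps, since a float32x4 already lives in the floating-point domain; only the int32x4 case switches instruction.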

Attachment #8481923 - Flags: review?(sunfish) → review+

Douglas Crosher [:dougc]

Assignee

Comment 6

•

11 years ago

Attached patch: Add 'splat' backend support and optimize constructors to use this. (obsolete)

Thank you for the quick review, and your help. This patch integrates the feedback. Carrying forward the r+.

Attachment #8481923 - Attachment is obsolete: true

Attachment #8481933 - Flags: review+

Benjamin Bouvier [:bbouvier] (inactive)

Comment 7

•

11 years ago

Attached patch: bug1060789-4.patch

Just splitting the patch into two parts: implementation of splat vs. using it in Odin.

Attachment #8481933 - Attachment is obsolete: true

Attachment #8482263 - Flags: review+

Benjamin Bouvier [:bbouvier] (inactive)

Comment 8

•

11 years ago

Attached patch: bug1060789-odin.patch

Attachment #8482264 - Flags: review+

Benjamin Bouvier [:bbouvier] (inactive)

Updated

•

11 years ago

Status: NEW → ASSIGNED

Benjamin Bouvier [:bbouvier] (inactive)

Comment 9

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/a65fbd070a58
https://hg.mozilla.org/integration/mozilla-inbound/rev/e28ec487d050

Phil Ringnalda (:philor)

Comment 10

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/a65fbd070a58
https://hg.mozilla.org/mozilla-central/rev/e28ec487d050

Benjamin Bouvier [:bbouvier] (inactive)

Comment 11

•

11 years ago

Attached patch: bug1060789-splat-support.patch

Adds proper support for int32x4.splat and float32x4.splat in Odin.

Semantics:
- int32x4.splat(x) === int32x4(x, x, x, x);
- same rules as the tuple constructors: int32x4.splat accepts intish, and float32x4.splat accepts signed, unsigned, double?, floatish.
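The equivalence stated above can be modelled in a few lines of C++. This is only a sketch of the value semantics, assuming that a double argument to the float32x4 forms is rounded to float32 before being placed in a lane; it does not model the asm.js type checks (intish, floatish, and so on), and every name in it is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative value types; not the engine's representation.
struct Int32x4 { int32_t x, y, z, w; };
struct Float32x4 { float x, y, z, w; };

inline bool operator==(const Int32x4& a, const Int32x4& b) {
    return a.x == b.x && a.y == b.y && a.z == b.z && a.w == b.w;
}
inline bool operator==(const Float32x4& a, const Float32x4& b) {
    return a.x == b.x && a.y == b.y && a.z == b.z && a.w == b.w;
}

// Tuple constructors: one argument per lane.
inline Int32x4 makeInt32x4(int32_t x, int32_t y, int32_t z, int32_t w) {
    return Int32x4{x, y, z, w};
}
// Assumption: double inputs are rounded to float32 per lane.
inline Float32x4 makeFloat32x4(double x, double y, double z, double w) {
    return Float32x4{static_cast<float>(x), static_cast<float>(y),
                     static_cast<float>(z), static_cast<float>(w)};
}

// splat(v) is defined as the tuple constructor with v in every lane.
inline Int32x4 splatInt32x4(int32_t v) {
    return makeInt32x4(v, v, v, v);
}
inline Float32x4 splatFloat32x4(double v) {
    const float f = static_cast<float>(v);  // convert once, then broadcast
    return Float32x4{f, f, f, f};
}

// Checks the stated equivalence for a non-NaN value (NaN never compares equal).
inline void checkSplatMatchesConstructor(double v) {
    assert(splatFloat32x4(v) == makeFloat32x4(v, v, v, v));
}
```

Converting once and then broadcasting is the shape the backend wants, since it maps to a single scalar conversion followed by one shuffle.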

Attachment #8482610 - Flags: review?(luke)

Douglas Crosher [:dougc]

Assignee

Comment 12

•

11 years ago

We might have a problem here, see the unaligned movaps storing to the stack:

0x00007f2629a030a8: cvtsd2ss %xmm0,%xmm1
0x00007f2629a030ac: shufps $0x0,%xmm1,%xmm1
0x00007f2629a030b0: movss 0x4(%rsp),%xmm0
0x00007f2629a030b6: shufps $0x0,%xmm0,%xmm0
0x00007f2629a030ba: movaps %xmm0,%xmm2
0x00007f2629a030bd: movaps %xmm0,0x4(%rsp) <<<<<<<<<<

I might have missed some important point wrt register targeting etc.

Luke Wagner [:luke]

Comment 13

•

11 years ago

Comment on attachment 8482610: bug1060789-splat-support.patch

Review of attachment 8482610:
-----------------------------------------------------------------

::: js/src/asmjs/AsmJSValidate.cpp
@@ +4720,5 @@
> +            return false;
> +        break;
> +    }
> +
> +    *type = global->simdOperationType();

Perhaps add a JS_ASSERT(global->simdOperationOp() == Splat) (with \n before and after)? Alternatively, I wonder if we could generalize CheckBinarySimd+CheckUnarySimd into an n-ary CheckSimdCall.

Attachment #8482610 - Flags: review?(luke) → review+

Douglas Crosher [:dougc]

Assignee

Comment 14

•

11 years ago

Here's some of the close debug dump. Note the blendvps instruction uses the xmm0 register as an input and this can be seen to cause a spill below that tickles this problem.

[RegAlloc] Allocating v8[1] [20,33) [priority 13] [weight 0]
[RegAlloc] Hint xmm0, used by group allocation
[RegAlloc] xmm0 collides with fixed use [23,24)
[RegAlloc] Spilling interval
[RegAlloc] Allocating spill location stack:12 <<< stack allocated here
[RegAlloc] Allocating v1[1] [4,15) [priority 11] [weight 0]
[RegAlloc] Hint xmm0, used by group allocation
[RegAlloc] xmm0 collides with fixed use [11,12)
[RegAlloc] Spilling interval
[RegAlloc] Reusing group spill location stack:12
...
[RegAlloc] Allocating v15[1] [34,192) v15:r?@34 v15:r?@34 v15:r?@137 v15:r?@143 [priority 158] [weight 25]
[RegAlloc] Hint xmm0, used by group allocation
[RegAlloc] xmm0 collides with v68[0] [147,151) [weight 1000]
[RegAlloc] Spilling interval
[RegAlloc] Reusing group spill location stack:12

v1[0] req(xmm0) has(xmm0) [3,4) / v1[1] req(xmm0?) has(stack:12) [4,15) / v1[2] req(xmm0?) has(xmm0) [14,15) v1:r?@14
...
v8[0] req(r,xmm0?) has(xmm0) [19,20) / v8[1] req(xmm0?) has(stack:12) [20,33) / v8[2] req(xmm0?) has(xmm0) [32,33) v8:r?@32
...
v15[0] req(r,xmm0?) has(xmm0) [33,34) / v15[1] req(xmm0?) has(stack:12) [34,192) v15:r?@34 v15:r?@34 v15:r?@137 v15:r?@143

[28,29 DoubleToFloat32] [def v13<f>:xmm1] [use v12:r xmm0]
[30,31 SimdSplatX4] [def v14<(null)>:xmm1] [use v13:r xmm1]
[MoveGroup] [stack:12 -> xmm0]
[32,33 SimdSplatX4] [def v15<(null)>:xmm0] [use v8:r xmm0]
[MoveGroup] [xmm0 -> xmm2] [xmm0 -> stack:12] <<< unaligned access
[34,35 SimdBinaryArithFx4] [def v16<(null)>:xmm2] [use v15:r xmm2] [use v15:r? stack:12]

[Codegen] instruction DoubleToFloat32
[Codegen] cvtsd2ss %xmm0, %xmm1
[Codegen] instruction SimdSplatX4
[Codegen] shufps 0x0, %xmm1, %xmm1
[Codegen] instruction MoveGroup
[Codegen] movss 0x4(%rsp), %xmm0
[Codegen] instruction SimdSplatX4
[Codegen] shufps 0x0, %xmm0, %xmm0
[Codegen] instruction MoveGroup
[Codegen] movaps %xmm0, %xmm2
[Codegen] movaps %xmm0, 0x4(%rsp) <<< unaligned access
[Codegen] instruction SimdBinaryArithFx4:Mul
[Codegen] mulps 0x4(%rsp), %xmm2
[Codegen] instruction Float32x4
[Codegen] movaps ?(%rip), %xmm3
[Codegen] instruction MoveGroup
[Codegen] movl 0x8(%rsp), %eax
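The dump appears to show a spill slot that was first allocated for the scalar float32 input being reused for the float32x4 result, because the reuse-input constraint places both in the same allocation group. The sketch below illustrates why that matters: movaps requires its memory operand to be 16-byte aligned, and an offset of 0x4 from a 16-byte-aligned stack pointer is not. This is not the register allocator's code and not the fix tracked in the dependent bug; `isAlignedForMovaps`, `alignUp`, `SpillSlot` and `allocateSpillSlot` are hypothetical helpers, and a 16-byte-aligned base is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// movaps faults unless its memory operand is aligned to 16 bytes.
constexpr std::size_t kSimdAlignment = 16;

// True if an offset from a 16-byte-aligned base (assumed) is usable by movaps.
constexpr bool isAlignedForMovaps(std::size_t offsetFromAlignedBase) {
    return offsetFromAlignedBase % kSimdAlignment == 0;
}

// Rounds offset up to the next multiple of alignment (a power of two).
constexpr std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical spill slot: byte offset within the frame plus its size.
struct SpillSlot {
    std::size_t offset;
    std::size_t size;
};

// Hypothetical allocator step: align the running stack height to the size of
// the value being spilled, so a 16-byte vector never lands in a 4-byte slot.
inline SpillSlot allocateSpillSlot(std::size_t& stackHeight, std::size_t size) {
    assert(size == 4 || size == 8 || size == 16);
    stackHeight = alignUp(stackHeight, size);
    const SpillSlot slot{stackHeight, size};
    stackHeight += size;
    return slot;
}
```

Under these assumptions, a slot sized for the scalar at offset 4 fails the alignment check, while a slot allocated for the full 16-byte value would start at offset 16.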

Douglas Crosher [:dougc]

Assignee

Updated

•

11 years ago

Depends on: 1062067

Benjamin Bouvier [:bbouvier] (inactive)

Comment 15

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/881bdc3b017f

Keywords: leave-open

Ryan VanderMeulen [:RyanVM]

Comment 16

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/881bdc3b017f

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Flags: in-testsuite+

Resolution: --- → FIXED

Target Milestone: --- → mozilla35