Closed
Bug 1060789
Opened 10 years ago
Closed 10 years ago
Odin SIMD: Implement splat
Categories
(Core :: JavaScript Engine: JIT, defect)
Tracking
()
RESOLVED
FIXED
mozilla35
People
(Reporter: dougc, Assigned: dougc)
References
Details
Attachments
(3 files, 3 obsolete files)
15.54 KB,
patch
|
bbouvier
:
review+
|
Details | Diff | Splinter Review |
3.95 KB,
patch
|
bbouvier
:
review+
|
Details | Diff | Splinter Review |
13.92 KB,
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
No description provided.
Assignee | ||
Comment 1•10 years ago
|
||
This patch optimizes a simd x4 ctor with all plane arguments the same to use the shufps instruction. A challenge using this instruction for splat is that the source and destination need to be in the same register. For a float32x4 the patch uses defineReuseInput() for this, and I hope this will work as intended give the input is a float32 and the output a float32x4? It was not clear if a new MIR class should have been created, or SimdValueX4 reused with all arguments the same? The patch does not yet define splat, just optimizes the ctor. Anyway this was adequate to confirm that this optimization gives a good performance improvement for the Flappy Bird demo. The SIMD version is now 75% fast that the non-SIMD version.
Attachment #8481797 -
Flags: feedback?(sunfish)
Comment 2•10 years ago
|
||
Comment on attachment 8481797 [details] [diff] [review] Implement splat. Review of attachment 8481797 [details] [diff] [review]: ----------------------------------------------------------------- This looks reasonable to me, off the top of my head. A dedicated class for Splat rather than making it a special-case of MSimdValueX4 sounds good, as I think splats will be common enough. ::: js/src/jit/Lowering.cpp @@ +3700,5 @@ > + switch (ins->type()) { > + case MIRType_Int32x4: > + return define(lir, ins); > + case MIRType_Float32x4: > + return defineReuseInput(lir, ins, 0); The defineReuseInput here looks right for x86/x64 using shufps, so the only problem here is that this is target-independent code here. I think splats will have to be lowered in */Lowering-*.cpp.
Attachment #8481797 -
Flags: feedback?(sunfish) → feedback+
Assignee | ||
Comment 3•10 years ago
|
||
Thank you for the quick feedback. * Moved the Lowering into the backends. * Added a couple of tests. Shall implement the Odin 'slat' support in a follow up patch. Optimizing the constructor is enough for the demos, and we need to land this asap.
Assignee: nobody → dtc-moz
Attachment #8481797 -
Attachment is obsolete: true
Attachment #8481923 -
Flags: review?(sunfish)
Assignee | ||
Comment 4•10 years ago
|
||
Please leave open to add the Odin 'splat' support.
Keywords: leave-open
Comment 5•10 years ago
|
||
Comment on attachment 8481923 [details] [diff] [review] Add 'splat' backend support and optimize constructors to use this. Review of attachment 8481923 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/shared/CodeGenerator-x86-shared.cpp @@ +2165,5 @@ > + switch (mir->type()) { > + case MIRType_Int32x4: { > + Register r = ToRegister(ins->getOperand(0)); > + masm.movd(r, output); > + masm.shufps(0, output, output); int32x4 splat should use pshufd instead of shufps to avoid a domain crossing penalty. pshufd(0, output, output). @@ +2170,5 @@ > + break; > + } > + case MIRType_Float32x4: { > + FloatRegister r = ToFloatRegister(ins->getOperand(0)); > + masm.shufps(0, r, output); You can assert r == output here, to sanity-check that the defineReusedInput did its job.
Attachment #8481923 -
Flags: review?(sunfish) → review+
Assignee | ||
Comment 6•10 years ago
|
||
Thank you for the quick review, and your help. This patch integrates the feedback. Carrying forward the r+.
Attachment #8481923 -
Attachment is obsolete: true
Attachment #8481933 -
Flags: review+
Comment 7•10 years ago
|
||
Just splitting the patch into two parts: implementation of splat vs use it in Odin
Attachment #8481933 -
Attachment is obsolete: true
Attachment #8482263 -
Flags: review+
Comment 8•10 years ago
|
||
Attachment #8482264 -
Flags: review+
Updated•10 years ago
|
Status: NEW → ASSIGNED
Comment 9•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/a65fbd070a58 https://hg.mozilla.org/integration/mozilla-inbound/rev/e28ec487d050
Comment 10•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/a65fbd070a58 https://hg.mozilla.org/mozilla-central/rev/e28ec487d050
Comment 11•10 years ago
|
||
Adds proper support for int32x4.splat and float32x4 in Odin. Semantics: - int32x4.splat(x) === int32x4(x, x, x, x); - same rules as the tuple constructors: int32x4.splat accepts intish and float32x4.splat accepts signed, unsigned, double?, floatish.
Attachment #8482610 -
Flags: review?(luke)
Assignee | ||
Comment 12•10 years ago
|
||
We might have a problem here, see the unaligned movaps storing to the stack: 0x00007f2629a030a8: cvtsd2ss %xmm0,%xmm1 0x00007f2629a030ac: shufps $0x0,%xmm1,%xmm1 0x00007f2629a030b0: movss 0x4(%rsp),%xmm0 0x00007f2629a030b6: shufps $0x0,%xmm0,%xmm0 0x00007f2629a030ba: movaps %xmm0,%xmm2 0x00007f2629a030bd: movaps %xmm0,0x4(%rsp) <<<<<<<<<< I might have missed some important point wrt register targeting etc.
Comment 13•10 years ago
|
||
Comment on attachment 8482610 [details] [diff] [review] bug1060789-splat-support.patch Review of attachment 8482610 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/asmjs/AsmJSValidate.cpp @@ +4720,5 @@ > + return false; > + break; > + } > + > + *type = global->simdOperationType(); Perhaps add a JS_ASSERT(global->simdOperationOp() == Splat) (with \n before and after)? Alternatively, I wonder if we could generalize CheckBinarySimd+CheckUnarySimd into an n-ary CheckSimdCall.
Attachment #8482610 -
Flags: review?(luke) → review+
Assignee | ||
Comment 14•10 years ago
|
||
Here's some of the close debug dump. Note the blendvps instruction uses the xmm0 register as an input and this can be seen to cause a spill below that tickles this problem. [RegAlloc] Allocating v8[1] [20,33) [priority 13] [weight 0] [RegAlloc] Hint xmm0, used by group allocation [RegAlloc] xmm0 collides with fixed use [23,24) [RegAlloc] Spilling interval [RegAlloc] Allocating spill location stack:12 <<< stack allocated here [RegAlloc] Allocating v1[1] [4,15) [priority 11] [weight 0] [RegAlloc] Hint xmm0, used by group allocation [RegAlloc] xmm0 collides with fixed use [11,12) [RegAlloc] Spilling interval [RegAlloc] Reusing group spill location stack:12 ... [RegAlloc] Allocating v15[1] [34,192) v15:r?@34 v15:r?@34 v15:r?@137 v15:r?@143 [priority 158] [weight 25] [RegAlloc] Hint xmm0, used by group allocation [RegAlloc] xmm0 collides with v68[0] [147,151) [weight 1000] [RegAlloc] Spilling interval [RegAlloc] Reusing group spill location stack:12 v1[0] req(xmm0) has(xmm0) [3,4) / v1[1] req(xmm0?) has(stack:12) [4,15) / v1[2] req(xmm0?) has(xmm0) [14,15) v1:r?@14 ... v8[0] req(r,xmm0?) has(xmm0) [19,20) / v8[1] req(xmm0?) has(stack:12) [20,33) / v8[2] req(xmm0?) has(xmm0) [32,33) v8:r?@32 ... v15[0] req(r,xmm0?) has(xmm0) [33,34) / v15[1] req(xmm0?) has(stack:12) [34,192) v15:r?@34 v15:r?@34 v15:r?@137 v15:r?@143 [28,29 DoubleToFloat32] [def v13<f>:xmm1] [use v12:r xmm0] [30,31 SimdSplatX4] [def v14<(null)>:xmm1] [use v13:r xmm1] [MoveGroup] [stack:12 -> xmm0] [32,33 SimdSplatX4] [def v15<(null)>:xmm0] [use v8:r xmm0] [MoveGroup] [xmm0 -> xmm2] [xmm0 -> stack:12] <<< unaligned access [34,35 SimdBinaryArithFx4] [def v16<(null)>:xmm2] [use v15:r xmm2] [use v15:r? stack:12] [Codegen] instruction DoubleToFloat32 [Codegen] cvtsd2ss %xmm0, %xmm1 [Codegen] instruction SimdSplatX4 [Codegen] shufps 0x0, %xmm1, %xmm1 [Codegen] instruction MoveGroup [Codegen] movss 0x4(%rsp), %xmm0 [Codegen] instruction SimdSplatX4 [Codegen] shufps 0x0, %xmm0, %xmm0 [Codegen] instruction MoveGroup [Codegen] movaps %xmm0, %xmm2 [Codegen] movaps %xmm0, 0x4(%rsp) <<< unaligned access [Codegen] instruction SimdBinaryArithFx4:Mul [Codegen] mulps 0x4(%rsp), %xmm2 [Codegen] instruction Float32x4 [Codegen] movaps ?(%rip), %xmm3 [Codegen] instruction MoveGroup [Codegen] movl 0x8(%rsp), %eax
Comment 15•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/881bdc3b017f
Keywords: leave-open
Comment 16•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/881bdc3b017f
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
You need to log in
before you can comment on or make changes to this bug.
Description
•