Closed Bug 1051860 Opened 11 years ago Closed 11 years ago

SIMD x86-x64: Optimize MSimdValueX4

Categories

(Core :: JavaScript Engine: JIT, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla35

People

(Reporter: bbouvier, Assigned: bbouvier)

References

Details

Attachments

(2 files, 3 obsolete files)

Blocks: 1023404
This unpcklps trick is really magic.
Attachment #8486702 - Flags: review?(sunfish)
Assignee: nobody → benj
Status: NEW → ASSIGNED
And pinsrd was actually already implemented for unboxing double values in x86... Nice!
Attachment #8486705 - Flags: review?(sunfish)
So it appears this first patch was really not regalloc friendly, in particular it clobbered two inputs without any shame. We need two temporaries (for r0 and r1), so by reusing r0's register for the output, we have one; then we need only one copy. This patch splits lowering for int32x4 and float32x4 (note that there is no LOpcodes-x86-shared.h and so the opcodes have to be repeated in each of the x64/LOpcodes-x64.h and x86 files).
Attachment #8486702 - Attachment is obsolete: true
Attachment #8486702 - Flags: review?(sunfish)
Attachment #8487061 - Flags: review?(sunfish)
Attachment #8486705 - Attachment is obsolete: true
Attachment #8486705 - Flags: review?(sunfish)
Attachment #8487062 - Flags: review?(sunfish)
forgot to qref
Attachment #8487062 - Attachment is obsolete: true
Attachment #8487062 - Flags: review?(sunfish)
Attachment #8487063 - Flags: review?(sunfish)
Comment on attachment 8487061 [details] [diff] [review] 1. Optimize Float32x4 Codegen With This One Weird Trick Review of attachment 8487061 [details] [diff] [review]: ----------------------------------------------------------------- One Weird Trick that CPUs, um, actually like. ::: js/src/jit/shared/CodeGenerator-x86-shared.cpp @@ +2150,5 @@ > + FloatRegister r3 = ToFloatRegister(ins->getOperand(3)); > + > + masm.unpcklps(r3, r1); > + masm.unpcklps(r2, r0); > + masm.unpcklps(r1, r0);
Attachment #8487061 - Flags: review?(sunfish) → review+
Comment on attachment 8487063 [details] [diff] [review] 2. Optimize int32x4 codegen when SSE4.1 is available Review of attachment 8487063 [details] [diff] [review]: ----------------------------------------------------------------- Groovy. We could optimize the non-SSE4.1 case more, but not *that* much more, and it'd require temporary registers, so it's not worth worrying about for now. ::: js/src/jit/shared/CodeGenerator-x86-shared.cpp @@ +2127,5 @@ > + FloatRegister output = ToFloatRegister(ins->output()); > + if (AssemblerX86Shared::HasSSE41()) { > + for (size_t i = 0; i < 4; ++i) { > + Register r = ToRegister(ins->getOperand(i)); > + masm.pinsrd(i, r, output); You want movd for the first element, and pinsrd for the rest. movd clears the entire register, clearing dependencies on earlier instructions.
Attachment #8487063 - Flags: review?(sunfish) → review+
Thanks! Out of curiosity, how could we optimize the non SSE4.1 case more? Also, it looks your review comments of the first patch may have disappeared somehow (or there is just an empty comment that didn't get deleted). Please let me know if you had any remarks about the patch, I'll address them in a follow-up patch if needed. remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/179193fbcccd remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/6cc38353cfa4
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
(In reply to Benjamin Bouvier [:bbouvier] from comment #8) > Thanks! > > Out of curiosity, how could we optimize the non SSE4.1 case more? movd all the integers into xmm registers and then do the shuffles like in the float32 case to pack them together. Doing a shuffle after the first 2 movd can keep it to 2 temporaries: movd %(W), %xmm0 movd %(Y), %xmm1 punpckldq %xmm0, %xmm1 movd %(Z), %xmm2 movd %(X), %xmm0 punpckldq %xmm2, %xmm0 punpckldq %xmm1, %xmm0 > Also, it > looks your review comments of the first patch may have disappeared somehow > (or there is just an empty comment that didn't get deleted). Please let me > know if you had any remarks about the patch, I'll address them in a > follow-up patch if needed. I had posted a unicode emoji character celebrating the triple-unpcklps sequence. It looks like bugzilla swallowed it. The patch is fine :-).
Depends on: 1067133
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: