Closed Bug 1019831 Opened 10 years ago Closed 10 years ago

SIMD x86/x64: Regalloc and moves bits

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla34

People

(Reporter: bbouvier, Assigned: bbouvier)

References

Details

Attachments

(12 files, 13 obsolete files)

SIMD x86-x64: LIR definition 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 4.76 KB, patch		Details \| Diff \| Splinter Review
2 - SIMD x86-x64: Allow stack slot allocator to emit quad slots 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 3.73 KB, patch	sunfish : feedback+	Details \| Diff \| Splinter Review
SIMD x86-x64: masm / asm instructions for int32x4 moves 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 5.38 KB, patch		Details \| Diff \| Splinter Review
SIMD x86-x64: MoveResolver and MoveEmitter for Int32x4 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 7.54 KB, patch		Details \| Diff \| Splinter Review
SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 7.57 KB, patch		Details \| Diff \| Splinter Review
1 - SIMD x86-x64: LIR definition 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 8.10 KB, patch	sunfish : feedback+	Details \| Diff \| Splinter Review
3 - SIMD x86-x64: masm / asm instructions for int32x4 moves 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 5.40 KB, patch	sunfish : feedback+	Details \| Diff \| Splinter Review
4 - SIMD x86-x64: MoveResolver and MoveEmitter for Int32x4 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 7.62 KB, patch	sunfish : feedback+	Details \| Diff \| Splinter Review
5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 8.21 KB, patch	sunfish : feedback-	Details \| Diff \| Splinter Review
6 - Same things for Float32x4 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 18.71 KB, patch	sunfish : feedback+	Details \| Diff \| Splinter Review
Unaligned moves for Int32x4 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 5.36 KB, patch	sunfish : review+	Details \| Diff \| Splinter Review
Unaligned moves for Float32x4 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 5.99 KB, patch	sunfish : review+	Details \| Diff \| Splinter Review
5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 9.50 KB, patch		Details \| Diff \| Splinter Review
5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 11.50 KB, patch	sunfish : review+	Details \| Diff \| Splinter Review
7 - Add an optional simdSet parameter to push and pop regs in mask 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 8.53 KB, patch	sunfish : review+	Details \| Diff \| Splinter Review
1 - SIMD x86-x64: LIR definition 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 8.53 KB, patch	sunfish : review+	Details \| Diff \| Splinter Review
2 - SIMD x86-x64: Allow stack slot allocator to emit quad slots 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 3.73 KB, patch	sunfish : review+	Details \| Diff \| Splinter Review
3 - SIMD x86-x64: masm / asm instructions for int32x4 moves 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 5.99 KB, patch	sunfish : review+	Details \| Diff \| Splinter Review
4 - SIMD x86-x64: ScratchSimdRegister and ReturnSimdRegister 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 4.17 KB, patch	sunfish : review+	Details \| Diff \| Splinter Review
5 - SIMD x86-x64: Introduce Simd128DataSize 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 1.79 KB, patch	bbouvier : review+	Details \| Diff \| Splinter Review
6 - SIMD x86-x64: MoveResolver and MoveEmitter for Int32x4 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 7.37 KB, patch	sunfish : review+	Details \| Diff \| Splinter Review
7 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 11.51 KB, patch	bbouvier : review+	Details \| Diff \| Splinter Review
8 - Same things for Float32x4 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 18.25 KB, patch	sunfish : review+	Details \| Diff \| Splinter Review
9 - Add an optional simdSet parameter to push and pop regs in mask 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 8.54 KB, patch	bbouvier : review+	Details \| Diff \| Splinter Review
ARM stubs 10 years ago Benjamin Bouvier [:bbouvier] (inactive) 2.00 KB, patch	sunfish : review+	Details \| Diff \| Splinter Review

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Description

•

10 years ago

This should be x86 / x64 only and should crash under ARM, for now.

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 1

•

10 years ago

Attached patch SIMD x86-x64: LIR definition (obsolete) — Details — Splinter Review

This one just creates a LIR definition for Int32x4, and adds a const bool in Architecture files to assert that we can't
fall under ARM in SIMD functions.

Attachment #8433541 - Flags: feedback?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 2

•

10 years ago

Attached patch 2 - SIMD x86-x64: Allow stack slot allocator to emit quad slots (obsolete) — Details — Splinter Review

This one implements quad slots in the stack slot allocator, so that we can use them for register spilling later.

Attachment #8433543 - Flags: feedback?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 3

•

10 years ago

Attached patch SIMD x86-x64: masm / asm instructions for int32x4 moves (obsolete) — Details — Splinter Review

Macro assembler and assembler {load,store,move}Int32x4 functions.

Attachment #8433544 - Flags: feedback?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 4

•

10 years ago

Attached patch SIMD x86-x64: MoveResolver and MoveEmitter for Int32x4 (obsolete) — Details — Splinter Review

And finally, MoveResolver and MoveEmitter for Int32x4.
I'll update all my patches with your comments and Float32x4 implementation as well in the future.

Attachment #8433545 - Flags: feedback?(sunfish)

Dan Gohman [:sunfish]

Comment 5

•

10 years ago

Comment on attachment 8433541 [details] [diff] [review]
SIMD x86-x64: LIR definition

Review of attachment 8433541 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/LIR.h
@@ +434,5 @@
>          OBJECT,     // Pointer that may be collected as garbage (GPR).
>          SLOTS,      // Slots/elements pointer that may be moved by minor GCs (GPR).
>          FLOAT32,    // 32-bit floating-point value (FPU).
>          DOUBLE,     // 64-bit floating-point value (FPU).
> +        INT32X4,    // Packed data containing four 32-bit integers (FPU).

"Packed data" means a variety of things in tangentially related contexts. I suggest saying "SIMD data".

@@ +483,5 @@
>          return (Policy)((bits_ >> POLICY_SHIFT) & POLICY_MASK);
>      }
>      Type type() const {
> +        Type ret = (Type)((bits_ >> TYPE_SHIFT) & TYPE_MASK);
> +        JS_ASSERT_IF(ret == INT32X4, SupportsSIMD);

It'd be better to put this assert in the set function, so we catch SIMD types at the point where they are produced, rather than where they are consumed.

::: js/src/jit/arm/Assembler-arm.h
@@ +140,5 @@
>  static const bool StackKeptAligned = true;
>  static const uint32_t NativeFrameSize = sizeof(void*);
>  static const uint32_t AlignmentAtAsmJSPrologue = sizeof(void*);
>  
> +static const bool SupportsSIMD = false;

A brief comment saying that SIMD is merely not implemented yet would be good.

Dan Gohman [:sunfish]

Updated

•

10 years ago

Attachment #8433543 - Flags: feedback?(sunfish) → feedback+

Dan Gohman [:sunfish]

Comment 6

•

10 years ago

Comment on attachment 8433544 [details] [diff] [review]
SIMD x86-x64: masm / asm instructions for int32x4 moves

Review of attachment 8433544 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/assembler/assembler/X86Assembler.h
@@ +2843,5 @@
>      }
>  
> +    void movdqa_rr(XMMRegisterID src, XMMRegisterID dst)
> +    {
> +        spew("movdqa %s, %s",

Please add spaces to keep the first operand indented consistently.

::: js/src/jit/shared/MacroAssembler-x86-shared.h
@@ +470,5 @@
> +        movdqa(Operand(src), dest);
> +    }
> +    void storeInt32x4(FloatRegister src, const Address &dest) {
> +        movdqa(src, Operand(dest));
> +    }

The interesting question here is alignment, since we're eventually going to need unaligned loads and stores too. Will it make sense to name them loadUnalignedInt32x4, or should we rename these aligned functions to loadAlignedInt32x4? I don't have a strong opinion right now. Either way, it'd be nice to have comments explaining which one these are.

Attachment #8433544 - Flags: feedback?(sunfish)

Dan Gohman [:sunfish]

Comment 7

•

10 years ago

Comment on attachment 8433545 [details] [diff] [review]
SIMD x86-x64: MoveResolver and MoveEmitter for Int32x4

Review of attachment 8433545 [details] [diff] [review]:
-----------------------------------------------------------------

MoveOperand.isFloatReg() returns true for SIMD types, right? If so, characterizeCycle will set allFloatRegs to true for SIMD registers, and maybeEmitOptimizedCycle will do the xor swap for SIMD values too, which is fun.

::: js/src/jit/shared/MoveEmitter-x86-shared.cpp
@@ +158,5 @@
> +// SIMD prerequisites for cycle slot's size:
> +static const size_t CYCLE_SLOT_SIZE = 4 * sizeof(int32_t);
> +JS_STATIC_ASSERT(CYCLE_SLOT_SIZE == 2 * sizeof(double));
> +JS_STATIC_ASSERT(CYCLE_SLOT_SIZE == 4 * sizeof(float));
> +JS_STATIC_ASSERT(CYCLE_SLOT_SIZE == 4 * sizeof(int32_t));

It doesn't matter right now, but these asserts could be >= instead of ==, in case we ever add larger slots.

Attachment #8433545 - Flags: feedback?(sunfish)

Dan Gohman [:sunfish]

Updated

•

10 years ago

Attachment #8433541 - Flags: feedback?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 8

•

10 years ago

(In reply to Dan Gohman [:sunfish] from comment #6)
> Comment on attachment 8433544 [details] [diff] [review]
> SIMD x86-x64: masm / asm instructions for int32x4 moves
> 
> Review of attachment 8433544 [details] [diff] [review]:
> -----------------------------------------------------------------
> The interesting question here is alignment, since we're eventually going to
> need unaligned loads and stores too. Will it make sense to name them
> loadUnalignedInt32x4, or should we rename these aligned functions to
> loadAlignedInt32x4? I don't have a strong opinion right now. Either way,
> it'd be nice to have comments explaining which one these are.

Actually I was wondering the same thing. Now, explicit is better than implicit, so I would even suggest not to have loadInt32x4 at all, but rather load{Un,}AlignedInt32x4. Otherwise, this would make any of the other two a norm (however if we need to choose a side, the exception should be loadUnalignedInt32x4).

Dan Gohman [:sunfish]

Comment 9

•

10 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #8)
> Actually I was wondering the same thing. Now, explicit is better than
> implicit, so I would even suggest not to have loadInt32x4 at all, but rather
> load{Un,}AlignedInt32x4.

Yeah, that sounds good. Alignment of SIMD values is often enough a concern that it'll be good to make it explicit everywhere. Nice Zen of Python quote, btw :).

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 10

•

10 years ago

Attached patch SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present (obsolete) — Details — Splinter Review

See bug 1014083 comment 21. There shall be a pass that checks whether there is at least one SIMD operation, and in this
case will set needsSimdStackAlignment for CodeGen.

Attachment #8435635 - Flags: feedback?(sunfish)

Dan Gohman [:sunfish]

Comment 11

•

10 years ago

Comment on attachment 8435635 [details] [diff] [review]
SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present

Review of attachment 8435635 [details] [diff] [review]:
-----------------------------------------------------------------

Is there a reason why we don't want to just align the stack to 16 bytes always, instead of trying to cleverly track when it's needed?

Dan Gohman [:sunfish]

Updated

•

10 years ago

Attachment #8435635 - Flags: feedback?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 12

•

10 years ago

(In reply to Dan Gohman [:sunfish] from comment #11)
> Is there a reason why we don't want to just align the stack to 16 bytes
> always, instead of trying to cleverly track when it's needed?

Last time we talked about it, Luke was very against unconditional alignment, to avoid de-optimizing leaf functions in asm.js. I'll let him weigh in here for more details.

Luke Wagner [:luke]

Comment 13

•

10 years ago

Well, I wouldn't say *very*, just wanted to optimize leaf functions assuming it's not an undue burden.  Btw, it seems like we could remove setPerformsAsmJSCall() and needsInitialStackAlignment_: we'd align the stack if gen->performsCall() and, since MAsmJSCall::isCall()=true, performsCall is already set for asm.js calls.

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 14

•

10 years ago

Attached patch 1 - SIMD x86-x64: LIR definition (obsolete) — Details — Splinter Review

Fixed nits.

Attachment #8433541 - Attachment is obsolete: true

Attachment #8439343 - Flags: feedback?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Updated

•

10 years ago

Attachment #8433543 - Attachment description: SIMD x86-x64: Allow stack slot allocator to emit quad slots → 2 - SIMD x86-x64: Allow stack slot allocator to emit quad slots

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 15

•

10 years ago

Attached patch 3 - SIMD x86-x64: masm / asm instructions for int32x4 moves (obsolete) — Details — Splinter Review

Fixed remarks.

Attachment #8433544 - Attachment is obsolete: true

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Updated

•

10 years ago

Attachment #8439344 - Flags: feedback?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 16

•

10 years ago

Attached patch 4 - SIMD x86-x64: MoveResolver and MoveEmitter for Int32x4 (obsolete) — Details — Splinter Review

Attachment #8439345 - Flags: feedback?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Updated

•

10 years ago

Attachment #8433545 - Attachment is obsolete: true

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 17

•

10 years ago

Attached patch 5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present (obsolete) — Details — Splinter Review

I would like to make some measurements in a separate bug before having it aligned in all cases, as I think we'd also need the stack top to be aligned, to avoid cache lines faults. In the meanwhile, what do you think of the current patch?

Attachment #8435635 - Attachment is obsolete: true

Attachment #8439349 - Flags: feedback?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 18

•

10 years ago

Attached patch 6 - Same things for Float32x4 (obsolete) — Details — Splinter Review

Float32x4 bundle.

Attachment #8439350 - Flags: feedback?(sunfish)

Dan Gohman [:sunfish]

Comment 19

•

10 years ago

Comment on attachment 8439343 [details] [diff] [review]
1 - SIMD x86-x64: LIR definition

Review of attachment 8439343 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/LIR.h
@@ +434,5 @@
>          OBJECT,     // Pointer that may be collected as garbage (GPR).
>          SLOTS,      // Slots/elements pointer that may be moved by minor GCs (GPR).
>          FLOAT32,    // 32-bit floating-point value (FPU).
>          DOUBLE,     // 64-bit floating-point value (FPU).
> +        INT32X4,    // SIMD data containing four 32-bit integers (FPU).

Nit: It'd be nice to keep all the SIMD types next to each other.

@@ +447,5 @@
>      };
>  
>      void set(uint32_t index, Type type, Policy policy) {
>          JS_STATIC_ASSERT(MAX_VIRTUAL_REGISTERS <= VREG_MASK);
> +        JS_ASSERT_IF(type == INT32X4, SupportsSIMD);

This looks like it should use isSIMDType, or whatever it's called, rather than just checking for INT32X4. Or if we don't yet have such a function, it would be nice to create it :-).

Attachment #8439343 - Flags: feedback?(sunfish) → feedback+

Dan Gohman [:sunfish]

Comment 20

•

10 years ago

Comment on attachment 8439344 [details] [diff] [review]
3 -  SIMD x86-x64: masm / asm instructions for int32x4 moves

Review of attachment 8439344 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/assembler/assembler/X86Assembler.h
@@ +2846,5 @@
> +    {
> +        spew("movdqa     %s, %s",
> +             nameFPReg(src), nameFPReg(dst));
> +        m_formatter.prefix(PRE_SSE_F2);
> +        m_formatter.twoByteOp(OP2_MOVDQ_VsdWsd, (RegisterID)dst, (RegisterID)src);

Should this be OP2_MOVQ_VdqWdq?

Attachment #8439344 - Flags: feedback?(sunfish) → feedback+

Dan Gohman [:sunfish]

Updated

•

10 years ago

Attachment #8439345 - Flags: feedback?(sunfish) → feedback+

Dan Gohman [:sunfish]

Comment 21

•

10 years ago

Comment on attachment 8439350 [details] [diff] [review]
6 - Same things for Float32x4

Review of attachment 8439350 [details] [diff] [review]:
-----------------------------------------------------------------

One related thing that'd be nice to do is teach characterizeCycle and maybeEmitOptimizedCycle to use xorps or pxor instead of xorpd if all the moves are FLOAT32x4 or INT32x4 respectively. It isn't critical though, and doesn't have to happen in this patch series.

Attachment #8439350 - Flags: feedback?(sunfish) → feedback+

Dan Gohman [:sunfish]

Comment 22

•

10 years ago

Comment on attachment 8439349 [details] [diff] [review]
5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present

Review of attachment 8439349 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIRGenerator.h
@@ +117,5 @@
>      }
>      bool performsCall() const {
>          return performsCall_;
>      }
> +    void setUsesSimd() {

setUsesSimd() doesn't appear to be called anywhere. Also, this appears to be the only place where usesSimd_ is set to true, so with this patch, it appears it will never be set.

::: js/src/jit/arm/Assembler-arm.h
@@ +145,5 @@
>  // this architecture or not. Rather than a method in the LIRGenerator, it is
>  // here such that it is accessible from the entire codebase. Once full support
>  // for SIMD is reached on all tier-1 platforms, this constant can be deleted.
>  static const bool SupportsSIMD = false;
> +static const bool SimdStackAlignment = 8; // TODO?

If you plan to leave this TODO here for a while, it'd be good to have a little blurb explaining what needs to be done. e.g. // TODO: Does ARM want its stack 16-byte-aligned for SIMD?

Attachment #8439349 - Flags: feedback?(sunfish) → feedback-

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 23

•

10 years ago

(In reply to Dan Gohman [:sunfish] from comment #19)
> Comment on attachment 8439343 [details] [diff] [review]
> 1 - SIMD x86-x64: LIR definition
> 
> Review of attachment 8439343 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/LIR.h
> @@ +434,5 @@
> >          OBJECT,     // Pointer that may be collected as garbage (GPR).
> >          SLOTS,      // Slots/elements pointer that may be moved by minor GCs (GPR).
> >          FLOAT32,    // 32-bit floating-point value (FPU).
> >          DOUBLE,     // 64-bit floating-point value (FPU).
> > +        INT32X4,    // SIMD data containing four 32-bit integers (FPU).
> 
> Nit: It'd be nice to keep all the SIMD types next to each other.
Huh, I am confused: as of this patch, there is only INT32X4?

> This looks like it should use isSIMDType, or whatever it's called, rather
> than just checking for INT32X4. Or if we don't yet have such a function, it
> would be nice to create it :-).
Done (this was in a later patch).

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 24

•

10 years ago

(In reply to Dan Gohman [:sunfish] from comment #20)
> Comment on attachment 8439344 [details] [diff] [review]
> 3 -  SIMD x86-x64: masm / asm instructions for int32x4 moves
> 
> Review of attachment 8439344 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/assembler/assembler/X86Assembler.h
> @@ +2846,5 @@
> > +    {
> > +        spew("movdqa     %s, %s",
> > +             nameFPReg(src), nameFPReg(dst));
> > +        m_formatter.prefix(PRE_SSE_F2);
> > +        m_formatter.twoByteOp(OP2_MOVDQ_VsdWsd, (RegisterID)dst, (RegisterID)src);
> 
> Should this be OP2_MOVQ_VdqWdq?

Good question. I'd like to keep MOVDQ rather than MOVQ (to avoid confusion with (single) quad-words operations). As it operates on int32x4, VdqWdq is the right name. So, OP2_MOVDQ_VdqWdq? I've also changed existing uses of the reciprocal operation (WsdVsd) to follow that convention too.

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 25

•

10 years ago

Attached patch Unaligned moves for Int32x4 — Details — Splinter Review

We do actually need these in some places in Odin where we don't have control on the caller's stack pointer. (and if we don't actually need these, i'd prefer to have that done in follow up bugs)

Attachment #8461660 - Flags: review?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 26

•

10 years ago

Attached patch Unaligned moves for Float32x4 — Details — Splinter Review

Attachment #8461663 - Flags: review?(sunfish)

Dan Gohman [:sunfish]

Comment 27

•

10 years ago

Comment on attachment 8461660 [details] [diff] [review]
Unaligned moves for Int32x4

Review of attachment 8461660 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.

Attachment #8461660 - Flags: review?(sunfish) → review+

Dan Gohman [:sunfish]

Updated

•

10 years ago

Attachment #8461663 - Flags: review?(sunfish) → review+

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 28

•

10 years ago

Attached patch 5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present (obsolete) — Details — Splinter Review

An enhanced version of sp alignment in Codegen shared: the attached MIRGraph will be scanned and if there's at least one SIMD instruction, alignment will be forced.  To avoid scanning several times, the result value is cached.

Attachment #8439349 - Attachment is obsolete: true

Attachment #8463356 - Flags: review?(sunfish)

Dan Gohman [:sunfish]

Comment 29

•

10 years ago

Comment on attachment 8463356 [details] [diff] [review]
5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present

Review of attachment 8463356 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIRGraph.cpp
@@ +48,5 @@
> +                                  end   = graph_->rpoEnd();
> +         block != end;
> +         block++)
> +    {
> +        for (MInstructionIterator inst = block->begin(); inst != block->end(); inst++) {

It's ok to use MInstructionIterator here because you don't need to worry about phis, since any reachable phi (or phi cycle) will have at least one instruction as an input. Please add a comment to this effect.

@@ +49,5 @@
> +         block != end;
> +         block++)
> +    {
> +        for (MInstructionIterator inst = block->begin(); inst != block->end(); inst++) {
> +            if (IsSimdType(inst->type())) {

It's ok to ignore instructions which use SIMD but don't have a SIMD result type (e.g. stores) because they'll have at least one SIMD operand so there will be an instruction somewhere with a SIMD result type. At least, this is true now and for the forseeable future. Please comment this too.

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ +74,5 @@
>              if (unsigned rem = alignmentAtCall % StackAlignment)
>                  frameDepth_ += StackAlignment - rem;
> +
> +            if (gen->usesSimd()) {
> +                // At this point, we have frameDepth_ == -AlignmentAtAsmJSPrologue % StackAlignment,

I don't understand this comment. frameDepth_ is some function of how many slots the function used, and other things, so it isn't always going to be less than StackAlignment here. I think you want something with % StackAlignment == 0.

@@ +84,5 @@
> +                frameInitialAdjustment_ = frameDepth_ % SimdStackAlignment;
> +                // We need at least the frame depth that was available before all
> +                // adjustments.
> +                if (frameDepth_ - frameInitialAdjustment_ < initialFrameDepth)
> +                    frameDepth_ += StackAlignment;

I don't quite understand this code. Is this assuming that at most one StackAlignment is needed to reach Simd alignment? It happens that SimdStackAlignment is at most 16 and StackAlignment is 8, so this is true, but it's not obvious.

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 30

•

10 years ago

(In reply to Dan Gohman [:sunfish] from comment #29)
> Comment on attachment 8463356 [details] [diff] [review]
> 5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD
> instructions are present
> 
> Review of attachment 8463356 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/MIRGraph.cpp
> @@ +48,5 @@
> > +                                  end   = graph_->rpoEnd();
> > +         block != end;
> > +         block++)
> > +    {
> > +        for (MInstructionIterator inst = block->begin(); inst != block->end(); inst++) {
> 
> It's ok to use MInstructionIterator here because you don't need to worry
> about phis, since any reachable phi (or phi cycle) will have at least one
> instruction as an input. Please add a comment to this effect.
> 
> @@ +49,5 @@
> > +         block != end;
> > +         block++)
> > +    {
> > +        for (MInstructionIterator inst = block->begin(); inst != block->end(); inst++) {
> > +            if (IsSimdType(inst->type())) {
> 
> It's ok to ignore instructions which use SIMD but don't have a SIMD result
> type (e.g. stores) because they'll have at least one SIMD operand so there
> will be an instruction somewhere with a SIMD result type. At least, this is
> true now and for the forseeable future. Please comment this too.

Good points. I've added comments.

> 
> ::: js/src/jit/shared/CodeGenerator-shared.cpp
> @@ +74,5 @@
> >              if (unsigned rem = alignmentAtCall % StackAlignment)
> >                  frameDepth_ += StackAlignment - rem;
> > +
> > +            if (gen->usesSimd()) {
> > +                // At this point, we have frameDepth_ == -AlignmentAtAsmJSPrologue % StackAlignment,
> 
> I don't understand this comment. frameDepth_ is some function of how many
> slots the function used, and other things, so it isn't always going to be
> less than StackAlignment here. I think you want something with %
> StackAlignment == 0.

Hum, maybe just a notation nit: A == -B % C should be read "C divides A+B" here (like in math notation). I understand that there is some confusion with C syntax, so I've made it more explicit by just having (A+B) % C == 0.

> 
> @@ +84,5 @@
> > +                frameInitialAdjustment_ = frameDepth_ % SimdStackAlignment;
> > +                // We need at least the frame depth that was available before all
> > +                // adjustments.
> > +                if (frameDepth_ - frameInitialAdjustment_ < initialFrameDepth)
> > +                    frameDepth_ += StackAlignment;
> 
> I don't quite understand this code. Is this assuming that at most one
> StackAlignment is needed to reach Simd alignment? It happens that
> SimdStackAlignment is at most 16 and StackAlignment is 8, so this is true,
> but it's not obvious.

Wow, good catch. I think it's just a typo and I had in mind SimdStackAlignment. How about instead using SimdStackAlignment, and static assert above that SimdStackAlignment % StackAlignment == 0? (true on all platforms: x86,x64: 16 % (16 or 4), arm: 8 % 8)

(if it were not for cases where StackAlignment == 4, we could get rid of SimdStackAlignment, as it's usually equal to StackAlignment)

Flags: needinfo?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Updated

•

10 years ago

Attachment #8463356 - Flags: review?(sunfish)

Dan Gohman [:sunfish]

Comment 31

•

10 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #30)
> > ::: js/src/jit/shared/CodeGenerator-shared.cpp
> > @@ +74,5 @@
> > >              if (unsigned rem = alignmentAtCall % StackAlignment)
> > >                  frameDepth_ += StackAlignment - rem;
> > > +
> > > +            if (gen->usesSimd()) {
> > > +                // At this point, we have frameDepth_ == -AlignmentAtAsmJSPrologue % StackAlignment,
> > 
> > I don't understand this comment. frameDepth_ is some function of how many
> > slots the function used, and other things, so it isn't always going to be
> > less than StackAlignment here. I think you want something with %
> > StackAlignment == 0.
> 
> Hum, maybe just a notation nit: A == -B % C should be read "C divides A+B"
> here (like in math notation). I understand that there is some confusion with
> C syntax, so I've made it more explicit by just having (A+B) % C == 0.

Oh, I get it now. I do think that C syntax is best in this context; thanks for changing this.

> > 
> > @@ +84,5 @@
> > > +                frameInitialAdjustment_ = frameDepth_ % SimdStackAlignment;
> > > +                // We need at least the frame depth that was available before all
> > > +                // adjustments.
> > > +                if (frameDepth_ - frameInitialAdjustment_ < initialFrameDepth)
> > > +                    frameDepth_ += StackAlignment;
> > 
> > I don't quite understand this code. Is this assuming that at most one
> > StackAlignment is needed to reach Simd alignment? It happens that
> > SimdStackAlignment is at most 16 and StackAlignment is 8, so this is true,
> > but it's not obvious.
> 
> Wow, good catch. I think it's just a typo and I had in mind
> SimdStackAlignment. How about instead using SimdStackAlignment, and static
> assert above that SimdStackAlignment % StackAlignment == 0? (true on all
> platforms: x86,x64: 16 % (16 or 4), arm: 8 % 8)

Yeah, that's a reasonable static_assert.

I'm still a little confused though. The code for aligning to plain StackAlignment does this:

 if (unsigned rem = alignmentAtCall % StackAlignment)
     frameDepth_ += StackAlignment - rem;

Why doesn't the code for aligning to SimdStackAlignment have to do something similar?

> (if it were not for cases where StackAlignment == 4, we could get rid of
> SimdStackAlignment, as it's usually equal to StackAlignment)

Yeah, this is unfortunately still relevant.

Flags: needinfo?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 32

•

10 years ago

(In reply to Dan Gohman [:sunfish] from comment #31)
> I'm still a little confused though. The code for aligning to plain
> StackAlignment does this:
> 
>  if (unsigned rem = alignmentAtCall % StackAlignment)
>      frameDepth_ += StackAlignment - rem;
> 
> Why doesn't the code for aligning to SimdStackAlignment have to do something
> similar?

I'll try to explain it to make it clearer for everyone (including me :))

The constraints we have are the following:
- (frameDepth_ + AsmJSFrameSize) % StackAlignment == 0 (1 -- asmjs call alignment)
- (frameDepth_ - frameInitialAdjustment_) % SimdStackAlignment == 0 (2 -- SIMD operand alignment)
- (frameDepth_ - frameInitialAdjustment_) >= initialFrameDepth (3 -- the amount of stack depth must be sufficient)

(3) tells us that frameInitialAdjustment_ := frameDepth_ % SimdStackAlignment is sufficient.

Let's take a concrete example. On x86, with StackAlignment := 16 (assume GCC), if frameDepth_ := 16k + 8, then there's nothing to add to frameDepth_ for the asmjs call restriction (as AsmJSFrameSize is 2 * sizeof(void*) == 8), but frameInitialAdjustment == 8, so (frameDepth_ - 8) == 16k < 16k + 8, so (2) is not satisfied anymore.

More generally, under these conditions (x86 and gcc), if frameDepth_ := 16k + r, frameInitialAdjustment_ ends up being 8 all the time and frameDepth_ - frameInitialAdjustment_ < initial frameDepth if r == 4 or r == 8.

We actually need to add SimdStackAlignment if the first frame depth fix up is less than the initial adjustment, I think.

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 33

•

10 years ago

Attached patch 5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present (obsolete) — Details — Splinter Review

Different wording, same principle. As the comment makes this function way fatter than it should be, I moved it to a setupSimdAlignment function, which makes the constructor slimmer.

Attachment #8463356 - Attachment is obsolete: true

Attachment #8465589 - Flags: review?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 34

•

10 years ago

Attached patch 7 - Add an optional simdSet parameter to push and pop regs in mask (obsolete) — Details — Splinter Review

This is the last thing needed for Odin support, at least: every time we use PushRegsInMask (or PopRegsInMask), we need to know whether we want to push 64 or 128 bits of a float register on the stack.

The proper long term fix is to have the TypedRegisterSet<FloatRegister> contain a type information about the float register. More precisely, this type information could initially fit in a bit, indicating whether the type is 64 bits or 128 bits. Ideally, we'd want to indicate whether the type is float32, float64, or which SIMD type it is (to use the right SIMD move).

Attachment #8467595 - Flags: review?(sunfish)

Dan Gohman [:sunfish]

Comment 35

•

10 years ago

Comment on attachment 8467595 [details] [diff] [review]
7 - Add an optional simdSet parameter to push and pop regs in mask

Review of attachment 8467595 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Benjamin Bouvier [:bbouvier] from comment #34)
> The proper long term fix is to have the TypedRegisterSet<FloatRegister>
> contain a type information about the float register. More precisely, this
> type information could initially fit in a bit, indicating whether the type
> is 64 bits or 128 bits. Ideally, we'd want to indicate whether the type is
> float32, float64, or which SIMD type it is (to use the right SIMD move).

Yeah, this sounds nice. Your current patch looks fine for now though.

Attachment #8467595 - Flags: review?(sunfish) → review+

Dan Gohman [:sunfish]

Comment 36

•

10 years ago

Comment on attachment 8465589 [details] [diff] [review]
5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present

Review of attachment 8465589 [details] [diff] [review]:
-----------------------------------------------------------------

Cool, this looks good.

Attachment #8465589 - Flags: review?(sunfish) → review+

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 37

•

10 years ago

Attached patch 1 - SIMD x86-x64: LIR definition — Details — Splinter Review

Attachment #8439343 - Attachment is obsolete: true

Attachment #8468483 - Flags: review?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 38

•

10 years ago

Attached patch 2 - SIMD x86-x64: Allow stack slot allocator to emit quad slots — Details — Splinter Review

Attachment #8433543 - Attachment is obsolete: true

Attachment #8468484 - Flags: review?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 39

•

10 years ago

Attached patch 3 - SIMD x86-x64: masm / asm instructions for int32x4 moves — Details — Splinter Review

Attachment #8439344 - Attachment is obsolete: true

Attachment #8468485 - Flags: review?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Updated

•

10 years ago

Attachment #8468484 - Attachment description: bug1019831-2-stackslot.patch → 2 - SIMD x86-x64: Allow stack slot allocator to emit quad slots

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 40

•

10 years ago

Attached patch 4 - SIMD x86-x64: ScratchSimdRegister and ReturnSimdRegister — Details — Splinter Review

This one is new.

Attachment #8468487 - Flags: review?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 41

•

10 years ago

Attached patch 5 - SIMD x86-x64: Introduce Simd128DataSize — Details — Splinter Review

Coming from bug 1023404, carrying forward r+ from sunfish.

Attachment #8468489 - Flags: review+

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 42

•

10 years ago

Attached patch 6 - SIMD x86-x64: MoveResolver and MoveEmitter for Int32x4 — Details — Splinter Review

Attachment #8439345 - Attachment is obsolete: true

Attachment #8468492 - Flags: review?(sunfish)

Dan Gohman [:sunfish]

Comment 43

•

10 years ago

Comment on attachment 8468483 [details] [diff] [review]
1 - SIMD x86-x64: LIR definition

Review of attachment 8468483 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.

::: js/src/jit/LIR.h
@@ +446,5 @@
>  
>      void set(uint32_t index, Type type, Policy policy) {
>          JS_STATIC_ASSERT(MAX_VIRTUAL_REGISTERS <= VREG_MASK);
>          bits_ = (index << VREG_SHIFT) | (policy << POLICY_SHIFT) | (type << TYPE_SHIFT);
> +        JS_ASSERT_IF(isSimdType(), SupportsSimd);

This would be slightly nicer as JS_ASSERT_IF(!SupportsSimd, !isSimdType()) since SupportsSimd is the less varying of the two expressions.

Attachment #8468483 - Flags: review?(sunfish) → review+

Dan Gohman [:sunfish]

Updated

•

10 years ago

Attachment #8468484 - Flags: review?(sunfish) → review+

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 44

•

10 years ago

Attached patch 7 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present — Details — Splinter Review

carrying forward r+ from sunfish

Attachment #8465589 - Attachment is obsolete: true

Attachment #8468496 - Flags: review+

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 45

•

10 years ago

Attached patch 8 - Same things for Float32x4 — Details — Splinter Review

Attachment #8439350 - Attachment is obsolete: true

Attachment #8468498 - Flags: review?(sunfish)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 46

•

10 years ago

Attached patch 9 - Add an optional simdSet parameter to push and pop regs in mask — Details — Splinter Review

carrying forward r+ from sunfish

Attachment #8467595 - Attachment is obsolete: true

Attachment #8468503 - Flags: review+

Dan Gohman [:sunfish]

Updated

•

10 years ago

Attachment #8468485 - Flags: review?(sunfish) → review+

Dan Gohman [:sunfish]

Updated

•

10 years ago

Attachment #8468487 - Flags: review?(sunfish) → review+

Dan Gohman [:sunfish]

Comment 47

•

10 years ago

Comment on attachment 8468492 [details] [diff] [review]
6 - SIMD x86-x64: MoveResolver and MoveEmitter for Int32x4

Review of attachment 8468492 [details] [diff] [review]:
-----------------------------------------------------------------

MoveEmitterX86::maybeEmitOptimizedCycle currently emits an xorpd sequence if all moves are "isFloatReg()" and other conditions hold. Ideally characterizeCycle should be extended to look at the move types too, so that maybeEmitOptimizedCycle can emit pxor or xorps as appropriate. If you don't want to do that in this patch, we can do it later; xorpd is functionally correct here.

::: js/src/jit/shared/MoveEmitter-x86-shared.cpp
@@ +282,5 @@
>      // saved value of B, to A.
>      switch (type) {
> +      case MoveOp::INT32X4:
> +        JS_ASSERT(pushedAtCycle_ != -1);
> +        JS_ASSERT(pushedAtCycle_ - pushedAtStart_ >= 4 * sizeof(int32_t));

Instead of computing 4 * sizeof(int32_t), could this use Simd128Size (or whatever it's called)?

Attachment #8468492 - Flags: review?(sunfish) → review+

Dan Gohman [:sunfish]

Updated

•

10 years ago

Attachment #8468498 - Flags: review?(sunfish) → review+

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 48

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=7cc3f4f2a4e8

Ryan VanderMeulen [:RyanVM]

Comment 49

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/b0117ac74441
https://hg.mozilla.org/mozilla-central/rev/6120f6d2c90e
https://hg.mozilla.org/mozilla-central/rev/113476b5c52b
https://hg.mozilla.org/mozilla-central/rev/9c8ed786f307
https://hg.mozilla.org/mozilla-central/rev/a59517b4353b
https://hg.mozilla.org/mozilla-central/rev/8333f43d2bfe
https://hg.mozilla.org/mozilla-central/rev/b8887016ec4f
https://hg.mozilla.org/mozilla-central/rev/c18202c2ccd0
https://hg.mozilla.org/mozilla-central/rev/7d603367bda5
https://hg.mozilla.org/mozilla-central/rev/55117245d4ac
https://hg.mozilla.org/mozilla-central/rev/7cc3f4f2a4e8

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla34

Douglas Crosher [:dougc]

Updated

•

10 years ago

Depends on: 1051591

Steve Singer (:stevensn)

Updated

•

10 years ago

Depends on: 1052380

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 50

•

10 years ago

Attached patch ARM stubs — Details — Splinter Review

For completeness and consistency, these should be in this bug too. These are just stubs for the load{Una,A}ligned{Int,Float}32x4 methods.

Attachment #8472204 - Flags: review?(sunfish)

Dan Gohman [:sunfish]

Updated

•

10 years ago

Attachment #8472204 - Flags: review?(sunfish) → review+

Ryan VanderMeulen [:RyanVM]

Comment 51

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/81c93f4a6212

You need to log in before you can comment on or make changes to this bug.