Closed Bug 1019831 Opened 10 years ago Closed 10 years ago

SIMD x86/x64: Regalloc and moves bits

Categories

(Core :: JavaScript Engine: JIT, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla34

People

(Reporter: bbouvier, Assigned: bbouvier)

References

Details

Attachments

(12 files, 13 obsolete files)

5.36 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
5.99 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
8.53 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
3.73 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
5.99 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
4.17 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
1.79 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
7.37 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
11.51 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
18.25 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
8.54 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
2.00 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
This should be x86 / x64 only and should crash under ARM, for now.
Attached patch SIMD x86-x64: LIR definition (obsolete) — Splinter Review
This one just creates a LIR definition for Int32x4, and adds a const bool in Architecture files to assert that we can't
fall under ARM in SIMD functions.
Attachment #8433541 - Flags: feedback?(sunfish)
This one implements quad slots in the stack slot allocator, so that we can use them for register spilling later.
Attachment #8433543 - Flags: feedback?(sunfish)
Macro assembler and assembler {load,store,move}Int32x4 functions.
Attachment #8433544 - Flags: feedback?(sunfish)
And finally, MoveResolver and MoveEmitter for Int32x4.
I'll update all my patches with your comments and Float32x4 implementation as well in the future.
Attachment #8433545 - Flags: feedback?(sunfish)
Comment on attachment 8433541 [details] [diff] [review]
SIMD x86-x64: LIR definition

Review of attachment 8433541 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/LIR.h
@@ +434,5 @@
>          OBJECT,     // Pointer that may be collected as garbage (GPR).
>          SLOTS,      // Slots/elements pointer that may be moved by minor GCs (GPR).
>          FLOAT32,    // 32-bit floating-point value (FPU).
>          DOUBLE,     // 64-bit floating-point value (FPU).
> +        INT32X4,    // Packed data containing four 32-bit integers (FPU).

"Packed data" means a variety of things in tangentially related contexts. I suggest saying "SIMD data".

@@ +483,5 @@
>          return (Policy)((bits_ >> POLICY_SHIFT) & POLICY_MASK);
>      }
>      Type type() const {
> +        Type ret = (Type)((bits_ >> TYPE_SHIFT) & TYPE_MASK);
> +        JS_ASSERT_IF(ret == INT32X4, SupportsSIMD);

It'd be better to put this assert in the set function, so we catch SIMD types at the point where they are produced, rather than where they are consumed.

::: js/src/jit/arm/Assembler-arm.h
@@ +140,5 @@
>  static const bool StackKeptAligned = true;
>  static const uint32_t NativeFrameSize = sizeof(void*);
>  static const uint32_t AlignmentAtAsmJSPrologue = sizeof(void*);
>  
> +static const bool SupportsSIMD = false;

A brief comment saying that SIMD is merely not implemented yet would be good.
Attachment #8433543 - Flags: feedback?(sunfish) → feedback+
Comment on attachment 8433544 [details] [diff] [review]
SIMD x86-x64: masm / asm instructions for int32x4 moves

Review of attachment 8433544 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/assembler/assembler/X86Assembler.h
@@ +2843,5 @@
>      }
>  
> +    void movdqa_rr(XMMRegisterID src, XMMRegisterID dst)
> +    {
> +        spew("movdqa %s, %s",

Please add spaces to keep the first operand indented consistently.

::: js/src/jit/shared/MacroAssembler-x86-shared.h
@@ +470,5 @@
> +        movdqa(Operand(src), dest);
> +    }
> +    void storeInt32x4(FloatRegister src, const Address &dest) {
> +        movdqa(src, Operand(dest));
> +    }

The interesting question here is alignment, since we're eventually going to need unaligned loads and stores too. Will it make sense to name them loadUnalignedInt32x4, or should we rename these aligned functions to loadAlignedInt32x4? I don't have a strong opinion right now. Either way, it'd be nice to have comments explaining which one these are.
Attachment #8433544 - Flags: feedback?(sunfish)
Comment on attachment 8433545 [details] [diff] [review]
SIMD x86-x64: MoveResolver and MoveEmitter for Int32x4

Review of attachment 8433545 [details] [diff] [review]:
-----------------------------------------------------------------

MoveOperand.isFloatReg() returns true for SIMD types, right? If so, characterizeCycle will set allFloatRegs to true for SIMD registers, and maybeEmitOptimizedCycle will do the xor swap for SIMD values too, which is fun.

::: js/src/jit/shared/MoveEmitter-x86-shared.cpp
@@ +158,5 @@
> +// SIMD prerequisites for cycle slot's size:
> +static const size_t CYCLE_SLOT_SIZE = 4 * sizeof(int32_t);
> +JS_STATIC_ASSERT(CYCLE_SLOT_SIZE == 2 * sizeof(double));
> +JS_STATIC_ASSERT(CYCLE_SLOT_SIZE == 4 * sizeof(float));
> +JS_STATIC_ASSERT(CYCLE_SLOT_SIZE == 4 * sizeof(int32_t));

It doesn't matter right now, but these asserts could be >= instead of ==, in case we ever add larger slots.
Attachment #8433545 - Flags: feedback?(sunfish)
Attachment #8433541 - Flags: feedback?(sunfish)
(In reply to Dan Gohman [:sunfish] from comment #6)
> Comment on attachment 8433544 [details] [diff] [review]
> SIMD x86-x64: masm / asm instructions for int32x4 moves
> 
> Review of attachment 8433544 [details] [diff] [review]:
> -----------------------------------------------------------------
> The interesting question here is alignment, since we're eventually going to
> need unaligned loads and stores too. Will it make sense to name them
> loadUnalignedInt32x4, or should we rename these aligned functions to
> loadAlignedInt32x4? I don't have a strong opinion right now. Either way,
> it'd be nice to have comments explaining which one these are.

Actually I was wondering the same thing. Now, explicit is better than implicit, so I would even suggest not to have loadInt32x4 at all, but rather load{Un,}AlignedInt32x4. Otherwise, this would make any of the other two a norm (however if we need to choose a side, the exception should be loadUnalignedInt32x4).
(In reply to Benjamin Bouvier [:bbouvier] from comment #8)
> Actually I was wondering the same thing. Now, explicit is better than
> implicit, so I would even suggest not to have loadInt32x4 at all, but rather
> load{Un,}AlignedInt32x4.

Yeah, that sounds good. Alignment of SIMD values is often enough a concern that it'll be good to make it explicit everywhere. Nice Zen of Python quote, btw :).
See bug 1014083 comment 21. There shall be a pass that checks whether there is at least one SIMD operation, and in this
case will set needsSimdStackAlignment for CodeGen.
Attachment #8435635 - Flags: feedback?(sunfish)
Comment on attachment 8435635 [details] [diff] [review]
SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present

Review of attachment 8435635 [details] [diff] [review]:
-----------------------------------------------------------------

Is there a reason why we don't want to just align the stack to 16 bytes always, instead of trying to cleverly track when it's needed?
Attachment #8435635 - Flags: feedback?(sunfish)
(In reply to Dan Gohman [:sunfish] from comment #11)
> Is there a reason why we don't want to just align the stack to 16 bytes
> always, instead of trying to cleverly track when it's needed?

Last time we talked about it, Luke was very against unconditional alignment, to avoid de-optimizing leaf functions in asm.js. I'll let him weigh in here for more details.
Well, I wouldn't say *very*, just wanted to optimize leaf functions assuming it's not an undue burden.  Btw, it seems like we could remove setPerformsAsmJSCall() and needsInitialStackAlignment_: we'd align the stack if gen->performsCall() and, since MAsmJSCall::isCall()=true, performsCall is already set for asm.js calls.
Attached patch 1 - SIMD x86-x64: LIR definition (obsolete) — Splinter Review
Fixed nits.
Attachment #8433541 - Attachment is obsolete: true
Attachment #8439343 - Flags: feedback?(sunfish)
Attachment #8433543 - Attachment description: SIMD x86-x64: Allow stack slot allocator to emit quad slots → 2 - SIMD x86-x64: Allow stack slot allocator to emit quad slots
Fixed remarks.
Attachment #8433544 - Attachment is obsolete: true
Attachment #8439344 - Flags: feedback?(sunfish)
Attachment #8433545 - Attachment is obsolete: true
I would like to make some measurements in a separate bug before having it aligned in all cases, as I think we'd also need the stack top to be aligned, to avoid cache lines faults. In the meanwhile, what do you think of the current patch?
Attachment #8435635 - Attachment is obsolete: true
Attachment #8439349 - Flags: feedback?(sunfish)
Attached patch 6 - Same things for Float32x4 (obsolete) — Splinter Review
Float32x4 bundle.
Attachment #8439350 - Flags: feedback?(sunfish)
Comment on attachment 8439343 [details] [diff] [review]
1 - SIMD x86-x64: LIR definition

Review of attachment 8439343 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/LIR.h
@@ +434,5 @@
>          OBJECT,     // Pointer that may be collected as garbage (GPR).
>          SLOTS,      // Slots/elements pointer that may be moved by minor GCs (GPR).
>          FLOAT32,    // 32-bit floating-point value (FPU).
>          DOUBLE,     // 64-bit floating-point value (FPU).
> +        INT32X4,    // SIMD data containing four 32-bit integers (FPU).

Nit: It'd be nice to keep all the SIMD types next to each other.

@@ +447,5 @@
>      };
>  
>      void set(uint32_t index, Type type, Policy policy) {
>          JS_STATIC_ASSERT(MAX_VIRTUAL_REGISTERS <= VREG_MASK);
> +        JS_ASSERT_IF(type == INT32X4, SupportsSIMD);

This looks like it should use isSIMDType, or whatever it's called, rather than just checking for INT32X4. Or if we don't yet have such a function, it would be nice to create it :-).
Attachment #8439343 - Flags: feedback?(sunfish) → feedback+
Comment on attachment 8439344 [details] [diff] [review]
3 -  SIMD x86-x64: masm / asm instructions for int32x4 moves

Review of attachment 8439344 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/assembler/assembler/X86Assembler.h
@@ +2846,5 @@
> +    {
> +        spew("movdqa     %s, %s",
> +             nameFPReg(src), nameFPReg(dst));
> +        m_formatter.prefix(PRE_SSE_F2);
> +        m_formatter.twoByteOp(OP2_MOVDQ_VsdWsd, (RegisterID)dst, (RegisterID)src);

Should this be OP2_MOVQ_VdqWdq?
Attachment #8439344 - Flags: feedback?(sunfish) → feedback+
Attachment #8439345 - Flags: feedback?(sunfish) → feedback+
Comment on attachment 8439350 [details] [diff] [review]
6 - Same things for Float32x4

Review of attachment 8439350 [details] [diff] [review]:
-----------------------------------------------------------------

One related thing that'd be nice to do is teach characterizeCycle and maybeEmitOptimizedCycle to use xorps or pxor instead of xorpd if all the moves are FLOAT32x4 or INT32x4 respectively. It isn't critical though, and doesn't have to happen in this patch series.
Attachment #8439350 - Flags: feedback?(sunfish) → feedback+
Comment on attachment 8439349 [details] [diff] [review]
5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present

Review of attachment 8439349 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIRGenerator.h
@@ +117,5 @@
>      }
>      bool performsCall() const {
>          return performsCall_;
>      }
> +    void setUsesSimd() {

setUsesSimd() doesn't appear to be called anywhere. Also, this appears to be the only place where usesSimd_ is set to true, so with this patch, it appears it will never be set.

::: js/src/jit/arm/Assembler-arm.h
@@ +145,5 @@
>  // this architecture or not. Rather than a method in the LIRGenerator, it is
>  // here such that it is accessible from the entire codebase. Once full support
>  // for SIMD is reached on all tier-1 platforms, this constant can be deleted.
>  static const bool SupportsSIMD = false;
> +static const bool SimdStackAlignment = 8; // TODO?

If you plan to leave this TODO here for a while, it'd be good to have a little blurb explaining what needs to be done. e.g. // TODO: Does ARM want its stack 16-byte-aligned for SIMD?
Attachment #8439349 - Flags: feedback?(sunfish) → feedback-
(In reply to Dan Gohman [:sunfish] from comment #19)
> Comment on attachment 8439343 [details] [diff] [review]
> 1 - SIMD x86-x64: LIR definition
> 
> Review of attachment 8439343 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/LIR.h
> @@ +434,5 @@
> >          OBJECT,     // Pointer that may be collected as garbage (GPR).
> >          SLOTS,      // Slots/elements pointer that may be moved by minor GCs (GPR).
> >          FLOAT32,    // 32-bit floating-point value (FPU).
> >          DOUBLE,     // 64-bit floating-point value (FPU).
> > +        INT32X4,    // SIMD data containing four 32-bit integers (FPU).
> 
> Nit: It'd be nice to keep all the SIMD types next to each other.
Huh, I am confused: as of this patch, there is only INT32X4?

> This looks like it should use isSIMDType, or whatever it's called, rather
> than just checking for INT32X4. Or if we don't yet have such a function, it
> would be nice to create it :-).
Done (this was in a later patch).
(In reply to Dan Gohman [:sunfish] from comment #20)
> Comment on attachment 8439344 [details] [diff] [review]
> 3 -  SIMD x86-x64: masm / asm instructions for int32x4 moves
> 
> Review of attachment 8439344 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/assembler/assembler/X86Assembler.h
> @@ +2846,5 @@
> > +    {
> > +        spew("movdqa     %s, %s",
> > +             nameFPReg(src), nameFPReg(dst));
> > +        m_formatter.prefix(PRE_SSE_F2);
> > +        m_formatter.twoByteOp(OP2_MOVDQ_VsdWsd, (RegisterID)dst, (RegisterID)src);
> 
> Should this be OP2_MOVQ_VdqWdq?

Good question. I'd like to keep MOVDQ rather than MOVQ (to avoid confusion with (single) quad-words operations). As it operates on int32x4, VdqWdq is the right name. So, OP2_MOVDQ_VdqWdq? I've also changed existing uses of the reciprocal operation (WsdVsd) to follow that convention too.
We do actually need these in some places in Odin where we don't have control on the caller's stack pointer. (and if we don't actually need these, i'd prefer to have that done in follow up bugs)
Attachment #8461660 - Flags: review?(sunfish)
Comment on attachment 8461660 [details] [diff] [review]
Unaligned moves for Int32x4

Review of attachment 8461660 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.
Attachment #8461660 - Flags: review?(sunfish) → review+
Attachment #8461663 - Flags: review?(sunfish) → review+
An enhanced version of sp alignment in Codegen shared: the attached MIRGraph will be scanned and if there's at least one SIMD instruction, alignment will be forced.  To avoid scanning several times, the result value is cached.
Attachment #8439349 - Attachment is obsolete: true
Attachment #8463356 - Flags: review?(sunfish)
Comment on attachment 8463356 [details] [diff] [review]
5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present

Review of attachment 8463356 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIRGraph.cpp
@@ +48,5 @@
> +                                  end   = graph_->rpoEnd();
> +         block != end;
> +         block++)
> +    {
> +        for (MInstructionIterator inst = block->begin(); inst != block->end(); inst++) {

It's ok to use MInstructionIterator here because you don't need to worry about phis, since any reachable phi (or phi cycle) will have at least one instruction as an input. Please add a comment to this effect.

@@ +49,5 @@
> +         block != end;
> +         block++)
> +    {
> +        for (MInstructionIterator inst = block->begin(); inst != block->end(); inst++) {
> +            if (IsSimdType(inst->type())) {

It's ok to ignore instructions which use SIMD but don't have a SIMD result type (e.g. stores) because they'll have at least one SIMD operand so there will be an instruction somewhere with a SIMD result type. At least, this is true now and for the forseeable future. Please comment this too.

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ +74,5 @@
>              if (unsigned rem = alignmentAtCall % StackAlignment)
>                  frameDepth_ += StackAlignment - rem;
> +
> +            if (gen->usesSimd()) {
> +                // At this point, we have frameDepth_ == -AlignmentAtAsmJSPrologue % StackAlignment,

I don't understand this comment. frameDepth_ is some function of how many slots the function used, and other things, so it isn't always going to be less than StackAlignment here. I think you want something with % StackAlignment == 0.

@@ +84,5 @@
> +                frameInitialAdjustment_ = frameDepth_ % SimdStackAlignment;
> +                // We need at least the frame depth that was available before all
> +                // adjustments.
> +                if (frameDepth_ - frameInitialAdjustment_ < initialFrameDepth)
> +                    frameDepth_ += StackAlignment;

I don't quite understand this code. Is this assuming that at most one StackAlignment is needed to reach Simd alignment? It happens that SimdStackAlignment is at most 16 and StackAlignment is 8, so this is true, but it's not obvious.
(In reply to Dan Gohman [:sunfish] from comment #29)
> Comment on attachment 8463356 [details] [diff] [review]
> 5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD
> instructions are present
> 
> Review of attachment 8463356 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/MIRGraph.cpp
> @@ +48,5 @@
> > +                                  end   = graph_->rpoEnd();
> > +         block != end;
> > +         block++)
> > +    {
> > +        for (MInstructionIterator inst = block->begin(); inst != block->end(); inst++) {
> 
> It's ok to use MInstructionIterator here because you don't need to worry
> about phis, since any reachable phi (or phi cycle) will have at least one
> instruction as an input. Please add a comment to this effect.
> 
> @@ +49,5 @@
> > +         block != end;
> > +         block++)
> > +    {
> > +        for (MInstructionIterator inst = block->begin(); inst != block->end(); inst++) {
> > +            if (IsSimdType(inst->type())) {
> 
> It's ok to ignore instructions which use SIMD but don't have a SIMD result
> type (e.g. stores) because they'll have at least one SIMD operand so there
> will be an instruction somewhere with a SIMD result type. At least, this is
> true now and for the forseeable future. Please comment this too.

Good points. I've added comments.

> 
> ::: js/src/jit/shared/CodeGenerator-shared.cpp
> @@ +74,5 @@
> >              if (unsigned rem = alignmentAtCall % StackAlignment)
> >                  frameDepth_ += StackAlignment - rem;
> > +
> > +            if (gen->usesSimd()) {
> > +                // At this point, we have frameDepth_ == -AlignmentAtAsmJSPrologue % StackAlignment,
> 
> I don't understand this comment. frameDepth_ is some function of how many
> slots the function used, and other things, so it isn't always going to be
> less than StackAlignment here. I think you want something with %
> StackAlignment == 0.

Hum, maybe just a notation nit: A == -B % C should be read "C divides A+B" here (like in math notation). I understand that there is some confusion with C syntax, so I've made it more explicit by just having (A+B) % C == 0.

> 
> @@ +84,5 @@
> > +                frameInitialAdjustment_ = frameDepth_ % SimdStackAlignment;
> > +                // We need at least the frame depth that was available before all
> > +                // adjustments.
> > +                if (frameDepth_ - frameInitialAdjustment_ < initialFrameDepth)
> > +                    frameDepth_ += StackAlignment;
> 
> I don't quite understand this code. Is this assuming that at most one
> StackAlignment is needed to reach Simd alignment? It happens that
> SimdStackAlignment is at most 16 and StackAlignment is 8, so this is true,
> but it's not obvious.

Wow, good catch. I think it's just a typo and I had in mind SimdStackAlignment. How about instead using SimdStackAlignment, and static assert above that SimdStackAlignment % StackAlignment == 0? (true on all platforms: x86,x64: 16 % (16 or 4), arm: 8 % 8)

(if it were not for cases where StackAlignment == 4, we could get rid of SimdStackAlignment, as it's usually equal to StackAlignment)
Flags: needinfo?(sunfish)
Attachment #8463356 - Flags: review?(sunfish)
(In reply to Benjamin Bouvier [:bbouvier] from comment #30)
> > ::: js/src/jit/shared/CodeGenerator-shared.cpp
> > @@ +74,5 @@
> > >              if (unsigned rem = alignmentAtCall % StackAlignment)
> > >                  frameDepth_ += StackAlignment - rem;
> > > +
> > > +            if (gen->usesSimd()) {
> > > +                // At this point, we have frameDepth_ == -AlignmentAtAsmJSPrologue % StackAlignment,
> > 
> > I don't understand this comment. frameDepth_ is some function of how many
> > slots the function used, and other things, so it isn't always going to be
> > less than StackAlignment here. I think you want something with %
> > StackAlignment == 0.
> 
> Hum, maybe just a notation nit: A == -B % C should be read "C divides A+B"
> here (like in math notation). I understand that there is some confusion with
> C syntax, so I've made it more explicit by just having (A+B) % C == 0.

Oh, I get it now. I do think that C syntax is best in this context; thanks for changing this.

> > 
> > @@ +84,5 @@
> > > +                frameInitialAdjustment_ = frameDepth_ % SimdStackAlignment;
> > > +                // We need at least the frame depth that was available before all
> > > +                // adjustments.
> > > +                if (frameDepth_ - frameInitialAdjustment_ < initialFrameDepth)
> > > +                    frameDepth_ += StackAlignment;
> > 
> > I don't quite understand this code. Is this assuming that at most one
> > StackAlignment is needed to reach Simd alignment? It happens that
> > SimdStackAlignment is at most 16 and StackAlignment is 8, so this is true,
> > but it's not obvious.
> 
> Wow, good catch. I think it's just a typo and I had in mind
> SimdStackAlignment. How about instead using SimdStackAlignment, and static
> assert above that SimdStackAlignment % StackAlignment == 0? (true on all
> platforms: x86,x64: 16 % (16 or 4), arm: 8 % 8)

Yeah, that's a reasonable static_assert.

I'm still a little confused though. The code for aligning to plain StackAlignment does this:

 if (unsigned rem = alignmentAtCall % StackAlignment)
     frameDepth_ += StackAlignment - rem;

Why doesn't the code for aligning to SimdStackAlignment have to do something similar?

> (if it were not for cases where StackAlignment == 4, we could get rid of
> SimdStackAlignment, as it's usually equal to StackAlignment)

Yeah, this is unfortunately still relevant.
Flags: needinfo?(sunfish)
(In reply to Dan Gohman [:sunfish] from comment #31)
> I'm still a little confused though. The code for aligning to plain
> StackAlignment does this:
> 
>  if (unsigned rem = alignmentAtCall % StackAlignment)
>      frameDepth_ += StackAlignment - rem;
> 
> Why doesn't the code for aligning to SimdStackAlignment have to do something
> similar?

I'll try to explain it to make it clearer for everyone (including me :))

The constraints we have are the following:
- (frameDepth_ + AsmJSFrameSize) % StackAlignment == 0 (1 -- asmjs call alignment)
- (frameDepth_ - frameInitialAdjustment_) % SimdStackAlignment == 0 (2 -- SIMD operand alignment)
- (frameDepth_ - frameInitialAdjustment_) >= initialFrameDepth (3 -- the amount of stack depth must be sufficient)

(3) tells us that frameInitialAdjustment_ := frameDepth_ % SimdStackAlignment is sufficient.

Let's take a concrete example. On x86, with StackAlignment := 16 (assume GCC), if frameDepth_ := 16k + 8, then there's nothing to add to frameDepth_ for the asmjs call restriction (as AsmJSFrameSize is 2 * sizeof(void*) == 8), but frameInitialAdjustment == 8, so (frameDepth_ - 8) == 16k < 16k + 8, so (2) is not satisfied anymore.

More generally, under these conditions (x86 and gcc), if frameDepth_ := 16k + r, frameInitialAdjustment_ ends up being 8 all the time and frameDepth_ - frameInitialAdjustment_ < initial frameDepth if r == 4 or r == 8.

We actually need to add SimdStackAlignment if the first frame depth fix up is less than the initial adjustment, I think.
Different wording, same principle. As the comment makes this function way fatter than it should be, I moved it to a setupSimdAlignment function, which makes the constructor slimmer.
Attachment #8463356 - Attachment is obsolete: true
Attachment #8465589 - Flags: review?(sunfish)
This is the last thing needed for Odin support, at least: every time we use PushRegsInMask (or PopRegsInMask), we need to know whether we want to push 64 or 128 bits of a float register on the stack.

The proper long term fix is to have the TypedRegisterSet<FloatRegister> contain a type information about the float register. More precisely, this type information could initially fit in a bit, indicating whether the type is 64 bits or 128 bits. Ideally, we'd want to indicate whether the type is float32, float64, or which SIMD type it is (to use the right SIMD move).
Attachment #8467595 - Flags: review?(sunfish)
Comment on attachment 8467595 [details] [diff] [review]
7 - Add an optional simdSet parameter to push and pop regs in mask

Review of attachment 8467595 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Benjamin Bouvier [:bbouvier] from comment #34)
> The proper long term fix is to have the TypedRegisterSet<FloatRegister>
> contain a type information about the float register. More precisely, this
> type information could initially fit in a bit, indicating whether the type
> is 64 bits or 128 bits. Ideally, we'd want to indicate whether the type is
> float32, float64, or which SIMD type it is (to use the right SIMD move).

Yeah, this sounds nice. Your current patch looks fine for now though.
Attachment #8467595 - Flags: review?(sunfish) → review+
Comment on attachment 8465589 [details] [diff] [review]
5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present

Review of attachment 8465589 [details] [diff] [review]:
-----------------------------------------------------------------

Cool, this looks good.
Attachment #8465589 - Flags: review?(sunfish) → review+
Attachment #8439343 - Attachment is obsolete: true
Attachment #8468483 - Flags: review?(sunfish)
Attachment #8468484 - Attachment description: bug1019831-2-stackslot.patch → 2 - SIMD x86-x64: Allow stack slot allocator to emit quad slots
Coming from bug 1023404, carrying forward r+ from sunfish.
Attachment #8468489 - Flags: review+
Comment on attachment 8468483 [details] [diff] [review]
1 - SIMD x86-x64: LIR definition

Review of attachment 8468483 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.

::: js/src/jit/LIR.h
@@ +446,5 @@
>  
>      void set(uint32_t index, Type type, Policy policy) {
>          JS_STATIC_ASSERT(MAX_VIRTUAL_REGISTERS <= VREG_MASK);
>          bits_ = (index << VREG_SHIFT) | (policy << POLICY_SHIFT) | (type << TYPE_SHIFT);
> +        JS_ASSERT_IF(isSimdType(), SupportsSimd);

This would be slightly nicer as JS_ASSERT_IF(!SupportsSimd, !isSimdType()) since SupportsSimd is the less varying of the two expressions.
Attachment #8468483 - Flags: review?(sunfish) → review+
Attachment #8468484 - Flags: review?(sunfish) → review+
carrying forward r+ from sunfish
Attachment #8465589 - Attachment is obsolete: true
Attachment #8468496 - Flags: review+
Attachment #8439350 - Attachment is obsolete: true
Attachment #8468498 - Flags: review?(sunfish)
carrying forward r+ from sunfish
Attachment #8467595 - Attachment is obsolete: true
Attachment #8468503 - Flags: review+
Attachment #8468485 - Flags: review?(sunfish) → review+
Attachment #8468487 - Flags: review?(sunfish) → review+
Comment on attachment 8468492 [details] [diff] [review]
6 - SIMD x86-x64: MoveResolver and MoveEmitter for Int32x4

Review of attachment 8468492 [details] [diff] [review]:
-----------------------------------------------------------------

MoveEmitterX86::maybeEmitOptimizedCycle currently emits an xorpd sequence if all moves are "isFloatReg()" and other conditions hold. Ideally characterizeCycle should be extended to look at the move types too, so that maybeEmitOptimizedCycle can emit pxor or xorps as appropriate. If you don't want to do that in this patch, we can do it later; xorpd is functionally correct here.

::: js/src/jit/shared/MoveEmitter-x86-shared.cpp
@@ +282,5 @@
>      // saved value of B, to A.
>      switch (type) {
> +      case MoveOp::INT32X4:
> +        JS_ASSERT(pushedAtCycle_ != -1);
> +        JS_ASSERT(pushedAtCycle_ - pushedAtStart_ >= 4 * sizeof(int32_t));

Instead of computing 4 * sizeof(int32_t), could this use Simd128Size (or whatever it's called)?
Attachment #8468492 - Flags: review?(sunfish) → review+
Attachment #8468498 - Flags: review?(sunfish) → review+
Depends on: 1051591
Depends on: 1052380
Attached patch ARM stubsSplinter Review
For completeness and consistency, these should be in this bug too. These are just stubs for the load{Una,A}ligned{Int,Float}32x4 methods.
Attachment #8472204 - Flags: review?(sunfish)
Attachment #8472204 - Flags: review?(sunfish) → review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: