Closed Bug 1112164 Opened 5 years ago Closed 5 years ago

Spill SIMD registers based on Safepoints.

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: nbp, Assigned: nbp)

References

Details

Attachments

(20 files, 2 obsolete files)

1.28 KB, patch
bhackett
: review+
Details | Diff | Splinter Review
2.91 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
11.25 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
16.51 KB, patch
jandem
: review+
Details | Diff | Splinter Review
7.91 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
100.36 KB, patch
jandem
: review+
Details | Diff | Splinter Review
8.98 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
17.76 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
832 bytes, patch
bbouvier
: review+
Details | Diff | Splinter Review
4.39 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
23.15 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
16.45 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
3.54 KB, patch
jandem
: review+
Details | Diff | Splinter Review
2.58 KB, patch
jandem
: review+
Details | Diff | Splinter Review
35.16 KB, patch
bbouvier
: review+
jandem
: review+
Details | Diff | Splinter Review
4.04 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
8.49 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
2.51 KB, patch
mjrosenb
: review+
Details | Diff | Splinter Review
6.85 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
3.15 KB, patch
bhackett
: review+
Details | Diff | Splinter Review
Safepoints can also be used in case where we spill all registers to the stack
for out-of-line calls / Ion inline caches.  These registers are later used for
restoring the content of the registers as expected, or to feed the
MachineState in case of a throw inside the function call.  These spilled
register should also be aligned when used for SIMD register in order to avoid
unaligned writes.
Blocks: 1119303
Assignee: nobody → nicolas.b.pierron
Status: NEW → ASSIGNED
Attached patch x64 WIP (obsolete) — Splinter Review
This patch is a workj-in-progress in order to be able to have a type associated with the registers, and to get the register allocator use typed registers instead of Float Registers.

We need to have typed registers in the register allocator as we want to (1) spill the correct size in pushRegsInMask, and (2) do it efficiently by using the well typed assembly instructions.  The register allocator is in charge of indexing live registers in the Safepoints, which means that the typed should at least be carried by the AnyRegister which are registered in the Safepoints.  To get them carried properly, we need to have them as part of the LDefinition, and in a way which is consistent with the fixed uses.

This patch still has ~14 failures, mostly on asmjs tests, I am still investigating & fixing.
This patch is quite useful for the fixed register inputs of SIMD AsmJSReturn
and others.
Attachment #8564069 - Flags: review?(bhackett1024)
Attachment #8564069 - Flags: review?(bhackett1024) → review+
Blocks: 1133745
Attachment #8565645 - Flags: review?(jdemooij)
Attachment #8565645 - Flags: review?(benj)
Attachment #8565645 - Flags: feedback?(mrosenberg)
Now that the feature part of it is complete and that this set of patches works (still fixing mirror compilation issues on windows), I will check for regressions.

In the mean time these modifications should be usable on top of the latest stack alignment changesets.
Attached patch part 0..16 - Folded Patch (obsolete) — Splinter Review
Attachment #8563610 - Attachment is obsolete: true
Attachment #8565627 - Flags: review?(benj) → review+
(In reply to Nicolas B. Pierron [:nbp] from comment #19)
> I will check for regressions.

I did not notice any slow down on sunspider (reg-alloc time) nor octane (reg-alloc quality).
Comment on attachment 8565636 [details] [diff] [review]
part 8 - Use architecture specific SetSize.

Review of attachment 8565636 [details] [diff] [review]:
-----------------------------------------------------------------

Yep.

::: js/src/jit/RegisterSets.h
@@ +538,5 @@
>      SetType bits() const {
>          return bits_;
>      }
>      uint32_t size() const {
> +        return T::SetSize(bits());

nit: bits_?
Attachment #8565636 - Flags: review?(benj) → review+
Comment on attachment 8565637 [details] [diff] [review]
part 9 - PushRegsInMask no longer assumes that any FloatRegister is 8 bytes.

Review of attachment 8565637 [details] [diff] [review]:
-----------------------------------------------------------------

Unless I am missing something, this doesn't do what it thinks it does...

::: js/src/jit/shared/MacroAssembler-x86-shared.cpp
@@ +39,5 @@
> +        if (reg.isDouble())
> +            storeDouble(reg, spillAddress);
> +        else if (reg.isSingle())
> +            storeFloat32(reg, spillAddress);
> +        else if (reg.isInt32x4())

If i understand correctly, this case and the one below (reg.isFloat32x4()) are never hit, because doubleSet is set.fpus() - simdSet.
I think the purpose of this patch is to remove the second loop afterwards (the one that iterates over simdSet).  If you do so now, the assertion MOZ_ASSERT(diffF == 0) will fail!
Attachment #8565637 - Flags: review?(benj)
(In reply to Benjamin Bouvier [:bbouvier] from comment #23)
> Comment on attachment 8565637 [details] [diff] [review]
> part 9 - PushRegsInMask no longer assumes that any FloatRegister is 8 bytes.
> 
> Review of attachment 8565637 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Unless I am missing something, this doesn't do what it thinks it does...
> 
> ::: js/src/jit/shared/MacroAssembler-x86-shared.cpp
> @@ +39,5 @@
> > +        if (reg.isDouble())
> > +            storeDouble(reg, spillAddress);
> > +        else if (reg.isSingle())
> > +            storeFloat32(reg, spillAddress);
> > +        else if (reg.isInt32x4())
> 
> If i understand correctly, this case and the one below (reg.isFloat32x4())
> are never hit, because doubleSet is set.fpus() - simdSet.
> I think the purpose of this patch is to remove the second loop afterwards
> (the one that iterates over simdSet).  If you do so now, the assertion
> MOZ_ASSERT(diffF == 0) will fail!

Indeed, they are not used before the part 14.  This assertion does not change in part 14, because I update FloatRegister::GetPushSizeInBytes(…), and FloatRegister::size().

The purpose of this patch is not to remove the second loop yet, but only to make sure that the API used by Safepoints is working without having to split the register set in 2.

The only goal of this patch is to make something easy to review, without bloating part 14.
Comment on attachment 8565639 [details] [diff] [review]
part 10 - Clean-up: Use SetType for register mask declarations.

Review of attachment 8565639 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, it's more consistent
Attachment #8565639 - Flags: review?(benj) → review+
Comment on attachment 8565637 [details] [diff] [review]
part 9 - PushRegsInMask no longer assumes that any FloatRegister is 8 bytes.

Review of attachment 8565637 [details] [diff] [review]:
-----------------------------------------------------------------

Alright, in this case r=me.  I think the commit message could be slightly improved, to reflect better than we only push the right amount of data onto the stack, but your call.

::: js/src/jit/shared/MacroAssembler-x86-shared.cpp
@@ +51,5 @@
>      for (FloatRegisterBackwardIterator iter(simdSet); iter.more(); iter++) {
>          diffF -= Simd128DataSize;
>          numSimd -= 1;
>          // XXX how to choose the right move type?
>          storeUnalignedInt32x4(*iter, Address(StackPointer, diffF));

Can you also fix this XXX please, while you're here?

@@ +73,5 @@
>      for (FloatRegisterBackwardIterator iter(simdSet); iter.more(); iter++) {
>          diffF -= Simd128DataSize;
>          numSimd -= 1;
>          if (!ignore.has(*iter))
>              // XXX how to choose the right move type?

And here too
Attachment #8565637 - Flags: review+
Comment on attachment 8565625 [details] [diff] [review]
part 1 - x86/x64 lowerForFPU: Do not reuse the input register if the MIRType are different.

Review of attachment 8565625 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good for now. Bug 1132894 will take it from here :-).

::: js/src/jit/shared/Assembler-x86-shared.h
@@ +1679,5 @@
>            default:
>              MOZ_CRASH("unexpected operand kind");
>          }
>      }
> +    void vcmpps(uint8_t order, Operand src1, FloatRegister src0, FloatRegister dest) {

What prompted the change to pass Operand by value? It's a struct with multiple words of data in it, so it's not a trivial win.

@@ +1684,4 @@
>          MOZ_ASSERT(HasSSE2());
> +        if (!HasAVX() && !src0.aliases(dest)) {
> +            if (src1.kind() == Operand::FPREG &&
> +                dest.aliases(FloatRegister::FromCode(src1.fpu())))

Since this is x86/x64, we don't have to use aliases; we can just say src0 != dest and dest == src1 here and it'll be more readable :-).

@@ +1690,5 @@
> +                src1 = Operand(ScratchSimdReg);
> +            }
> +            vmovdqa(src0, dest);
> +            src0 = dest;
> +        }

Please add a comment here referring the reader to the comment in lowerForFPU.
Attachment #8565625 - Flags: review?(sunfish) → review+
Comment on attachment 8565646 [details] [diff] [review]
part 15 - x86/x64: Add register type in the register allocator spew.

Review of attachment 8565646 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/Assembler-x86-shared.cpp
@@ +295,5 @@
>  }
> +
> +const char *
> +FloatRegister::name() const {
> +    static const char *const names[] = {

Maybe add a comment telling that ordering of types needs to be the same as in FloatRegisters::ContentType?

@@ +319,5 @@
> +        "%xmm4.s4", "%xmm5.s4", "%xmm6.s4", "%xmm7.s4"
> +#ifdef JS_CODEGEN_X64
> +      , "%xmm8.s4", "%xmm9.s4", "%xmm10.s4", "%xmm11.s4",
> +        "%xmm12.s4", "%xmm13.s4", "%xmm14.s4", "%xmm15.s4"
> +#endif

using macros here would make sense :)
Attachment #8565646 - Flags: review?(benj) → review+
Comment on attachment 8565647 [details] [diff] [review]
part 16 - Clean-up: Remove FloatRegister::GetSizeInBytes.

Review of attachment 8565647 [details] [diff] [review]:
-----------------------------------------------------------------

Fun.
Attachment #8565647 - Flags: review?(benj) → review+
Comment on attachment 8565645 [details] [diff] [review]
part 14 - Add types to x86/x64 float registers.

Review of attachment 8565645 [details] [diff] [review]:
-----------------------------------------------------------------

Okay, most of my comments are nits and apply also to -x86 files, so I'll leave an r+, as it's pretty high priority to get this in.

::: js/src/jit/Registers.h
@@ +122,1 @@
>      void setRegisterLocation(FloatRegister reg, double *dp) {

Can you MOZ_ASSERT(reg.isDouble()) here? and if you feel like so, also assert in the GPR case

::: js/src/jit/x64/Architecture-x64.h
@@ +169,5 @@
>      static const uint32_t Allocatable = 15;
>  
> +    typedef uint64_t SetType;
> +    static_assert(sizeof(SetType) * 8 >= Total,
> +        "SetType should be large enough to enumerate all registers.");

nit: can you align the string with the first parenthesis, please?

@@ +180,5 @@
> +    static const SetType spread_int32x4_ = SetType(1) << (uint32_t(Int32x4) * TotalPhys);
> +    static const SetType spread_float32x4_ = SetType(1) << (uint32_t(Float32x4) * TotalPhys);
> +    static const SetType spread_scalar_ = spread_single_ + spread_double_;
> +    static const SetType spread_vector_ = spread_int32x4_ + spread_float32x4_;
> +    static const SetType spread_ = spread_scalar_ + spread_vector_;

nit: as you're manipulating bits here, i'd prefer using | rather than + (on these 3 lines)

@@ +194,5 @@
> +          (1 << X86Encoding::xmm3) |
> +          (1 << X86Encoding::xmm4) |
> +          (1 << X86Encoding::xmm5)
> +        ) * spread_scalar_
> +        + AllPhysMask * spread_vector_;

ditto

@@ +218,5 @@
>      typedef size_t Code;
>      typedef Codes::Encoding Encoding;
>      typedef Codes::SetType SetType;
>      static uint32_t SetSize(SetType x) {
> +        static_assert(sizeof(SetType) == 8, "SetType must be 32 bits");

nit: "must be 64 bits"

@@ +222,5 @@
> +        static_assert(sizeof(SetType) == 8, "SetType must be 32 bits");
> +        // Count the number of non-aliased registers, for the moment.
> +        x |= x >> (2 * Codes::TotalPhys);
> +        x |= x >> Codes::TotalPhys;
> +        x &= Codes::AllPhysMask;

This code isn't obvious. Can you please comment it? "Copy bits corresponding to all taken registers, into the low 16 bits section", or something like that?

@@ +227,1 @@
>          return mozilla::CountPopulation32(x);

Please comment that it's safe to use CountPopulation32 here, because &= AllPhysMask will have reduced x to 16 bits.

@@ +237,5 @@
> +    // Note: These fields are using one extra bit to make the invalid enumerated
> +    // values fit, and thus prevent a warning.
> +    Codes::Encoding reg_ : 5;
> +    Codes::ContentType type_ : 3;
> +    bool isInvalid_ : 1;

won't padding make this struct take 16 bits in memory here?

@@ +241,5 @@
> +    bool isInvalid_ : 1;
> +
> +    // Constants used for exporting/importing the float register code.
> +    static const size_t regSize = 4;
> +    static const size_t regMask = (1 << regSize) - 1;

NIT: CAN_HAS_CAPITAL_LETTERZ_FOR_CONSTANTZ?

@@ +257,2 @@
>  
>      static FloatRegister FromCode(uint32_t i) {

FromCode doesn't sound entirely true, as we're extracting both code and content type from the parameter...

@@ +262,5 @@
>  
> +    bool isSingle() const { return type_ == Codes::Single; }
> +    bool isDouble() const { return type_ == Codes::Double; }
> +    bool isInt32x4() const { return type_ == Codes::Int32x4; }
> +    bool isFloat32x4() const { return type_ == Codes::Float32x4; }

please everywhere MOZ_ASSERT(!isInvalid_);

@@ +268,5 @@
>  
> +    FloatRegister asSingle() const { return FloatRegister(reg_, Codes::Single); }
> +    FloatRegister asDouble() const { return FloatRegister(reg_, Codes::Double); }
> +    FloatRegister asInt32x4() const { return FloatRegister(reg_, Codes::Int32x4); }
> +    FloatRegister asFloat32x4() const { return FloatRegister(reg_, Codes::Float32x4); }

here as well

@@ +283,5 @@
>      Code code() const {
> +        MOZ_ASSERT(uint32_t(reg_) < Codes::TotalPhys);
> +        // :TODO: ARM is doing the same thing, but we should avoid this, except
> +        // that the RegisterSets depends on this.
> +        return Code(reg_ | (type_ << 4));

nit: s/4/regSize

@@ +292,3 @@
>      }
>      const char *name() const {
> +        // :TODO: Add type

(note to self: this is done in a later patch, same bug)

@@ +308,2 @@
>      }
> +    // It is to ensure that two floating point registers' types are equivalent.

I don't get this comment?

::: js/src/jit/x64/Assembler-x64.cpp
@@ +334,5 @@
> +    SetType set128b = int32x4Set | float32x4Set;
> +    SetType set64b = doubleSet & ~set128b;
> +    SetType set32b = singleSet & ~set64b  & ~set128b;
> +
> +    return mozilla::CountPopulation32(set128b) * (4 * sizeof(int32_t))

Please add a comment /MOZ_ASSERT that all sets are less than 32 bits (actually, they're less than 16 bits, so you could even use CountPopulation16 here, would that exist)

::: js/src/jit/x64/Assembler-x64.h
@@ +82,1 @@
>  static MOZ_CONSTEXPR_VAR FloatRegister ScratchSimdReg = xmm15;

What about the ScratchSimdReg?

::: js/src/jit/x86/Architecture-x86.h
@@ +158,5 @@
>      static const uint32_t Allocatable = 7;
>  
> +    typedef uint32_t SetType;
> +    static_assert(sizeof(SetType) * 8 >= Total,
> +        "SetType should be large enough to enumerate all registers.");

(most of x64 comments apply here as well)

::: js/src/jit/x86/Assembler-x86.cpp
@@ +135,5 @@
> +    SetType set32b = singleSet & ~set64b  & ~set128b;
> +
> +    return mozilla::CountPopulation32(set128b) * (4 * sizeof(int32_t))
> +        + mozilla::CountPopulation32(set64b) * sizeof(double)
> +        + mozilla::CountPopulation32(set32b) * sizeof(float);

(see comments in Assembler-x64.cpp)
Attachment #8565645 - Flags: review?(benj) → review+
Also, I had some more question: have you tried running the folded patch in an AWFY instance, to measure how it affects regalloc time in particular? That would be nice to do to avoid regressions.

Also, requesting fuzzing on the folded patch might be an excellent idea ;)
Comment on attachment 8565629 [details] [diff] [review]
part 4 - SimdReinterpretCast: Do not use redefine as the MIRType of the input & output are different.

Review of attachment 8565629 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Go bug 1132894 :-).

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +2163,5 @@
> +{
> +    FloatRegister input = ToFloatRegister(ins->input());
> +    FloatRegister output = ToFloatRegister(ins->output());
> +
> +    if (input.aliases(output))

In x86/x64 code, use input == output, especially since this is testing whether we can skip the copy altogether.
Attachment #8565629 - Flags: review?(sunfish) → review+
(In reply to Dan Gohman [:sunfish] from comment #27)
> Comment on attachment 8565625 [details] [diff] [review]
> part 1 - x86/x64 lowerForFPU: Do not reuse the input register if the MIRType
> are different.
> 
> Review of attachment 8565625 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good for now. Bug 1132894 will take it from here :-).
> 
> ::: js/src/jit/shared/Assembler-x86-shared.h
> @@ +1679,5 @@
> >            default:
> >              MOZ_CRASH("unexpected operand kind");
> >          }
> >      }
> > +    void vcmpps(uint8_t order, Operand src1, FloatRegister src0, FloatRegister dest) {
> 
> What prompted the change to pass Operand by value? It's a struct with
> multiple words of data in it, so it's not a trivial win.

This was simply a cosmetic change to avoid making a copy, as I am modifying the operand if we have to move registers around.

> @@ +1684,4 @@
> >          MOZ_ASSERT(HasSSE2());
> > +        if (!HasAVX() && !src0.aliases(dest)) {
> > +            if (src1.kind() == Operand::FPREG &&
> > +                dest.aliases(FloatRegister::FromCode(src1.fpu())))
> 
> Since this is x86/x64, we don't have to use aliases; we can just say src0 !=
> dest and dest == src1 here and it'll be more readable :-).

This is no longer true, after the part 14, of this bug.  The equality will not work after part 14, except if we compare the XMMRegisterID, which is what the aliases function do.
(In reply to Benjamin Bouvier [:bbouvier] from comment #30)
> Comment on attachment 8565645 [details] [diff] [review]
> part 14 - Add types to x86/x64 float registers.
> 
> Review of attachment 8565645 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/Registers.h
> @@ +122,1 @@
> >      void setRegisterLocation(FloatRegister reg, double *dp) {
> 
> Can you MOZ_ASSERT(reg.isDouble()) here? and if you feel like so, also
> assert in the GPR case

I will see if I can do that as part of a follow-up patch.

> @@ +227,1 @@
> >          return mozilla::CountPopulation32(x);
> 
> Please comment that it's safe to use CountPopulation32 here, because &=
> AllPhysMask will have reduced x to 16 bits.
> 
> @@ +237,5 @@
> > +    // Note: These fields are using one extra bit to make the invalid enumerated
> > +    // values fit, and thus prevent a warning.
> > +    Codes::Encoding reg_ : 5;
> > +    Codes::ContentType type_ : 3;
> > +    bool isInvalid_ : 1;
> 
> won't padding make this struct take 16 bits in memory here?

This does not matter as the structure is not packaged anyway, which probably means that it take at least a word (size_t).

> @@ +257,2 @@
> >  
> >      static FloatRegister FromCode(uint32_t i) {
> 
> FromCode doesn't sound entirely true, as we're extracting both code and
> content type from the parameter...

I agree, we should either rename it, or change the Code function to not return a "Code".

> ::: js/src/jit/x64/Assembler-x64.cpp
> @@ +334,5 @@
> > +    SetType set128b = int32x4Set | float32x4Set;
> > +    SetType set64b = doubleSet & ~set128b;
> > +    SetType set32b = singleSet & ~set64b  & ~set128b;
> > +
> > +    return mozilla::CountPopulation32(set128b) * (4 * sizeof(int32_t))
> 
> Please add a comment /MOZ_ASSERT that all sets are less than 32 bits
> (actually, they're less than 16 bits, so you could even use
> CountPopulation16 here, would that exist)

CountPopulation16 does not exists yet.

> ::: js/src/jit/x64/Assembler-x64.h
> @@ +82,1 @@
> >  static MOZ_CONSTEXPR_VAR FloatRegister ScratchSimdReg = xmm15;
> 
> What about the ScratchSimdReg?

For the moment this does not matter because the register type is not checked outside the MoveEmitter.  If we were to add register type assertions into the MacroAssembler / Assembler, then we would have to update this ScratchRegister, or use only one scratch register and cast it at will.
Attachment #8565628 - Flags: review?(jdemooij) → review+
Comment on attachment 8565643 [details] [diff] [review]
part 12 - Use RegisterDump size for bailout spills.

Review of attachment 8565643 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86/Trampoline-x86.cpp
@@ +531,5 @@
>  
>      // Common size of stuff we've pushed.
> +    static const uint32_t BailoutDataSize = 0
> +        + sizeof(void *) // frameClass
> +        + sizeof(RegisterDump);

I was going to say: this is confusing because BailoutStack doesn't use RegisterDump, but I see the folded patch changes that :)
Attachment #8565643 - Flags: review?(jdemooij) → review+
Comment on attachment 8565644 [details] [diff] [review]
part 13 - HandleRegisterDump operations should support all register type.

Review of attachment 8565644 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense.

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ +1085,1 @@
>  #endif

We should complain when somebody adds a new type and doesn't handle it here:

    else
        MOZ_CRASH(...);
Attachment #8565644 - Flags: review?(jdemooij) → review+
Comment on attachment 8565645 [details] [diff] [review]
part 14 - Add types to x86/x64 float registers.

Review of attachment 8565645 [details] [diff] [review]:
-----------------------------------------------------------------

Just some nits in addition to Benjamin's comments.

::: js/src/jit/x64/Architecture-x64.h
@@ +138,5 @@
> +        Single,
> +        Double,
> +        Int32x4,
> +        Float32x4,
> +        NUM_TYPES

Nit: NumTypes?

@@ +174,5 @@
> +
> +    // Magic values which are used to duplicate a mask of physical register for
> +    // a specific type of register. A multiplication is used to copy and shift
> +    // the bits of the physical register mask.
> +    static const SetType spread_single_ = SetType(1) << (uint32_t(Single) * TotalPhys);

Style nit: SpreadSingle, also rename the others below.

::: js/src/jit/x64/Assembler-x64.cpp
@@ +327,5 @@
> +        (all >> (uint32_t(Codes::Double) * Codes::TotalPhys)) & Codes::AllPhysMask;
> +    SetType singleSet =
> +        (all >> (uint32_t(Codes::Single) * Codes::TotalPhys)) & Codes::AllPhysMask;
> +
> +    // PushRegsInMask push the largest register first, and thus avoid pushing

Nit: pushes, avoids

::: js/src/jit/x86/Architecture-x86.h
@@ +122,5 @@
>  class FloatRegisters {
>    public:
>      typedef X86Encoding::XMMRegisterID Encoding;
>  
> +    enum ContentType {

File a follow-up bug to add Architecture-x86-shared.h? I think we could share the FloatRegister code and use #ifdefs if they are different somewhere.
Attachment #8565645 - Flags: review?(jdemooij) → review+
Comment on attachment 8565632 [details] [diff] [review]
part 5 - Distinguish between the FloatRegister code and the encoding on x86 & x64.

Review of attachment 8565632 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/Assembler-x86-shared.h
@@ +1590,4 @@
>      }
>      void vcvttss2si(FloatRegister src, Register dest) {
>          MOZ_ASSERT(HasSSE2());
> +        masm.vcvttss2si_rr(src.encoding(), dest.code());

So now we have FloatRegister::encoding() and Register::code(). If Code got another meaning for FloatRegisters, could we rename that one instead and use Code instead of Encoding? code() is used a lot in these files...
Attachment #8565632 - Flags: review?(jdemooij)
(In reply to Nicolas B. Pierron [:nbp] from comment #16)
> Created attachment 8565645 [details] [diff] [review]
> part 14 - Add types to x86/x64 float registers.

Several things in register allocation are proportional to the number of physical registers. For example, the Backtracking allocator's registers array will hold 4 times as many splay trees with this patch. Conceptually, on x86/x64/ARM64, only one type of a register is active at any given time, so the register allocator shouldn't have to treat each register*type as a separate register. Is there a way to do that within this new framework?
(In reply to Jan de Mooij [:jandem] from comment #38)
> Comment on attachment 8565632 [details] [diff] [review]
> part 5 - Distinguish between the FloatRegister code and the encoding on x86
> & x64.
> 
> Review of attachment 8565632 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/shared/Assembler-x86-shared.h
> @@ +1590,4 @@
> >      }
> >      void vcvttss2si(FloatRegister src, Register dest) {
> >          MOZ_ASSERT(HasSSE2());
> > +        masm.vcvttss2si_rr(src.encoding(), dest.code());
> 
> So now we have FloatRegister::encoding() and Register::code(). If Code got
> another meaning for FloatRegisters, could we rename that one instead and use
> Code instead of Encoding? code() is used a lot in these files...

code() is used for 2 things, either the encoding of the register in the assembler or the index of the register-content in the SetType.

In both case, I think that the name is terrible, but changing the encoding one first is the most practical and pragmatic one, as this only affect one platform (12 files, 290 occurrences), as opposed to changing the index, implies adding this to all platforms (53 files, 686 occurrences).

Also, I agree that mixing Register::code and FloatRegister::encoding is not nice, but we do not yet make use of any aliased representation of GPR (only doing optimizations by accident, with byte registers).

So even if I think your point is right, and that code() is a terrible name, and that having it used for 2 different things is bad.  In the mean time, I think the encoding() function (which is enforced by the type system) is one step in the right direction.

Also, the type system refuse to compile when we are trying to use code() where the enumerated register encoding is expected.  Doing the opposite does not help to catch missing places, as the type system does not warn when an enumerated value is assigned to an integer type.
(In reply to Dan Gohman [:sunfish] from comment #39)
> (In reply to Nicolas B. Pierron [:nbp] from comment #16)
> > Created attachment 8565645 [details] [diff] [review]
> > part 14 - Add types to x86/x64 float registers.
> 
> Several things in register allocation are proportional to the number of
> physical registers. For example, the Backtracking allocator's registers
> array will hold 4 times as many splay trees with this patch. Conceptually,
> on x86/x64/ARM64, only one type of a register is active at any given time,
> so the register allocator shouldn't have to treat each register*type as a
> separate register. Is there a way to do that within this new framework?

I do not know much about the register allocator implementation.  I guess the functions "alignedAliased" and "numAlignedAliased" can be used to unify the registers independently of their type.  Still note that on ARM32, AlignedAliased registers might still have additional aliased registers.

  |    d3   |
  | s5 | s6 |

d3, s6 are aligned & aliased
d3, s6, s5 are aliased
Comment on attachment 8565632 [details] [diff] [review]
part 5 - Distinguish between the FloatRegister code and the encoding on x86 & x64.

Asking again for review, since I think I addressed your questions in comment 40.
Attachment #8565632 - Flags: review?(jdemooij)
Comment on attachment 8565633 [details] [diff] [review]
part 6 - RegisterSets: takeAny should take one register and all aliases of it.

Review of attachment 8565633 [details] [diff] [review]:
-----------------------------------------------------------------

Can you make sure that the world doesn't burn to the ground when you use the stupid allocator? or get word that we simply don't care about it anymore.
Attachment #8565633 - Flags: review?(mrosenberg) → review+
Comment on attachment 8565635 [details] [diff] [review]
part 7 - Add common architecture functions to query/convert a register type.

Review of attachment 8565635 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/LIR.h
@@ -491,5 @@
>          return type() == INT32X4 || type() == FLOAT32X4;
>      }
>      bool isCompatibleReg(const AnyRegister &r) const {
>          if (isFloatReg() && r.isFloat()) {
> -#if defined(JS_CODEGEN_ARM) || defined(JS_CODEGEN_MIPS)

so glad to see this go.

::: js/src/jit/arm/Architecture-arm.h
@@ +387,5 @@
>      VFPRegister sintOverlay(unsigned int which = 0) const;
>      VFPRegister uintOverlay(unsigned int which = 0) const;
>  
> +    VFPRegister asSingle() const { return singleOverlay(); }
> +    VFPRegister asDouble() const { return doubleOverlay(); }

My personal preference would be to only have these, and get rid of the overlay functions.  This can wait for another patch, but please file a bug for it (you can even assign it to me)

::: js/src/jit/none/Architecture-none.h
@@ +92,5 @@
> +    bool isFloat32x4() const { MOZ_CRASH(); }
> +    FloatRegister asSingle() const { MOZ_CRASH(); }
> +    FloatRegister asDouble() const { MOZ_CRASH(); }
> +    FloatRegister asInt32x4() const { MOZ_CRASH(); }
> +    FloatRegister asFloat32x4() const { MOZ_CRASH(); }

can we just have a #define that expands to Foo bar() const { MOZ_CRASH(); }

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ +1371,2 @@
>      if (ool->needFloat32Conversion()) {
> +        MOZ_ASSERT(src.isSingle());

this seems wrong.  Why does srcSingle exist if you're explicitly checking that src is a single?  I realize that this will simply always return true on x86 and x64, but I bet it'll break on MIPS.

::: js/src/jit/shared/MoveEmitter-x86-shared.cpp
@@ +427,5 @@
>  MoveEmitterX86::emitFloat32Move(const MoveOperand &from, const MoveOperand &to)
>  {
> +    MOZ_ASSERT_IF(from.isFloatReg(), from.floatReg().isSingle());
> +    MOZ_ASSERT_IF(to.isFloatReg(), to.floatReg().isSingle());
> +

Does it ever make sense for the source and dest to have different sizes?
the MoveEmitter shouldn't be doing conversions.

::: js/src/jit/x64/Assembler-x64.cpp
@@ +88,5 @@
>              stackOffset_ += sizeof(uint64_t);
>              break;
>          }
> +        if (type == MIRType_Float32)
> +            current_ = ABIArg(FloatArgRegs[floatRegIndex_++].asSingle());

if it looks like this style of if (or switch) is becoming endemic, I'd recommend having a simple as(MIRType t), which coerces to the given type
Attachment #8565635 - Flags: review?(mrosenberg) → review+
Attachment #8565641 - Flags: review?(mrosenberg) → review+
(In reply to Marty Rosenberg [:mjrosenb] from comment #44)
> Comment on attachment 8565635 [details] [diff] [review]
> part 7 - Add common architecture functions to query/convert a register type.
> 
> Review of attachment 8565635 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/none/Architecture-none.h
> @@ +92,5 @@
> > +    bool isFloat32x4() const { MOZ_CRASH(); }
> > +    FloatRegister asSingle() const { MOZ_CRASH(); }
> > +    FloatRegister asDouble() const { MOZ_CRASH(); }
> > +    FloatRegister asInt32x4() const { MOZ_CRASH(); }
> > +    FloatRegister asFloat32x4() const { MOZ_CRASH(); }
> 
> can we just have a #define that expands to Foo bar() const { MOZ_CRASH(); }

I am not sure this brings much value, as the goal of this file is to quickly identify what is missing to get it compiling again.

> ::: js/src/jit/shared/CodeGenerator-shared.cpp
> @@ +1371,2 @@
> >      if (ool->needFloat32Conversion()) {
> > +        MOZ_ASSERT(src.isSingle());
> 
> this seems wrong.  Why does srcSingle exist if you're explicitly checking
> that src is a single?  I realize that this will simply always return true on
> x86 and x64, but I bet it'll break on MIPS.

I added this to get x86/x64 working after landing part 14.  I cannot test this on MIPS as the MIPS backend no longer compile.

> ::: js/src/jit/shared/MoveEmitter-x86-shared.cpp
> @@ +427,5 @@
> >  MoveEmitterX86::emitFloat32Move(const MoveOperand &from, const MoveOperand &to)
> >  {
> > +    MOZ_ASSERT_IF(from.isFloatReg(), from.floatReg().isSingle());
> > +    MOZ_ASSERT_IF(to.isFloatReg(), to.floatReg().isSingle());
> > +
> 
> Does it ever make sense for the source and dest to have different sizes?
> the MoveEmitter shouldn't be doing conversions.

No, the move emitter should be consistent, and I added these assertions to make sure that we are not doing any mistakes when we land part 14.

> ::: js/src/jit/x64/Assembler-x64.cpp
> @@ +88,5 @@
> >              stackOffset_ += sizeof(uint64_t);
> >              break;
> >          }
> > +        if (type == MIRType_Float32)
> > +            current_ = ABIArg(FloatArgRegs[floatRegIndex_++].asSingle());
> 
> if it looks like this style of if (or switch) is becoming endemic, I'd
> recommend having a simple as(MIRType t), which coerces to the given type

These should only be used in rare cases, and most likely in the Assembler files, where we do not have any MIRTypes.  These instances are only the few exceptions which are used to ensure that we are not doing any mistakes in some move emitter code.

Also, using this function is a sign of bad smell of register allocation, so I don't think make it more convenient is a good idea.
(In reply to Benjamin Bouvier [:bbouvier] from comment #26)
> Comment on attachment 8565637 [details] [diff] [review]
> part 9 - PushRegsInMask no longer assumes that any FloatRegister is 8 bytes.
> 
> Review of attachment 8565637 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/shared/MacroAssembler-x86-shared.cpp
> @@ +51,5 @@
> >      for (FloatRegisterBackwardIterator iter(simdSet); iter.more(); iter++) {
> >          diffF -= Simd128DataSize;
> >          numSimd -= 1;
> >          // XXX how to choose the right move type?
> >          storeUnalignedInt32x4(*iter, Address(StackPointer, diffF));
> 
> Can you also fix this XXX please, while you're here?

I tried, but this did not work as (some?) registers of the simdSet are added as Double register, and not Int32x4 / Float32x4. As these would be removed later I did not investigate more.
Attachment #8566654 - Flags: review?(mrosenberg) → review+
This patch is made such that we will not attempt to push SIMD register which
are listed in statically compiled register mask, such as
FloatRegisters::AllMask and FloatRegisters::VolatileMask.

On the other hand, we have to properly handle PushBailoutFrame, such that we
push the correct size, as expected by MachineState::FromBailout, which
depends on the size of FloatRegisters::RegisterContent, which is also a
statically known constant.

This approach let us be optimal on any PushRegsInMask, while avoiding adding
too many branches for the bailout code.
Attachment #8567110 - Flags: review?(benj)
Comment on attachment 8567110 [details] [diff] [review]
part 18 - PushRegsInMask: Do not spill SIMD register if there is no support.

Review of attachment 8567110 [details] [diff] [review]:
-----------------------------------------------------------------

Two questions I'd like to see answered, to understand the whole thing.

::: js/src/jit/shared/MacroAssembler-x86-shared.cpp
@@ +59,5 @@
>      MOZ_ASSERT(diffF == 0);
>  }
>  
>  void
>  MacroAssembler::PopRegsInMaskIgnore(RegisterSet set, RegisterSet ignore, FloatRegisterSet simdSet)

Why PopRegsInMaskIgnore, while the same is done on PushRegsInMask? Either we should do on both *Ignore variants, or on the 4 variants?

A comment here would be appreciated, if that's the only right way to do it.

::: js/src/jit/x64/Assembler-x64.cpp
@@ +309,5 @@
>  {
> +    if (JitSupportsSimd())
> +        return s;
> +
> +    // Ignore all SIMD register.

nit: registers

::: js/src/jit/x64/Trampoline-x64.cpp
@@ +523,5 @@
> +        for (FloatRegisterBackwardIterator iter(set.fpus()); iter.more(); iter++) {
> +            FloatRegister reg = *iter;
> +            Address spillAddress(StackPointer, reg.getRegisterDumpOffsetInBytes());
> +            masm.storeDouble(reg, spillAddress);
> +        }

Is it possible to just select the scalar registers (as done in ReduceSetForPush), and then construct a RegisterSet(AllGPRs, set-with-scalars-only)?
(In reply to Benjamin Bouvier [:bbouvier] from comment #49)
> Comment on attachment 8567110 [details] [diff] [review]
> part 18 - PushRegsInMask: Do not spill SIMD register if there is no support.
> 
> Review of attachment 8567110 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Two questions I'd like to see answered, to understand the whole thing.
> 
> ::: js/src/jit/shared/MacroAssembler-x86-shared.cpp
> @@ +59,5 @@
> >      MOZ_ASSERT(diffF == 0);
> >  }
> >  
> >  void
> >  MacroAssembler::PopRegsInMaskIgnore(RegisterSet set, RegisterSet ignore, FloatRegisterSet simdSet)
> 
> Why PopRegsInMaskIgnore, while the same is done on PushRegsInMask? Either we
> should do on both *Ignore variants, or on the 4 variants?
> 
> A comment here would be appreciated, if that's the only right way to do it.

AFAIK, all variants are using these 2 as a back-end.  So if we do it for these 2, then all variants have it too.

> ::: js/src/jit/x64/Trampoline-x64.cpp
> @@ +523,5 @@
> > +        for (FloatRegisterBackwardIterator iter(set.fpus()); iter.more(); iter++) {
> > +            FloatRegister reg = *iter;
> > +            Address spillAddress(StackPointer, reg.getRegisterDumpOffsetInBytes());
> > +            masm.storeDouble(reg, spillAddress);
> > +        }
> 
> Is it possible to just select the scalar registers (as done in
> ReduceSetForPush), and then construct a RegisterSet(AllGPRs,
> set-with-scalars-only)?

Yes, but it does not bring any value, as the iterator iterates over the largest register first, and unconditionally remove all aliased registers at the same time.  So the only benefit of it would be equivalent to doing

  FloatRegister reg = (*iter).asDouble();

and for the moment it does not matter much as the storeDouble function does not assert that we the right register type for the moment.
Depends on: 1135190
Duplicate of this bug: 1112166
Comment on attachment 8565632 [details] [diff] [review]
part 5 - Distinguish between the FloatRegister code and the encoding on x86 & x64.

Review of attachment 8565632 [details] [diff] [review]:
-----------------------------------------------------------------

OK, seems reasonable, but can you either rename Register::code() as well, for consistency, or file a follow-up bug to do that?
Attachment #8565632 - Flags: review?(jdemooij) → review+
Comment on attachment 8567110 [details] [diff] [review]
part 18 - PushRegsInMask: Do not spill SIMD register if there is no support.

Review of attachment 8567110 [details] [diff] [review]:
-----------------------------------------------------------------

(comments to x64 apply to x86 as well, of course).

r=me with a change to the comment.  Please let me know of the chosen wording before landing.

::: js/src/jit/x64/Trampoline-x64.cpp
@@ +513,5 @@
> +        masm.PushRegsInMask(AllRegs);
> +    } else {
> +        // If SIMD is not supported, then PushRegsInMask does not correctly
> +        // spill all registers as expected by the RegisterDump structure, as we
> +        // skip the SIMD part of the register part.

Please rephrase the last part of this comment (from "as we skip the SIMD part of the register part"). I am not even sure of what do you mean here: do you mean the two higher lanes of the registers (which are SIMD specific)?

After discussing with you on irc, I think I get it: PushRegsInMask will call reduceSetForPush, which will force all xmm regs to have double size (rather than 16 bytes), so in the PushRegsInMask loop, the size of the register won't match the size expected by the RegisterDump (16 bytes).
How about:

// When SIMD isn't supported, PushRegsInMask reduces the set of float registers to be double-sized, while the RegisterDump expects each of the float registers to have the maximal possible size (viz., Simd128DataSize). To work around this, we just spill the double registers by hand here, using the register dump offset directly.


Thinking about it, it just seems like PushRegsInMask is used for something it shouldn't achieve, or that its API is wrong. Maybe file a follow-up about this?

::: js/src/jit/x86/Trampoline-x86.cpp
@@ +507,5 @@
> +        masm.PushRegsInMask(AllRegs);
> +    } else {
> +        // If SIMD is not supported, then PushRegsInMask does not correctly
> +        // spill all registers as expected by the RegisterDump structure, as we
> +        // skip the SIMD part of the register part.

ditto
Attachment #8567110 - Flags: review?(benj) → review+
Blocks: 1135629
Blocks: 1134626
Depends on: 1135722
As opposed to the backtracking allocator, LSRA relies on the RegisterSet to
determine what registers should be spilled or not.  As part 6 ensures that
iterators are only visiting one of the aliased register, then we have to
walk the aliased registers when we are spilling registers around a
call-site.
Attachment #8569173 - Flags: review?(bhackett1024)
Comment on attachment 8569173 [details] [diff] [review]
part 19 - LSRA should spill aliased registers around call-sites. r=

Review of attachment 8569173 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/ion/lsra-bug1112164.js
@@ +27,5 @@
> +          var ref = reference(x, y, z, w);
> +          var gen = generator(x, y, z, w);
> +/*
> +          print();
> +          print(x, y, z, w);

note: I removed these lines from the final patch.
Comment on attachment 8569173 [details] [diff] [review]
part 19 - LSRA should spill aliased registers around call-sites. r=

Review of attachment 8569173 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/LiveRangeAllocator.cpp
@@ +650,5 @@
> +                        AnyRegister reg(*iter);
> +                        for (size_t i = 0; i < reg.numAliased(); i++) {
> +                            if (!addFixedRangeAtHead(reg.aliased(i), inputOf(*ins), outputOf(*ins)))
> +                                return false;
> +                        }

The for loop enclosing this goes through all allocatable registers, so we should be adding fixed ranges for these aliased registers already, right?
Attachment #8569173 - Flags: review?(bhackett1024)
(In reply to Brian Hackett (:bhackett) from comment #56)
> Comment on attachment 8569173 [details] [diff] [review]
> part 19 - LSRA should spill aliased registers around call-sites. r=
> 
> Review of attachment 8569173 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> The for loop enclosing this goes through all allocatable registers, so we
> should be adding fixed ranges for these aliased registers already, right?

It goes through all allocatable registers, yes, but it only visit them once for all typed-aliased, as aliased registers are filtered out when iterating.  But LSRA does not consider that spilling one register is the same as spilling all aliased registers.

Do you have a pointer where this could be fixed in a better way?
(In reply to Nicolas B. Pierron [:nbp] from comment #57)
> It goes through all allocatable registers, yes, but it only visit them once
> for all typed-aliased, as aliased registers are filtered out when iterating.
> But LSRA does not consider that spilling one register is the same as
> spilling all aliased registers.

OK, that's fine, but looking through the code I don't see where the aliased registers are filtered out.  allRegisters_ looks like it includes registers which are aliases of each other, and AnyRegisterIterator::operator++ looks like it will end up calling take() on the set rather than takeAllAliased().
(In reply to Brian Hackett (:bhackett) from comment #58)
> (In reply to Nicolas B. Pierron [:nbp] from comment #57)
> > It goes through all allocatable registers, yes, but it only visit them once
> > for all typed-aliased, as aliased registers are filtered out when iterating.
> > But LSRA does not consider that spilling one register is the same as
> > spilling all aliased registers.
> 
> OK, that's fine, but looking through the code I don't see where the aliased
> registers are filtered out.  allRegisters_ looks like it includes registers
> which are aliases of each other, and AnyRegisterIterator::operator++ looks
> like it will end up calling take() on the set rather than takeAllAliased().

This is changed in part 6, you can also see that in the folded patch.  Otherwise we would have a miss-match between the number of physical registers in and the number of visited registers, and we would be spilling 4 times as many registers in pushRegsInMask.
Comment on attachment 8569173 [details] [diff] [review]
part 19 - LSRA should spill aliased registers around call-sites. r=

Re-ask for review based on comment 58 & comment 59.
Attachment #8569173 - Flags: review?(bhackett1024)
Comment on attachment 8569173 [details] [diff] [review]
part 19 - LSRA should spill aliased registers around call-sites. r=

Review of attachment 8569173 [details] [diff] [review]:
-----------------------------------------------------------------

OK, can you add a comment describing what's going on?
Attachment #8569173 - Flags: review?(bhackett1024) → review+
Try on top of the latest mozilla-central:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab752feb14ce

Push to inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1f7ae9936181
https://hg.mozilla.org/integration/mozilla-inbound/rev/685f083bb01c
https://hg.mozilla.org/integration/mozilla-inbound/rev/037b0e802178
https://hg.mozilla.org/integration/mozilla-inbound/rev/9a3fd2e8ec2e
https://hg.mozilla.org/integration/mozilla-inbound/rev/660504e5426f
https://hg.mozilla.org/integration/mozilla-inbound/rev/8c7c70386b55
https://hg.mozilla.org/integration/mozilla-inbound/rev/d04bc7d09a01
https://hg.mozilla.org/integration/mozilla-inbound/rev/3ac5996b7002
https://hg.mozilla.org/integration/mozilla-inbound/rev/567de2624609
https://hg.mozilla.org/integration/mozilla-inbound/rev/77391d5065c2
https://hg.mozilla.org/integration/mozilla-inbound/rev/2fdc357ee1d5
https://hg.mozilla.org/integration/mozilla-inbound/rev/30ca8c21566d
https://hg.mozilla.org/integration/mozilla-inbound/rev/6715f22f55a6
https://hg.mozilla.org/integration/mozilla-inbound/rev/aef0a69e6623
https://hg.mozilla.org/integration/mozilla-inbound/rev/cdc1a9758b94
https://hg.mozilla.org/integration/mozilla-inbound/rev/ef0b0e3d8938
https://hg.mozilla.org/integration/mozilla-inbound/rev/5598dd323a08
https://hg.mozilla.org/integration/mozilla-inbound/rev/9af78b23b43e
https://hg.mozilla.org/integration/mozilla-inbound/rev/e1714343f9d4
https://hg.mozilla.org/integration/mozilla-inbound/rev/71a08ff0d27c
Attachment #8565645 - Flags: feedback?(mrosenberg)
Attachment #8565654 - Attachment is obsolete: true
Blocks: 1137688
Depends on: 1138141
Depends on: 1140801
You need to log in before you can comment on or make changes to this bug.