Closed Bug 991153 Opened 10 years ago Closed 10 years ago

ARM full Float32 support

Categories

(Core :: JavaScript Engine: JIT, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: mjrosenb, Assigned: mjrosenb)

References

Details

Attachments

(15 files, 17 obsolete files)

87.14 KB, patch
nbp
: review+
Details | Diff | Splinter Review
26.24 KB, patch
djvj
: review+
Details | Diff | Splinter Review
16.79 KB, patch
jandem
: review+
Details | Diff | Splinter Review
5.74 KB, patch
bhackett1024
: review+
Details | Diff | Splinter Review
3.15 KB, patch
nbp
: review+
Details | Diff | Splinter Review
17.26 KB, patch
nbp
: review+
Details | Diff | Splinter Review
86.47 KB, patch
jandem
: review-
Details | Diff | Splinter Review
19.36 KB, patch
jandem
: review+
Details | Diff | Splinter Review
22.81 KB, patch
jandem
: review+
Details | Diff | Splinter Review
13.18 KB, patch
Details | Diff | Splinter Review
13.73 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
62.33 KB, patch
Details | Diff | Splinter Review
92.70 KB, patch
jandem
: review+
Details | Diff | Splinter Review
58.08 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
1.52 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
Attached patch splitFloatReg-r0.patch (obsolete) — Splinter Review
      No description provided.
Attachment #8400740 - Flags: review?(jdemooij)
Lots of changes, mostly alpha variance.
Attachment #8400741 - Flags: review?(nicolas.b.pierron)
Attached patch fixLoweringFloats-r0.patch (obsolete) — Splinter Review
x86, x64, arm compile, not tested yet.  should only affect codegen on arm.
Attachment #8400742 - Flags: review?(kvijayan)
I'll explain what this is all about in a bit.  I need to run to the airport now.
Ok, currently, on arm, and mips, we only use half of the available floating point registers, because s0 and s1 both alias d0, and the register allocator, and engine at large don't know how to deal with this.  This is more than just a perf issue, since the armhf abi asks for single values to be passed in both even and odd single registers.  The patches currently attached are mostly just preparation for teaching the allocator how to deal with floating point values.  Currently codegen should not be affected in any way shape or form.
Comment on attachment 8400741 [details] [diff] [review]
addScratchFloat-r0.patch

Review of attachment 8400741 [details] [diff] [review]:
-----------------------------------------------------------------

follow-up: Add DoubleReg{0,1} to replace some (all) FloatReg{0,1}
follow-up: Add ToDoubleRegister and toFloat32Register to assert the register type of the allocation.

::: js/src/jit/CodeGenerator.cpp
@@ +331,3 @@
>      masm.unboxDouble(operand, output);
>      masm.convertDoubleToFloat32(output, output);
> +#endif

Keep the ARM version for all platforms.

nit: MOZ_ASSERT(!output.isDouble());

::: js/src/jit/IonMacroAssembler.cpp
@@ +549,2 @@
>          // Move the fixed point value into an integer register
> +        as_vxfer(output, InvalidReg, ScratchFloat32Reg.uintOverlay(), FloatToCore);

ScratchFloat32Reg -> ScratchDoubleReg

::: js/src/jit/arm/Assembler-arm.h
@@ +95,5 @@
>  static MOZ_CONSTEXPR_VAR Register ReturnReg = r0;
> +static MOZ_CONSTEXPR_VAR FloatRegister ReturnFloat32Reg = { FloatRegisters::d0 };
> +static MOZ_CONSTEXPR_VAR FloatRegister ReturnDoubleReg = { FloatRegisters::d0 };
> +static MOZ_CONSTEXPR_VAR FloatRegister ScratchFloat32Reg = { FloatRegisters::d15 };
> +static MOZ_CONSTEXPR_VAR FloatRegister ScratchDoubleReg = { FloatRegisters::d15 };

static_assert(!ScratchFloat32Reg.isDouble(), "Should be defined as a Float32 Register");
static_assert(ScratchDoubleReg.isDouble(),   "Should be defined as a Double Register");

And do the same thing for ReturnFloat32Register and ReturnDoubleRegister.  The reason being that these register might flow in a function which is dynamically checking the type of the FloatRegister, especially if they are pre-allocated in the lowering.

::: js/src/jit/arm/CodeGenerator-arm.cpp
@@ +1365,5 @@
>  
>      FloatRegister reg = ToFloatRegister(in);
>      if (box->type() == MIRType_Float32) {
> +        masm.convertFloat32ToDouble(reg, ScratchFloat32Reg);
> +        reg = ScratchFloat32Reg;

ScratchFloat32Reg -> ScratchDoubleReg, as reg is the output register.

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +3727,5 @@
>      switch (result) {
>        case MoveOp::DOUBLE:
>          if (!useHardFpABI()) {
>              // Move double from r0/r1 to ReturnFloatReg.
> +            as_vxfer(r0, r1, ReturnDoubleReg, CoreToFloat);

nit: Update the comment too.

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ +532,5 @@
> +            scratch = ScratchDoubleReg;
> +        else
> +            scratch = ScratchFloat32Reg;
> +#else
> +        scratch = ScratchFloat32Reg;

ScratchDoubleReg

@@ +535,5 @@
> +#else
> +        scratch = ScratchFloat32Reg;
> +#endif
> +        masm.loadDouble(dump, scratch);
> +        masm.branchDouble(Assembler::DoubleNotEqual, scratch, reg, failure_);

This 2 lines should be part of the reg.isDouble() branch.
Attachment #8400741 - Flags: review?(nicolas.b.pierron) → review+
Attached patch fixFloat32-r3.patch (obsolete) — Splinter Review
The bulk of the patch is here.  This adds the rest of the machinery needed for proper float32 support, and turns it on in the register allocators.
Attachment #8403031 - Flags: review?(jdemooij)
Attachment #8403031 - Flags: review?(bhackett1024)
(In reply to Nicolas B. Pierron [:nbp] from comment #5)

> ::: js/src/jit/CodeGenerator.cpp
> @@ +331,3 @@
> >      masm.unboxDouble(operand, output);
> >      masm.convertDoubleToFloat32(output, output);
> > +#endif
> 
> Keep the ARM version for all platforms.
> 
I added this because x86/x64 throws an asertion if you ask to unbox into the scratch register
> nit: MOZ_ASSERT(!output.isDouble());
Currently, x* doesn't track the bitiness of registers,this would require it to be added.  shall I add in this box of worms?
> 
> ::: js/src/jit/IonMacroAssembler.cpp
> @@ +549,2 @@
> >          // Move the fixed point value into an integer register
> > +        as_vxfer(output, InvalidReg, ScratchFloat32Reg.uintOverlay(), FloatToCore);
> 
> ScratchFloat32Reg -> ScratchDoubleReg
> 
It should be the same thing, any particular reason for this? I chose the Float32 version because it has the same number of bits.
> ::: js/src/jit/arm/Assembler-arm.h
> @@ +95,5 @@
> >  static MOZ_CONSTEXPR_VAR Register ReturnReg = r0;
> > +static MOZ_CONSTEXPR_VAR FloatRegister ReturnFloat32Reg = { FloatRegisters::d0 };
> > +static MOZ_CONSTEXPR_VAR FloatRegister ReturnDoubleReg = { FloatRegisters::d0 };
> > +static MOZ_CONSTEXPR_VAR FloatRegister ScratchFloat32Reg = { FloatRegisters::d15 };
> > +static MOZ_CONSTEXPR_VAR FloatRegister ScratchDoubleReg = { FloatRegisters::d15 };
> 
> static_assert(!ScratchFloat32Reg.isDouble(), "Should be defined as a Float32
> Register");
> static_assert(ScratchDoubleReg.isDouble(),   "Should be defined as a Double
> Register");
> 
With only these patches in place, basically everything still assumes that only double registers exist, so the scratch registers are all doubles for the time being.  That is fixed in the fourth patch.  I'll add those static asserts there.

> And do the same thing for ReturnFloat32Register and ReturnDoubleRegister. 
> The reason being that these register might flow in a function which is
> dynamically checking the type of the FloatRegister, especially if they are
> pre-allocated in the lowering.
> 
ditto.
> 
> @@ +535,5 @@
> > +#else
> > +        scratch = ScratchFloat32Reg;
> > +#endif
> > +        masm.loadDouble(dump, scratch);
> > +        masm.branchDouble(Assembler::DoubleNotEqual, scratch, reg, failure_);
> 
> This 2 lines should be part of the reg.isDouble() branch.
Now that I look at it, we should probably not be calling loadDouble at all on something that may be a 32 bit float?  Or can this just never be a float32?
Comment on attachment 8400740 [details] [diff] [review]
splitFloatReg-r0.patch

Review of attachment 8400740 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay. This makes sense I think, just a bunch of style nits before I take another look.

::: js/src/jit/arm/Architecture-arm.h
@@ +268,5 @@
> +    constexpr VFPRegister(bool b, bool c, bool d, bool e)
> +      : kind(Double), code_(Code(0)),_isInvalid(false), _isMissing(b)
> +    { }
> +    constexpr VFPRegister(Code id)
> +        : kind(Double), code_(id), _isInvalid(false), _isMissing(false)

Nit: 2 spaces less like the previous one. Same for the first two constructors.

@@ +289,5 @@
> +
> +    VFPRegister doubleOverlay(int which = 0) const;
> +    VFPRegister singleOverlay(int which = 0) const;
> +    VFPRegister sintOverlay(int which = 0) const;
> +    VFPRegister uintOverlay(int which = 0) const;

I think we should change the type of "which" to unsigned, then you don't need the >= 0 asserts.

@@ +305,5 @@
> +
> +        VFPRegIndexSplit (uint32_t block_, uint32_t bit_)
> +          : block(block_), bit(bit_)
> +        {
> +            JS_ASSERT (block == block_);

Nit: remove space between JS_ASSERT and (.

@@ +329,5 @@
> +    bool volatile_() const {
> +        if (isDouble())
> +            return !!((1 << (code_ >> 1)) & FloatRegisters::VolatileMask);
> +        else
> +            return !!((1 << code_) & FloatRegisters::VolatileMask);

Nit: no else after return.

@@ +336,5 @@
> +        return FloatRegisters::GetName(code_);
> +    }
> +    bool operator != (const VFPRegister &other) const {
> +        // if the kinds are different, the registers must be different
> +        // right?

Nit: remove this comment.

@@ +344,5 @@
> +        if (kind == other.kind) {
> +            return code_ == other.code_;
> +        } else {
> +            return doubleOverlay() == other.doubleOverlay();
> +        }

Nit: no {}, no-else-after-return.

@@ +362,5 @@
> +            if (code_ < 16) {
> +                JS_ASSERT(a <= 2);
> +                return singleOverlay(a-1);
> +            }
> +            MOZ_ASSUME_UNREACHABLE();

Nit: I think it's a bit simpler to assert code_ > 16, so that you don't have this unreachable block.

@@ +385,5 @@
> +            if (code_ < 16) {
> +                JS_ASSERT(a <= 1);
> +                return singleOverlay(a-1);
> +            }
> +            MOZ_ASSUME_UNREACHABLE();

Same here.

@@ +404,5 @@
> +class TypedRegisterSet;
> +TypedRegisterSet<FloatRegister> reduceSetForPush(const TypedRegisterSet<FloatRegister> &s);
> +uint32_t getSizeInBytes(const TypedRegisterSet<FloatRegister> &s);
> +uint32_t getPushSizeInBytes(const TypedRegisterSet<FloatRegister> &s);
> +uint32_t getRegisterDumpOffsetInBytes(FloatRegister reg);

I think these should be class members instead of separate methods.

::: js/src/jit/arm/Assembler-arm.cpp
@@ +1192,3 @@
>  {
>      JS_ASSERT(!_isInvalid);
>      if (kind != Double) {

Nit: no {}

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +1613,5 @@
>  MacroAssemblerARM::ma_vxfer(Register src1, Register src2, FloatRegister dest, Condition cc)
>  {
>      as_vxfer(src1, src2, VFPRegister(dest), CoreToFloat, cc);
>  }
> +#if 0

Nit: remove #if 0 code.

::: js/src/jit/x64/Architecture-x64.h
@@ +158,5 @@
>      static const uint32_t NonAllocatableMask =
>          (1 << JSC::X86Registers::xmm15);    // This is ScratchFloatReg.
>  
>      static const uint32_t AllocatableMask = AllMask & ~NonAllocatableMask;
>  };

Nit: add an empty line after this line.

::: js/src/jit/x86/Architecture-x86.h
@@ +136,5 @@
>      static const uint32_t NonAllocatableMask =
>          (1 << JSC::X86Registers::xmm7);
>  
>      static const uint32_t AllocatableMask = AllMask & ~NonAllocatableMask;
>  };

Nit: add an extra newline here.
Attachment #8400740 - Flags: review?(jdemooij)
Comment on attachment 8403031 [details] [diff] [review]
fixFloat32-r3.patch

Review of attachment 8403031 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry but I think this patch is too big and should be split up. Regalloc, bailouts and safepoints are all pretty complicated and this patch touches all of that (and more). Also, landing such changes incrementally helps bisecting etc.

There's also some debugging code left in this patch and many style problems; below I mentioned some of them, but please also fix the similar ones elsewhere.

::: js/src/jit/MoveResolver.h
@@ +95,5 @@
> +            return false;
> +        if (kind_ == FLOAT_REG) {
> +            return floatReg().aliases(other.floatReg());
> +        } else if (code_ != other.code_)
> +            return false;

No {}, no else-after-return.

::: js/src/jit/RegisterAllocator.cpp
@@ +270,4 @@
>              safepoint->addLiveRegister(reg);
> +#ifdef CHECK_OSIPOINT_REGISTERS
> +            //            safepoint->addTempRegister(reg);
> +#endif

Remove.

::: js/src/jit/RegisterSets.h
@@ +89,5 @@
> +        if (isFloat()) {
> +            ret = AnyRegister(fpu().aliased(a));
> +        } else {
> +            ret = AnyRegister(gpr().aliased(a));
> +        }

Nit: no {}

@@ +104,5 @@
> +            return fpu().aliases(other.fpu());
> +        } else if(!isFloat() && !other.isFloat()) {
> +            return gpr().aliases(other.gpr());
> +        }
> +        return false;

Nit: no {}, no else-after-return.

@@ +376,5 @@
>      static inline TypedRegisterSet NonVolatile() {
>          return TypedRegisterSet(T::Codes::AllocatableMask & T::Codes::NonVolatileMask);
>      }
>      bool has(T reg) const {
> +        // when checking to see if a set has a register, we only want that exact

s/when/When

@@ +387,5 @@
>      void add(T reg) {
> +        // make sure we don't add two overlapping registers.
> +        for (int a = 0; a < reg.numAliased(); a++) {
> +            JS_ASSERT(!has(reg.aliased(a)));
> +        }

Nit: no {}, add #ifdef DEBUG around this loop just to be sure (and it's nice when reading the code to see what is debug code and what is not)

@@ +420,5 @@
>      }
> +    void takeAllUnchecked(T reg) {
> +        for (int a = 0; a < reg.numAliased(); a++) {
> +            bits_ &= ~(1ll<<reg.aliased(a).code());
> +        }

No {}

::: js/src/jit/Registers.h
@@ +28,2 @@
>      Code code_;
> +    

Trailing whitespace (if you install the hg color extension, qdiff will show you trailing whitespace)

::: js/src/jit/StupidAllocator.cpp
@@ +79,2 @@
>              registers[registerCount++].reg = AnyRegister(remainingRegisters.takeGeneral());
> +            fprintf(stderr, "%d -> %s\n", registerCount-1, registers[registerCount-1].reg.name());

Remove.

@@ +83,2 @@
>              registers[registerCount++].reg = AnyRegister(remainingRegisters.takeFloat());
> +            fprintf(stderr, "%d -> %s\n", registerCount-1, registers[registerCount-1].reg.name());

Ditto.

@@ +352,2 @@
>                  RegisterIndex existing = findExistingRegister(vreg);
> +                if (existing != UINT32_MAX) {

No {}

::: js/src/jit/StupidAllocator.h
@@ +19,5 @@
>      static const uint32_t MAX_REGISTERS = Registers::Allocatable + FloatRegisters::Allocatable;
>      static const uint32_t MISSING_ALLOCATION = UINT32_MAX;
>  
>      struct AllocatedRegister {
> +        // as far as I can tell, this is static.

Remove this comment, I think.

::: js/src/jit/arm/Architecture-arm.cpp
@@ +289,5 @@
>  
>      return Invalid;
>  }
> +uint32_t
> +getSizeInBytes(const FloatRegisterSet &s)

Wasn't this also in the other patch I reviewed?

::: js/src/jit/arm/Architecture-arm.h
@@ +259,5 @@
> +    // How many intervals can a single register force invalidation of?
> +    // the number is either two or three.  A single point can force two evictions (e.g. d0 can evict s0 and s1)
> +    // Since it was initially assumed to be 1, I'm going to assume this is the proper use.
> +    static const uint32_t MaxAliasedIntervals = 2;
> +    

Remove trailing whitespace.

::: js/src/jit/arm/Assembler-arm.h
@@ +16,5 @@
>  #include "jit/CompactBuffer.h"
>  #include "jit/IonCode.h"
>  #include "jit/shared/Assembler-shared.h"
>  #include "jit/shared/IonAssemblerBufferWithConstantPools.h"
> +void dbg_break();

Remove.

@@ +21,2 @@
>  namespace js {
> +bnamespace jit {

Typo: namespace

@@ +1643,5 @@
>          JS_ASSERT(dtmActive);
>          dtmActive = false;
>          JS_ASSERT(dtmLastReg != -1);
>          dtmDelta = dtmDelta ? dtmDelta : 1;
> +        // the operand for the vstr/vldr instruction is the lowest register in the range

Capitalize, full stop at the end.

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +3516,5 @@
>      if (!enoughMemory_)
>          return;
>      switch (type) {
> +      case MoveOp::FLOAT32: {
> +          FloatRegister fr;

Indent with 2 spaces less.

::: js/src/jit/arm/MacroAssembler-arm.h
@@ +451,5 @@
>              finishFloatTransfer();
>          }
> +        // BAD ASSUMPTION: not all registers are the same size!
> +        //JS_ASSERT(offset == static_cast<int32_t>(set.size() * sizeof(double)) * sign);
> +        //fprintf(stderr, "offset calculated to be: %d\n", offset);

Remove this.
Attachment #8403031 - Flags: review?(jdemooij)
Comment on attachment 8400742 [details] [diff] [review]
fixLoweringFloats-r0.patch

Review of attachment 8400742 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to look over this again after comments addressed.

::: js/src/jit/LIR-Common.h
@@ +5031,4 @@
>  {
>    public:
>      LIR_HEADER(SetElementCacheV);
>  

I think this entire #define-based code splitting can be eliminated and redefined in terms of BogusTemp regs.

@@ +5035,5 @@
>      static const size_t Index = 1;
>      static const size_t Value = 1 + BOX_PIECES;
>  
>      LSetElementCacheV(const LAllocation &object, const LDefinition &tempToUnboxIndex,
> +                      const LDefinition &temp, const LDefinition &tempDouble,

With BogusTemp, this whould be the only constructor definition.

@@ +5041,5 @@
>      {
>          setOperand(0, object);
>          setTemp(0, tempToUnboxIndex);
>          setTemp(1, temp);
> +        setTemp(2, tempDouble);

With BogusTemp approach, |armtemp| body could be:

#ifdef JS_CODEGEN_ARM
  JS_ASSERT(false);
  return nullptr;
#else
  return getTemp(1);
#endif

::: js/src/jit/Lowering.cpp
@@ +2986,5 @@
> +#ifdef JS_CODEGEN_ARM
> +        lir = new(alloc()) LAssertRangeF(useRegister(input), tempFloat32(), tempFloat32());
> +#else
> +        lir = new(alloc()) LAssertRangeF(useRegister(input), tempDouble());
> +#endif

See comments about unifying this API and using BogusTemp instead.

::: js/src/jit/CodeGenerator.cpp
@@ +8242,3 @@
>      const Range *r = ins->range();
>  
> +    masm.convertFloat32ToDouble(input, dest);

I don't understand this.  If JS_CODEGEN_ARM, then |dest| will be a Float32 register.  Why is it the destination o a convertFloat32ToDouble operation then?
Attachment #8400742 - Flags: review?(kvijayan)
Not setting r? yet, since last time, I seem to have uploaded a stale patch... making sure I didn't do something like that again.
Attachment #8400742 - Attachment is obsolete: true
Comment on attachment 8403525 [details] [diff] [review]
fixLoweringFloats-r1.patch

Review of attachment 8403525 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86/Lowering-x86.h
@@ -65,5 @@
>  
> -    static bool allowFloat32Optimizations() {
> -        return true;
> -    }
> -

What? I'm like 90% sure I didn't mean to remove this.  Assume this hunk will be removed before I do anything else.
Attachment #8403525 - Flags: review?(kvijayan)
(In reply to Marty Rosenberg [:mjrosenb] from comment #7)
> (In reply to Nicolas B. Pierron [:nbp] from comment #5)
> 
> > ::: js/src/jit/CodeGenerator.cpp
> > @@ +331,3 @@
> > >      masm.unboxDouble(operand, output);
> > >      masm.convertDoubleToFloat32(output, output);
> > > +#endif
> > 
> > Keep the ARM version for all platforms.
> > 
> I added this because x86/x64 throws an asertion if you ask to unbox into the
> scratch register
> > nit: MOZ_ASSERT(!output.isDouble());
> Currently, x* doesn't track the bitiness of registers,this would require it
> to be added.  shall I add in this box of worms?

Oh, this is only on ARM?  It would be nice to have such type tracking on all platforms, but I guess we can do that as part of a follow-up patch.

> > 
> > ::: js/src/jit/IonMacroAssembler.cpp
> > @@ +549,2 @@
> > >          // Move the fixed point value into an integer register
> > > +        as_vxfer(output, InvalidReg, ScratchFloat32Reg.uintOverlay(), FloatToCore);
> > 
> > ScratchFloat32Reg -> ScratchDoubleReg
> > 
> It should be the same thing, any particular reason for this? I chose the
> Float32 version because it has the same number of bits.

All the above use of the Scratch registers are referring to the Double register, and the function is named clampDoubleToUint8.  I would expect that we use less register as possible (in case they were different)

> > ::: js/src/jit/arm/Assembler-arm.h
> > @@ +95,5 @@
> > >  static MOZ_CONSTEXPR_VAR Register ReturnReg = r0;
> > > +static MOZ_CONSTEXPR_VAR FloatRegister ReturnFloat32Reg = { FloatRegisters::d0 };
> > > +static MOZ_CONSTEXPR_VAR FloatRegister ReturnDoubleReg = { FloatRegisters::d0 };
> > > +static MOZ_CONSTEXPR_VAR FloatRegister ScratchFloat32Reg = { FloatRegisters::d15 };
> > > +static MOZ_CONSTEXPR_VAR FloatRegister ScratchDoubleReg = { FloatRegisters::d15 };
> > 
> > static_assert(!ScratchFloat32Reg.isDouble(), "Should be defined as a Float32
> > Register");
> > static_assert(ScratchDoubleReg.isDouble(),   "Should be defined as a Double
> > Register");
> > 
> With only these patches in place, basically everything still assumes that
> only double registers exist, so the scratch registers are all doubles for
> the time being.  That is fixed in the fourth patch.  I'll add those static
> asserts there.

Then I am not sure to understand what is the FloatRegister::isDouble() in VerifyOp, which I guess is inherited from the VFPRegister.

> > @@ +535,5 @@
> > > +#else
> > > +        scratch = ScratchFloat32Reg;
> > > +#endif
> > > +        masm.loadDouble(dump, scratch);
> > > +        masm.branchDouble(Assembler::DoubleNotEqual, scratch, reg, failure_);
> > 
> > This 2 lines should be part of the reg.isDouble() branch.
> Now that I look at it, we should probably not be calling loadDouble at all
> on something that may be a 32 bit float?  Or can this just never be a
> float32?

For the moment we do not distinguish the 2, but in the future we might want to specialize with the type to handle SIMDs.

http://dxr.mozilla.org/mozilla-central/source/js/src/jit/shared/CodeGenerator-shared.cpp#446
Comment on attachment 8403525 [details] [diff] [review]
fixLoweringFloats-r1.patch

Review of attachment 8403525 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/IonCaches.cpp
@@ +3732,1 @@
>          if (LIRGenerator::allowFloat32Optimizations()) {

Assert that tempFloat32 is not INVALID in this conditional.

::: js/src/jit/shared/Lowering-shared.h
@@ +195,5 @@
>      // Whether we can inline ForkJoinGetSlice.
>      static bool allowInlineForkJoinGetSlice() {
>          return false;
>      }
> +

Nit: whitespace on blank line.

::: js/src/jit/x64/Architecture-x64.h
@@ +191,5 @@
>  };
>  
> +// Arm/D32 has double registers that can NOT be treated as float32
> +// and this requires some dances in lowering.
> +static bool hasUnaliasedDouble() {

See comment in Architecture-x86.h

@@ +196,5 @@
> +    return false;
> +}
> +// On ARM, Dn aliases both S2n and S2n+1, so if you need to convert a float32
> +// to a double as a temporary, you need a temporary double register.
> +static bool hasMultiAlias() {

See comment in Architecture-x86.h

::: js/src/jit/x86/Architecture-x86.h
@@ +169,5 @@
>  };
>  
> +// Arm/D32 has double registers that can NOT be treated as float32
> +// and this requires some dances in lowering.
> +static bool hasUnaliasedDouble() {

Nit: Add note explaining that x86 has aliased doubles.

@@ +175,5 @@
> +}
> +
> +// On ARM, Dn aliases both S2n and S2n+1, so if you need to convert a float32
> +// to a double as a temporary, you need a temporary double register.
> +static bool hasMultiAlias() {

Nit: add note explaining that x86 double regs do not alias multiple float regs.
Attachment #8403525 - Flags: review?(kvijayan) → review+
Comment on attachment 8403031 [details] [diff] [review]
fixFloat32-r3.patch

Review of attachment 8403031 [details] [diff] [review]:
-----------------------------------------------------------------

Canceling review per comment 9.
Attachment #8403031 - Flags: review?(bhackett1024)
Attached patch addRegallocMachinery-r0.patch (obsolete) — Splinter Review
Attachment #8410004 - Flags: review?(jdemooij)
Attachment #8410006 - Flags: review?(bhackett1024)
Attached patch updateForD32-r0.patch (obsolete) — Splinter Review
Attached patch changeSettypes-r0.patch (obsolete) — Splinter Review
Attached patch fixFloat32-r4.patch (obsolete) — Splinter Review
ok, I think this is split up enough (sans the safepoints changes, which are going to be nuked when I rebase), does this look sufficently small?
Attachment #8410009 - Flags: feedback?(jdemooij)
Attachment #8410007 - Flags: review?(dtc-moz)
Comment on attachment 8410006 [details] [diff] [review]
updateBT-r0.patch

Review of attachment 8410006 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +3734,5 @@
>          if (!useHardFpABI()) {
>              // Move float32 from r0 to ReturnFloatReg.
> +            // singleOverlay() is needed until the rest of the engine can deal with it being
> +            // a float32 reg rather than a double reg.
> +            as_vxfer(r0, InvalidReg, ReturnFloat32Reg.singleOverlay(), CoreToFloat);

This change looks like it belongs in another patch.
Attachment #8410006 - Flags: review?(bhackett1024) → review+
Comment on attachment 8410003 [details] [diff] [review]
addRegallocMachinery-r0.patch

Feel free to bounce the review if you don't feel qualified.
Attachment #8410003 - Flags: review?(benj)
Attachment #8410008 - Flags: review?(jdemooij)
Comment on attachment 8410003 [details] [diff] [review]
addRegallocMachinery-r0.patch

Review of attachment 8410003 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

::: js/src/jit/LIR.cpp
@@ +264,5 @@
> +bool LAllocation::aliases(const LAllocation &other) const
> +{
> +    if (isFloatReg() && other.isFloatReg()) {
> +        return toFloatReg()->reg().aliases(other.toFloatReg()->reg());
> +    }

nit: unnecessary {}

::: js/src/jit/LIR.h
@@ +197,5 @@
>      const char *toString() const;
>  #else
>      const char *toString() const { return "???"; }
>  #endif
> +    bool aliases(const LAllocation &other) const ;

nit: bool aliases(const LAllocation &other) const;

@@ +498,5 @@
> +        return !isFloatReg() && !r.isFloat();
> +    }
> +    bool isCompatibleDef(const LDefinition &other) const {
> +        if (isFloatReg() && other.isFloatReg()) {
> +            return (type() == other.type());

nit: unnecessary parenthesis in the return statement, no braces needed for the test

::: js/src/jit/LiveRangeAllocator.h
@@ +479,5 @@
> +        return def_->isCompatibleReg(r);
> +    }
> +    bool isCompatibleVReg(const VirtualRegister &vr) const {
> +        return def_->isCompatibleDef(*vr.def_);
> +    }

These two functions seem unused in this patch, so they probably belong to another one.

::: js/src/jit/RegisterSets.h
@@ +108,5 @@
> +    // do the two registers hold the same type of data (e.g. both float32, both gpr)
> +    bool isCompatibleReg (const AnyRegister other) const {
> +        if (isFloat() && other.isFloat())
> +            return fpu().equiv(other.fpu());
> +        if ((!isFloat()) && (!other.isFloat()))

nit: unnecessary parenthesis in tests

::: js/src/jit/arm/Architecture-arm.h
@@ +343,5 @@
>          return doubleOverlay() == other.doubleOverlay();
>      }
>      uint32_t numAliased() const {
> +        return 1;
> +#ifdef EVERYONE_KNOWS_ABOUT_ALIASING

As discussed on IRC, there is a patch removing these ifdef later.

::: js/src/jit/x64/Architecture-x64.h
@@ +187,5 @@
>      }
>      bool operator == (const FloatRegister &other) const {
>          return other.code_ == code_;
>      }
> +    bool aliases(FloatRegister const &other) const {

I would be in favor with sharing this code between x86 and x64 (as it is exactly the same in Architecture-x86.h), although there doesn't seem to be an existing place for such shared code and a lot of code is already duplicated between the two files. If you also feel code should be shared, what about a new file, for instance /jit/shared/Architecture-x86-shared.h? (maybe in a follow-up bug)

@@ +206,5 @@
> +    }
> +    uint32_t numAlignedAliased() {
> +        return 1;
> +    }
> +    FloatRegister alignedAliased(uint32_t a) {

Not sure when it is used (not in this patch), but if I understand correctly the return value of numAlignedAliased and by analogy with numAliased, we could / should add:

JS_ASSERT(a == 0);

::: js/src/jit/x86/Architecture-x86.h
@@ +184,5 @@
> +    }
> +    uint32_t numAlignedAliased() {
> +        return 1;
> +    }
> +    FloatRegister alignedAliased(uint32_t a) {

JS_ASSERT(a == 0);
Attachment #8410003 - Flags: review?(benj) → review+
Comment on attachment 8410004 [details] [diff] [review]
updateLSRA-r0.patch

Review of attachment 8410004 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. LinearScan.cpp is in the patch multiple times, so the review tool doesn't work. Manual review comments below.

In splitBlockingIntervals:

             LiveInterval *it = *i;
-            active.removeAt(i);
+            i = active.removeAt(i);
             finishInterval(it);
-            break;

Instead of removing the break, can we do:

if (allocatedReg.numAliased() == 0)
    break;

That way on x86/x64 and ARM-with-GPR-regs, we can still break out of the loop in many cases.

In findBestFreeRegister:

+                if (freeUntilPos[aprevReg.code()] == CodePosition::MIN)
+                    useit = false;

Nit: add a "break;" after the useit = false. Same for the 2 or 3 cases below.
Attachment #8410004 - Flags: review?(jdemooij) → review+
Comment on attachment 8410008 [details] [diff] [review]
changeSettypes-r0.patch

Review of attachment 8410008 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/Architecture-arm.h
@@ +396,5 @@
>      static Code FromName(const char *name) {
>          return FloatRegisters::FromName(name);
>      }
> +    typedef FloatRegisters::settype settype;
> +    static uint32_t setsize(settype x) {

Nit: SetType, SetSize

@@ +397,5 @@
>          return FloatRegisters::FromName(name);
>      }
> +    typedef FloatRegisters::settype settype;
> +    static uint32_t setsize(settype x) {
> +        return mozilla::CountPopulation32(x & 0xffffffff) + mozilla::CountPopulation32(x >> 32);

Please add mozilla::CountPopulation64 -- that way we can use __builtin_popcountll (with GCC/Clang at least) and let the compiler optimize it as much as it wants.

::: js/src/jit/x64/Architecture-x64.h
@@ +238,5 @@
> +
> +TypedRegisterSet<FloatRegister> reduceSetForPush(const TypedRegisterSet<FloatRegister> &s);
> +uint32_t getSizeInBytes(const TypedRegisterSet<FloatRegister> &s);
> +uint32_t getPushSizeInBytes(const TypedRegisterSet<FloatRegister> &s);
> +uint32_t getRegisterDumpOffsetInBytes(FloatRegister reg);

Can you make these methods of FloatRegisterSet / FloatRegister? If needed we can declare them in the (shared) header file and have the definitions in platform-specific code (or use #ifdef).
Attachment #8410008 - Flags: review?(jdemooij)
Comment on attachment 8410009 [details] [diff] [review]
fixFloat32-r4.patch

Review of attachment 8410009 [details] [diff] [review]:
-----------------------------------------------------------------

Yup this size is fine I think.
Attachment #8410009 - Flags: feedback?(jdemooij) → feedback+
Comment on attachment 8410007 [details] [diff] [review]
updateForD32-r0.patch

The patch looks plausible but I just do not have the cycles to properly review this or to test the complete patch set given the communicated priorities. Could you re-assign this review or assign it back to me if you want this reviewed as a priority?
Attachment #8410007 - Flags: review?(dtc-moz) → feedback?(jdemooij)
Comment on attachment 8410007 [details] [diff] [review]
updateForD32-r0.patch

Nicolas, can you take this review? Douglas doesn't have time for it atm, see comment 28.
Attachment #8410007 - Flags: feedback?(jdemooij) → review?(nicolas.b.pierron)
Comment on attachment 8410007 [details] [diff] [review]
updateForD32-r0.patch

Review of attachment 8410007 [details] [diff] [review]:
-----------------------------------------------------------------

I'll review it by tomorrow.
Comment on attachment 8410003 [details] [diff] [review]
addRegallocMachinery-r0.patch

Review of attachment 8410003 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/LIR.h
@@ +499,5 @@
> +    }
> +    bool isCompatibleDef(const LDefinition &other) const {
> +        if (isFloatReg() && other.isFloatReg()) {
> +            return (type() == other.type());
> +        }

style-nit: remove curly braces too.
Comment on attachment 8410007 [details] [diff] [review]
updateForD32-r0.patch

Review of attachment 8410007 [details] [diff] [review]:
-----------------------------------------------------------------

Overlay of Scratch registers is fine, even if I would prefer to have these declare ahead of time.

Overlay of source/destination registers are not fine.  The register allocator is supposed to allocate either a Float32 or a Double, and these should be known before entering any of these convert functions.

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +51,2 @@
>               CoreToFloat);
> +    as_vcvt(dest, ScratchFloat32Reg.sintOverlay());

nit: MOZ_ASSERT(dest.isDouble());

@@ +55,5 @@
>  void
>  MacroAssemblerARM::convertInt32ToDouble(const Address &src, FloatRegister dest)
>  {
>      ma_vldr(Operand(src), ScratchDoubleReg);
>      as_vcvt(dest, VFPRegister(ScratchDoubleReg).sintOverlay());

nit: MOZ_ASSERT(dest.isDouble());

@@ +64,5 @@
>  {
>      // direct conversions aren't possible.
>      VFPRegister dest = VFPRegister(dest_);
> +    as_vxfer(src, InvalidReg, ScratchFloat32Reg.uintOverlay(), CoreToFloat);
> +    as_vcvt(dest, ScratchFloat32Reg.uintOverlay());

nit: same here.

@@ +72,5 @@
>  MacroAssemblerARM::convertUInt32ToFloat32(const Register &src, const FloatRegister &dest_)
>  {
>      // direct conversions aren't possible.
>      VFPRegister dest = VFPRegister(dest_);
> +    as_vxfer(src, InvalidReg,ScratchFloat32Reg.uintOverlay(), CoreToFloat);

stile-nit: add a space after "InvalidReg,"

@@ +73,5 @@
>  {
>      // direct conversions aren't possible.
>      VFPRegister dest = VFPRegister(dest_);
> +    as_vxfer(src, InvalidReg,ScratchFloat32Reg.uintOverlay(), CoreToFloat);
> +    as_vcvt(dest.singleOverlay(), ScratchFloat32Reg.uintOverlay());

by using dest.singleOverlay() here, this implies that the only register that we are allowed to manipulate are either aligned-single or casted-double registers.

Remove ".singleOverlay()" and replace it by adding the following assertion above:

  MOZ_ASSERT(dest.isSingle());

@@ +79,5 @@
>  
>  void MacroAssemblerARM::convertDoubleToFloat32(const FloatRegister &src, const FloatRegister &dest,
>                                                 Condition c)
>  {
>      as_vcvt(VFPRegister(dest).singleOverlay(), VFPRegister(src), false, c);

same as above.

@@ +178,3 @@
>      as_vxfer(src, InvalidReg, dest.sintOverlay(),
>               CoreToFloat);
> +    as_vcvt(dest.singleOverlay(), dest.sintOverlay());

Use the ScratchFloat32 here, and do not use dest.singleOverlay() (as mentionned above)
Attachment #8410007 - Flags: review?(nicolas.b.pierron) → review-
Comment on attachment 8410009 [details] [diff] [review]
fixFloat32-r4.patch

Review of attachment 8410009 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/Architecture-arm.h
@@ +247,5 @@
>      static Code FromName(const char *name);
>  
>      static const Code Invalid = invalid_freg;
> +    // little white lie.  OH GOD HOW AM I GOING TO SOLVE THIS?
> +    static const uint32_t Total = 48;

Will this be changed to 64 when we support ARM VFP3? 

This question is related to the SIMD register allocation. If Total remains to 48, we could increase it to 56 to deal with the SIMD NEON register aliasing as the same way for single and double floating-point registers. But if it will be 64, there is no room for SIMD NEON registers.
Comment on attachment 8410004 [details] [diff] [review]
updateLSRA-r0.patch

Review of attachment 8410004 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/LinearScan.cpp
@@ +1174,5 @@
>      for (size_t i = 0; i < AnyRegister::Total; i++) {
>          if (nextUsePos[i] == CodePosition::MIN)
>              continue;
> +        AnyRegister cur = AnyRegister::FromCode(i);
> +        if (!vregs[current->vreg()].isCompatibleReg(cur))

How about introducing a register kind for each virtual register and allocating the virtual registers by kind? When we allocate a virtual register with a DOUBLE_REGISTER kind, we do not need to iterate all the hardware single-precision registers.
Attached patch floatizeSafepoints-r0.patch (obsolete) — Splinter Review
Attachment #8423475 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8423475 [details] [diff] [review]
floatizeSafepoints-r0.patch

Review of attachment 8423475 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/Safepoints.cpp
@@ +53,5 @@
> +WriteFloatRegisterMask(CompactBufferWriter &stream, uint64_t bits)
> +{
> +    if (sizeof(FloatRegisters::settype) == 1) {
> +        stream.writeByte(bits);
> +    } else if (sizeof(FloatRegisters::settype) == 4) {

nit: <= 4

@@ +57,5 @@
> +    } else if (sizeof(FloatRegisters::settype) == 4) {
> +        stream.writeUnsigned(bits);
> +    } else {
> +        stream.writeUnsigned(bits & 0xffffffff);
> +        stream.writeUnsigned(bits >> 32);

nit: static_assert(sizeof(FloatRegisters::settype) == 8)

@@ +74,5 @@
> +{
> +    if (sizeof(FloatRegisters::settype) == 4)
> +        return stream.readUnsigned();
> +    uint64_t ret = stream.readUnsigned();
> +    ret |= uint64_t(stream.readUnsigned()) << 32;

I would highly prefer to have some symetry with the WriteFloatRegisterMask function, because this is wrong for:
  sizeof(FloatRegisters::settype) == 1 (write 1 byte, read 2 unsigned)
Attachment #8423475 - Flags: review?(nicolas.b.pierron)
Attached patch changeSettypes-r1.patch (obsolete) — Splinter Review
Attachment #8410008 - Attachment is obsolete: true
Attachment #8430951 - Flags: review?(jdemooij)
Attached patch fixFloat32-r5.patch (obsolete) — Splinter Review
Attachment #8403031 - Attachment is obsolete: true
Attachment #8410009 - Attachment is obsolete: true
Attachment #8430952 - Flags: review?(jdemooij)
Attachment #8423475 - Attachment is obsolete: true
Attachment #8430954 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8430951 [details] [diff] [review]
changeSettypes-r1.patch

Review of attachment 8430951 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/Architecture-arm.h
@@ +143,5 @@
>  
>      static const uint32_t AllocatableMask = AllMask & ~NonAllocatableMask;
> +    typedef uint32_t SetType;
> +    static uint32_t SetSize(SetType x) {
> +        return mozilla::CountPopulation32(x);

Nit: static_assert(sizeof(SetType) == 4, "SetType must be 32 bits");

@@ +396,5 @@
>      static Code FromName(const char *name) {
>          return FloatRegisters::FromName(name);
>      }
> +    typedef FloatRegisters::SetType SetType;
> +    static uint32_t SetSize(SetType x) {

Nit: static_assert(sizeof(SetType) == 8, "SetType must be 64 bits");

::: js/src/jit/x64/Architecture-x64.h
@@ +171,5 @@
>      typedef FloatRegisters Codes;
>      typedef Codes::Code Code;
> +    typedef Codes::SetType SetType;
> +    static uint32_t SetSize(SetType x) {
> +        return mozilla::CountPopulation32(x);

Add the same static asserts here.

::: js/src/jit/x86/Architecture-x86.h
@@ +148,5 @@
>      typedef FloatRegisters Codes;
>      typedef Codes::Code Code;
> +    typedef Codes::SetType SetType;
> +    static uint32_t SetSize(SetType x) {
> +        return mozilla::CountPopulation32(x);

Add the static assert here too.

::: js/src/jit/x86/Assembler-x86.cpp
@@ +103,5 @@
> +
> +FloatRegisterSet
> +jit::reduceSetForPush(const FloatRegisterSet &s)
> +{
> +    return s;

Functions that are not class methods should have capitalized names, so ReduceSetForPush. But as I said before, I think these functions should be methods on FloatRegisterSet etc, so that you can do:

reduced = set.reduceForPush();
Attachment #8430951 - Flags: review?(jdemooij)
Comment on attachment 8430952 [details] [diff] [review]
fixFloat32-r5.patch

Review of attachment 8430952 [details] [diff] [review]:
-----------------------------------------------------------------

I really want to see this fixed to unblock Intel's SIMD work, but I still have some questions and many style nits.

Please check out https://wiki.mozilla.org/JavaScript:SpiderMonkey:Coding_Style, there aren't that many rules.

::: js/src/jit/CodeGenerator.cpp
@@ +1441,5 @@
>  #endif
>            case LDefinition::GENERAL: moveType = MoveOp::GENERAL; break;
>            case LDefinition::INT32:   moveType = MoveOp::INT32;   break;
> +          case LDefinition::FLOAT32: moveType = MoveOp::FLOAT32;
> +#ifdef JS_CPU_ARM

JS_CODEGEN_ARM to make it work with the ARM simulator; here and elsewhere in the patch.

@@ +1447,5 @@
> +                tmpfrom = LFloatReg(from->toFloatReg()->reg().singleOverlay());
> +                from = &tmpfrom;
> +            }
> +            if (to->isFloatReg()) {
> +                tmpto =  LFloatReg(to->toFloatReg()->reg().singleOverlay());

Nit: extra space after the =.

Shouldn't this be done somewhere else? It feels strange to first add the "wrong" registers to a MoveGroup and have them fixed up during codegen.

::: js/src/jit/IonFrames.cpp
@@ +344,5 @@
> +    for (FloatRegisterBackwardIterator iter(fregs); iter.more(); iter++) {
> +        floatSpill -= (*iter).size();
> +        for (int a = 0; a < (*iter).numAlignedAliased(); a++) {
> +            // just say that every aligned aliased register starts here.
> +            machine.setRegisterLocation((*iter).alignedAliased(a), (double*)floatSpill);

I don't understand this and the comment makes it sound like a hack. Can you expand the comment?

::: js/src/jit/RegisterAllocator.cpp
@@ +270,2 @@
>              safepoint->addLiveRegister(reg);
> +        }

This if shouldn't have {}

::: js/src/jit/RegisterSets.h
@@ +336,5 @@
>  template <typename T>
>  class TypedRegisterSet
>  {
> +    typedef  typename T::settype settype;
> +    settype bits_;

The other patch uses SetType instead of settype? Also, there's an extra space between typedef and typename.

@@ +384,3 @@
>      }
>      void add(T reg) {
> +        // make sure we don't add two overlapping registers.

Nit: s/make/Make

@@ +419,4 @@
>      }
> +    void takeAllUnchecked(T reg) {
> +        for (int a = 0; a < reg.numAliased(); a++)
> +            bits_ &= ~(1ll<<reg.aliased(a).code());

Nit: add spacing before and after <<. Also, this 111 appears a few times, could you add a constant and use that instead?

@@ +471,5 @@
>          return result;
>      }
>      T getFirst() const {
>          JS_ASSERT(!empty());
> +        return T::FromCode(mozilla::CountTrailingZeroes64(bits_));

Is TypedRegisterSet now always 64-bit, even on x86/x64? Shouldn't that be ARM-only?

@@ +640,5 @@
>              fpu_.take(reg.fpu());
>          else
>              gpr_.take(reg.gpr());
>      }
> +    void takeAllUnchecked(const AnyRegister &reg) {

Nit: I think mentioning aliased in the name is clearer, so something like takeAllAliasedUnchecked

::: js/src/jit/StupidAllocator.cpp
@@ +82,2 @@
>              registers[registerCount++].reg = AnyRegister(remainingRegisters.takeFloat());
> +        }

These loops shouldn't have {}

@@ +135,4 @@
>          } else {
>              registers[existing].age = ins->id();
>              return registers[existing].reg;
>          }

The outer if's body spans multiple lines, so this one does need {}

@@ +351,2 @@
>                      evictRegister(ins, existing);
> +                }

Nit: no {}, and the 2 comments should be capitalized while you're here.

::: js/src/jit/arm/Assembler-arm.cpp
@@ +1365,5 @@
>  Assembler::as_alu(Register dest, Register src1, Operand2 op2,
>                    ALUOp op, SetCond_ sc, Condition c, Instruction *instdest)
>  {
> +    //if (dest == r6 && op2.isO2Reg() && op2.toOp2Reg().checkRM(lr) && op == op_mov)
> +    //        dbg_break();

Remove.

::: js/src/jit/arm/Trampoline-arm.cpp
@@ +346,5 @@
>  JitRuntime::generateInvalidator(JSContext *cx)
>  {
>      // See large comment in x86's JitRuntime::generateInvalidator.
>      MacroAssembler masm(cx);
> +    //masm.as_bkp(t);

Remove.
Attachment #8430952 - Flags: review?(jdemooij)
Comment on attachment 8430954 [details] [diff] [review]
floatizeSafepoints-r1.patch

Review of attachment 8430954 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/Safepoints.cpp
@@ +61,5 @@
> +WriteFloatRegisterMask(CompactBufferWriter &stream, uint64_t bits)
> +{
> +    if (sizeof(FloatRegisters::settype) == 1) {
> +        stream.writeByte(bits);
> +    } else if (sizeof(FloatRegisters::settype) == 4) {

nit: <= 4

@@ +64,5 @@
> +        stream.writeByte(bits);
> +    } else if (sizeof(FloatRegisters::settype) == 4) {
> +        stream.writeUnsigned(bits);
> +    } else {
> +        JS_ASSERT(sizeof(FloatRegisters::settype) == 8);

style-nit: Use MOZ_ASSERT instead of JS_ASSERT.

@@ +77,5 @@
> +    if (sizeof(FloatRegisters::settype) == 1)
> +        return stream.readByte();
> +    if (sizeof(FloatRegisters::settype) <= 4)
> +        return stream.readUnsigned();
> +    JS_ASSERT(sizeof(FloatRegisters::settype) == 8);

stye-nit: same here.
Attachment #8430954 - Flags: review?(nicolas.b.pierron) → review+
Attachment #8410007 - Attachment is obsolete: true
Attachment #8432640 - Flags: review?(nicolas.b.pierron)
Attachment #8430952 - Attachment is obsolete: true
Attachment #8433193 - Flags: review?(jdemooij)
Attached patch changeSettypes-r2.patch (obsolete) — Splinter Review
Attachment #8430951 - Attachment is obsolete: true
Attachment #8433197 - Flags: review?(jdemooij)
Attached patch splitFloatReg-r1.patch (obsolete) — Splinter Review
Attachment #8433199 - Flags: review?(jdemooij)
Comment on attachment 8433197 [details] [diff] [review]
changeSettypes-r2.patch

Review of attachment 8433197 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/Architecture-arm.h
@@ +415,3 @@
>  uint32_t getSizeInBytes(const TypedRegisterSet<FloatRegister> &s);
>  uint32_t getPushSizeInBytes(const TypedRegisterSet<FloatRegister> &s);
>  uint32_t getRegisterDumpOffsetInBytes(FloatRegister reg);

These are not defined in this patch, but the x86/x64 ones are?

::: js/src/jit/x64/Architecture-x64.h
@@ +145,5 @@
>      static const uint32_t Allocatable = 15;
>  
>      static const uint32_t AllMask = (1 << Total) - 1;
> +    static const uint32_t AllDoubleMask = AllMask;
> +    static const uint32_t VolatileMask = 

Nit: trailing whitespace

::: js/src/jit/x64/Assembler-x64.cpp
@@ +284,5 @@
> +uint32_t
> +jit::getRegisterDumpOffsetInBytes(FloatRegister reg)
> +{
> +    return reg.code() * sizeof(double);
> +}

Again, all of these 4 should be members of the appropriate classes. Just add a 

uint32_t sizeInBytes();

to TypedRegisterSet and implement the body in Architecture-*.cpp
Attachment #8433197 - Flags: review?(jdemooij)
Comment on attachment 8433199 [details] [diff] [review]
splitFloatReg-r1.patch

Review of attachment 8433199 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/Snapshots.h
@@ +88,5 @@
>  
>      // Additional information to recover the content of the allocation.
> +    struct FloatRegisterBits {
> +        uint32_t data;
> +        bool operator ==( const FloatRegisterBits &other) const {

Nit: remove extra spaces before == and after "(": operator==(const Float...

::: js/src/jit/arm/Architecture-arm.h
@@ +265,5 @@
> +      : kind(Double), code_(Code(0)), _isInvalid(true), _isMissing(false)
> +    { }
> +
> +    MOZ_CONSTEXPR VFPRegister(bool b, bool c, bool d, bool e)
> +      : kind(Double), code_(Code(0)),_isInvalid(false), _isMissing(b)

Er, what's the point of these unused boolean arguments?

@@ +395,5 @@
> +class TypedRegisterSet;
> +TypedRegisterSet<FloatRegister> reduceSetForPush(const TypedRegisterSet<FloatRegister> &s);
> +uint32_t getSizeInBytes(const TypedRegisterSet<FloatRegister> &s);
> +uint32_t getPushSizeInBytes(const TypedRegisterSet<FloatRegister> &s);
> +uint32_t getRegisterDumpOffsetInBytes(FloatRegister reg);

This was also in the other patch?

::: js/src/jit/arm/Assembler-arm.cpp
@@ +1214,5 @@
>  {
>      JS_ASSERT(!_isInvalid);
>      if (kind == Double) {
>          // There are no corresponding float registers for d16-d31
> +        ASSERT(code_ < 16);

All these ASSERT's should be converted to JS_ASSERT or MOZ_ASSERT.
Attachment #8433199 - Flags: review?(jdemooij)
Comment on attachment 8433193 [details] [diff] [review]
fixFloat32-r6.patch

Review of attachment 8433193 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry but there are still too many (style) issues, these really need to be fixed before I look closer.

::: js/src/jit/AsmJS.cpp
@@ +5983,5 @@
>  #if defined(JS_CODEGEN_ARM)
>  // The ARM system ABI also includes d15 in the non volatile float registers.
>  static const RegisterSet NonVolatileRegs =
>      RegisterSet(GeneralRegisterSet(Registers::NonVolatileMask),
> +                    FloatRegisterSet(FloatRegisters::NonVolatileDoubleMask | (1ull << FloatRegisters::d15 + 32)));

What does the 32 mean here? Could this be some named constant?

::: js/src/jit/CodeGenerator.cpp
@@ +1441,5 @@
>  #endif
>            case LDefinition::GENERAL: moveType = MoveOp::GENERAL; break;
>            case LDefinition::INT32:   moveType = MoveOp::INT32;   break;
> +          case LDefinition::FLOAT32: moveType = MoveOp::FLOAT32;
> +#if 0 && defined(JS_CODEGEN_ARM)

Remove #if 0 code.

::: js/src/jit/IonFrames.cpp
@@ +1695,5 @@
>      MachineState machine;
>  
>      for (unsigned i = 0; i < Registers::Total; i++)
>          machine.setRegisterLocation(Register::FromCode(i), &regs[i]);
> +#ifdef JS_CPU_ARM

JS_CODEGEN_ARM

::: js/src/jit/MoveResolver.h
@@ +207,5 @@
>      // Moves that are definitely unblocked (constants to registers). These are
>      // emitted last.
>      js::Vector<MoveOp, 16, SystemAllocPolicy> orderedMoves_;
>      bool hasCycles_;
> +    int numCycles_;

Unused.

::: js/src/jit/RegisterAllocator.h
@@ +314,5 @@
>  #elif defined(JS_CODEGEN_ARM)
>          if (mir->compilingAsmJS()) {
>              allRegisters_.take(AnyRegister(HeapReg));
>              allRegisters_.take(AnyRegister(GlobalReg));
> +            // need to remove both NANReg, and its aliases.

Nit: s/need/Need

::: js/src/jit/arm/Architecture-arm.h
@@ +255,5 @@
>      static Code FromName(const char *name);
>  
>      static const Code Invalid = invalid_freg;
> +    // little white lie.  OH GOD HOW AM I GOING TO SOLVE THIS?
> +    static const uint32_t Total = 48;

We really can't have such comments.

::: js/src/jit/arm/Assembler-arm.h
@@ +1665,5 @@
> +        int len = high - low + 1;
> +        // vdtm can only transfer 16 registers at once.  If we need to transfer more,
> +        // then either hoops are necessary, or we need to be updating the register
> +        JS_ASSERT_IF(len > 16, dtmUpdate == WriteBack);
> +        

Trailing whitespace, and the previous comment should have a period.

@@ +1672,5 @@
> +        while (len > 0) {
> +            // limit the instruction to 16 registers
> +            int curLen = Min(len, 16);
> +            // if it is a store, we want to start at the high end and move down
> +            // e.g. vpush d16-d31; vpush d0-d15

Full sentence here too, and everywhere else.
Attachment #8433193 - Flags: review?(jdemooij) → review-
Comment on attachment 8432640 [details] [diff] [review]
updateForD32-r1.patch

Review of attachment 8432640 [details] [diff] [review]:
-----------------------------------------------------------------

Fix the nits and r=me.

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +51,1 @@
>      VFPRegister dest = VFPRegister(dest_);

nit: Remove this double copy and remove the underscore from the argument.

@@ +65,5 @@
>  void
>  MacroAssemblerARM::convertUInt32ToDouble(const Register &src, const FloatRegister &dest_)
>  {
>      // direct conversions aren't possible.
>      VFPRegister dest = VFPRegister(dest_);

nit: same here.

@@ +75,5 @@
>  void
>  MacroAssemblerARM::convertUInt32ToFloat32(const Register &src, const FloatRegister &dest_)
>  {
>      // direct conversions aren't possible.
>      VFPRegister dest = VFPRegister(dest_);

nit: and here.

@@ +89,1 @@
>      as_vcvt(VFPRegister(dest).singleOverlay(), VFPRegister(src), false, c);

nit:
  MOZ_ASSERT(src.isDouble());
  as_vcvt(dest, src, false, c);

@@ +99,5 @@
>  void
>  MacroAssemblerARM::branchTruncateDouble(const FloatRegister &src, const Register &dest, Label *fail)
>  {
> +    ma_vcvt_F64_I32(src, ScratchFloat32Reg.sintOverlay());
> +    ma_vxfer(ScratchDoubleReg.sintOverlay(), dest);

hum …? Are we using the same registers?
Can we add a temporary variable for such cases, such as these kind of problem never appear in the future?

  FloatRegister sintTemp = ScratchFloat32Reg.sintOverlay();
  ma_vcvt_F64_I32(src, sintTemp);
  ma_vxfer(sintTemp, dest);

@@ +166,4 @@
>  
>  void
>  MacroAssemblerARM::convertFloat32ToDouble(const FloatRegister &src, const FloatRegister &dest) {
>      as_vcvt(VFPRegister(dest), VFPRegister(src).singleOverlay());

nit:
    MOZ_ASSERT(src.isSingle());
    MOZ_ASSERT(dest.isDouble());
    as_vcvt(dest, src);

@@ +1562,5 @@
>  MacroAssemblerARM::ma_vcvt_F64_I32(FloatRegister src, FloatRegister dest, Condition cc)
>  {
> +    JS_ASSERT(src.isDouble());
> +    JS_ASSERT(dest.isSInt());
> +    as_vcvt(dest, VFPRegister(src), false, cc);

nit: Remove this VFPRegister constructor call.

Note that without the previous assertion, this code implies that we might accidently promote any single to a double register.

@@ +1569,5 @@
>  MacroAssemblerARM::ma_vcvt_F64_U32(FloatRegister src, FloatRegister dest, Condition cc)
>  {
> +    JS_ASSERT(src.isDouble());
> +    JS_ASSERT(dest.isUInt());
> +    as_vcvt(dest.uintOverlay(), src, false, cc);

nit: remove the .uintOverlay().

@@ +4088,1 @@
>      ma_vxfer(VFPRegister(ScratchDoubleReg).uintOverlay(), output);

nit: Remove useless VFPRegister call, and do the same below.
Attachment #8432640 - Flags: review?(nicolas.b.pierron) → review+
Attachment #8433197 - Attachment is obsolete: true
Attachment #8435738 - Flags: review?(jdemooij)
Attachment #8400740 - Attachment is obsolete: true
Attachment #8433199 - Attachment is obsolete: true
Attachment #8435740 - Flags: review?(jdemooij)
Attached patch fixFloat32-r8.patch (obsolete) — Splinter Review
Attachment #8436094 - Flags: review?(jdemooij)
Comment on attachment 8435738 [details] [diff] [review]
changeSettypes-r3.patch

Review of attachment 8435738 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/Architecture-arm.cpp
@@ +313,5 @@
> +VFPRegister::GetSizeInBytes(const FloatRegisterSet &s)
> +{
> +    uint64_t bits = s.bits();
> +    uint32_t ret = mozilla::CountPopulation32(bits&0xffffffff) * sizeof(float);
> +    ret +=  mozilla::CountPopulation32(bits >> 32) * sizeof(double);

Nit: add spaces around &, remove extra space after +=. Also below.

@@ +332,5 @@
> +    if (isSingle())
> +        return id() * sizeof(float);
> +    if (isDouble())
> +        return id() * sizeof(double);
> +    MOZ_ASSUME_UNREACHABLE();

Nit:

MOZ_ASSERT(isDouble());
return id() * sizeof(double);

::: js/src/jit/x64/Architecture-x64.h
@@ +243,5 @@
>  static bool hasMultiAlias() {
>      return false;
>  }
>  
> +

Nit: remove this extra line.

::: js/src/jit/x86/Architecture-x86.h
@@ +222,5 @@
>      return false;
>  }
>  
> +
> +

Nit: remove extra whitespace added here.
Attachment #8435738 - Flags: review?(jdemooij) → review+
Comment on attachment 8435740 [details] [diff] [review]
splitFloatReg-r2.patch

Review of attachment 8435740 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/Architecture-arm.h
@@ +258,5 @@
> +    bool _isMissing : 1;
> +
> +  public:
> +    MOZ_CONSTEXPR VFPRegister(uint32_t  r, RegType k)
> +      : kind(k), code_ (Code(r)), _isInvalid(false), _isMissing(false)

Nit: remove extra space between "uint32_t" and "r", and after code_.

@@ +268,5 @@
> +    MOZ_CONSTEXPR VFPRegister(RegType k, uint32_t id, bool invalid, bool missing) :
> +        kind(k), code_(Code(id)), _isInvalid(invalid), _isMissing(missing) {
> +    }
> +
> +    MOZ_CONSTEXPR VFPRegister(Code id)

Make this constructor |explicit|.

@@ +303,5 @@
> +
> +      private:
> +        friend VFPRegIndexSplit js::jit::VFPRegister::encode();
> +
> +        VFPRegIndexSplit (uint32_t block_, uint32_t bit_)

Nit: remove space after VFPRegIndexSplit.

@@ +344,5 @@
> +        return doubleOverlay() == other.doubleOverlay();
> +    }
> +    uint32_t numAliased() const {
> +        if (isDouble()) {
> +            if (code_ < 16)

Is there a constant we can use instead of hardcoding 16? Can we add one?

@@ +377,5 @@
> +            return *this;
> +        if (isDouble()) {
> +            JS_ASSERT(code_ < 16);
> +            JS_ASSERT(a <= 1);
> +            return singleOverlay(a-1);

Nit: a - 1

@@ +381,5 @@
> +            return singleOverlay(a-1);
> +        }
> +        JS_ASSERT(a == 1);
> +        JS_ASSERT((code_ & 1) == 0);
> +        return doubleOverlay(a-1);

And here.

::: js/src/jit/arm/Assembler-arm.cpp
@@ +189,5 @@
>      JS_ASSERT(!_isInvalid);
>  
>      switch (kind) {
>        case Double:
> +        return VFPRegIndexSplit(code_ &0xf , code_ >> 4);

code_ & 0xf

::: js/src/jit/x64/Architecture-x64.h
@@ +184,5 @@
> +    }
> +    bool operator != (const FloatRegister &other) const {
> +        return other.code_ != code_;
> +    }
> +    bool operator == (const FloatRegister &other) const {

Nit: operator!=(, operator==(, also pass FloatRegister by value instead of reference: "FloatRegister other"

@@ +187,5 @@
> +    }
> +    bool operator == (const FloatRegister &other) const {
> +        return other.code_ == code_;
> +    }
> +    bool aliases(FloatRegister const &other) const;

Same here.

::: js/src/jit/x86/Architecture-x86.h
@@ +159,5 @@
> +    }
> +    bool volatile_() const {
> +        return !!((1 << code()) & FloatRegisters::VolatileMask);
> +    }
> +    bool operator != (const FloatRegister &other) const {

bool operator!=(FloatRegister other) const {, same for operator==

@@ +165,5 @@
> +    }
> +    bool operator == (const FloatRegister &other) const {
> +        return other.code_ == code_;
> +    }
> +    bool aliases(FloatRegister const &other) const;

FloatRegister other
Attachment #8435740 - Flags: review?(jdemooij) → review+
Comment on attachment 8436094 [details] [diff] [review]
fixFloat32-r8.patch

Review of attachment 8436094 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/Architecture-arm.cpp
@@ +338,5 @@
>  }
>  
> +uint32_t
> +FloatRegisters::ActualTotalPhys()
> +{

You might want to put this function in the header file so that it can be inlined in other compilation units (assuming it's used there).

@@ +349,3 @@
>  } // namespace jit
>  } // namespace js
> +

Nit: remove \n here

::: js/src/jit/arm/MacroAssembler-arm.h
@@ +442,3 @@
>              do {
>                  offset += delta;
> +                if ((*iter).isDouble())

Maybe overload operator-> as well, so that you can do iter->isDouble() and iter->code().

@@ +446,2 @@
>                  transferFloatReg(*iter);
> +            } while ((++iter).more() && (*iter).code() == (reg += sign));

Add this right before this line:

++iter;
reg += sign;

So that the condition is side-effect free and easier to read.
Attachment #8436094 - Flags: review?(jdemooij) → review+
This patch is mostly the same as the previous patch, but it works around an issue I found on win64, where FloatRegister FloatRegister::aliased(int a) was being passed the wrong number of arguments.  I can also split the patches up to the last places where every piece of code is modified, but it will probably be easier if they're all changed at the same time.
Attachment #8410003 - Attachment is obsolete: true
Attachment #8444981 - Flags: review?(benj)
Comment on attachment 8444981 [details] [diff] [review]
addRegallocMachinery-r1.patch

Review of attachment 8444981 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry, I need to understand what's going on more in depth, so clearing the review flag for now. I feel like it would be a generally good idea to explain in a comment (probably in RegisterSets.h or Registers.h) what's the purpose of all these new functions: aliases, numAliased, aliased, equiv (equivalency is a very generic concept), numAlignedAliases (what's the difference with numAliased), alignedAliases (ditto with the non aligned version). Having all these functions defined without seeing the uses makes it kind of hard to review :/

Does the BacktrackingAllocator also need changes?

For the records: I just asked you on IRC why we have these functions setting an in-out param rather than returning a value. Marty told me it's because of a bug on win64 (yay, MSVC?), so Marty, feel free to ignore my comments about that, and add a comment to all these functions explaining it's due to a compiler bug (and maybe open a follow up bug for fixing that in the future?).

::: js/src/jit/LiveRangeAllocator.h
@@ +474,5 @@
>      }
>      bool isFloatReg() const {
>          return def_->isFloatReg();
>      }
> +    bool isCompatibleReg(const AnyRegister &r) const {

this appears unused in this patch

@@ +477,5 @@
>      }
> +    bool isCompatibleReg(const AnyRegister &r) const {
> +        return def_->isCompatibleReg(r);
> +    }
> +    bool isCompatibleVReg(const VirtualRegister &vr) const {

ditto

::: js/src/jit/RegisterSets.h
@@ +65,5 @@
>      }
>      bool volatile_() const {
>          return isFloat() ? fpu().volatile_() : gpr().volatile_();
>      }
> +    AnyRegister aliased(unsigned int a) const {

nit: unsigned int => uint32_t + could you give another name to "a" please

::: js/src/jit/Registers.h
@@ +61,5 @@
>      }
>      uint32_t numAliased() const {
>          return 1;
>      }
> +    void aliased(int a, Register *ret) const {

Why not a return value for this function?
+ we don't usually use int, can you use a stdint variant?

::: js/src/jit/arm/Architecture-arm.h
@@ +345,5 @@
>      }
>      static const int NumAliasedDoubles = 16;
>      uint32_t numAliased() const {
> +        return 1;
> +#ifdef EVERYONE_KNOWS_ABOUT_ALIASING

unreachable code

@@ +357,2 @@
>      }
> +    void aliased(uint32_t a, VFPRegister *ret) {

nit: what about a return value here too? + another name for "a"

@@ +375,5 @@
>                  return 2;
>              return 1;
>          }
>          // s1 has 0 other aligned aliases
>          // s0 has 1 other aligned aliases

nit: 1 other aligned alias?

@@ +376,5 @@
>              return 1;
>          }
>          // s1 has 0 other aligned aliases
>          // s0 has 1 other aligned aliases
>          return 2 - (code_ & 1);

correct me if i'm wrong, but code_&1 will be 0 or 1, so the return value will be 2 or 1, which doesn't seem like what's suggested by the comment right above.

@@ +378,5 @@
>          // s1 has 0 other aligned aliases
>          // s0 has 1 other aligned aliases
>          return 2 - (code_ & 1);
>      }
> +    void alignedAliased(uint32_t a, VFPRegister *ret) {

ditto for a's name

::: js/src/jit/arm/Simulator-arm.cpp
@@ +460,5 @@
>      if (getenv("ARM_SIM_ICACHE_CHECKS"))
>          Simulator::ICacheCheckingEnabled = true;
>  
>      char *stopAtStr = getenv("ARM_SIM_STOP_AT");
> +    fprintf(stderr, "stopAtStr is: '%s'\n", stopAtStr);

nit: debugging code

::: js/src/jit/x64/Architecture-x64.h
@@ +187,5 @@
>      }
>      bool operator ==(FloatRegister other) const {
>          return other.code_ == code_;
>      }
> +    bool aliases(FloatRegister const &other) const {

all remarks to Architecture-x86.h mostly apply here too

@@ +206,5 @@
> +    }
> +    uint32_t numAlignedAliased() {
> +        return 1;
> +    }
> +    void alignedAliased(uint32_t a, FloatRegister *ret) {

nit: JS_ASSERT(a == 0); // as in Architecture-x86.h?

::: js/src/jit/x86/Architecture-x86.h
@@ +165,5 @@
>      }
>      bool operator == (const FloatRegister &other) const {
>          return other.code_ == code_;
>      }
> +    bool aliases(FloatRegister const &other) const {

nit: const FloatRegister &other

@@ +171,5 @@
> +    }
> +    uint32_t numAliased() {
> +        return 1;
> +    }
> +    void aliased(uint32_t a, FloatRegister *ret) {

could you please give a more descriptive name to "a", please?

@@ +175,5 @@
> +    void aliased(uint32_t a, FloatRegister *ret) {
> +        JS_ASSERT(a == 0);
> +        *ret = *this;
> +    }
> +    bool equiv(const FloatRegister &other) const {

maybe add a comment here explaining what this function does and why it's ok to always return true?

@@ +184,5 @@
> +    }
> +    uint32_t numAlignedAliased() {
> +        return 1;
> +    }
> +    void alignedAliased(uint32_t a, FloatRegister *ret) {

could you please give a more descriptive name to "a", please?
Attachment #8444981 - Flags: review?(benj)
Attachment #8445019 - Flags: review?(benj)
Comment on attachment 8445019 [details] [diff] [review]
addRegallocMachinery-r2.patch

Review of attachment 8445019 [details] [diff] [review]:
-----------------------------------------------------------------

Much nicer, thanks! I'd really like to see more comments above all the newly introduced functions (even single line comments), up to you now.

::: js/src/jit/arm/Architecture-arm.h
@@ +375,5 @@
>                  return 2;
>              return 1;
>          }
> +        // s1 has 0 other aligned aliases, 1 total.
> +        // s0 has 1 other aligned aliase, 2 total.

Thanks for completing this comment :)

@@ +380,3 @@
>          return 2 - (code_ & 1);
>      }
> +    void alignedAliased(uint32_t aliasIdx, VFPRegister *ret) {

Could you add a comment (ascii art FTW) showing examples of aligned aliases? Something along
// s0 is aligned with d0, but s1 isn't. Conversely, d0 is aligned with s0, but isn't aligned with s1.

@@ +387,3 @@
>          if (isDouble()) {
>              JS_ASSERT(code_ < NumAliasedDoubles);
> +            JS_ASSERT(aliasIdx <= 1);

nit: at this point, you might even just assert aliasIdx == 1, and thus hoist this assert above the isDouble() and remove the one below.

::: js/src/jit/x64/Architecture-x64.h
@@ +187,5 @@
>      }
>      bool operator ==(FloatRegister other) const {
>          return other.code_ == code_;
>      }
> +    bool aliases(FloatRegister const &other) const {

nit: you can also remove the const& here for the "other" argument here

@@ +190,5 @@
>      }
> +    bool aliases(FloatRegister const &other) const {
> +        return other.code_ == code_;
> +    }
> +    uint32_t numAliased() {

nit: make it const

@@ +200,5 @@
> +    }
> +    // This function mostly exists for the ARM backend.  It is to ensure that two
> +    // floating point registers are equivalent.  e.g. S0 is not equivalent to D16.
> +    // Since all floating point registers on x86 and x64 are equivalent, it is
> +    // reasonable for this function to do the same.

Sorry but still not sure what kind of equivalency you're talking about: the fact that S0 is single and D16 is double? the fact that they don't alias each other?

@@ +204,5 @@
> +    // reasonable for this function to do the same.
> +    bool equiv(FloatRegister other) const {
> +        return true;
> +    }
> +    uint32_t size() {

nit: make it const

@@ +207,5 @@
> +    }
> +    uint32_t size() {
> +        return sizeof(double);
> +    }
> +    uint32_t numAlignedAliased() {

nit: make it const

::: js/src/jit/x86/Architecture-x86.h
@@ +168,5 @@
>      }
> +    bool aliases(FloatRegister other) const {
> +        return other.code_ == code_;
> +    }
> +    uint32_t numAliased() {

nit: make it const

@@ +178,5 @@
> +    }
> +    // This function mostly exists for the ARM backend.  It is to ensure that two
> +    // floating point registers are equivalent.  e.g. S0 is not equivalent to D16.
> +    // Since all floating point registers on x86 and x64 are equivalent, it is
> +    // reasonable for this function to do the same.

don't forget to update the comment here too

@@ +182,5 @@
> +    // reasonable for this function to do the same.
> +    bool equiv(FloatRegister other) const {
> +        return true;
> +    }
> +    uint32_t size() {

nit: make it const

@@ +185,5 @@
> +    }
> +    uint32_t size() {
> +        return sizeof(double);
> +    }
> +    uint32_t numAlignedAliased() {

nit: make it const
Attachment #8445019 - Flags: review?(benj) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #60)
> ::: js/src/jit/arm/Architecture-arm.h
> @@ +375,5 @@
> >                  return 2;
> >              return 1;
> >          }
> > +        // s1 has 0 other aligned aliases, 1 total.
> > +        // s0 has 1 other aligned aliase, 2 total.
> 
> Thanks for completing this comment :)

huh, thanks for having completed this comment.
Depends on: 1030599
Depends on: 1030605
Attached patch fixMoveResolver-r0.patch (obsolete) — Splinter Review
WIP, pretty thoroughly tested, just uploading so people can see, and so I can review it more thoroughly.
Attachment #8446432 - Attachment is obsolete: true
Comment on attachment 8447005 [details] [diff] [review]
fixMoveResolver-r1.patch

Review of attachment 8447005 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsapi-tests/testJitMoveEmitterCycles.cpp
@@ +1,1 @@
> +/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*-

This file is mostly auto-generated.  I can pretty easily change the formatting in any test case.
Attachment #8447005 - Flags: review?(jdemooij)
Comment on attachment 8447005 [details] [diff] [review]
fixMoveResolver-r1.patch

Review of attachment 8447005 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for adding the test! I didn't look closely at the algorithm; let's get these nits fixed first.

::: js/src/jit/MoveResolver.cpp
@@ +9,1 @@
>  using namespace js;

Nit: blank line before this line.

::: js/src/jit/arm/MoveEmitter-arm.cpp
@@ +27,2 @@
>          // Reserve stack for cycle resolution
> +        masm.reserveStack((1+moves.numCycles()) * sizeof(double));

Space around '+', and the "1" could use a comment.

@@ +38,5 @@
>      assertDone();
>  }
>  
>  Operand
> +MoveEmitterARM::cycleSlot(int slot, int subslot) const

Can you make these "unsigned slot" and "unsigned subslot"?

@@ +99,5 @@
>      return spilledReg_;
>  }
>  
>  void
> +MoveEmitterARM::breakCycle(const MoveOperand &from, const MoveOperand &to, MoveOp::Type type, int slotId)

And here.

@@ +119,5 @@
> +            masm.ma_vstr(temp, cycleSlot(slotId, 4));
> +        } else {
> +            FloatRegister src = to.floatReg();
> +            // Just always store the largest possible size.
> +            // Currently, this is a double.  when SIMD is added, two

Nit: s/when/When, one space after period and 80 chars per line for comments.

@@ +174,3 @@
>              masm.ma_vstr(temp, toOperand(to, true));
>          } else {
> +            int offset = 0;

Nit: unsigned offset

::: js/src/jit/arm/Simulator-arm.cpp
@@ +1142,1 @@
>      // Set up architecture state.

Ni: blank line before this line.

@@ +4129,4 @@
>      uint64_t callee_saved_value_d = uint64_t(icount_);
> +
> +    if (!fakeCall) {
> +        set_register(r4, callee_saved_value);

fakeCall is pretty vague. Maybe checkCalleeSavedRegs?

::: js/src/jit/arm/Simulator-arm.h
@@ +264,5 @@
>  
>      static int64_t StopSimAt;
> +    // For testing the MoveResolver code, a MoveResolver is set up, and
> +    // the VFP registers are loaded with pre-determined values.
> +    // then the sequence of code is simulated.  In order to test this with the simulator,

Nit: blank line before the comment and ',' after values instead of '.' so that "then" can stay lowercase.

::: js/src/jsapi-tests/testJitMoveEmitterCycles.cpp
@@ +4,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#if defined(JS_ARM_SIMULATOR) && defined(EVERYONE_KNOWS_ABOUT_ALIASING)

Where do we set EVERYONE_KNOWS_ABOUT_ALIASING? Can you just remove it to make sure we really run this test?

@@ +48,5 @@
> +static MOZ_CONSTEXPR_VAR js::jit::FloatRegister s27(27, js::jit::VFPRegister::Single);
> +static MOZ_CONSTEXPR_VAR js::jit::FloatRegister s28(28, js::jit::VFPRegister::Single);
> +static MOZ_CONSTEXPR_VAR js::jit::FloatRegister s29(29, js::jit::VFPRegister::Single);
> +static MOZ_CONSTEXPR_VAR js::jit::FloatRegister s30(30, js::jit::VFPRegister::Single);
> +static MOZ_CONSTEXPR_VAR js::jit::FloatRegister s31(31, js::jit::VFPRegister::Single);

Shouldn't this be in Architecture-arm.h/cpp?

@@ +66,5 @@
> +    MoveResolver mr;
> +    mr.setAllocator(alloc);
> +    Simulator *sim = Simulator::Current();
> +    mr.addMove(MoveOperand(d0), MoveOperand(d2), MoveOp::DOUBLE);
> +    sim->set_d_register_from_double(0,2);

Nit: space between the arguments, also below.

@@ +88,5 @@
> +
> +    sim->call(code->raw(), 1, 1);
> +    if (sim->get_double_from_d_register(2) != 2) {
> +        fprintf(stdout, "d2 is wrong: %f\n", sim->get_double_from_d_register(2));
> +        return false;

jsapi-tests should use CHECK and return false on OOM, not on test failure. That should also make the code smaller.
Attachment #8447005 - Flags: review?(jdemooij)
Attached patch fixMoveResolver-r2.patch (obsolete) — Splinter Review
Attachment #8447972 - Flags: review?(jdemooij)
Attached patch fixFloat32-r9.patch (obsolete) — Splinter Review
Attachment #8436094 - Attachment is obsolete: true
Attached patch fixFloat32-r10.patch (obsolete) — Splinter Review
Just like -r9, but with cruft I found in interdiff removed.
Attachment #8447989 - Attachment is obsolete: true
Attachment #8448001 - Flags: review?(jdemooij)
Blocks: 972836
Comment on attachment 8447972 [details] [diff] [review]
fixMoveResolver-r2.patch

Review of attachment 8447972 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay!

I think sunfish made some changes to the MoveResolver/MoveEmitter a while ago, so he's probably a good reviewer.

::: js/src/jsapi-tests/testJitMoveEmitterCycles.cpp
@@ +172,5 @@
> +    sim->call(code->raw(), 1, 1);
> +    if (int(sim->get_double_from_d_register(14)) != 9) {
> +        fprintf(stderr, "s14 is wrong: %f\n", sim->get_double_from_d_register(14));
> +        return false;
> +    }

Use CHECK(...) here as well.
Attachment #8447972 - Flags: review?(jdemooij) → review?(sunfish)
Comment on attachment 8447972 [details] [diff] [review]
fixMoveResolver-r2.patch

Review of attachment 8447972 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MoveResolver.cpp
@@ +55,5 @@
>  }
>  
> +// Given move (A -> B), this function attempts to find any move (B -> *) in the
> +// move list iterator, and returns the first one.
> +// N.B. It is unclear if a single move can complete more than one cycle, so to be

This seems an odd fact to be unclear about. There's a fair amount of complexity here to handle this, which would be unfortunate if it isn't actually needed.

@@ +59,5 @@
> +// N.B. It is unclear if a single move can complete more than one cycle, so to be
> +// conservative, this function operates on iterators, so the caller can process all
> +// instructions that start a cycle.
> +MoveResolver::PendingMove *
> +MoveResolver::findCycledMove(PendingMoveIterator *iter, const PendingMoveIterator &end, const PendingMove *last)

end can be passed by-value here instead of by-reference.

::: js/src/jit/MoveResolver.h
@@ +91,5 @@
>      }
>  
> +    bool aliases(MoveOperand other) const {
> +        if (kind_ != other.kind_)
> +            return false;

REG can alias EFFECTIVE_ADDRESS. Also, REG can alias the address in a MEMORY. Are these aliases unimportant here? If so, please add a comment about this.

@@ +167,5 @@
> +    uint32_t cycleBeginSlot() const {
> +        return cycleBeginSlot_;
> +    }
> +    uint32_t cycleEndSlot() const {
> +        return cycleEndSlot_;

Can it be asserted that cycleEndSlot_ is not -1 here? And that isCycleEnd() is true? If not, is there somewhere else where such asserts can go? Or maybe it makes sense to eliminate cycleEnd_ (and cycleBegin_) and just let cycleEndSlot_ != -1 mean there's a cycle end (and similarly for cycleBeginSlot_).

::: js/src/jit/RegisterSets.h
@@ +360,5 @@
>      static inline TypedRegisterSet NonVolatile() {
>          return TypedRegisterSet(T::Codes::AllocatableMask & T::Codes::NonVolatileMask);
>      }
>      bool has(T reg) const {
> +        return !!(bits_ & (1ULL << reg.code()));

It looks like the change to the type of bits_ is in fixFloat32-r10.patch. It'd be good to make that change and the change here in the same patch. Among other benefits, this would let you write SetType(1) instead of 1ULL here and in a few places later in this patch.

@@ +368,5 @@
>      }
> +    void addAllAliasedUnchecked(T reg) {
> +        for (int a = 0; a < reg.numAliased(); a++) {
> +            T tmp;
> +            reg.aliased(a, &tmp);

Why does aliased return its value by pointer rather than using a normal return value?

@@ +614,5 @@
>              addUnchecked(any.fpu());
>          else
>              addUnchecked(any.gpr());
>      }
> +    void addAllAliasedUnchecked(const AnyRegister &reg) {

Where is this function called?

@@ +675,5 @@
>              fpu_.takeUnchecked(reg.fpu());
>          else
>              gpr_.takeUnchecked(reg.gpr());
>      }
> +    void takeAllAliasedUnchecked(const AnyRegister &reg) {

Where is this function called too? There are some opportunities for micro-optimizing these functions, but I don't know if it's important here.

::: js/src/jit/arm/MoveEmitter-arm.cpp
@@ +27,2 @@
>          // Reserve stack for cycle resolution
> +        masm.reserveStack((1 + moves.numCycles()) * sizeof(double));

Previously the code supported at most 1 cycle, and allocated 1 sizeof(double) stack slot. Now the code supports numCycles() cycles, and allocates numCycles()+1 stack slots. Where does the +1 come from?

@@ +305,4 @@
>      if (move.isCycleEnd()) {
>          JS_ASSERT(inCycle_);
> +        completeCycle(from, to, move.type(), move.cycleEndSlot());
> +        inCycle_--;

Assert that inCycle_ isn't decremented past 0.

::: js/src/jit/arm/Simulator-arm.cpp
@@ +1137,5 @@
>      icount_ = 0L;
>      resume_pc_ = 0;
>      break_pc_ = nullptr;
>      break_instr_ = 0;
> +    checkCalleeSavedRegs = false;

It looks like the original code checked callee saved registers by default. This patch appears to disable this by default. What is the reason for this change?

@@ +1435,5 @@
>      return value;
>  }
>  
> +template double Simulator::getFromVFPRegister<double, 2>(int reg_index);
> +template float Simulator::getFromVFPRegister<float, 1>(int reg_index);

These don't appear to be used anywhere in this patch. Why is it necessary to explicitly instantiate them?

::: js/src/jit/arm/Simulator-arm.h
@@ +266,5 @@
>  
> +    // For testing the MoveResolver code, a MoveResolver is set up, and
> +    // the VFP registers are loaded with pre-determined values,
> +    // then the sequence of code is simulated.  In order to test this with the
> +    // simulator, the calle-saved registers can't be trashed. This flag

Nit: Typo: calle -> callee

::: js/src/jsapi-tests/testJitMoveEmitterCycles.cpp
@@ +500,5 @@
> +    CHECK(int(sim->get_float_from_s_register(15)) != 29);
> +    CHECK(int(sim->get_float_from_s_register(23)) != 16);
> +    return true;
> +}
> +END_TEST(testJitMoveEmitterCycles_autogen3)

One of these autogen tests contains multiple cycles, right?
Attachment #8447972 - Flags: review?(sunfish)
Comment on attachment 8448001 [details] [diff] [review]
fixFloat32-r10.patch

Review of attachment 8448001 [details] [diff] [review]:
-----------------------------------------------------------------

I'm very sorry for the delay. But I noticed a few things that'd be good to clarify/fix, sorry if they were in the previous version and I missed them :(

I'll review an updated patch ASAP.

::: js/src/jit/CodeGenerator.cpp
@@ +8544,5 @@
>              if (a->isFloatReg()) {
>                  FloatRegister fr = ToFloatRegister(a);
> +                if (fr.isDouble()) {
> +                    int srcId = fr.id() * 2;
> +                    masm.ma_vxfer(fr, Register::FromCode(srcId), Register::FromCode(srcId+1));

What does the |2| mean? Could we use some named constant here? Also, spaces around the + operator while you're here.

::: js/src/jit/IonFrames.cpp
@@ +355,5 @@
> +    FloatRegisterSet fregs = reader.allFloatSpills();
> +    fregs = fregs.reduceSetForPush();
> +    for (FloatRegisterBackwardIterator iter(fregs); iter.more(); iter++) {
> +        floatSpill -= (*iter).size();
> +        for (int a = 0; a < (*iter).numAlignedAliased(); a++) {

s/int/unsigned or uint32_t

::: js/src/jit/RegisterSets.h
@@ +379,5 @@
>  
>      void add(T reg) {
> +        // Make sure we don't add two overlapping registers.
> +#ifdef DEBUG
> +        for (int a = 0; a < reg.numAliased(); a++) {

Nit: unsigned or uint32_t

::: js/src/jit/arm/Architecture-arm.h
@@ +259,5 @@
> +    static const uint32_t Total = 48;
> +    static const uint32_t TotalDouble = 16;
> +    static const uint32_t TotalSingle = 32;
> +    static const uint32_t Allocatable = 45;
> +    // there are only 32 places that we can put values.

Nit: s/there/There

@@ +265,5 @@
> +    static uint32_t ActualTotalPhys();
> +    static const uint64_t AllDoubleMask = ((1ull << 16) - 1) << 32;
> +    static const uint64_t AllMask = ((1ull << 48) - 1);
> +    // How many intervals can a single register force invalidation of?
> +    // The number is either two or three.  A single point can force two evictions (e.g. d0 can evict s0 and s1)

Nit: comments should fit in 80 columns.

@@ +267,5 @@
> +    static const uint64_t AllMask = ((1ull << 48) - 1);
> +    // How many intervals can a single register force invalidation of?
> +    // The number is either two or three.  A single point can force two evictions (e.g. d0 can evict s0 and s1)
> +    // Since it was initially assumed to be 1, I'm going to assume this is the proper use.
> +    static const uint32_t MaxAliasedIntervals = 2;

What does this constant mean exactly? Where is it used? The last line is pretty hand-wavy...

@@ +515,5 @@
>  // The only floating point register set that we work with
>  // are the VFP Registers
>  typedef VFPRegister FloatRegister;
>  
> +

Nit: don't add a blank line here.
Attachment #8448001 - Flags: review?(jdemooij)
Attachment #8448001 - Attachment is obsolete: true
Attachment #8452193 - Flags: review?(jdemooij)
Attachment #8447972 - Attachment is obsolete: true
Attachment #8452196 - Flags: review?(sunfish)
Comment on attachment 8452193 [details] [diff] [review]
fixFloat32-r11.patch

Review of attachment 8452193 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for fixing the nits.
Attachment #8452193 - Flags: review?(jdemooij) → review+
(In reply to Dan Gohman [:sunfish] from comment #72)
> Comment on attachment 8447972 [details] [diff] [review]
> fixMoveResolver-r2.patch
> 
> Review of attachment 8447972 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/MoveResolver.cpp
> @@ +55,5 @@
> >  }
> >  
> > +// Given move (A -> B), this function attempts to find any move (B -> *) in the
> > +// move list iterator, and returns the first one.
> > +// N.B. It is unclear if a single move can complete more than one cycle, so to be
> 
> This seems an odd fact to be unclear about. There's a fair amount of
> complexity here to handle this, which would be unfortunate if it isn't
> actually needed.
> 
When I ran the algorithm by hand, there were certain permutations of some inputs that could induce this. The comment mostly means I have no clue if the compiler is capable of generating such code.
Attachment #8452196 - Flags: review?(sunfish) → review+
This introduced several more build warnings for similar issues in RegisterSets.h, e.g.:
{
js/src/jit/RegisterSets.h:374:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
js/src/jit/RegisterSets.h:420:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
}

...for loops of the form:
 for (int a = 0; a < reg.numAliased(); a++) {

Marty, mind fixing?
Flags: needinfo?(mrosenberg)
Strange.  I just did a local build, and didn't get any warnings.  Did you pass in some special flag to enable extra warnings?
Attachment #8456947 - Flags: review?(dholbert)
Flags: needinfo?(mrosenberg)
Comment on attachment 8456947 [details] [diff] [review]
fixSignedIssues-r0.patch

Nope. I am building with GCC 4.9, but -Wsign-compare should exist in all GCC versions since forever, and clang and MSVC have variants, too.

I get 84 instances of this warning (for RegisterSets.h) if I do the following, (in an already built build):
  touch js/src/jit/RegisterSets.h
  ./mach build js/src

If that ^ doesn't give you these warnings, perhaps it's because your build is configured in such a way that it doesn't exercise these particular templates, for some reason? (since this is in template code.)  Or maybe there's a platform-specific switch for whether this code is compiled?

Anyway, the patch looks good to me, and I can confirm it fixes the warnings locally. Thanks!
Attachment #8456947 - Flags: review?(dholbert) → review+
Depends on: 1038989
Depends on: 1299147
Blocks: 1447457
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: