Open Bug 891211 Opened 11 years ago Updated 2 years ago

Ionmonkey: (ARM) make use of all 32 double float registers when available.

Categories

(Core :: JavaScript Engine, enhancement)

ARM
All
enhancement

Tracking

()

People

(Reporter: dougc, Unassigned)

References

Details

Attachments

(3 files, 3 obsolete files)

An IRC discussion suggestion by mjrosenb: make use of all 32 double float registers when available.  Currently only d0-d15 are used.  Even low end ARM processors now support 32 double-float registers.  Registers dedicated to holding constants would likely continue to use d8 to d15 as these are preserved across native ABI calls.

Note bug 534777 is similar, but rather dated.  Merge if deemed appropriate.
Assignee: general → nobody
Attached patch D32Support-r0.patch (obsolete) — Splinter Review
As I said on IRC, there is a whole block of code that is unnecessary, but has the ability to make a bunch of things minimally prettier.
Attachment #8471705 - Flags: review?(benj)
Changed around the initialization mechanism after talking with Jan.
Attachment #8471705 - Attachment is obsolete: true
Attachment #8471705 - Flags: review?(benj)
Attachment #8472592 - Flags: review?(benj)
Comment on attachment 8472592 [details] [diff] [review]
D32Support-r1.patch

Review of attachment 8472592 [details] [diff] [review]:
-----------------------------------------------------------------

I am not comfortable with the meta macro parts of this patch, which really look overly complicated for the use case, despite the nice comments explaining it. If you really want it in the tree, feel free to bounce the review to somebody else. I would really appreciate a simpler version that doesn't evaluate macros 64 times :)

::: js/src/jit/arm/Architecture-arm.h
@@ +21,5 @@
> +#define EVAL2(...) EVAL3(EVAL3(EVAL3(EVAL3(__VA_ARGS__))))
> +#define EVAL3(...) __VA_ARGS__
> +
> +// CHECK is passed either 1 or 2 arguments. If it is passed 1 argument, it evaluates to 0
> +// if it passed 2 argument, it evaluates to its second argument..

nit: missing word, missing plural form, optional dot:

if it *is* passed 2 arguments, it evaluates to its second argument.
Attachment #8472592 - Flags: review?(benj)
This probably isn't a horribly useful, since Baseline perf isn't key.
Attachment #8492214 - Flags: review?(benj)
Attached patch trackRegisters-r0.patch (obsolete) — Splinter Review
This is the patch that actually fixes the observed perf regressions.  It tracks the set of all registers that are used by a function, and then we avoid spilling those registers.  Currently, it only works on ARM, because it is the only place it is needed, but there is no good reason we can't add it for other architectures.
Attachment #8492220 - Flags: review?(benj)
Comment on attachment 8492214 [details] [diff] [review]
reduceFPushes-r0.patch

Review of attachment 8492214 [details] [diff] [review]:
-----------------------------------------------------------------

If I understand correctly, the mallocStub and freeStub don't actually need the RegisterSave argument, so this patch can be way slimmer than it currently is (allocateGCThing and many other functions don't need it).

Other than that, the value of RegisterSet is always known statically, so having it a parameter seems overkill. How about having instead 2 functions (either templated on RegisterSave or not), that make it clearer that one should only be used in BC and the only one in Ion? (e.g. callPreBarrierBaseline vs callPreBarrierIon, etc.)

Does this patch apply atop the first one? If so, please make sure to have somebody review the first one.

::: js/src/jit/CodeGenerator.cpp
@@ +5336,5 @@
>      return code;
>  }
>  
>  JitCode *
> +JitRuntime::generateMallocStub(JSContext *cx, RegisterSave saveAmt)

nit: the RegisterSave is unused

@@ +5374,5 @@
>      return code;
>  }
>  
>  JitCode *
> +JitRuntime::generateFreeStub(JSContext *cx, RegisterSave saveAmt)

nit: the RegisterSave is unused

::: js/src/jit/arm/Trampoline-arm.cpp
@@ +33,5 @@
>                       (1ULL << FloatRegisters::d13) |
>                       (1ULL << FloatRegisters::d14) |
>                       (1ULL << FloatRegisters::d15));
>  
> +

nit: 2 added blank lines here

@@ +902,5 @@
>      return wrapper;
>  }
>  
>  JitCode *
> +JitRuntime::generatePreBarrier(JSContext *cx, MIRType type, RegisterSave saveAmt)

nit: could you rename saveAmt into something else? I am not sure what Amt means here.

@@ +912,5 @@
>          save = RegisterSet(GeneralRegisterSet(Registers::VolatileMask),
>                             FloatRegisterSet(FloatRegisters::VolatileDoubleMask));
> +    } else if (cx->runtime()->jitSupportsFloatingPoint) {
> +        save = RegisterSet(GeneralRegisterSet(Registers::VolatileMask),
> +                           FloatRegisterSet(FloatRegisters::SmallVolatileDoubleMask));

nit: Could you make this branching simpler:

if (jitSupportsFloatingPoint()) {
  if (saveAmt == LargeSet) {
    ...
  } else {
    MOZ_ASSERT(saveAmt == BaselineSet);
    ...
  }
}

::: js/src/jit/shared/Assembler-shared.h
@@ +34,5 @@
>  };
>  
> +enum RegisterSave {
> +    BaselineSet,
> +    LargeSet

LargeSet sounds quite vague, and it seems to only be used by Ion, how about renaming it to IonSet? If it isn't used only by Ion, what about EntireSet, or WholeSet?

::: js/src/jit/shared/CodeGenerator-shared.h
@@ +311,5 @@
>      bool emitTruncateDouble(FloatRegister src, Register dest);
>      bool emitTruncateFloat32(FloatRegister src, Register dest);
>  
> +    void emitPreBarrier(Register base, const LAllocation *index, MIRType type, RegisterSave saveSet);
> +    void emitPreBarrier(Address address, MIRType type, RegisterSave saveSetb);

nit: saveSetb => saveSet
Attachment #8492214 - Flags: review?(benj)
Comment on attachment 8492220 [details] [diff] [review]
trackRegisters-r0.patch

Review of attachment 8492220 [details] [diff] [review]:
-----------------------------------------------------------------

I think the backtracking allocator needs more. Also, I'd be more comfortable if somebody who worked on the regalloc looked at this.

::: js/src/jit/BacktrackingAllocator.cpp
@@ +811,5 @@
>      if (VirtualRegisterGroup *group = reg->group()) {
>          if (!group->allocation.isRegister())
>              group->allocation = LAllocation(r.reg);
>      }
> +    graph.recordRegister(r.reg);

When evicting registers, shouldn't we as well remove the register from the recorded uses?

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ +56,5 @@
>      frameInitialAdjustment_(0)
>  {
>      if (!gen->compilingAsmJS())
>          masm.setInstrumentation(&sps_);
> +    masm.setUsedRegs(graph->usedRegs());

nit: please keep one blank line before the multi line comment afterwards

::: js/src/jit/shared/CodeGenerator-shared.h
@@ +351,5 @@
>      // instruction [this is purely an optimization].  All other volatiles must
>      // be saved and restored in case future LIR instructions need those values.)
>      void saveVolatile(Register output) {
>          RegisterSet regs = RegisterSet::Volatile();
> +        regs = RegisterSet::Intersect(regs, masm.usedRegs());

Could it be worth to have a RegisterSet::VolatileAnd() static method that would embed these two calls? (as we have VolatileNot(), for instance)

::: js/src/jit/shared/MacroAssembler-x86-shared.h
@@ +694,5 @@
>  
>      void abiret() {
>          ret();
>      }
> +    void setUsedRegs(RegisterSet rs) {}

please add a comment that these two functions are here for ARM compat.
Attachment #8492220 - Flags: review?(benj)
(In reply to Benjamin Bouvier [:bbouvier] from comment #7)
> Comment on attachment 8492220 [details] [diff] [review]
> trackRegisters-r0.patch
> 
> Review of attachment 8492220 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I think the backtracking allocator needs more. Also, I'd be more comfortable
> if somebody who worked on the regalloc looked at this.
> 
> ::: js/src/jit/BacktrackingAllocator.cpp
> @@ +811,5 @@
> >      if (VirtualRegisterGroup *group = reg->group()) {
> >          if (!group->allocation.isRegister())
> >              group->allocation = LAllocation(r.reg);
> >      }
> > +    graph.recordRegister(r.reg);
> 
> When evicting registers, shouldn't we as well remove the register from the
> recorded uses?
Not yet.  The idea is it is the set of all registers used anyplace in this function, so there is only one quantity to keep track of for the whole function, rather than attempting to keep track of it for every instruction (although that really shouldn't be that hard)

> ::: js/src/jit/shared/CodeGenerator-shared.h
> @@ +351,5 @@
> >      // instruction [this is purely an optimization].  All other volatiles must
> >      // be saved and restored in case future LIR instructions need those values.)
> >      void saveVolatile(Register output) {
> >          RegisterSet regs = RegisterSet::Volatile();
> > +        regs = RegisterSet::Intersect(regs, masm.usedRegs());
> 
> Could it be worth to have a RegisterSet::VolatileAnd() static method that
> would embed these two calls? (as we have VolatileNot(), for instance)
Sure.  I could also complement the meaning of the set, and then just use VolatileNot.
> 
> ::: js/src/jit/shared/MacroAssembler-x86-shared.h
> @@ +694,5 @@
> >  
> >      void abiret() {
> >          ret();
> >      }
> > +    void setUsedRegs(RegisterSet rs) {}
> 
> please add a comment that these two functions are here for ARM compat.

They don't need to be for compat!  This optimization is every bit as valid on x86 and x64 as it is on arm, just they have fewer registers, so the expected returns are much much lower (also, this was to fix a perf regression on ARM only).
Attached patch trackRegisters-r1.patch (obsolete) — Splinter Review
Fixed up the nits, passes jit-tests on arm-sim, x86 and x64.  Asking for review from someone more familiar with the register allocators.
Attachment #8494968 - Flags: review?(bhackett1024)
Comment on attachment 8494968 [details] [diff] [review]
trackRegisters-r1.patch

Review of attachment 8494968 [details] [diff] [review]:
-----------------------------------------------------------------

wrong patch... but it has the right name, this does not bode well.
Attachment #8494968 - Flags: review?(bhackett1024)
Ok, not sure what happened before, but I managed to get it all straightened out (I think)
Attachment #8492220 - Attachment is obsolete: true
Attachment #8494968 - Attachment is obsolete: true
Attachment #8495517 - Flags: review?(bhackett1024)
Comment on attachment 8495517 [details] [diff] [review]
realTrackRegisters-r0.patch

Review of attachment 8495517 [details] [diff] [review]:
-----------------------------------------------------------------

Generally I think this looks OK but sprinkling recordRegister() calls around in the allocators is pretty unfortunate.  It would be really nice if you could use LiveInterval::setAllocation as a pinchpoint for backtracking and lsra.  The only thing preventing that seems to be that LiveInterval::getAllocation() returns a non-const internal pointer, but from looking through the code I don't see any reason this can't be changed to return a const LAllocation*, which would be a nice improvement but might require const-ifying some other functions/structures.

Also, this patch has a lot of formatting problems, which I noted below.

::: js/src/jit/LIR.h
@@ +1540,5 @@
> +    // registers (arm64, VFP on arm), there is a tendency to not use
> +    // many registers. Tracking which ones are used /anywhere/ in a
> +    // function allows the backend to not store as many registers
> +    // to the stack.
> +    RegisterSet usedRegs_;

Newline

::: js/src/jit/arm/MacroAssembler-arm.h
@@ +40,5 @@
> +    // With 32 double registers (256 bytes of data!), saving and restoring the
> +    // whole context can be quite expensive. By tracking the set of registers
> +    // used anywhere in a function, it is possible to avoid large spills in
> +    // some special cases.
> +    RegisterSet usedRegs_;

Newline

@@ +57,5 @@
>      }
>  
>    public:
>      MacroAssemblerARM()
> +        : secondScratchReg_(lr), usedRegs_(RegisterSet::All())

Fix whitespace before :

@@ +462,5 @@
> +  public:
> +    void setUsedRegs(RegisterSet rs) {
> +        usedRegs_ = rs;
> +    }
> +    RegisterSet usedRegs( ) {

No space between ( )

::: js/src/jit/shared/MacroAssembler-x86-shared.h
@@ +27,5 @@
>      // reserved for unexpected spills or C++ function calls. It is maintained
>      // by functions which track stack alignment, which for clear distinction
>      // use StudlyCaps (for example, Push, Pop).
>      uint32_t framePushed_;
> +    RegisterSet usedRegs_;

Newline

@@ +695,5 @@
>      void abiret() {
>          ret();
>      }
> +    void setUsedRegs(RegisterSet rs) {usedRegs_ = rs; }
> +    RegisterSet usedRegs() { return usedRegs_; }

Newline
Attachment #8495517 - Flags: review?(bhackett1024)
Blocks: 1074102
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: