Last Comment Bug 709731 - IonMonkey: allow reusing input registers for output registers
: IonMonkey: allow reusing input registers for output registers
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: ---
Assigned To: Jan de Mooij [:jandem]
:
Mentors:
: 679317 (view as bug list)
Depends on: 710983 711936
Blocks: IonMonkey 688078 714428
  Show dependency treegraph
 
Reported: 2011-12-12 01:31 PST by Jan de Mooij [:jandem]
Modified: 2012-01-16 13:12 PST (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
WIP (67.09 KB, patch)
2011-12-24 13:35 PST, Jan de Mooij [:jandem]
no flags Details | Diff | Review
Patch (67.09 KB, patch)
2011-12-27 08:29 PST, Jan de Mooij [:jandem]
dvander: review+
Details | Diff | Review
Patch v2 (93.39 KB, patch)
2012-01-02 10:53 PST, Jan de Mooij [:jandem]
no flags Details | Diff | Review
Patch v2 (92.50 KB, patch)
2012-01-02 10:59 PST, Jan de Mooij [:jandem]
dvander: review+
Details | Diff | Review
Part 2: honor SAME_AS hints (4.29 KB, patch)
2012-01-02 12:16 PST, Jan de Mooij [:jandem]
dvander: review+
Details | Diff | Review

Description Jan de Mooij [:jandem] 2011-12-12 01:31:23 PST
The plan here is to add an extra bit to LUse to indicate that an input register may be reused as an output register. This has the following advantages:

1) Instructions like MSlots, MArrayLength etc could use the same register for both the input and output, avoiding spills.

2) Combined with defineReuseInput(), it will avoid the spills in bug 688078 (since use and def intervals no longer overlap)

3) Fix an LSRA bug (bug 703376 comment 5)

The greedy allocator allocates output registers before input registers. The most obvious change there would be to maintain a register set with registers assigned to definitions, and OR'ing this set with the available register set, if the use has the "only used at start" bit set.
Comment 1 David Anderson [:dvander] 2011-12-12 08:32:46 PST
What is the difference between this new bit and PASSTHROUGH (which extends a virtual register's live range) or MUST_REUSE_INPUT (which splits the interval)?
Comment 2 Jan de Mooij [:jandem] 2011-12-17 10:54:26 PST
This was way more complicated than I expected, I had to change a lot of things to make it work. I have a WIP patch that almost passes jit-tests on x86, the one remaining failure is a "too much recursion" error. The failing test makes recursive calls so maybe we allocate too many stack slots or something, I will investigate.

I fuzzed the patch for about an hour and it found no new problems, but I will keep it running for a few more days.

Next week I'm going to clean this up and split the patch in a number of smaller patches.
Comment 3 Jan de Mooij [:jandem] 2011-12-24 13:35:15 PST
Created attachment 584224 [details] [diff] [review]
WIP

Very close, will finish this on tuesday.
Comment 4 Jan de Mooij [:jandem] 2011-12-27 08:29:43 PST
Created attachment 584440 [details] [diff] [review]
Patch

Passes tests on x86/x64. ARM needs a follow-up patch, but this should be pretty straight-forward, assumming ARM triggers no LSRA bugs.

I looked at the testcase in bug 688078 and some Sunspider bitops tests and regalloc is either comparable or better than before.
Comment 5 David Anderson [:dvander] 2011-12-30 16:22:03 PST
Comment on attachment 584440 [details] [diff] [review]
Patch

Review of attachment 584440 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, thanks for taking this on!

::: js/src/ion/GreedyAllocator.cpp
@@ +547,5 @@
> +                return false;
> +            break;
> +
> +          case LDefinition::MUST_REUSE_INPUT:
> +            if (!allocateSameAsInput(def, ins->getOperand(def->getReusedInput()), &reg))

This will probably have to rebased since GRA changed quite a bit. It should work to separate allocateDefinition() into two functions, one that allocates and another that spills, and then call the former for temporaries (before inputs are allocated, which you already do).

::: js/src/ion/InlineList.h
@@ +156,5 @@
> +            at = this;
> +        if (at == tail_)
> +            return;
> +#ifdef DEBUG
> +        modifyCount_++; //XXX

What's the XXX for?

::: js/src/ion/LIR.h
@@ +284,4 @@
>      }
>  
>    public:
> +    LUse(uint32 vreg, Policy policy, bool usedAtStart=false) {

Nit: spaces between =

@@ +421,5 @@
>  
>      static const uint32 TYPE_BITS = 3;
>      static const uint32 TYPE_SHIFT = 0;
>      static const uint32 TYPE_MASK = (1 << TYPE_BITS) - 1;
> +    static const uint32 POLICY_BITS = 3;

It looks like we still only have four policies?

::: js/src/ion/LinearScan.cpp
@@ +478,5 @@
>          // Variables are assumed alive for the entire block, a define shortens
>          // the interval to the point of definition.
>          for (BitSet::Iterator i(live->begin()); i != live->end(); i++) {
>              vregs[*i].getInterval(0)->addRange(inputOf(block->firstId()),
> +                                               outputOf(block->lastId()).next());

Out of curiosity, why are variables live at the output of the instruction after the block?

@@ +486,5 @@
>          // point of definition, if found.
>          for (LInstructionReverseIterator ins = block->rbegin(); ins != block->rend(); ins++) {
>              // Calls may clobber registers, so force a spill and reload around the callsite.
>              if (ins->isCall()) {
> +                RegisterSet toSpill = ins->spillRegs();

Does it work for this to be all registers? We don't have to remove spillRegs() with this patch, but I'm hoping this patch made it possible.

@@ +502,5 @@
>  
>                      // Ensure that if there aren't any uses, there's at least
>                      // some interval for the output to go into.
>                      if (vregs[def].getInterval(0)->numRanges() == 0)
> +                        vregs[def].getInterval(0)->addRange(from, outputOf(*ins)); //XXX

XXX?

@@ -867,5 @@
> -        for (size_t i = 0; i < reg->numUses(); i++) {
> -            LOperand *use = reg->getUse(i);
> -            CodePosition pos = inputOf(use->ins);
> -            if (use->snapshot)
> -                pos = outputOf(use->ins);

Nice. I suspect this was just wrong, but appeared to be necessary given how LSRA used to work.

@@ +893,5 @@
> +                LAllocation *alloc = reg->ins()->getOperand(reg->def()->getReusedInput());
> +                LAllocation *origAlloc = LAllocation::New(*alloc);
> +
> +                *alloc = *interval->getAllocation();
> +                moveBeforeAlloc(inputOf(reg->ins()), origAlloc, alloc);

This creates a move which is hidden from safepoint building, since normally we wouldn't want the instruction's outputs included in the safepoint map. For now, I'm not sure whether we actually use this policy with gcthings or nunboxes. So you might be able to assert on it.

Otherwise, we'll have to fix it. Fix can be follow-up though.

::: js/src/ion/LinearScan.h
@@ +60,5 @@
>   * Represents with better-than-instruction precision a position in the
>   * instruction stream.
>   *
>   * An issue comes up when dealing live intervals as to how to represent
>   * information such as "this register is only needed for the input of

This comment discusses how intervals overlap; is it still accurate?

::: js/src/ion/shared/Lowering-shared-inl.h
@@ +361,5 @@
>  LIRGeneratorShared::annotate(T *ins)
>  {
>      for (size_t i = 0; i < ins->numDefs(); i++) {
> +        if (ins->getDef(i)->policy() != LDefinition::REDEFINED)
> +        {

Nit: brace on previous line

::: js/src/ion/shared/Lowering-x86-shared.cpp
@@ +68,5 @@
>      LDefinition tempInt;
>      if (opd->type() == MIRType_Int32) {
> +        index = useRegisterAtStart(opd);
> +        tempInt = tempCopy(opd);
> +        tempInt.setReusedInput(0);

Could we have tempCopy(mir, index) to avoid manual calls to setReusedInput() ?

::: js/src/ion/x86/Lowering-x86.cpp
@@ +255,5 @@
>  
>      // shift operator should be constant or in register ecx
>      // x86 can't shift a non-ecx register
>      if (rhs->isConstant())
> +        ins->setOperand(1, useOrConstantAtStart(rhs));

This should probably be named "useAtStartOrConstant"
Comment 6 Jan de Mooij [:jandem] 2011-12-31 01:48:29 PST
(In reply to David Anderson [:dvander] from comment #5)
>
> It looks like we still only have four policies?

Oops, I fixed this and made some other small changes necessary to pass jit-tests. I must have forgotten to qref or attach it here, will ask for re-review after rebasing.

> >              vregs[*i].getInterval(0)->addRange(inputOf(block->firstId()),
> > +                                               outputOf(block->lastId()).next());
> 
> Out of curiosity, why are variables live at the output of the instruction
> after the block?

Since the end of the range is exclusive, outputOf(block->lastId()) covers only the input half. So we have to get the next position after the output position to make the range cover the entire instruction. The range does not extend to the instruction after the block.

> 
> Does it work for this to be all registers? We don't have to remove
> spillRegs() with this patch, but I'm hoping this patch made it possible.

Yes, the LSRA part shouldn't be too hard. I tried to do this last week but I was not sure about the greedy allocator. Since this patch is already quite complicated I'll file a follow-up bug.

> 
> This creates a move which is hidden from safepoint building, since normally
> we wouldn't want the instruction's outputs included in the safepoint map.
> For now, I'm not sure whether we actually use this policy with gcthings or
> nunboxes. So you might be able to assert on it.
> 
> Otherwise, we'll have to fix it. Fix can be follow-up though.

Good point, I'll look into it.
 
> 
> Could we have tempCopy(mir, index) to avoid manual calls to setReusedInput()
> ?

Nice. We used to store an LUse-pointer, but now that we store the index we can pass it directly.
Comment 7 Jan de Mooij [:jandem] 2012-01-02 10:53:35 PST
Created attachment 585309 [details] [diff] [review]
Patch v2

Asking for a second review because this patch is quite different from the previous patch (mainly due to the safepoint and GRA work).

Passes tests with both register allocators on x86 and x64, and fixes bug703376.js with LSRA.
Comment 8 Jan de Mooij [:jandem] 2012-01-02 10:59:52 PST
Created attachment 585312 [details] [diff] [review]
Patch v2
Comment 9 Jan de Mooij [:jandem] 2012-01-02 12:16:06 PST
Created attachment 585320 [details] [diff] [review]
Part 2: honor SAME_AS hints

SAME_AS is no longer used as requirement, but we still use it as hint. The patch is fairly straight-forward; if the reused input register is still free, grab it.

The patch gets rid of some moves on a benchmark I was looking at.
Comment 11 Jan de Mooij [:jandem] 2012-01-16 13:12:49 PST
*** Bug 679317 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.