Closed Bug 709731 Opened 12 years ago Closed 12 years ago

IonMonkey: allow reusing input registers for output registers

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 3 obsolete files)

The plan here is to add an extra bit to LUse to indicate that an input register may be reused as an output register. This has the following advantages:

1) Instructions like MSlots, MArrayLength etc could use the same register for both the input and output, avoiding spills.

2) Combined with defineReuseInput(), it will avoid the spills in bug 688078 (since use and def intervals no longer overlap)

3) Fix an LSRA bug (bug 703376 comment 5)

The greedy allocator allocates output registers before input registers. The most obvious change there would be to maintain a register set with registers assigned to definitions, and OR'ing this set with the available register set, if the use has the "only used at start" bit set.
Blocks: 688078
What is the difference between this new bit and PASSTHROUGH (which extends a virtual register's live range) or MUST_REUSE_INPUT (which splits the interval)?
This was way more complicated than I expected, I had to change a lot of things to make it work. I have a WIP patch that almost passes jit-tests on x86, the one remaining failure is a "too much recursion" error. The failing test makes recursive calls so maybe we allocate too many stack slots or something, I will investigate.

I fuzzed the patch for about an hour and it found no new problems, but I will keep it running for a few more days.

Next week I'm going to clean this up and split the patch in a number of smaller patches.
Depends on: 711936
Depends on: 712278
Depends on: 710983
Attached patch WIP (obsolete) — Splinter Review
Very close, will finish this on tuesday.
Attached patch Patch (obsolete) — Splinter Review
Passes tests on x86/x64. ARM needs a follow-up patch, but this should be pretty straight-forward, assumming ARM triggers no LSRA bugs.

I looked at the testcase in bug 688078 and some Sunspider bitops tests and regalloc is either comparable or better than before.
Attachment #584224 - Attachment is obsolete: true
Attachment #584440 - Flags: review?(dvander)
Comment on attachment 584440 [details] [diff] [review]
Patch

Review of attachment 584440 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, thanks for taking this on!

::: js/src/ion/GreedyAllocator.cpp
@@ +547,5 @@
> +                return false;
> +            break;
> +
> +          case LDefinition::MUST_REUSE_INPUT:
> +            if (!allocateSameAsInput(def, ins->getOperand(def->getReusedInput()), &reg))

This will probably have to rebased since GRA changed quite a bit. It should work to separate allocateDefinition() into two functions, one that allocates and another that spills, and then call the former for temporaries (before inputs are allocated, which you already do).

::: js/src/ion/InlineList.h
@@ +156,5 @@
> +            at = this;
> +        if (at == tail_)
> +            return;
> +#ifdef DEBUG
> +        modifyCount_++; //XXX

What's the XXX for?

::: js/src/ion/LIR.h
@@ +284,4 @@
>      }
>  
>    public:
> +    LUse(uint32 vreg, Policy policy, bool usedAtStart=false) {

Nit: spaces between =

@@ +421,5 @@
>  
>      static const uint32 TYPE_BITS = 3;
>      static const uint32 TYPE_SHIFT = 0;
>      static const uint32 TYPE_MASK = (1 << TYPE_BITS) - 1;
> +    static const uint32 POLICY_BITS = 3;

It looks like we still only have four policies?

::: js/src/ion/LinearScan.cpp
@@ +478,5 @@
>          // Variables are assumed alive for the entire block, a define shortens
>          // the interval to the point of definition.
>          for (BitSet::Iterator i(live->begin()); i != live->end(); i++) {
>              vregs[*i].getInterval(0)->addRange(inputOf(block->firstId()),
> +                                               outputOf(block->lastId()).next());

Out of curiosity, why are variables live at the output of the instruction after the block?

@@ +486,5 @@
>          // point of definition, if found.
>          for (LInstructionReverseIterator ins = block->rbegin(); ins != block->rend(); ins++) {
>              // Calls may clobber registers, so force a spill and reload around the callsite.
>              if (ins->isCall()) {
> +                RegisterSet toSpill = ins->spillRegs();

Does it work for this to be all registers? We don't have to remove spillRegs() with this patch, but I'm hoping this patch made it possible.

@@ +502,5 @@
>  
>                      // Ensure that if there aren't any uses, there's at least
>                      // some interval for the output to go into.
>                      if (vregs[def].getInterval(0)->numRanges() == 0)
> +                        vregs[def].getInterval(0)->addRange(from, outputOf(*ins)); //XXX

XXX?

@@ -867,5 @@
> -        for (size_t i = 0; i < reg->numUses(); i++) {
> -            LOperand *use = reg->getUse(i);
> -            CodePosition pos = inputOf(use->ins);
> -            if (use->snapshot)
> -                pos = outputOf(use->ins);

Nice. I suspect this was just wrong, but appeared to be necessary given how LSRA used to work.

@@ +893,5 @@
> +                LAllocation *alloc = reg->ins()->getOperand(reg->def()->getReusedInput());
> +                LAllocation *origAlloc = LAllocation::New(*alloc);
> +
> +                *alloc = *interval->getAllocation();
> +                moveBeforeAlloc(inputOf(reg->ins()), origAlloc, alloc);

This creates a move which is hidden from safepoint building, since normally we wouldn't want the instruction's outputs included in the safepoint map. For now, I'm not sure whether we actually use this policy with gcthings or nunboxes. So you might be able to assert on it.

Otherwise, we'll have to fix it. Fix can be follow-up though.

::: js/src/ion/LinearScan.h
@@ +60,5 @@
>   * Represents with better-than-instruction precision a position in the
>   * instruction stream.
>   *
>   * An issue comes up when dealing live intervals as to how to represent
>   * information such as "this register is only needed for the input of

This comment discusses how intervals overlap; is it still accurate?

::: js/src/ion/shared/Lowering-shared-inl.h
@@ +361,5 @@
>  LIRGeneratorShared::annotate(T *ins)
>  {
>      for (size_t i = 0; i < ins->numDefs(); i++) {
> +        if (ins->getDef(i)->policy() != LDefinition::REDEFINED)
> +        {

Nit: brace on previous line

::: js/src/ion/shared/Lowering-x86-shared.cpp
@@ +68,5 @@
>      LDefinition tempInt;
>      if (opd->type() == MIRType_Int32) {
> +        index = useRegisterAtStart(opd);
> +        tempInt = tempCopy(opd);
> +        tempInt.setReusedInput(0);

Could we have tempCopy(mir, index) to avoid manual calls to setReusedInput() ?

::: js/src/ion/x86/Lowering-x86.cpp
@@ +255,5 @@
>  
>      // shift operator should be constant or in register ecx
>      // x86 can't shift a non-ecx register
>      if (rhs->isConstant())
> +        ins->setOperand(1, useOrConstantAtStart(rhs));

This should probably be named "useAtStartOrConstant"
Attachment #584440 - Flags: review?(dvander) → review+
(In reply to David Anderson [:dvander] from comment #5)
>
> It looks like we still only have four policies?

Oops, I fixed this and made some other small changes necessary to pass jit-tests. I must have forgotten to qref or attach it here, will ask for re-review after rebasing.

> >              vregs[*i].getInterval(0)->addRange(inputOf(block->firstId()),
> > +                                               outputOf(block->lastId()).next());
> 
> Out of curiosity, why are variables live at the output of the instruction
> after the block?

Since the end of the range is exclusive, outputOf(block->lastId()) covers only the input half. So we have to get the next position after the output position to make the range cover the entire instruction. The range does not extend to the instruction after the block.

> 
> Does it work for this to be all registers? We don't have to remove
> spillRegs() with this patch, but I'm hoping this patch made it possible.

Yes, the LSRA part shouldn't be too hard. I tried to do this last week but I was not sure about the greedy allocator. Since this patch is already quite complicated I'll file a follow-up bug.

> 
> This creates a move which is hidden from safepoint building, since normally
> we wouldn't want the instruction's outputs included in the safepoint map.
> For now, I'm not sure whether we actually use this policy with gcthings or
> nunboxes. So you might be able to assert on it.
> 
> Otherwise, we'll have to fix it. Fix can be follow-up though.

Good point, I'll look into it.
 
> 
> Could we have tempCopy(mir, index) to avoid manual calls to setReusedInput()
> ?

Nice. We used to store an LUse-pointer, but now that we store the index we can pass it directly.
Blocks: 714428
Attached patch Patch v2 (obsolete) — Splinter Review
Asking for a second review because this patch is quite different from the previous patch (mainly due to the safepoint and GRA work).

Passes tests with both register allocators on x86 and x64, and fixes bug703376.js with LSRA.
Attachment #584440 - Attachment is obsolete: true
Attachment #585309 - Flags: review?(dvander)
Attached patch Patch v2Splinter Review
Attachment #585309 - Attachment is obsolete: true
Attachment #585309 - Flags: review?(dvander)
Attachment #585312 - Flags: review?(dvander)
SAME_AS is no longer used as requirement, but we still use it as hint. The patch is fairly straight-forward; if the reused input register is still free, grab it.

The patch gets rid of some moves on a benchmark I was looking at.
Attachment #585320 - Flags: review?(dvander)
Attachment #585312 - Flags: review?(dvander) → review+
Attachment #585320 - Flags: review?(dvander) → review+
http://hg.mozilla.org/projects/ionmonkey/rev/0e2fdda81828
http://hg.mozilla.org/projects/ionmonkey/rev/89bcd34af894
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
No longer depends on: 712278
You need to log in before you can comment on or make changes to this bug.