Closed Bug 733030 Opened 10 years ago Closed 7 years ago

IonMonkey: Fix some LSRA performance problems

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ion:t])

Attachments

(4 files, 1 obsolete file)

With the patch in bug 668305, Kraken audio-beat-detection still spends 76 ms (~20%) in the register allocator.

The most serious problem is the large number of virtual registers (> 3000) - the bitsets to track live vregs are large. I hope we can avoid iterating over these bitsets in some cases, but eventually we'll need chunked compilation (JM+TI uses at least 22 chunks for this function).

In the meantime, however, there are still a number of known regalloc problems we should fix.
If we assign a memory location to an interval, we don't want to add the interval to the active/inactive lists, since these lists are only used for registers.
Attachment #603205 - Flags: review?(sstangl)
Comment on attachment 603205 [details] [diff] [review]
Part 1: Only add intervals with registers to active/inactive

Review of attachment 603205 [details] [diff] [review]:
-----------------------------------------------------------------

Great.
Attachment #603205 - Flags: review?(sstangl) → review+
Oh, also -- I think finishInterval() can now have "if (!interval->reg()) return;" replaced with a JS_ASSERT().
Attached patch Part 1 follow-up (obsolete) — Splinter Review
While addressing comment 3, I noticed a few problems with stack slots:

1) We still have to call finishInterval to free the stack slot, this patch adds an activeSlots list for that. It's just like the "active" list, but we don't have to search it in findBestFreeRegister etc.

2) If the interval is the last interval, finishInterval should free the canonical spill slot (if there is one), not the interval's allocation. If the last interval uses a register, and another interval uses a stack slot, we'll now correctly free the stack slot.

3) The JS_ASSERT(interval->reg()) is no longer necessary since it's immediately dereferenced.
Attachment #603693 - Flags: review?(sstangl)
Attached patch Part 1 follow-upSplinter Review
Correct version this time.
Attachment #603693 - Attachment is obsolete: true
Attachment #603700 - Flags: review?(sstangl)
Attachment #603693 - Flags: review?(sstangl)
Some changes to setIntervalRequirement:

1) Split the loop in two loops: first search for requirements, then search for hints. This works since the list of uses is guaranteed to be sorted.

2) Don't search for hints if there's a canonical spill location. In this case we should just spill the interval (if there are register uses the interval will be split). This case is pretty common, and the patch is a reasonable crypto-md5 / audio-beat-detection speedup.
Attachment #603708 - Flags: review?(dvander)
Attachment #603708 - Flags: review?(dvander) → review+
This adds a new function addRangeAtHead (better name welcome), which we can use instead of addRange in most cases, because we know the range will be the first one (instructions are visited backwards). validateVirtualRegister is used to verify this.
Attachment #604036 - Flags: review?(sstangl)
Just a few questions before review, because I didn't understand some changes.

(In reply to Jan de Mooij (:jandem) from comment #4)
> 1) We still have to call finishInterval to free the stack slot, this patch
> adds an activeSlots list for that. It's just like the "active" list, but we
> don't have to search it in findBestFreeRegister etc.

Historically, before the optimization in Part 1A, finishInterval() ignored stack slots -- the first thing it did was "if (!interval->reg()) return;". Then it would go and, for that register, free its canonical spill location.

If we then have an interval that is a stack slot, does it make sense for it to have a spill location at all? Isn't it already spilled? Shouldn't there be nothing left to free?
(In reply to Sean Stangl from comment #8)
> 
> Historically, before the optimization in Part 1A, finishInterval() ignored
> stack slots -- the first thing it did was "if (!interval->reg()) return;".

Every virtual register has a list of intervals. Every interval has a pointer to its "parent" virtual register. interval->reg() returns this pointer. interval->reg() can/could be NULL in two cases:

1) "Bogus" intervals. These were removed a while ago (I thought this was what you meant in comment 3).

2) Fixed intervals. These should never end up in the active/inactive/unhandled sets or finishInterval - their allocation is always a single physical register and they are only used to determine if an interval intersects a fixed interval. The "fixedIntervalsUnion" is the union of all fixed intervals, so that we can quickly test whether an interval intersects any fixed interval (to avoid always looping over all fixed intervals).

So JS_ASSERT(interval->reg()); just checks if the interval belongs to a virtual register.

> Then it would go and, for that register, free its canonical spill location.
> 
> If we then have an interval that is a stack slot, does it make sense for it
> to have a spill location at all? Isn't it already spilled? Shouldn't there
> be nothing left to free?

A virtual register can have a single canonical spill location. If one of its intervals is assigned a stack slot, it must be equal to this canonical spill location. For example:

Virtual register 3 (caonical spill: stack slot 5):
  - interval 0: (allocation: register eax)
  - interval 1: (allocation: stack slot 5)
  - interval 2: (allocation: register ecx)

When finishInterval is called for interval 2, we notice that it's the last interval so we have to free the canonical spill slot.

The bug was that even if we assign a register (ecx) to interval 2, there is still a canonical spill slot to free (interval 1's slot is not freed earlier, a later interval may want to use the same canonical spill location).
Comment on attachment 603700 [details] [diff] [review]
Part 1 follow-up

Review of attachment 603700 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the explanation! Looks good.

::: js/src/ion/LinearScan.h
@@ +617,2 @@
>      InlineList<LiveInterval> handled;
> +#endif

Can this be DebugOnly<InlineList<LiveInterval> >?
Attachment #603700 - Flags: review?(sstangl) → review+
Comment on attachment 604036 [details] [diff] [review]
Part 3: Optimize addRange

Review of attachment 604036 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/LinearScan.cpp
@@ +546,5 @@
>          // the interval to the point of definition.
>          for (BitSet::Iterator liveRegId(*live); liveRegId; liveRegId++) {
> +            if (!vregs[*liveRegId].getInterval(0)->addRangeAtHead(inputOf(block->firstId()),
> +                                                                  outputOf(block->lastId()).next()))
> +                return false;

Style guidelines require if()s with multi-line conditions to have {} braces, with the opening brace on its own line.

@@ +654,5 @@
>                  // This is a dead phi, so add a dummy range over all phis. This
>                  // can go away if we have an earlier dead code elimination pass.
> +                if (!vregs[def].getInterval(0)->addRangeAtHead(inputOf(block->firstId()),
> +                                                               outputOf(block->firstId())))
> +                    return false;

Same {} nit

@@ +673,5 @@
>                  // Add an interval for this entire loop block
>                  for (BitSet::Iterator liveRegId(*live); liveRegId; liveRegId++) {
> +                    if (!vregs[*liveRegId].getInterval(0)->addRange(inputOf(loopBlock->lir()->firstId()),
> +                                                                    outputOf(loopBlock->lir()->lastId()).next()))
> +                        return false;

Same {} nit

@@ +720,5 @@
>      // rest of the allocator can assume it has at least one range.
>      if (fixedIntervalsUnion->numRanges() == 0) {
> +        if (!fixedIntervalsUnion->addRangeAtHead(CodePosition(0, CodePosition::INPUT),
> +                                                 CodePosition(0, CodePosition::OUTPUT)))
> +            return false;

Same {} nit

@@ +1954,5 @@
> +LiveInterval::validateRanges()
> +{
> +    Range *prev = NULL;
> +
> +    size_t i = ranges_.length() - 1;

This could be added to the for(){} without breaking the column limit.

@@ +1991,1 @@
>  #endif

Prefer "#endif // DEBUG" due to length of block.
Attachment #604036 - Flags: review?(sstangl) → review+
I landed these patches, but backed them out due to regressing Kraken-desaturate on x86. What's strange though is that I can't reproduce this regression locally, and diff'ing ion.cfg shows that the only difference with these patches is that we are now able to reuse a stack slot.
(In reply to Jan de Mooij (:jandem) from comment #12)
> I landed these patches, but backed them out due to regressing
> Kraken-desaturate on x86. What's strange though is that I can't reproduce
> this regression locally, and diff'ing ion.cfg shows that the only difference
> with these patches is that we are now able to reuse a stack slot.

Ever get this sorted out?
(In reply to Paul Wright from comment #13)
> 
> Ever get this sorted out?

I just relanded part 2 and 3:

http://hg.mozilla.org/projects/ionmonkey/rev/bd4524c5b011
http://hg.mozilla.org/projects/ionmonkey/rev/5d02acbb390c

Keeping this bug open for part 1.
Is part 1 a "wontfix"?
LSRA is now disabled by default (and will be removed soon) so we can close this.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID
Resolution: INVALID → WONTFIX
You need to log in before you can comment on or make changes to this bug.