Closed Bug 694481 Opened 13 years ago Closed 13 years ago

IonMonkey: Greedy regalloc bug on x64, callTypeBarriers.js

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: dvander, Assigned: dvander)

References

Details

Attachments

(1 file, 3 obsolete files)

The problem here is that the greedy allocator doesn't have liveness information, so what happens: 

v1 = ...
while (v1) {
  v2 ..
  v3 ..
}

At v3, we allocate a stack slot for v2. At v2, we free the stack slot. Then at the "while", we allocate the same stack slot for v1. We didn't realize we needed to hold v1 live across the loop edge, so its slot will now get clobbered.
Attached patch fix (obsolete) — Splinter Review
This fix scans all loops and pre-allocates stack storage for every definition used from within the loop that is defined outside the loop.

This can create useless spills, even inside loops, so as a small optimization we no longer spill unless backingStack() was called.
Attachment #567170 - Flags: review?(sstangl)
Attached patch v2 (obsolete) — Splinter Review
Original patch didn't actually scan inner loops properly.
Attachment #567170 - Attachment is obsolete: true
Attachment #567170 - Flags: review?(sstangl)
Attachment #567172 - Flags: review?(sstangl)
Comment on attachment 567172 [details] [diff] [review]
v2

Urgh, nevermind, this way of traversing the graph is bogus.
Attachment #567172 - Flags: review?(sstangl)
Attached patch v3 (obsolete) — Splinter Review
This version finds natural loops instead.
Attachment #567172 - Attachment is obsolete: true
Attachment #567210 - Flags: review?(sstangl)
Comment on attachment 567210 [details] [diff] [review]
v3

Review of attachment 567210 [details] [diff] [review]:
-----------------------------------------------------------------

I'm leaving this patch as r? for the time being: I want to make sure that we can't use the IonBuilder to construct equivalent loop information. If we can, then we might as well change ReorderBlocks() to make loops contiguous, which knocks out some heavyweight code in LSRA that necessarily deviates from Wimmer's algorithm.

This business of looping over all blocks in each for loop also makes Greedy O(n^2), which may be ungood, since the only reason to use Greedy is because of its speed.

::: js/src/ion/GreedyAllocator.cpp
@@ +792,5 @@
> +// immediately containing loop, then stack space for that definition must be
> +// reserved ahead of time. Otherwise, we could re-use storage that has been
> +// temporarily allocated - see bug 694481.
> +bool
> +GreedyAllocator::findLoopCarriedUses(LBlock *backedge)

I'm convinced that this is correct, but I would like to measure the effect on runtime, since Greedy is supposed to be significantly faster than LSRA.

@@ +799,5 @@
> +
> +    MBasicBlock *mheader = backedge->mir()->loopHeader();
> +
> +    uint32 upperBound = backedge->lastId();
> +    uint32 lowerBound = mheader->lir()->firstId();

Is this logic sound? MIR can lower to multiple LIR, which need not be contiguous (or even in the same block!).

::: js/src/ion/IonAnalysis.cpp
@@ +816,5 @@
> +ion::FindNaturalLoops(MIRGraph &graph)
> +{
> +    Vector<MBasicBlock *, 8, SystemAllocPolicy> worklist;
> +
> +    // RPO guarantees we'll see inner backedges before outer backedges.

RPO permits seeing an outer backedge before the corresponding inner backedge. It may be the case that our engine never produces such orderings, but 1) we would have to prove and assert that, and 2) we may actually desire such orderings for the purpose of removing the liveness loop hack in LinearScan.

For example, consider the following arrangement of basic blocks:

for (A) { B; for (C); {D} E; } F;
WLOG, let the edges from D and E be backedges.

The successors of A are B and F.
The successors of C are D and E.

AFBCED is then a valid ordering in RPO, where E is visited before D. More generally, D could be positioned anywhere in the ordering after C, and we would still satisfy RPO. RPO on a non-tree graph is actually a really weak claim.

::: js/src/ion/MIRGraph.h
@@ +359,5 @@
> +        JS_ASSERT(loop->isLoopHeader());
> +        containingLoop_ = loop;
> +    }
> +    MBasicBlock *containingLoop() const {
> +        return containingLoop_;

JS_ASSERT(containingLoop_->isLoopHeader());

@@ +401,5 @@
>      // Utility mark for traversal algorithms.
>      bool mark_;
>  
>      Vector<MBasicBlock *, 1, IonAllocPolicy> immediatelyDominated_;
> +    Vector<MBasicBlock *, 1, IonAllocPolicy> containedInLoop_;

It is a pity that we can't put containedInLoop_ and containingLoop_ in a union. On the other hand, loopDepth_ and containingLoop_ could be unioned, if we care.

@@ +406,4 @@
>      MBasicBlock *immediateDominator_;
>      size_t numDominated_;
> +    size_t loopDepth_;
> +    MBasicBlock *containingLoop_;

Would prefer "loopHeader_". Participles are awkward in variable names.
(In reply to Sean Stangl from comment #5)
> I'm leaving this patch as r? for the time being: I want to make sure that we
> can't use the IonBuilder to construct equivalent loop information. If we
> can, then we might as well change ReorderBlocks() to make loops contiguous,
> which knocks out some heavyweight code in LSRA that necessarily deviates
> from Wimmer's algorithm.

We can, in the same way we could have built the dominator tree this way (which is also an O(n^2) algorithm), but given the problems we've encountered with phis I'm leaning against adding this sort of thing to IonBuilder. Having the analysis separate makes it much easier to debug, and since we're not using the greedy allocator (and it's not yet clear we have any plans to), it's probably premature optimization to worry about this step.
(In reply to Sean Stangl from comment #5)
> I'm convinced that this is correct, but I would like to measure the effect
> on runtime, since Greedy is supposed to be significantly faster than LSRA.

The actual intent was to bootstrap code generation so we weren't blocked on LSRA. They're both greedy algorithms, both fairly complicated, and both need some liveness information (though the "Greedy" one, in this patch, is conservative). I suspect LSRA is actually more expensive but I haven't measured.

> 
> @@ +799,5 @@
> > +
> > +    MBasicBlock *mheader = backedge->mir()->loopHeader();
> > +
> > +    uint32 upperBound = backedge->lastId();
> > +    uint32 lowerBound = mheader->lir()->firstId();
> 
> Is this logic sound? MIR can lower to multiple LIR, which need not be
> contiguous (or even in the same block!).

It's sound in that, any LIR defs inside the loop body should be >= lowerBound and < upperBound (MIR doesn't play into it)

> AFBCED is then a valid ordering in RPO, where E is visited before D. More
> generally, D could be positioned anywhere in the ordering after C, and we
> would still satisfy RPO. RPO on a non-tree graph is actually a really weak
> claim.

Great point, I didn't realize this! My goal here is to fix a greedy allocation bug. If it turns out to be easier to verify or assert on our algorithm yielding the "right" order, I'd prefer to take that - otherwise I'll fix FindNaturalLoops().

> It is a pity that we can't put containedInLoop_ and containingLoop_ in a
> union. On the other hand, loopDepth_ and containingLoop_ could be unioned,
> if we care.

I doubt it. MBasicBlock is getting a little big but it's not worth worrying about yet. loopDepth_ is actually not used anywhere though so it can just go.

> 
> @@ +406,4 @@
> >      MBasicBlock *immediateDominator_;
> >      size_t numDominated_;
> > +    size_t loopDepth_;
> > +    MBasicBlock *containingLoop_;
> 
> Would prefer "loopHeader_". Participles are awkward in variable names.

Good idea.
Attached patch v4Splinter Review
It turns out this postordering property was already asserted on: there's an assert that makes sure when finding a backedge, that it is not already marked as having a parent loop header. Also, we assert that in the predecessor search that we don't find a backedge which has no header.
Attachment #567210 - Attachment is obsolete: true
Attachment #567210 - Flags: review?(sstangl)
Attachment #569255 - Flags: review?(sstangl)
Attachment #569255 - Flags: review?(sstangl) → review+
https://hg.mozilla.org/projects/ionmonkey/rev/e11561a4aa62
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.