Last Comment Bug 694481 - IonMonkey: Greedy regalloc bug on x64, callTypeBarriers.js
: IonMonkey: Greedy regalloc bug on x64, callTypeBarriers.js
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: ---
Assigned To: David Anderson [:dvander]
:
Mentors:
Depends on:
Blocks: 677337
  Show dependency treegraph
 
Reported: 2011-10-13 18:46 PDT by David Anderson [:dvander]
Modified: 2011-11-04 13:53 PDT (History)
0 users
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
fix (11.15 KB, patch)
2011-10-14 13:38 PDT, David Anderson [:dvander]
no flags Details | Diff | Splinter Review
v2 (11.25 KB, patch)
2011-10-14 13:42 PDT, David Anderson [:dvander]
no flags Details | Diff | Splinter Review
v3 (16.36 KB, patch)
2011-10-14 16:08 PDT, David Anderson [:dvander]
no flags Details | Diff | Splinter Review
v4 (13.16 KB, patch)
2011-10-24 18:37 PDT, David Anderson [:dvander]
sstangl: review+
Details | Diff | Splinter Review

Description David Anderson [:dvander] 2011-10-13 18:46:24 PDT
The problem here is that the greedy allocator doesn't have liveness information, so what happens: 

v1 = ...
while (v1) {
  v2 ..
  v3 ..
}

At v3, we allocate a stack slot for v2. At v2, we free the stack slot. Then at the "while", we allocate the same stack slot for v1. We didn't realize we needed to hold v1 live across the loop edge, so its slot will now get clobbered.
Comment 1 David Anderson [:dvander] 2011-10-14 13:38:10 PDT
Created attachment 567170 [details] [diff] [review]
fix

This fix scans all loops and pre-allocates stack storage for every definition used from within the loop that is defined outside the loop.

This can create useless spills, even inside loops, so as a small optimization we no longer spill unless backingStack() was called.
Comment 2 David Anderson [:dvander] 2011-10-14 13:42:00 PDT
Created attachment 567172 [details] [diff] [review]
v2

Original patch didn't actually scan inner loops properly.
Comment 3 David Anderson [:dvander] 2011-10-14 13:50:12 PDT
Comment on attachment 567172 [details] [diff] [review]
v2

Urgh, nevermind, this way of traversing the graph is bogus.
Comment 4 David Anderson [:dvander] 2011-10-14 16:08:51 PDT
Created attachment 567210 [details] [diff] [review]
v3

This version finds natural loops instead.
Comment 5 Sean Stangl [:sstangl] 2011-10-21 17:27:01 PDT
Comment on attachment 567210 [details] [diff] [review]
v3

Review of attachment 567210 [details] [diff] [review]:
-----------------------------------------------------------------

I'm leaving this patch as r? for the time being: I want to make sure that we can't use the IonBuilder to construct equivalent loop information. If we can, then we might as well change ReorderBlocks() to make loops contiguous, which knocks out some heavyweight code in LSRA that necessarily deviates from Wimmer's algorithm.

This business of looping over all blocks in each for loop also makes Greedy O(n^2), which may be ungood, since the only reason to use Greedy is because of its speed.

::: js/src/ion/GreedyAllocator.cpp
@@ +792,5 @@
> +// immediately containing loop, then stack space for that definition must be
> +// reserved ahead of time. Otherwise, we could re-use storage that has been
> +// temporarily allocated - see bug 694481.
> +bool
> +GreedyAllocator::findLoopCarriedUses(LBlock *backedge)

I'm convinced that this is correct, but I would like to measure the effect on runtime, since Greedy is supposed to be significantly faster than LSRA.

@@ +799,5 @@
> +
> +    MBasicBlock *mheader = backedge->mir()->loopHeader();
> +
> +    uint32 upperBound = backedge->lastId();
> +    uint32 lowerBound = mheader->lir()->firstId();

Is this logic sound? MIR can lower to multiple LIR, which need not be contiguous (or even in the same block!).

::: js/src/ion/IonAnalysis.cpp
@@ +816,5 @@
> +ion::FindNaturalLoops(MIRGraph &graph)
> +{
> +    Vector<MBasicBlock *, 8, SystemAllocPolicy> worklist;
> +
> +    // RPO guarantees we'll see inner backedges before outer backedges.

RPO permits seeing an outer backedge before the corresponding inner backedge. It may be the case that our engine never produces such orderings, but 1) we would have to prove and assert that, and 2) we may actually desire such orderings for the purpose of removing the liveness loop hack in LinearScan.

For example, consider the following arrangement of basic blocks:

for (A) { B; for (C); {D} E; } F;
WLOG, let the edges from D and E be backedges.

The successors of A are B and F.
The successors of C are D and E.

AFBCED is then a valid ordering in RPO, where E is visited before D. More generally, D could be positioned anywhere in the ordering after C, and we would still satisfy RPO. RPO on a non-tree graph is actually a really weak claim.

::: js/src/ion/MIRGraph.h
@@ +359,5 @@
> +        JS_ASSERT(loop->isLoopHeader());
> +        containingLoop_ = loop;
> +    }
> +    MBasicBlock *containingLoop() const {
> +        return containingLoop_;

JS_ASSERT(containingLoop_->isLoopHeader());

@@ +401,5 @@
>      // Utility mark for traversal algorithms.
>      bool mark_;
>  
>      Vector<MBasicBlock *, 1, IonAllocPolicy> immediatelyDominated_;
> +    Vector<MBasicBlock *, 1, IonAllocPolicy> containedInLoop_;

It is a pity that we can't put containedInLoop_ and containingLoop_ in a union. On the other hand, loopDepth_ and containingLoop_ could be unioned, if we care.

@@ +406,4 @@
>      MBasicBlock *immediateDominator_;
>      size_t numDominated_;
> +    size_t loopDepth_;
> +    MBasicBlock *containingLoop_;

Would prefer "loopHeader_". Participles are awkward in variable names.
Comment 6 David Anderson [:dvander] 2011-10-21 18:35:32 PDT
(In reply to Sean Stangl from comment #5)
> I'm leaving this patch as r? for the time being: I want to make sure that we
> can't use the IonBuilder to construct equivalent loop information. If we
> can, then we might as well change ReorderBlocks() to make loops contiguous,
> which knocks out some heavyweight code in LSRA that necessarily deviates
> from Wimmer's algorithm.

We can, in the same way we could have built the dominator tree this way (which is also an O(n^2) algorithm), but given the problems we've encountered with phis I'm leaning against adding this sort of thing to IonBuilder. Having the analysis separate makes it much easier to debug, and since we're not using the greedy allocator (and it's not yet clear we have any plans to), it's probably premature optimization to worry about this step.
Comment 7 David Anderson [:dvander] 2011-10-21 19:11:20 PDT
(In reply to Sean Stangl from comment #5)
> I'm convinced that this is correct, but I would like to measure the effect
> on runtime, since Greedy is supposed to be significantly faster than LSRA.

The actual intent was to bootstrap code generation so we weren't blocked on LSRA. They're both greedy algorithms, both fairly complicated, and both need some liveness information (though the "Greedy" one, in this patch, is conservative). I suspect LSRA is actually more expensive but I haven't measured.

> 
> @@ +799,5 @@
> > +
> > +    MBasicBlock *mheader = backedge->mir()->loopHeader();
> > +
> > +    uint32 upperBound = backedge->lastId();
> > +    uint32 lowerBound = mheader->lir()->firstId();
> 
> Is this logic sound? MIR can lower to multiple LIR, which need not be
> contiguous (or even in the same block!).

It's sound in that, any LIR defs inside the loop body should be >= lowerBound and < upperBound (MIR doesn't play into it)

> AFBCED is then a valid ordering in RPO, where E is visited before D. More
> generally, D could be positioned anywhere in the ordering after C, and we
> would still satisfy RPO. RPO on a non-tree graph is actually a really weak
> claim.

Great point, I didn't realize this! My goal here is to fix a greedy allocation bug. If it turns out to be easier to verify or assert on our algorithm yielding the "right" order, I'd prefer to take that - otherwise I'll fix FindNaturalLoops().

> It is a pity that we can't put containedInLoop_ and containingLoop_ in a
> union. On the other hand, loopDepth_ and containingLoop_ could be unioned,
> if we care.

I doubt it. MBasicBlock is getting a little big but it's not worth worrying about yet. loopDepth_ is actually not used anywhere though so it can just go.

> 
> @@ +406,4 @@
> >      MBasicBlock *immediateDominator_;
> >      size_t numDominated_;
> > +    size_t loopDepth_;
> > +    MBasicBlock *containingLoop_;
> 
> Would prefer "loopHeader_". Participles are awkward in variable names.

Good idea.
Comment 8 David Anderson [:dvander] 2011-10-24 18:37:54 PDT
Created attachment 569255 [details] [diff] [review]
v4

It turns out this postordering property was already asserted on: there's an assert that makes sure when finding a backedge, that it is not already marked as having a parent loop header. Also, we assert that in the predecessor search that we don't find a backedge which has no header.
Comment 9 David Anderson [:dvander] 2011-11-04 13:53:22 PDT
https://hg.mozilla.org/projects/ionmonkey/rev/e11561a4aa62

Note You need to log in before you can comment on or make changes to this bug.