Closed Bug 694481 Opened 13 years ago Closed 13 years ago

IonMonkey: Greedy regalloc bug on x64, callTypeBarriers.js

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: dvander, Assigned: dvander)

References

Details

Attachments

(1 file, 3 obsolete files)

fix 13 years ago David Anderson [:dvander] - inactive, e-mail if emergency 11.15 KB, patch		Details \| Diff \| Splinter Review
v2 13 years ago David Anderson [:dvander] - inactive, e-mail if emergency 11.25 KB, patch		Details \| Diff \| Splinter Review
v3 13 years ago David Anderson [:dvander] - inactive, e-mail if emergency 16.36 KB, patch		Details \| Diff \| Splinter Review
v4 13 years ago David Anderson [:dvander] - inactive, e-mail if emergency 13.16 KB, patch	sstangl : review+	Details \| Diff \| Splinter Review

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Description

•

13 years ago

The problem here is that the greedy allocator doesn't have liveness information, so what happens: 

v1 = ...
while (v1) {
  v2 ..
  v3 ..
}

At v3, we allocate a stack slot for v2. At v2, we free the stack slot. Then at the "while", we allocate the same stack slot for v1. We didn't realize we needed to hold v1 live across the loop edge, so its slot will now get clobbered.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 1

•

13 years ago

Attached patch fix (obsolete) — Details — Splinter Review

This fix scans all loops and pre-allocates stack storage for every definition used from within the loop that is defined outside the loop.

This can create useless spills, even inside loops, so as a small optimization we no longer spill unless backingStack() was called.

Attachment #567170 - Flags: review?(sstangl)

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 2

•

13 years ago

Attached patch v2 (obsolete) — Details — Splinter Review

Original patch didn't actually scan inner loops properly.

Attachment #567170 - Attachment is obsolete: true

Attachment #567170 - Flags: review?(sstangl)

Attachment #567172 - Flags: review?(sstangl)

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 3

•

13 years ago

Comment on attachment 567172 [details] [diff] [review]
v2

Urgh, nevermind, this way of traversing the graph is bogus.

Attachment #567172 - Flags: review?(sstangl)

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 4

•

13 years ago

Attached patch v3 (obsolete) — Details — Splinter Review

This version finds natural loops instead.

Attachment #567172 - Attachment is obsolete: true

Attachment #567210 - Flags: review?(sstangl)

Sean Stangl [:sstangl]

Comment 5

•

13 years ago

Comment on attachment 567210 [details] [diff] [review]
v3

Review of attachment 567210 [details] [diff] [review]:
-----------------------------------------------------------------

I'm leaving this patch as r? for the time being: I want to make sure that we can't use the IonBuilder to construct equivalent loop information. If we can, then we might as well change ReorderBlocks() to make loops contiguous, which knocks out some heavyweight code in LSRA that necessarily deviates from Wimmer's algorithm.

This business of looping over all blocks in each for loop also makes Greedy O(n^2), which may be ungood, since the only reason to use Greedy is because of its speed.

::: js/src/ion/GreedyAllocator.cpp
@@ +792,5 @@
> +// immediately containing loop, then stack space for that definition must be
> +// reserved ahead of time. Otherwise, we could re-use storage that has been
> +// temporarily allocated - see bug 694481.
> +bool
> +GreedyAllocator::findLoopCarriedUses(LBlock *backedge)

I'm convinced that this is correct, but I would like to measure the effect on runtime, since Greedy is supposed to be significantly faster than LSRA.

@@ +799,5 @@
> +
> +    MBasicBlock *mheader = backedge->mir()->loopHeader();
> +
> +    uint32 upperBound = backedge->lastId();
> +    uint32 lowerBound = mheader->lir()->firstId();

Is this logic sound? MIR can lower to multiple LIR, which need not be contiguous (or even in the same block!).

::: js/src/ion/IonAnalysis.cpp
@@ +816,5 @@
> +ion::FindNaturalLoops(MIRGraph &graph)
> +{
> +    Vector<MBasicBlock *, 8, SystemAllocPolicy> worklist;
> +
> +    // RPO guarantees we'll see inner backedges before outer backedges.

RPO permits seeing an outer backedge before the corresponding inner backedge. It may be the case that our engine never produces such orderings, but 1) we would have to prove and assert that, and 2) we may actually desire such orderings for the purpose of removing the liveness loop hack in LinearScan.

For example, consider the following arrangement of basic blocks:

for (A) { B; for (C); {D} E; } F;
WLOG, let the edges from D and E be backedges.

The successors of A are B and F.
The successors of C are D and E.

AFBCED is then a valid ordering in RPO, where E is visited before D. More generally, D could be positioned anywhere in the ordering after C, and we would still satisfy RPO. RPO on a non-tree graph is actually a really weak claim.

::: js/src/ion/MIRGraph.h
@@ +359,5 @@
> +        JS_ASSERT(loop->isLoopHeader());
> +        containingLoop_ = loop;
> +    }
> +    MBasicBlock *containingLoop() const {
> +        return containingLoop_;

JS_ASSERT(containingLoop_->isLoopHeader());

@@ +401,5 @@
>      // Utility mark for traversal algorithms.
>      bool mark_;
>  
>      Vector<MBasicBlock *, 1, IonAllocPolicy> immediatelyDominated_;
> +    Vector<MBasicBlock *, 1, IonAllocPolicy> containedInLoop_;

It is a pity that we can't put containedInLoop_ and containingLoop_ in a union. On the other hand, loopDepth_ and containingLoop_ could be unioned, if we care.

@@ +406,4 @@
>      MBasicBlock *immediateDominator_;
>      size_t numDominated_;
> +    size_t loopDepth_;
> +    MBasicBlock *containingLoop_;

Would prefer "loopHeader_". Participles are awkward in variable names.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 6

•

13 years ago

(In reply to Sean Stangl from comment #5)
> I'm leaving this patch as r? for the time being: I want to make sure that we
> can't use the IonBuilder to construct equivalent loop information. If we
> can, then we might as well change ReorderBlocks() to make loops contiguous,
> which knocks out some heavyweight code in LSRA that necessarily deviates
> from Wimmer's algorithm.

We can, in the same way we could have built the dominator tree this way (which is also an O(n^2) algorithm), but given the problems we've encountered with phis I'm leaning against adding this sort of thing to IonBuilder. Having the analysis separate makes it much easier to debug, and since we're not using the greedy allocator (and it's not yet clear we have any plans to), it's probably premature optimization to worry about this step.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 7

•

13 years ago

(In reply to Sean Stangl from comment #5)
> I'm convinced that this is correct, but I would like to measure the effect
> on runtime, since Greedy is supposed to be significantly faster than LSRA.

The actual intent was to bootstrap code generation so we weren't blocked on LSRA. They're both greedy algorithms, both fairly complicated, and both need some liveness information (though the "Greedy" one, in this patch, is conservative). I suspect LSRA is actually more expensive but I haven't measured.

> 
> @@ +799,5 @@
> > +
> > +    MBasicBlock *mheader = backedge->mir()->loopHeader();
> > +
> > +    uint32 upperBound = backedge->lastId();
> > +    uint32 lowerBound = mheader->lir()->firstId();
> 
> Is this logic sound? MIR can lower to multiple LIR, which need not be
> contiguous (or even in the same block!).

It's sound in that, any LIR defs inside the loop body should be >= lowerBound and < upperBound (MIR doesn't play into it)

> AFBCED is then a valid ordering in RPO, where E is visited before D. More
> generally, D could be positioned anywhere in the ordering after C, and we
> would still satisfy RPO. RPO on a non-tree graph is actually a really weak
> claim.

Great point, I didn't realize this! My goal here is to fix a greedy allocation bug. If it turns out to be easier to verify or assert on our algorithm yielding the "right" order, I'd prefer to take that - otherwise I'll fix FindNaturalLoops().

> It is a pity that we can't put containedInLoop_ and containingLoop_ in a
> union. On the other hand, loopDepth_ and containingLoop_ could be unioned,
> if we care.

I doubt it. MBasicBlock is getting a little big but it's not worth worrying about yet. loopDepth_ is actually not used anywhere though so it can just go.

> 
> @@ +406,4 @@
> >      MBasicBlock *immediateDominator_;
> >      size_t numDominated_;
> > +    size_t loopDepth_;
> > +    MBasicBlock *containingLoop_;
> 
> Would prefer "loopHeader_". Participles are awkward in variable names.

Good idea.

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 8

•

13 years ago

Attached patch v4 — Details — Splinter Review

It turns out this postordering property was already asserted on: there's an assert that makes sure when finding a backedge, that it is not already marked as having a parent loop header. Also, we assert that in the predecessor search that we don't find a backedge which has no header.

Attachment #567210 - Attachment is obsolete: true

Attachment #567210 - Flags: review?(sstangl)

Attachment #569255 - Flags: review?(sstangl)

Sean Stangl [:sstangl]

Updated

•

13 years ago

Attachment #569255 - Flags: review?(sstangl) → review+

David Anderson [:dvander] - inactive, e-mail if emergency

Assignee

Comment 9

•

13 years ago

https://hg.mozilla.org/projects/ionmonkey/rev/e11561a4aa62

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

IonMonkey: Greedy regalloc bug on x64, callTypeBarriers.js

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: dvander, Assigned: dvander)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file, 3 obsolete files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Updated

Comment 9

Attachment

General

Description

File Name

Content Type