Closed Bug 844779 Opened 7 years ago Closed 6 years ago

IonMonkey: Improve order of blocks in rpo

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: h4writer, Assigned: sunfish)

References

(Blocks 2 open bugs)

Details

Attachments

(3 files, 8 obsolete files)

We are hurting on earley-boyer, because the order of the blocks make our loops less tight. It is better to have tighter loops, as this increases locality and less jumps. We must assume that a loop will run more times than other code. There is one particular construction we are bad at:

for (var i=0; i<100; i++) {

   if (something) {
      // a lot of code
      return;
   }

}

So the blocks containing "a lot of code" are not part of the loop anymore. The looping is done when we enter that branch. Still we put those blocks in between the loop blocks. This hurts, because the blocks of the loop itself are now further apart and we need to jump from the test (if) to after the test. We should add code to do this.

(This hurts us on earley-boyer.js rewrite_nboyer, and is a 2%-3% improvement with quick and dirty code)
Blocks: 765980
Assignee: general → hv1989
IonMonkey used to have a ReorderBlocks function. That was removed because it was buggy and didn't cooperate with AA (which is expecting some things). Bug 732120 is about the removal. To fix this bug we probably will add something like ReorderBlocks again, but we need to be careful that we don't bump into these issues again.

Also note that the current order isn't far away from the order required to fix this bug. I intend to keep the same numbering as much as possible. I.e. needed for AA the current code guarantees all blocks inside the loop are between the header and backedge, and all code after it is not between the header and backedge. That should definitely stay the same.
Depends on: 851115
Attached patch Reorder blocks (obsolete) — Splinter Review
This would contain everything that's needed to calculate the RPO and making sure loops are as tight as possible.
Attachment #725147 - Flags: feedback?(jdemooij)
Attached patch Recalculate loopdepth (obsolete) — Splinter Review
This bug is depending on bug 851115, because the loopdepth information is incorrect. If you want to execute the reorder block patch, this is a quick fix that fixes the loopdepth information. Applies on top of the reorder block patch
Attached patch Reorder blocks (obsolete) — Splinter Review
This patch contains the reorder block logic and the recalculate loopdepth logic. It isn't possible to have the correct loopdepth information during IonBuilder. It can't be done in IonBuilder, because we don't have the full graph yet.
Attachment #725147 - Attachment is obsolete: true
Attachment #725149 - Attachment is obsolete: true
Attachment #725147 - Flags: feedback?(jdemooij)
Attachment #726346 - Flags: review?(jdemooij)
Comment on attachment 726346 [details] [diff] [review]
Reorder blocks

Review of attachment 726346 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great! The ReorderBlocks pass we used to have was a mess, but this one is a lot simpler. This patch will need some fuzzing though. Does this patch have any influence on Kraken or SS?

::: js/src/ion/IonAnalysis.cpp
@@ +622,5 @@
>  
>      return true;
>  }
>  
> +void

Nit: "static void" (use static for functions that are only used in the same C++ file).

@@ +642,5 @@
> +            MBasicBlock *pred = block->getPredecessor(i);
> +            if (pred->loopDepth() == depth)
> +                continue;
> +
> +            worklist.append(pred);

Check for OOM.

@@ +654,5 @@
> +    // Recalculate the loop depth information.
> +    //
> +    // The definition of a loop (important to correctly reorder blocks and for LICM) is:
> +    // all blocks reached when creating a path from the loop backedge to the loopheader.
> +    // This information difference from the loopdepth information gathered in IonBuilder.

s/difference/differs

@@ +688,5 @@
> +        // Add not yet visited successors to the worklist.
> +        for (size_t i = 0; i < block->numSuccessors(); i++) {
> +            MBasicBlock *succ = block->getSuccessor(i);
> +            if (!succ->isMarked()) {
> +                worklist.append(succ);

Check for OOM.

@@ +697,5 @@
> +                    SetLoopDepth(succ);
> +            }
> +        }
> +
> +        // if any added, process them first

Nit: "// If any blocks were added, process them first."

@@ +710,5 @@
> +
> +    return true;
> +}
> +
> +void

Nit: "static void"

@@ +759,5 @@
> +    graph.clearBlockList();
> +
> +    // Start with entry block
> +    pending.pushBack(entry);
> +    pending.peekBack()->mark();

Nit: entry->mark();

@@ +771,5 @@
> +        for (size_t i = 0; i < block->numSuccessors(); i++) {
> +            MBasicBlock *succ = block->getSuccessor(i);
> +            if (!succ->isMarked()) {
> +                pending.pushBack(succ);
> +                pending.peekBack()->mark();

Nit: succ->mark();

@@ +779,5 @@
> +        // Sort newly added blocks (if any added)
> +        if (block != pending.peekBack()) {
> +            MBasicBlockIterator start = pending.begin(block);
> +            start++;
> +            SortByLoopDepth(pending, start);

In most cases the successors will have the same loop depth. To avoid constructing the iterators etc, you can add "bool sortSuccessors = false;", and set it to |true| in the previous loop if succ->loopDepth() != block->loopDepth().

Then you can change "if (block != pending.peekBack())" to "if (sortSuccessors)" (if we didn't add anything sortSuccessors will be |false|)

@@ +796,5 @@
> +    // QUESTION:
> +    // I've seen that in the original order, that the osrblock->id() == preheader->id() - 1
> +    // I could do that too, but I see no reason ...
> +    // I also think it increases locality for nested loops, if they aren't in the loop.
> +    // I would've put even before the entry block if our system didn't wanted the entry to be 0.

Yeah, just adding it as second like we do now is best I think (and remove the comment :)

@@ +803,5 @@
> +        graph.moveBlockToBegin(osr);
> +        graph.moveBlockToBegin(entry);
> +    }
> +
> +    graph.unmarkBlocks();

Nit: JS_ASSERT(graph.numBlocks() == numBlocks); and add a DebugOnly<size_t> numBlocks = graph.numBlocks() at the start of this function.
Attachment #726346 - Flags: review?(jdemooij)
(In reply to Jan de Mooij [:jandem] from comment #5)
> Then you can change "if (block != pending.peekBack())" to "if
> (sortSuccessors)" (if we didn't add anything sortSuccessors will be |false|)

Eh sorry, this part is not true. You still want to |continue| if (block != pending.peekBack()).
Attached patch Reorder blocksSplinter Review
This patch changes the order of blocks and that's delicate. Therefore a bit of fuzzing would definitely not hurt.
Attachment #726346 - Attachment is obsolete: true
Attachment #730633 - Flags: feedback?(choller)
Attachment #730633 - Flags: feedback?(gary)
Comment on attachment 730633 [details] [diff] [review]
Reorder blocks

Fixed mentioned nits + added an assert when trying to use rpoBegin/rpoEnd/poBegin/poEnd when graph isn't in rpo yet.

SS: no difference
Kraken: 0.2% - 0.5% increase
Attachment #730633 - Flags: review?(jdemooij)
Comment on attachment 730633 [details] [diff] [review]
Reorder blocks

Nothing particularly bad found overnight.
Attachment #730633 - Flags: feedback?(gary) → feedback+
Sorry I didn't get to this earlier. If you still need additional testing, can you rebase the patch on top of central so I can test it now? Otherwise, just go ahead and land it :)
Comment on attachment 730633 [details] [diff] [review]
Reorder blocks

After discussion with Jan, there is actually some room to improve the algorithm. He noticed that blocks with returns only got hoisted after the inner loop, but not after the outer loop. I have a new algorithm that should do that too.
Attachment #730633 - Flags: review?(jdemooij)
Attachment #730633 - Flags: feedback?(choller)
Comment on attachment 730633 [details] [diff] [review]
Reorder blocks

The new algorithm that I had in my mind, didn't pan out. I think this version is simpler and better.
Attachment #730633 - Flags: review?(jdemooij)
Comment on attachment 730633 [details] [diff] [review]
Reorder blocks

Review of attachment 730633 [details] [diff] [review]:
-----------------------------------------------------------------

Nice! Would be good to test this on a few emscripten benchmarks (with and without --no-asmjs), Alon posted some in bug 856813.

::: js/src/ion/IonAnalysis.cpp
@@ +819,5 @@
> +    graph.unmarkBlocks();
> +
> +    JS_ASSERT(graph.numBlocks() == numBlocks);
> +
> +#if DEBUG

Nit: #ifdef

::: js/src/ion/MIRGraph.h
@@ +481,5 @@
>      Vector<RawScript, 4, IonAllocPolicy> scripts_;
>  
>      size_t numBlocks_;
>  
> +#if DEBUG

Nit: #ifdef (and a few times below).
Attachment #730633 - Flags: review?(jdemooij) → review+
Comment on attachment 730633 [details] [diff] [review]
Reorder blocks

Review of attachment 730633 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/IonAnalysis.cpp
@@ +746,5 @@
> +bool
> +ion::ReorderBlocks(MIRGraph &graph)
> +{
> +    // The blocks get reordered to be in RPO (they are already).
> +    // The order will be improved by adding an extra contrain:

constrain
https://hg.mozilla.org/mozilla-central/rev/d93ad2c96f01
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
These changes cause a regression, see bug 858936.
https://hg.mozilla.org/integration/mozilla-inbound/rev/754dc0c091a6

Backout due to bug 858936. The logic in here is apparently not correct. Special construct of two loops was numbered incorrectly. Example:

function test(a) {
      for(var k=0; ; k++) {
        if (k%3 == 0) {
          if (k%2 == 0) {

          } else {
            break;
          }
        }
        continue;
      }

      for(var k=0; k<100; k++) {
        if (k%2 == 0) {
          continue;
        }
        break;
      }
}
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
To know if it made sense to try to get this back into the tree. Here the numbers on the asm.js ubench:

Without patch:
copy 6776
corrections 6154
fannkuch 3724
fasta 10882
life 5123
memops 6914
primes 5907
skinning 15176

With patch:
copy 6747
corrections 5091
fannkuch 3695
fasta 10815
life 4884
memops 6902
primes 5517
skinning 15016
Are those numbers before or after the fix for bug 881412, which landed recently? Either way, what is your opinion of that patch, and its relationship with this one?
The potential has decreased a bit, since the landing of bug 881412. It even looks like the priority to land this stays low:

Without patch:
copy 6651
corrections 5243
fannkuch 3709
fasta 10760
life 4626
memops 6862
primes 5872
skinning 14854

With patch:
copy 6760
corrections 5162
fannkuch 3712
fasta 10689
life 4608
memops 6889
primes 5897
skinning 15165
Blocks: 772329
I guess we should be able to improve the order with Bug 877878, as we could move branches out of their loops if they are not frequently executed.

The profiled data collected by Wu Wei in Bug 877872 are showing that, in PdfJS, we have short loops which have branches which are only executed a few times.  If we do not remove these branches, it might still be interesting to move them out-of the fast path.  Sadly, doing so will break the RPO / PO iterations, so we might want to do this reordering just before the CodeGen phase.
Attached patch make-loops-contiguous.patch (obsolete) — Splinter Review
Attached is another patch which implements this. I didn't do anything fancy; it just moves blocks to the ends of loops to make loops contiguous. I see it doing good-looking things on benchmarks like early-boyer.js, but I haven't yet seen any speed impacts yet either.
I ran the emscripten test suite and didn't see any changes. But, I tested on x86 and the effects might be more significant on ARM. Overall this sounds like a good improvement. It would also mean code generators like emscripten can have fewer issues with heuristics about where to put code (in or outside of a loop), if the VM makes the proper choice itself.
In particular, with this patch I see that I can alter emscripten's relooper hoisting heuristics (when to move code into a loop) without seeing slowdowns. That opens up some opportunities to reorder code to decrease code size (and possibly speed as well).
(In reply to Dan Gohman [:sunfish] from comment #23)
> Created attachment 8422094 [details] [diff] [review]
> make-loops-contiguous.patch
> 
> Attached is another patch which implements this. I didn't do anything fancy;
> it just moves blocks to the ends of loops to make loops contiguous. I see it
> doing good-looking things on benchmarks like early-boyer.js, but I haven't
> yet seen any speed impacts yet either.

That might be good enough for now!

1) On quick view, I see the marking of items in a loop is a bit strange.
We can just take all paths from the backedge to the loopheader, like this:

> MBasicBlock *loopheader;
> inlooplist.append(loopheader->backedge());
> loopheader->backedge()->mark();
>
> while(!inlooplist.empty()) {
>    MBasicBlock *block = inlooplist.back();
>    if (block == loopheader)
>        continue;
>    for (size_t i = 0; i < block->numPredecessors(); i++) {
>        MBasicBlock *pred = block->getPredecessor(i);
>        if (pred->isMarked())
>            continue;
>        inlooplist.append(pred);
>        pred->mark();          
> }      

See LICM.cpp: Loop::init(). (LICM ignores some sort of loops, though. So there is an extra test in the loop. I think we can safely do this for this too.)

2) I see you use PostOrderIterator. That is to see inner loops before outer loops, right?
Attached patch make-loops-contiguous.patch (obsolete) — Splinter Review
(In reply to Hannes Verschore [:h4writer] from comment #26)
> (In reply to Dan Gohman [:sunfish] from comment #23)
> > Created attachment 8422094 [details] [diff] [review]
> > make-loops-contiguous.patch
> > 
> > Attached is another patch which implements this. I didn't do anything fancy;
> > it just moves blocks to the ends of loops to make loops contiguous. I see it
> > doing good-looking things on benchmarks like early-boyer.js, but I haven't
> > yet seen any speed impacts yet either.
> 
> That might be good enough for now!

Cool. Attached is an updated patch for review.

> 1) On quick view, I see the marking of items in a loop is a bit strange.
> We can just take all paths from the backedge to the loopheader, like this:
> 
> > MBasicBlock *loopheader;
> > inlooplist.append(loopheader->backedge());
> > loopheader->backedge()->mark();
> >
> > while(!inlooplist.empty()) {
> >    MBasicBlock *block = inlooplist.back();
> >    if (block == loopheader)
> >        continue;
> >    for (size_t i = 0; i < block->numPredecessors(); i++) {
> >        MBasicBlock *pred = block->getPredecessor(i);
> >        if (pred->isMarked())
> >            continue;
> >        inlooplist.append(pred);
> >        pred->mark();          
> > }      
> 
> See LICM.cpp: Loop::init(). (LICM ignores some sort of loops, though. So
> there is an extra test in the loop. I think we can safely do this for this
> too.)

I was thinking about loops with multiple backedges, but since this pass runs after SplitCriticalEdges, I guess that's not needed. The updated patch now has simpler logic.

> 2) I see you use PostOrderIterator. That is to see inner loops before outer
> loops, right?

That was my thought, but it turns that this pass works either way. The updated patch now has a plain MBasicBlockIterator.
Assignee: hv1989 → sunfish
Attachment #8422094 - Attachment is obsolete: true
Attachment #8422483 - Flags: review?(hv1989)
Comment on attachment 8422483 [details] [diff] [review]
make-loops-contiguous.patch

Review of attachment 8422483 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good,
but I think MakeLoopContiguous is doing too much work.

::: js/src/jit/Ion.cpp
@@ +1343,5 @@
>      }
>  
> +    {
> +        AutoTraceLog log(logger, TraceLogger::MakeLoopsContiguous);
> +        if (!MakeLoopsContiguous(graph))

I think we can and should move this down as much as possible.
The other passes have no benefit of this. And DCE can remove depending blocks, making it possible to move more blocks under the loop.
(So just before edgeCaseAnalysis)

::: js/src/jit/IonAnalysis.cpp
@@ +2571,5 @@
> +                break;
> +        } else {
> +            // This block is not in the loop. Move it to the end.
> +            graph.moveBlockBefore(insertPt, bb);
> +            bb->setId(afterLoopId++);

If I'm reading this correctly, this will iterate over all blocks after the loop. This seems elaborate.
We should do this differently by only looping over blocks that are in between the loop.

This would be possible by having a inlooplist (like MakeLoopsContiguous)
and adding (in RPO) Successors where the block id is between header->id() and backedge->id().
For blocks with ids outside the loop range it doesn't make sense to iterate them...

@@ +2589,5 @@
> +        MBasicBlock *header = *i;
> +        if (!header->isLoopHeader())
> +            continue;
> +
> +        // Mark all the blocks in the loop.

// Mark all the blocks in the loop by marking all blocks in a path between the backedge and the loopheader.

@@ +2603,5 @@
> +            if (bb == header)
> +                continue;
> +            for (size_t p = 0; p < bb->numPredecessors(); p++) {
> +                MBasicBlock *pred = bb->getPredecessor(p);
> +                if (pred->isMarked() || (osrBlock && osrBlock->dominates(pred)))

Can you add an comment about what the osrBlock test does?
Attachment #8422483 - Flags: review?(hv1989)
(In reply to Hannes Verschore [:h4writer] from comment #28)
> ::: js/src/jit/Ion.cpp
> @@ +1343,5 @@
> >      }
> >  
> > +    {
> > +        AutoTraceLog log(logger, TraceLogger::MakeLoopsContiguous);
> > +        if (!MakeLoopsContiguous(graph))
> 
> I think we can and should move this down as much as possible.
> The other passes have no benefit of this. And DCE can remove depending
> blocks, making it possible to move more blocks under the loop.
> (So just before edgeCaseAnalysis)

Good idea. Done.

> ::: js/src/jit/IonAnalysis.cpp
> @@ +2571,5 @@
> > +                break;
> > +        } else {
> > +            // This block is not in the loop. Move it to the end.
> > +            graph.moveBlockBefore(insertPt, bb);
> > +            bb->setId(afterLoopId++);
> 
> If I'm reading this correctly, this will iterate over all blocks after the
> loop. This seems elaborate.
> We should do this differently by only looping over blocks that are in
> between the loop.

It actually does stop at the end of the loop, though it's admittedly somewhat subtle. I've now added some comments and asserts to show that bb->id() stays between header->id() and backedge->id().

> @@ +2589,5 @@
> > +        MBasicBlock *header = *i;
> > +        if (!header->isLoopHeader())
> > +            continue;
> > +
> > +        // Mark all the blocks in the loop.
> 
> // Mark all the blocks in the loop by marking all blocks in a path between
> the backedge and the loopheader.

Done.

> @@ +2603,5 @@
> > +            if (bb == header)
> > +                continue;
> > +            for (size_t p = 0; p < bb->numPredecessors(); p++) {
> > +                MBasicBlock *pred = bb->getPredecessor(p);
> > +                if (pred->isMarked() || (osrBlock && osrBlock->dominates(pred)))
> 
> Can you add an comment about what the osrBlock test does?

Done.
Attachment #8422483 - Attachment is obsolete: true
Attachment #8423832 - Flags: review?(hv1989)
Comment on attachment 8423832 [details] [diff] [review]
make-loops-contiguous.patch

Review of attachment 8423832 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/IonAnalysis.cpp
@@ +2608,5 @@
> +    size_t inLoopId = headerId;
> +    size_t afterLoopId = inLoopId + numMarked;
> +    ReversePostorderIterator i = graph.rpoBegin(header);
> +    for (;;) {
> +        MBasicBlock *bb = *i++;

Let us use a less descriptive variablename. "block" is mostly used in the code...
s/bb/block

@@ +2618,5 @@
> +            bb->unmark();
> +            bb->setId(inLoopId++);
> +            // If we've reached the backedge, we're done!
> +            if (bb == backedge)
> +                break;

Oh sorry. I indeed missed this.
Attachment #8423832 - Flags: review?(hv1989) → review+
(In reply to Hannes Verschore [:h4writer] from comment #30)
> @@ +2608,5 @@
> > +    size_t inLoopId = headerId;
> > +    size_t afterLoopId = inLoopId + numMarked;
> > +    ReversePostorderIterator i = graph.rpoBegin(header);
> > +    for (;;) {
> > +        MBasicBlock *bb = *i++;
> 
> Let us use a less descriptive variablename. "block" is mostly used in the
> code...
> s/bb/block

Done.

https://hg.mozilla.org/integration/mozilla-inbound/rev/19eafdcdefe3
https://hg.mozilla.org/mozilla-central/rev/19eafdcdefe3
Status: REOPENED → RESOLVED
Closed: 7 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: mozilla23 → mozilla32
Depends on: 1011730
Reverted:
https://hg.mozilla.org/integration/mozilla-inbound/rev/28db7381c979
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [leave open]
Sorry, I gave the wrong advice, turns out leave-open is now a keyword.
Keywords: leave-open
Whiteboard: [leave open]
Attached patch make-loops-contiguous.patch (obsolete) — Splinter Review
This patch just adds a check for !mir->instrumentedProfiling() before reordering basic blocks, since the SPS profiler seems to depend on the blocks appearing in their original order.
Attachment #8423832 - Attachment is obsolete: true
Attachment #8425091 - Flags: review?(hv1989)
Attachment #8425091 - Flags: review?(hv1989)
This patch is reordering blocks, which seems that should be possible without a problem.
Now I know there used to be a problem around "buildInline", but that is removed right?
Could it be that not everything has been removed, causing this assert?

e.g. jandem noticed jit/IonBuilder.cpp:832 in buildInline we still have SPS information! Could this be the culprit and if it is, can we just remove that?
Flags: needinfo?(kvijayan)
We could probably remove most of the frame tracking code that's present, since the new implementation will push at most one pseudostack entry per function - all of the internal ProfilerStackOps can be discarded.  Would that help?
Flags: needinfo?(kvijayan)
(In reply to Kannan Vijayan [:djvj] from comment #38)
> We could probably remove most of the frame tracking code that's present,
> since the new implementation will push at most one pseudostack entry per
> function - all of the internal ProfilerStackOps can be discarded.  Would
> that help?

Yes, the crash with the block ordering patch here is in ProfilerStackOp handling, so getting rid of that code sounds very appealing :-).
This should make it possible to land Sunfish his patch. That way we don't have to wait for the full removal of SPS stack frames to the new version.

@Sunfish: willing to give your patch + this a try run to see if that fixes everything you saw?
Attachment #8428749 - Flags: review?(kvijayan)
Comment on attachment 8428749 [details] [diff] [review]
Remove SPS inline function tracking in IonMonkey

Review of attachment 8428749 [details] [diff] [review]:
-----------------------------------------------------------------

Nice.  Can you make a bug to clean up the corresponding stuff in js/src/vm/SPSProfiler.*
Attachment #8428749 - Flags: review?(kvijayan) → review+
@H4writer: Unfortunately, my patch + this SPS patch still crashes on the testcase in bug 1011730.
@H4writer: Actually, with this SPS patch, the testcase in bug 1011730 fails even without my patch.
Attached patch followup fix (obsolete) — Splinter Review
Took a long time to debug and find the issue. SPS was failing since it updated the PC but not the script when disabling pushing/popping inlined frames. Though we kept that still data consistent by pushing the ProfilerStackOp MIR. (Which codeGenerator was counting on). Here we only keep the top pc of the outer script anymore. So things stay consistent and we don't have an updated pc that doesn't match the script anymore
Attachment #8432609 - Flags: review?(kvijayan)
Attachment #8432609 - Flags: review?(kvijayan) → review+
Cool, with this followup, the testcase works even with my contiguous loops patch!
Try is also happy with the sps changes:
https://tbpl.mozilla.org/?tree=Try&rev=4bece190eaac
Folded the patch into one and made a decent patch with user/commit message. Carrying r+ over.
Attachment #8428749 - Attachment is obsolete: true
Attachment #8432609 - Attachment is obsolete: true
Attachment #8432793 - Flags: review+
Attachment #8425091 - Attachment is obsolete: true
Comment on attachment 8423832 [details] [diff] [review]
make-loops-contiguous.patch

This should now work nicely. Even with SPS enabled, so removing the obsolete flag of this.
Attachment #8423832 - Attachment is obsolete: false
Blocks: 733353
https://hg.mozilla.org/mozilla-central/rev/a0bb2b2da6ef
https://hg.mozilla.org/mozilla-central/rev/5b4bd2f81719
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Depends on: 1022081
You need to log in before you can comment on or make changes to this bug.