Closed Bug 844779 Opened 12 years ago Closed 11 years ago

IonMonkey: Improve order of blocks in rpo

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: h4writer, Assigned: sunfish)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 8 obsolete files)

We are hurting on earley-boyer, because the order of the blocks make our loops less tight. It is better to have tighter loops, as this increases locality and less jumps. We must assume that a loop will run more times than other code. There is one particular construction we are bad at: for (var i=0; i<100; i++) { if (something) { // a lot of code return; } } So the blocks containing "a lot of code" are not part of the loop anymore. The looping is done when we enter that branch. Still we put those blocks in between the loop blocks. This hurts, because the blocks of the loop itself are now further apart and we need to jump from the test (if) to after the test. We should add code to do this. (This hurts us on earley-boyer.js rewrite_nboyer, and is a 2%-3% improvement with quick and dirty code)
Blocks: 765980
Assignee: general → hv1989
IonMonkey used to have a ReorderBlocks function. That was removed because it was buggy and didn't cooperate with AA (which is expecting some things). Bug 732120 is about the removal. To fix this bug we probably will add something like ReorderBlocks again, but we need to be careful that we don't bump into these issues again. Also note that the current order isn't far away from the order required to fix this bug. I intend to keep the same numbering as much as possible. I.e. needed for AA the current code guarantees all blocks inside the loop are between the header and backedge, and all code after it is not between the header and backedge. That should definitely stay the same.
Depends on: 851115
Attached patch Reorder blocks (obsolete) — Splinter Review
This would contain everything that's needed to calculate the RPO and making sure loops are as tight as possible.
Attachment #725147 - Flags: feedback?(jdemooij)
Attached patch Recalculate loopdepth (obsolete) — Splinter Review
This bug is depending on bug 851115, because the loopdepth information is incorrect. If you want to execute the reorder block patch, this is a quick fix that fixes the loopdepth information. Applies on top of the reorder block patch
Attached patch Reorder blocks (obsolete) — Splinter Review
This patch contains the reorder block logic and the recalculate loopdepth logic. It isn't possible to have the correct loopdepth information during IonBuilder. It can't be done in IonBuilder, because we don't have the full graph yet.
Attachment #725147 - Attachment is obsolete: true
Attachment #725149 - Attachment is obsolete: true
Attachment #725147 - Flags: feedback?(jdemooij)
Attachment #726346 - Flags: review?(jdemooij)
Comment on attachment 726346 [details] [diff] [review] Reorder blocks Review of attachment 726346 [details] [diff] [review]: ----------------------------------------------------------------- Looks great! The ReorderBlocks pass we used to have was a mess, but this one is a lot simpler. This patch will need some fuzzing though. Does this patch have any influence on Kraken or SS? ::: js/src/ion/IonAnalysis.cpp @@ +622,5 @@ > > return true; > } > > +void Nit: "static void" (use static for functions that are only used in the same C++ file). @@ +642,5 @@ > + MBasicBlock *pred = block->getPredecessor(i); > + if (pred->loopDepth() == depth) > + continue; > + > + worklist.append(pred); Check for OOM. @@ +654,5 @@ > + // Recalculate the loop depth information. > + // > + // The definition of a loop (important to correctly reorder blocks and for LICM) is: > + // all blocks reached when creating a path from the loop backedge to the loopheader. > + // This information difference from the loopdepth information gathered in IonBuilder. s/difference/differs @@ +688,5 @@ > + // Add not yet visited successors to the worklist. > + for (size_t i = 0; i < block->numSuccessors(); i++) { > + MBasicBlock *succ = block->getSuccessor(i); > + if (!succ->isMarked()) { > + worklist.append(succ); Check for OOM. @@ +697,5 @@ > + SetLoopDepth(succ); > + } > + } > + > + // if any added, process them first Nit: "// If any blocks were added, process them first." @@ +710,5 @@ > + > + return true; > +} > + > +void Nit: "static void" @@ +759,5 @@ > + graph.clearBlockList(); > + > + // Start with entry block > + pending.pushBack(entry); > + pending.peekBack()->mark(); Nit: entry->mark(); @@ +771,5 @@ > + for (size_t i = 0; i < block->numSuccessors(); i++) { > + MBasicBlock *succ = block->getSuccessor(i); > + if (!succ->isMarked()) { > + pending.pushBack(succ); > + pending.peekBack()->mark(); Nit: succ->mark(); @@ +779,5 @@ > + // Sort newly added blocks (if any added) > + if (block != pending.peekBack()) { > + MBasicBlockIterator start = pending.begin(block); > + start++; > + SortByLoopDepth(pending, start); In most cases the successors will have the same loop depth. To avoid constructing the iterators etc, you can add "bool sortSuccessors = false;", and set it to |true| in the previous loop if succ->loopDepth() != block->loopDepth(). Then you can change "if (block != pending.peekBack())" to "if (sortSuccessors)" (if we didn't add anything sortSuccessors will be |false|) @@ +796,5 @@ > + // QUESTION: > + // I've seen that in the original order, that the osrblock->id() == preheader->id() - 1 > + // I could do that too, but I see no reason ... > + // I also think it increases locality for nested loops, if they aren't in the loop. > + // I would've put even before the entry block if our system didn't wanted the entry to be 0. Yeah, just adding it as second like we do now is best I think (and remove the comment :) @@ +803,5 @@ > + graph.moveBlockToBegin(osr); > + graph.moveBlockToBegin(entry); > + } > + > + graph.unmarkBlocks(); Nit: JS_ASSERT(graph.numBlocks() == numBlocks); and add a DebugOnly<size_t> numBlocks = graph.numBlocks() at the start of this function.
Attachment #726346 - Flags: review?(jdemooij)
(In reply to Jan de Mooij [:jandem] from comment #5) > Then you can change "if (block != pending.peekBack())" to "if > (sortSuccessors)" (if we didn't add anything sortSuccessors will be |false|) Eh sorry, this part is not true. You still want to |continue| if (block != pending.peekBack()).
Attached patch Reorder blocksSplinter Review
This patch changes the order of blocks and that's delicate. Therefore a bit of fuzzing would definitely not hurt.
Attachment #726346 - Attachment is obsolete: true
Attachment #730633 - Flags: feedback?(choller)
Attachment #730633 - Flags: feedback?(gary)
Comment on attachment 730633 [details] [diff] [review] Reorder blocks Fixed mentioned nits + added an assert when trying to use rpoBegin/rpoEnd/poBegin/poEnd when graph isn't in rpo yet. SS: no difference Kraken: 0.2% - 0.5% increase
Attachment #730633 - Flags: review?(jdemooij)
Comment on attachment 730633 [details] [diff] [review] Reorder blocks Nothing particularly bad found overnight.
Attachment #730633 - Flags: feedback?(gary) → feedback+
Sorry I didn't get to this earlier. If you still need additional testing, can you rebase the patch on top of central so I can test it now? Otherwise, just go ahead and land it :)
Comment on attachment 730633 [details] [diff] [review] Reorder blocks After discussion with Jan, there is actually some room to improve the algorithm. He noticed that blocks with returns only got hoisted after the inner loop, but not after the outer loop. I have a new algorithm that should do that too.
Attachment #730633 - Flags: review?(jdemooij)
Attachment #730633 - Flags: feedback?(choller)
Comment on attachment 730633 [details] [diff] [review] Reorder blocks The new algorithm that I had in my mind, didn't pan out. I think this version is simpler and better.
Attachment #730633 - Flags: review?(jdemooij)
Comment on attachment 730633 [details] [diff] [review] Reorder blocks Review of attachment 730633 [details] [diff] [review]: ----------------------------------------------------------------- Nice! Would be good to test this on a few emscripten benchmarks (with and without --no-asmjs), Alon posted some in bug 856813. ::: js/src/ion/IonAnalysis.cpp @@ +819,5 @@ > + graph.unmarkBlocks(); > + > + JS_ASSERT(graph.numBlocks() == numBlocks); > + > +#if DEBUG Nit: #ifdef ::: js/src/ion/MIRGraph.h @@ +481,5 @@ > Vector<RawScript, 4, IonAllocPolicy> scripts_; > > size_t numBlocks_; > > +#if DEBUG Nit: #ifdef (and a few times below).
Attachment #730633 - Flags: review?(jdemooij) → review+
Comment on attachment 730633 [details] [diff] [review] Reorder blocks Review of attachment 730633 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/ion/IonAnalysis.cpp @@ +746,5 @@ > +bool > +ion::ReorderBlocks(MIRGraph &graph) > +{ > + // The blocks get reordered to be in RPO (they are already). > + // The order will be improved by adding an extra contrain: constrain
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
These changes cause a regression, see bug 858936.
https://hg.mozilla.org/integration/mozilla-inbound/rev/754dc0c091a6 Backout due to bug 858936. The logic in here is apparently not correct. Special construct of two loops was numbered incorrectly. Example: function test(a) { for(var k=0; ; k++) { if (k%3 == 0) { if (k%2 == 0) { } else { break; } } continue; } for(var k=0; k<100; k++) { if (k%2 == 0) { continue; } break; } }
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
To know if it made sense to try to get this back into the tree. Here the numbers on the asm.js ubench: Without patch: copy 6776 corrections 6154 fannkuch 3724 fasta 10882 life 5123 memops 6914 primes 5907 skinning 15176 With patch: copy 6747 corrections 5091 fannkuch 3695 fasta 10815 life 4884 memops 6902 primes 5517 skinning 15016
Are those numbers before or after the fix for bug 881412, which landed recently? Either way, what is your opinion of that patch, and its relationship with this one?
The potential has decreased a bit, since the landing of bug 881412. It even looks like the priority to land this stays low: Without patch: copy 6651 corrections 5243 fannkuch 3709 fasta 10760 life 4626 memops 6862 primes 5872 skinning 14854 With patch: copy 6760 corrections 5162 fannkuch 3712 fasta 10689 life 4608 memops 6889 primes 5897 skinning 15165
Blocks: 772329
I guess we should be able to improve the order with Bug 877878, as we could move branches out of their loops if they are not frequently executed. The profiled data collected by Wu Wei in Bug 877872 are showing that, in PdfJS, we have short loops which have branches which are only executed a few times. If we do not remove these branches, it might still be interesting to move them out-of the fast path. Sadly, doing so will break the RPO / PO iterations, so we might want to do this reordering just before the CodeGen phase.
Attached patch make-loops-contiguous.patch (obsolete) — Splinter Review
Attached is another patch which implements this. I didn't do anything fancy; it just moves blocks to the ends of loops to make loops contiguous. I see it doing good-looking things on benchmarks like early-boyer.js, but I haven't yet seen any speed impacts yet either.
I ran the emscripten test suite and didn't see any changes. But, I tested on x86 and the effects might be more significant on ARM. Overall this sounds like a good improvement. It would also mean code generators like emscripten can have fewer issues with heuristics about where to put code (in or outside of a loop), if the VM makes the proper choice itself.
In particular, with this patch I see that I can alter emscripten's relooper hoisting heuristics (when to move code into a loop) without seeing slowdowns. That opens up some opportunities to reorder code to decrease code size (and possibly speed as well).
(In reply to Dan Gohman [:sunfish] from comment #23) > Created attachment 8422094 [details] [diff] [review] > make-loops-contiguous.patch > > Attached is another patch which implements this. I didn't do anything fancy; > it just moves blocks to the ends of loops to make loops contiguous. I see it > doing good-looking things on benchmarks like early-boyer.js, but I haven't > yet seen any speed impacts yet either. That might be good enough for now! 1) On quick view, I see the marking of items in a loop is a bit strange. We can just take all paths from the backedge to the loopheader, like this: > MBasicBlock *loopheader; > inlooplist.append(loopheader->backedge()); > loopheader->backedge()->mark(); > > while(!inlooplist.empty()) { > MBasicBlock *block = inlooplist.back(); > if (block == loopheader) > continue; > for (size_t i = 0; i < block->numPredecessors(); i++) { > MBasicBlock *pred = block->getPredecessor(i); > if (pred->isMarked()) > continue; > inlooplist.append(pred); > pred->mark(); > } See LICM.cpp: Loop::init(). (LICM ignores some sort of loops, though. So there is an extra test in the loop. I think we can safely do this for this too.) 2) I see you use PostOrderIterator. That is to see inner loops before outer loops, right?
Attached patch make-loops-contiguous.patch (obsolete) — Splinter Review
(In reply to Hannes Verschore [:h4writer] from comment #26) > (In reply to Dan Gohman [:sunfish] from comment #23) > > Created attachment 8422094 [details] [diff] [review] > > make-loops-contiguous.patch > > > > Attached is another patch which implements this. I didn't do anything fancy; > > it just moves blocks to the ends of loops to make loops contiguous. I see it > > doing good-looking things on benchmarks like early-boyer.js, but I haven't > > yet seen any speed impacts yet either. > > That might be good enough for now! Cool. Attached is an updated patch for review. > 1) On quick view, I see the marking of items in a loop is a bit strange. > We can just take all paths from the backedge to the loopheader, like this: > > > MBasicBlock *loopheader; > > inlooplist.append(loopheader->backedge()); > > loopheader->backedge()->mark(); > > > > while(!inlooplist.empty()) { > > MBasicBlock *block = inlooplist.back(); > > if (block == loopheader) > > continue; > > for (size_t i = 0; i < block->numPredecessors(); i++) { > > MBasicBlock *pred = block->getPredecessor(i); > > if (pred->isMarked()) > > continue; > > inlooplist.append(pred); > > pred->mark(); > > } > > See LICM.cpp: Loop::init(). (LICM ignores some sort of loops, though. So > there is an extra test in the loop. I think we can safely do this for this > too.) I was thinking about loops with multiple backedges, but since this pass runs after SplitCriticalEdges, I guess that's not needed. The updated patch now has simpler logic. > 2) I see you use PostOrderIterator. That is to see inner loops before outer > loops, right? That was my thought, but it turns that this pass works either way. The updated patch now has a plain MBasicBlockIterator.
Assignee: hv1989 → sunfish
Attachment #8422094 - Attachment is obsolete: true
Attachment #8422483 - Flags: review?(hv1989)
Comment on attachment 8422483 [details] [diff] [review] make-loops-contiguous.patch Review of attachment 8422483 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, but I think MakeLoopContiguous is doing too much work. ::: js/src/jit/Ion.cpp @@ +1343,5 @@ > } > > + { > + AutoTraceLog log(logger, TraceLogger::MakeLoopsContiguous); > + if (!MakeLoopsContiguous(graph)) I think we can and should move this down as much as possible. The other passes have no benefit of this. And DCE can remove depending blocks, making it possible to move more blocks under the loop. (So just before edgeCaseAnalysis) ::: js/src/jit/IonAnalysis.cpp @@ +2571,5 @@ > + break; > + } else { > + // This block is not in the loop. Move it to the end. > + graph.moveBlockBefore(insertPt, bb); > + bb->setId(afterLoopId++); If I'm reading this correctly, this will iterate over all blocks after the loop. This seems elaborate. We should do this differently by only looping over blocks that are in between the loop. This would be possible by having a inlooplist (like MakeLoopsContiguous) and adding (in RPO) Successors where the block id is between header->id() and backedge->id(). For blocks with ids outside the loop range it doesn't make sense to iterate them... @@ +2589,5 @@ > + MBasicBlock *header = *i; > + if (!header->isLoopHeader()) > + continue; > + > + // Mark all the blocks in the loop. // Mark all the blocks in the loop by marking all blocks in a path between the backedge and the loopheader. @@ +2603,5 @@ > + if (bb == header) > + continue; > + for (size_t p = 0; p < bb->numPredecessors(); p++) { > + MBasicBlock *pred = bb->getPredecessor(p); > + if (pred->isMarked() || (osrBlock && osrBlock->dominates(pred))) Can you add an comment about what the osrBlock test does?
Attachment #8422483 - Flags: review?(hv1989)
(In reply to Hannes Verschore [:h4writer] from comment #28) > ::: js/src/jit/Ion.cpp > @@ +1343,5 @@ > > } > > > > + { > > + AutoTraceLog log(logger, TraceLogger::MakeLoopsContiguous); > > + if (!MakeLoopsContiguous(graph)) > > I think we can and should move this down as much as possible. > The other passes have no benefit of this. And DCE can remove depending > blocks, making it possible to move more blocks under the loop. > (So just before edgeCaseAnalysis) Good idea. Done. > ::: js/src/jit/IonAnalysis.cpp > @@ +2571,5 @@ > > + break; > > + } else { > > + // This block is not in the loop. Move it to the end. > > + graph.moveBlockBefore(insertPt, bb); > > + bb->setId(afterLoopId++); > > If I'm reading this correctly, this will iterate over all blocks after the > loop. This seems elaborate. > We should do this differently by only looping over blocks that are in > between the loop. It actually does stop at the end of the loop, though it's admittedly somewhat subtle. I've now added some comments and asserts to show that bb->id() stays between header->id() and backedge->id(). > @@ +2589,5 @@ > > + MBasicBlock *header = *i; > > + if (!header->isLoopHeader()) > > + continue; > > + > > + // Mark all the blocks in the loop. > > // Mark all the blocks in the loop by marking all blocks in a path between > the backedge and the loopheader. Done. > @@ +2603,5 @@ > > + if (bb == header) > > + continue; > > + for (size_t p = 0; p < bb->numPredecessors(); p++) { > > + MBasicBlock *pred = bb->getPredecessor(p); > > + if (pred->isMarked() || (osrBlock && osrBlock->dominates(pred))) > > Can you add an comment about what the osrBlock test does? Done.
Attachment #8422483 - Attachment is obsolete: true
Attachment #8423832 - Flags: review?(hv1989)
Comment on attachment 8423832 [details] [diff] [review] make-loops-contiguous.patch Review of attachment 8423832 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/IonAnalysis.cpp @@ +2608,5 @@ > + size_t inLoopId = headerId; > + size_t afterLoopId = inLoopId + numMarked; > + ReversePostorderIterator i = graph.rpoBegin(header); > + for (;;) { > + MBasicBlock *bb = *i++; Let us use a less descriptive variablename. "block" is mostly used in the code... s/bb/block @@ +2618,5 @@ > + bb->unmark(); > + bb->setId(inLoopId++); > + // If we've reached the backedge, we're done! > + if (bb == backedge) > + break; Oh sorry. I indeed missed this.
Attachment #8423832 - Flags: review?(hv1989) → review+
(In reply to Hannes Verschore [:h4writer] from comment #30) > @@ +2608,5 @@ > > + size_t inLoopId = headerId; > > + size_t afterLoopId = inLoopId + numMarked; > > + ReversePostorderIterator i = graph.rpoBegin(header); > > + for (;;) { > > + MBasicBlock *bb = *i++; > > Let us use a less descriptive variablename. "block" is mostly used in the > code... > s/bb/block Done. https://hg.mozilla.org/integration/mozilla-inbound/rev/19eafdcdefe3
Status: REOPENED → RESOLVED
Closed: 12 years ago11 years ago
Resolution: --- → FIXED
Target Milestone: mozilla23 → mozilla32
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [leave open]
Sorry, I gave the wrong advice, turns out leave-open is now a keyword.
Keywords: leave-open
Whiteboard: [leave open]
Attached patch make-loops-contiguous.patch (obsolete) — Splinter Review
This patch just adds a check for !mir->instrumentedProfiling() before reordering basic blocks, since the SPS profiler seems to depend on the blocks appearing in their original order.
Attachment #8423832 - Attachment is obsolete: true
Attachment #8425091 - Flags: review?(hv1989)
Attachment #8425091 - Flags: review?(hv1989)
This patch is reordering blocks, which seems that should be possible without a problem. Now I know there used to be a problem around "buildInline", but that is removed right? Could it be that not everything has been removed, causing this assert? e.g. jandem noticed jit/IonBuilder.cpp:832 in buildInline we still have SPS information! Could this be the culprit and if it is, can we just remove that?
Flags: needinfo?(kvijayan)
We could probably remove most of the frame tracking code that's present, since the new implementation will push at most one pseudostack entry per function - all of the internal ProfilerStackOps can be discarded. Would that help?
Flags: needinfo?(kvijayan)
(In reply to Kannan Vijayan [:djvj] from comment #38) > We could probably remove most of the frame tracking code that's present, > since the new implementation will push at most one pseudostack entry per > function - all of the internal ProfilerStackOps can be discarded. Would > that help? Yes, the crash with the block ordering patch here is in ProfilerStackOp handling, so getting rid of that code sounds very appealing :-).
This should make it possible to land Sunfish his patch. That way we don't have to wait for the full removal of SPS stack frames to the new version. @Sunfish: willing to give your patch + this a try run to see if that fixes everything you saw?
Attachment #8428749 - Flags: review?(kvijayan)
Comment on attachment 8428749 [details] [diff] [review] Remove SPS inline function tracking in IonMonkey Review of attachment 8428749 [details] [diff] [review]: ----------------------------------------------------------------- Nice. Can you make a bug to clean up the corresponding stuff in js/src/vm/SPSProfiler.*
Attachment #8428749 - Flags: review?(kvijayan) → review+
@H4writer: Unfortunately, my patch + this SPS patch still crashes on the testcase in bug 1011730.
@H4writer: Actually, with this SPS patch, the testcase in bug 1011730 fails even without my patch.
Attached patch followup fix (obsolete) — Splinter Review
Took a long time to debug and find the issue. SPS was failing since it updated the PC but not the script when disabling pushing/popping inlined frames. Though we kept that still data consistent by pushing the ProfilerStackOp MIR. (Which codeGenerator was counting on). Here we only keep the top pc of the outer script anymore. So things stay consistent and we don't have an updated pc that doesn't match the script anymore
Attachment #8432609 - Flags: review?(kvijayan)
Attachment #8432609 - Flags: review?(kvijayan) → review+
Cool, with this followup, the testcase works even with my contiguous loops patch!
Try is also happy with the sps changes: https://tbpl.mozilla.org/?tree=Try&rev=4bece190eaac
Folded the patch into one and made a decent patch with user/commit message. Carrying r+ over.
Attachment #8428749 - Attachment is obsolete: true
Attachment #8432609 - Attachment is obsolete: true
Attachment #8432793 - Flags: review+
Attachment #8425091 - Attachment is obsolete: true
Comment on attachment 8423832 [details] [diff] [review] make-loops-contiguous.patch This should now work nicely. Even with SPS enabled, so removing the obsolete flag of this.
Attachment #8423832 - Attachment is obsolete: false
Blocks: 733353
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: