Last Comment Bug 699415 - IonMonkey: improve block reordering algorithm
: IonMonkey: improve block reordering algorithm
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: ---
Assigned To: Jan de Mooij [:jandem]
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on:
Blocks: IonMonkey
  Show dependency treegraph
 
Reported: 2011-11-03 08:25 PDT by Jan de Mooij [:jandem]
Modified: 2011-11-24 11:51 PST (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Fix (1.70 KB, patch)
2011-11-24 01:55 PST, Jan de Mooij [:jandem]
sstangl: review+
Details | Diff | Splinter Review

Description Jan de Mooij [:jandem] 2011-11-03 08:25:02 PDT
IM reorders basic blocks so that they are in reverse postorder. A problem with the current algorithm is that the resulting block list does not (always) reflect the original program very well.

For instance, consider this function:
--
function f() {
    for (var i=0; i<0x123;) {
        i += 5;
    }
    var a = i * 0x999;
    while (true) {}
    return a;
}
--
Here we generate the following code:
--
// i < 0x123?
0x9ce167:	cmp    $0x123,%eax
0x9ce16d:	jl     0x9ce1a3

// a = i * 0x999
0x9ce173:	mov    %eax,%ecx
0x9ce175:	imul   $0x999,%eax,%eax
0x9ce17b:	jo     0x9ce004

// while (true) {}
0x9ce181:	mov    $0x1,%ecx
0x9ce186:	test   %ecx,%ecx
0x9ce188:	jne    0x9ce19a

// a = i, jump to return
0x9ce18e:	mov    $0xffffff81,%ecx
0x9ce193:	mov    %eax,%edx
0x9ce195:	jmp    0x9ce1b9

// next iteration of second loop
0x9ce19a:	mov    %eax,%edx
0x9ce19c:	mov    %edx,%eax
0x9ce19e:	jmp    0x9ce181

// i += 5
0x9ce1a3:	mov    %eax,%ecx
0x9ce1a5:	mov    %ecx,0x7c(%esp)
0x9ce1a9:	add    $0x5,%ecx
0x9ce1ac:	jo     0x9ce009

// next iteration of first loop
0x9ce1b2:	mov    %ecx,%eax
0x9ce1b4:	jmp    0x9ce167

// return a
0x9ce1b9:	add    $0x80,%esp
0x9ce1bf:	ret    
--
The block reordering algorithm visits the blocks in post order and reverses the result to get the reverse postorder. We should either build a reverse postorder directly, or change the first step of the algorithm to visit the successors of a node in reverse order. The second step of the algorithm will then reverse the list and restore the original order where possible.

Strictly speaking this may not be a "reverse postorder" anymore, but it generates more efficient code (and does not introduce new jit-test failures):
--
// i < 0x123?
0x9ce167:	cmp    $0x123,%eax
0x9ce16d:	jge    0x9ce183

// i += 5
0x9ce173:	mov    %eax,%ecx
0x9ce175:	add    $0x5,%eax
0x9ce178:	jo     0x9ce004

// next iteration of first loop
0x9ce17e:	jmp    0x9ce167

// i *= 0x999
0x9ce183:	mov    %eax,%ecx
0x9ce185:	mov    %ecx,0x7c(%esp)
0x9ce189:	imul   $0x999,%ecx,%ecx
0x9ce18f:	jo     0x9ce009

// while (true) {}
0x9ce195:	mov    $0x1,%eax
0x9ce19a:	test   %eax,%eax
0x9ce19c:	je     0x9ce1a7
0x9ce1a2:	jmp    0x9ce195

// a = i, return a
0x9ce1a7:	mov    $0xffffff81,%eax
0x9ce1ac:	mov    %ecx,%edx
0x9ce1ae:	mov    %eax,%ecx
0x9ce1b0:	add    $0x80,%esp
0x9ce1b6:	ret    
--
19 instructions instead of 22, and it more closely resembles the original source code.
Comment 1 David Anderson [:dvander] 2011-11-03 09:49:44 PDT
Nice analysis. Strict RPO is required for most phases in the pipeline, including lowering, though codegen should be free to have almost any order. 

Would it improve things to change the order loop successors are visited, versus normal branches, in the ReorderBlocks()?
Comment 2 Sean Stangl [:sstangl] 2011-11-03 10:03:26 PDT
We really want all blocks in a loop to be contiguous, which I think the above suggestion accomplishes. If we have contiguous blocks, then LSRA can allocate a single interval for the entire loop, so a chunk of nasty code starting at LinearScan.cpp:553 can vanish.
Comment 3 Jan de Mooij [:jandem] 2011-11-03 14:43:44 PDT
I haven't been able to find a testcase where this breaks (using Anion), but we decided that it may be better/safer to reorder blocks between regalloc and codegen. I'm going to try this tomorrow. If it works we can probably still change ReorderBlocks to use contiguous blocks for loops and simplify LSRA.
Comment 4 Jan de Mooij [:jandem] 2011-11-24 01:55:08 PST
Created attachment 576704 [details] [diff] [review]
Fix

Talked about this with sstangl, we'd like to get it in soon.
Comment 5 Sean Stangl [:sstangl] 2011-11-24 02:03:29 PST
Comment on attachment 576704 [details] [diff] [review]
Fix

Review of attachment 576704 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks. Having loops contiguous makes stepping through generated code extremely pleasant.

This probably isn't sufficient to remove code from LSRA, but it does nicely reorder the blocks in most circumstances by mimicking IonBuilder patterns. Great idea.

::: js/src/ion/IonAnalysis.cpp
@@ +628,5 @@
>      Vector<unsigned int, 0, IonAllocPolicy> successors;
>      InlineList<MBasicBlock> done;
>  
>      MBasicBlock *current = *graph.begin();
> +    unsigned int nextSuccessor = current->numSuccessors() - 1;

A comment explaining this strange choice of ordering may be useful =]

@@ +637,5 @@
>      while (true) {
>          if (!current->isMarked()) {
>              current->mark();
>  
>              if (nextSuccessor < current->lastIns()->numSuccessors()) {

Would you mind changing this code to "current->numSuccessors()"? Use of lastIns() directly is deprecated.

@@ +643,5 @@
>                  if (!successors.append(nextSuccessor))
>                      return false;
>  
>                  current = current->lastIns()->getSuccessor(nextSuccessor);
> +                nextSuccessor = current->lastIns()->numSuccessors() - 1;

This can just be current->numSuccessors() - 1, as it is above.
Comment 6 Jan de Mooij [:jandem] 2011-11-24 11:51:44 PST
Pushed with requested changes:

http://hg.mozilla.org/projects/ionmonkey/rev/7e1f1061a215

Note You need to log in before you can comment on or make changes to this bug.