IonMonkey: improve block reordering algorithm




JavaScript Engine
6 years ago
6 years ago


(Reporter: jandem, Assigned: jandem)


(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)



(1 attachment)



6 years ago
IM reorders basic blocks so that they are in reverse postorder. A problem with the current algorithm is that the resulting block list does not (always) reflect the original program very well.

For instance, consider this function:
function f() {
    for (var i=0; i<0x123;) {
        i += 5;
    var a = i * 0x999;
    while (true) {}
    return a;
Here we generate the following code:
// i < 0x123?
0x9ce167:	cmp    $0x123,%eax
0x9ce16d:	jl     0x9ce1a3

// a = i * 0x999
0x9ce173:	mov    %eax,%ecx
0x9ce175:	imul   $0x999,%eax,%eax
0x9ce17b:	jo     0x9ce004

// while (true) {}
0x9ce181:	mov    $0x1,%ecx
0x9ce186:	test   %ecx,%ecx
0x9ce188:	jne    0x9ce19a

// a = i, jump to return
0x9ce18e:	mov    $0xffffff81,%ecx
0x9ce193:	mov    %eax,%edx
0x9ce195:	jmp    0x9ce1b9

// next iteration of second loop
0x9ce19a:	mov    %eax,%edx
0x9ce19c:	mov    %edx,%eax
0x9ce19e:	jmp    0x9ce181

// i += 5
0x9ce1a3:	mov    %eax,%ecx
0x9ce1a5:	mov    %ecx,0x7c(%esp)
0x9ce1a9:	add    $0x5,%ecx
0x9ce1ac:	jo     0x9ce009

// next iteration of first loop
0x9ce1b2:	mov    %ecx,%eax
0x9ce1b4:	jmp    0x9ce167

// return a
0x9ce1b9:	add    $0x80,%esp
0x9ce1bf:	ret    
The block reordering algorithm visits the blocks in post order and reverses the result to get the reverse postorder. We should either build a reverse postorder directly, or change the first step of the algorithm to visit the successors of a node in reverse order. The second step of the algorithm will then reverse the list and restore the original order where possible.

Strictly speaking this may not be a "reverse postorder" anymore, but it generates more efficient code (and does not introduce new jit-test failures):
// i < 0x123?
0x9ce167:	cmp    $0x123,%eax
0x9ce16d:	jge    0x9ce183

// i += 5
0x9ce173:	mov    %eax,%ecx
0x9ce175:	add    $0x5,%eax
0x9ce178:	jo     0x9ce004

// next iteration of first loop
0x9ce17e:	jmp    0x9ce167

// i *= 0x999
0x9ce183:	mov    %eax,%ecx
0x9ce185:	mov    %ecx,0x7c(%esp)
0x9ce189:	imul   $0x999,%ecx,%ecx
0x9ce18f:	jo     0x9ce009

// while (true) {}
0x9ce195:	mov    $0x1,%eax
0x9ce19a:	test   %eax,%eax
0x9ce19c:	je     0x9ce1a7
0x9ce1a2:	jmp    0x9ce195

// a = i, return a
0x9ce1a7:	mov    $0xffffff81,%eax
0x9ce1ac:	mov    %ecx,%edx
0x9ce1ae:	mov    %eax,%ecx
0x9ce1b0:	add    $0x80,%esp
0x9ce1b6:	ret    
19 instructions instead of 22, and it more closely resembles the original source code.
Nice analysis. Strict RPO is required for most phases in the pipeline, including lowering, though codegen should be free to have almost any order. 

Would it improve things to change the order loop successors are visited, versus normal branches, in the ReorderBlocks()?
We really want all blocks in a loop to be contiguous, which I think the above suggestion accomplishes. If we have contiguous blocks, then LSRA can allocate a single interval for the entire loop, so a chunk of nasty code starting at LinearScan.cpp:553 can vanish.

Comment 3

6 years ago
I haven't been able to find a testcase where this breaks (using Anion), but we decided that it may be better/safer to reorder blocks between regalloc and codegen. I'm going to try this tomorrow. If it works we can probably still change ReorderBlocks to use contiguous blocks for loops and simplify LSRA.

Comment 4

6 years ago
Created attachment 576704 [details] [diff] [review]

Talked about this with sstangl, we'd like to get it in soon.
Attachment #576704 - Flags: review?(sstangl)
Comment on attachment 576704 [details] [diff] [review]

Review of attachment 576704 [details] [diff] [review]:

Thanks. Having loops contiguous makes stepping through generated code extremely pleasant.

This probably isn't sufficient to remove code from LSRA, but it does nicely reorder the blocks in most circumstances by mimicking IonBuilder patterns. Great idea.

::: js/src/ion/IonAnalysis.cpp
@@ +628,5 @@
>      Vector<unsigned int, 0, IonAllocPolicy> successors;
>      InlineList<MBasicBlock> done;
>      MBasicBlock *current = *graph.begin();
> +    unsigned int nextSuccessor = current->numSuccessors() - 1;

A comment explaining this strange choice of ordering may be useful =]

@@ +637,5 @@
>      while (true) {
>          if (!current->isMarked()) {
>              current->mark();
>              if (nextSuccessor < current->lastIns()->numSuccessors()) {

Would you mind changing this code to "current->numSuccessors()"? Use of lastIns() directly is deprecated.

@@ +643,5 @@
>                  if (!successors.append(nextSuccessor))
>                      return false;
>                  current = current->lastIns()->getSuccessor(nextSuccessor);
> +                nextSuccessor = current->lastIns()->numSuccessors() - 1;

This can just be current->numSuccessors() - 1, as it is above.
Attachment #576704 - Flags: review?(sstangl) → review+

Comment 6

6 years ago
Pushed with requested changes:
Last Resolved: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.