Closed
Bug 536642
Opened 15 years ago
Closed 13 years ago
TM: peephole optimizer for JS bytecode
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
We need a peephole optimizer for JS bytecode. An example, from bug 536630:
bitwiseAndValue = 4294967296;
for (var i = 0; i < 600000; i++)
bitwiseAndValue = bitwiseAndValue & i;
The bytecode looks like this:
00022: 27 trace
00023: 28 bindname "bitwiseAndValue"
00026: 28 name "bitwiseAndValue"
00029: 28 getgvar "i"
00032: 28 bitand
00033: 28 setname "bitwiseAndValue"
00037: 27 gvarinc "i"
00040: 27 pop
00041: 27 getgvar "i"
00044: 27 uint24 600000
00048: 27 lt
The gvarinc "i" / pop / getgvar "i" sequence could be reduced to gvarinc "i". There are likely to be lots of other peephole opportunities as well.
I'm happy to take this, but I don't know much about JS bytecode so it'll take me a little while to get up to speed.
Comment 1•15 years ago
|
||
Frankly (and disturbingly) I don't think your problem will be not knowing much about JS bytecode, but rather about updating the decompiler to accurately, and securely, decompile optimized bytecode. (Although, we could store two copies of the bytecode, one optimized, one not, and dodge the bullet, horrors tho the idea may seem...)
![]() |
Assignee | |
Comment 2•15 years ago
|
||
I'm ignoring the complication in comment 1 for the moment, I figure that bridge can be burnt if the peephole optimizer actually works well.
Working with bitwise-and.js, I have this sequence of bytecode (apologies for the lack of arguments):
0x84807a8: op trace
0x84807a9: op defvar
0x84807ac: op bindname
0x84807af: op double
0x84807b2: op setname
0x84807b5: op pop
0x84807b6: op zero
0x84807b7: op setgvar
0x84807ba: op pop
0x84807bb: op goto 0x84807d1
0x84807be: op trace
0x84807bf: op bindname
0x84807c2: op name
0x84807c5: op getgvar
0x84807c8: op bitand
0x84807c9: op setname
0x84807cc: op pop
0x84807cd: op gvarinc
0x84807d0: op pop
0x84807d1: op getgvar
0x84807d4: op uint24
0x84807d8: op lt
0x84807d9: op ifne
0x84807dc: op stop
The goto is jumping into the middle of the 2nd sequence, in fact right in the middle of the gvarinc/pop/getgvar trio that I initially tried to replace with gvarinc/nop/nop/nop/nop. (Totally hacky, yes, I'm just trying to get something working.)
I guess that makes sense, since Brendan said in bug 536630 that C-style for-loops are compiled like this:
i=0;
goto L2;
L1: do_something();
i++;
L2: if (i<N) goto L1;
But it makes things more difficult; for the transformation to be ok the setgvar/pop in the first part must be changed to just setgvar. Hmm.
Comment 3•15 years ago
|
||
See js_Disassemble for off-the-shelf bytecode disassembly with operands.
It's not safe to shorten gvarinc<i>;pop;getgvar<i> to just gvarinc<i> for global variable i. gvarinc<i> leaves the pre-incremented value of the global variable identified by the atom mapped at index <i> live on the stack, unlike the getgvar ending the original sequence. You'd want incgvar<i>.
This brings up the ugly possibility that the gvar may not be optimized via defvar to be getter/setter-free and accessed from the global object's slots. If there is a getter or setter, rewriting the bytecode changes the observable calls to getter and/or setter.
Does ES5 strictly define evaluation across statements such that we are not allowed to reorder and reselect other bytecodes to optimize? It looks like it to me, which is kind of bad. I should have caught this... Waldo, what do you think?
Optimizing in the parser and/or emitter still seems easier and more efficient than optimizing post-hoc in a peephole pass.
/be
Comment 4•15 years ago
|
||
(In reply to comment #2)
> I'm ignoring the complication in comment 1 for the moment, I figure that bridge
> can be burnt if the peephole optimizer actually works well.
Agreed, and you guys should (and I'm sure can ;-) "man up" by hacking the decompiler as needed. It is not a picnic, but it can be done and it should be incrementally improved if it can't be replaced.
Peephole may be worthwhile but I'd take for(;;) loop to the next level more directly, via compiler changes. Sorry to repeat, perhaps what we really need is a separate bug on this for(i=0;i<n;i++) cliche.
/be
![]() |
Assignee | |
Comment 5•15 years ago
|
||
(In reply to comment #3)
>
> Optimizing in the parser and/or emitter still seems easier and more efficient
> than optimizing post-hoc in a peephole pass.
No problem, I'm not wedded to any approach, I just want to improve the bytecode :)
I'm also wondering, from a position of great ignorance, if our stack-based VM is hurting us. The Opera guys have just changed to a register-based VM which they claim is a win: http://my.opera.com/core/blog/2009/02/04/carakan
> Peephole may be worthwhile but I'd take for(;;) loop to the next level more
> directly, via compiler changes. Sorry to repeat, perhaps what we really need is
> a separate bug on this for(i=0;i<n;i++) cliche.
Done, bug 537868. Should I close this bug, or do you still want a bytecode peepholer for other cases?
Comment 6•15 years ago
|
||
(In reply to comment #5)
> (In reply to comment #3)
> >
> > Optimizing in the parser and/or emitter still seems easier and more
> > efficient than optimizing post-hoc in a peephole pass.
>
> No problem, I'm not wedded to any approach, I just want to improve the
> bytecode :)
In terms of exactly when the peephole optimizer runs, I think working with the existing code is the way to go. But I would strongly advocate making sure it's a separate component and is not mixed with the front end. In particular it should be possible to turn it on or off with one switch and have things work perfectly fine either way.
> I'm also wondering, from a position of great ignorance, if our stack-based VM
> is hurting us. The Opera guys have just changed to a register-based VM which
> they claim is a win: http://my.opera.com/core/blog/2009/02/04/carakan
Thanks for the link. I only skimmed the top so far but it looks really interesting.
The stack vs. register question is hard to answer without actually doing the test, but on balance, based on Apple's experience, a mini-interpreter I whipped up to test things like that out, and now Opera's experience, I too believe that the register-based design is better on current hardware/workloads.
I'd be pretty happy to see us try to make that change, but I'm not sure how to coordinate it with the JaegerMonkey work. The bytecode->native compiler will obviously have to change if the bytecode changes. That suggests doing the bytecode conversion first. But if we just go ahead with the compiler first, we should be able to get to a point of better perf sooner. It would also give us some practice and new skills that we can apply to making the bytecode->native compiler for the new bytecode even better. Another plan is to work on the bytecode conversion in parallel, and then revisit the decision about what bytecode to base the new compiler on when the time comes.
> > Peephole may be worthwhile but I'd take for(;;) loop to the next level more
> > directly, via compiler changes. Sorry to repeat, perhaps what we really
> > need is a separate bug on this for(i=0;i
> Done, bug 537868. Should I close this bug, or do you still want a bytecode
> peepholer for other cases?
![]() |
Assignee | |
Comment 7•13 years ago
|
||
TM's days are numbered: WONTFIX.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•