Closed Bug 536642 Opened 15 years ago Closed 13 years ago

TM: peephole optimizer for JS bytecode

Categories: Core :: JavaScript Engine (defect)
Platform: x86, macOS
Priority: Not set
Severity: normal

Status: RESOLVED WONTFIX

People: Reporter: n.nethercote; Assigned: n.nethercote


Comment 0 (description):

We need a peephole optimizer for JS bytecode. An example, from bug 536630:

  bitwiseAndValue = 4294967296;
  for (var i = 0; i < 600000; i++)
    bitwiseAndValue = bitwiseAndValue & i;

The bytecode looks like this:

  00022:  27  trace
  00023:  28  bindname "bitwiseAndValue"
  00026:  28  name "bitwiseAndValue"
  00029:  28  getgvar "i"
  00032:  28  bitand
  00033:  28  setname "bitwiseAndValue"
  00037:  27  gvarinc "i"
  00040:  27  pop
  00041:  27  getgvar "i"
  00044:  27  uint24 600000
  00048:  27  lt

The gvarinc "i" / pop / getgvar "i" sequence could be reduced to gvarinc "i". There are likely to be lots of other peephole opportunities as well. I'm happy to take this, but I don't know much about JS bytecode so it'll take me a little while to get up to speed.
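As a side note on what that three-op window actually computes, here is a minimal sketch, in plain JavaScript rather than real SpiderMonkey code, of the operand-stack effect of gvarinc "i" / pop / getgvar "i". The tiny interpreter and its { op, arg } records are illustrative assumptions, not the engine's representation; the point is just that the net effect of the trio is "increment the global and leave its new value on the stack", which is what any single-op replacement would have to reproduce.

  // Illustrative only: a toy model of the three ops and their stack effect.
  function run(ops, globals) {
    var stack = [];
    ops.forEach(function (insn) {
      switch (insn.op) {
        case "gvarinc":  // i++ : push the OLD value, then increment the global
          stack.push(globals[insn.arg]);
          globals[insn.arg] += 1;
          break;
        case "pop":      // discard the top of the stack
          stack.pop();
          break;
        case "getgvar":  // push the current value of the global
          stack.push(globals[insn.arg]);
          break;
      }
    });
    return stack;
  }

  var globals = { i: 7 };
  var stack = run([{ op: "gvarinc", arg: "i" },
                   { op: "pop" },
                   { op: "getgvar", arg: "i" }], globals);
  // globals.i is now 8 and stack is [8]: the trio increments the global
  // and leaves its *new* value on the stack.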
Comment 1:

Frankly (and disturbingly), I don't think your problem will be not knowing much about JS bytecode, but rather updating the decompiler to accurately, and securely, decompile optimized bytecode. (Although we could store two copies of the bytecode, one optimized, one not, and dodge the bullet, horrors though the idea may seem...)
Comment 2 (reporter):

I'm ignoring the complication in comment 1 for the moment; I figure that bridge can be burnt if the peephole optimizer actually works well.

Working with bitwise-and.js, I have this sequence of bytecode (apologies for the lack of arguments):

  0x84807a8:  op trace
  0x84807a9:  op defvar
  0x84807ac:  op bindname
  0x84807af:  op double
  0x84807b2:  op setname
  0x84807b5:  op pop
  0x84807b6:  op zero
  0x84807b7:  op setgvar
  0x84807ba:  op pop
  0x84807bb:  op goto 0x84807d1
  0x84807be:  op trace
  0x84807bf:  op bindname
  0x84807c2:  op name
  0x84807c5:  op getgvar
  0x84807c8:  op bitand
  0x84807c9:  op setname
  0x84807cc:  op pop
  0x84807cd:  op gvarinc
  0x84807d0:  op pop
  0x84807d1:  op getgvar
  0x84807d4:  op uint24
  0x84807d8:  op lt
  0x84807d9:  op ifne
  0x84807dc:  op stop

The goto is jumping into the middle of the 2nd sequence, in fact right in the middle of the gvarinc/pop/getgvar trio that I initially tried to replace with gvarinc/nop/nop/nop/nop. (Totally hacky, yes, I'm just trying to get something working.) I guess that makes sense, since Brendan said in bug 536630 that C-style for-loops are compiled like this:

    i = 0;
    goto L2;
  L1:
    do_something();
    i++;
  L2:
    if (i < N) goto L1;

But it makes things more difficult; for the transformation to be ok, the setgvar/pop in the first part must be changed to just setgvar. Hmm.
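To illustrate the complication with jumps landing inside a candidate window, here is a minimal sketch, again in plain JavaScript over a made-up { offset, op, target } record format rather than SpiderMonkey's actual bytecode stream, of the kind of guard a peephole pass would need: collect every jump target first, and refuse to rewrite any window that a jump lands in anywhere other than its first op. The record format and the collectJumpTargets/windowIsSafe helpers are hypothetical names used only for this example.

  // Illustrative sketch: refuse to rewrite windows that contain a jump target.
  function collectJumpTargets(ops) {
    var targets = {};
    ops.forEach(function (insn) {
      if (insn.op === "goto" || insn.op === "ifne")
        targets[insn.target] = true;   // record the branch destination offset
    });
    return targets;
  }

  function windowIsSafe(windowOps, targets) {
    // A jump may land on the first op of the window (that offset survives the
    // rewrite), but not on any interior op, or control flow would change.
    return windowOps.slice(1).every(function (insn) {
      return !targets[insn.offset];
    });
  }

  // In the disassembly above, the goto targets 0x84807d1 (the getgvar), which
  // is an interior op of the gvarinc/pop/getgvar window, so windowIsSafe would
  // return false and the naive rewrite would be skipped (or the goto would have
  // to be retargeted and the i=0 prologue adjusted, as described above).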
Comment 3:

See js_Disassemble for off-the-shelf bytecode disassembly with operands.

It's not safe to shorten gvarinc<i>;pop;getgvar<i> to just gvarinc<i> for global variable i. gvarinc<i> leaves the pre-incremented value of the global variable identified by the atom mapped at index <i> live on the stack, unlike the getgvar ending the original sequence. You'd want incgvar<i>.

This brings up the ugly possibility that the gvar may not be optimized via defvar to be getter/setter-free and accessed from the global object's slots. If there is a getter or setter, rewriting the bytecode changes the observable calls to getter and/or setter.

Does ES5 strictly define evaluation across statements such that we are not allowed to reorder and reselect other bytecodes to optimize? It looks like it to me, which is kind of bad. I should have caught this... Waldo, what do you think?

Optimizing in the parser and/or emitter still seems easier and more efficient than optimizing post-hoc in a peephole pass.

/be
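For a concrete sense of the getter/setter concern, here is a small script (plain JavaScript, run at non-strict global scope in a browser-like environment; the property name g and the log array are purely illustrative) showing that folding a post-increment plus a re-read into a single pre-increment drops an observable property read when the global happens to be accessor-backed:

  var log = [];
  Object.defineProperty(this, "g", {   // an accessor-backed "global variable"
    get: function () { log.push("get"); return 0; },
    set: function (v) { log.push("set"); },
    configurable: true
  });

  g++;          // roughly gvarinc "g": get the old value, then set
  var x = g;    // roughly getgvar "g": another get
  console.log(log);   // ["get", "set", "get"]

  log.length = 0;
  var y = ++g;  // roughly incgvar "g": get, then set; only two accesses
  console.log(log);   // ["get", "set"]

  // So the incgvar rewrite is only observably equivalent when the global is a
  // plain slot with no getter or setter, i.e. the defvar-optimized case
  // described above.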
Comment 4:

(In reply to comment #2)
> I'm ignoring the complication in comment 1 for the moment, I figure that bridge
> can be burnt if the peephole optimizer actually works well.

Agreed, and you guys should (and I'm sure can ;-) "man up" by hacking the decompiler as needed. It is not a picnic, but it can be done and it should be incrementally improved if it can't be replaced.

Peephole may be worthwhile but I'd take for(;;) loop to the next level more directly, via compiler changes. Sorry to repeat, perhaps what we really need is a separate bug on this for(i=0;i<n;i++) cliche.

/be
Comment 5 (reporter):

(In reply to comment #3)
> Optimizing in the parser and/or emitter still seems easier and more efficient
> than optimizing post-hoc in a peephole pass.

No problem, I'm not wedded to any approach, I just want to improve the bytecode :)

I'm also wondering, from a position of great ignorance, if our stack-based VM is hurting us. The Opera guys have just changed to a register-based VM which they claim is a win: http://my.opera.com/core/blog/2009/02/04/carakan

> Peephole may be worthwhile but I'd take for(;;) loop to the next level more
> directly, via compiler changes. Sorry to repeat, perhaps what we really need is
> a separate bug on this for(i=0;i<n;i++) cliche.

Done, bug 537868. Should I close this bug, or do you still want a bytecode peepholer for other cases?
Comment 6:

(In reply to comment #5)
> (In reply to comment #3)
> > Optimizing in the parser and/or emitter still seems easier and more
> > efficient than optimizing post-hoc in a peephole pass.
>
> No problem, I'm not wedded to any approach, I just want to improve the
> bytecode :)

In terms of exactly when the peephole optimizer runs, I think working with the existing code is the way to go. But I would strongly advocate making sure it's a separate component and is not mixed with the front end. In particular, it should be possible to turn it on or off with one switch and have things work perfectly fine either way.

> I'm also wondering, from a position of great ignorance, if our stack-based VM
> is hurting us. The Opera guys have just changed to a register-based VM which
> they claim is a win: http://my.opera.com/core/blog/2009/02/04/carakan

Thanks for the link. I only skimmed the top so far but it looks really interesting.

The stack vs. register question is hard to answer without actually doing the test, but on balance, based on Apple's experience, a mini-interpreter I whipped up to test things like that out, and now Opera's experience, I too believe that the register-based design is better on current hardware/workloads.

I'd be pretty happy to see us try to make that change, but I'm not sure how to coordinate it with the JaegerMonkey work. The bytecode->native compiler will obviously have to change if the bytecode changes. That suggests doing the bytecode conversion first. But if we just go ahead with the compiler first, we should be able to get to a point of better perf sooner. It would also give us some practice and new skills that we can apply to making the bytecode->native compiler for the new bytecode even better. Another plan is to work on the bytecode conversion in parallel, and then revisit the decision about what bytecode to base the new compiler on when the time comes.

> > Peephole may be worthwhile but I'd take for(;;) loop to the next level more
> > directly, via compiler changes. Sorry to repeat, perhaps what we really
> > need is a separate bug on this for(i=0;i<n;i++) cliche.
>
> Done, bug 537868. Should I close this bug, or do you still want a bytecode
> peepholer for other cases?
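As a rough illustration of the stack-vs-register trade-off being discussed (not a description of either SpiderMonkey's or Carakan's actual encodings), here is a toy JavaScript sketch of the same statement, bitwiseAndValue = bitwiseAndValue & i, in the two styles. The op formats and interpreter loops are invented for the example; the point is just that the register form dispatches one instruction where the stack form dispatches four and shuffles an operand stack.

  // Stack-style encoding: every operand moves through an explicit stack.
  function runStack(ops, vars) {
    var stack = [];
    ops.forEach(function (insn) {
      switch (insn.op) {
        case "get":    stack.push(vars[insn.name]); break;
        case "bitand": var b = stack.pop(), a = stack.pop(); stack.push(a & b); break;
        case "set":    vars[insn.name] = stack.pop(); break;
      }
    });
  }

  // Register-style encoding: one three-operand instruction, no stack traffic.
  function runReg(ops, regs) {
    ops.forEach(function (insn) {
      if (insn.op === "bitand") regs[insn.dst] = regs[insn.a] & regs[insn.b];
    });
  }

  var vars = { bitwiseAndValue: 4294967296, i: 3 };
  runStack([{ op: "get", name: "bitwiseAndValue" },
            { op: "get", name: "i" },
            { op: "bitand" },
            { op: "set", name: "bitwiseAndValue" }], vars);      // four dispatches

  var regs = { r0: 4294967296, r1: 3 };
  runReg([{ op: "bitand", dst: "r0", a: "r0", b: "r1" }], regs); // one dispatch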
TM's days are numbered: WONTFIX.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX