Closed Bug 549507 Opened 15 years ago Closed 15 years ago

JM: Opcode dyad profiling

Categories: Core :: JavaScript Engine, defect
Priority: Not set
Severity: normal
Status: RESOLVED FIXED
Reporter: dmandelin
Assignee: Unassigned
We need data on the frequency with which each pair of opcodes is executed successively. This will tell us if there are any new opcode fusion optimizations we should do. The infrastructure for this should already be there, guarded by '#ifdef JS_OPMETER'. We want data for SunSpider, V8, and a cool web app or two, maybe Google Docs + an experiment or one of the things we have a hard time tracing.
Here are the common pairs for SS:

  10120478 0.096544 getlocal getlocal
   3264097 0.031138 getarg   getlocal
   3201313 0.030539 add      setlocal
   3012947 0.028742 getlocal getelem
   2779871 0.026518 getlocal int8
   2470150 0.023564 getlocal getarg
   2220535 0.021183 getlocal add
   1650693 0.015747 getlocal lt
   1482200 0.014139 getlocal one
   1232269 0.011755 int8     rsh
   1150534 0.010975 getlocal uint16
   1135212 0.010829 getarg   int8
   1046399 0.009982 uint16   lt
   1019658 0.009727 int8     bitand
    955288 0.009113 getlocal name
    913891 0.008718 call     trace
    912158 0.008701 bitand   ifeq
    880160 0.008396 one      lsh
    871431 0.008313 one      add
    853004 0.008137 getelem  setelem
    824396 0.007864 getlocal sub
    813981 0.007765 getlocal mul
    805917 0.007688 int8     lt
    792603 0.007561 getelem  setlocal
    742688 0.007085 getlocal bitand

Notable patterns:

* getlocal,X: If X doesn't consume the value from the getlocal (e.g., X=getlocal), there isn't much to do, short of register-allocating across several ops, which is too hard for the near future. But if X does consume the value, we can optimize by letting X use the value directly from a register. In that case we could also consider eliminating the store at the end of the getlocal. This kind of optimization can be done more generally by keeping track of which JS stack values are in which registers at a given time, and then, when we need to load from the stack, looking there first. I think WebKit does something like this. Getting rid of the stores is a little harder. One option is to delay each store until either (a) the register is about to be overwritten (do the store then), (b) a stub call is about to be made (do the store then), or (c) the stack location is overwritten (cancel the store). Getting rid of the stores seems less important, as a store doesn't contribute to value dependence chains.

* add,setlocal: If the setlocal is followed by a pop, the result could be stored directly from the add. Otherwise there's probably not much win over and above the register-passing optimization above.

* ConstInt,Arith: An example is int8,rsh. There is no need to store the int8; it can be used as an immediate operand to the rsh. This could be done similarly to the register-passing optimization.
Common pairs for v8:

  4395628 0.055762 getlocal getlocal
  3953493 0.050153 getlocal mul
  3030679 0.038446 int8     rsh
  2065025 0.026196 getlocal getarg
  2058585 0.026115 uint16   bitand
  2031413 0.025770 mul      add
  2015650 0.025570 getarg   getelem
  2012402 0.025529 mul      getlocal
  1983248 0.025159 add      setlocal
  1979112 0.025106 add      getlocal
  1962453 0.024895 getlocal int8
  1950022 0.024737 getlocal arginc
  1398195 0.017737 getlocal pop
  1170876 0.014853 zero     ge
  1113417 0.014124 bitand   setlocal
  1053467 0.013364 rsh      setlocal
  1052814 0.013356 decarg   zero
  1003661 0.012732 getelem  uint16
  1003025 0.012724 getelem  int8
  1002776 0.012721 bitand   int8
  1002758 0.012721 lsh      add
  1002758 0.012721 int8     lsh
   979048 0.012420 getlocal uint16
   977891 0.012405 rsh      getlocal
   976526 0.012388 getelem  add
   976060 0.012382 getarg   add
   975984 0.012381 add      getarg
   975521 0.012375 bitand   setelem
   975243 0.012372 add      setarg
   975086 0.012370 getlocal int32
   975011 0.012369 rsh      add
   975011 0.012369 int32    bitand
   975011 0.012369 arginc   getlocal
   975011 0.012369 arginc   getelem
   940937 0.011936 pop      getlocal
   826108 0.010480 stricteq ifeq

Looks like similar patterns to SS, although the pairs are different.
Common pairs for a brief gmail run:

  286896 0.041081 getarg   call
  222894 0.031917 name     getelem
  219759 0.031468 getarg   getarg
  182337 0.026109 call     trace
  158110 0.022640 getarg   name
  145382 0.020817 name     callelem
   94576 0.013542 callprop call
   93117 0.013334 this     getarg
   92131 0.013192 callprop getarg
   89937 0.012878 getlocal getlocal
   89839 0.012864 callname getarg
   88933 0.012734 this     callprop
   85256 0.012208 getelem  name
   85223 0.012203 getlocal getelem
   75335 0.010787 name     eq
   75267 0.010778 callname this
   74846 0.010717 name     case
   74834 0.010716 getarg   getelem
   72294 0.010352 getlocal name
   70613 0.010111 getarg   setprop
   67991 0.009736 pop      stop
   67643 0.009686 not      ifeq
   65387 0.009363 callelem getarg
   64528 0.009240 getlocal call

This is different from the benchmarks: there seem to be fewer opportunities to speed things up by passing along the last value. not/ifeq looks like it could be fused, though, perhaps in the emitter.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
(In reply to comment #3)
> not/ifeq looks like it could be fused, though, perhaps in the emitter.

This is an oldie; it may even have a bug on file. We do not optimize not;ifeq -> ifne (or vice versa), but we could. The decompiler will need to dummy up.

/be