Closed Bug 475761 Opened 16 years ago Closed 16 years ago

TM: js_Any_GetProp and friends can reenter

Categories

(Core :: JavaScript Engine, defect)

Other Branch
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: jorendorff, Assigned: jorendorff)

References

Details

(Keywords: verified1.9.1)

Attachments

(3 files, 1 obsolete file)

This code reenters the interpreter: var obj = {a:1, b:1, c:1, d:1, get e() 1000 }; for (var p in obj) obj[p]; The is a lot like bug 468782, and I think a similar fix would work.
This has to be fixed before bug 462027 (a release blocker) can land. Patch coming.
Assignee: general → jorendorff
Attached patch v1 (obsolete) — Splinter Review
This is percolating through the try server now. There is some risk of having to review again later; but trace-tests and regression tests pass here. Like bug 468782, this patch is a first step; the real fix comes when these new builtins become _FAIL functions that can deep-bail.
Attachment #359427 - Flags: review?(brendan)
The Try Server came up red on Windows due to a silly mistake. Still waiting for Talos to finish (on Mac and Linux)...
Attached patch v2Splinter Review
Attachment #359427 - Attachment is obsolete: true
Attachment #359452 - Flags: review?(brendan)
Attachment #359427 - Flags: review?(brendan)
Talos finished. No apparent perf regression. Do I need to root the ids in the JSFastNative versions of those functions?
Comment on attachment 359452 [details] [diff] [review] v2 >+.igroup setelem JSOP_SETELEM >+ >+ .imacro setprop # obj name val >+ dup # obj name val val >+ pick 3 # name val val obj >+ callbuiltin (JSBUILTIN_SetProperty) # name val val fun obj >+ pick 4 # val val fun obj name >+ pick 4 # val fun obj name val >+ call 2 # val junk >+ pop # val >+ stop >+ .end >+ >+ .imacro setelem # obj i val >+ dup # obj i val val >+ pick 3 # i val val obj >+ callbuiltin (JSBUILTIN_SetElement) # i val val fun obj >+ pick 4 # val val fun obj i >+ pick 4 # val fun obj i val >+ call 2 # val junk >+ pop # val >+ stop >+ .end >+ >+.end >+ >+.igroup initelem JSOP_INITELEM >+ >+ .imacro initelem # obj i val >+ pick 2 # i val obj >+ dup # i val obj obj >+ callbuiltin (JSBUILTIN_SetElement) # i val obj fun obj >+ pick 4 # val obj fun obj i >+ pick 4 # obj fun obj i val >+ call 2 # obj junk >+ pop # obj >+ stop >+ .end >+ Indentaton wackiness (2 vs. 4 space offset) above. >+static jsval FASTCALL >+GetProperty_tn(JSContext *cx, JSObject *obj, JSString *name) >+{ >+ jsid id; >+ jsval v; >+ >+ if (!js_ValueToStringId(cx, STRING_TO_JSVAL(name), &id) >+ || !OBJ_GET_PROPERTY(cx, obj, id, &v)) { Prevailing style has || and && at end of line. >+SetProperty_tn(JSContext* cx, JSObject* obj, JSString* idstr, jsval v) >+{ >+ jsid id; >+ >+ if (!js_ValueToStringId(cx, STRING_TO_JSVAL(idstr), &id) >+ || !OBJ_SET_PROPERTY(cx, obj, id, &v)) { Ditto. Great patch, more imacros to the rescue. r=me with fixes + JSOP_PICK stack balance issue sorted out (mentioned on IRC). /be
Attachment #359452 - Flags: review?(brendan) → review+
Pushed with those fixes, rooting for the ids in {Get,Set}{Property,Element}, an Emacs modeline for imacros.jsasm, and a small bugfix: The JSFastNative versions of Set{Property,Element} weren't setting the return value; they were just returning whatever the setter left in *vp. They must set *vp to a boolean or undefined, even though the next instruction is a POP. Otherwise the snapshot after the call will have a random type on the top of the stack (the type of the value at record-time), leading to trouble at execute-time when side exiting there. http://hg.mozilla.org/tracemonkey/rev/76a7344912ae
This patch is a 30ms regression on ./bench.sh on my machine. According to the no-prisoners policy we should back out, but otoh we are behind schedule and this blocks the rest of Jason's work. Maybe we can fix the perf regression and leave the patch in in the meantime?
No difference in trace stats for sunspider (same traces, same exits/entries, same trigger counts).
TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: *1.022x as slow* 1061.0ms +/- 0.3% 1084.2ms +/- 0.3% significant ============================================================================= 3d: - 154.5ms +/- 0.8% 154.5ms +/- 0.4% cube: - 41.4ms +/- 2.0% 40.8ms +/- 2.0% morph: *1.036x as slow* 30.3ms +/- 1.6% 31.4ms +/- 1.6% significant raytrace: 1.006x as fast 82.8ms +/- 0.4% 82.3ms +/- 0.6% significant access: *1.057x as slow* 132.6ms +/- 1.0% 140.1ms +/- 1.4% significant binary-trees: *1.030x as slow* 40.0ms +/- 1.5% 41.2ms +/- 0.7% significant fannkuch: *1.107x as slow* 55.9ms +/- 0.9% 61.9ms +/- 3.1% significant nbody: ?? 23.8ms +/- 1.9% 24.0ms +/- 0.0% not conclusive: might be *1.008x as slow* nsieve: ?? 12.9ms +/- 1.8% 13.0ms +/- 5.2% not conclusive: might be *1.008x as slow* bitops: ?? 37.9ms +/- 1.7% 38.8ms +/- 3.1% not conclusive: might be *1.024x as slow* 3bit-bits-in-byte: 1.25x as fast 2.0ms +/- 0.0% 1.6ms +/- 23.1% significant bits-in-byte: 1.064x as fast 10.0ms +/- 3.4% 9.4ms +/- 3.9% significant bitwise-and: - 2.8ms +/- 10.8% 2.7ms +/- 12.8% nsieve-bits: *1.087x as slow* 23.1ms +/- 1.0% 25.1ms +/- 2.8% significant controlflow: 1.021x as fast 33.8ms +/- 0.9% 33.1ms +/- 0.7% significant recursive: 1.021x as fast 33.8ms +/- 0.9% 33.1ms +/- 0.7% significant crypto: - 61.3ms +/- 1.0% 61.2ms +/- 0.7% aes: ?? 35.1ms +/- 0.6% 35.3ms +/- 1.0% not conclusive: might be *1.006x as slow* md5: 1.031x as fast 19.8ms +/- 1.5% 19.2ms +/- 1.6% significant sha1: ?? 6.4ms +/- 5.8% 6.7ms +/- 5.2% not conclusive: might be *1.047x as slow* date: - 173.6ms +/- 0.3% 172.9ms +/- 0.5% format-tofte: - 67.9ms +/- 0.6% 67.7ms +/- 0.7% format-xparb: - 105.7ms +/- 0.3% 105.2ms +/- 0.6% math: ?? 39.8ms +/- 1.4% 40.0ms +/- 1.2% not conclusive: might be *1.005x as slow* cordic: ?? 19.6ms +/- 1.9% 19.8ms +/- 1.5% not conclusive: might be *1.010x as slow* partial-sums: - 14.0ms +/- 0.0% 14.0ms +/- 0.0% spectral-norm: - 6.2ms +/- 4.9% 6.2ms +/- 4.9% regexp: - 56.2ms +/- 0.5% 56.1ms +/- 0.7% dna: - 56.2ms +/- 0.5% 56.1ms +/- 0.7% string: *1.044x as slow* 371.3ms +/- 0.4% 387.5ms +/- 0.4% significant base64: *1.006x as slow* 16.0ms +/- 0.0% 16.1ms +/- 1.4% significant fasta: ?? 76.1ms +/- 0.7% 76.2ms +/- 0.6% not conclusive: might be *1.001x as slow* tagcloud: *1.021x as slow* 108.3ms +/- 0.3% 110.6ms +/- 1.5% significant unpack-code: *1.096x as slow* 138.7ms +/- 0.5% 152.0ms +/- 0.7% significant validate-input: ?? 32.2ms +/- 1.4% 32.6ms +/- 1.1% not conclusive: might be *1.012x as slow*
unpack-code seems to have lost a lot, fannkuch as well. unpack-code uses string element lookup, fannkuch arrays.
We could have faster paths for already-string, already-interned-as-atom, id values, maybe. The imacro stack manipulation boils away. Where's the time going? Is JSOP_CALLBUILTIN's LIR any less tight than the old way (which could re-enter)? It would be helpful to see LIR or asm diffs. /be
var q = {}; q.z = 5.3; var x = 5.3; var s = "z"; for (var i = 0; i < 10; ++i) x += q[s]; Run without and with the patch and diff -u of verbose output: whale:src gal$ diff -u /tmp/before /tmp/after --- /tmp/before 2009-01-29 01:24:38.000000000 -0800 +++ /tmp/after 2009-01-29 01:25:18.000000000 -0800 @@ -53,21 +53,33 @@ guard(shape) = eq shape, 148 xf2: xf guard(shape) -> 0:60 sp+24 rp+0 - js_Any_getprop1 = js_Any_getprop ( cx $global1 $global2 ) - eq1 = eq js_Any_getprop1, JSVAL_ERROR_COOKIE - xt2: xt eq1 -> 0:60 sp+24 rp+0 +00000: swap +00000: callbuiltin 2 +00000: pick 2 +00000: call 1 + sti sp[16] = $global1 + sti sp[8] = $global2 + sti sp[24] = $global1 + obj + sti sp[16] = obj + sti sp[8] = obj + sti sp[16] = $global1 + sti sp[24] = $global2 + GetProperty_tn1 = GetProperty_tn ( cx $global1 $global2 ) + sti sp[8] = GetProperty_tn1 JSVAL_TAGMASK - and1 = and js_Any_getprop1, JSVAL_TAGMASK + and1 = and GetProperty_tn1, JSVAL_TAGMASK 2 - eq2 = eq and1, 2 + eq1 = eq and1, 2 JSVAL_INT - and2 = and js_Any_getprop1, JSVAL_INT - or1 = or and2, eq2 - eq3 = eq or1, 0 - xt3: xt eq3 -> 0:60 sp+24 rp+0 + and2 = and GetProperty_tn1, JSVAL_INT + or1 = or and2, eq1 + eq2 = eq or1, 0 + xt2: xt eq2 -> 10:60 sp+16 rp+0 - js_UnboxDouble1 = js_UnboxDouble ( js_Any_getprop1 ) + js_UnboxDouble1 = js_UnboxDouble ( GetProperty_tn1 ) +00000: stop 00061: 6 add 00062: 6 setgvar "x" 00066: 5 incgvar "i" @@ -82,7 +94,7 @@ #3FF00000:0 /* 1 */ add1 = add ld4, JSVAL_INT ov1 = ov add1 - xt4: xt ov1 -> 0:66 sp+0 rp+0 + xt3: xt ov1 -> 0:66 sp+0 rp+0 00069: 5 pop 00070: 5 getgvar "i" @@ -107,14 +119,14 @@ st gp[680] = add1 loop -live instruction count 32, total 72, max pressure 9 +live instruction count 33, total 72, max pressure 9 side exits 1 state state = param 0 ecx state sp sp = ld state[0] state sp rp rp = ld state[4] state sp cx cx = ld state[12] state sp cx gp gp = ld state[8] -state sp cx gp eos eos = ld state[JSVAL_ERROR_COOKIE] +state sp cx gp eos eos = ld state[16] state sp cx gp eor eor = ld state[20] state sp cx gp globalObj globalObj = ld state[36] state sp cx gp ld1 ld1 = ld cx[0] @@ -128,10 +140,11 @@ state sp cx gp $global0 $global1 $global2 sti sp[16] = $global2 state sp cx gp $global0 $global1 $global2 ld2 ld2 = ld $global1[0] state sp cx gp $global0 $global1 $global2 ld2 ops ops = ld ld2[4] -state sp cx gp $global0 $global1 $global2 ld2 ld3 ld3 = ld ops[JSVAL_ERROR_COOKIE] -state sp cx gp $global0 $global1 $global2 shape shape = ld ld2[JSVAL_ERROR_COOKIE] -state sp gp $global0 js_Any_getprop1 js_Any_getprop1 = js_Any_getprop ( cx $global1 $global2 ) -state sp gp $global0 js_UnboxDouble1 js_UnboxDouble1 = js_UnboxDouble ( js_Any_getprop1 ) +state sp cx gp $global0 $global1 $global2 ld2 ld3 ld3 = ld ops[16] +state sp cx gp $global0 $global1 $global2 shape shape = ld ld2[16] +state sp gp $global0 GetProperty_tn1 GetProperty_tn1 = GetProperty_tn ( cx $global1 $global2 ) +state sp gp $global0 GetProperty_tn1 sti sp[8] = GetProperty_tn1 +state sp gp $global0 js_UnboxDouble1 js_UnboxDouble1 = js_UnboxDouble ( GetProperty_tn1 ) state sp gp fadd1 fadd1 = fadd $global0, js_UnboxDouble1 state sp gp stq gp[664] = fadd1 state sp gp ld4 ld4 = ld gp[680] @@ -143,6 +156,7 @@ state loop 0x256ef0 [prologue] + nop6 nop9 push edi push esi @@ -150,10 +164,10 @@ push ebp push ebp mov ebp,esp - 0x256f00 [frag entry] + 0x256f06 [frag entry] sub esp,24 -patching 0x256ff3 to 0x256f03 - 0x256f03: +patching 0x256ff3 to 0x256f09 + 0x256f09: compiling trunk 0x257004 state = param 0 ecx sp = ld state[0] @@ -173,14 +187,14 @@ sub eax,4096 eax(ld1) ecx(cx) ebx(state) esi(gp) edi(sp) mov 0(ecx),eax eax(sub1) ecx(cx) ebx(state) esi(gp) edi(sp) cmp eax,0 eax(sub1) ecx(cx) ebx(state) esi(gp) edi(sp) - jle 0x2b8f9d ecx(cx) ebx(state) esi(gp) edi(sp) + jle 0x2b8fac ecx(cx) ebx(state) esi(gp) edi(sp) --------------------------------------- exit block (LIR_xt|LIR_xf) - 0x2b8f9d: + 0x2b8fac: merging registers (intersect) with existing edge mov ecx,-12(ebp) mov eax,2453748 mov esp,ebp - 0x2b8fa7: + 0x2b8fb6: jmp 0x256ff8 --------------------------------------- end exit block 0x257138 $global0 = ldq gp[664] @@ -191,7 +205,7 @@ sti sp[16] = $global2 ld2 = ld $global1[0] ops = ld ld2[4] - ld3 = ld ops[JSVAL_ERROR_COOKIE] + ld3 = ld ops[16] guard(native-map) = eq ld3, OP(&js_ObjectOps) xf1: xf guard(native-map) -> 0:60 sp+24 rp+0 movq xmm0,664(esi) ecx(cx) ebx(state) esi(gp) edi(sp) @@ -206,85 +220,74 @@ mov -20(ebp),eax eax(ld2) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) mov eax,4(eax) eax(ld2) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) mov eax,16(eax) eax(ops) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) - cmp eax,647590 eax(ld3) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) + cmp eax,646854 eax(ld3) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) mov eax,-20(ebp) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) - jne 0x2b8fac eax(ld2) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) + jne 0x2b8fbb eax(ld2) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) --------------------------------------- exit block (LIR_xt|LIR_xf) - 0x2b8fac: + 0x2b8fbb: merging registers (intersect) with existing edge mov ecx,-12(ebp) mov eax,2453880 mov esp,ebp - 0x2b8fb6: + 0x2b8fc5: jmp 0x256ff8 --------------------------------------- end exit block 0x2571c0 - shape = ld ld2[JSVAL_ERROR_COOKIE] + shape = ld ld2[16] guard(shape) = eq shape, 148 xf2: xf guard(shape) -> 0:60 sp+24 rp+0 mov eax,16(eax) eax(ld2) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) cmp eax,148 eax(shape) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) mov eax,-16(ebp) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) - jne 0x2b8fbb eax($global2) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) + jne 0x2b8fca eax($global2) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) --------------------------------------- exit block (LIR_xt|LIR_xf) - 0x2b8fbb: + 0x2b8fca: merging registers (intersect) with existing edge mov ecx,-12(ebp) mov eax,2453968 mov esp,ebp - 0x2b8fc5: + 0x2b8fd4: jmp 0x256ff8 --------------------------------------- end exit block 0x257218 - js_Any_getprop1 = js_Any_getprop ( cx $global1 $global2 ) - eq1 = eq js_Any_getprop1, JSVAL_ERROR_COOKIE - xt2: xt eq1 -> 0:60 sp+24 rp+0 + GetProperty_tn1 = GetProperty_tn ( cx $global1 $global2 ) + sti sp[8] = GetProperty_tn1 + and1 = and GetProperty_tn1, JSVAL_TAGMASK + eq1 = eq and1, 2 + and2 = and GetProperty_tn1, JSVAL_INT + or1 = or and2, eq1 + eq2 = eq or1, 0 + xt2: xt eq2 -> 10:60 sp+16 rp+0 sub esp,12 eax($global2) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) push eax eax($global2) ecx(cx) edx($global1) ebx(state) esi(gp) edi(sp) - call js_Any_getprop ebx(state) esi(gp) edi(sp) + call GetProperty_tn ebx(state) esi(gp) edi(sp) add esp,12 ebx(state) esi(gp) edi(sp) - mov ecx,eax eax(js_Any_getprop1) ebx(state) esi(gp) edi(sp) - cmp ecx,16 ecx(js_Any_getprop1) ebx(state) esi(gp) edi(sp) - je 0x2b8fca ecx(js_Any_getprop1) ebx(state) esi(gp) edi(sp) ---------------------------------------- exit block (LIR_xt|LIR_xf) - 0x2b8fca: - merging registers (intersect) with existing edge - mov ecx,-12(ebp) - mov eax,2454060 - mov esp,ebp - 0x2b8fd4: - jmp 0x256ff8 ---------------------------------------- end exit block 0x257274 - and1 = and js_Any_getprop1, JSVAL_TAGMASK - eq2 = eq and1, 2 - and2 = and js_Any_getprop1, JSVAL_INT - or1 = or and2, eq2 - eq3 = eq or1, 0 - xt3: xt eq3 -> 0:60 sp+24 rp+0 - mov eax,ecx ecx(js_Any_getprop1) ebx(state) esi(gp) edi(sp) - and eax,7 ecx(js_Any_getprop1) ebx(state) esi(gp) edi(sp) - cmp eax,2 eax(and1) ecx(js_Any_getprop1) ebx(state) esi(gp) edi(sp) - sete edx ecx(js_Any_getprop1) ebx(state) esi(gp) edi(sp) - movzx edx,edx ecx(js_Any_getprop1) ebx(state) esi(gp) edi(sp) - mov eax,ecx ecx(js_Any_getprop1) edx(eq2) ebx(state) esi(gp) edi(sp) - and eax,1 ecx(js_Any_getprop1) edx(eq2) ebx(state) esi(gp) edi(sp) - or eax,edx eax(and2) ecx(js_Any_getprop1) edx(eq2) ebx(state) esi(gp) edi(sp) - test eax,eax eax(or1) ecx(js_Any_getprop1) ebx(state) esi(gp) edi(sp) - je 0x2b8fd9 ecx(js_Any_getprop1) ebx(state) esi(gp) edi(sp) + mov ecx,eax eax(GetProperty_tn1) ebx(state) esi(gp) edi(sp) + mov 8(edi),ecx ecx(GetProperty_tn1) ebx(state) esi(gp) edi(sp) + mov eax,ecx ecx(GetProperty_tn1) ebx(state) esi(gp) edi(sp) + and eax,7 ecx(GetProperty_tn1) ebx(state) esi(gp) edi(sp) + cmp eax,2 eax(and1) ecx(GetProperty_tn1) ebx(state) esi(gp) edi(sp) + sete edx ecx(GetProperty_tn1) ebx(state) esi(gp) edi(sp) + movzx edx,edx ecx(GetProperty_tn1) ebx(state) esi(gp) edi(sp) + mov eax,ecx ecx(GetProperty_tn1) edx(eq1) ebx(state) esi(gp) edi(sp) + and eax,1 ecx(GetProperty_tn1) edx(eq1) ebx(state) esi(gp) edi(sp) + or eax,edx eax(and2) ecx(GetProperty_tn1) edx(eq1) ebx(state) esi(gp) edi(sp) + test eax,eax eax(or1) ecx(GetProperty_tn1) ebx(state) esi(gp) edi(sp) + je 0x2b8fd9 ecx(GetProperty_tn1) ebx(state) esi(gp) edi(sp) --------------------------------------- exit block (LIR_xt|LIR_xf) 0x2b8fd9: merging registers (intersect) with existing edge mov ecx,-12(ebp) - mov eax,2454168 + mov eax,2454128 mov esp,ebp 0x2b8fe3: jmp 0x256ff8 ---------------------------------------- end exit block 0x2572e0 - js_UnboxDouble1 = js_UnboxDouble ( js_Any_getprop1 ) +--------------------------------------- end exit block 0x2572b8 + js_UnboxDouble1 = js_UnboxDouble ( GetProperty_tn1 ) fadd1 = fadd $global0, js_UnboxDouble1 stq gp[664] = fadd1 ld4 = ld gp[680] add1 = add ld4, JSVAL_INT ov1 = ov add1 - xt4: xt ov1 -> 0:66 sp+0 rp+0 + xt3: xt ov1 -> 0:66 sp+0 rp+0 call js_UnboxDouble ebx(state) esi(gp) edi(sp) movq xmm0,-8(ebp) ebx(state) esi(gp) edi(sp) mov ecx,ebx ebx(state) esi(gp) edi(sp) xmm0($global0) @@ -297,11 +300,11 @@ jo 0x2b8fe8 ecx(state) ebx(add1) esi(gp) edi(sp) --------------------------------------- exit block (LIR_xt|LIR_xf) 0x2b8fe8: - mov eax,2454300 + mov eax,2454260 mov esp,ebp 0x2b8fef: jmp 0x256ff8 ---------------------------------------- end exit block 0x257364 +--------------------------------------- end exit block 0x25733c st gp[680] = add1 sti sp[0] = add1 sti sp[8] = 10 @@ -314,11 +317,11 @@ jnl 0x2b8ff4 ecx(state) ebx(add1) esi(gp) --------------------------------------- exit block (LIR_xt|LIR_xf) 0x2b8ff4: - mov eax,2454416 + mov eax,2454376 mov esp,ebp 0x2b8ffb: jmp 0x256ff8 ---------------------------------------- end exit block 0x2573d8 +--------------------------------------- end exit block 0x2573b0 st gp[680] = add1 JSVAL_INT loop @@ -341,7 +344,7 @@ entering trace at x.js:6@51, native stack slots: 4 code: 0x256ef0 global: double<15.9> object<0x292100:Object> string<0x294650> int<2> stack: -leaving trace at x.js:5@75, op=lt, lr=0x25739c, exitType=1, sp=2, calldepth=0, cycles=93900 +leaving trace at x.js:5@75, op=lt, lr=0x257374, exitType=1, sp=2, calldepth=0, cycles=146575 double<58.3> object<0x292100:Object> string<0x294650> int<10> stack0=int<10> stack1=int<10> recorder: started(1), aborted(0), completed(1), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(0)
(In reply to comment #11) > unpack-code seems to have lost a lot, fannkuch as well. unpack-code uses string > element lookup, fannkuch arrays. String element gets and dense array gets/sets should not have been affected. Investigating. > Is JSOP_CALLBUILTIN's LIR any less tight than the old way (which could > re-enter)? It would be helpful to see LIR or asm diffs. I don't know why this patch makes things slower, but N.B. bug 462027 will make it less tight in several ways. There will be a snapshot (LIR_xbarrier) before the call, which forces the function arguments to be written to the native stack. There will be an extra store before the call (cx->bailExit). The guard after the call will have an extra load in it (cx->builtinStatus). The snapshot for the guard after the call will force the return value to be written to the native stack. We *might* be able to recover the lost perf by throwing away this patch and instead just make js_Any_GetProp and friends more paranoid. Let's look closer first...
SunSpider tells me: fannkuch: ?? 67.1ms +/- 2.0% 67.2ms +/- 2.3% not conclusive: might be *1.001x as slow* unpack-code: ?? 181.3ms +/- 2.0% 184.4ms +/- 1.5% not conclusive: might be *1.017x as slow* When I run fannkuch with breakpoints on all the new call_imacro paths, none of them hit.
A full run, after rebooting. (There is no reason for anything to be faster, though, so these numbers are at least questionable.) TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.005x as fast 2354.7ms +/- 0.2% 2342.4ms +/- 0.2% significant ============================================================================= 3d: 1.008x as fast 298.8ms +/- 0.3% 296.4ms +/- 0.4% significant cube: 1.015x as fast 102.4ms +/- 0.4% 100.9ms +/- 0.5% significant morph: ?? 94.9ms +/- 0.2% 95.2ms +/- 0.5% not conclusive: might be *1.003x as slow* raytrace: 1.012x as fast 101.5ms +/- 0.4% 100.3ms +/- 0.5% significant access: *1.009x as slow* 372.0ms +/- 0.3% 375.4ms +/- 0.4% significant binary-trees: 1.009x as fast 44.9ms +/- 0.5% 44.5ms +/- 0.8% significant fannkuch: *1.022x as slow* 139.6ms +/- 0.4% 142.7ms +/- 0.5% significant nbody: ?? 139.4ms +/- 0.4% 140.1ms +/- 0.3% not conclusive: might be *1.005x as slow* nsieve: - 48.1ms +/- 0.5% 48.1ms +/- 0.5% bitops: 1.022x as fast 259.5ms +/- 0.3% 253.9ms +/- 0.3% significant 3bit-bits-in-byte: - 42.6ms +/- 0.9% 42.5ms +/- 1.2% bits-in-byte: 1.082x as fast 75.3ms +/- 0.5% 69.6ms +/- 0.7% significant bitwise-and: - 62.0ms +/- 0.5% 62.0ms +/- 0.0% nsieve-bits: ?? 79.6ms +/- 0.5% 79.8ms +/- 0.4% not conclusive: might be *1.003x as slow* controlflow: 1.022x as fast 37.7ms +/- 0.9% 36.9ms +/- 0.6% significant recursive: 1.022x as fast 37.7ms +/- 0.9% 36.9ms +/- 0.6% significant crypto: 1.010x as fast 156.2ms +/- 0.5% 154.6ms +/- 0.4% significant aes: - 62.6ms +/- 0.6% 62.5ms +/- 0.6% md5: - 46.3ms +/- 0.7% 46.0ms +/- 1.0% sha1: 1.026x as fast 47.3ms +/- 0.7% 46.1ms +/- 0.5% significant date: 1.009x as fast 205.1ms +/- 0.4% 203.2ms +/- 0.8% significant format-tofte: ?? 79.6ms +/- 0.6% 79.8ms +/- 0.7% not conclusive: might be *1.003x as slow* format-xparb: 1.017x as fast 125.5ms +/- 0.5% 123.4ms +/- 1.0% significant math: 1.008x as fast 266.3ms +/- 0.3% 264.3ms +/- 0.5% significant cordic: 1.026x as fast 112.2ms +/- 0.4% 109.4ms +/- 0.5% significant partial-sums: ?? 101.3ms +/- 0.5% 101.6ms +/- 0.8% not conclusive: might be *1.003x as slow* spectral-norm: ?? 52.8ms +/- 0.9% 53.3ms +/- 0.6% not conclusive: might be *1.009x as slow* regexp: - 226.4ms +/- 0.2% 225.9ms +/- 0.5% dna: - 226.4ms +/- 0.2% 225.9ms +/- 0.5% string: - 532.7ms +/- 0.4% 531.8ms +/- 0.4% base64: ?? 48.9ms +/- 0.8% 49.4ms +/- 0.7% not conclusive: might be *1.010x as slow* fasta: - 125.7ms +/- 0.8% 125.4ms +/- 0.5% tagcloud: - 123.6ms +/- 0.4% 123.3ms +/- 0.5% unpack-code: 1.005x as fast 168.9ms +/- 0.4% 168.0ms +/- 0.4% significant validate-input: ?? 65.6ms +/- 0.8% 65.7ms +/- 0.7% not conclusive: might be *1.002x as slow*
Yeah, I captured the spew for fannkuch before and after this patch, and did perl -p -i.orig -e 's/0x[0-9a-f]+/0xnnnnnnnn/g' fannkuch-[ab].log perl -p -i.bak -e 's/cycles=([0-9]+)/cycles=nnnnnn/g' fannkuch-[ab].log perl -p -i.bak -e 's/,(..)?[a-f0-9]{6,}/,\1nnnnnn/g' fannkuch-[ab].log This produced two identical files. The new stuff isn't even called.
I just love benchmarks ... I will rerun again. Might be architecture-specific. My machine is a lot faster than yours (2x), so we must have different CPUs.
Moved perf discussion to this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=475998
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Flags: blocking1.9.1+
js1_8_1/trace/trace-test.js v 1.9.1, 1.9.2
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: