Closed Bug 536458 Opened 16 years ago Closed 16 years ago

Too many charCodeAt builtin calls when using bitops and arithmetic

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 551039

People

(Reporter: bzbarsky, Assigned: gal)

References

Details

(Keywords: perf)

Attachments

(3 files)

Attached file shell testcase
BUILD: tracemonkey js shell built from rev 9d0fe1237c26 STEPS TO REPRODUCE: 1) Add printfs to js_String_p_charCodeAt_int and js_String_p_charCodeAt so you can see when they're called (and in particular so you can tell which one is called). 2) Run the attached shell testcase with jit on EXPECTED RESULTS: 3 calls to js_String_p_charCodeAt_int ACTUAL RESULTS: 2 calls to js_String_p_charCodeAt_int and 2 calls to js_String_p_charCodeAt If you increase the iteration count, both numbers increase with it. It looks like we're calling _both_ builtins for each charCodeAt call. The "+1" is key to get this behavior; if I take it out, I just get calls to the int builtin.
Blocks: 534364
blocking2.0: --- → ?
Keywords: perf
And in particular, if I take out the +1, I only get 2 calls to js_String_p_charCodeAt_int.
I guess only 2 calls are expected, since we seem to now trace on loop #3 and run on loop #4, instead of 2 and 3 respectively...
Ignore comment 3; one iteration is effectively "lost" due to some global shape evolution stuff. So we still only expect 2 calls, instead of 2 and 2.
OK, a TMFLAGS=liveness log shows that with the +1 the LIR after liveness analysis has: js_String_p_charCodeAt1 = fcall #js_String_p_charCodeAt ( $global0 $global1 ) $global0 state sp eos ld2 js_String_p_charCodeAt1 stqi sp[0] = js_String_p_charCodeAt1 $global0 state sp eos ld2 js_String_p_charCodeAt_int1 = icall #js_String_p_charCodeAt_int ( $global0 ld2 ) state sp js_String_p_charCodeAt_int1 eos ld2 sti sp[0] = js_String_p_charCodeAt_int1 As in, the liveness analysis fails to DCE the non-int call.
I should note that I don't quite see where we treat CSE-able calls as not liveness roots in the first place...
OK. So what happens in the "working" case is that the stqi to sp[0] is filtered out by StackFilter because d <= top (both are 0, in fact). In the "non-working" case, d < top, and in particular top - d == 4. And we have bit 4 set due to the sti from js_String_p_charCodeAt_int1, but we do NOT have bit 3 set (because the js_String_p_charCodeAt_int1 only writes 4 bytes to the stack). So the StackFilter leaves the stqi, and then liveness treats it as a statement and entrains the js_String_p_charCodeAt1.
And the reason top > d in that case is that with the + 1 there we have: 0x8137c8 and1 = and js_String_p_charCodeAt_int_int1, 255 state sp and1 eos ld2 0x8137e0 sti sp[0] = and1 state sp and1 eos ld2 0x813804 sti sp[8] = 1 state sp and1 eos ld2 0x813810 add1 = add and1, 1 add1 state sp eos ld2 0x813818 ov1 = ov add1 add1 ov1 state sp eos ld2 0x813824 xt1: xt ov1 -> pc=0x40e690 imacpc=0x0 sp+16 rp+0 (GuardID=004) and the StackFilter looks at stack-top whenever it sees a guard (so at that xt guard). And at that point stack-top is in fact at 16 (which we shift right by 2 to get the 4 we're seeing). In other words, our attempt to do integer addition here is actually screwing us over in terms of the builtins we call.
If I hack StackFilter to assume all stores are effectively 4-byte stores, the attached HTML testcase gets much faster and sunspider gets an 8% speedup in browser (though only about 1% in shell, for reasons I don't understand).
bz, this is huge (well, in-browser SS speedup -- more info craved on diff there vs. shell). Cc'ing NJ peeps. /be
bz, let me see if I understand correctly. You have a sequence like this: stqi sp[0] = x ... sti sp[0] = y Ie. the sti clobbers half of the value written by the stqi. And you'd like the stqi to be removed, in this case because computing x is expensive but x isn't used subsequently. I can see why you'd want that, but what if the instruction immediately after the sti was 'z = ld sp[4]'?
Nicholas, that's exactly the sequence I have, and that's exactly what I'd like to happen. I understand why StackFilter currently doesn't do it, but if I understand correctly, in our actual usage of nanojit all stack access is 8-byte aligned (perhaps modulo function calls?), so the next instruction cannot be |ld sp[4]|. I might be understanding incorrectly, of course...
(In reply to comment #13) > if I > understand correctly, in our actual usage of nanojit all stack access is 8-byte > aligned (perhaps modulo function calls?), so the next instruction cannot be |ld > sp[4]|. I don't know if that's true, but I imagine that the optimisation would be safe w.r.t. how TM uses the stack. But expecting NJ to know about TM's use of the stack doesn't seem right. Maybe the bytecode-to-LIR translation should be better.
I just reran sunspider in browser a few times with and without a hacked StackFilter, and the 8% might have been a fluke. Even with the hack to StackFilter I have never gotten a run that fast again.... That makes me a little happier at least in terms of us measuring shell sunspider as an ok proxy for browser. The 1% improvement in shell might be real, though.
I suggest waiting until another commit has gone in, then time the shell again. That helps separate noise from real changes.
Assignee: general → gal
(In reply to comment #12) > bz, let me see if I understand correctly. You have a sequence like this: > > stqi sp[0] = x > ... > sti sp[0] = y Thinking about this more, this is a very strange sequence.
Strange in what sense?
Why does sp[0] have two different types in quick succession? (The stqi is now stfi, btw.) I guess the float value is stored and then is dead? In which case it seems surprising that it was stored. Maybe it's an artifact of the stack machine. I looked at the bytecode but had some trouble understanding how it got mapped to the LIR.
> Why does sp[0] have two different types in quick succession? Because of the code landed in bug 534364 (which this bug is a regression from; see dependencies), which basically ends up with us making two insCall calls in a row, one for a function returning a double and one for a function returning an int.
Aha, so it is strange code! Is there a plan to avoid the two calls at the jstracer level?
blocking2.0: ? → -
I have no idea. Ask Andreas or dvander? That would fix this bug as filed, of course.
I've seen this show up in a few more testcases out on the web.
blocking2.0: - → ?
Marking this as a dup of 551039 because both bugs have the same root cause -- shortcomings in StackFilter.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE
blocking2.0: ? → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: