Closed
Bug 536458
Opened 16 years ago
Closed 16 years ago
Too many charCodeAt builtin calls when using bitops and arithmetic
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
DUPLICATE
of bug 551039
People
(Reporter: bzbarsky, Assigned: gal)
References
Details
(Keywords: perf)
Attachments
(3 files)
|
104 bytes,
text/plain
|
Details | |
|
630 bytes,
text/html
|
Details | |
|
768 bytes,
patch
|
Details | Diff | Splinter Review |
BUILD: tracemonkey js shell built from rev 9d0fe1237c26
STEPS TO REPRODUCE:
1) Add printfs to js_String_p_charCodeAt_int and js_String_p_charCodeAt so you
can see when they're called (and in particular so you can tell which one is
called).
2) Run the attached shell testcase with jit on
EXPECTED RESULTS: 3 calls to js_String_p_charCodeAt_int
ACTUAL RESULTS: 2 calls to js_String_p_charCodeAt_int and 2 calls to
js_String_p_charCodeAt
If you increase the iteration count, both numbers increase with it. It looks like we're calling _both_ builtins for each charCodeAt call. The "+1" is key to get this behavior; if I take it out, I just get calls to the int builtin.
| Reporter | ||
Comment 1•16 years ago
|
||
| Reporter | ||
Updated•16 years ago
|
| Reporter | ||
Comment 2•16 years ago
|
||
And in particular, if I take out the +1, I only get 2 calls to js_String_p_charCodeAt_int.
| Reporter | ||
Comment 3•16 years ago
|
||
I guess only 2 calls are expected, since we seem to now trace on loop #3 and run on loop #4, instead of 2 and 3 respectively...
| Reporter | ||
Comment 4•16 years ago
|
||
Ignore comment 3; one iteration is effectively "lost" due to some global shape evolution stuff. So we still only expect 2 calls, instead of 2 and 2.
| Reporter | ||
Comment 5•16 years ago
|
||
OK, a TMFLAGS=liveness log shows that with the +1 the LIR after liveness analysis has:
js_String_p_charCodeAt1 = fcall #js_String_p_charCodeAt ( $global0 $global1 )
$global0 state sp eos ld2 js_String_p_charCodeAt1
stqi sp[0] = js_String_p_charCodeAt1
$global0 state sp eos ld2
js_String_p_charCodeAt_int1 = icall #js_String_p_charCodeAt_int ( $global0 ld2 )
state sp js_String_p_charCodeAt_int1 eos ld2
sti sp[0] = js_String_p_charCodeAt_int1
As in, the liveness analysis fails to DCE the non-int call.
| Reporter | ||
Comment 6•16 years ago
|
||
I should note that I don't quite see where we treat CSE-able calls as not liveness roots in the first place...
| Reporter | ||
Comment 7•16 years ago
|
||
| Reporter | ||
Comment 8•16 years ago
|
||
OK. So what happens in the "working" case is that the stqi to sp[0] is filtered out by StackFilter because d <= top (both are 0, in fact).
In the "non-working" case, d < top, and in particular top - d == 4. And we have bit 4 set due to the sti from js_String_p_charCodeAt_int1, but we do NOT have bit 3 set (because the js_String_p_charCodeAt_int1 only writes 4 bytes to the stack). So the StackFilter leaves the stqi, and then liveness treats it as a statement and entrains the js_String_p_charCodeAt1.
| Reporter | ||
Comment 9•16 years ago
|
||
And the reason top > d in that case is that with the + 1 there we have:
0x8137c8 and1 = and js_String_p_charCodeAt_int_int1, 255
state sp and1 eos ld2
0x8137e0 sti sp[0] = and1 state sp and1 eos ld2
0x813804 sti sp[8] = 1 state sp and1 eos ld2
0x813810 add1 = add and1, 1 add1 state sp eos ld2
0x813818 ov1 = ov add1 add1 ov1 state sp eos ld2
0x813824 xt1: xt ov1 -> pc=0x40e690 imacpc=0x0 sp+16 rp+0 (GuardID=004)
and the StackFilter looks at stack-top whenever it sees a guard (so at that xt guard). And at that point stack-top is in fact at 16 (which we shift right by 2 to get the 4 we're seeing). In other words, our attempt to do integer addition here is actually screwing us over in terms of the builtins we call.
| Reporter | ||
Comment 10•16 years ago
|
||
If I hack StackFilter to assume all stores are effectively 4-byte stores, the attached HTML testcase gets much faster and sunspider gets an 8% speedup in browser (though only about 1% in shell, for reasons I don't understand).
Comment 11•16 years ago
|
||
bz, this is huge (well, in-browser SS speedup -- more info craved on diff there vs. shell). Cc'ing NJ peeps.
/be
Comment 12•16 years ago
|
||
bz, let me see if I understand correctly. You have a sequence like this:
stqi sp[0] = x
...
sti sp[0] = y
Ie. the sti clobbers half of the value written by the stqi. And you'd like the stqi to be removed, in this case because computing x is expensive but x isn't used subsequently. I can see why you'd want that, but what if the instruction immediately after the sti was 'z = ld sp[4]'?
| Reporter | ||
Comment 13•16 years ago
|
||
Nicholas, that's exactly the sequence I have, and that's exactly what I'd like to happen. I understand why StackFilter currently doesn't do it, but if I understand correctly, in our actual usage of nanojit all stack access is 8-byte aligned (perhaps modulo function calls?), so the next instruction cannot be |ld sp[4]|. I might be understanding incorrectly, of course...
Comment 14•16 years ago
|
||
(In reply to comment #13)
> if I
> understand correctly, in our actual usage of nanojit all stack access is 8-byte
> aligned (perhaps modulo function calls?), so the next instruction cannot be |ld
> sp[4]|.
I don't know if that's true, but I imagine that the optimisation would be safe w.r.t. how TM uses the stack. But expecting NJ to know about TM's use of the stack doesn't seem right. Maybe the bytecode-to-LIR translation should be better.
| Reporter | ||
Comment 15•16 years ago
|
||
I just reran sunspider in browser a few times with and without a hacked StackFilter, and the 8% might have been a fluke. Even with the hack to StackFilter I have never gotten a run that fast again.... That makes me a little happier at least in terms of us measuring shell sunspider as an ok proxy for browser.
The 1% improvement in shell might be real, though.
Comment 16•16 years ago
|
||
I suggest waiting until another commit has gone in, then time the shell again. That helps separate noise from real changes.
| Assignee | ||
Updated•16 years ago
|
Assignee: general → gal
Comment 17•16 years ago
|
||
(In reply to comment #12)
> bz, let me see if I understand correctly. You have a sequence like this:
>
> stqi sp[0] = x
> ...
> sti sp[0] = y
Thinking about this more, this is a very strange sequence.
| Reporter | ||
Comment 18•16 years ago
|
||
Strange in what sense?
Comment 19•16 years ago
|
||
Why does sp[0] have two different types in quick succession? (The stqi is now stfi, btw.) I guess the float value is stored and then is dead? In which case it seems surprising that it was stored. Maybe it's an artifact of the stack machine. I looked at the bytecode but had some trouble understanding how it got mapped to the LIR.
| Reporter | ||
Comment 20•16 years ago
|
||
> Why does sp[0] have two different types in quick succession?
Because of the code landed in bug 534364 (which this bug is a regression from; see dependencies), which basically ends up with us making two insCall calls in a row, one for a function returning a double and one for a function returning an int.
Comment 21•16 years ago
|
||
Aha, so it is strange code! Is there a plan to avoid the two calls at the jstracer level?
Updated•16 years ago
|
blocking2.0: ? → -
| Reporter | ||
Comment 22•16 years ago
|
||
I have no idea. Ask Andreas or dvander? That would fix this bug as filed, of course.
| Reporter | ||
Comment 23•16 years ago
|
||
I've seen this show up in a few more testcases out on the web.
blocking2.0: - → ?
Comment 24•16 years ago
|
||
Marking this as a dup of 551039 because both bugs have the same root cause -- shortcomings in StackFilter.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE
Updated•15 years ago
|
blocking2.0: ? → ---
You need to log in
before you can comment on or make changes to this bug.
Description
•