Everywhere we're dominated by calls, we're pretty slow. We can do better.

SunSpider calls are 43% fast natives and 56% scripted. v8 calls are 4% fast natives and 96% scripted.

I'm separating this bug into three patches:
1) Inline the return path as much as possible, only calling a stub if we need to put call or args objs.
2) Inline the call path for fast natives.
3) Inline the call path for scripted functions.

The last part will be hardest. I would like to move frame construction, as much as possible, into the prologue of methods. This way the actual call is very cheap, and entering the method sets up the frame.
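To make the "cheap call, prologue sets up the frame" idea concrete, here is a minimal C++ sketch. It is a toy model only: ToyFrame, ToyScript, toyCall and toyPrologue are hypothetical names, and the layout has nothing to do with the real JSStackFrame. The point is just that the call site writes a couple of words, while everything that depends on the callee's script is deferred to the callee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified frame. Not the real JSStackFrame layout.
struct ToyFrame {
    ToyFrame* prev = nullptr;          // caller's frame
    const void* returnAddr = nullptr;  // where to resume in the caller
    std::vector<int> locals;           // stands in for vars/slots
    bool hasCallObj = false;           // stands in for call object creation
    bool prologueDone = false;
};

// Hypothetical per-script facts known when the callee is compiled.
struct ToyScript {
    std::size_t nlocals;
    bool needsCallObj;
};

// Call site: only link the new frame to the caller. This is the part
// that would be emitted inline at every call.
inline void toyCall(ToyFrame& callee, ToyFrame* caller, const void* returnAddr) {
    callee.prev = caller;
    callee.returnAddr = returnAddr;
}

// Callee prologue: script-specific setup runs once per entry, in code
// that belongs to the callee rather than to each caller.
inline void toyPrologue(ToyFrame& fp, const ToyScript& script) {
    fp.locals.assign(script.nlocals, 0);  // initialize vars
    fp.hasCallObj = script.needsCallObj;  // create call object only if needed
    fp.prologueDone = true;
}
```

In this model the caller never needs to know how many locals the callee has or whether it needs a call object, which is what lets the call sequence stay short.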
Part 1:
http://hg.mozilla.org/users/danderson_mozilla.com/moo/rev/2d3fedb92d35

This completely inlines scripted returns. It also eliminates the expensive frame-syncing, which is now done out-of-line and only if a call object is needed. It also avoids |sp[-1] = oldfp->rval|, instead loading the rval into a jsval register pair.

Future work on Part 1 remains after the rest is done: we can avoid storing into fp->rval most of the time. We can also avoid the primitive-this-rval test on every return, instead placing it into JSOP_NEW.

Part 1 is a 2% (21ms) SS win and an 8% (1288ms) v8 win. Part 2 should come tomorrow, part 3 by the end of the week.
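The shape of that return path can be sketched as a toy C++ model. All names here (ReturnValue, ReturnFrame, toyPutActivationObjects, toyReturn) are hypothetical, and the real patch emits machine code rather than calling C++ functions. The sketch only shows the control flow: the common case pops the frame and hands back the value as a tag/payload pair, and the out-of-line helper runs only when the frame owns a call or args object.

```cpp
#include <cassert>
#include <cstdint>

// Stands in for the jsval register pair (type tag + payload).
struct ReturnValue {
    std::uint32_t tag;
    std::uint32_t payload;
};

// Hypothetical, simplified frame.
struct ReturnFrame {
    ReturnFrame* prev;
    ReturnValue rval;
    bool hasCallObj;
    bool hasArgsObj;
};

// Counts slow-path entries so the example can be checked.
static int slowReturnCount = 0;

// Out-of-line slow path: stands in for syncing the frame and putting
// the call/args objects. Only reached when one of them exists.
void toyPutActivationObjects(ReturnFrame& fp) {
    ++slowReturnCount;
    fp.hasCallObj = false;
    fp.hasArgsObj = false;
}

// Inline fast path: one test, then pop the frame and return the value
// by value instead of storing it into the caller's stack slot.
ReturnValue toyReturn(ReturnFrame*& current) {
    ReturnFrame& fp = *current;
    if (fp.hasCallObj || fp.hasArgsObj)
        toyPutActivationObjects(fp);
    ReturnValue result = fp.rval;
    current = fp.prev;
    return result;
}
```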
Part 2:
http://hg.mozilla.org/users/danderson_mozilla.com/moo/rev/7f9415198722

This was a slight loss on the graphs, but a slight win locally. Most of it is preparatory work for part 3, so this path will become faster.
Precursor work to part 3:

Special stub for already-compiled calls:
http://hg.mozilla.org/projects/jaegermonkey/rev/c8f3c19d3b0f

Lower var init and call obj creation into script prologues:
http://hg.mozilla.org/projects/jaegermonkey/rev/da23e3e77a69

I'm going to stop here for the time being. The path forward is some sort of MIC at callsites, but initializing 19 members in JSStackFrame, some of which need multiple loads and stores, in addition to updating |cx->display| (two loads, two stores), is nigh-intolerable.

The real win will be getting the JIT to only update a few words to make a call, like v8/WebKit, and changing the runtime to not need all of the information currently in each frame. We'll get there, but there are bigger wins right now.
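For readers unfamiliar with the idea, a call-site inline cache can be sketched as below, reading "MIC" as a monomorphic inline cache (an assumption about the abbreviation). This is a generic C++ illustration of the technique, not the JaegerMonkey design: ToyFunction, ToyCallSite and toyCallThroughCache are hypothetical names, and a real JIT would patch the guard and jump target into generated code instead of keeping them in a struct.

```cpp
#include <cassert>

// A compiled entry point, modeled as a plain function pointer.
using ToyEntry = int (*)(int);

// Hypothetical function object with an already-compiled entry.
struct ToyFunction {
    ToyEntry compiledEntry;
};

// Per-call-site cache: remembers the single callee last seen here.
struct ToyCallSite {
    const ToyFunction* cachedCallee = nullptr;
    ToyEntry cachedEntry = nullptr;
    int misses = 0;  // counts trips through the slow path
};

// Guard on callee identity. On a hit, jump straight to the cached
// entry; on a miss, take the slow path and refill the cache.
int toyCallThroughCache(ToyCallSite& site, const ToyFunction& callee, int arg) {
    if (site.cachedCallee != &callee) {
        ++site.misses;  // slow path: generic lookup would happen here
        site.cachedCallee = &callee;
        site.cachedEntry = callee.compiledEntry;
    }
    return site.cachedEntry(arg);
}

// Two sample callees for the usage below.
int toyDouble(int x) { return 2 * x; }
int toyNegate(int x) { return -x; }
```

Calling the same function repeatedly from one site misses once and then hits; switching callees at that site costs another miss. Note that the cache only removes the lookup cost, so the frame setup cost described above still has to be paid on every call.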
> We'll get there, but there are bigger wins right now.

Is that still true? I found calls (i.e. stubs::Call()) to be a huge part of several SS benchmarks that Sayre asked me to analyze:

                        % of time (shark)   % of instrs (cachegrind)
- crypto-sha1           13.7%               21.7%
- crypto-md5            12.2%               20.1%
- math-spectral-norm    12.9%               19.7%

and my understanding was that calls were already known to be a big part of several other tests.
No, it's not true. See ongoing JSStackFrame evisceration (bug 557378) and the start of call IC work (bug 578912 and stuff hanging off it).
Created attachment 467978
Count of scripted calls in each SunSpider test

Just a little data. We make 1.26M scripted calls total.