Closed Bug 572275 (JaegerCalls) Opened 12 years ago Closed 11 years ago

JM: Make calls fast

(Core :: JavaScript Engine, defect)

(Reporter: dvander, Assigned: dvander)

(1 file)

Wherever we're dominated by calls, we're pretty slow. We can do better.

SunSpider calls are 43% fastnatives, and 56% scripted.
v8 calls are 4% fastnatives, and 96% scripted.

I'm separating this bug into three patches:
 1) Inline the return path as much as possible, and only make a stub call if we need to put call or arguments objects.
 2) Inline the call path for fast natives.
 3) Inline the call path for scripted functions.

The last part will be hardest. I would like to move frame construction - as much as possible - into the prologue of methods. This way the actual call is very cheap, and entering the method sets up the frame.
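Here is a minimal sketch of that split, assuming a toy frame layout. The call site writes only the few fields the callee cannot know by itself (script, previous frame, argc). The callee's prologue does the rest, including the var initialization and call-object creation that the later precursor patch lowers into script prologues. `Value`, `Script`, `Frame`, `callSite`, and `prologue` are hypothetical names for illustration. This is not SpiderMonkey's `JSStackFrame`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model; not SpiderMonkey's JSStackFrame.
struct Value {
    bool isUndefined = true;  // locals start out as |undefined|
    int64_t bits = 0;
};

struct Script {
    size_t nfixed;       // number of local variable slots the callee needs
    bool needsCallObj;   // whether the callee needs a call object
};

struct Frame {
    // Written by the caller at the call site: the "few words".
    const Script* script = nullptr;
    Frame* prev = nullptr;
    uint32_t argc = 0;
    // Filled in by the callee's own prologue.
    std::vector<Value> locals;
    bool hasCallObj = false;
};

// Call site: store only what the callee cannot know by itself.
void callSite(Frame& callee, Frame* caller, const Script* script, uint32_t argc) {
    callee.script = script;
    callee.prev = caller;
    callee.argc = argc;
}

// Callee prologue: the callee knows its own layout, so it sets up the rest.
void prologue(Frame& fp) {
    assert(fp.script != nullptr);                  // the call site must have run first
    fp.locals.assign(fp.script->nfixed, Value{});  // var init lowered into the prologue
    fp.hasCallObj = fp.script->needsCallObj;       // call-object creation lowered too
}
```

The point of this arrangement is that the prologue is specialized per function. Work like initializing locals is emitted once per callee as straight-line code instead of being repeated at every call site.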
Assignee: general → dvander
Part 1:

This completely inlines scripted returns. It also eliminates the expensive frame-syncing, which is now done out-of-line and only if a call object is needed. It also avoids |sp[-1] = oldfp->rval|, instead loading the rval into a jsval register pair.
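Here is a minimal sketch of that return path, assuming a toy frame with a few flags. The common case hands the return value straight back, standing in for the jsval register pair. It falls back to an out-of-line stub only when a call or arguments object has to be put. The constructor check models the primitive-this-rval test on constructing returns: if a constructor returns a primitive, `new` yields `this` instead. `Value`, `Frame`, `inlineReturn`, `putActivationObjects`, and `g_stubCalls` are hypothetical names, not JaegerMonkey's.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model of an inlined return path; not JaegerMonkey's code.
struct Value {
    int64_t bits = 0;
    bool isObject = false;
};

struct Frame {
    bool hasCallObj = false;    // call object must be put (synced) on exit
    bool hasArgsObj = false;    // arguments object must be put on exit
    bool constructing = false;  // frame was entered via JSOP_NEW
    Value thisv;
    Value rval;
};

// Counts how often the slow, out-of-line stub is taken.
static int g_stubCalls = 0;

// Out-of-line slow path: stands in for syncing the frame into its heap objects.
void putActivationObjects(Frame& fp) {
    assert(fp.hasCallObj || fp.hasArgsObj);  // only reached when something needs putting
    ++g_stubCalls;
    fp.hasCallObj = false;
    fp.hasArgsObj = false;
}

// Inline return: the result is returned by value (as if in a register pair)
// instead of being stored into the caller's stack slot.
Value inlineReturn(Frame& fp) {
    if (fp.hasCallObj || fp.hasArgsObj)
        putActivationObjects(fp);   // rare case: call the stub
    if (fp.constructing && !fp.rval.isObject)
        return fp.thisv;            // primitive-this-rval test, done on every return here
    return fp.rval;
}
```

In this sketch, a plain function without call or arguments objects never touches the stub. The constructor check still runs on every return, which is what the follow-up work proposes moving into JSOP_NEW.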

Future work on Part 1 remains after the rest is done: we can avoid storing into fp->rval most of the time. We can also avoid the primitive-this-rval test on every return, instead placing it into JSOP_NEW.

Part 1 is a 2% (21ms) SS win and an 8% (1288ms) v8 win. Part 2 should come tomorrow, part 3 by the end of the week.
Part 2:

This was a slight loss on the graphs, but a slight win locally. Most of it is preparatory work for part 3, which will make this path faster.
Precursor work to part 3:

Special stub for already-compiled calls:

Lower var init and call obj creation into script prologues:

I'm going to stop here for the time being. The path forward is some sort of MIC (monomorphic inline cache) at call sites, but initializing 19 members in JSStackFrame, some of which need multiple loads and stores, in addition to updating |cx->display| (two loads, two stores), is nigh-intolerable.

The real win will be getting the JIT to only update a few words to make a call, like v8/WebKit, and changing the runtime to not need all of the information currently in each frame. We'll get there, but there are bigger wins right now.
Blocks: 577036
Blocks: 576688
Alias: JaegerCalls
> We'll get there, but there are bigger wins right now.

Is that still true?  I found calls (ie. stubs::Call()) to be a huge part of several SS benchmarks that Sayre asked me to analyze:

                        % of time (shark)      % of instrs (cachegrind)
- crypto-sha1           13.7%                  21.7% 
- crypto-md5            12.2%                  20.1%
- math-spectral-norm    12.9%                  19.7%

and my understanding was that calls were already known to be a big part of several other tests.
No, it's not true. See ongoing JSStackFrame evisceration (bug 557378) and the start of call IC work (bug 578912 and stuff hanging off it).
Just a little data. We make 1.26M scripted calls total.
Closed: 11 years ago
Resolution: --- → FIXED