remove all JSStackFrame stores except fp->ncode_ for JM inline fast path




JavaScript Engine
7 years ago


(Reporter: luke, Assigned: luke)


Firefox Tracking Flags

(Not tracked)



(1 attachment)



7 years ago
Currently, on call paths, we set fp->flags_, fp->prev_, fp->, and fp->ncode_.  We should be able to get this down to only fp->ncode_:

For the inline call path, generate:

| fp->ncode_ = &fast | fp += X | call | fun | slow | fast: fp -= X |

(fp->ncode_ is the jit-code return address.)  Because we know the first op of 'slow' and 'fast', we should be able to make a simple predicate, fp->returnAddressIsFastPath(), using *fp->ncode_.  When this predicate is true, it implies the default values of fp->flags_ (JSFRAME_FUNCTION), fp->prev_ (using fp->ncode_ to find the X increment), and fp->fun_ (at a constant offset from fp->ncode_).  If we need to mutate fp->flags_, fp->ncode_ is set to &slow and the other fields are initialized.  "slow" would check hasCallObj/hasArgsObj (also taking these off the hot path).
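A toy model of the predicate and the lazily derived fields, as a rough sketch only.  Everything here is an assumption made for illustration: the opcode bytes, the flag values, the four-byte call-site layout, the use of a small function id in place of a real function pointer, and X being counted in Frame units rather than bytes.  None of it is taken from the engine or from the attached patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical values and layout, for illustration only.
enum : uint32_t { JSFRAME_FUNCTION = 0x1, JSFRAME_CONSTRUCTING = 0x2 };
enum : uint8_t { OP_SLOW = 0xA0, OP_FAST = 0xA1 };

// Byte offsets inside one toy call site: | fun | slow | fast | X |
enum { FUN_AT = 0, SLOW_AT = 1, FAST_AT = 2, X_AT = 3, SITE_SIZE = 4 };

struct Frame {
    uint32_t flags_;
    Frame* prev_;
    uint8_t fun_;           // toy function id instead of a real function pointer
    const uint8_t* ncode_;  // the only field the fast path stores

    bool returnAddressIsFastPath() const { return *ncode_ == OP_FAST; }

    uint32_t flags() const {
        return returnAddressIsFastPath() ? JSFRAME_FUNCTION : flags_;
    }
    uint8_t fun() const {
        // 'fun' sits at a constant offset before 'fast'.
        return returnAddressIsFastPath() ? ncode_[FUN_AT - FAST_AT] : fun_;
    }
    Frame* prev() {
        // The operand after 'fast' is the X that the caller added to fp.
        return returnAddressIsFastPath() ? this - ncode_[X_AT - FAST_AT] : prev_;
    }

    // Mutating flags moves the frame to the slow path and fills in the rest.
    void setFlag(uint32_t flag) {
        if (returnAddressIsFastPath()) {
            Frame* p = prev();
            uint8_t f = fun();
            flags_ = JSFRAME_FUNCTION;
            prev_ = p;
            fun_ = f;
            ncode_ += SLOW_AT - FAST_AT;  // now points at 'slow'
        }
        flags_ |= flag;
    }
};

// Builds one call site and a caller/callee pair, then checks both paths.
bool demo() {
    const uint8_t X = 2;  // frame-pointer increment, in Frame units here
    const uint8_t code[SITE_SIZE] = {7, OP_SLOW, OP_FAST, X};

    Frame frames[4] = {};
    Frame* caller = &frames[0];
    Frame* callee = caller + X;       // fp += X
    callee->ncode_ = &code[FAST_AT];  // fp->ncode_ = &fast, the single store

    bool ok = callee->returnAddressIsFastPath()
           && callee->flags() == JSFRAME_FUNCTION
           && callee->fun() == 7
           && callee->prev() == caller;

    callee->setFlag(JSFRAME_CONSTRUCTING);  // the VM touches the frame
    ok = ok && !callee->returnAddressIsFastPath()
            && callee->flags() == (JSFRAME_FUNCTION | JSFRAME_CONSTRUCTING)
            && callee->fun() == 7
            && callee->prev() == caller;
    return ok;
}
```

In this sketch the accessors give the same answers before and after setFlag(); only the place the values are read from changes, which is the property the proposal relies on.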

This is all based on the assumption that it is relatively rare to touch JSStackFrame from the VM.  That should be measured first, but so far the assumption has paid off.

Also, this may not be important with JM+TI+inlining, but this is also landable a lot earlier, so it might be worth the effort.

Comment 1

7 years ago
Regarding the question of how often this optimization would pay off, I hooked emitReturn to call a stub that measured what flags were set on the outgoing frame.  In theory, if fp->flags_ == JSFRAME_FUNCTION, then this optimization pays off.

On V8, there are 6.6M calls, 67% have flags_ == JSFRAME_FUNCTION.  Another 17% have flags_ == JSFRAME_FUNCTION | JSFRAME_CONSTRUCTING, so that seems to suggest further hackage to encode JSFRAME_CONSTRUCTING without using flags_.  A final 9% have JSFRAME_HAS_SCOPECHAIN with fp->scopeChain() == fp->callee().getParent().  So, all in all that's 93%.

For SS, there are 1M calls, 84% have flags_ == JSFRAME_FUNCTION, 4% have flags_ == JSFRAME_FUNCTION | JSFRAME_CONSTRUCTING, and 7% JSFRAME_HAS_SCOPECHAIN as above.  So that's also 93%.
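A tally of the kind described in this comment could look roughly like the sketch below.  The flag values, the FrameInfo struct, and the Tally type are all hypothetical; only the flag names and the three categories come from the comment.  The scope-chain category is assumed here to mean "the JSFRAME_HAS_SCOPECHAIN bit is set and the scope chain equals the callee's parent", which is one reading of the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values; only the names appear in the comment.
enum : uint32_t {
    JSFRAME_FUNCTION = 0x1,
    JSFRAME_CONSTRUCTING = 0x2,
    JSFRAME_HAS_SCOPECHAIN = 0x4,
};

// What the stub would look at on each outgoing frame.
struct FrameInfo {
    uint32_t flags;
    bool scopeChainIsCalleeParent;  // fp->scopeChain() == fp->callee().getParent()
};

struct Tally {
    uint64_t total = 0;
    uint64_t plain = 0;         // flags == JSFRAME_FUNCTION
    uint64_t constructing = 0;  // flags == JSFRAME_FUNCTION | JSFRAME_CONSTRUCTING
    uint64_t defaultScope = 0;  // has a scope chain equal to the callee's parent
    uint64_t other = 0;

    void record(const FrameInfo& fp) {
        ++total;
        if (fp.flags == JSFRAME_FUNCTION)
            ++plain;
        else if (fp.flags == (JSFRAME_FUNCTION | JSFRAME_CONSTRUCTING))
            ++constructing;
        else if ((fp.flags & JSFRAME_HAS_SCOPECHAIN) && fp.scopeChainIsCalleeParent)
            ++defaultScope;
        else
            ++other;
    }

    // Share of calls that fall into one of the three encodable categories.
    double coveredPercent() const {
        return total ? 100.0 * (plain + constructing + defaultScope) / total : 0.0;
    }
};
```

Calling record() once per return and printing the counters at exit would give the per-category breakdown; coveredPercent() corresponds to the "all in all" figure.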

Comment 2

7 years ago
Created attachment 526207

I'm still finishing work on a different bug, but I realized I can get a good measure of speedup with a 5-minute patch (attached).  It's completely bogus, but works just enough to test what should be the exact asm generated for the fastest inline call path.

For this call micro-bench:

  function f(x) { return x+1 }
  for (var i = 0; i < 10000000; ++i) {
    f(i); f(i); f(i); f(i); f(i); f(i); f(i);
  }

and using 'time' to measure, I see a 30% speedup (from .53s to .41s).  Curiously, the speedup is 20% for calling the empty function.  (For reference, my months-old d8 is about the speed of trunk and jsc is slower.)
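For a rough sense of scale, the timings above can be turned into a relative speedup and a per-call saving.  The helper names below are made up for this sketch; the inputs are the 0.53s and 0.41s wall times and the 70M calls (10,000,000 iterations of 7 calls each) from the benchmark.  Since 'time' also counts startup and loop overhead, the per-call number is only an estimate.

```cpp
#include <cassert>
#include <cmath>

// Relative speedup in percent when wall time drops from 'before' to 'after'.
double speedupPercent(double before, double after) {
    return (before / after - 1.0) * 100.0;
}

// Average time saved per call, in nanoseconds, over 'calls' calls.
double savedNsPerCall(double before, double after, double calls) {
    return (before - after) / calls * 1e9;
}
```

With these inputs, speedupPercent(0.53, 0.41) comes out a little over 29, in line with the quoted 30%, and savedNsPerCall(0.53, 0.41, 7e7) is about 1.7 ns per call.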

Comment 3

7 years ago
(Going to postpone any plans for this until after type inference lands)

Comment 4

7 years ago
IonMonkey should get this down to its logical minimum.  Until then, TI inlining is doing a nice job.
Last Resolved: 7 years ago
Resolution: --- → WONTFIX