Closed
Bug 602129
Opened 14 years ago
Closed 14 years ago
JM: make f.call(...) fast
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: bzbarsky, Assigned: luke)
References
Details
Attachments
(3 files, 3 obsolete files)
391 bytes,
text/plain
|
Details | |
22.74 KB,
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
30.79 KB,
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
When running the attached testcase I get these numbers: bin% ./js ~/bar.js 801: control (no call) 1771: call([object global]) 1632: call(null) 1633: call(undefined) bin% ./js -m ~/bar.js 130: control (no call) 904: call([object global]) 934: call(null) 946: call(undefined) bin% ./js -j ~/bar.js 89: control (no call) 41: call([object global]) 41: call(null) 1619: call(undefined) bin% ./js -m -j ~/bar.js 97: control (no call) 50: call([object global]) 47: call(null) 921: call(undefined) So it looks like call() in JM is about 2x faster than in interp, but still a good bit slower than an actual function call (slower than interp, comparatively), and a lot slower than call when it gets traced (my local tree is slightly hacked so null thisp will trace for call). For comparison, here is v8 opt shell: v8-read-only% ./shell ~/bar.js 85: control (no call) 151: call([object global]) 147: call(null) 160: call(undefined)so we're pretty close to them in JM for the straight function call, but a lot slower for call(
Comment 1•14 years ago
|
||
We can special-case this in the native call MIC, afaik. Might also help v8-deltablue a bit, bhackett found out that it calls Function.prototype.call 125k times (bug 595884 comment 13)
Assignee | ||
Comment 2•14 years ago
|
||
(In reply to comment #1) Shark shows that that (with shell -m) we spend 10.9ms in js::Invoke 10.4ms in js_fun_call, out of 695ms in v8-deltablue. Incidentally, David and I were talking about this yesterday, and the neat thing about expressions like foo.call(this, arg1, arg2) is that, at the call site of 'call', the stack is: [Function.prototype.call, foo, this, arg1, arg2] and the desired call stack to call 'foo' is: [foo, this, arg1, arg2] So we should be able to get a f.call() to cost only a tiny bit more than f().
Reporter | ||
Comment 3•14 years ago
|
||
Hmm. Can just somehow fast-path the case when f.call happens to be a function backed by js_fun_call or whatnot? So guard on that and take the fast-path if so, else fall back to whatever we're doing now?
Assignee | ||
Comment 4•14 years ago
|
||
(In reply to comment #3) That's what we were thinking. It would basically be a PIC + Call IC mash-up.
Updated•14 years ago
|
Blocks: JaegerSpeed
Assignee | ||
Comment 5•14 years ago
|
||
The first step is to distinguish calls to 'apply' from calls to 'call'. The easiest way to do this is to just split JSOP_APPLY into two opcodes.
Assignee: general → lw
Status: NEW → ASSIGNED
Assignee | ||
Comment 6•14 years ago
|
||
This patch is a first stab at what was discussed above and hardly tested. It basically just adds a guard that f.call is actually js_fun_call and, if this passes, proceeds to do a normal call ic to call the 'this' parameter of js_fun_call. Preliminary results show a 10x (-m) speedup of function f(x) { return x+1 } f.call(null, 3) which puts us on par with jsc. This patch could even go a bit faster by comparing function-object equality instead of native equality.
Assignee | ||
Updated•14 years ago
|
Summary: JM: f.call seems to be slow compared to TM and V8 → JM: make f.call(...) fast
Assignee | ||
Comment 8•14 years ago
|
||
Green on try. SS unaffected. deltablue: 1.058x as fast 436.3ms +/- 0.3% 412.4ms +/- 1.4%
Attachment #483714 -
Attachment is obsolete: true
Attachment #484024 -
Attachment is obsolete: true
Attachment #484349 -
Flags: review?(dvander)
Assignee | ||
Updated•14 years ago
|
Attachment #483713 -
Flags: review?(dvander)
Assignee | ||
Comment 9•14 years ago
|
||
Refactored to fit in with the rest of the patch queue. Still green on try.
Attachment #484349 -
Attachment is obsolete: true
Attachment #486276 -
Flags: review?(dvander)
Attachment #484349 -
Flags: review?(dvander)
Assignee | ||
Comment 10•14 years ago
|
||
I implemented the TODO: // TODO: for compileAndGo scripts, we can just guard on callee JSObject* I was unable to write a micro-benchmark that showed a speedup. This makes sense since the two extra loads are not lengthening the critical path for an OoO cpu.
Updated•14 years ago
|
Attachment #483713 -
Flags: review?(dvander) → review+
Comment on attachment 486276 [details] [diff] [review] patch Nice patch, only have tiny nits. >+ explicit Address() {} >+ THANK YOU. >+void >+mjit::Compiler::checkCallSpeculation(uint32 argc, >+ FrameEntry *origCallee, FrameEntry *origThis, >+ MaybeRegisterID calleeType, RegisterID calleeData, Although it's longer, parameters 3,4 make more sense prefixed by "orig". >+ Jump isNative = masm.branchPtr(Assembler::NotEqual, >+ Address(calleeData, JSFunction::offsetOfNativeOrScript()), >+ ImmPtr(JS_FUNC_TO_DATA_PTR(void *, js_fun_call))); I would static assert that the layouts are compatible, here. >+ uncachedCallPatch->ool = true; >+ uncachedCallPatch->hasSlowNcode = false; It's clearer to make ool mean "does not have fastNcodePatch", and then ... >+ stubcc.masm.loadPtr(FrameAddress(offsetof(VMFrame, regs.fp)), JSFrameReg); >+ Address ncodeAddr(JSFrameReg, JSStackFrame::offsetOfncode()); >+ uncachedCallPatch->fastNcodePatch = stubcc.masm.storePtrWithPatch(ImmPtr(NULL), ncodeAddr); ... set slowNcodePatch instead. >+ /* Cannot rejoin: there is more fast path to generate in inlineCallHelper. */ This is a little cryptic. Maybe "Load the return address from vp[0], then jump. inlineCallHelper() will manually link this jump to rejoin into the fast path, since IC slow path expects the return value to be in vp[1]." >+bool >+mjit::Compiler::lowerableFunCall(jsbytecode *pc) Unless this gets enhanced in the rest of the patch queue, might be better as: static inline bool IsLowerableFunCall? > emitUncachedCall(argc, callingNew); > return; >+#ifdef JS_MONOIC > } > >-#ifdef JS_MONOIC Rebase misfire? Good idea to build with --disable-monoic and run jitflags=m (it can catch errors after changing ICs, and also catch ARM regressions). >- Registers tempRegs; >+ CallGenInfo callIC(PC); >+ CallPatchInfo callPatch; >+ MaybeRegisterID icCalleeType; >+ RegisterID icCalleeData; >+ Address icRvalAddr; >+ >+ /* These locals are only initialized if lowerFunCall. */ >+ Jump uncachedCallSlowRejoin; >+ CallPatchInfo uncachedCallPatch; A very brief comment to the right of each of these would be more readable than the block comments above. >+ if (lowerFunCall) >+ stubcc.crossJump(uncachedCallSlowRejoin, masm.label()); "The mispredicted .call slow path loaded rval already. Join it back to the fast-path now."
Attachment #486276 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 12•14 years ago
|
||
(In reply to comment #11) Good comments, thanks > I would static assert that the layouts are compatible, here. Done in offsetOfNativeOrScript(). > > emitUncachedCall(argc, callingNew); > > return; > >+#ifdef JS_MONOIC > > } > > > >-#ifdef JS_MONOIC > > Rebase misfire? Good idea to build with --disable-monoic and run jitflags=m (it > can catch errors after changing ICs, and also catch ARM regressions). Thanks, will do. This, though, is not a rebase error. If you look further down, this cuts out the #else case in #if JS_MONOIC. That is, it unifies the code run when !defined(JS_MONOIC) with the code when defined(JS_MONOIC) && debugMode.
Assignee | ||
Comment 13•14 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/2f3a0ac5e251 http://hg.mozilla.org/tracemonkey/rev/d1bf74046ba7
Comment 14•14 years ago
|
||
Landed last month: http://hg.mozilla.org/mozilla-central/rev/2f3a0ac5e251 http://hg.mozilla.org/mozilla-central/rev/d1bf74046ba7
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•