Closed
Bug 1003149
Opened 10 years ago
Closed 10 years ago
BaselineCompiler: Compile JSOP_SPREAD* ops
Categories
(Core :: JavaScript Engine, defect)
Tracking
RESOLVED
FIXED
mozilla32
People
(Reporter: arai, Assigned: arai)
References
Details
Attachments
(2 files, 6 obsolete files)
12.61 KB, patch | jandem: review+ | Details | Diff | Splinter Review
24.55 KB, patch | jandem: review+ | Details | Diff | Splinter Review
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:29.0) Gecko/20100101 Firefox/29.0 (Beta/Release)
Build ID: 20140421221237

Steps to reproduce:

(Separated from bug 889456.) We need to compile JSOP_SPREADCALL, JSOP_SPREADNEW and JSOP_SPREADEVAL in the Baseline Compiler.
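For reference, the three ops correspond to the three spread call forms at the source level. A minimal sketch of their observable semantics, in modern terms (the Reflect equivalences are illustrative, not how the bytecode is implemented):

```javascript
// JSOP_SPREADCALL: f(1, 2, ...[3, 4, 5]) behaves like
// Reflect.apply(f, undefined, [1, 2, 3, 4, 5]) for a plain dense array.
function f(a, b, c, d, e) { return a + b + c + d + e; }
const callResult = f(1, 2, ...[3, 4, 5]);

// JSOP_SPREADNEW: new A(...args) behaves like Reflect.construct(A, args).
function A(a, b) { this.v = a + b; }
const newResult = new A(...[3, 4]).v;

// JSOP_SPREADEVAL: eval(...args) passes args[0] to eval.
const evalResult = eval(...["1 + 2"]);

console.log(callResult, newResult, evalResult); // 15 7 3
```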
Assignee
Updated•10 years ago
Assignee
Comment 1•10 years ago
Implemented JSOP_SPREAD* in the Baseline Compiler.

Part1: fallback stub (ICCall_Fallback)
Part2: optimized stubs (ICCall_Scripted and ICCall_Native)

Also re-enabled js/src/jit-test/tests/basic/spread-call-maxarg.js on ASAN and debug builds, because bug 919948 has already landed.

jstests and jit-test (--tbpl) pass on Mac OS X 10.9.2 64bit.

Performance comparison with the --baseline-eager flag is as follows:

1. SPREADCALL/CALL

SPREADCALL (scripted):
    function f(a, b, c, d, e) { return a + b + c + d + e; }
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        f(1, 2, ...[3, 4, 5]);
    print(elapsed() - s);
result: per-iter (total)
    original : 0.866686 (433342) : --
    part1    : 0.590597 (295298) : 46.7% faster
    part1+2  : 0.439997 (219998) : 97.0% faster

CALL (scripted):
    function f(a, b, c, d, e) { return a + b + c + d + e; }
    let s = elapsed();
    for (let i = 0; i < 500000000; i++)
        f(1, 2, 3, 4, 5);
    print(elapsed() - s);
result:
    original : 0.000282 (141087) : --
    part1    : 0.000282 (140887) : 0.1% faster
    part1+2  : 0.000282 (140970) : 0.1% faster

SPREADCALL (native):
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        Math.min(1, 2, ...[3, 4, 5]);
    print(elapsed() - s);
result:
    original : 0.831337 (415668) : --
    part1    : 0.540356 (270178) : 53.8% faster
    part1+2  : 0.455972 (227986) : 82.3% faster

CALL (native):
    let s = elapsed();
    for (let i = 0; i < 500000000; i++)
        Math.min(1, 2, 3, 4, 5);
    print(elapsed() - s);
result:
    original : 0.000283 (141476) : --
    part1    : 0.000282 (140910) : 0.4% faster
    part1+2  : 0.000282 (140967) : 0.4% faster

SPREAD only:
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        [1, 2, ...[3, 4, 5]];
    print(elapsed() - s);
result:
    original : 0.397787 (198893) : --
    part1    : 0.385565 (192782) : 3.2% faster
    part1+2  : 0.391209 (195604) : 1.7% faster

2. SPREADNEW/NEW

SPREADNEW (scripted):
    function A(a, b, c, d, e) { this.v = a + b + c + d + e; }
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        new A(1, 2, ...[3, 4, 5]);
    print(elapsed() - s);
result: per-iter (total)
    original : 1.055098 (527549) : --
    part1    : 0.805464 (402732) : 31.0% faster
    part1+2  : 0.563302 (281651) : 87.3% faster

NEW (scripted):
    function A(a, b, c, d, e) { this.v = a + b + c + d + e; }
    let s = elapsed();
    for (let i = 0; i < 50000000; i++)
        new A(1, 2, 3, 4, 5);
    print(elapsed() - s);
result:
    original : 0.005144 (257184) : --
    part1    : 0.005128 (256405) : 0.3% faster
    part1+2  : 0.005138 (256909) : 0.1% faster

SPREADNEW (native):
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        new Array(1, 2, ...[3, 4, 5]);
    print(elapsed() - s);
result:
    original : 0.899546 (449773) : --
    part1    : 0.753050 (376524) : 19.5% faster
    part1+2  : 0.553810 (276904) : 62.4% faster

NEW (native):
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        new Array(1, 2, 3, 4, 5);
    print(elapsed() - s);
result:
    original : 0.008495 (4247) : --
    part1    : 0.008479 (4239) : 0.2% faster
    part1+2  : 0.008464 (4231) : 0.4% faster

3. SPREADEVAL/EVAL

SPREADEVAL:
    let s = elapsed();
    for (let i = 0; i < 100000; i++)
        eval(...["1 + 2"]);
    print(elapsed() - s);
result: per-iter (total)
    original : 8.786327 (878632) : --
    part1    : 8.509529 (850952) : 3.3% faster
    part1+2  : 8.521812 (852181) : 3.1% faster

EVAL:
    let s = elapsed();
    for (let i = 0; i < 100000; i++)
        eval("1 + 2");
    print(elapsed() - s);
result:
    original : 7.867457 (786745) : --
    part1    : 7.867739 (786773) : 0.0% slower
    part1+2  : 7.879573 (787957) : 0.2% slower

SPREAD only:
    let s = elapsed();
    for (let i = 0; i < 100000; i++)
        [...["1 + 2"]];
    print(elapsed() - s);
result:
    original : 0.224546 (22454) : --
    part1    : 0.213659 (21365) : 5.1% faster
    part1+2  : 0.216147 (21614) : 3.9% faster
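The benchmarks above use the SpiderMonkey shell builtins elapsed() and print(). For readers without a js shell, a rough, portable approximation of the first microbenchmark (using Date.now(), which is much coarser than elapsed(), so absolute numbers are not comparable) would look like:

```javascript
// Portable approximation of the SPREADCALL (scripted) microbenchmark.
// elapsed() in the js shell has microsecond resolution; Date.now() only
// gives milliseconds, so this is for shape, not for comparing numbers.
function f(a, b, c, d, e) { return a + b + c + d + e; }

function bench(iterations) {
  const start = Date.now();
  let last = 0;
  for (let i = 0; i < iterations; i++)
    last = f(1, 2, ...[3, 4, 5]);
  return { total: Date.now() - start, last };
}

const { total, last } = bench(500000);
console.log(last);       // 15: sanity-check that the spread call works
console.log(total >= 0); // true
```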
Attachment #8415208 - Flags: review?(jdemooij)
Updated•10 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 3•10 years ago
Comment on attachment 8415208 [details] [diff] [review]
Part1: Implement JSOP_SPREAD* fallback stubs in the baseline compiler.

Review of attachment 8415208 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for working on this!

::: js/src/jit/BaselineCompiler.cpp
@@ +2531,5 @@
> +{
> +    JS_ASSERT(IsCallPC(pc));
> +
> +    frame.syncStack(0);
> +    masm.mov(ImmWord(1), R0.scratchReg());

Nit: masm.move32(Imm32(1), ..);

While you're here, can you also change the code in emitCall to use move32(Imm32(..))?

::: js/src/jit/BaselineIC.cpp
@@ +8218,2 @@
>
> +    JS_ASSERT(isSpread || argc == GET_ARGC(pc));

Nit: MOZ_ASSERT_IF is a bit more readable:

MOZ_ASSERT_IF(!isSpread, argc == GET_ARGC(pc));

@@ +8221,5 @@
>     RootedValue callee(cx, vp[0]);
>     RootedValue thisv(cx, vp[1]);
>
> +    Value *args;
> +    AutoValueVector argv(cx);

99.99% of calls are not spread calls, and this patch adds quite a bit of code to this path. Could you add a separate DoSpreadCallFallback for spread calls? You may have to duplicate *some* lines of code, but I think it will be worth it.

We can still share the ICCall_Fallback etc. code; we just need to override getKey on its Compiler class like you did in the second patch in this bug. Let me know if you need help with this.

@@ +8238,5 @@
> +    argv.resize(argc);
> +    args = argv.begin();
> +
> +    if (!GetElements(cx, aobj, argc, args))
> +        return false;

Another advantage of separating the two is that you could more easily share this with the interpreter code. Can we add a separate SpreadCallOperation or something?
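To make the suggested split concrete, here is a hypothetical JS-level model of what a SpreadCallOperation helper would do (the real helper is C++ in Interpreter.cpp; the name and signature here are just illustrative): the spread array sits on the stack as a single value, and the operation copies its dense elements into an argument vector before performing the call.

```javascript
// Hypothetical model of SpreadCallOperation: expand the dense elements
// of the spread array (no holes) into an argument list, then call.
function spreadCallOperation(callee, thisv, arr) {
  // This mirrors GetElements: copy each dense element into the vector.
  const args = new Array(arr.length);
  for (let i = 0; i < arr.length; i++)
    args[i] = arr[i];
  return callee.apply(thisv, args);
}

function f(a, b, c) { return a + b + c; }
console.log(spreadCallOperation(f, undefined, [3, 4, 5])); // 12
```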
Attachment #8415208 - Flags: review?(jdemooij)
Comment 4•10 years ago
Comment on attachment 8415209 [details] [diff] [review]
Part2: Implement JSOP_SPREAD* optimized stubs in the baseline compiler.

Clearing review for now, as this patch will also need changes after the first patch is updated. Sorry for the review delay, btw.
Attachment #8415209 - Flags: review?(jdemooij)
Assignee
Comment 5•10 years ago
Thank you for reviewing!

Added DoSpreadCallFallback and SpreadCallOperation. Also added MOZ_UNLIKELY to |if (isSpread) ...| in the shared generateStubCode functions.

Then I removed the JSOP_SPREADNEW optimization in types::UseNewType. I noticed that the target code (such as |Sub.prototype = new Super(...args);|) is not a common case, so this optimization makes no sense for now. So I removed the types::UseNewType call in DoSpreadCallFallback while separating it from DoCallFallback.

jstests and jit-test (--tbpl) pass on Mac OS X 10.9.2 64bit.

Performance comparison with the --baseline-eager flag is as follows:

1. SPREADCALL/CALL

SPREADCALL (scripted):
    function f(a, b, c, d, e) { return a + b + c + d + e; }
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        f(1, 2, ...[3, 4, 5]);
    print(elapsed() - s);
result:
    original : 0.849620 (424810) : --
    part1    : 0.584474 (292237) : 45.4% faster
    part1+2  : 0.438068 (219034) : 93.9% faster

CALL (scripted):
    function f(a, b, c, d, e) { return a + b + c + d + e; }
    let s = elapsed();
    for (let i = 0; i < 500000000; i++)
        f(1, 2, 3, 4, 5);
    print(elapsed() - s);
result:
    original : 0.000282 (140914) : --
    part1    : 0.000282 (140851) : 0.0% faster
    part1+2  : 0.000282 (141014) : 0.1% slower

SPREADCALL (native):
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        Math.min(1, 2, ...[3, 4, 5]);
    print(elapsed() - s);
result:
    original : 0.830256 (415128) : --
    part1    : 0.524412 (262206) : 58.3% faster
    part1+2  : 0.457374 (228687) : 81.5% faster

CALL (native):
    let s = elapsed();
    for (let i = 0; i < 500000000; i++)
        Math.min(1, 2, 3, 4, 5);
    print(elapsed() - s);
result:
    original : 0.000282 (140997) : --
    part1    : 0.000282 (141020) : 0.0% slower
    part1+2  : 0.000282 (141144) : 0.1% slower

SPREAD only:
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        [1, 2, ...[3, 4, 5]];
    print(elapsed() - s);
result:
    original : 0.398640 (199320) : --
    part1    : 0.387716 (193858) : 2.8% faster
    part1+2  : 0.387056 (193528) : 3.0% faster

2. SPREADNEW/NEW

SPREADNEW (scripted):
    function A(a, b, c, d, e) { this.v = a + b + c + d + e; }
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        new A(1, 2, ...[3, 4, 5]);
    print(elapsed() - s);
result:
    original : 1.063494 (531747) : --
    part1    : 0.807190 (403595) : 31.8% faster
    part1+2  : 0.557620 (278810) : 90.7% faster

NEW (scripted):
    function A(a, b, c, d, e) { this.v = a + b + c + d + e; }
    let s = elapsed();
    for (let i = 0; i < 50000000; i++)
        new A(1, 2, 3, 4, 5);
    print(elapsed() - s);
result:
    original : 0.005102 (255099) : --
    part1    : 0.005107 (255347) : 0.1% slower
    part1+2  : 0.005106 (255282) : 0.1% slower

SPREADNEW (native):
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        new Array(1, 2, ...[3, 4, 5]);
    print(elapsed() - s);
result:
    original : 0.883216 (441608) : --
    part1    : 0.627534 (313767) : 40.7% faster
    part1+2  : 0.579846 (289923) : 52.3% faster

NEW (native):
    let s = elapsed();
    for (let i = 0; i < 50000000; i++)
        new Array(1, 2, 3, 4, 5);
    print(elapsed() - s);
result:
    original : 0.006032 (301590) : --
    part1    : 0.006041 (302067) : 0.2% slower
    part1+2  : 0.006024 (301199) : 0.1% faster

3. SPREADEVAL/EVAL

SPREADEVAL:
    let s = elapsed();
    for (let i = 0; i < 100000; i++)
        eval(...["1 + 2"]);
    print(elapsed() - s);
result:
    original : 8.449750 (844975) : --
    part1    : 8.662900 (866290) : 2.5% slower
    part1+2  : 8.798130 (879813) : 4.0% slower

EVAL:
    let s = elapsed();
    for (let i = 0; i < 100000; i++)
        eval("1 + 2");
    print(elapsed() - s);
result:
    original : 7.940280 (794028) : --
    part1    : 7.815970 (781597) : 1.6% faster
    part1+2  : 7.966240 (796624) : 0.3% slower

SPREAD only:
    let s = elapsed();
    for (let i = 0; i < 1000000; i++)
        [...["1 + 2"]];
    print(elapsed() - s);
result:
    original : 0.216936 (216936) : --
    part1    : 0.207803 (207803) : 4.4% faster
    part1+2  : 0.210440 (210440) : 3.1% faster
Attachment #8415208 - Attachment is obsolete: true
Attachment #8418279 - Flags: review?(jdemooij)
Assignee
Comment 6•10 years ago
And part 2.
Attachment #8415209 - Attachment is obsolete: true
Attachment #8418280 - Flags: review?(jdemooij)
Comment 7•10 years ago
Comment on attachment 8418279 [details] [diff] [review]
Part1: Implement JSOP_SPREAD* fallback stubs in the baseline compiler.

Review of attachment 8418279 [details] [diff] [review]:
-----------------------------------------------------------------

Great job, looks perfect! Please post an updated patch addressing the minor comments below and we can land this.

::: js/src/jit/BaselineIC.cpp
@@ +8493,5 @@
> +    masm.pushValue(Address(argPtr, 0)); // args[0]
> +    masm.addPtr(Imm32(sizeof(Value)), argPtr);
> +    masm.pushValue(Address(argPtr, 0)); // this
> +    masm.addPtr(Imm32(sizeof(Value)), argPtr);
> +    masm.pushValue(Address(argPtr, 0)); // callee

Nice idea to unroll this instead of calling pushCallArguments. We can optimize this a little more to avoid the addPtrs though:

masm.pushValue(Address(argPtr, 0 * sizeof(Value))); // array
masm.pushValue(Address(argPtr, 1 * sizeof(Value))); // this
masm.pushValue(Address(argPtr, 2 * sizeof(Value))); // callee

::: js/src/vm/Interpreter.cpp
@@ +3940,5 @@
> +
> +bool
> +js::SpreadCallOperation(JSContext *cx, Handle<GlobalObject*> global, HandleScript script,
> +                        jsbytecode *pc, JSOp op, HandleValue thisv, HandleValue callee,
> +                        HandleValue arr, MutableHandleValue res)

You can remove the op argument and just do:

JSOp op = JSOp(*pc);

in this function. Also remove the "global" argument and use cx->global()->valueIsEval(..) below; it's guaranteed to be the same value.

In general fewer arguments is better, because it makes the callsites simpler and makes it less likely a caller passes the wrong values :)
Attachment #8418279 - Flags: review?(jdemooij) → review+
Comment 8•10 years ago
Comment on attachment 8418280 [details] [diff] [review]
Part2: Implement JSOP_SPREAD* optimized stubs in the baseline compiler.

Review of attachment 8418280 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work again :) Some comments below to make this even faster by doing more in JIT code; let me know if you need help with this.

::: js/src/jit/BaselineIC.cpp
@@ +8112,1 @@
>     if (!GetTemplateObjectForNative(cx, script, pc, fun->native(), args, &templateObject))

Change this to:

RootedObject templateObject(cx);
if (!isSpread) {
    CallArgs args = ..;
    ..GetTemplateObject..
}

@@ +8243,5 @@
> +
> +    // Try attaching a call stub.
> +    if (op != JSOP_SPREADEVAL &&
> +        !TryAttachCallStub(cx, stub, script, pc, op, 1, vp, constructing, true, false))
> +        return false;

Nit: the condition does not fit on one line; in this case we add {} (with { on its own line) because it's more readable.

@@ +8320,5 @@
> +ICCallStubCompiler::guardSpreadCall(MacroAssembler &masm, GeneralRegisterSet regs,
> +                                    Register argcReg, Register actualArgc, Label *failure)
> +{
> +    Register argPtr = regs.takeAny();
> +    masm.mov(BaselineStackReg, argPtr);

Instead of the VM call below, we can easily load the array length in JIT code:

Register scratch = regs.takeAny();
masm.unboxObject(Address(BaselineStackReg, ICStackValueOffset), scratch);
masm.loadPtr(Address(scratch, JSObject::offsetOfElements()), scratch);
masm.load32(Address(scratch, ObjectElements::offsetOfLength()), actualArgc);
// Ensure actual argc <= ARGS_LENGTH_MAX
...

You may have to add unboxObject(Address, Register) to the macro assemblers. For x86/ARM, it's exactly the same as unboxInt32 etc. On x64 it can just call unboxNonDouble(Operand(src), dest);

@@ +8371,5 @@
> +
> +    // Allocate stack space to extract arguments (sizeof(Value) * argc).
> +    Register argsSizeReg = regs.takeAny();
> +    masm.mov(actualArgc, argsSizeReg);
> +    JS_STATIC_ASSERT(sizeof(Value) == 8);

Nit: we're using C++11 static_assert now instead of JS_STATIC_ASSERT, so:

static_assert(sizeof(Value) == 8, "Value must be 8 bytes");

@@ +8372,5 @@
> +    // Allocate stack space to extract arguments (sizeof(Value) * argc).
> +    Register argsSizeReg = regs.takeAny();
> +    masm.mov(actualArgc, argsSizeReg);
> +    JS_STATIC_ASSERT(sizeof(Value) == 8);
> +    masm.lshiftPtr(Imm32(3), argsSizeReg);

Nit: lshift32

@@ +8374,5 @@
> +    masm.mov(actualArgc, argsSizeReg);
> +    JS_STATIC_ASSERT(sizeof(Value) == 8);
> +    masm.lshiftPtr(Imm32(3), argsSizeReg);
> +    masm.subPtr(argsSizeReg, BaselineStackReg);
> +    regs.add(argsSizeReg);

At this point the object must be an array with dense elements and initializedLength == length (we should assert all this in SpreadCallOperation; let me know if you have questions about this).

It'd be great if we could do the copy in JIT code, without the VM call. It's a bit complicated, but fortunately we have code for this already, in pushArrayArguments (for fun.apply(x, array)). Can you call that function here?

@@ +8723,5 @@
>
> +    Register actualArgc;
> +    if (MOZ_UNLIKELY(isSpread_)) {
> +        actualArgc = regs.takeAny();
> +        if (!guardSpreadCall(masm, regs, argcReg, actualArgc, &failure))

I think we should just use argcReg instead of a separate actualArgc reg, or does that lead to problems elsewhere?
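The suggestion to reuse the fun.apply machinery works because, for a dense array with no holes, a spread call and Function.prototype.apply push exactly the same arguments. A small sketch of that equivalence:

```javascript
// For a dense array (no holes), spreading it into a call and applying it
// produce the same argument list, which is why the JIT can share the
// element-copy code between the two paths.
function f(a, b, c) { return a * 100 + b * 10 + c; }
const args = [1, 2, 3];

console.log(f(...args) === f.apply(undefined, args)); // true
console.log(f(...args)); // 123
```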
Attachment #8418280 - Flags: review?(jdemooij)
Assignee
Comment 9•10 years ago
Thank you again! Here is the updated patch.

(In reply to Jan de Mooij [:jandem] from comment #8)
> It'd be great if we could do the copy in JIT code, without the VM call. It's
> a bit complicated but fortunately we have code for this already, in
> pushArrayArguments (for fun.apply(x, array)). Can you call that function
> here?

I partially copied pushArrayArguments into pushSpreadCallArguments instead of calling it, because the array length is already known (argcReg) and it's better to use the register than to load the length from memory again.

> I think we should just use argcReg instead of a separate actualArgc reg, or
> does that lead to problems elsewhere?

Yes, you are correct. The reason I used actualArgc was that some code, such as |BaseIndex calleeSlot(...)|, assumed argcReg kept its original value between guardSpreadCall and pushSpreadCallArguments. But I noticed that code can be simplified by omitting argcReg, because the original argcReg is always 1 in a spread call.
Attachment #8418279 - Attachment is obsolete: true
Attachment #8420193 - Flags: review?(jdemooij)
Assignee
Comment 10•10 years ago
jstests and jit-test (--tbpl) pass on Mac OS X 10.9.2 64bit.

Performance comparison with the --baseline-eager flag is as follows:

1. SPREADCALL/CALL

SPREADCALL (scripted):
    function f(a, b, c, d, e) { return a + b + c + d + e; }
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        f(1, 2, ...[3, 4, 5]);
    print(elapsed() - s);
result:
    original : 0.865526 (432763) : --
    part1    : 0.575428 (287714) : 50.4% faster
    part1+2  : 0.408342 (204171) : 112.0% faster

CALL (scripted):
    function f(a, b, c, d, e) { return a + b + c + d + e; }
    let s = elapsed();
    for (let i = 0; i < 500000000; i++)
        f(1, 2, 3, 4, 5);
    print(elapsed() - s);
result:
    original : 0.000281 (140704) : --
    part1    : 0.000282 (140821) : 0.1% slower
    part1+2  : 0.000282 (140780) : 0.1% slower

SPREADCALL (native):
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        Math.min(1, 2, ...[3, 4, 5]);
    print(elapsed() - s);
result:
    original : 0.856160 (428080) : --
    part1    : 0.522518 (261259) : 63.9% faster
    part1+2  : 0.435112 (217556) : 96.8% faster

CALL (native):
    let s = elapsed();
    for (let i = 0; i < 500000000; i++)
        Math.min(1, 2, 3, 4, 5);
    print(elapsed() - s);
result:
    original : 0.000282 (140832) : --
    part1    : 0.000281 (140651) : 0.1% faster
    part1+2  : 0.000282 (140754) : 0.1% faster

SPREAD only:
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        [1, 2, ...[3, 4, 5]];
    print(elapsed() - s);
result:
    original : 0.398302 (199151) : --
    part1    : 0.392210 (196105) : 1.6% faster
    part1+2  : 0.387166 (193583) : 2.9% faster

2. SPREADNEW/NEW

SPREADNEW (scripted):
    function A(a, b, c, d, e) { this.v = a + b + c + d + e; }
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        new A(1, 2, ...[3, 4, 5]);
    print(elapsed() - s);
result:
    original : 1.058806 (529403) : --
    part1    : 0.766202 (383101) : 38.2% faster
    part1+2  : 0.533848 (266924) : 98.3% faster

NEW (scripted):
    function A(a, b, c, d, e) { this.v = a + b + c + d + e; }
    let s = elapsed();
    for (let i = 0; i < 50000000; i++)
        new A(1, 2, 3, 4, 5);
    print(elapsed() - s);
result:
    original : 0.005091 (254561) : --
    part1    : 0.005110 (255475) : 0.4% slower
    part1+2  : 0.005093 (254639) : 0.0% slower

SPREADNEW (native):
    let s = elapsed();
    for (let i = 0; i < 500000; i++)
        new Array(1, 2, ...[3, 4, 5]);
    print(elapsed() - s);
result:
    original : 0.904744 (452372) : --
    part1    : 0.629578 (314789) : 43.7% faster
    part1+2  : 0.570500 (285250) : 58.6% faster

NEW (native):
    let s = elapsed();
    for (let i = 0; i < 50000000; i++)
        new Array(1, 2, 3, 4, 5);
    print(elapsed() - s);
result:
    original : 0.006019 (300943) : --
    part1    : 0.006033 (301638) : 0.2% slower
    part1+2  : 0.006025 (301254) : 0.1% slower

3. SPREADEVAL/EVAL

SPREADEVAL:
    let s = elapsed();
    for (let i = 0; i < 100000; i++)
        eval(...["1 + 2"]);
    print(elapsed() - s);
result:
    original : 8.842150 (884215) : --
    part1    : 8.731040 (873104) : 1.3% faster
    part1+2  : 8.711200 (871120) : 1.5% faster

EVAL:
    let s = elapsed();
    for (let i = 0; i < 100000; i++)
        eval("1 + 2");
    print(elapsed() - s);
result:
    original : 7.903420 (790342) : --
    part1    : 8.008430 (800843) : 1.3% slower
    part1+2  : 8.002970 (800297) : 1.2% slower

SPREAD only:
    let s = elapsed();
    for (let i = 0; i < 1000000; i++)
        [...["1 + 2"]];
    print(elapsed() - s);
result:
    original : 0.212673 (212673) : --
    part1    : 0.209087 (209087) : 1.7% faster
    part1+2  : 0.207979 (207979) : 2.3% faster
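For readers wondering how the "% faster" figures above are derived: they are speedup ratios computed from the totals, i.e. (originalTotal / newTotal - 1) * 100, so values above 100% mean the new code is more than twice as fast. For example:

```javascript
// "% faster" as a speedup ratio: (original / new - 1) * 100.
function percentFaster(originalTotal, newTotal) {
  return (originalTotal / newTotal - 1) * 100;
}

// SPREADCALL (scripted) totals from this comment: 432763 -> 204171.
console.log(percentFaster(432763, 204171).toFixed(1) + "% faster"); // 112.0% faster
```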
Attachment #8418280 - Attachment is obsolete: true
Attachment #8420195 - Flags: review?(jdemooij)
Comment 11•10 years ago
Comment on attachment 8420193 [details] [diff] [review]
Part1: Implement JSOP_SPREAD* fallback stubs in the baseline compiler.

Review of attachment 8420193 [details] [diff] [review]:
-----------------------------------------------------------------

Perfect, thanks!
Attachment #8420193 - Flags: review?(jdemooij) → review+
Comment 12•10 years ago
Comment on attachment 8420195 [details] [diff] [review]
Part2: Implement JSOP_SPREAD* optimized stubs in the baseline compiler.

Review of attachment 8420195 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay. This looks great! I'll take another look after the comments below are addressed, but it should be good to go then.

::: js/src/jit-test/tests/basic/spread-call-maxarg.js
@@ -2,5 @@
> -var config = getBuildConfiguration();
> -
> -// FIXME: ASAN and debug builds run this too slowly for now. Re-enable
> -// after bug 919948 lands.
> -if (!config.debug && !config.asan) {

How fast is this test now in a debug build?

::: js/src/jit/BaselineIC.cpp
@@ +8252,5 @@
> +    bool constructing = (op == JSOP_SPREADNEW);
> +
> +    // Try attaching a call stub.
> +    if (op != JSOP_SPREADEVAL &&
> +        !TryAttachCallStub(cx, stub, script, pc, op, 1, vp, constructing, true, false)) {

Nit: because the condition does not fit on one line, move the { to its own line so that it's clearer where the condition ends and the body starts.

@@ +8321,5 @@
> +    Register scratch = regs.takeAny();
> +    masm.unboxObject(Address(BaselineStackReg, ICStackValueOffset), scratch);
> +    masm.loadPtr(Address(scratch, JSObject::offsetOfElements()), scratch);
> +    masm.load32(Address(scratch, ObjectElements::offsetOfLength()), argcReg);
> +    regs.add(scratch);

You can use argcReg instead of scratch (and also remove the "regs" argument).

@@ +8324,5 @@
> +    masm.load32(Address(scratch, ObjectElements::offsetOfLength()), argcReg);
> +    regs.add(scratch);
> +
> +    // Ensure actual argc <= ARGS_LENGTH_MAX
> +    masm.branchTest32(Assembler::Above, argcReg, Imm32(ARGS_LENGTH_MAX), failure);

It's a comparison, so this should be branch32.

@@ +8325,5 @@
> +    regs.add(scratch);
> +
> +    // Ensure actual argc <= ARGS_LENGTH_MAX
> +    masm.branchTest32(Assembler::Above, argcReg, Imm32(ARGS_LENGTH_MAX), failure);
> +    return true;

Nit: change the return type to |void|.

@@ +8358,5 @@
> +
> +    // Push the callee and |this|.
> +    masm.pushValue(Address(BaselineFrameReg, STUB_FRAME_SIZE + 1 * sizeof(Value)));
> +    masm.pushValue(Address(BaselineFrameReg, STUB_FRAME_SIZE + 2 * sizeof(Value)));
> +    return true;

Same here.

@@ +8665,4 @@
>     regs.take(ArgumentsRectifierReg);
>     regs.takeUnchecked(BaselineTailCallReg);
>
> +    if (MOZ_UNLIKELY(isSpread_)) {

Nit: remove the MOZ_UNLIKELY here and below: we only use it in cases where it really helps performance. The IC generation code isn't very hot because IC code is shared/reused :)

::: js/src/vm/Interpreter.cpp
@@ +3952,5 @@
>         return false;
>     }
>
> +#ifdef DEBUG
> +    JS_ASSERT(aobj->getDenseInitializedLength() == length);

Nit: add a comment above this line:

// The object must be an array with dense elements and no holes. Baseline's
// optimized spread call stubs rely on this.

@@ +3953,5 @@
>     }
>
> +#ifdef DEBUG
> +    JS_ASSERT(aobj->getDenseInitializedLength() == length);
> +    JS_ASSERT(!ObjectMayHaveExtraIndexedProperties(aobj));

This assert will fail when there are indexed properties somewhere on the prototype, like this: Array.prototype[0] = 1. Because the array has no holes, we don't consult the prototype chain, so we don't have to worry about that. You could change it to:

JS_ASSERT(!aobj->isIndexed());

isIndexed() returns true if the object itself has non-dense, indexed properties.

@@ +3956,5 @@
> +    JS_ASSERT(aobj->getDenseInitializedLength() == length);
> +    JS_ASSERT(!ObjectMayHaveExtraIndexedProperties(aobj));
> +    for (uint32_t i = 0; i < length; i++) {
> +        JS_ASSERT(!aobj->getDenseElement(i).isMagic());
> +    }

Nit: no {}.
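The hole/prototype discussion in the review above can be demonstrated at the JS level: if the spread array had a hole, element access would fall through to the prototype chain, which the stub's dense-element copy would skip. That is why the stub only handles dense arrays with no holes.

```javascript
// A hole in an array reads through to the prototype chain; dense
// elements never do. (Cleans up the prototype property afterwards.)
Array.prototype[1] = 99;                // indexed property on the prototype
const holey = [1, , 3];                 // hole at index 1
console.log(JSON.stringify([...holey])); // [1,99,3]
const dense = [1, 2, 3];
console.log(JSON.stringify([...dense])); // [1,2,3]
delete Array.prototype[1];
```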
Attachment #8420195 - Flags: review?(jdemooij)
Assignee
Comment 13•10 years ago
Thank you again! Here is the updated patch.

jstests and jit-test (--tbpl) pass on Mac OS X 10.9.2 64bit. Performance is almost the same as with the previous patch.

(In reply to Jan de Mooij [:jandem] from comment #12)
> How fast is this test now in a debug build?

(As discussed on IRC) The test takes 9 seconds for each run, so I added the check back. Also removed the comment about bug 919948.

> Nit: because the condition does not fit on one line, move the { to its own
> line so that it's clearer where the condition ends and the body starts.

Oh sorry, I had misunderstood what you meant. Fixed.
Attachment #8420195 - Attachment is obsolete: true
Attachment #8423326 - Flags: review?(jdemooij)
Assignee
Comment 14•10 years ago
Comment on attachment 8423326 [details] [diff] [review]
Part2: Implement JSOP_SPREAD* optimized stubs in the baseline compiler.

Canceling r? to bring the patch in line with bug 1010775. I'll post an updated patch soon.
Attachment #8423326 - Flags: review?(jdemooij)
Assignee
Comment 15•10 years ago
Updated the arguments in the macro assembler. jstests and jit-test (--tbpl) pass on Mac OS X 10.9.2 64bit.
Attachment #8423326 - Attachment is obsolete: true
Attachment #8424001 - Flags: review?(jdemooij)
Comment 16•10 years ago
Comment on attachment 8424001 [details] [diff] [review]
Part2: Implement JSOP_SPREAD* optimized stubs in the baseline compiler.

Review of attachment 8424001 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for updating the patch! I'll test your changes on x86/ARM now and push them to Try if that works :)
Attachment #8424001 - Flags: review?(jdemooij) → review+
Comment 17•10 years ago
It builds on x86 and ARM too and passes the spread jit-tests, so I pushed this to Try: https://tbpl.mozilla.org/?tree=Try&rev=a8f3ee1b69ee
Flags: needinfo?(jdemooij)
Comment 18•10 years ago
Green on Try, so I pushed it:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4106f916a9b5
https://hg.mozilla.org/integration/mozilla-inbound/rev/bcf7062a8d78

Thanks again!
Assignee: nobody → arai_a
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Comment 19•10 years ago
This seems to have regressed some of the asmjs-ubench tests (with --no-asmjs):
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=307078994617&tochange=ec4b23a08c5e

asmjs-ubench-skinning: 32%
asmjs-ubench-fannkuch: 4.4%
Comment 20•10 years ago
http://arewefastyet.com/#machine=11&view=breakdown&suite=asmjs-ubench (for reference)
Comment 21•10 years ago
Weird. It's possible one of the patches subtly changes non-spread calls somehow. I'll take a look later today.
Flags: needinfo?(jdemooij)
Comment 22•10 years ago
https://hg.mozilla.org/mozilla-central/rev/4106f916a9b5
https://hg.mozilla.org/mozilla-central/rev/bcf7062a8d78
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Comment 23•10 years ago
I can't reproduce this regression with a 32-bit shell build. I updated my AWFY clone and ran:

./harness.py $shell -- --no-asmjs

in asmjs-ubench. Hannes, can you reproduce this? I wonder which patch could have regressed this; it shouldn't affect non-spread calls at all.
Flags: needinfo?(jdemooij) → needinfo?(hv1989)
Comment 24•10 years ago
I couldn't reproduce it on my local machine, but I can on the Mac machine. The first bad revision is:
http://hg.mozilla.org/integration/mozilla-inbound/rev/bcf7062a8d78
on a 32-bit optimized build:

CC="clang -m32" CXX="clang++ -m32" AR=ar CROSS_COMPILE=1 ../configure --enable-optimize --disable-debug --enable-threadsafe --disable-intl-api --without-intl-api --target=i686-apple-darwin10.0.0 --enable-nspr-build
Flags: needinfo?(hv1989)
Assignee
Comment 25•10 years ago
Unfortunately, I still cannot reproduce the regression on my machine (iMac Late 2013 / Core i5 2.9GHz / 16GB RAM / OS X 10.9.3 / Xcode 5.1.1). Even when I use the configure options in comment 24 and run the command in comment 23, there is no significant difference between that revision and the previous one. It might be an environment-dependent problem. Looking for another solution...
Comment 26•10 years ago
Yes, those regressions are really annoying. They happen on a single machine + platform and aren't reproducible anywhere else. There's probably some weird compiler or CPU cache thing going on.

Hannes, maybe we could update the Clang version we use on AWFY and see if it helps? Binaries for OS X can be downloaded from llvm.org, though you may need 10.9 for the latest versions. I have 3.4.1 here; maybe that's why I can't repro it.
Comment 27•10 years ago
I've updated the clang version to 3.3 (which inbound uses to compile). Skinning seems to be back to its original score, so this might have been the clang compiler tripping on this code and producing worse machine code.