BaselineCompiler: Compile JSOP_SPREAD* ops

RESOLVED FIXED in mozilla32

People

(Reporter: arai, Assigned: arai)

Tracking

Version: Trunk
Target Milestone: mozilla32

Details

Attachments

(2 attachments, 6 obsolete attachments)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:29.0) Gecko/20100101 Firefox/29.0 (Beta/Release)
Build ID: 20140421221237

Steps to reproduce:

(separated from bug 889456)

We need JSOP_SPREADCALL, JSOP_SPREADNEW and JSOP_SPREADEVAL.
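For reference, the three source forms that emit these ops look like this (a sketch; the op-name mapping follows the bytecode names above, and the variable names are illustrative):

```javascript
// Spread in a call, a constructor call, and a direct eval — the three
// forms handled by JSOP_SPREADCALL, JSOP_SPREADNEW and JSOP_SPREADEVAL.
function sum(a, b, c, d, e) { return a + b + c + d + e; }
const callResult = sum(1, 2, ...[3, 4, 5]);   // JSOP_SPREADCALL

function A(...args) { this.v = args.length; }
const ctorResult = new A(1, ...[2, 3]);       // JSOP_SPREADNEW

const evalResult = eval(...["1 + 2"]);        // JSOP_SPREADEVAL
```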
Blocks: 872184, 1000780
OS: Mac OS X → All
Hardware: x86 → All
Implemented JSOP_SPREAD* in the Baseline Compiler.
  Part1: fallback stub (ICCall_Fallback)
  Part2: optimized stubs (ICCall_Scripted and ICCall_Native)

Also re-enabled js/src/jit-test/tests/basic/spread-call-maxarg.js
on ASAN and debug builds, because bug 919948 has already landed.


jstests and jit-tests (--tbpl) pass on Mac OS X 10.9.2 64-bit.
A performance comparison with the --baseline-eager flag follows:
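Note: elapsed() is a SpiderMonkey shell builtin returning microseconds. To rerun these micro-benchmarks outside the shell, a rough stand-in can be used (an assumption on my part; millisecond precision is enough to see the relative differences):

```javascript
// Portable stand-in for the shell's elapsed() builtin (microseconds).
function elapsed() { return Date.now() * 1000; }

let s = elapsed();
for (let i = 0; i < 500000; i++) Math.min(1, 2, ...[3, 4, 5]);
const totalMicros = elapsed() - s;
```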

1. SPREADCALL/CALL

  SPREADCALL (scripted):
    function f(a, b, c, d, e) { return a + b + c + d + e; } let s = elapsed(); for (let i = 0; i < 500000; i ++) f(1, 2, ...[3, 4, 5]); print(elapsed() - s);
  result:
                per-iter (total)
    original :  0.866686 (433342) :     --
    part1    :  0.590597 (295298) :  46.7% faster
    part1+2  :  0.439997 (219998) :  97.0% faster

  CALL (scripted):
    function f(a, b, c, d, e) { return a + b + c + d + e; } let s = elapsed(); for (let i = 0; i < 500000000; i ++) f(1, 2, 3, 4, 5); print(elapsed() - s);
  result:
    original :  0.000282 (141087) :     --
    part1    :  0.000282 (140887) :   0.1% faster
    part1+2  :  0.000282 (140970) :   0.1% faster

  SPREADCALL (native):
    let s = elapsed(); for (let i = 0; i < 500000; i ++) Math.min(1, 2, ...[3, 4, 5]); print(elapsed() - s);
  result:
    original :  0.831337 (415668) :     --
    part1    :  0.540356 (270178) :  53.8% faster
    part1+2  :  0.455972 (227986) :  82.3% faster

  CALL (native):
    let s = elapsed(); for (let i = 0; i < 500000000; i ++) Math.min(1, 2, 3, 4, 5); print(elapsed() - s);
  result:
    original :  0.000283 (141476) :     --
    part1    :  0.000282 (140910) :   0.4% faster
    part1+2  :  0.000282 (140967) :   0.4% faster

  SPREAD only:
    let s = elapsed(); for (let i = 0; i < 500000; i ++) [1, 2, ...[3, 4, 5]]; print(elapsed() - s);
  result:
    original :  0.397787 (198893) :     --
    part1    :  0.385565 (192782) :   3.2% faster
    part1+2  :  0.391209 (195604) :   1.7% faster


2. SPREADNEW/NEW

  SPREADNEW (scripted):
    function A(a, b, c, d, e) { this.v = a + b + c + d + e; } let s = elapsed(); for (let i = 0; i < 500000; i ++) new A(1, 2, ...[3, 4, 5]); print(elapsed() - s);
  result:
                per-iter (total)
    original :  1.055098 (527549) :     --
    part1    :  0.805464 (402732) :  31.0% faster
    part1+2  :  0.563302 (281651) :  87.3% faster

  NEW (scripted):
    function A(a, b, c, d, e) { this.v = a + b + c + d + e; } let s = elapsed(); for (let i = 0; i < 50000000; i ++) new A(1, 2, 3, 4, 5); print(elapsed() - s);
  result:
    original :  0.005144 (257184) :     --
    part1    :  0.005128 (256405) :   0.3% faster
    part1+2  :  0.005138 (256909) :   0.1% faster

  SPREADNEW (native):
    let s = elapsed(); for (let i = 0; i < 500000; i ++) new Array(1, 2, ...[3, 4, 5]); print(elapsed() - s);
  result:
    original :  0.899546 (449773) :     --
    part1    :  0.753050 (376524) :  19.5% faster
    part1+2  :  0.553810 (276904) :  62.4% faster

  NEW (native):
    let s = elapsed(); for (let i = 0; i < 500000; i ++) new Array(1, 2, 3, 4, 5); print(elapsed() - s);
  result:
    original :  0.008495 (4247) :     --
    part1    :  0.008479 (4239) :   0.2% faster
    part1+2  :  0.008464 (4231) :   0.4% faster


3. SPREADEVAL/EVAL

  SPREADEVAL:
    let s = elapsed(); for (let i = 0; i < 100000; i ++) eval(...["1 + 2"]); print(elapsed() - s);
  result:
                per-iter (total)
    original :  8.786327 (878632) :     --
    part1    :  8.509529 (850952) :   3.3% faster
    part1+2  :  8.521812 (852181) :   3.1% faster

  EVAL:
    let s = elapsed(); for (let i = 0; i < 100000; i ++) eval("1 + 2"); print(elapsed() - s);
  result:
    original :  7.867457 (786745) :     --
    part1    :  7.867739 (786773) :   0.0% slower
    part1+2  :  7.879573 (787957) :   0.2% slower

  SPREAD only:
    let s = elapsed(); for (let i = 0; i < 100000; i ++) [...["1 + 2"]]; print(elapsed() - s);
  result:
    original :  0.224546 (22454) :     --
    part1    :  0.213659 (21365) :   5.1% faster
    part1+2  :  0.216147 (21614) :   3.9% faster
Attachment #8415208 - Flags: review?(jdemooij)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment on attachment 8415208 [details] [diff] [review]
Part1: Implement JSOP_SPREAD* fallback stubs in the baseline compiler.

Review of attachment 8415208 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for working on this!

::: js/src/jit/BaselineCompiler.cpp
@@ +2531,5 @@
> +{
> +    JS_ASSERT(IsCallPC(pc));
> +
> +    frame.syncStack(0);
> +    masm.mov(ImmWord(1), R0.scratchReg());

Nit: masm.move32(Imm32(1), ..);

While you're here, can you also change the code in emitCall to use move32(Imm32( ?

::: js/src/jit/BaselineIC.cpp
@@ +8218,2 @@
>  
> +    JS_ASSERT(isSpread || argc == GET_ARGC(pc));

Nit: MOZ_ASSERT_IF is a bit more readable: MOZ_ASSERT_IF(!isSpread, argc == GET_ARGC(pc));

@@ +8221,5 @@
>      RootedValue callee(cx, vp[0]);
>      RootedValue thisv(cx, vp[1]);
>  
> +    Value *args;
> +    AutoValueVector argv(cx);

99.99% of calls are not spread calls. This patch adds quite a bit of code to this path. Could you add a separate DoSpreadCallFallback for spread calls? You may have to duplicate *some* lines of code but I think it will be worth it.

We can still share the ICCall_Fallback etc code; we just need to override getKey on its Compiler class like you did in the second patch in this bug.

Let me know if you need help with this.

@@ +8238,5 @@
> +        argv.resize(argc);
> +        args = argv.begin();
> +
> +        if (!GetElements(cx, aobj, argc, args))
> +            return false;

Another advantage of separating the two is that you can maybe more easily share this with the interpreter code. Can we add a separate SpreadCallOperation or something?
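For what it's worth, the observable behavior such a helper would need to implement can be modeled at the JS level (a sketch with hypothetical names; the real helper would be C++ in Interpreter.cpp):

```javascript
// Hypothetical model of a SpreadCallOperation: by this point the spread
// operand has been flattened into a dense array of arguments, so the call
// reduces to an apply of the callee with the given |this|.
function spreadCallModel(callee, thisv, args) {
  return Function.prototype.apply.call(callee, thisv, args);
}
const modelResult = spreadCallModel(Math.min, null, [3, 4, 5]);
```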
Attachment #8415208 - Flags: review?(jdemooij)
Comment on attachment 8415209 [details] [diff] [review]
Part2: Implement JSOP_SPREAD* optimized stubs in the baseline compiler.

Clearing review for now as this patch will also need changes after the first patch is updated. Sorry for the review delay btw.
Attachment #8415209 - Flags: review?(jdemooij)
Thank you for reviewing!

Added DoSpreadCallFallback and SpreadCallOperation.
Also added MOZ_UNLIKELY to |if (isSpread) ...| in the shared generateStubCode implementations.

Then I removed the JSOP_SPREADNEW optimization in types::UseNewType. I noticed that the target code (such as |Sub.prototype = new Super(...args);|) is not a common case, so this optimization makes no sense for now.
So I dropped the types::UseNewType call in DoSpreadCallFallback while separating it from DoCallFallback.
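For context, the uncommon pattern the removed optimization targeted can be sketched as follows (illustrative names):

```javascript
// A prototype object created by |new| with spread arguments — the
// |Sub.prototype = new Super(...args);| shape mentioned above.
function Super(a, b, c) { this.sum = a + b + c; }
function Sub() {}
const args = [1, 2, 3];
Sub.prototype = new Super(...args);  // JSOP_SPREADNEW feeding a prototype
const inst = new Sub();              // inherits |sum| from the shared prototype
```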


jstests and jit-tests (--tbpl) pass on Mac OS X 10.9.2 64-bit.
A performance comparison with the --baseline-eager flag follows:

1. SPREADCALL/CALL

  SPREADCALL (scripted):
    function f(a, b, c, d, e) { return a + b + c + d + e; } let s = elapsed(); for (let i = 0; i < 500000; i ++) f(1, 2, ...[3, 4, 5]); print(elapsed() - s);
  result:
    original :  0.849620 (424810) :     --
    part1    :  0.584474 (292237) :  45.4% faster
    part1+2  :  0.438068 (219034) :  93.9% faster

  CALL (scripted):
    function f(a, b, c, d, e) { return a + b + c + d + e; } let s = elapsed(); for (let i = 0; i < 500000000; i ++) f(1, 2, 3, 4, 5); print(elapsed() - s);
  result:
    original :  0.000282 (140914) :     --
    part1    :  0.000282 (140851) :   0.0% faster
    part1+2  :  0.000282 (141014) :   0.1% slower

  SPREADCALL (native):
    let s = elapsed(); for (let i = 0; i < 500000; i ++) Math.min(1, 2, ...[3, 4, 5]); print(elapsed() - s);
  result:
    original :  0.830256 (415128) :     --
    part1    :  0.524412 (262206) :  58.3% faster
    part1+2  :  0.457374 (228687) :  81.5% faster

  CALL (native):
    let s = elapsed(); for (let i = 0; i < 500000000; i ++) Math.min(1, 2, 3, 4, 5); print(elapsed() - s);
  result:
    original :  0.000282 (140997) :     --
    part1    :  0.000282 (141020) :   0.0% slower
    part1+2  :  0.000282 (141144) :   0.1% slower

  SPREAD only:
    let s = elapsed(); for (let i = 0; i < 500000; i ++) [1, 2, ...[3, 4, 5]]; print(elapsed() - s);
  result:
    original :  0.398640 (199320) :     --
    part1    :  0.387716 (193858) :   2.8% faster
    part1+2  :  0.387056 (193528) :   3.0% faster


2. SPREADNEW/NEW

  SPREADNEW (scripted):
    function A(a, b, c, d, e) { this.v = a + b + c + d + e; } let s = elapsed(); for (let i = 0; i < 500000; i ++) new A(1, 2, ...[3, 4, 5]); print(elapsed() - s);
  result:
    original :  1.063494 (531747) :     --
    part1    :  0.807190 (403595) :  31.8% faster
    part1+2  :  0.557620 (278810) :  90.7% faster

  NEW (scripted):
    function A(a, b, c, d, e) { this.v = a + b + c + d + e; } let s = elapsed(); for (let i = 0; i < 50000000; i ++) new A(1, 2, 3, 4, 5); print(elapsed() - s);
  result:
    original :  0.005102 (255099) :     --
    part1    :  0.005107 (255347) :   0.1% slower
    part1+2  :  0.005106 (255282) :   0.1% slower

  SPREADNEW (native):
    let s = elapsed(); for (let i = 0; i < 500000; i ++) new Array(1, 2, ...[3, 4, 5]); print(elapsed() - s);
  result:
    original :  0.883216 (441608) :     --
    part1    :  0.627534 (313767) :  40.7% faster
    part1+2  :  0.579846 (289923) :  52.3% faster

  NEW (native):
    let s = elapsed(); for (let i = 0; i < 50000000; i ++) new Array(1, 2, 3, 4, 5); print(elapsed() - s);
  result:
    original :  0.006032 (301590) :     --
    part1    :  0.006041 (302067) :   0.2% slower
    part1+2  :  0.006024 (301199) :   0.1% faster


3. SPREADEVAL/EVAL

  SPREADEVAL:
    let s = elapsed(); for (let i = 0; i < 100000; i ++) eval(...["1 + 2"]); print(elapsed() - s);
  result:
    original :  8.449750 (844975) :     --
    part1    :  8.662900 (866290) :   2.5% slower
    part1+2  :  8.798130 (879813) :   4.0% slower

  EVAL:
    let s = elapsed(); for (let i = 0; i < 100000; i ++) eval("1 + 2"); print(elapsed() - s);
  result:
    original :  7.940280 (794028) :     --
    part1    :  7.815970 (781597) :   1.6% faster
    part1+2  :  7.966240 (796624) :   0.3% slower

  SPREAD only:
    let s = elapsed(); for (let i = 0; i < 1000000; i ++) [...["1 + 2"]]; print(elapsed() - s);
  result:
    original :  0.216936 (216936) :     --
    part1    :  0.207803 (207803) :   4.4% faster
    part1+2  :  0.210440 (210440) :   3.1% faster
Attachment #8415208 - Attachment is obsolete: true
Attachment #8418279 - Flags: review?(jdemooij)
and part2.
Attachment #8415209 - Attachment is obsolete: true
Attachment #8418280 - Flags: review?(jdemooij)
Comment on attachment 8418279 [details] [diff] [review]
Part1: Implement JSOP_SPREAD* fallback stubs in the baseline compiler.

Review of attachment 8418279 [details] [diff] [review]:
-----------------------------------------------------------------

Great job, looks perfect! Please post an updated patch addressing the minor comments below and we can land this.

::: js/src/jit/BaselineIC.cpp
@@ +8493,5 @@
> +        masm.pushValue(Address(argPtr, 0)); // args[0]
> +        masm.addPtr(Imm32(sizeof(Value)), argPtr);
> +        masm.pushValue(Address(argPtr, 0)); // this
> +        masm.addPtr(Imm32(sizeof(Value)), argPtr);
> +        masm.pushValue(Address(argPtr, 0)); // callee

Nice idea to unroll this instead of calling pushCallArguments. We can optimize this a little more to avoid the addPtrs though:

masm.pushValue(Address(argPtr, 0 * sizeof(Value))); // array
masm.pushValue(Address(argPtr, 1 * sizeof(Value))); // this
masm.pushValue(Address(argPtr, 2 * sizeof(Value))); // callee

::: js/src/vm/Interpreter.cpp
@@ +3940,5 @@
> +
> +bool
> +js::SpreadCallOperation(JSContext *cx, Handle<GlobalObject*> global, HandleScript script,
> +                        jsbytecode *pc, JSOp op, HandleValue thisv, HandleValue callee,
> +                        HandleValue arr, MutableHandleValue res)

You can remove the op argument, and just do:

JSOp op = JSOp(*pc);

in this function.

Also remove the "global" argument and use cx->global()->valueIsEval(..) below; it's guaranteed to be the same value.

In general fewer arguments is better, because it makes the callsites simpler and makes it less likely a caller passes the wrong values :)
Attachment #8418279 - Flags: review?(jdemooij) → review+
Comment on attachment 8418280 [details] [diff] [review]
Part2: Implement JSOP_SPREAD* optimized stubs in the baseline compiler.

Review of attachment 8418280 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work again :) Some comments below to make this even faster by doing more in JIT code, let me know if you need help with this.

::: js/src/jit/BaselineIC.cpp
@@ +8112,1 @@
>          if (!GetTemplateObjectForNative(cx, script, pc, fun->native(), args, &templateObject))

Change this to:

RootedObject templateObject(cx);
if (!isSpread) {
    CallArgs args = ..;
    ..GetTemplateObject..
}

@@ +8243,5 @@
> +
> +    // Try attaching a call stub.
> +    if (op != JSOP_SPREADEVAL &&
> +        !TryAttachCallStub(cx, stub, script, pc, op, 1, vp, constructing, true, false))
> +        return false;

Nit: condition does not fit on one line, in this case we add {} (with { on its own line) because it's more readable.

@@ +8320,5 @@
> +ICCallStubCompiler::guardSpreadCall(MacroAssembler &masm, GeneralRegisterSet regs,
> +                                    Register argcReg, Register actualArgc, Label *failure)
> +{
> +    Register argPtr = regs.takeAny();
> +    masm.mov(BaselineStackReg, argPtr);

Instead of the VM call below, we can easily load the array length in JIT code:

Register scratch = regs.takeAny();
masm.unboxObject(Address(BaselineStackReg, ICStackValueOffset), scratch);
masm.loadPtr(Address(scratch, JSObject::offsetOfElements()), scratch);
masm.load32(Address(scratch, ObjectElements::offsetOfLength()), actualArgc);

// Ensure actual argc <= ARGS_LENGTH_MAX
...

You may have to add unboxObject(Address, Register) to the macro assemblers. For x86/ARM, it's exactly the same as unboxInt32 etc. On x64 it can just call unboxNonDouble(Operand(src), dest);

@@ +8371,5 @@
> +
> +    // Allocate stack space to extract arguments (sizeof(Value) * argc).
> +    Register argsSizeReg = regs.takeAny();
> +    masm.mov(actualArgc, argsSizeReg);
> +    JS_STATIC_ASSERT(sizeof(Value) == 8);

Nit: we're using C++11 static_assert now instead of JS_STATIC_ASSERT, so:

static_assert(sizeof(Value) == 8, "Value must be 8 bytes");

@@ +8372,5 @@
> +    // Allocate stack space to extract arguments (sizeof(Value) * argc).
> +    Register argsSizeReg = regs.takeAny();
> +    masm.mov(actualArgc, argsSizeReg);
> +    JS_STATIC_ASSERT(sizeof(Value) == 8);
> +    masm.lshiftPtr(Imm32(3), argsSizeReg);

Nit: lshift32

@@ +8374,5 @@
> +    masm.mov(actualArgc, argsSizeReg);
> +    JS_STATIC_ASSERT(sizeof(Value) == 8);
> +    masm.lshiftPtr(Imm32(3), argsSizeReg);
> +    masm.subPtr(argsSizeReg, BaselineStackReg);
> +    regs.add(argsSizeReg);

At this point the object must be an array with dense elements and initializedLength == length (we should assert all this in SpreadCallOperation, let me know if you have questions about this).

It'd be great if we could do the copy in JIT code, without the VM call. It's a bit complicated but fortunately we have code for this already, in pushArrayArguments (for fun.apply(x, array)). Can you call that function here?

@@ +8723,5 @@
>  
> +    Register actualArgc;
> +    if (MOZ_UNLIKELY(isSpread_)) {
> +        actualArgc = regs.takeAny();
> +        if (!guardSpreadCall(masm, regs, argcReg, actualArgc, &failure))

I think we should just use argcReg instead of a separate actualArgc reg, or does that lead to problems elsewhere?
Attachment #8418280 - Flags: review?(jdemooij)
Thank you again!

Here is the updated patch.

(In reply to Jan de Mooij [:jandem] from comment #8)
> It'd be great if we could do the copy in JIT code, without the VM call. It's
> a bit complicated but fortunately we have code for this already, in
> pushArrayArguments (for fun.apply(x, array)). Can you call that function
> here?

I partially copied pushArrayArguments into pushSpreadCallArguments instead of calling it,
because the array length is already known (in argcReg) and it's better to use the register
than to load the length from memory again.

> I think we should just use argcReg instead of a separate actualArgc reg, or
> does that lead to problems elsewhere?

Yeah, you are correct.
The reason I used actualArgc was that some code between guardSpreadCall and
pushSpreadCallArguments, such as |BaseIndex calleeSlot(...)|, assumes that
argcReg keeps its original value.
But I noticed that code can be optimized to omit argcReg,
because the original argcReg is always 1 in a spread call.
Attachment #8418279 - Attachment is obsolete: true
Attachment #8420193 - Flags: review?(jdemooij)
jstests and jit-tests (--tbpl) pass on Mac OS X 10.9.2 64-bit.
A performance comparison with the --baseline-eager flag follows:

1. SPREADCALL/CALL

  SPREADCALL (scripted):
    function f(a, b, c, d, e) { return a + b + c + d + e; } let s = elapsed(); for (let i = 0; i < 500000; i ++) f(1, 2, ...[3, 4, 5]); print(elapsed() - s);
  result:
    original :  0.865526 (432763) :     --
    part1    :  0.575428 (287714) :  50.4% faster
    part1+2  :  0.408342 (204171) : 112.0% faster

  CALL (scripted):
    function f(a, b, c, d, e) { return a + b + c + d + e; } let s = elapsed(); for (let i = 0; i < 500000000; i ++) f(1, 2, 3, 4, 5); print(elapsed() - s);
  result:
    original :  0.000281 (140704) :     --
    part1    :  0.000282 (140821) :   0.1% slower
    part1+2  :  0.000282 (140780) :   0.1% slower

  SPREADCALL (native):
    let s = elapsed(); for (let i = 0; i < 500000; i ++) Math.min(1, 2, ...[3, 4, 5]); print(elapsed() - s);
  result:
    original :  0.856160 (428080) :     --
    part1    :  0.522518 (261259) :  63.9% faster
    part1+2  :  0.435112 (217556) :  96.8% faster

  CALL (native):
    let s = elapsed(); for (let i = 0; i < 500000000; i ++) Math.min(1, 2, 3, 4, 5); print(elapsed() - s);
  result:
    original :  0.000282 (140832) :     --
    part1    :  0.000281 (140651) :   0.1% faster
    part1+2  :  0.000282 (140754) :   0.1% faster

  SPREAD only:
    let s = elapsed(); for (let i = 0; i < 500000; i ++) [1, 2, ...[3, 4, 5]]; print(elapsed() - s);
  result:
    original :  0.398302 (199151) :     --
    part1    :  0.392210 (196105) :   1.6% faster
    part1+2  :  0.387166 (193583) :   2.9% faster


2. SPREADNEW/NEW

  SPREADNEW (scripted):
    function A(a, b, c, d, e) { this.v = a + b + c + d + e; } let s = elapsed(); for (let i = 0; i < 500000; i ++) new A(1, 2, ...[3, 4, 5]); print(elapsed() - s);
  result:
    original :  1.058806 (529403) :     --
    part1    :  0.766202 (383101) :  38.2% faster
    part1+2  :  0.533848 (266924) :  98.3% faster

  NEW (scripted):
    function A(a, b, c, d, e) { this.v = a + b + c + d + e; } let s = elapsed(); for (let i = 0; i < 50000000; i ++) new A(1, 2, 3, 4, 5); print(elapsed() - s);
  result:
    original :  0.005091 (254561) :     --
    part1    :  0.005110 (255475) :   0.4% slower
    part1+2  :  0.005093 (254639) :   0.0% slower

  SPREADNEW (native):
    let s = elapsed(); for (let i = 0; i < 500000; i ++) new Array(1, 2, ...[3, 4, 5]); print(elapsed() - s);
  result:
    original :  0.904744 (452372) :     --
    part1    :  0.629578 (314789) :  43.7% faster
    part1+2  :  0.570500 (285250) :  58.6% faster

  NEW (native):
    let s = elapsed(); for (let i = 0; i < 50000000; i ++) new Array(1, 2, 3, 4, 5); print(elapsed() - s);
  result:
    original :  0.006019 (300943) :     --
    part1    :  0.006033 (301638) :   0.2% slower
    part1+2  :  0.006025 (301254) :   0.1% slower


3. SPREADEVAL/EVAL

  SPREADEVAL:
    let s = elapsed(); for (let i = 0; i < 100000; i ++) eval(...["1 + 2"]); print(elapsed() - s);
  result:
    original :  8.842150 (884215) :     --
    part1    :  8.731040 (873104) :   1.3% faster
    part1+2  :  8.711200 (871120) :   1.5% faster

  EVAL:
    let s = elapsed(); for (let i = 0; i < 100000; i ++) eval("1 + 2"); print(elapsed() - s);
  result:
    original :  7.903420 (790342) :     --
    part1    :  8.008430 (800843) :   1.3% slower
    part1+2  :  8.002970 (800297) :   1.2% slower

  SPREAD only:
    let s = elapsed(); for (let i = 0; i < 1000000; i ++) [...["1 + 2"]]; print(elapsed() - s);
  result:
    original :  0.212673 (212673) :     --
    part1    :  0.209087 (209087) :   1.7% faster
    part1+2  :  0.207979 (207979) :   2.3% faster
Attachment #8418280 - Attachment is obsolete: true
Attachment #8420195 - Flags: review?(jdemooij)
Comment on attachment 8420193 [details] [diff] [review]
Part1: Implement JSOP_SPREAD* fallback stubs in the baseline compiler.

Review of attachment 8420193 [details] [diff] [review]:
-----------------------------------------------------------------

Perfect, thanks!
Attachment #8420193 - Flags: review?(jdemooij) → review+
Comment on attachment 8420195 [details] [diff] [review]
Part2: Implement JSOP_SPREAD* optimized stubs in the baseline compiler.

Review of attachment 8420195 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay. This looks great! I'll take another look after the comments below are addressed, but should be good to go then.

::: js/src/jit-test/tests/basic/spread-call-maxarg.js
@@ -2,5 @@
> -var config = getBuildConfiguration();
> -
> -// FIXME: ASAN and debug builds run this too slowly for now.  Re-enable
> -// after bug 919948 lands.
> -if (!config.debug && !config.asan) {

How fast is this test now in a debug build?

::: js/src/jit/BaselineIC.cpp
@@ +8252,5 @@
> +    bool constructing = (op == JSOP_SPREADNEW);
> +
> +    // Try attaching a call stub.
> +    if (op != JSOP_SPREADEVAL &&
> +        !TryAttachCallStub(cx, stub, script, pc, op, 1, vp, constructing, true, false)) {

Nit: because the condition does not fit on one line, move the { to its own line so that it's clearer where the condition ends and the body starts.

@@ +8321,5 @@
> +    Register scratch = regs.takeAny();
> +    masm.unboxObject(Address(BaselineStackReg, ICStackValueOffset), scratch);
> +    masm.loadPtr(Address(scratch, JSObject::offsetOfElements()), scratch);
> +    masm.load32(Address(scratch, ObjectElements::offsetOfLength()), argcReg);
> +    regs.add(scratch);

You can use argcReg instead of scratch (and also remove the "regs" argument).

@@ +8324,5 @@
> +    masm.load32(Address(scratch, ObjectElements::offsetOfLength()), argcReg);
> +    regs.add(scratch);
> +
> +    // Ensure actual argc <= ARGS_LENGTH_MAX
> +    masm.branchTest32(Assembler::Above, argcReg, Imm32(ARGS_LENGTH_MAX), failure);

It's a comparison so this should be branch32.

@@ +8325,5 @@
> +    regs.add(scratch);
> +
> +    // Ensure actual argc <= ARGS_LENGTH_MAX
> +    masm.branchTest32(Assembler::Above, argcReg, Imm32(ARGS_LENGTH_MAX), failure);
> +    return true;

Nit: change the return type to |void|

@@ +8358,5 @@
> +
> +    // Push the callee and |this|.
> +    masm.pushValue(Address(BaselineFrameReg, STUB_FRAME_SIZE + 1 * sizeof(Value)));
> +    masm.pushValue(Address(BaselineFrameReg, STUB_FRAME_SIZE + 2 * sizeof(Value)));
> +    return true;

Same here.

@@ +8665,4 @@
>      regs.take(ArgumentsRectifierReg);
>      regs.takeUnchecked(BaselineTailCallReg);
>  
> +    if (MOZ_UNLIKELY(isSpread_)) {

Nit: remove the MOZ_UNLIKELY here and below: we only use it in cases where it really helps performance. The IC generation code isn't very hot because IC code is shared/reused :)

::: js/src/vm/Interpreter.cpp
@@ +3952,5 @@
>          return false;
>      }
>  
> +#ifdef DEBUG
> +    JS_ASSERT(aobj->getDenseInitializedLength() == length);

Nit: add a comment above this line:

    // The object must be an array with dense elements and no holes. Baseline's
    // optimized spread call stubs rely on this.

@@ +3953,5 @@
>      }
>  
> +#ifdef DEBUG
> +    JS_ASSERT(aobj->getDenseInitializedLength() == length);
> +    JS_ASSERT(!ObjectMayHaveExtraIndexedProperties(aobj));

This assert will fail when there are indexed properties somewhere on the prototype, like this: Array.prototype[0] = 1. Because the array has no holes, we don't consult the prototype chain so we don't have to worry about that.

You could change it to: JS_ASSERT(!aobj->isIndexed());

isIndexed() returns true if the object itself has non-dense, indexed properties.
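To illustrate why the dense/no-holes invariant matters: spreading a holey array consults the prototype chain for the missing indices, which an optimized stub reading only the dense elements would get wrong (self-contained sketch):

```javascript
// A hole makes element access fall through to Array.prototype, so an
// optimized stub may only run when the array is dense and hole-free.
Array.prototype[1] = 99;          // indexed property on the prototype
const holey = [1, , 3];           // hole at index 1
const spreadResult = [...holey];  // the hole is looked up on Array.prototype
delete Array.prototype[1];        // undo the prototype pollution
```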

@@ +3956,5 @@
> +    JS_ASSERT(aobj->getDenseInitializedLength() == length);
> +    JS_ASSERT(!ObjectMayHaveExtraIndexedProperties(aobj));
> +    for (uint32_t i = 0; i < length; i++) {
> +        JS_ASSERT(!aobj->getDenseElement(i).isMagic());
> +    }

Nit: no {}
Attachment #8420195 - Flags: review?(jdemooij)
Thank you again!
Here is the updated patch.

jstests and jit-tests (--tbpl) pass on Mac OS X 10.9.2 64-bit.
Performance is almost the same as with the previous patch.

(In reply to Jan de Mooij [:jandem] from comment #12)
> How fast is this test now in a debug build?

(as discussed on IRC)
The test takes 9 seconds per run, so I added the check back.
Also removed the comment about bug 919948.

> Nit: because the condition does not fit on one line, move the { to its own
> line so that it's clearer where the condition ends and the body starts.

Oh sorry, I misunderstood what you meant.
I've fixed it.
Attachment #8420195 - Attachment is obsolete: true
Attachment #8423326 - Flags: review?(jdemooij)
Comment on attachment 8423326 [details] [diff] [review]
Part2: Implement JSOP_SPREAD* optimized stubs in the baseline compiler.

Canceling r? to conform the patch to bug 1010775.
I'll post an updated patch soon.
Attachment #8423326 - Flags: review?(jdemooij)
Updated the arguments in the macro assembler.

jstests and jit-tests (--tbpl) pass on Mac OS X 10.9.2 64-bit.
Attachment #8423326 - Attachment is obsolete: true
Attachment #8424001 - Flags: review?(jdemooij)
Comment on attachment 8424001 [details] [diff] [review]
Part2: Implement JSOP_SPREAD* optimized stubs in the baseline compiler.

Review of attachment 8424001 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for updating the patch! I'll test your changes on x86/ARM now and push them to Try if that works :)
Attachment #8424001 - Flags: review?(jdemooij) → review+
It builds on x86 and ARM too and passes the spread jit-tests, so I pushed this to Try:

https://tbpl.mozilla.org/?tree=Try&rev=a8f3ee1b69ee
Flags: needinfo?(jdemooij)
Green on Try so I pushed it:

https://hg.mozilla.org/integration/mozilla-inbound/rev/4106f916a9b5
https://hg.mozilla.org/integration/mozilla-inbound/rev/bcf7062a8d78

Thanks again!
Assignee: nobody → arai_a
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
This seems to have regressed some of the asmjs-ubench tests (with --no-asmjs):
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=307078994617&tochange=ec4b23a08c5e

asmjs-ubench-skinning: 32%
asmjs-ubench-fannkuch: 4.4%
Weird. It's possible one of the patches subtly changes non-spread calls somehow. I'll take a look later today.
Flags: needinfo?(jdemooij)
https://hg.mozilla.org/mozilla-central/rev/4106f916a9b5
https://hg.mozilla.org/mozilla-central/rev/bcf7062a8d78
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
I can't reproduce this regression with a 32-bit shell build. I updated my AWFY clone and ran

./harness.py $shell -- --no-asmjs

in asmjs-ubench

Hannes, can you reproduce this? I wonder which patch would have regressed this; it shouldn't affect non-spread calls at all.
Flags: needinfo?(jdemooij) → needinfo?(hv1989)
I couldn't reproduce this on my local machine, but I can on the Mac machine.

The first bad revision is:
http://hg.mozilla.org/integration/mozilla-inbound/rev/bcf7062a8d78

on 32bit optimized build:
CC="clang -m32" CXX="clang++ -m32" AR=ar CROSS_COMPILE=1 ../configure --enable-optimize --disable-debug --enable-threadsafe --disable-intl-api --without-intl-api --target=i686-apple-darwin10.0.0 --enable-nspr-build
Flags: needinfo?(hv1989)
Unfortunately, I still cannot reproduce the regression on my machine
(iMac Late 2013 / Core i5 2.9GHz / 16GB RAM / OS X 10.9.3 / Xcode 5.1.1).
Even when I use the configure options in comment 24 and run the command in comment 23,
there is no significant difference between that revision and the previous one.
It might be an environment-dependent problem.

Looking for another solution...
Yes, those regressions are really annoying. They happen on a single machine + platform and aren't reproducible anywhere else. There's probably some weird compiler or CPU cache thing going on.

Hannes, maybe we could update the Clang version we use on AWFY and see if it helps? Binaries for OS X can be downloaded from llvm.org, though you may need 10.9 for the latest versions. I have 3.4.1 here; maybe that's why I can't repro it.
I've updated the Clang version to 3.3 (which inbound uses to compile).
Skinning seems to be back to its original score, so the Clang compiler may have been tripping up and producing worse code.
Depends on: 1053692