Closed
Bug 1108290
Opened 10 years ago
Closed 9 years ago
Optimize f.apply(x, array) in Ion
Categories
(Core :: JavaScript Engine: JIT, defect)
Core
JavaScript Engine: JIT
Tracking
()
RESOLVED
FIXED
mozilla45
Tracking | Status | |
---|---|---|
firefox45 | --- | fixed |
People
(Reporter: jandem, Assigned: lth)
References
Details
Attachments
(5 files, 3 obsolete files)
Two issues I noticed while profiling crypto.swf (bug 1103441):
(1) Shumway uses .asApply(). I think most of our |apply| optimizations are only used for JSOP_FUNAPPLY (.apply), so they won't work. This is pretty silly and no longer necessary I think, it'd be nice if we could remove JSOP_FUNCALL/JSOP_FUNAPPLY completely.
(2) Ion has optimizations for f.apply(x, arguments) but we should also inline f.apply(x, array).
On crypto.swf this matters for the method.asApply(..) call in asCallProperty, it's executed > 10 million times so it's pretty hot.
Reporter | ||
Comment 1•10 years ago
|
||
Also, in this Shumway function, |method| is sometimes String.prototype.charCodeAt and |receiver| is a String object. Mix of scripted vs native callees will make it a bit harder to optimize the asApply, and passing a String object to charCodeAt is also not ideal (few % under ToStringSlow)...
Comment 2•10 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #0)
> (1) Shumway uses .asApply(). I think most of our |apply| optimizations are
> only used for JSOP_FUNAPPLY (.apply), so they won't work. This is pretty
> silly and no longer necessary I think, it'd be nice if we could remove
> JSOP_FUNCALL/JSOP_FUNAPPLY completely.
By removing them, you mean having this logic in jsop_call, if the native corresponds to the apply function?
> (2) Ion has optimizations for f.apply(x, arguments) but we should also
> inline f.apply(x, array).
For calling, this should be easy to do (except on x86), as we duplicate the vector of arguments before making a call.
For inlining it, this would be harder as we would have to modify the way we inline functions and do bailouts.
This optimizes calling fun.apply(self, array). There is still some stuff that I need to figure out. We probably actually need an overflow check, if the array is too large. Do I need to add a type barrier for loading the values? It seems to me that the callee would usually check the typeset of its arguments. What do you think of the approach with a template?
Comment 4•10 years ago
|
||
Comment on attachment 8542626 [details] [diff] [review]
WIP - optimize apply with arrays
Review of attachment 8542626 [details] [diff] [review]:
-----------------------------------------------------------------
(In reply to Tom Schuster [:evilpie] from comment #3)
> This optimizes calling fun.apply(self, array). There is still some stuff
> that I need to figure out. We probably actually need an overflow check, if
> the array is too large.
For the array version yes. For the arguments version, this should already be limited by the trampoline. I suggest you reuse the same amount, such that there no additional check needed before choosing between the Baseline/Ion or the Invoke VM function.
> Do I need to add a type barrier for loading the
> values?
I don't think so, as there is no location associated to these load in the bytecode of the caller.
> It seems to me that the callee would usually check the typeset of
> its arguments. What do you think of the approach with a template?
It is good but it would be nicer if the parametrized functions were smaller.
::: js/src/jit/CodeGenerator.cpp
@@ +3197,5 @@
> ImmPtr ptr = ImmPtr(&JSFunction::class_);
> bailoutCmpPtr(Assembler::NotEqual, objreg, ptr, apply->snapshot());
> }
>
> + // XXX Arguments overflow check
This is not needed for LApplyArgs, but it would be in LApplyArray, as the size the of the list of arguments is already limited by the caller.
@@ +3342,5 @@
> + Label end;
> + // Initialize the loop counter AND Compute the stack usage (if == 0)
> + // Load the array length for the counter.
> + Address length(elementsreg, ObjectElements::offsetOfLength());
> + masm.load32(length, extraStackSpace);
This looks like you might want to use emitArgumentsCount with a different set of registers.
::: js/src/jit/TypePolicy.cpp
@@ +990,5 @@
> _(Mix3Policy<ObjectPolicy<0>, BoxPolicy<1>, ObjectPolicy<2> >) \
> _(Mix3Policy<ObjectPolicy<0>, IntPolicy<1>, BoxPolicy<2> >) \
> _(Mix3Policy<ObjectPolicy<0>, IntPolicy<1>, IntPolicy<2> >) \
> _(Mix3Policy<ObjectPolicy<0>, ObjectPolicy<1>, IntPolicy<2> >) \
> + _(Mix3Policy<ObjectPolicy<0>, ObjectPolicy<1>, BoxPolicy<2> >) \
nit: Move this one line above to keep it sorted.
Assignee | ||
Comment 5•10 years ago
|
||
Assignee | ||
Comment 6•10 years ago
|
||
AMD FX4100 @ 3.6GHz, x64.
applyArguments: 91 ms
applyArray w/o patch: 1640 ms
applyArray w/ patch: 94 ms
Assignee | ||
Comment 7•10 years ago
|
||
This is Tom's patch rebased and cleaned up a little (in line with nbp's suggestions) and with code to support 32-bit.
Nbp, is anything missing here? I still need to write (quite a lot of) test cases but feedback on whether something is obviously left out would be good.
(The templated function probably still pays for itself, even if there are a lot of specializations around it.)
Attachment #8542626 -
Attachment is obsolete: true
Attachment #8556569 -
Flags: feedback?(nicolas.b.pierron)
Comment 8•10 years ago
|
||
Just as a side-note, I made some changes myself to fun.apply in Bug 1112161. So there is likely a merge conflict coming soonish. :/
Comment 9•10 years ago
|
||
Comment on attachment 8556569 [details] [diff] [review]
WIP v2
Review of attachment 8556569 [details] [diff] [review]:
-----------------------------------------------------------------
I think this should be all the location which have to be patched, as the marking of the stack should be handled the same way as any other fun.apply(…, arguments) calls.
::: js/src/jit/CodeGenerator.cpp
@@ +3221,5 @@
> +// Do not bailout after the execution of this function since the stack no longer
> +// corresponds to what is expected by the snapshots.
> +void
> +CodeGenerator::emitPushArguments(LApplyArrayGeneric *apply, Register extraStackSpace,
> + Register copyreg)
Seems that this signature is duplicated?
@@ +3248,5 @@
> + masm.bind(&loop);
> +
> + Register elementsreg = ToRegister(apply->getElements());
> +#if defined(JS_CODEGEN_X64)
> + masm.loadPtr(BaseObjectElementIndex(elementsreg, count, -8), copyreg);
nit: I prefer sizeof over constants.
Attachment #8556569 -
Flags: feedback?(nicolas.b.pierron) → feedback+
Assignee | ||
Comment 10•10 years ago
|
||
Performance situation unchanged, looks like this is worth cleaning up and landing:
applyArguments.js
Time: 88; result: 30000000
Time: 87; result: 30000000
applyArray.js
Time: 1255; result: 30000000
Time: 1259; result: 30000000
Assignee | ||
Comment 11•9 years ago
|
||
(The rewrite of apply earlier this year made this code simpler.)
This is basically right, I think, though I still have to run the full test suite and test compile on all platforms.
There's one TODO item here as a comment in the code: there's a guard on the code that optimizes apply(...,arguments) that I don't quite understand, and the question is whether a similar guard would be appropriate and/or necessary for the code that optimizes apply(..., array).
Another question is whether the limit on the array length that I've chosen (corresponds to what baseline and the intrinsic are using) is reasonable for jitted code, or if a shorter limit would be appropriate.
Also, I have some test cases coming.
(In summary, I expect one more round of reviews.)
Attachment #8556569 -
Attachment is obsolete: true
Attachment #8689463 -
Flags: review?(nicolas.b.pierron)
Assignee | ||
Comment 12•9 years ago
|
||
Assignee | ||
Comment 13•9 years ago
|
||
Comment on attachment 8689463 [details] [diff] [review]
bug1108290-apply-array.patch
Pulling review for now for these additional issues:
- the use of Operand() in a guard is not portable, it fails to compile on
ARM64 and asserts on ARM
- the original patch on this bug uses js_JitOptions.maxStackArgs as a limit
on the array length, this seems more reasonable
Also need to make sure we don't regress (badly) the case where length exceeds initializedLength.
Attachment #8689463 -
Flags: review?(nicolas.b.pierron)
Assignee | ||
Comment 14•9 years ago
|
||
Assignee | ||
Comment 15•9 years ago
|
||
This tests fun.apply() with an array where length > initializedLength. This is the worst (realistic) case for the bailout: there is just one value in the array and the function just negates its one argument.
On current m-i, this takes 35ms on my system. With the patch in place it takes 73ms, so this particular case regresses significantly - the running time doubles. If we think this case is important (I do not) then more work is needed to mitigate this.
(With this test case and length == initializedLength, m-i uses 22ms without the patch and 2ms with the patch. So that's nice.)
Assignee | ||
Comment 16•9 years ago
|
||
A little bit on churn on the MIPS side, so hoping :hev can take a look at that as I'm not set up to test compile locally.
Attachment #8690088 -
Flags: review?(r)
Attachment #8690088 -
Flags: review?(nicolas.b.pierron)
Assignee | ||
Comment 17•9 years ago
|
||
Attachment #8689463 -
Attachment is obsolete: true
Attachment #8690089 -
Flags: review?(nicolas.b.pierron)
Comment 18•9 years ago
|
||
Comment on attachment 8690088 [details] [diff] [review]
Add sub32 to the common MacroAssembler
Review of attachment 8690088 [details] [diff] [review]:
-----------------------------------------------------------------
I just looked mips code.
::: js/src/jit/mips64/MacroAssembler-mips64.cpp
@@ +926,3 @@
> {
> ma_subu(dest, dest, imm);
> }
Can we remove MacroAssemblerMIPSCompat::ma_sub and MacroAssemblerMIPS64Compat::ma_sub32, use MacroAssemblerMIPSShared::ma_subu?
Attachment #8690088 -
Flags: review?(r) → review+
Assignee | ||
Comment 19•9 years ago
|
||
(In reply to Heiher [:hev] from comment #18)
> Comment on attachment 8690088 [details] [diff] [review]
> Add sub32 to the common MacroAssembler
>
> Review of attachment 8690088 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Can we remove MacroAssemblerMIPSCompat::ma_sub and
> MacroAssemblerMIPS64Compat::ma_sub32, use MacroAssemblerMIPSShared::ma_subu?
Sure, I'll make the change before I land.
Comment 20•9 years ago
|
||
Comment on attachment 8690088 [details] [diff] [review]
Add sub32 to the common MacroAssembler
Review of attachment 8690088 [details] [diff] [review]:
-----------------------------------------------------------------
Side note: This patch factor sub32 to the generic MacroAssembler, and it adds "MacroAssembler::sub(const Address&, Register)" for the next patch.
::: js/src/jit/MacroAssembler.h
@@ +719,5 @@
> + // Arithmetic functions
> +
> + inline void sub32(const Address& src, Register dest) PER_SHARED_ARCH;
> + inline void sub32(Register src, Register dest) PER_SHARED_ARCH;
> + inline void sub32(Imm32 imm, Register dest) PER_SHARED_ARCH;
nit: Follow the same order as in the -inl.h files.
Attachment #8690088 -
Flags: review?(nicolas.b.pierron) → review+
Comment 21•9 years ago
|
||
Comment on attachment 8690089 [details] [diff] [review]
Inline fun.apply(..., array) in Ion
Review of attachment 8690089 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit-test/tests/basic/testApplyArrayInline.js
@@ +5,5 @@
> +}
> +
> +function f(xs) {
> + var sum = 0;
> + for ( var i=0 ; i < 10000 ; i++ )
nit: use setJitCompilerOption or inIon to avoid backing in high limits.
@@ +49,5 @@
> + print("No bailout: " + (B - A));
> + print("Bailout: " + (C - B));
> + print("Short: " + (D - C));
> + assertEq((C - B) > (B - A), true);
> + assertEq((C - B) > (D - C), true);
This is time-sensitive, use inIon() to check if we remained in Ion generated code after the loop.
function f(xs) {
var sum = 0;
for ( var i=0 ; i < 100 ; i++ ) {
inIonResult.between |= inIon()
sum += g.apply(null, xs);
}
inIonResult.return = inIon()
return sum;
}
assertEq(f([1,2,3]), 7*100);
assertEq(inIonResult.between && !inIonResult.return, inIonResult.between);
which also works, even if Ion is disabled.
::: js/src/jit/CodeGenerator.cpp
@@ +3618,5 @@
> + Register tmp = ToRegister(apply->getTempObject());
> +
> + Address length(ToRegister(apply->getElements()), ObjectElements::offsetOfLength());
> + masm.load32(length, tmp);
> + bailoutCmp32(Assembler::Above, tmp, Imm32(js_JitOptions.maxStackArgs), snapshot);
Would it make sense to have an OOL path for that?
Attachment #8690089 -
Flags: review?(nicolas.b.pierron) → review+
Assignee | ||
Comment 22•9 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #20)
> Comment on attachment 8690088 [details] [diff] [review]
> Add sub32 to the common MacroAssembler
>
> Review of attachment 8690088 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Side note: This patch factor sub32 to the generic MacroAssembler, and it
> adds "MacroAssembler::sub(const Address&, Register)" for the next patch.
nbp: I don't think I understand what you mean by this. Did you mean to reference another bug with this comment?
Flags: needinfo?(nicolas.b.pierron)
Assignee | ||
Comment 23•9 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #21)
> Comment on attachment 8690089 [details] [diff] [review]
> Inline fun.apply(..., array) in Ion
>
> Review of attachment 8690089 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: js/src/jit/CodeGenerator.cpp
> @@ +3618,5 @@
> > + Register tmp = ToRegister(apply->getTempObject());
> > +
> > + Address length(ToRegister(apply->getElements()), ObjectElements::offsetOfLength());
> > + masm.load32(length, tmp);
> > + bailoutCmp32(Assembler::Above, tmp, Imm32(js_JitOptions.maxStackArgs), snapshot);
>
> Would it make sense to have an OOL path for that?
I'm guessing probably not, since maxStackArgs is 4096 by default and at that point the overhead of the bailout will be tiny compared to the cost of the operation itself.
Comment 24•9 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #22)
> (In reply to Nicolas B. Pierron [:nbp] from comment #20)
> > Review of attachment 8690088 [details] [diff] [review]:
> > -----------------------------------------------------------------
> >
> > Side note: This patch factor sub32 to the generic MacroAssembler, and it
> > adds "MacroAssembler::sub(const Address&, Register)" for the next patch.
>
> nbp: I don't think I understand what you mean by this. Did you mean to
> reference another bug with this comment?
Sorry, I suggest you split the patch in 2 parts if possible, such that one deals with the sub32 macro-assembler technical debt, and the second one adds the extra function.
(In reply to Lars T Hansen [:lth] from comment #23)
> (In reply to Nicolas B. Pierron [:nbp] from comment #21)
> > ::: js/src/jit/CodeGenerator.cpp
> > > + bailoutCmp32(Assembler::Above, tmp, Imm32(js_JitOptions.maxStackArgs), snapshot);
> >
> > Would it make sense to have an OOL path for that?
>
> I'm guessing probably not, since maxStackArgs is 4096 by default and at that
> point the overhead of the bailout will be tiny compared to the cost of the
> operation itself.
Ok, let's go with that, but be aware that the cost of a bailout is the time spent out of IonMonkey which includes the time spent in Baseline (which can be a lot if we are banned from Ion).
Flags: needinfo?(nicolas.b.pierron)
Assignee | ||
Comment 25•9 years ago
|
||
Assignee | ||
Comment 26•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/f61789e75c6e90253c797036a8c32c95902e8f00
Bug 1108290 - optimize apply with Array. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/e64789160d1bc857b978563912c90b1ea6f3df5f
Bug 1108290 - remove machine-specific definitions of sub32. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/cd35e12d8f91f410f71b6f5d3023a40e9071e24b
Bug 1108290 - introduce machine-independent definitions of sub32. r=nbp
Comment 27•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f61789e75c6e
https://hg.mozilla.org/mozilla-central/rev/e64789160d1b
https://hg.mozilla.org/mozilla-central/rev/cd35e12d8f91
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox45:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
![]() |
||
Comment 28•9 years ago
|
||
Apparently this made dromaeo-jslib-traverse-prototype on Win8 on AWFY 45% faster!
Assignee | ||
Comment 29•9 years ago
|
||
(In reply to Guilherme Lima from comment #28)
> Apparently this made dromaeo-jslib-traverse-prototype on Win8 on AWFY 45%
> faster!
Well alright then :)
Assignee | ||
Comment 30•9 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #29)
> (In reply to Guilherme Lima from comment #28)
> > Apparently this made dromaeo-jslib-traverse-prototype on Win8 on AWFY 45%
> > faster!
>
> Well alright then :)
prototype.js uses fn.apply(..., array) here and there - for its substitution of "bind", and for its Responder framework, and elsewhere. In most cases those arrays are either literal arrays (and Function.prototype.call might have been the better API to use, though it might be slow too) or the result of munging the arguments object; in either case the arrays will tend to be short. Inlining apply is unsurprisingly a big win in those cases.
Once the spread operator becomes more widely available apply will be used less, but it will still be used when a this object needs to be specified as part of the call, as in many of the cases in prototype.js.
Assignee | ||
Comment 31•9 years ago
|
||
In light of the discussion on bug 1235092 comment 7 I'm now not completely sure that this optimization is correct as it stands (apart from the register allocation issue in bug 1231024): I have done nothing explicit to guard against changes to the array iterator protocol, cf arai's last four bullets:
* array[@@iterator] is not modified
* the array's prototype is Array.prototype
* Array.prototype[@@iterator] is not modified
* %ArrayIteratorPrototype%.next is not modified
and I'm not yet sure that TI guarantees this in my case.
Assignee | ||
Comment 32•9 years ago
|
||
arai corrects me: the issue here is that spread is explicitly defined in terms of iteration, not indexed access. but apply is backward-compatible and uses indexed access only. so we should be ok.
You need to log in
before you can comment on or make changes to this bug.
Description
•