Math.max + spread seems to be very slow in Firefox compared to Chrome.
Categories
(Core :: JavaScript Engine: JIT, defect, P2)
People
(Reporter: calixte, Assigned: iain)
Details
Attachments
(6 files)
I ran the following script:
var tests = 1000000;
var input = [0, 10, 20, 30, 8, 0, 0];

// Computes the max with a spread call to the native Math.max.
var sumOutput = function() {
  return Math.max(...input);
};

// Computes the max by reducing with two-argument Math.max calls.
var maxOutput = function() {
  return input.reduce(function(a, b) {
    return Math.max(a, b);
  });
};

console.time('max');
var i = tests;
while (i--) {
  input.forEach(function() {
    return sumOutput();
  });
}
console.timeEnd('max');

console.log(' ======');

console.time('max-reduce');
var i = tests;
while (i--) {
  input.forEach(function() {
    return maxOutput();
  });
}
console.timeEnd('max-reduce');
On https://playcode.io with Nightly 84 I get:
max: 3775ms
======
max-reduce: 321ms
and when run with Chrome, I get:
max: 315ms
======
max-reduce: 71ms
Or maybe there is an issue with forEach.
Comment 1•4 years ago
Looking at the test case, especially max-reduce, I would think that this is likely an inlining issue, plus an allocation of the inner function which could have been removed by scalar replacement. I presume there is also an issue with the spread operator, which is used in the max case.
Iain, can you check if this is an inlining issue?
Assignee
Comment 2•4 years ago
We don't currently inline spread calls. We also don't optimize spread calls to native builtins. I believe we also don't inline forEach, because it's above our bytecode size limit for self-hosted code. So there are a lot of known issues that could be contributing to this.
I suspect that the difference between max and max-reduce is because we don't optimize spread calls to native builtins, and the remaining delta between us and V8 may be forEach, but I'll take a look to see if I can confirm.
Comment 3•4 years ago
Math.max(...input) is currently implemented as always executing var array = [...input]; Math.max.apply(null, array);. So we always create an array copy, which accounts for about 2/3 of the time needed when running this test case.
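In other words, the engine currently treats the spread call roughly like the following hand-written desugaring (a sketch for illustration only; the actual lowering happens at the bytecode level, not in source):

// Sketch of what Math.max(...input) effectively executes today,
// per the description above (illustration only, not engine code):
function maxViaSpread(input) {
  // The spread argument is always iterated and copied into a fresh array,
  // even when `input` is already a packed array.
  var array = [...input];
  // The native builtin is then called with the copied array as arguments.
  return Math.max.apply(null, array);
}
// The allocation and copy are what an optimized spread call can skip
// when the argument is already a packed array with no holes.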
Comment 4•4 years ago
(In reply to Iain Ireland [:iain] from comment #2)
[…] the remaining delta between us and V8 may be forEach, but I'll take a look to see if I can confirm.
For what it's worth, I introduced benchmarks a while ago which showed that we were multiple times faster than Chrome on forEach:
https://github.com/mozilla/arewefastyet/tree/master/benchmarks/misc/tests/assorted (misc-forEach-*.js)
Unfortunately this benchmark did not catch the regression, as it is no longer run as part of AWFY; it should probably be added back.
Assignee
Comment 5•4 years ago
Do we have any data on how often single-argument spread calls (foo(...input)) are used with non-array arguments? Right now we only generate OptimizeSpreadCall (to check whether the argument is already a packed array and skip making a copy) if the single argument is a rest parameter. This appears to be motivated by a micro-benchmark Arai wrote when we first added OptimizeSpreadCall, showing a slight regression for a spread call using a Map. At the time, OptimizeSpreadCall used a VM call. It has since been optimized to use an IC that returns false immediately if it has ever failed, so the overhead should be extremely low.
If I remove the restriction, the spread call case is more than twice as fast, and the slowdown on the Map case appears to be ~4%. My intuition is that spread calls with non-rest-parameter arrays are probably significantly more common than other iterables, so it seems like a worthwhile tradeoff.
Am I missing anything?
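For concreteness, these are the call shapes under discussion (hypothetical functions, written only to illustrate the distinction):

// Rest-parameter case: the only shape OptimizeSpreadCall covered so far.
// `args` is known to be a fresh packed array created by the engine itself.
function viaRest(...args) {
  return Math.max(...args);
}

// Plain-variable case: not covered before this change, so the engine always
// allocated and copied a new array, even though `input` is already packed.
var input = [0, 10, 20, 30, 8, 0, 0];
function viaArray() {
  return Math.max(...input);
}

// Non-array iterable: the packed-array check fails and the call takes the
// generic (copying) path; the measurement above puts that overhead at ~4%.
function viaMap(map) {
  return Math.max(...map.values());
}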
Assignee
Comment 6•4 years ago
I also took a shot at optimizing the native spread call to Math.max in CacheIR. The results look pretty good. Running the shell locally, I get:
Before any changes:
max: 3690.59912109375
With OptimizeSpreadCall changes:
max: 1610.087158203125
With OptimizeSpreadCall + CacheIR changes:
max: 55.56103515625
That should make us significantly faster than Chrome. I've only implemented it for int32 values so far, but it shouldn't be too hard to do the same thing for doubles.
I'll put the patches up tomorrow.
Assignee
Comment 7•4 years ago
When OptimizeSpreadCall was introduced, we conservatively only emitted it when the spread argument was a rest parameter. At the time, OptimizeSpreadCall required a VM call. It has been rewritten to use an IC, making the failure case very efficient. If we do not emit OptimizeSpreadCall, then we must allocate and copy an array object, even if the argument is already a packed array. In a microbenchmark, emitting OptimizeSpreadCall for non-rest-parameter cases is more than a 2x speedup, and generating the OptimizeSpreadCall in a non-packed-array case is <5% slowdown. The optimized case is also likely to be much more common in real code.
This patch extends OptimizeSpreadCall to be emitted whenever the single spread argument is any Name node. In theory we might also want to allow other nodes, but currently the bytecode emitter will call emitTree on the first argument twice in this case, which is fine for a Name node but broke in testing for foo(...function() {}) (because we tried emitting a non-hoisted function twice).
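To illustrate the distinction described above (hypothetical call sites, not taken from the patch):

// The single spread argument is a Name node, so after this patch the
// emitter generates OptimizeSpreadCall for this call:
function f(g, arr) {
  return g(...arr);
}

// The spread argument is not a Name node, so it still takes the old path.
// This particular form also throws at runtime (a function is not iterable),
// but the bytecode emitter must still compile it correctly, and emitting the
// non-hoisted function expression twice is what broke in testing.
function h(g) {
  return g(...function() {});
}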
Assignee
Comment 8•4 years ago
For a spread call like foo(...args), if OptimizeSpreadCall succeeds, we don't hit a breakpoint associated with ...args, because we skip past that code. Before the parent patch, this wasn't a problem in this testcase, because we were only generating OptimizeSpreadCall for rest parameters. After that patch, we are generating OptimizeSpreadCall more frequently.
We don't actually care about hitting this breakpoint, so I'm just updating the test.
Depends on D101231
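A rough sketch of the kind of test shape involved, using the SpiderMonkey shell's Debugger API (an assumed illustration, not the actual jit-test being updated):

var g = newGlobal({newCompartment: true});
g.eval(`
  function callee(a, b) { return a + b; }
  function caller(args) { return callee(...args); }
`);

var dbg = new Debugger(g);
var hits = 0;
dbg.onEnterFrame = function (frame) {
  // Set a breakpoint on every breakable offset in `caller`, including the
  // one associated with the ...args spread.
  if (frame.callee && frame.callee.name === "caller") {
    for (var bp of frame.script.getPossibleBreakpoints()) {
      frame.script.setBreakpoint(bp.offset, { hit() { hits++; } });
    }
  }
};

g.eval("caller([1, 2])");
// Once OptimizeSpreadCall succeeds, the breakpoint on the spread argument is
// skipped, so the hit count drops; the test just needs to accept that.
print(hits);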
Assignee
Comment 9•4 years ago
Depends on D101232
Assignee
Comment 10•4 years ago
On a microbenchmark evaluating Math.max(...input), this is ~30x faster.
Depends on D101233
Assignee
Comment 11•4 years ago
This is about a 12x speedup on a microbenchmark.
Depends on D101234
Assignee
Comment 12•4 years ago
Depends on D101235
Comment 13•4 years ago
Comment 14•4 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/7c13fde9da0c
https://hg.mozilla.org/mozilla-central/rev/52be5f823ff3
https://hg.mozilla.org/mozilla-central/rev/f83cf07fe74a
https://hg.mozilla.org/mozilla-central/rev/01f4d696ae46
https://hg.mozilla.org/mozilla-central/rev/5a5fa7bca15d
https://hg.mozilla.org/mozilla-central/rev/c6c0ef525f49