Closed Bug 1674143 Opened 4 years ago Closed 3 years ago

Math.max+spread seems to be very slow with firefox compared to chrome.

Categories

(Core :: JavaScript Engine: JIT, defect, P2)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
87 Branch
Tracking Status
firefox84 --- wontfix
firefox87 --- fixed

People

(Reporter: calixte, Assigned: iain)

Details

Attachments

(6 files)

I ran the following script:

var tests = 1000000;
var input = [0, 10, 20, 30, 8, 0, 0];
var sumOutput = function() {
  return Math.max(...input);
};
var maxOutput = function() {
  return input.reduce(function(a, b) {
    return Math.max(a, b);
  });
};
console.time('max');
var i = tests;
while (i--) {
  input.forEach(function() {
    return sumOutput();
  });
}
console.timeEnd('max');
console.log(' ======');
console.time('max-reduce');
var i = tests;
while (i--) {
  input.forEach(function() {
    return maxOutput();
  });
}
console.timeEnd('max-reduce');

on https://playcode.io and with nightly 84 I get:

max: 3775ms
 ======
max-reduce: 321ms

and when ran with chrome, I get:

max: 315ms
 ======
max-reduce: 71ms

or maybe there is an issue with forEach.

Looking at the test case, especially max-reduce, I would think that this is likely an inlining issue, and allocation of inner function which could have been removed by scalar replacement. I presume there is also an issue with the spread operator, which is used in max case.

Iain, can you check if this is an inlining issue?

Severity: -- → S4
Flags: needinfo?(iireland)
Priority: -- → P2

We don't currently inline spread calls. We also don't optimize spread calls to native builtins. I believe we also don't inline forEach, because it's above our bytecode size limit for self-hosted code. So there are a lot of known issues that could be contributing to this.

I suspect that the difference between max and max-reduce is because we don't optimize spread calls to native builtins, and the remaining delta between us and V8 may be forEach, but I'll take a look to see if I can confirm.

Flags: needinfo?(iireland)

Math.max(...input); is currently implemented as always executing var array = [...input]; Math.max.apply(null, array);. So we always create an array copy, which accounts to about 2/3 of the time needed when running this test case.

(In reply to Iain Ireland [:iain] from comment #2)

[…] the remaining delta between us and V8 may be forEach, but I'll take a look to see if I can confirm.

For what is worth, I introduced benchmarks a while ago which were showing that we were multiple time faster than chrome on forEach:
https://github.com/mozilla/arewefastyet/tree/master/benchmarks/misc/tests/assorted (misc-forEach-*.js)

Unfortunately this benchmark did not caught the regression, as it is no longer run as part of AWFY, which should probably be added back.

Do we have any data on how often single argument spread calls (foo(...input)) are used with non-array arguments? Right now we only generate OptimizeSpreadCall (to check whether the argument is already a packed array and skip making a copy) if the single argument is a rest parameter. This appears to be motivated by a micro-benchmark Arai wrote when we first added OptimizeSpreadCall, showing a slight regression for a spread call using a Map. At the time, OptimizeSpreadCall used a VM call. It's now been optimized to use an IC that will return false immediately if it has ever failed, so the overhead should be extremely low.

If I remove the restriction, the spread call case is more than twice as fast, and the slowdown on the Map case appears to be ~4%. My intuition is that spread calls with non-rest-parameter arrays are probably significantly more common than other iterables, so it seems like a worthwhile tradeoff.

Am I missing anything?

I also took a shot at optimizing the native spread call to Math.max in CacheIR. The results look pretty good. Running the shell locally, I get:

Before any changes: 
 max: 3690.59912109375
With OptimizeSpreadCall changes:
 max: 1610.087158203125
With OptimizeSpreadCall + CacheIR changes:
 max: 55.56103515625

Which should make us significantly faster than Chrome. I've only implemented it for int32 values so far, but it shouldn't be too hard to do the same thing for doubles.

I'll put the patches up tomorrow.

When OptimizeSpreadCall was introduced, we conservatively only emitted it when the spread argument was a rest parameter. At the time, OptimizeSpreadCall required a VM call. It has been rewritten to use an IC, making the failure case very efficient. If we do not emit OptimizeSpreadCall, then we must allocate and copy an array object, even if the argument is already a packed array. In a microbenchmark, emitting OptimizeSpreadCall for non-rest-parameter cases is more than a 2x speedup, and generating the OptimizeSpreadCall in a non-packed-array case is <5% slowdown. The optimized case is also likely to be much more common in real code.

This patch extends OptimizeSpreadCall to be emitted whenever the single spread argument is any Name node. In theory we might also want to allow other nodes, but currently the bytecode emitter will call emitTree on the first argument twice in this case, which is fine for a Name node but broke in testing for foo(...function() {}) (because we tried emitting a non-hoisted function twice).

Assignee: nobody → iireland
Status: NEW → ASSIGNED

For a spread call like foo(...args), if OptimizeSpreadCall succeeds, we don't hit a breakpoint associated with ...args, because we skip past that code. Before the parent patch, this wasn't a problem in this testcase, because we were only generating OptimizeSpreadCall for rest parameters. After that patch, we are generating OptimizeSpreadCall more frequently.

We don't actually care about hitting this breakpoint, so I'm just updating the test.

Depends on D101231

Depends on D101232

On a microbenchmark evaluating Math.max(...input), this is ~30x faster.

Depends on D101233

This is about a 12x speedup on a microbenchmark.

Depends on D101234

Depends on D101235

Attachment #9196177 - Attachment description: Bug 1674143: Transpile DoubleMinMaxArrayResult r=jandem → Bug 1674143: Transpile NumberMinMaxArrayResult r=jandem
Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7c13fde9da0c
Emit OptimizeSpreadCall for non-rest arguments r=arai
https://hg.mozilla.org/integration/autoland/rev/52be5f823ff3
Update column offset testcase r=jorendorff
https://hg.mozilla.org/integration/autoland/rev/f83cf07fe74a
Add Int32MinMaxArrayResult r=jandem
https://hg.mozilla.org/integration/autoland/rev/01f4d696ae46
Transpile Int32MinMaxArrayResult r=jandem,anba
https://hg.mozilla.org/integration/autoland/rev/5a5fa7bca15d
Add NumberMinMaxArrayResult r=jandem
https://hg.mozilla.org/integration/autoland/rev/c6c0ef525f49
Transpile NumberMinMaxArrayResult r=jandem
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: