Consider replacing .forEach with for of loops where easily possible

NEW
Unassigned

Status

()

enhancement
P3
normal
2 years ago
8 months ago

People

(Reporter: florian, Unassigned)

Tracking

unspecified
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

1.17 KB, text/html
Details
(Reporter)

Description

2 years ago
forEach adds overhead (eg. see this profile: https://perfht.ml/2oyWRLz), in most cases it could be replaced with for..of
Flags: qe-verify?
Priority: -- → P2
Assignee: nobody → florian
Status: NEW → ASSIGNED
Iteration: --- → 55.4 - May 1
Flags: qe-verify? → qe-verify-
Priority: P2 → P1
(Reporter)

Comment 1

2 years ago
Posted file benchmark (obsolete) —
Before starting the work to replace all the forEach calls, I compared the performance of:

.forEach
for of
for (let i = 0; i < array.length; ++i)

Here are the results:
- .forEach is the slowest of the 3.
- for of performs 66% better.
- The old style for (let i = 0; i < array.length; ++i) is almost 10 times faster than forEach (and more than 5 times faster than for of).

Given these results, I doubt replacing everything with for of is currently a good idea.

I'm attaching the benchmark I used, and here is the profile of it: https://perfht.ml/2pFKSdl

Benjamin, is there room for optimization in the for of implementation? (feel free to redirect the needinfo)
Flags: needinfo?(bbouvier)
(Reporter)

Comment 2

2 years ago
I get different results when testing on different machines. On a Dell XPS13 on Windows, for of is slower than forEach. The results from comment 1 were on a latest generation macbook.
(In reply to Florian Quèze [:florian] [:flo] from comment #1)
> Benjamin, is there room for optimization in the for of implementation? (feel
> free to redirect the needinfo)

Yes: the for-of loop uses the Iterator protocol under the hood, which creates temporary objects that are invisible to the user. These object allocations need GC, thus the perf impact; they should be optimized out, most of the time, but we're not quite there yet, as far as I remember.

For what it's worth, it's not an apple to apple comparison because the for-of loop initializer doesn't have the `let` keyword, referring to a global variable instead.

cc'ing evilpie/nbp who know more about perf of for-of, for they worked on it recently.
Flags: needinfo?(bbouvier)
After bug 1353170 and bug 1354114 we should really not allocate anything during iteration, but there is still some things inside the loop that we would need to hoist. Getting to the normal for-loop performance is hard and probably won't happen anytime soon.
If the perf difference is not important enough, we could make this a good first bug and let new contributors fix it (maybe a directory at a time).
(Reporter)

Comment 6

2 years ago
(In reply to Marco Castelluccio [:marco] from comment #5)
> If the perf difference is not important enough, we could make this a good
> first bug and let new contributors fix it (maybe a directory at a time).

Doing it by hand would be terrible to review, it's something that can be scripted for the whole tree.
(Reporter)

Comment 7

2 years ago
Posted file benchmark
(In reply to Benjamin Bouvier [:bbouvier] from comment #3)

> For what it's worth, it's not an apple to apple comparison because the
> for-of loop initializer doesn't have the `let` keyword, referring to a
> global variable instead.

I added the missing 'let' keyword, but it doesn't change the results significantly.
Attachment #8859973 - Attachment is obsolete: true
Iteration: 55.4 - May 1 → 55.5 - May 15
(In reply to Florian Quèze [:florian] [:flo] from comment #1)
> Created attachment 8859973 [details]
> benchmark

Is 1000 a correct approximation of the number of executions?
All these loops are being optimized by the different tier of the compilers.

To simplify a bit:
 0-9 — runs under the interpreter.
 10-1000 — Compiles & runs under the code produced by the baseline compiler.
 1001-… — Compiles & runs under Ion with a limited set of optimizations.

So this benchmark is mostly testing Baseline performances, which does not have any specific optimizations for for-of nor forEach.  Under the interpreter and Baseline, forEach calls will probably create a new JSFunction instance for the lambda given as argument:

  function foo() {} // outside the function.
  function test() {
    arr.forEach(foo); // does not allocate
    arr.forEach(function (){}); // allocate
  }

On the other hand for-of will allocate the iterator when entering the loop and at each cycle of the loop for the result of the next() function call of the iterator object.

In both cases the allocation are short-live, but for-of would definitely add more pressure to the nursery collection of our GC.

If you are running these loops ~1000 times in practice, then these allocations might probably be neglected as well.

When running under IonMonkey:
 - .forEach: lambdas are no longer allocated if they are not escaped.
 - for-of: the next() function result is no longer allocated either.

.forEach has the same speed as a C-style for loop. [1][2]
for-of is 2.36x slower than a C-style for loop. [3][4]

[1] https://arewefastyet.com/#machine=29&view=single&suite=misc&subtest=forEach-baseline
[2] https://arewefastyet.com/#machine=29&view=single&suite=misc&subtest=forEach-library
[3] https://arewefastyet.com/#machine=29&view=single&suite=misc&subtest=basic-array
[4] https://arewefastyet.com/#machine=29&view=single&suite=misc&subtest=basic-array-forof

> Here are the results:
> - .forEach is the slowest of the 3.
> - for of performs 66% better.
> - The old style for (let i = 0; i < array.length; ++i) is almost 10 times
> faster than forEach (and more than 5 times faster than for of).
> 
> Given these results, I doubt replacing everything with for of is currently a
> good idea.

for-of is the nicest way to handle for loop in modern JS, and I will recommend it as a default style for its simplicity when the number of iterations is not a concern.  Hopefully, JS code coverage should give you an answer for that.

.forEach might be slow on small vector, but it has the same performances as a C-style for loop on large data sets (>= ~10000), and might be the preferred way until we fix the last for-of issues related to bounds check elimination of the iterator index.
Iteration: 55.5 - May 15 → 55.6 - May 29
(Reporter)

Comment 9

2 years ago
Per comment 8, it seems this is still a nice code cleanup to do, but there's no good reason to claim it's part of photon-performance.
Assignee: florian → nobody
Status: ASSIGNED → NEW
Iteration: 55.6 - May 29 → ---
Priority: P1 → --
Whiteboard: [photon-performance]
See Also: → 1383364
Whiteboard: [fxperf]

Comment 10

a year ago
Untracking this for performance given comment #9.
No longer blocks: photon-performance-triage
Priority: -- → P3
Whiteboard: [fxperf]
You need to log in before you can comment on or make changes to this bug.