1045391 - Array.forEach is slow on very sparse arrays

Reporter

Description

•

11 years ago

Consider this code: > let x = [1,2,3] > x[3628523034] = 7825 > for (i in x) { print(i); } // runs quickly > x.forEach(function(v, i) { print(i) }); // runs *slowly* The implementation of Array.forEach iterates over element index between 0 and |length|, calling the callback if it is present. This naive algorithm matches the one in the spec, but AFAICT there is scope for doing something smarter when many elements are missing. However, implementing something smarter may be difficult because Array.forEach is self-hosted. BTW, this is not just a theoretical example. In pdf.js there are a couple of hot for..in loops over arrays, which cause their indices to be stringified uselessly. I converted them to use forEach() instead, which worked great in most cases, but I hit one example document that had a very sparse array (possibly because it's buggy, I'm not sure), and so that's blocking this improvement from happening :(

Jeff Walden [:Waldo]

Comment 1

•

11 years ago

for-in on arrays is an anti-pattern. It enumerates *all* enumerable properties, not just numeric ones. Why is pdf.js using for-in on an array? (And, maybe, why is it doing this on arrays that are missing elements? That's not the right way to use arrays, either.)

Nicholas Nethercote [inactive]

Reporter

Comment 2

•

11 years ago

> for-in on arrays is an anti-pattern. I know! I'm trying to get rid of that code, but this behaviour is preventing me from doing so. > (And, maybe, why is it doing this on arrays that are missing elements? That's not the right way to use arrays, either.) These arrays are dense in most cases, but for this one particular PDF I have these outrageously sparse arrays. But then, sparse arrays are not unreasonable, and expecting forEach() to work efficiently on them is not unreasonable.

Jan de Mooij [:jandem]

Comment 3

•

11 years ago

Unsurprisingly, we spend > 90% of the time under jit::OperatorIn, which does the slow lookupGeneric -> Shape::search thing. Ion could be smarter by noticing (at compile-time) there are no indexed properties on Array.prototype and only doing an "own" lookup in this case. That should at least avoid searching Array.prototype and Object.prototype... There's also a few % under ValueToId etc, that's unnecessary if the index is known to be int32. So we can make the current algorithm a bit faster but, as you suggested, rewriting it will probably help more...

Boris Zbarsky [:bzbarsky]

Comment 4

•

11 years ago

So what the self-hosted code would need is a fastish way to check whether the array has a proto with indexed props. Fastish, because you'd have to check it after every property get from the array.

Jan de Mooij [:jandem]

Comment 5

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #4) > So what the self-hosted code would need is a fastish way to check whether > the array has a proto with indexed props. Fastish, because you'd have to > check it after every property get from the array. Well Ion should just do this automatically for "int32 in (non-dense) object". It has all the infrastructure to statically (at compile-time) check for indexed props on the prototype and invalidate scripts when this changes; we also use this for normal GETELEM/SETELEM.

Boris Zbarsky [:bzbarsky]

Comment 6

•

11 years ago

Sure; that would help the ion codegen for the existing code. But if we're going to change the algorithm to avoid checking all indices, that involves detecting in the algorithm that the proto is not relevant.

Jeff Walden [:Waldo]

Comment 7

•

11 years ago

If the algorithm is going to change at all, and I'm not sure it should for super-super-sparse arrays (which *are* unreasonable, pace njn), adding something to the loop each time seems like a non-starter to me. This is the right algorithm for sane arrays. Adding an intrinsic or so that can detect when an array is super-super-sparse at the get-go, to permit splitting execution into a normal case and a stupid case, seems like the only thing that wouldn't hurt good code while addressing this.

Boris Zbarsky [:bzbarsky]

Comment 8

•

11 years ago

Sure, I was assuming that. You need that intrinsic, and then within the code that you branch to if that intrinsic tests true you need to keep making sure that indexed props don't appear on the proto, because presumably once they do whatever your super-sparse algorithm is will no longer be valid.

Nicholas Nethercote [inactive]

Reporter

Comment 9

•

11 years ago

> In pdf.js there are a couple of hot for..in loops over arrays, which cause their indices to be > stringified uselessly. I converted them to use forEach() instead, which worked great in most cases, > but I hit one example document that had a very sparse array (possibly because it's buggy, I'm not > sure), and so that's blocking this improvement from happening :( I found a hacky workaround -- if the array has length less than 65536, as the vast majority do, I use vanilla array iteration. Otherwise, I fall back to for..in. So my motivating use case is less relevant now. I'll let others decide if this should be pursued nonetheless.

Nicholas Nethercote [inactive]

Reporter

Updated

•

11 years ago

Assignee: n.nethercote → nobody

Tim Taubert [:ttaubert] (inactive)

Comment 10

•

11 years ago

This seems very related to something the Google Inbox team was hitting: https://news.ycombinator.com/item?id=8495939 These two test cases take roughly the same amount of time in Chrome. The second test case in Firefox is however slower by a huge margin (~1000 times), which indicates we're probably scaling linearly here? http://jsperf.com/slice-sparse Can we maybe get someone to look at this again?

Dirkjan Ochtman (:djc)

Comment 11

•

11 years ago

The Inbox issue has also been filed as bug 1087963.

Jan de Mooij [:jandem]

Updated

•

11 years ago

Depends on: 1088189

Jan de Mooij [:jandem]

Updated

•

11 years ago

Blocks: 1088189

No longer depends on: 1088189

Sylvestre Ledru [:Sylvestre]

Comment 12

•

7 years ago

No assignee, updating the status.

Status: ASSIGNED → NEW

Sylvestre Ledru [:Sylvestre]

Comment 13

•

7 years ago

No assignee, updating the status.

Jacob

Comment 14

•

6 years ago

Hi,
I would like to work on this bug.

Thanks

Boris Zbarsky [:bzbarsky]

Comment 15

•

6 years ago

Jeff, do you know who would be a good mentor for this?

Flags: needinfo?(jwalden)

BugBot [:suhaib / :marco/ :calixte]

Comment 16

•

3 years ago

Clear a needinfo that is pending on an inactive user.

Inactive users most likely will not respond; if the missing information is essential and cannot be collected another way, the bug maybe should be closed as INCOMPLETE.

For more information, please visit auto_nag documentation.

Flags: needinfo?(jwalden)

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Bugzilla

Array.forEach is slow on very sparse arrays

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: n.nethercote, Unassigned)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Comment 10

Comment 11

Updated

Updated

Comment 12

Comment 13

Comment 14

Comment 15

Comment 16

Updated