Open
Bug 1045391
Opened 10 years ago
Updated 2 years ago
Array.forEach is slow on very sparse arrays
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
NEW
People
(Reporter: n.nethercote, Unassigned)
References
(Blocks 1 open bug)
Details
Consider this code:
> let x = [1,2,3]
> x[3628523034] = 7825
> for (i in x) { print(i); } // runs quickly
> x.forEach(function(v, i) { print(i) }); // runs *slowly*
The implementation of Array.forEach iterates over element index between 0 and |length|, calling the callback if it is present. This naive algorithm matches the one in the spec, but AFAICT there is scope for doing something smarter when many elements are missing.
However, implementing something smarter may be difficult because Array.forEach is self-hosted.
BTW, this is not just a theoretical example. In pdf.js there are a couple of hot for..in loops over arrays, which cause their indices to be stringified uselessly. I converted them to use forEach() instead, which worked great in most cases, but I hit one example document that had a very sparse array (possibly because it's buggy, I'm not sure), and so that's blocking this improvement from happening :(
Comment 1•10 years ago
|
||
for-in on arrays is an anti-pattern. It enumerates *all* enumerable properties, not just numeric ones. Why is pdf.js using for-in on an array? (And, maybe, why is it doing this on arrays that are missing elements? That's not the right way to use arrays, either.)
Reporter | ||
Comment 2•10 years ago
|
||
> for-in on arrays is an anti-pattern.
I know! I'm trying to get rid of that code, but this behaviour is preventing me from doing so.
> (And, maybe, why is it doing this on arrays that are missing elements? That's not the right way to use arrays, either.)
These arrays are dense in most cases, but for this one particular PDF I have these outrageously sparse arrays. But then, sparse arrays are not unreasonable, and expecting forEach() to work efficiently on them is not unreasonable.
Comment 3•10 years ago
|
||
Unsurprisingly, we spend > 90% of the time under jit::OperatorIn, which does the slow lookupGeneric -> Shape::search thing.
Ion could be smarter by noticing (at compile-time) there are no indexed properties on Array.prototype and only doing an "own" lookup in this case. That should at least avoid searching Array.prototype and Object.prototype... There's also a few % under ValueToId etc, that's unnecessary if the index is known to be int32.
So we can make the current algorithm a bit faster but, as you suggested, rewriting it will probably help more...
Comment 4•10 years ago
|
||
So what the self-hosted code would need is a fastish way to check whether the array has a proto with indexed props. Fastish, because you'd have to check it after every property get from the array.
Comment 5•10 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #4)
> So what the self-hosted code would need is a fastish way to check whether
> the array has a proto with indexed props. Fastish, because you'd have to
> check it after every property get from the array.
Well Ion should just do this automatically for "int32 in (non-dense) object". It has all the infrastructure to statically (at compile-time) check for indexed props on the prototype and invalidate scripts when this changes; we also use this for normal GETELEM/SETELEM.
Comment 6•10 years ago
|
||
Sure; that would help the ion codegen for the existing code. But if we're going to change the algorithm to avoid checking all indices, that involves detecting in the algorithm that the proto is not relevant.
Comment 7•10 years ago
|
||
If the algorithm is going to change at all, and I'm not sure it should for super-super-sparse arrays (which *are* unreasonable, pace njn), adding something to the loop each time seems like a non-starter to me. This is the right algorithm for sane arrays. Adding an intrinsic or so that can detect when an array is super-super-sparse at the get-go, to permit splitting execution into a normal case and a stupid case, seems like the only thing that wouldn't hurt good code while addressing this.
Comment 8•10 years ago
|
||
Sure, I was assuming that. You need that intrinsic, and then within the code that you branch to if that intrinsic tests true you need to keep making sure that indexed props don't appear on the proto, because presumably once they do whatever your super-sparse algorithm is will no longer be valid.
Reporter | ||
Comment 9•10 years ago
|
||
> In pdf.js there are a couple of hot for..in loops over arrays, which cause their indices to be
> stringified uselessly. I converted them to use forEach() instead, which worked great in most cases,
> but I hit one example document that had a very sparse array (possibly because it's buggy, I'm not
> sure), and so that's blocking this improvement from happening :(
I found a hacky workaround -- if the array has length less than 65536, as the vast majority do, I use vanilla array iteration. Otherwise, I fall back to for..in.
So my motivating use case is less relevant now. I'll let others decide if this should be pursued nonetheless.
Reporter | ||
Updated•10 years ago
|
Assignee: n.nethercote → nobody
Comment 10•10 years ago
|
||
This seems very related to something the Google Inbox team was hitting:
https://news.ycombinator.com/item?id=8495939
These two test cases take roughly the same amount of time in Chrome. The second test case in Firefox is however slower by a huge margin (~1000 times), which indicates we're probably scaling linearly here?
http://jsperf.com/slice-sparse
Can we maybe get someone to look at this again?
Comment 11•10 years ago
|
||
The Inbox issue has also been filed as bug 1087963.
Updated•10 years ago
|
Comment 13•6 years ago
|
||
No assignee, updating the status.
Comment 14•5 years ago
|
||
Hi,
I would like to work on this bug.
Thanks
Comment 15•5 years ago
|
||
Jeff, do you know who would be a good mentor for this?
Flags: needinfo?(jwalden)
Comment 16•2 years ago
|
||
Clear a needinfo that is pending on an inactive user.
Inactive users most likely will not respond; if the missing information is essential and cannot be collected another way, the bug maybe should be closed as INCOMPLETE
.
For more information, please visit auto_nag documentation.
Flags: needinfo?(jwalden)
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•