Open Bug 1045391 Opened 8 years ago Updated 3 years ago

Array.forEach is slow on very sparse arrays

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

People

(Reporter: n.nethercote, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

Consider this code:

> let x = [1,2,3]
> x[3628523034] = 7825
> for (i in x) { print(i); }               // runs quickly
> x.forEach(function(v, i) { print(i) });  // runs *slowly*

The implementation of Array.forEach iterates over element index between 0 and |length|, calling the callback if it is present. This naive algorithm matches the one in the spec, but AFAICT there is scope for doing something smarter when many elements are missing.

However, implementing something smarter may be difficult because Array.forEach is self-hosted.

BTW, this is not just a theoretical example. In pdf.js there are a couple of hot for..in loops over arrays, which cause their indices to be stringified uselessly. I converted them to use forEach() instead, which worked great in most cases, but I hit one example document that had a very sparse array (possibly because it's buggy, I'm not sure), and so that's blocking this improvement from happening :(
for-in on arrays is an anti-pattern.  It enumerates *all* enumerable properties, not just numeric ones.  Why is pdf.js using for-in on an array?  (And, maybe, why is it doing this on arrays that are missing elements?  That's not the right way to use arrays, either.)
> for-in on arrays is an anti-pattern.

I know! I'm trying to get rid of that code, but this behaviour is preventing me from doing so.

> (And, maybe, why is it doing this on arrays that are missing elements?  That's not the right way to use arrays, either.)

These arrays are dense in most cases, but for this one particular PDF I have these outrageously sparse arrays. But then, sparse arrays are not unreasonable, and expecting forEach() to work efficiently on them is not unreasonable.
Unsurprisingly, we spend > 90% of the time under jit::OperatorIn, which does the slow lookupGeneric -> Shape::search thing.

Ion could be smarter by noticing (at compile-time) there are no indexed properties on Array.prototype and only doing an "own" lookup in this case. That should at least avoid searching Array.prototype and Object.prototype... There's also a few % under ValueToId etc, that's unnecessary if the index is known to be int32.

So we can make the current algorithm a bit faster but, as you suggested, rewriting it will probably help more...
So what the self-hosted code would need is a fastish way to check whether the array has a proto with indexed props.  Fastish, because you'd have to check it after every property get from the array.
(In reply to Boris Zbarsky [:bz] from comment #4)
> So what the self-hosted code would need is a fastish way to check whether
> the array has a proto with indexed props.  Fastish, because you'd have to
> check it after every property get from the array.

Well Ion should just do this automatically for "int32 in (non-dense) object". It has all the infrastructure to statically (at compile-time) check for indexed props on the prototype and invalidate scripts when this changes; we also use this for normal GETELEM/SETELEM.
Sure; that would help the ion codegen for the existing code.  But if we're going to change the algorithm to avoid checking all indices, that involves detecting in the algorithm that the proto is not relevant.
If the algorithm is going to change at all, and I'm not sure it should for super-super-sparse arrays (which *are* unreasonable, pace njn), adding something to the loop each time seems like a non-starter to me.  This is the right algorithm for sane arrays.  Adding an intrinsic or so that can detect when an array is super-super-sparse at the get-go, to permit splitting execution into a normal case and a stupid case, seems like the only thing that wouldn't hurt good code while addressing this.
Sure, I was assuming that.  You need that intrinsic, and then within the code that you branch to if that intrinsic tests true you need to keep making sure that indexed props don't appear on the proto, because presumably once they do whatever your super-sparse algorithm is will no longer be valid.
> In pdf.js there are a couple of hot for..in loops over arrays, which cause their indices to be
> stringified uselessly. I converted them to use forEach() instead, which worked great in most cases,
> but I hit one example document that had a very sparse array (possibly because it's buggy, I'm not
> sure), and so that's blocking this improvement from happening :(

I found a hacky workaround -- if the array has length less than 65536, as the vast majority do, I use vanilla array iteration. Otherwise, I fall back to for..in.

So my motivating use case is less relevant now. I'll let others decide if this should be pursued nonetheless.
Assignee: n.nethercote → nobody
This seems very related to something the Google Inbox team was hitting:

https://news.ycombinator.com/item?id=8495939

These two test cases take roughly the same amount of time in Chrome. The second test case in Firefox is however slower by a huge margin (~1000 times), which indicates we're probably scaling linearly here?

http://jsperf.com/slice-sparse

Can we maybe get someone to look at this again?
The Inbox issue has also been filed as bug 1087963.
Depends on: 1088189
Blocks: 1088189
No longer depends on: 1088189
No assignee, updating the status.
Status: ASSIGNED → NEW
No assignee, updating the status.

Hi,
I would like to work on this bug.

Thanks

Jeff, do you know who would be a good mentor for this?

Flags: needinfo?(jwalden)
You need to log in before you can comment on or make changes to this bug.