Optimize Array.prototype.shift to be O(1) instead of O(n)

RESOLVED FIXED in Firefox 55



2 years ago
2 years ago


(Reporter: jandem, Assigned: jandem)


(Blocks: 2 bugs)

Dependency tree / graph

Firefox Tracking Flags

(platform-rel ?, firefox55 fixed)


(Whiteboard: [qf:p1][platform-rel-Outlook][platform-rel-Microsoft])


(2 attachments, 1 obsolete attachment)



2 years ago
Posted file Testcase
Last week I was investigating an Elm benchmark and it uses Array#shift in a way that results in quadratic behavior for us (it uses arrays as queues). Both V8 and JSC optimize this be much faster.

We could optimize this as follows:

(1) When we shift() an element, increment the elements_ pointer to point to the second element (and move the ObjectElements header as well).

(2) In the ObjectElements header, increment an integer (numShiftedElements or something). This will allow us to recover the original pointer which we have to pass to free/realloc etc.

(3) To avoid wasting memory when we shift many elements, we would reallocate and discard this space when we grow the elements (when we have to realloc anyway) or we could do this during GC when numShiftedElements is sufficiently large.

(4) unshift could be optimized in a similar way when numShiftedElements > 0.

This shouldn't be too difficult to implement I think and it would be nice to have it fixed soonish, as it's easy to get quadratic behavior from this.

I'm attaching a micro-benchmark that takes 1 ms in Chrome/Safari and 35 ms for us.

Comment 1

2 years ago
Oh and we should probably fix GC post barriers to store the dense element index relative to the original pointer (add numShiftedElements).

Comment 2

2 years ago
I added some logging and I think this would help a lot of websites. Although shift() is typically used with small arrays (<= 3 elements), when I open outlook.com for instance it looks like they have an array with > 1900 elements and keep shift()ing one element. This will easily take more than a few ms on an average machine.

Considering this is causing quadratic behavior on real websites, we should probably get it fixed soon. I'll work on a patch this week, hopefully it won't be that hard.
Blocks: 1362956
Flags: needinfo?(jdemooij)
Whiteboard: [qf]
This is a great idea!  Both shift() and pop() can treat the elements array as a dequeue.  A "shrink capacity to 1/2 when usage is <1/3" would give us O(1) amortized performance for shift() and pop(), while avoid edge cases where we keep shrinking/growing for alternating add/remove ops around the shrink/grow boundary.
Blocks: 1158767
platform-rel: --- → ?
Whiteboard: [qf] → [qf][platform-rel-Outlook][platform-rel-Microsoft]

Comment 4

2 years ago
Posted patch Patch (obsolete) — Splinter Review
This implements what I described in comment 0. It seems to work very well: we can now shift the elements header and when we grow/shrink elements we unshift these shifted elements if needed.

Most of the changes have to do with GC stuff: barriers, elements alloc/realloc/free, etc. Other than that it works pretty transparently and I didn't have to touch a lot of code. I changed Ion's optimized shift code to move the elements before (instead of after) updating the length + initializedLength, this makes it more similar to ArrayShiftDenseKernel.

The attached micro-benchmark improves from 35 ms to 0-1 ms. On Jon's GC barrier benchmark in bug 1362956 I get this:


Length:      Iterations:  GC state:    Contents:    Shift time:  Pop time:   
100000       1000         no GC        objects      0.10338      0.0010601   
100000       1000         no GC        numbers      0.084298     0.00094800  
100000       1000         in GC        objects      1.8337       0.0011079   
100000       1000         in GC        numbers      0.43972      0.0010459   


Length:      Iterations:  GC state:    Contents:    Shift time:  Pop time:   
100000       1000         no GC        objects      0.00022510   0.00090601  
100000       1000         no GC        numbers      0.00010181   0.00073486  
100000       1000         in GC        objects      0.00013989   0.0010242   
100000       1000         in GC        numbers      0.00010498   0.00071899  

Note that shift() is now as fast as pop() and we no longer have the bad perf cliff when we're in the middle of an incremental GC.

We can also optimize unshift() now when there are shifted elements, but I'll do that as a follow-up.
Assignee: nobody → jdemooij
Flags: needinfo?(jdemooij)
Attachment #8866435 - Flags: review?(jcoppeard)

Comment 5

2 years ago
Note that shift() is also used quite a lot in our chrome code, for instance when restoring tabs:


So this may improve session restore perf a bit for users with thousands of tabs.
Note to ehsan & QF triage team: I'm marking this "qf:p1" because that's what we'd do in triage, and I might have to miss the first half of the next triage meeting.
Whiteboard: [qf][platform-rel-Outlook][platform-rel-Microsoft] → [qf:p1][platform-rel-Outlook][platform-rel-Microsoft]

Comment 7

2 years ago
Posted patch PatchSplinter Review
With the previous patch we could shift up to 64k elements but this patch reduces that to 2047. It doesn't matter much perf wise but it may improve memory usage in certain pathological cases, and in general I think it's good to be conservative with these things.

I noticed Chrome has a bad perf cliff somewhere around ~50,000:

Chrome shifting 50k elements: 4 ms
Chrome shifting 51k elements: > 1 second
Chrome shifting 80k elements: > 2.5 seconds

Nightly without patch shifting 80k elements: ~2.5 seconds
Nightly with patch shifting 80k elements: 4 ms

So Chrome still has the quadratic behavior when shifting more than 50k elements. Safari takes about 5 ms and is similar to us.
Attachment #8866435 - Attachment is obsolete: true
Attachment #8866435 - Flags: review?(jcoppeard)
Attachment #8866685 - Flags: review?(jcoppeard)
Comment on attachment 8866685 [details] [diff] [review]

Review of attachment 8866685 [details] [diff] [review]:

This is great.  And those numbers speak for themselves.

::: js/src/vm/NativeObject.cpp
@@ +728,5 @@
> +
> +    // Move the elements. Initialize to |undefined| to ensure pre-barriers
> +    // don't see garbage.
> +    for (size_t i = 0; i < numShifted; i++)
> +        initDenseElement(i, UndefinedValue());

We need to trigger a prebarrier when shifting these elements out of the array in the first place.  Does that not happen already?  Oh right, this might have have the ObjectElements header copied over it.

::: js/src/vm/NativeObject.h
@@ +210,5 @@
> +    static const size_t MaxShiftedElements = (1 << NumShiftedElementsBits) - 1;
> +    static const size_t NumShiftedElementsShift = 32 - NumShiftedElementsBits;
> +    static const size_t FlagsMask = (1 << NumShiftedElementsShift) - 1;
> +    static_assert(MaxShiftedElements == 2047,
> +                  "MaxShiftedElements should match the comment");

It might be simpler to store flags and shifted elements count as separate uint16_ts.  Setting a maximum for the shifted element count is a good idea though.

@@ +273,5 @@
> +    void addShiftedElements(uint32_t count) {
> +        MOZ_ASSERT(count < capacity);
> +        MOZ_ASSERT(count < initializedLength);
> +        uint32_t numShifted = numShiftedElements() + count;

Can this overflow?

@@ +1054,5 @@
> +    void* getUnshiftedElementsHeader() const {
> +        return ObjectElements::fromElements(unshiftedElements());
> +    }
> +
> +    uint32_t indexForBarrier(uint32_t index) const {

The name doesn't make it clear what this does.  Maybe it could be called unshiftedIndex or indexIgnoringShift or something?
Attachment #8866685 - Flags: review?(jcoppeard) → review+

Comment 9

2 years ago
Pushed by jandemooij@gmail.com:
Optimize Array.prototype.shift to have O(1) perf instead of O(n). r=jonco

Comment 10

2 years ago
Thanks for the quick review.

(In reply to Jon Coppeard (:jonco) from comment #8)
> It might be simpler to store flags and shifted elements count as separate
> uint16_ts.  Setting a maximum for the shifted element count is a good idea
> though.

These flags are accessed from JIT code and there's some concern that accessing 16-bit values is slower than 32-bit values. We have some workarounds for that for JSFunction flags, see this comment:


> Can this overflow?

No all these values (and the result of the addition too) are <= MAX_DENSE_ELEMENTS_ALLOCATION.

> The name doesn't make it clear what this does.  Maybe it could be called
> unshiftedIndex or indexIgnoringShift or something?

Good point, I renamed it to unshiftedIndex and removed the comment (because the function is no longer barrier-specific).

Comment 11

2 years ago
Last Resolved: 2 years ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55


2 years ago
Blocks: 1364345


2 years ago
Blocks: 1364346
Duplicate of this bug: 1362956
You need to log in before you can comment on or make changes to this bug.