Bug 1874699 (Open) · Opened 2 years ago · Updated 6 months ago

TypedArray.subarray is quite slow

Categories: Core :: JavaScript Engine, defect, P2

People: Reporter: valadaptive; Assignee: Unassigned

References: Blocks 1 open bug

Attachments: (1 file)

Attached file subarray-bench.html

TypedArray.prototype.subarray is called quite frequently when performing operations on subsections of one large buffer (e.g. deserialization, data marshaling to and from WASM). However, it has considerable overhead.

The attached benchmark demonstrates this overhead. Creating a subarray and modifying the underlying buffer takes far longer than modifying the original array. Over 10M iterations, modifying an element in the array directly takes 5ms, while calling subarray takes ~600ms. In Chrome, for comparison, subarray takes ~200ms.
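A minimal sketch of the measurement described above (illustrative only; the actual attached subarray-bench.html may differ in buffer size and structure):

```javascript
// Compare a direct element write against the same write through a freshly
// created subarray each iteration. ITERS matches the 10M figure quoted above.
const buf = new Uint8Array(1024);
const ITERS = 1e7;

// Baseline: write an element of the large array directly.
let t0 = performance.now();
for (let i = 0; i < ITERS; i++) {
  buf[0] = i & 0xff;
}
const direct = performance.now() - t0;

// Same write, but through a new subarray created on every iteration.
t0 = performance.now();
for (let i = 0; i < ITERS; i++) {
  const view = buf.subarray(0, 16);
  view[0] = i & 0xff;
}
const viaSubarray = performance.now() - t0;

console.log(`direct: ${direct.toFixed(1)} ms, subarray: ${viaSubarray.toFixed(1)} ms`);
```

Note that `subarray` creates a new TypedArray object sharing the same underlying ArrayBuffer, so all of the per-iteration cost is object creation and bookkeeping, not copying.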

This overhead is significant enough to affect real-world use cases: encoding a medium-small string into a buffer takes ~60% more time when a subarray call is included.
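The string-encoding pattern in question looks roughly like this (a hedged sketch; the buffer size, offset, and string here are made up, not taken from the benchmark):

```javascript
// Encoding a string into the middle of a larger buffer. The subarray call
// exists only to point TextEncoder.encodeInto at the right offset, so its
// per-call overhead lands directly on this hot path.
const encoder = new TextEncoder();
const scratch = new Uint8Array(4096);
const offset = 128; // illustrative write position inside the buffer

const { written } = encoder.encodeInto("hello, world", scratch.subarray(offset));
```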

To reproduce, run the attached benchmark in both Chrome and Firefox and compare the results.
I've also profiled the benchmark, and it seems like most of the time is spent in TypedArrayCreateWithBuffer.

One idea I mentioned on Matrix: it might be nice to have a fast path in ArrayBufferObject::addView for the case where the buffer is the same one we saw last time; just a small one-entry cache to avoid repeated hash table lookups. It's likely common to create multiple views of the same array buffer.
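In JS terms, the idea amounts to a one-entry cache in front of the per-buffer view table. This is only an illustration of the pattern — the real InnerViewTable is a C++ hash map inside SpiderMonkey, and `ViewTable`/`addView` here are made-up names:

```javascript
// One-entry "last buffer" cache in front of a map keyed by buffer identity.
// Repeated addView calls on the same buffer skip the map lookup entirely.
class ViewTable {
  constructor() {
    this.map = new Map();    // buffer -> array of views
    this.lastBuffer = null;  // one-entry cache
    this.lastViews = null;
  }
  addView(buffer, view) {
    if (buffer === this.lastBuffer) {
      this.lastViews.push(view); // fast path: no hash lookup
      return;
    }
    let views = this.map.get(buffer);
    if (views === undefined) {
      views = [];
      this.map.set(buffer, views);
    }
    views.push(view);
    this.lastBuffer = buffer; // remember for next time
    this.lastViews = views;
  }
}
```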

Status: UNCONFIRMED → NEW
Ever confirmed: true
Blocks: sm-runtime
Severity: -- → S4
Priority: -- → P2

Came across something potentially relevant in an old V8 blog post:

A major source of jank during garbage collection is processing various bookkeeping data structures. Many of these data structures enable optimizations that are unrelated to garbage collection. Two examples are the list of all ArrayBuffers, and each ArrayBuffer’s list of views. These lists allow for an efficient implementation of the DetachArrayBuffer operation without imposing any performance hit on access to an ArrayBuffer view. In situations, however, where a web page creates millions of ArrayBuffers, (e.g., WebGL-based games), updating those lists during garbage collection causes significant jank. In Chrome 46, we removed these lists and instead detect detached buffers by inserting checks before every load and store to ArrayBuffers. This amortizes the cost of walking the big bookkeeping list during GC by spreading it throughout program execution resulting in less jank. Although the per-access checks can theoretically slow down the throughput of programs that heavily use ArrayBuffers, in practice V8's optimizing compiler can often remove redundant checks and hoist remaining checks out of loops, resulting in a much smoother execution profile with little or no overall performance penalty.

It's not a 100% match to what's going on in SM (InnerViewTable is a hash map, not an array, and this bug is a throughput issue rather than a latency one), but might merit discussion anyway. Looking at the profile, ~10% of the subarray cost is spent in InnerViewTable::AddView, and a further ~5% is spent tracing the InnerViewTable during a minor GC.

It seems like we could convert the hash map back to a list, at least for small numbers of views. Or perhaps leverage the nursery and keep nursery views in a list and tenured views in the hash map, or something. (I'd need to refresh my memory of the code before I'm sure either of these make sense.)

Doing the checks per-access is definitely also an option. I think nowadays the check would need to cover both detached ABs and resized resizable ABs (though the latter are going to be very rare for a while). It would be worth benchmarking.

Nightly:

  • encoding a string: 1265 ms
  • encoding a string (w/ subarray): 1957 ms
  • modify an ArrayBuffer: 6 ms
  • modify an ArrayBuffer (w/ subarray): 6 ms

Chrome:

  • encoding a string: 3398 ms
  • encoding a string (w/ subarray): 5410 ms
  • modify an ArrayBuffer: 18 ms
  • modify an ArrayBuffer (w/ subarray): 247 ms

Looks like we are 3x-4x faster than Chrome on the latest Nightly.

Bisection:
Bug 1980613 - Part 12: Scalar replace MTypedArraySubarray. r=jandem

This case just requires adding both start operands.

Differential Revision: https://phabricator.services.mozilla.com/D259588

Before: ~600ms
After: 6ms

So bug 1980613 made the modify an ArrayBuffer (w/ subarray) bit 100x faster \o/
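The commit message's "adding both start operands" refers to the identity that scalar replacement exploits, which can be checked from script (a small self-contained demonstration, not the patch itself):

```javascript
// A subarray of a subarray is the same view as a single subarray of the
// original with the two start offsets added together, so the intermediate
// view object never needs to be materialized.
const base = new Uint8Array(8);
const nested = base.subarray(2).subarray(3); // offsets 2, then 3
const flat = base.subarray(2 + 3);           // single offset 5

console.log(nested.byteOffset, flat.byteOffset); // 5 5
```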

Depends on: 1980613
