Creating Uint8Array on an existing ArrayBuffer is surprisingly costly

Status: NEW
Assignee: Unassigned
Product: Core
Component: JavaScript Engine: JIT
Priority: P3
Severity: normal
Opened: 2 years ago
Updated: a year ago
Reporter: maciej.hirsz
Version: 50 Branch
Firefox Tracking Flags: Not tracked

(Reporter)

Description

2 years ago
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36

Steps to reproduce:

Micro Benchmark: http://terhix.com/typed.html


Actual results:

The most interesting numbers on my hardware:

Benching create a fresh 8 byte Uint8Array
- 20ns per iteration (49554013 ops/sec)

Benching create a fresh 8 byte ArrayBuffer
- 261ns per iteration (3830243 ops/sec)

Benching create Uint8Array on an existing 2048 byte ArrayBuffer
- 615ns per iteration (1626465 ops/sec)

Benching create Uint8Array on an existing 8 byte ArrayBuffer
- 594ns per iteration (1684011 ops/sec)


Expected results:

While the time taken to create a new Uint8Array seems constant, regardless of the size of underlying ArrayBuffer, 600ns is an awful lot to create an object that should be effectively a ( pointer, offset, length ) tuple. Chrome 54 on the same hardware benchmarks at 100-105ns, which is still quite a bit for what it does, but reasonable enough.
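
The measurement being discussed can be approximated with a loop like the one below. This is a minimal sketch, not the code from the linked benchmark page: the `bench` helper, the iteration count, and the use of `Date.now()` as the clock are all assumptions made for illustration.

```javascript
// Hypothetical harness; not the benchmark from the linked page.
function bench(label, iterations, fn) {
  let sink = 0; // accumulate a result so the loop body is not trivially dead
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    sink += fn().length;
  }
  const elapsedMs = Date.now() - start;
  const nsPerIteration = (elapsedMs * 1e6) / iterations;
  console.log(label + ": " + nsPerIteration.toFixed(1) + "ns per iteration");
  return { nsPerIteration, sink };
}

const ITERATIONS = 200000; // assumed; large enough to amortize timer resolution
const small = new ArrayBuffer(8);
const large = new ArrayBuffer(2048);

const fresh = bench("fresh 8 byte Uint8Array", ITERATIONS,
  () => new Uint8Array(8));
const viewSmall = bench("view on existing 8 byte ArrayBuffer", ITERATIONS,
  () => new Uint8Array(small));
const viewLarge = bench("view on existing 2048 byte ArrayBuffer", ITERATIONS,
  () => new Uint8Array(large));
```

Absolute numbers from a loop like this depend heavily on the engine, the build, and the hardware, so only the relative ordering of the three cases is worth comparing.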

Creating a Uint8Array on an ArrayBuffer is the one way to actually read binary data from a WebSocket. For short messages the overhead of creating the view is overwhelming compared to the cost of decoding the data. For example, decoding a Float64 into a JavaScript Number takes some 50ns on its own, but having to create the view slows it down considerably.
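
To make the cost concrete, here is a sketch of decoding a single Float64 from a short binary message. The `decodeFloat64` helper, the scratch buffers, and the assumption that the message is in the platform's native byte order are all illustrative; the report does not show the actual decoder. Each call constructs a new `Uint8Array` view on the message buffer, which is the step whose cost is at issue.

```javascript
// Hypothetical decoder, assuming the 8 bytes are in native byte order.
// A shared scratch buffer is exposed both as bytes and as one Float64.
const scratchBytes = new Uint8Array(8);
const scratchFloat = new Float64Array(scratchBytes.buffer);

function decodeFloat64(message, offset) {
  // Per-message view on the existing ArrayBuffer.
  const bytes = new Uint8Array(message, offset, 8);
  scratchBytes.set(bytes);   // copy the 8 bytes into the scratch buffer
  return scratchFloat[0];    // reinterpret them as a Float64
}

// Build an 11 byte message with a Float64 stored at unaligned offset 3.
const message = new ArrayBuffer(11);
const encoded = new Uint8Array(new Float64Array([3.5]).buffer);
new Uint8Array(message).set(encoded, 3);

const decoded = decodeFloat64(message, 3);
console.log(decoded);
```

Going through a byte view rather than `new Float64Array(message, offset, 1)` is a deliberate choice in this sketch: a `Float64Array` view requires the offset to be a multiple of 8, which a packed message format does not guarantee.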

A few more of the results are really confusing, such as creating a Uint8Array with allocation being faster than allocating an ArrayBuffer of the same size.

Updated

2 years ago
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
(In reply to maciej.hirsz from comment #0)
> While the time taken to create a new Uint8Array seems constant, regardless
> of the size of underlying ArrayBuffer, 600ns is an awful lot to create an
> object that should be effectively a ( pointer, offset, length ) tuple.

I was able to save ~90ns by avoiding the ".prototype" lookup in [1], and by cleaning up the type checks in [2]. Using makeProtoInstance instead of makeTypedInstance in [3] saves an additional 250ns when creating the typed array, but I don't know how this change affects type inference when actually using the typed array. The view tracking in [4] costs us another 100ns, which is quite unfortunate.

[1] https://github.com/mozilla/gecko-dev/blob/021c9d40ae886b3ac6e9a7dcfd8cb7fdf9910ae0/js/src/vm/TypedArrayObject.cpp#L725-L728
[2] https://github.com/mozilla/gecko-dev/blob/021c9d40ae886b3ac6e9a7dcfd8cb7fdf9910ae0/js/src/vm/TypedArrayObject.cpp#L766-L775
[3] https://github.com/mozilla/gecko-dev/blob/021c9d40ae886b3ac6e9a7dcfd8cb7fdf9910ae0/js/src/vm/TypedArrayObject.cpp#L479
[4] https://github.com/mozilla/gecko-dev/blob/021c9d40ae886b3ac6e9a7dcfd8cb7fdf9910ae0/js/src/vm/TypedArrayObject.cpp#L538-L542
Brian, I know it's been quite a while since you looked at this stuff, but perhaps you remember some optimization potential here that we simply haven't gotten around to?
Flags: needinfo?(bhackett1024)
(In reply to Till Schneidereit [:till] from comment #2)
> Brian, I know it's been quite a while since you looked at this stuff, but
> perhaps you remember some optimization potential here that we simply haven't
> gotten around to?

Ion only inlines TypedArray(N), not TypedArray(buffer).  Doing the latter wouldn't be too hard I think.
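
For reference, the two constructor forms distinguished here look like this in script. The snippet only illustrates the call shapes and their observable behaviour; which of them Ion inlines is taken from the comment above and is not something the code demonstrates.

```javascript
// Form 1: length argument. Allocates a new zero-filled buffer.
const fromLength = new Uint8Array(8);

// Form 2: buffer argument. Creates a view over an existing ArrayBuffer.
const buffer = new ArrayBuffer(8);
const fromBuffer = new Uint8Array(buffer);

// Views alias the buffer: a write through one is visible through another.
fromBuffer[0] = 42;
const second = new Uint8Array(buffer);

console.log(second[0]);                    // 42
console.log(fromBuffer.buffer === buffer); // true
console.log(fromLength.buffer === buffer); // false
```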
Flags: needinfo?(bhackett1024)

Comment 4

2 years ago
Creating throwaway views on a buffer ought to be a cheap operation.
Status: UNCONFIRMED → NEW
Component: JavaScript Engine → JavaScript Engine: JIT
Ever confirmed: true
Priority: -- → P2
I'm not able to reproduce the numbers in the first post on an optimized build of mozilla-central. I'm a bit confused by the user agent string: are the results for Chrome or Firefox?

I'm seeing these results on an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz:

$ ~/work/sm-js bench.js
Benching create a fresh 8 byte Array literal
- 1ns per iteration (1000000000 ops/sec)

Benching create a fresh 8 byte Uint8Array
- 139ns per iteration (7194244 ops/sec)

Benching create a fresh 8 byte ArrayBuffer
- 158ns per iteration (6329113 ops/sec)

Benching create a fresh 2048 byte Uint8Array
- 743ns per iteration (1345895 ops/sec)

Benching create a fresh 2048 byte ArrayBuffer
- 878ns per iteration (1138952 ops/sec)

Benching create Uint8Array on an existing 8 byte ArrayBuffer
- 280ns per iteration (3571428 ops/sec)

Benching create Uint8Array on an existing 2048 byte ArrayBuffer
- 293ns per iteration (3412969 ops/sec)

Benching subarray 2048 byte Uint8Array to 8 byte Uint8Array
- 336ns per iteration (2976190 ops/sec)

Benching slice 2048 byte Uint8Array to 8 byte Uint8Array
- 190ns per iteration (5263157 ops/sec)

Benching subarray 2048 byte Uint8Array to 1024 byte Uint8Array
- 336ns per iteration (2976190 ops/sec)

Benching slice 2048 byte Uint8Array to 1024 byte Uint8Array
- 1343ns per iteration (744601 ops/sec)
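
One likely reading of the subarray and slice figures, offered as an interpretation rather than something established in this bug: `subarray` returns a new view on the same buffer, so its cost should not depend on the length, while `slice` copies the selected bytes into a new buffer, so its cost grows with the length. The difference in semantics can be checked directly:

```javascript
const source = new Uint8Array(2048);

const view = source.subarray(0, 1024); // shares source.buffer
const copy = source.slice(0, 1024);    // owns a fresh 1024 byte buffer

source[0] = 7;

console.log(view[0]);                       // 7, the view sees the write
console.log(copy[0]);                       // 0, the copy predates the write
console.log(view.buffer === source.buffer); // true
console.log(copy.buffer.byteLength);        // 1024
```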

When looking at the -D output, I see that |new ArrayBuffer(8)| uses 3x StackArgT, CallNative, and Unbox:Object. For |new Uint8Array(8)|, I'm seeing only one |NewTypedArray|, which is 23 instructions long on x86-64. I'm surprised that Uint8Array(8) is so close in runtime to ArrayBuffer(8).

I had to replace |performance.now()| with |(new Date()).getTime()|, since it was not defined in the JS shell.
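
A shim along these lines would keep the benchmark runnable in both environments. The `now` name is hypothetical, and `Date.now()` is used here as an equivalent of `(new Date()).getTime()`. The Date-based clock only has millisecond resolution, so iteration counts need to be large for the results to mean anything.

```javascript
// Use performance.now() where it exists (browsers), otherwise fall back to
// the Date-based clock, as was needed in the JS shell.
const now =
  typeof performance !== "undefined" && typeof performance.now === "function"
    ? () => performance.now()
    : () => Date.now();

const start = now();
let sum = 0;
for (let i = 0; i < 100000; i++) sum += i;
const elapsedMs = now() - start;

console.log("elapsed: " + elapsedMs + "ms");
```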
(Reporter)

Comment 6

2 years ago
I submitted the issue in Chrome since that's where I was logged in on GitHub; the results are from Firefox.

Just re-running it now:

------------

Firefox 50.0:

Benching create a fresh 8 byte Uint8Array
- 24ns per iteration (41972717 ops/sec)

Benching create a fresh 8 byte ArrayBuffer
- 238ns per iteration (4195686 ops/sec)

Benching create a fresh 2048 byte Uint8Array
- 1065ns per iteration (938720 ops/sec)

Benching create a fresh 2048 byte ArrayBuffer
- 2973ns per iteration (336325 ops/sec)

Benching create Uint8Array on an existing 8 byte ArrayBuffer
- 522ns per iteration (1914718 ops/sec)

Benching create Uint8Array on an existing 2048 byte ArrayBuffer
- 516ns per iteration (1937984 ops/sec)

Benching subarray 2048 byte Uint8Array to 8 byte Uint8Array
- 593ns per iteration (1685487 ops/sec)

Benching slice 2048 byte Uint8Array to 8 byte Uint8Array
- 460ns per iteration (2171764 ops/sec)

Benching subarray 2048 byte Uint8Array to 1024 byte Uint8Array
- 582ns per iteration (1716885 ops/sec)

Benching slice 2048 byte Uint8Array to 1024 byte Uint8Array
- 5756ns per iteration (173720 ops/sec)

------------

Firefox Developer Edition 51.0a2:

Benching create a fresh 8 byte Uint8Array
- 320ns per iteration (3124023 ops/sec)

Benching create a fresh 8 byte ArrayBuffer
- 220ns per iteration (4548866 ops/sec)

Benching create a fresh 2048 byte Uint8Array
- 1396ns per iteration (716509 ops/sec)

Benching create a fresh 2048 byte ArrayBuffer
- 1956ns per iteration (511353 ops/sec)

Benching create Uint8Array on an existing 8 byte ArrayBuffer
- 534ns per iteration (1873641 ops/sec)

Benching create Uint8Array on an existing 2048 byte ArrayBuffer
- 532ns per iteration (1880883 ops/sec)

Benching subarray 2048 byte Uint8Array to 8 byte Uint8Array
- 690ns per iteration (1449065 ops/sec)

Benching slice 2048 byte Uint8Array to 8 byte Uint8Array
- 351ns per iteration (2850667 ops/sec)

Benching subarray 2048 byte Uint8Array to 1024 byte Uint8Array
- 652ns per iteration (1533142 ops/sec)

Benching slice 2048 byte Uint8Array to 1024 byte Uint8Array
- 2227ns per iteration (448999 ops/sec)

------------

This is a 4-year-old MacBook Air, and the numbers do seem to check out more or less (except for being higher here, simply because the hardware is slower).

The main difference between Developer Edition and current stable is that <create a fresh 8 byte Uint8Array> is no longer a (fast) anomaly.
(In reply to maciej.hirsz from comment #6)
> The main difference between Developer Edition and current stable is that
> <create a fresh 8 byte Uint8Array> is no longer a (fast) anomaly.

There's some extra poisoning that happens in pre-release builds that can affect benchmark results and could easily be to blame for this.

Could you re-run these after setting the environment variable JSGC_DISABLE_POISONING=1?
(Reporter)

Comment 8

2 years ago
Firefox Developer Edition 51.0a2 with JSGC_DISABLE_POISONING=1:

Benching create a fresh 8 byte Uint8Array
- 292ns per iteration (3423192 ops/sec)

Benching create a fresh 8 byte ArrayBuffer
- 222ns per iteration (4506737 ops/sec)

Benching create a fresh 2048 byte Uint8Array
- 1518ns per iteration (658861 ops/sec)

Benching create a fresh 2048 byte ArrayBuffer
- 2034ns per iteration (491638 ops/sec)

Benching create Uint8Array on an existing 8 byte ArrayBuffer
- 497ns per iteration (2010292 ops/sec)

Benching create Uint8Array on an existing 2048 byte ArrayBuffer
- 585ns per iteration (1708423 ops/sec)

Benching subarray 2048 byte Uint8Array to 8 byte Uint8Array
- 617ns per iteration (1620955 ops/sec)

Benching slice 2048 byte Uint8Array to 8 byte Uint8Array
- 360ns per iteration (2779012 ops/sec)

Benching subarray 2048 byte Uint8Array to 1024 byte Uint8Array
- 636ns per iteration (1572067 ops/sec)

Benching slice 2048 byte Uint8Array to 1024 byte Uint8Array
- 2307ns per iteration (433502 ops/sec)

----

Just to be sure, I also checked that the process gets the flag:

$ ps eww 30745
  PID   TT  STAT      TIME COMMAND
30745   ??  S      0:16.09 /Applications/FirefoxDeveloperEdition.app/Contents/MacOS/firefox COLOR_BLACK=\e[0;30m ITERM_SESSION_ID=<edited-out> JSGC_DISABLE_POISONING=1 GREP_COLOR=1;32 ...
(In reply to maciej.hirsz from comment #8)
Thanks for doing that, that's very interesting!
Priority: P2 → P3