Closed Bug 1512491 Opened 6 years ago Closed 5 years ago

Access to Uint8Array is 25% slower

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla68

Project Flags:

Performance Impact

none

Tracking Flags:

Tracking

Status

firefox65

---

wontfix

firefox66

---

wontfix

firefox67

---

wontfix

firefox68

---

fixed

People

(Reporter: CoolCmd, Assigned: anba, Mentored)

Details

(Keywords: good-first-bug, perf, testcase, Whiteboard: [lang=c++])

Attachments

(2 files)

Bug 1512491 - Part 1: Hoist bound checks with constants. r=nbp! 5 years ago André Bargull [:anba] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1512491 - Part 2: Hoist access to MConstantElements. r=nbp! 5 years ago André Bargull [:anba] 47 bytes, text/x-phabricator-request		Details \| Review

CoolCmd

Reporter

Description

•

6 years ago

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0

Steps to reproduce:

navigate to https://jsfiddle.net/CoolCmd/f0x1huma/
click 'Run Tests' button at right


Actual results:

benchmark result:
new Uint8Array() x 49.49 ops/sec
new Uint8Array(new ArrayBuffer()) x 39.71 ops/sec


Expected results:

both tests should have identical ops/sec values, as in Firefox 50, Edge and Chrome.

compare the test1() and test2(). they differ only in the way the array is created.
note: test measures only array access, not array creation.

Nicolas B. Pierron [:nbp]

Comment 1

•

5 years ago

Thank you for opening this issue.
I am forwarding it to Kannan to investigate.

Status: UNCONFIRMED → NEW

status-firefox65: --- → affected

Ever confirmed: true

Flags: needinfo?(kvijayan)

Keywords: perf, testcase

Priority: -- → P2

Whiteboard: [qf]

Kannan Vijayan [:djvj]

Comment 2

•

5 years ago

Ion only optimizes the single-integer-argument case, not the single-buffer-argument case:

https://searchfox.org/mozilla-central/source/js/src/jit/MCallOptimize.cpp#2929

Comment 1 is slightly confusing because it suggests we are comparing `new Uint8Array()` and `new UintArray(new ArrayBuffer())`.

It's actually comparing `new Uint8Array(I32)` vs `new Uint8Array(new ArrayBuffer(I32))`.

The apparent cost of separately allocating an ArrayBuffer is likely not the problem here - the ArrayBuffer will need to be allocated regardless in either case.

The problem here most likely arises from the fact that we are not Ion-optimizing the construction with single-buffer-argument.  Instead of inlining the call, Ion would presumably emit an IC or a CallVM for the `new Uint8Array()`, which is the likely source of the performance differential.

The fix is to handle this case in Ion.  This will require checking the type of the incoming argument using TI, and if it's an `ArrayBuffer` instance, emitting an optimized inline MInstruction to construct the array.

This performance delta should exist on all TypedArray kinds - not just Uint8Array.

This might be a good first-bug for somebody wanting to work on Ion stuff.

Flags: needinfo?(kvijayan)

Kannan Vijayan [:djvj]

Updated

•

5 years ago

Keywords: good-first-bug

Nicolas B. Pierron [:nbp]

Updated

•

5 years ago

Whiteboard: [qf] → [qf][lang=c++]

Nicolas B. Pierron [:nbp]

Updated

•

5 years ago

Mentor: kvijayan

André Bargull [:anba]

Assignee

Comment 3

•

5 years ago

(In reply to Kannan Vijayan [:djvj] from comment #2)
> The problem here most likely arises from the fact that we are not
> Ion-optimizing the construction with single-buffer-argument.  Instead of
> inlining the call, Ion would presumably emit an IC or a CallVM for the `new
> Uint8Array()`, which is the likely source of the performance differential.

This is actually not relevant for the test case, because only a single typed array is constructed for each test case. 

Standalone shell test case:
---
const BUFFER_SIZE = 16 * 1024 * 1024;
var a1 = new Uint8Array(BUFFER_SIZE);
a1.fill(1);
var a2 = new Uint8Array(new ArrayBuffer(BUFFER_SIZE));
a2.fill(1);

function f() {
  var c = 0;
  var t = performance.now();
  for (var i = 0; i < 20; ++i) {
    for (var j = 0; j < BUFFER_SIZE; j++) {
      c += a1[j];
    }
  }
  return [performance.now() - t, c];
}

function g() {
  var c = 0;
  var t = performance.now();
  for (var i = 0; i < 20; ++i) {
    for (var j = 0; j < BUFFER_SIZE; j++) {
      c += a2[j];
    }
  }
  return [performance.now() - t, c];
}

print(f());
print(g());
---


The important bit here is that the length for the typed array |a2| exceeds SINGLETON_BYTE_LENGTH [1], which means |a2| will be instantiated as a singleton whereas |a1| won't be marked as a singleton [2,3]. If |a2| is not created as a singleton, for example by commenting out [2], the performance will be the same for both typed arrays. 

Jan mentioned in bug 1437471 that the singleton code is mostly a performance optimization for Mandreel and we may want to consider to remove this code to avoid unexpected performance cliffs (like in bug 1437471 or in this bug), if we're willing to take a ~4000 regression in Mandreel (bug 1437471, comment 3).


[1] https://searchfox.org/mozilla-central/rev/fd32b3a6fa3eff1468311f6fcf32b45c117136df/js/src/vm/TypedArrayObject.h#132-137
[2] https://searchfox.org/mozilla-central/rev/fd32b3a6fa3eff1468311f6fcf32b45c117136df/js/src/vm/TypedArrayObject.cpp#749-753
[3] https://searchfox.org/mozilla-central/rev/fd32b3a6fa3eff1468311f6fcf32b45c117136df/js/src/vm/TypedArrayObject.cpp#383-390

Flags: needinfo?(kvijayan)

Kannan Vijayan [:djvj]

Comment 4

•

5 years ago

Ugh, I can't believe I completely missed the point of the perf.  Just focused on the construction.

Thanks for catching my mistake Andre.

Flags: needinfo?(kvijayan)

CoolCmd

Reporter

Comment 5

•

5 years ago

:djvj, in fact you are the second who completely missed, see bug 1506480.

Kannan Vijayan [:djvj]

Comment 6

•

5 years ago

Ah, there's always safety in numbers :)

Kannan Vijayan [:djvj]

Updated

•

5 years ago

Whiteboard: [qf][lang=c++] → [qf-][lang=c++]

sharath.savasere

Comment 7

•

5 years ago

Hi,

I am interested in this bug and would like to solve it. If possible, could you tell me how to proceed with this and what files to modify to solve this bug. Is there a way to test my code after making changes?

Thank you!!

Nicolas B. Pierron [:nbp]

Comment 8

•

5 years ago

(In reply to sharath.savasere from comment #7)
> I am interested in this bug and would like to solve it. If possible, could
> you tell me how to proceed with this and what files to modify to solve this
> bug. Is there a way to test my code after making changes?

The first step is to check if you can reproduce this issue. Which implies that you build SpiderMonkey from the sources.

https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Build_Documentation

Then after building the JavaScript shell, you should try to see if you can reproduce the performance issue with the code from comment 3.

sharath.savasere

Comment 9

•

5 years ago

(In reply to Nicolas B. Pierron [:nbp] (away until 02/01/2019) from comment #8)
> (In reply to sharath.savasere from comment #7)
> > I am interested in this bug and would like to solve it. If possible, could
> > you tell me how to proceed with this and what files to modify to solve this
> > bug. Is there a way to test my code after making changes?
> 
> The first step is to check if you can reproduce this issue. Which implies
> that you build SpiderMonkey from the sources.
> 
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/
> Build_Documentation
> 
> Then after building the JavaScript shell, you should try to see if you can
> reproduce the performance issue with the code from comment 3.

I did the installation through mozilla-central repo. I built the js shell and ran the code in comment 3. I got these results as output:

504.696044921875,335544320
637.552001953125,335544320

Can you tell me the next steps for tackling this bug!
Thanks!

Nicolas B. Pierron [:nbp]

Comment 10

•

5 years ago

Great, now, with a debug build try to identify what is the difference in the generated code, and see if it would be easy to hoist a repeated set of instructions/conditions out of the slow loop. If you cannot reproduce any differences with a debug build, then toggle the ion-spew on with the configure option on an optimized build.

Then to dump the Assembly, use the environment variable IONFLAGS=help and IONFILTER=foo.js:4, to list all different outputs and filter them based on a given location.

This page[1] might also help you to continue your investigation, or learn a few debugging tricks specific (or not) to SpiderMonkey.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Hacking_Tips

sharath.savasere

Comment 11

•

5 years ago

(In reply to Nicolas B. Pierron [:nbp] from comment #10)

Great, now, with a debug build try to identify what is the difference in the
generated code, and see if it would be easy to hoist a repeated set of
instructions/conditions out of the slow loop. If you cannot reproduce any
differences with a debug build, then toggle the ion-spew on with the
configure option on an optimized build.

Then to dump the Assembly, use the environment variable IONFLAGS=help and
IONFILTER=foo.js:4, to list all different outputs and filter them based on a
given location.

This page[1] might also help you to continue your investigation, or learn a
few debugging tricks specific (or not) to SpiderMonkey.

[1]
https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/
Hacking_Tips

I looked at the output assembly code of the two functions and the assembly code for function f() is more compared to function g(). I asked in #jsapi channel and was told that its something to do with inline functional call. I am not sure which files to look into and what functions to change. I saw the iongraph plots and its very hard to find out what is causing this difference in time.

Thank you!

André Bargull [:anba]

Assignee

Comment 12

•

5 years ago

IIRC you already have generated the iongraph pdfs/pngs. So for the simplified test case from comment #3, iongraph should have outputted four functions:

 function 0 (self-hosted:6695): success; 29 passes.
 function 1 (self-hosted:6695): success; 30 passes.
 function 2 (/tmp/f.js:7): success; 29 passes.
 function 3 (/tmp/f.js:17): success; 29 passes.

function 0 and function 1 can be ignored here, and we only need to look at function 2 and function 3, where function 2 is function f from the test case and function 3 is function g.

We know that function 2/f is the fast one and function 3/g is the slow one, now we only need to identify how these two functions differ resp. which compiler optimization path led to a different output. So let's take a look at "func02-pass26-Add KeepAlive Instructions-mir.gv.pdf" and "func03-pass26-Add KeepAlive Instructions-mir.gv.pdf", which is the output of the very last MIR optimization path before starting to lower to a more machine level representation (LIR).

The inner loop for (var j = 0; j < BUFFER_SIZE; j++) { ... } is represented in the blocks 4-6, where loading the typed array element c += a2[j]; is represented in block 6.

In the fast case for function 2/f, the following instructions are used:

spectremaskindex phi62 typedarraylength56
loadunboxedscalar typedarrayelements57 spectremaskindex67 uint8

Whereas the slow case function 3/g uses:

constant 0x1000000
boundscheck phi56 constant60
spectremaskindex phi56 constant60
constantelements 0x7efce2100000
loadunboxedscalar constantelements64 spectremaskindex63 uint8

That means in the slower function we're always emitting a boundscheck MBoundsCheck instruction, which is probably the cause of the slow down. In the fast function the boundscheck was somehow moved outside of block 6 and is now present in block 4:

boundscheck loadslot53 typedarraylength56

Now we need to identify which optimization is responsible for moving the boundscheck for function 2/f. Eventually we'll end up in "func02-pass13-Range Analysis-mir.gv.pdf", which is the very first pass which has the bounds check in block 4 instead of block 6. This optimization pass is implemented in RangeAnalysis.h and RangeAnalysis.cpp, so you'll need to inspect why the range analysis was able to hoist the bounds check only for function 2/f, but not for function 3/g. Maybe this function is a good starting point for further debugging?

André Bargull [:anba]

Assignee

Comment 13

•

5 years ago

So I just took another quick look at this.

This seems to be a misbehaving interaction between LICM and Range Analysis. LICM doesn't hoist MConstant (MConstant will only be hoisted if its user is hoisted, but the user, MBoundsCheck in this case, can't be hoisted, because one of its operands isn't loop invariant) and not hoisting the MConstant leads to rejecting to hoist MBoundsCheck in the Range Analysis phase, because its length (the MConstant) wasn't hoisted and is therefore considered as not loop invariant.

And additionally LICM also never hoists MConstantElements, which leads to another slow down in function g when compared to function f.

That means there are two action points:

We need to teach RangeAnalysis to hoist MBoundsCheck with non-hoisted MConstants.
And we need to find out why MConstantElements is considered as not being worth hoistable in LICM.

André Bargull [:anba]

Assignee

Comment 14

•

5 years ago

Attached file Bug 1512491 - Part 1: Hoist bound checks with constants. r=nbp! — Details

André Bargull [:anba]

Assignee

Comment 15

•

5 years ago

Attached file Bug 1512491 - Part 2: Hoist access to MConstantElements. r=nbp! — Details

Depends on D24351

André Bargull [:anba]

Assignee

Updated

•

5 years ago

Assignee: nobody → andrebargull

Status: NEW → ASSIGNED

André Bargull [:anba]

Assignee

Comment 16

•

5 years ago

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=df96f2a0d06cebf19c6f5dc2fd3afdd39f31f593

Keywords: checkin-needed

Pulsebot

Comment 17

•

5 years ago

Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/07d5d42519f3
Part 1: Hoist bound checks with constants. r=nbp,jandem
https://hg.mozilla.org/integration/autoland/rev/27b59f0c6630
Part 2: Hoist access to MConstantElements. r=nbp

Keywords: checkin-needed

Natalia Csoregi [:nataliaCs]

Comment 18

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/07d5d42519f3
https://hg.mozilla.org/mozilla-central/rev/27b59f0c6630

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox68: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla68

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

status-firefox65: affected → wontfix

status-firefox66: --- → wontfix

status-firefox67: --- → wontfix

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

2 years ago

Performance Impact: --- → -

Whiteboard: [qf-][lang=c++] → [lang=c++]

You need to log in before you can comment on or make changes to this bug.