Closed Bug 663485 Opened 9 years ago Closed 9 years ago

TI: make typed arrays as fast as dense arrays

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 6 obsolete files)

The TI branch has super fast array access, I wonder if we can do the same for typed arrays.

If we have a GETELEM x[y] where we know x is a typed array and y is an integer, we can probably generate a fast inline path without using the (currently disabled) IC's. Maybe at some point even bounds-check hoisting like normal arrays?

Brian do you think this is possible with the current TI machinery?
Yes, definitely.  We should do everything for typed arrays that we are now doing for dense arrays and non-escaping arguments, hoisting both bounds checks and computations of the base pointers themselves.  Went through this recently for the arguments; the simple way is to first get the inline paths in, then extend LoopState with the hoisting machinery for the array type.
Tomorrow I will try to come up with a GETELEM prototype based on your dense array/arguments code. (Taking this bug because it's a bit more self-contained than fixing the IC's)
Assignee: general → jandemooij
Status: NEW → ASSIGNED
Cool, I think that's a better option than fixing the ICs themselves to work with TI --- that would be very hard with the calls made within the ICs, and with the improved precision we now have with type barriers it should be very rare to find hot accesses to typed arrays where we don't know the type of the array being accessed (and won't be able to generate an inline path).
Attached patch WIP v1 (obsolete) — Splinter Review
First stab at a GETELEM patch. Porting the code from the IC's was pretty straight-forward, although the SETELEM will be harder of course. No hoisting or anything but the inline path is basically complete and passes jit-tests.

For this micro-benchmark:
--
function f() {
    // d8 does not like new Uint8Array([1, 2])
    var arr = [1,2,3,4,5,6,7,8,9];
    var a = new Uint8Array(arr.length);
    for (var i=0; i<arr.length; i++)
        a[i] = arr[i];
    var x = 0;
    var idx = 3;
    var t0 = new Date;
    for (var i=0; i<100000000; i++) {
        x += a[idx];
    }
    print(new Date - t0);
    print(x);
}
f();
--
d8      : 140 ms
js -m -n: 211 ms
js -j   : 683 ms
js -m   : 823 ms

For a dense array:

d8      : 138 ms
-m -n   : 127 ms
-j      : 755 ms
-m      : 812 ms

We have to do two loads (typed array from private slot, then the data vector). Hoisting these will be interesting.

Brian I'm not sure what's the best way to test whether the object is a typed array. This part of the code is still very hackish. Currently I look at the proto of the TypeObject, but I'm not sure whether this is correct (mutable __proto__, Object.create etc?), and do we need a type constraint to trigger recompilation when this changes? Or is there a better way to do this?
(In reply to comment #4)
> Brian I'm not sure what's the best way to test whether the object is a typed
> array. This part of the code is still very hackish. Currently I look at the
> proto of the TypeObject, but I'm not sure whether this is correct (mutable
> __proto__, Object.create etc?), and do we need a type constraint to trigger
> recompilation when this changes? Or is there a better way to do this?

I don't think there's a good way to handle this currently, there is too much around arrays which we special case (and extending that special casing would not be good).  I think we need to improve typeobj or typeobj->newScript to also encode information about the clasp which all instance objects share, and which can be cleared dynamically if Object.create or somesuch is used.  Should be able to do this in the next few days, for now the hack is fine.
> We have to do two loads

That might change with bug 656519.

> I'm not sure what's the best way to test whether the object is a typed array.

js_IsTypedArray?  Or is the point that you don't have access to the class of the actual object?
(In reply to comment #6)
> > We have to do two loads
> 
> That might change with bug 656519.
> 
> > I'm not sure what's the best way to test whether the object is a typed array.
> 
> js_IsTypedArray?  Or is the point that you don't have access to the class of
> the actual object?

The question is how to determine from the inferred type information for a GETELEM/SETELEM that it is definitely accessing a particular kind of typed array (so that we don't need to dynamically test in the jitcode).  We can currently encode this information for dense array accesses, just not typed arrays (should not be hard to fix, just need to make sure to do it right).
Attached patch WIP v2 (obsolete) — Splinter Review
Added inline paths for JSOP_SETELEM and JSOP_LENGTH. For SETELEM we support assigning integer and double values (and we inline js_TypedArray_uint8_clamp_double for Uint8ClampedArray).

For the awfy-assorted array testcase with Int32Array instead of Array (I will add some typed array tests to assorted later this week):

d8        :  207 ms or 271 ms
js -m -n  :  251 ms
js -j     :  596 ms
js -m     : 2940 ms (TM-tree)

Still need to do some cleanup, add tests and share more code with the IC's.

Will leave LICM and other optimizations to follow-up patches. 192 ms with -m -n for this test case with dense arrays, so we should be able to win another 40-50 ms.
Attachment #538693 - Attachment is obsolete: true
Clean up the state tracking for object classes, and allow us to keep track of which objects are definitely typed arrays, via a flag on the type object.  You can test for typedarray-ness with typeset->hasObjectFlags in the same way as is done for dense arrays.  The proto still needs to be walked to figure out *which* typed array an object is, or you can inspect the initializerKey entry of the type (make sure to call hasObjectFlags too, though, as the initializerKey won't change if mutable __proto__ is in use).

This also assigns types to each typed array object according to the allocation site, in the same way as for normal arrays.  While there won't really be property type differences between these and the prototype's generic 'new' type object, this will allow us to identify singleton typed arrays, infer where they are used and (eventually) bake addresses related to them into jitcode.  This should be very helpful for code that uses typed arrays to model memory, like emscripten and emulators.

http://hg.mozilla.org/projects/jaegermonkey/rev/0767b119a1c8
Attached patch WIP v3 (obsolete) — Splinter Review
Removed the TI hacks and duplicate typed array load/store code. Separated the conversion/storing code in jsop_setelem_typed. Regalloc is much better now, for GETELEM/SETELEM we won't evict values already in registers. No perf changes for the benchmarks I tried though. Next week I'll experiment with part 2, LICM for typed arrays.
Attachment #539274 - Attachment is obsolete: true
Attached patch LICM v1Splinter Review
Extending the dense array code to handle typed arrays turned out to be quite straight-forward. This hoists JSOP_LENGTH and for GETELEM/SETELEM the bounds-check and the data vector.

For the benchmark numbers in comment 8:

js -m -n LICM   :  140 ms
d8              :  203 ms or 271 ms
js -m -n no-LICM:  251 ms
js -j           :  613 ms
js -m           : 2879 ms (TM-tree)

Something like a[x] = 1 (when hoisted) is a single instruction for an Int32Array:

movl  $0x1, 0(%esi,%edi,2)
> Something like a[x] = 1 (when hoisted) is a single instruction for an
> Int32Array:
> 
> movl  $0x1, 0(%esi,%edi,2)

Nice! :)
Attached patch Part 1 (obsolete) — Splinter Review
I'd like to get this part in first, after this is in and green on tinderbox I'll rebase the LICM patch and ask for review on that too (it's a much smaller patch).
Attachment #541451 - Attachment is obsolete: true
Attachment #543773 - Flags: review?(bhackett1024)
Comment on attachment 543773 [details] [diff] [review]
Part 1

Review of attachment 543773 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, though we will definitely need to fuzz test the typed arrays well.

::: js/src/methodjit/BaseAssembler.h
@@ +866,5 @@
>  
> +    // Load a value from a typed array's packed data vector.
> +    template <typename T>
> +    void loadFromTypedArray(int atype, T address, MaybeRegisterID typeReg,
> +                            AnyRegisterID dataReg)

A comment about what combinations of typeReg/dataReg this function accepts would be good.

@@ +895,5 @@
> +            load32(address, dataReg.reg());
> +            if (typeReg.isSet())
> +                move(ImmType(JSVAL_TYPE_INT32), typeReg.reg());
> +            break;
> +          case js::TypedArray::TYPE_UINT32:

A comment about why Uint32 will always have a type register would be good; the logic is pretty tricky wrt the testing done in getelem_typed.  Also, this path looks pretty slow when getelem_typed is accessing a Uint32Array that has produced double values before --- we convert the uint32 to an FP register, break it into normal registers, then convert it back to an FP register when pushing.  Not sure how hard this would be to fix, but allowing Uint32 to load directly into an FP register would be cool.

::: js/src/methodjit/FastOps.cpp
@@ +1193,5 @@
>      stubcc.rejoin(Changes(2));
>  }
>  
> +static inline int32
> +ClampIntForUint8Array(int32 x)

This should be in typedarrayinlines.h

@@ +1259,5 @@
> +             */
> +            MaybeRegisterID reg;
> +            if (atype == TypedArray::TYPE_INT8 || atype == TypedArray::TYPE_UINT8 ||
> +                atype == TypedArray::TYPE_UINT8_CLAMPED) {
> +                reg = frame.allocReg(Registers::SingleByteRegs).reg();

Does this need to happen for INT8 and UINT8 if the entry is a known int32 already in one of the single byte regs?

@@ +1367,5 @@
> +
> +    // Convert the value.
> +    frame.unpinEntry(vr);
> +    bool allocated;
> +    convertForTypedArray(atype, value, &vr, &allocated);

If the id and value have the same backing then the id is not pinned here and its register could be clobbered.  Normally I would just filter such cases out before the path, but 'a[i] = i' seems like a case that could come up in practice so it's probably beset to make sure the id gets pinned here.
Attachment #543773 - Flags: review?(bhackett1024) → review+
First fuzzer regression with this patch (options -j -m -n -a):

---

function testTypedArrayOther() {
    var ar = new Int32Array;
    for (var Infinity=0; i < ar; ++i) ar[i]=i;
    for (var i = 0; ; ) {}
}
testTypedArrayOther();

---

Assertion failure: fe_ != NULL, at ./methodjit/FrameState.h:223
Another regression here (options -j -m -n -a):

---

F = function () {};
F.prototype = new ArrayBuffer;
ArrayBuffer = F;
ArrayBuffer.prototype.prop = "on prototype";
var b = new ArrayBuffer;
(b.prop, "on prototype")

---

Assertion failure: proto, at jsobj.cpp:5333
Thanks again, decoder. I fixed the test in comment 15 locally and I'll add the test. The test in comment 16 is a pre-existing problem, I can reproduce it without the patch.
Attached patch Part 1 v2 (obsolete) — Splinter Review
Fix review comments and fuzz bugs found so far. Will ask for re-review tomorrow because the patch changed quite a bit, but first I'll give it some more testing and double check everything.
Attachment #543773 - Attachment is obsolete: true
Attached patch Part 1 v3 (obsolete) — Splinter Review
Regalloc for INT8/UINT8 is more efficient now. I was afraid it would complicate the code too much but I rewrote the regalloc code a bit and I think this version makes more sense. This also loads uint32 directly into an FP-register if there's no int32 type-barrier.
Attachment #544009 - Attachment is obsolete: true
Attachment #544309 - Flags: review?(bhackett1024)
Comment on attachment 544309 [details] [diff] [review]
Part 1 v3

Review of attachment 544309 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/methodjit/FastOps.cpp
@@ +1253,5 @@
> +                                 atype == TypedArray::TYPE_UINT8 ||
> +                                 atype == TypedArray::TYPE_UINT8_CLAMPED);
> +
> +            if (!value->isType(JSVAL_TYPE_INT32) || atype == TypedArray::TYPE_UINT8_CLAMPED) {
> +                // x86 has 4 single byte regs. Worst case we've allocated 3

we've allocated => we're pinning
Attachment #544309 - Flags: review?(bhackett1024) → review+
Comment on attachment 544309 [details] [diff] [review]
Part 1 v3

Review of attachment 544309 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/methodjit/FastOps.cpp
@@ +1897,5 @@
> +        barrier = testBarrier(typeReg.reg(), dataReg.reg(), false);
> +    } else {
> +        frame.pushTypedPayload(JSVAL_TYPE_INT32, dataReg.reg());
> +    }
> +    stubcc.rejoin(Changes(2));

This doesn't look right in the case where the read value is out of range.  Generally, known types used pushed by the compiler should match those which have been inferred, as here we may push a value as an integer or double even if it is known as possibly undefined (even though undefined values are only produced by the stub, when we rejoin we will effectively coerce them to the type which the compiler pushed).
Attached patch Part 1 v4Splinter Review
Argh how embarrassing. I tried to outsmart knownPushedType in getelem_typed by looking at the array type but it's too complicated. I simplified getelem_typed, added the AddTypePropertyId we talked about on IRC and added some comments, asserts and tests. Sorry for asking for another review, I hope the interdiff works..
Attachment #544309 - Attachment is obsolete: true
Attachment #544497 - Flags: review?(bhackett1024)
Comment on attachment 544497 [details] [diff] [review]
Part 1 v4

Review of attachment 544497 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #544497 - Flags: review?(bhackett1024) → review+
http://hg.mozilla.org/projects/jaegermonkey/rev/0a10e83c2b3a

ARM is red, need to #ifdef getelem/setelem code.. Also M1 orange in WebGL tests caused by a FrameState regalloc assert. The reason for this is not obvious so tomorrow I will build the browser, debug and post a follow-up patch.
(In reply to comment #24)
> Also M1 orange in WebGL
> tests caused by a FrameState regalloc assert. The reason for this is not
> obvious

Ehm I just realized what's happening here. More evidence that our coverage of typed arrays is really bad, the fact that there are like 9 types probably doesn't help either (only fails with Uint8Array/Int8Array).
Attached patch Follow-upSplinter Review
Renamed JS_POLYIC_TYPED_ARRAY to JS_METHODJIT_TYPED_ARRAY and used it to disable typed array GETELEM/SETELEM on ARM. Unfortunately this may require you to run configure again before the typed array code works.

Also fixed the regalloc problem: tempRegInMaskForData could fail because we pinned the old data reg in setelem_typed and unpinned it after the convertForTypedArray call.
Attachment #545367 - Flags: review?(bhackett1024)
Comment on attachment 545367 [details] [diff] [review]
Follow-up

Review of attachment 545367 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #545367 - Flags: review?(bhackett1024) → review+
http://hg.mozilla.org/projects/jaegermonkey/rev/b67c42403458

Tinderbox is mostly green now, the remaining orange seem unrelated. Filed bug 671084 for the LICM patch.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 649202
You need to log in before you can comment on or make changes to this bug.