Closed Bug 1394386 Opened 7 years ago Closed 5 years ago

TypedArray constructor inlining doesn't always seem to work as expected

Categories

(Core :: JavaScript Engine: JIT, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox66 --- fixed

People

(Reporter: anba, Assigned: anba)

Details

Attachments

(3 files, 4 obsolete files)

Test case 1:
---
function f() {
    var q = 0;

    var t1 = dateNow();
    for (var i = 0; i < 1000000; ++i) {
        var ta = Int32Array.from([0]);
        q += ta[0];
    }
    var t2 = dateNow();
    for (var i = 0; i < 1000000; ++i) {
        var ta = Int32Array.of(0);
        q += ta[0];
    }
    var t3 = dateNow();

    return [t2 - t1, t3 - t2, q];
}

// warm-up
for (var i = 0; i < 5; ++i) f();

var [from, of] = f();
print("Int32Array.from:", from);
print("Int32Array.of:", of);
---

Expected: Int32Array.of shouldn't be slower than Int32Array.from
Actual: Int32Array.of is multiple times slower




Test case 2:
---
function f() {
    var q = 0;
    
    var t1 = dateNow();
    for (var i = 0; i < 1000000; ++i) {
        var ta = new Int32Array(1);
        q += ta[0];
    }
    var t2 = dateNow();
    for (var i = 0; i < 1000000; ++i) {
        var ta = Reflect.construct(Int32Array, [1]);
        q += ta[0];
    }
    var t3 = dateNow();

    return [t2 - t1, t3 - t2, q];
}

// warm-up
for (var i = 0; i < 5; ++i) f();

var [newed, constructed] = f();
print("new Int32Array:", newed);
print("Reflect.construct:", constructed);
---

Expected: |new Int32Array| shouldn't be slower than Reflect.construct
Actual: |new Int32Array| is multiple times slower



Similar effects are visible in µ-benchmarks with other %TypedArray%.prototype methods.
Priority: -- → P2
Hi! 
As I don't have much knowledge about this subject (but I wish to improve it), I'll need some help to understand this. Someone can help me?

I converted the |arguments| param to object and it speeded up the tests cases.
Following some results:
----
Test 1:
before:
Int32Array.from: 56.223876953125
Int32Array.of: 227.778076171875

after:
Int32Array.from: 51.428955078125
Int32Array.of: 74.174072265625

Test 2:
before:
new Int32Array: 201.118896484375
Reflect.construct: 35.287109375

after (almost the same result):
new Int32Array: 195.0068359375
Reflect.construct: 34.55810546875
-----

But I have some questions:
First: which is the type of 'arguments' param?

Second: Why it become very slow when we pass a vector to TypedArray.of([0])? 
Int32Array.from: 49.15380859375
Int32Array.of: 1208.989013671875

I think the code is slower on the assignment of the |newObj| (newObj[k] = items[k];) in  TypedArrayStaticOf (TypedArray.js):

Thanks!
(In reply to Victor Carlquist from comment #1)
> First: which is the type of 'arguments' param?

It's always an object. So if the `ToObject` call is properly handled in MCallOptimize (https://searchfox.org/mozilla-central/rev/31606bbabc50b08895d843b9f5f3da938ccdfbbf/js/src/jit/MCallOptimize.cpp#3072), `ToObject` should be optimized away completely. 

> 
> Second: Why it become very slow when we pass a vector to TypedArray.of([0])? 
> Int32Array.from: 49.15380859375
> Int32Array.of: 1208.989013671875

`TypedArray.of([0])` will assign the array `[0]` into a typed array element, so it's basically the same as repeatedly evaluating `+[0]` which obviously slower than simply evaluating `+0`. ;-)


> I think the code is slower on the assignment of the |newObj| (newObj[k] =
> items[k];) in  TypedArrayStaticOf (TypedArray.js):

My guess is that the JIT-compiler has insufficient type information or miscompiles TypedArrayStaticOf for some other reason. So the next step is probably to check the code generation to find out if something goes awry when JIT compiling the code [1].

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Hacking_Tips#Using_IonMonkey_spew_(JS_shell)
(In reply to André Bargull [:anba] from comment #2)

> > I think the code is slower on the assignment of the |newObj| (newObj[k] =
> > items[k];) in  TypedArrayStaticOf (TypedArray.js):
> 
> My guess is that the JIT-compiler has insufficient type information or
> miscompiles TypedArrayStaticOf for some other reason. So the next step is
> probably to check the code generation to find out if something goes awry
> when JIT compiling the code [1].
> 
> [1]
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/
> Hacking_Tips#Using_IonMonkey_spew_(JS_shell)

Thanks André!
Ok, I'll investigate it.
Hi André,

I think that the issue is in the GetFrameArgument.

I used the following test case:
---
function f() {
    var q = 0;
    var ta = Int32Array.of(1);
    return ta[0];
}

// warm-up
for (var i = 0; i < 500; ++i) f();
---

This code showed-up a bailout at |getelem, MIR: typebarrier [103], LIR: typebarrierv [107]|. As we can see in the image the bailout occours on 'getframeargument'. 
The JIT is bailouting the CodeGenerator.cpp::visitGetFrameArgument because it's getting an Int32 instead of a Value from arguments.

I think the rcx should contain a pointer to a Value, but it's getting an Int32.

>    0xb7a28edac0a: int3   
>    0xb7a28edac0b: mov    0x68(%rsp,%rax,8),%rcx
> => 0xb7a28edac10: push   %rax
>
> (gdb) info registers rcx
> rcx            0xfff8800000000001   -2111062325329919


So, should we convert/wrap the argument's item into a Value in the GetFrameArgument?

Thanks!
Attachment #8916843 - Attachment is obsolete: true
Good sleuthing! I'll ask :jandem and/or :nbp tomorrow about GetFrameArgument, they should be able to provide better input on how to proceed.
(In reply to Victor Carlquist from comment #4)
> This code showed-up a bailout at |getelem, MIR: typebarrier [103], LIR:
> typebarrierv [107]|.

In case this was not clear in the documentation, these indexes in-between brackets are "supposed to" correspond to the pc (bytecode), the last MIR (iongraph), and the last LIR (iongraph).  The name correspond to the name of the JSOP_*, MInstruction, and LInstruction.

Does this bailout appears multiple times, or just once?

> As we can see in the image the bailout occours on
> 'getframeargument'.
> The JIT is bailouting the CodeGenerator.cpp::visitGetFrameArgument because
> it's getting an Int32 instead of a Value from arguments.

This might be an issue in the reported log message, because visitGetFrameArgument [2] does not contain any calls to "bailout" or "bailoutFrom" functions of the code generator, and the reported LIR instruction is a MTypeBarrier / LTypeBarrierV.

Then the "arguments" vector is not materialized.  We do not have a JSValue to represent it in this case.  It lives on the stack above the Jit frames.  This is an optimization made to avoid the overhead of constructing an object.

Thus, in the following assembly, $rax contains the "index" of the for-loop, which we are reading, and "$rsp + 0x68" is the offset form the stack pointer, at which the arguments vector is.

> I think the rcx should contain a pointer to a Value, but it's getting an
> Int32.
> 
> >    0xb7a28edac0a: int3   
> >    0xb7a28edac0b: mov    0x68(%rsp,%rax,8),%rcx

This line corresponds to the code "arguments[i]", which is inlined from TypedArrayStaticOf [1], and generated by visitGetFrameArgument [2].   So we expect to read the content out of the arguments vector.

> > (gdb) info registers rcx
> > rcx            0xfff8800000000001   -2111062325329919

This is a JSValue for an int32 value which is 1.

This sounds correct and expected, still the type-barrier should work fine, and not bailout.  This might be an inverted condition in the code of the type barrier.

Note, if this bailout only happened once, then this might not be the factor of 2 issues you are looking for, if the loop is running 1000000.

[1] http://searchfox.org/mozilla-central/rev/a984558fa2bbde6492d3fb918496fc0b0835b2ce/js/src/builtin/TypedArray.js#1528,1542
[2] http://searchfox.org/mozilla-central/rev/a984558fa2bbde6492d3fb918496fc0b0835b2ce/js/src/jit/CodeGenerator.cpp#9296-9298
(In reply to Nicolas B. Pierron [:nbp] from comment #7)

Thanks for the nice explanation. :)
 
> Does this bailout appears multiple times, or just once?
It appears just once. I thought when a Bailout occurred, the optimization is descarted and bailout doesn't occur anymore.
(In reply to Victor Carlquist from comment #8)
> (In reply to Nicolas B. Pierron [:nbp] from comment #7)
> > Does this bailout appears multiple times, or just once?
> It appears just once. I thought when a Bailout occurred, the optimization is
> descarted and bailout doesn't occur anymore.

When an invalidation happens, such as with a type barrier, then the code is discarded.
When the code is discarded, we are back in Baseline with new type information to give to the next Ion compilation of the same function.

In iongraph, you should see that as multiple functions being compiled.
(In reply to Victor Carlquist from comment #10)
> I think the |value| at [1] and the |index| at [2] should be constants, but
> they are not.

The index cannot be a constant, because this is the loop variable, over which we iterate, and we cannot bake-in the length of the arguments, because it it dynamic, unless we inline the caller.
Sorry for the big delay, I was very busy this month....

I think the TypedArray.From is faster than TypedArray.of because the TypedArray.From is using the same buffer (CopyOnWrite) of its parameter, we can see it in the attached iongraph, instructions 44 and 45.

Now, the TypedArray.of needs to create a new TypedArray every time, becoming slower than TypedArray.From.

The same occurs with the Reflect, it uses CoW too.

So, I think it's not a bug, but, maybe, we could create a buffer to TypedArray.of to optimize it.
Attachment #8919061 - Attachment is obsolete: true
Attachment #8919062 - Attachment is obsolete: true
Test case 2 on bug's description.
Reflect uses CoW too, at instructions 69 and 70.
This seems to be a case where we repeatedly fail to allocate the TypedArray in the nursery. 

I've added a call to printf here [1] when we fail to allocate in the nursery and in the slow case we're almost always fail to allocate in the nursery (~90,000 failures at 100,000 iterations). 

---
     computeEffectiveAddress(Address(result, totalSize), temp);
+
+    {
+        Label ok;
+        branchPtr(Assembler::AboveOrEqual, AbsoluteAddress(zone->addressOfNurseryCurrentEnd()), temp, &ok);
+        this->printf("failed nursery");
+        bind(&ok);
+    }
+
     branchPtr(Assembler::Below, AbsoluteAddress(zone->addressOfNurseryCurrentEnd()), temp, fail);
---

But why do we constantly fail to nursery allocate the TypedArray object? It turns out we always allocate the TypedArray in the tenured heap when we hit the failure case, so the nursery is never cleaned up (?). Changing this line [2] to use |GenericObject| instead of |TenuredObject| fixes the nursery allocation issue. But I don't know if we can simply it to |GenericObject|. Unfortunately bug 1279992 comment #34, which changed the kind to |TenuredObject|, doesn't provide any information why |TenuredObject| is needed here.


[1] https://searchfox.org/mozilla-central/rev/003262ae12ce937950ffb8d3b0fa520d1cc38bff/js/src/jit/MacroAssembler.cpp#867
[2] https://searchfox.org/mozilla-central/rev/003262ae12ce937950ffb8d3b0fa520d1cc38bff/js/src/vm/TypedArrayObject.cpp#651
(In reply to André Bargull [:anba] from comment #14)
> It
> turns out we always allocate the TypedArray in the tenured heap when we hit
> the failure case, so the nursery is never cleaned up (?).

Ugh, good catch. The template object itself should be tenured, but the object we allocate there is just a clone. Should be fine to change it to GenericObject.
Simply changing it to |GenericObject| triggers some assertions in |initTypedArrayData()| when the typed array is nursery allocated, but the raw-buffer is not in the nursery.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=777716b9680380f4625fefbc406ad90bc00f9c3e&selectedJob=170800456
Attached patch bug1394386.patch (obsolete) — Splinter Review

I don't think the failing assertion [1] mentioned in comment #16 makes any sense, because the same situation can also apply in the non-OOL path [2]:

|Nursery::allocateBuffer| is not required to return a buffer which resides itself in the nursery, cf. [3]. That means the |!nursery.isInside(buf)| part of the assertion can be true in [2]. Furthermore |!tarray->hasInlineElements()| is always true in [2], because the newly created TypedArray doesn't have its elements point to its inline storage. So both pre-conditions of the assertion can be true in [2], but its consequent |tarray->isTenured()| is still false, because the new TypedArray is allocated from the nursery space.

Taking all that into account, it looks like the assertion is bogus and we don't need to care about it. And to fix this bug, we can simply call Nursery::allocate(Zeroed)Buffer (bug 1518101) in |makeTypedArrayWithTemplate|, similar to the code in |AllocateObjectBufferWithInit|.

Drive-by change:

  • Directly return in |TypedArrayObject::objectMoved| when |buf| is nullptr to avoid unnecessary calls to |nursery.isInside| and |nursery.removeMallocedBuffer|.

[1] https://searchfox.org/mozilla-central/rev/76fe4bb385348d3f45bbebcf69ba8c7283dfcec7/js/src/vm/TypedArrayObject.cpp#514-515
[2] https://searchfox.org/mozilla-central/rev/76fe4bb385348d3f45bbebcf69ba8c7283dfcec7/js/src/jit/MacroAssembler.cpp#1194
[3] https://searchfox.org/mozilla-central/rev/76fe4bb385348d3f45bbebcf69ba8c7283dfcec7/js/src/gc/Nursery.cpp#429


Interesting side-effect of properly nursery allocating TypedArray objects in the OOL path: Some µ-benchmarks are now considerably slower!

function f() {
    var r = 0;
    var t = dateNow();
    for (var i = 0; i < 40000; ++i) {
        var ta = new Int32Array(2650 + (i & 0x1));
        r += ta.length;
    }
    return [dateNow()-t, r];
}
var t = dateNow()
for (var i = 0; i < 5; ++i) print(f());
print("total:", (dateNow() - t));

Prints ~3200ms for the "total" count before this change, but with the patch the "total" count is ~4200ms (where each iteration takes between 400ms and 1400ms).

This happens because of more lock contention when allocating/freeing memory: The TypedArray object allocation requests more than |MaxNurseryBufferSize| bytes [4], so the memory is malloc allocated and managed by the nursery in |Nursery::mallocedBuffers|. And nursery malloc memory is freed in a background thread [5]. That means for the µ-benchmark, the main thread is constantly acquiring and releasing the jemalloc memory lock when it is requesting new memory, while the background is constantly acquiring and releasing the same lock when freeing the memory.

When commenting out [5] to force main thread freeing of the malloced memory, the complete µ-benchmark completes in less than 300ms (with this patch applied).

[4] https://searchfox.org/mozilla-central/rev/76fe4bb385348d3f45bbebcf69ba8c7283dfcec7/js/src/gc/Nursery.cpp#429-430
[5] https://searchfox.org/mozilla-central/rev/76fe4bb385348d3f45bbebcf69ba8c7283dfcec7/js/src/gc/Nursery.cpp#1017-1020

Assignee: nobody → andrebargull
Status: NEW → ASSIGNED
Attachment #9034779 - Flags: review?(jdemooij)
Comment on attachment 9034779 [details] [diff] [review]
bug1394386.patch

Review of attachment 9034779 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent.

::: js/src/vm/TypedArrayObject.cpp
@@ +533,5 @@
> +
> +    void* buf = nullptr;
> +    if (!fitsInline) {
> +      MOZ_ASSERT(len > 0);
> +      nbytes = JS_ROUNDUP(nbytes, sizeof(Value));

We don't need to worry about overflow because nbytes must be <= INT32_MAX? We might change these limits at some point so to be extra explicit maybe add something like:

  MOZ_ASSERT(nbytes <= INT32_MAX, "JS_ROUNDUP must not overflow");
Attachment #9034779 - Flags: review?(jdemooij) → review+

(In reply to Jan de Mooij [:jandem] from comment #18)

We don't need to worry about overflow because nbytes must be <= INT32_MAX? We might change these limits at some point so to be extra explicit maybe add something like:

MOZ_ASSERT(nbytes <= INT32_MAX, "JS_ROUNDUP must not overflow");

Okay if I simply copy the check from [1] here, too?

[1] https://searchfox.org/mozilla-central/rev/c3ebaf6de2d481c262c04bb9657eaf76bf47e2ac/js/src/jit/MacroAssembler.cpp#1191

(In reply to André Bargull [:anba] from comment #19)

Okay if I simply copy the check from [1] here, too?

[1] https://searchfox.org/mozilla-central/rev/c3ebaf6de2d481c262c04bb9657eaf76bf47e2ac/js/src/jit/MacroAssembler.cpp#1191

Yes that's great, thanks!

Attached patch bug1394386.patchSplinter Review

Update patch to include assertion per review comments, carrying r+.

Attachment #9034779 - Attachment is obsolete: true
Attachment #9035687 - Flags: review+

Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0140cfbc71db
Don't enforce tenure allocation for TypedArrays from inlined constructor ool-path. r=jandem

Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: