TypedArray constructor inlining doesn't always seem to work as expected
Categories
(Core :: JavaScript Engine: JIT, enhancement, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox66 | --- | fixed |
People
(Reporter: anba, Assigned: anba)
Details
Attachments
(3 files, 4 obsolete files)
Test case 1: --- function f() { var q = 0; var t1 = dateNow(); for (var i = 0; i < 1000000; ++i) { var ta = Int32Array.from([0]); q += ta[0]; } var t2 = dateNow(); for (var i = 0; i < 1000000; ++i) { var ta = Int32Array.of(0); q += ta[0]; } var t3 = dateNow(); return [t2 - t1, t3 - t2, q]; } // warm-up for (var i = 0; i < 5; ++i) f(); var [from, of] = f(); print("Int32Array.from:", from); print("Int32Array.of:", of); --- Expected: Int32Array.of shouldn't be slower than Int32Array.from Actual: Int32Array.of is multiple times slower Test case 2: --- function f() { var q = 0; var t1 = dateNow(); for (var i = 0; i < 1000000; ++i) { var ta = new Int32Array(1); q += ta[0]; } var t2 = dateNow(); for (var i = 0; i < 1000000; ++i) { var ta = Reflect.construct(Int32Array, [1]); q += ta[0]; } var t3 = dateNow(); return [t2 - t1, t3 - t2, q]; } // warm-up for (var i = 0; i < 5; ++i) f(); var [newed, constructed] = f(); print("new Int32Array:", newed); print("Reflect.construct:", constructed); --- Expected: |new Int32Array| shouldn't be slower than Reflect.construct Actual: |new Int32Array| is multiple times slower Similar effects are visible in µ-benchmarks with other %TypedArray%.prototype methods.
Updated•7 years ago
|
Comment 1•7 years ago
|
||
Hi! As I don't have much knowledge about this subject (but I wish to improve it), I'll need some help to understand this. Someone can help me? I converted the |arguments| param to object and it speeded up the tests cases. Following some results: ---- Test 1: before: Int32Array.from: 56.223876953125 Int32Array.of: 227.778076171875 after: Int32Array.from: 51.428955078125 Int32Array.of: 74.174072265625 Test 2: before: new Int32Array: 201.118896484375 Reflect.construct: 35.287109375 after (almost the same result): new Int32Array: 195.0068359375 Reflect.construct: 34.55810546875 ----- But I have some questions: First: which is the type of 'arguments' param? Second: Why it become very slow when we pass a vector to TypedArray.of([0])? Int32Array.from: 49.15380859375 Int32Array.of: 1208.989013671875 I think the code is slower on the assignment of the |newObj| (newObj[k] = items[k];) in TypedArrayStaticOf (TypedArray.js): Thanks!
Assignee | ||
Comment 2•7 years ago
|
||
(In reply to Victor Carlquist from comment #1) > First: which is the type of 'arguments' param? It's always an object. So if the `ToObject` call is properly handled in MCallOptimize (https://searchfox.org/mozilla-central/rev/31606bbabc50b08895d843b9f5f3da938ccdfbbf/js/src/jit/MCallOptimize.cpp#3072), `ToObject` should be optimized away completely. > > Second: Why it become very slow when we pass a vector to TypedArray.of([0])? > Int32Array.from: 49.15380859375 > Int32Array.of: 1208.989013671875 `TypedArray.of([0])` will assign the array `[0]` into a typed array element, so it's basically the same as repeatedly evaluating `+[0]` which obviously slower than simply evaluating `+0`. ;-) > I think the code is slower on the assignment of the |newObj| (newObj[k] = > items[k];) in TypedArrayStaticOf (TypedArray.js): My guess is that the JIT-compiler has insufficient type information or miscompiles TypedArrayStaticOf for some other reason. So the next step is probably to check the code generation to find out if something goes awry when JIT compiling the code [1]. [1] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Hacking_Tips#Using_IonMonkey_spew_(JS_shell)
Comment 3•7 years ago
|
||
(In reply to André Bargull [:anba] from comment #2) > > I think the code is slower on the assignment of the |newObj| (newObj[k] = > > items[k];) in TypedArrayStaticOf (TypedArray.js): > > My guess is that the JIT-compiler has insufficient type information or > miscompiles TypedArrayStaticOf for some other reason. So the next step is > probably to check the code generation to find out if something goes awry > when JIT compiling the code [1]. > > [1] > https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/ > Hacking_Tips#Using_IonMonkey_spew_(JS_shell) Thanks André! Ok, I'll investigate it.
Comment 4•7 years ago
|
||
Hi André, I think that the issue is in the GetFrameArgument. I used the following test case: --- function f() { var q = 0; var ta = Int32Array.of(1); return ta[0]; } // warm-up for (var i = 0; i < 500; ++i) f(); --- This code showed-up a bailout at |getelem, MIR: typebarrier [103], LIR: typebarrierv [107]|. As we can see in the image the bailout occours on 'getframeargument'. The JIT is bailouting the CodeGenerator.cpp::visitGetFrameArgument because it's getting an Int32 instead of a Value from arguments. I think the rcx should contain a pointer to a Value, but it's getting an Int32. > 0xb7a28edac0a: int3 > 0xb7a28edac0b: mov 0x68(%rsp,%rax,8),%rcx > => 0xb7a28edac10: push %rax > > (gdb) info registers rcx > rcx 0xfff8800000000001 -2111062325329919 So, should we convert/wrap the argument's item into a Value in the GetFrameArgument? Thanks!
Comment 5•7 years ago
|
||
Assignee | ||
Comment 6•7 years ago
|
||
Good sleuthing! I'll ask :jandem and/or :nbp tomorrow about GetFrameArgument, they should be able to provide better input on how to proceed.
Comment 7•7 years ago
|
||
(In reply to Victor Carlquist from comment #4) > This code showed-up a bailout at |getelem, MIR: typebarrier [103], LIR: > typebarrierv [107]|. In case this was not clear in the documentation, these indexes in-between brackets are "supposed to" correspond to the pc (bytecode), the last MIR (iongraph), and the last LIR (iongraph). The name correspond to the name of the JSOP_*, MInstruction, and LInstruction. Does this bailout appears multiple times, or just once? > As we can see in the image the bailout occours on > 'getframeargument'. > The JIT is bailouting the CodeGenerator.cpp::visitGetFrameArgument because > it's getting an Int32 instead of a Value from arguments. This might be an issue in the reported log message, because visitGetFrameArgument [2] does not contain any calls to "bailout" or "bailoutFrom" functions of the code generator, and the reported LIR instruction is a MTypeBarrier / LTypeBarrierV. Then the "arguments" vector is not materialized. We do not have a JSValue to represent it in this case. It lives on the stack above the Jit frames. This is an optimization made to avoid the overhead of constructing an object. Thus, in the following assembly, $rax contains the "index" of the for-loop, which we are reading, and "$rsp + 0x68" is the offset form the stack pointer, at which the arguments vector is. > I think the rcx should contain a pointer to a Value, but it's getting an > Int32. > > > 0xb7a28edac0a: int3 > > 0xb7a28edac0b: mov 0x68(%rsp,%rax,8),%rcx This line corresponds to the code "arguments[i]", which is inlined from TypedArrayStaticOf [1], and generated by visitGetFrameArgument [2]. So we expect to read the content out of the arguments vector. > > (gdb) info registers rcx > > rcx 0xfff8800000000001 -2111062325329919 This is a JSValue for an int32 value which is 1. This sounds correct and expected, still the type-barrier should work fine, and not bailout. This might be an inverted condition in the code of the type barrier. Note, if this bailout only happened once, then this might not be the factor of 2 issues you are looking for, if the loop is running 1000000. [1] http://searchfox.org/mozilla-central/rev/a984558fa2bbde6492d3fb918496fc0b0835b2ce/js/src/builtin/TypedArray.js#1528,1542 [2] http://searchfox.org/mozilla-central/rev/a984558fa2bbde6492d3fb918496fc0b0835b2ce/js/src/jit/CodeGenerator.cpp#9296-9298
Comment 8•7 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #7) Thanks for the nice explanation. :) > Does this bailout appears multiple times, or just once? It appears just once. I thought when a Bailout occurred, the optimization is descarted and bailout doesn't occur anymore.
Comment 9•7 years ago
|
||
(In reply to Victor Carlquist from comment #8) > (In reply to Nicolas B. Pierron [:nbp] from comment #7) > > Does this bailout appears multiple times, or just once? > It appears just once. I thought when a Bailout occurred, the optimization is > descarted and bailout doesn't occur anymore. When an invalidation happens, such as with a type barrier, then the code is discarded. When the code is discarded, we are back in Baseline with new type information to give to the next Ion compilation of the same function. In iongraph, you should see that as multiple functions being compiled.
Comment 10•7 years ago
|
||
Hi, I think the |value| at [1] and the |index| at [2] should be constants, but they are not. [1] http://searchfox.org/mozilla-central/rev/a984558fa2bbde6492d3fb918496fc0b0835b2ce/js/src/jit/CodeGenerator.cpp#11171,11174 [2] http://searchfox.org/mozilla-central/rev/a984558fa2bbde6492d3fb918496fc0b0835b2ce/js/src/jit/CodeGenerator.cpp#9295,9300-9302
Comment 11•7 years ago
|
||
(In reply to Victor Carlquist from comment #10) > I think the |value| at [1] and the |index| at [2] should be constants, but > they are not. The index cannot be a constant, because this is the loop variable, over which we iterate, and we cannot bake-in the length of the arguments, because it it dynamic, unless we inline the caller.
Comment 12•7 years ago
|
||
Sorry for the big delay, I was very busy this month.... I think the TypedArray.From is faster than TypedArray.of because the TypedArray.From is using the same buffer (CopyOnWrite) of its parameter, we can see it in the attached iongraph, instructions 44 and 45. Now, the TypedArray.of needs to create a new TypedArray every time, becoming slower than TypedArray.From. The same occurs with the Reflect, it uses CoW too. So, I think it's not a bug, but, maybe, we could create a buffer to TypedArray.of to optimize it.
Comment 13•7 years ago
|
||
Test case 2 on bug's description. Reflect uses CoW too, at instructions 69 and 70.
Assignee | ||
Comment 14•6 years ago
|
||
This seems to be a case where we repeatedly fail to allocate the TypedArray in the nursery. I've added a call to printf here [1] when we fail to allocate in the nursery and in the slow case we're almost always fail to allocate in the nursery (~90,000 failures at 100,000 iterations). --- computeEffectiveAddress(Address(result, totalSize), temp); + + { + Label ok; + branchPtr(Assembler::AboveOrEqual, AbsoluteAddress(zone->addressOfNurseryCurrentEnd()), temp, &ok); + this->printf("failed nursery"); + bind(&ok); + } + branchPtr(Assembler::Below, AbsoluteAddress(zone->addressOfNurseryCurrentEnd()), temp, fail); --- But why do we constantly fail to nursery allocate the TypedArray object? It turns out we always allocate the TypedArray in the tenured heap when we hit the failure case, so the nursery is never cleaned up (?). Changing this line [2] to use |GenericObject| instead of |TenuredObject| fixes the nursery allocation issue. But I don't know if we can simply it to |GenericObject|. Unfortunately bug 1279992 comment #34, which changed the kind to |TenuredObject|, doesn't provide any information why |TenuredObject| is needed here. [1] https://searchfox.org/mozilla-central/rev/003262ae12ce937950ffb8d3b0fa520d1cc38bff/js/src/jit/MacroAssembler.cpp#867 [2] https://searchfox.org/mozilla-central/rev/003262ae12ce937950ffb8d3b0fa520d1cc38bff/js/src/vm/TypedArrayObject.cpp#651
Comment 15•6 years ago
|
||
(In reply to André Bargull [:anba] from comment #14) > It > turns out we always allocate the TypedArray in the tenured heap when we hit > the failure case, so the nursery is never cleaned up (?). Ugh, good catch. The template object itself should be tenured, but the object we allocate there is just a clone. Should be fine to change it to GenericObject.
Assignee | ||
Comment 16•6 years ago
|
||
Simply changing it to |GenericObject| triggers some assertions in |initTypedArrayData()| when the typed array is nursery allocated, but the raw-buffer is not in the nursery. [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=777716b9680380f4625fefbc406ad90bc00f9c3e&selectedJob=170800456
Assignee | ||
Comment 17•5 years ago
|
||
I don't think the failing assertion [1] mentioned in comment #16 makes any sense, because the same situation can also apply in the non-OOL path [2]:
|Nursery::allocateBuffer| is not required to return a buffer which resides itself in the nursery, cf. [3]. That means the |!nursery.isInside(buf)| part of the assertion can be true in [2]. Furthermore |!tarray->hasInlineElements()| is always true in [2], because the newly created TypedArray doesn't have its elements point to its inline storage. So both pre-conditions of the assertion can be true in [2], but its consequent |tarray->isTenured()| is still false, because the new TypedArray is allocated from the nursery space.
Taking all that into account, it looks like the assertion is bogus and we don't need to care about it. And to fix this bug, we can simply call Nursery::allocate(Zeroed)Buffer (bug 1518101) in |makeTypedArrayWithTemplate|, similar to the code in |AllocateObjectBufferWithInit|.
Drive-by change:
- Directly return in |TypedArrayObject::objectMoved| when |buf| is nullptr to avoid unnecessary calls to |nursery.isInside| and |nursery.removeMallocedBuffer|.
[1] https://searchfox.org/mozilla-central/rev/76fe4bb385348d3f45bbebcf69ba8c7283dfcec7/js/src/vm/TypedArrayObject.cpp#514-515
[2] https://searchfox.org/mozilla-central/rev/76fe4bb385348d3f45bbebcf69ba8c7283dfcec7/js/src/jit/MacroAssembler.cpp#1194
[3] https://searchfox.org/mozilla-central/rev/76fe4bb385348d3f45bbebcf69ba8c7283dfcec7/js/src/gc/Nursery.cpp#429
Interesting side-effect of properly nursery allocating TypedArray objects in the OOL path: Some µ-benchmarks are now considerably slower!
function f() {
var r = 0;
var t = dateNow();
for (var i = 0; i < 40000; ++i) {
var ta = new Int32Array(2650 + (i & 0x1));
r += ta.length;
}
return [dateNow()-t, r];
}
var t = dateNow()
for (var i = 0; i < 5; ++i) print(f());
print("total:", (dateNow() - t));
Prints ~3200ms for the "total" count before this change, but with the patch the "total" count is ~4200ms (where each iteration takes between 400ms and 1400ms).
This happens because of more lock contention when allocating/freeing memory: The TypedArray object allocation requests more than |MaxNurseryBufferSize| bytes [4], so the memory is malloc allocated and managed by the nursery in |Nursery::mallocedBuffers|. And nursery malloc memory is freed in a background thread [5]. That means for the µ-benchmark, the main thread is constantly acquiring and releasing the jemalloc memory lock when it is requesting new memory, while the background is constantly acquiring and releasing the same lock when freeing the memory.
When commenting out [5] to force main thread freeing of the malloced memory, the complete µ-benchmark completes in less than 300ms (with this patch applied).
[4] https://searchfox.org/mozilla-central/rev/76fe4bb385348d3f45bbebcf69ba8c7283dfcec7/js/src/gc/Nursery.cpp#429-430
[5] https://searchfox.org/mozilla-central/rev/76fe4bb385348d3f45bbebcf69ba8c7283dfcec7/js/src/gc/Nursery.cpp#1017-1020
Comment 18•5 years ago
|
||
Comment on attachment 9034779 [details] [diff] [review] bug1394386.patch Review of attachment 9034779 [details] [diff] [review]: ----------------------------------------------------------------- Excellent. ::: js/src/vm/TypedArrayObject.cpp @@ +533,5 @@ > + > + void* buf = nullptr; > + if (!fitsInline) { > + MOZ_ASSERT(len > 0); > + nbytes = JS_ROUNDUP(nbytes, sizeof(Value)); We don't need to worry about overflow because nbytes must be <= INT32_MAX? We might change these limits at some point so to be extra explicit maybe add something like: MOZ_ASSERT(nbytes <= INT32_MAX, "JS_ROUNDUP must not overflow");
Assignee | ||
Comment 19•5 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #18)
We don't need to worry about overflow because nbytes must be <= INT32_MAX? We might change these limits at some point so to be extra explicit maybe add something like:
MOZ_ASSERT(nbytes <= INT32_MAX, "JS_ROUNDUP must not overflow");
Okay if I simply copy the check from [1] here, too?
Comment 20•5 years ago
|
||
(In reply to André Bargull [:anba] from comment #19)
Okay if I simply copy the check from [1] here, too?
Yes that's great, thanks!
Assignee | ||
Comment 21•5 years ago
|
||
Update patch to include assertion per review comments, carrying r+.
Assignee | ||
Comment 22•5 years ago
|
||
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d37d41cf36cd21d8c19b1a891d778eb647ba5f8b
Comment 23•5 years ago
|
||
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0140cfbc71db
Don't enforce tenure allocation for TypedArrays from inlined constructor ool-path. r=jandem
Comment 24•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Description
•