Bug 961977 (Open), opened 11 years ago, updated 5 months ago

IonMonkey: Consider Inlining Array.prototype.slice

Categories

(Core :: JavaScript Engine: JIT, defect, P3; x86, macOS)

People

(Reporter: isk, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(4 files)

Attached file array_slice_bench.js
I'm attaching a shell test case.

d8: 631ms
SM: 2145ms

SM is slower than d8.
I think this is because Array#slice is not inlined.
Status: NEW → ASSIGNED
Do you think that based on profiling, or something else?
Just wondering, what are the results if you replace arr.slice() by res.slice(), and do res = arr.slice() before the loop?
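The benchmark itself is an attachment (array_slice_bench.js), so its exact shape is not shown here; the following is a hypothetical sketch of the suggested variant, assuming the original loop calls arr.slice() on a 64-element array (the names and iteration count are illustrative, and the count is reduced so the sketch runs quickly):

```javascript
// Sketch of the suggested variant: slice arr once before the loop,
// then repeatedly slice the previous result instead of arr.
function bench() {
  var arr = new Array(64);   // assumed input shape
  arr[1] = 22;
  var res = arr.slice();     // res = arr.slice() before the loop
  for (var i = 0; i < 1000; ++i) {
    res = res.slice();       // arr.slice() replaced by res.slice()
  }
  return res;
}
console.log(bench().length); // 64
```

The point of the variant is to check whether slicing a freshly allocated result behaves differently from slicing the original array, which would hint at where the time is going.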
(In reply to Boris Zbarsky [:bz] from comment #1)
> Do you think that based on profiling, or something else?
Sorry, I don't have an exact basis. I think inlining Array#slice could be fast because it is similar to MStringSplit, but I don't have any profiling data indicating that this is the cause.
(In reply to Nicolas B. Pierron [:nbp] from comment #2)
> Just wondering, what are the results if you replace arr.slice() by
> res.slice(), and do res = arr.slice() before the loop?

I measured it.
Both variants give the same result.
SM: 900ms
node: 490ms
Assignee: iseki.m.aa → nobody
Blocks: sm-js-perf
Status: ASSIGNED → NEW
Keywords: perf
Priority: -- → P3
don't forget the ability to do Array==Array.slice()
Severity: normal → S3

Hello!

Could I work on this bug?

Hi!

We did some initial work on Array.slice in Warp in bug 1658279. We're apparently still slower than V8, though, so maybe there's room to improve it.

Looking at the testcase, it looks like it's doing array.slice() on an array with holes in it. We currently only have CacheIR support for Array.slice on densely packed arrays (and the arguments object), which we implement by calling into ArraySliceDense (from here and here).

The low-hanging fruit here might already have been picked. If you're still interested, feel free to work on it, but it might not be an especially easy place to start.

Thanks Iain for the references!

I'll try to work on this without asking for too much help. If I see it's too much for me, I'll try to work on something easier :)

Hi!

I found a strange behavior that I still don't understand.

When we set index 1 to the constant 22 we get 2.222 seconds, and when we set index 52 we get 3.142 seconds.

But the MIR code is the same.

function f() {
  var arr = new Array(64);
  arr[1] = 22;
  for (var i = 0; i < 50000000; ++i)
    var res = arr.slice();
  console.log(res);
  return res;
}
f();


function f() {
  var arr = new Array(64);
  arr[52] = 22;
  for (var i = 0; i < 50000000; ++i)
    var res = arr.slice();
  console.log(res);
  return res;
}
f();

Simplifying slightly, an object inside SpiderMonkey is represented using three pointers: a pointer to a shape that describes the layout of named properties, a pointer to a slots array storing named properties, and a pointer to an elements array storing dense integer-indexed properties. So for example the following code:

var array = [1,2,3];
array.x = "a named property";

Would be represented as:

  • a pointer to a shape, which among other things says that property 'x' is in slot 0
  • a pointer to a slots array, which would contain the string value "a named property" in slot 0
  • a pointer to an elements array, which contains the three int32 values 1, 2, and 3.

We try to keep the elements array dense. Storing array[10] = 10 would store 10 in the 10th element of the elements array (reallocating if necessary), then fill the gap between 3 and 10 with a magic "hole" value (which exists for historical reasons and behaves mostly but not entirely like undefined). At this point the elements array is still dense, but because it has holes it is not packed. (Storing array[1000000] this way would require us to allocate a lot of memory, so instead we would store it as a sparse element, which lives in the slots array just like x.)
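The hole-vs-undefined distinction described above is observable from script. A small sketch:

```javascript
// Dense but not packed: index 1 is a hole.
var a = [1, , 3];

console.log(a.length);             // 3 (holes still count toward length)
console.log(a[1]);                 // undefined (holes read as undefined)
console.log(a.hasOwnProperty(1));  // false (the property does not exist)
console.log(1 in a);               // false

// By contrast, an explicit undefined creates a real property:
var b = [1, undefined, 3];
console.log(b.hasOwnProperty(1));  // true

// slice() preserves the hole here:
var c = a.slice();
console.log(c.hasOwnProperty(1));  // false
```

This "mostly but not entirely like undefined" behavior is exactly why a slice of a non-packed array cannot blindly copy the elements buffer.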

In your two examples, we will always be allocating a non-packed dense array for the slice, but in one case it will have one empty value before storing 22, and in the other it will have 52. This requires us to allocate and initialize significantly more memory, which explains why it takes longer.

Thank you for the explanation, Iain!

Please see the patch in comment 16.

I mostly copy/pasted ArrayPackedDense's code and changed some checks to allow non-packed dense arrays to run through it. But I'm not sure it's the best approach to address this bug.

We get some improvement, but it seems that when we need to allocate memory we are still slow.

Thanks for the all explanation.

function f() {
  var arr = new Array(64);
  arr[1] = 22;
  for (var i = 0; i < 50000000; ++i)
    var res = arr.slice();
  console.log(res);
  return res;
}
f();

V8      | SM Before | SM After
~1.500s | ~1.530s   | ~1.062s

function f() {
  var arr = new Array(64);
  arr[5] = 5, arr[12] = 12, arr[53] = 43;
  for (var i = 0; i < 50000000; ++i)
    var res = arr.slice();
  return res;
}
f();

V8      | SM Before | SM After
~1.478s | ~2.289s   | ~1.998s

Thanks for the patches! Sorry for the delay in replying; I saw your post on the weekend, and forgot to circle back on Monday.

I found it a little surprising that the exact same approach would work as well for non-packed arrays as for packed arrays, because otherwise anba would not have left them for later. After digging into it a little, I believe the problem (or at least one problem) is code like this:

for (var i = 0; i < 2000; i++) {
  var arr = [1,,3];
  arr.__proto__ = ["has", "proto", "elements"];
  var res = arr.slice();
  assertEq(res.hasOwnProperty("1"), true);
  assertEq(res[1], "proto");
}

In this case, if we slice an array that contains a hole, and the prototype of that array has an element at that index, then the sliced result must contain the value from the prototype. Looking at your patch, I think we'll end up with a hole instead, because we copy the elements from arr without looking at its prototype. Packed arrays don't have this problem, because they don't have any holes, so elements on the prototype will always be shadowed by elements of the array.

This is somewhat tricky to work around. One option would be to find a cheap way to guard that the prototype chain doesn't have any indexed properties, replace the current packed-array check with that guard, and only use this path if the prototype chain is normal. We don't currently have great tools for that, although the fuse work that Matt is doing might eventually help. CCing Matt to put this on his radar.

Another alternative would be to add a check somewhere around here that will take the slow path (call array_slice) if the array isn't packed, and anything on the prototype chain has an indexed property. Effectively we would still guard that length == initialized length in the IC, but instead of guarding that the non-packed flag isn't set (and failing if it is) we would immediately call into ArraySliceDense, which would use ArraySliceDenseKernel if:
a) the array is packed, or
b) the prototype chain of the array does not contain any indexed properties.
(We also have to make sure that the resulting array has the non-packed flag set if necessary; not sure whether your current patch does that and I just missed it.)

One bit of higher-level feedback on your approach is that in general these days we try to design our ICs to handle supersets of existing ICs, instead of being mutually exclusive. If we have one IC that only handles packed arrays, and another IC that only handles non-packed arrays, and the input contains a mix, then when we reach Ion we will have two separate ICs. We only transpile CacheIR to MIR if we have a single IC, so we will generate an Ion IC. If instead we have one fast IC that only handles packed arrays, and a slower IC that handles both packed and non-packed arrays, then if the input contains a mix, we'll attach the more general case, and when we reach Ion we'll have a single IC and generate better code. tl;dr if we go with a design that uses two different ICs for packed and non-packed arrays, the non-packed IC should also be able to handle packed arrays.
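To make the mixed-input scenario concrete, here is an illustrative sketch of a call site that sees both kinds of arrays (the IC behavior in the comments is taken from the feedback above; the function and variable names are invented for the example):

```javascript
function copy(a) {
  return a.slice();
}

// Packed dense array: every index up to length is present.
var packed = [1, 2, 3];

// Dense but non-packed array: index 1 is a hole.
var holey = [1, , 3];

// With mutually exclusive ICs, this one call site would attach a
// packed-only stub and a non-packed-only stub, so Ion sees two ICs and
// falls back to an Ion IC. With a superset design, the non-packed stub
// also handles packed inputs, and the site keeps a single IC that can
// be transpiled to MIR.
console.log(copy(packed).length);            // 3
console.log(copy(holey).hasOwnProperty(1));  // false
```

The design choice here trades a slightly slower general stub for better Ion code at polymorphic call sites.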

Hello Iain!

Currently the test doesn't fail, because the methods IsPackedArrayDense and IsUnpackedArrayDense return false and the array is returned correctly, but we lose the optimization. I'll try to copy the prototype's chain and see how the performance behaves.

> Another alternative would be to add a check somewhere around here that will take the slow path (call array_slice) if the array isn't packed, and anything on the prototype chain has an indexed property. Effectively we would still guard that length == initialized length in the IC, but instead of guarding that the non-packed flag isn't set (and failing if it is) we would immediately call into ArraySliceDense, which would use ArraySliceDenseKernel if:
> a) the array is packed, or
> b) the prototype chain of the array does not contain any indexed properties.
> (We also have to make sure that the resulting array has the non-packed flag set if necessary; not sure whether your current patch does that and I just missed it.)

Great! I'll try to add these checks, thank you!

> One bit of higher-level feedback on your approach is that in general these days we try to design our ICs to handle supersets of existing ICs, instead of being mutually exclusive. [...] tl;dr if we go with a design that uses two different ICs for packed and non-packed arrays, the non-packed IC should also be able to handle packed arrays.

I see, thank you for the feedback!
