Closed Bug 1146597 Opened 5 years ago Closed 5 years ago

Use unboxed arrays for JSOP_NEWARRAY arrays

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: bhackett, Assigned: bhackett)

References

(Blocks 1 open bug)

Details

Attachments

(10 files, 2 obsolete files)

Bug 1106828 describes a possible layout to use for unboxed arrays.  Given the work that's happened since while doing unboxed plain objects, that layout can be improved.  The target now is:

[ group elements length capacityAndInitializedLength]

Where |elements| points directly to the unboxed array data, with no header involved.  The main trick here is how to compact information about the allocated capacity and the initialized length into 32 bits.  I think that using the lower 26 bits for the initialized length and the upper 6 bits as an index into an array of possible capacities should work.  This makes it slower to access these two quantities for an unboxed array vs. a native array, but the difference shouldn't be excessive and if it proves to be a problem for e.g. Ion compilation of code that keeps appending data to an array, we should be able to add new optimizations to keep things from slowing down.

The initial application for unboxed arrays will be plain array literals created by scripts.  Later we can use unboxed arrays in other places, like JSON arrays, 'new Array()', other pc-dependent array operations like split(), and so forth.  Also, initially unboxed arrays won't be able to cope with holes, though this will be relaxed later, at least for non-int32 element types.  In principle we could have unboxed arrays with holes and int31 values, though I don't know if the extra complexity/cost would be worth it.
Attached patch WIP (obsolete) — Splinter Review
WIP which adds unboxed arrays and baseline/Ion fast paths for getelem/setelem/setelem_hole.  This is almost done but fails on a few benchmarks.
Assignee: nobody → bhackett1024
Attached patch patchSplinter Review
Working patch.  This behaves correctly on all the major benchmarks but does not yet improve anything.  On microbenchmarks like:

    var n = 0;
    for (var i = 0; i < 1000000; i++) {
        var a = [i, 2, 3, 4, 5, 6, 7, 8];
        for (var j = 0; j < a.length; j++)
            n += a[j];
        n = 0;
    }

When running without ggc using unboxed arrays (--unboxed-arrays in the shell) improves my time from 125ms to 48ms, so the improvement of a smaller memory footprint is pretty nice.  There's still some weirdness here though, as for now unboxed arrays are being finalized in the foreground because background finalization more than triples my time on this benchmark (???).  I'm comparing without using GGC since for now unboxed arrays are not nursery allocatable (they have a finalizer) and fixing this will require some work on ggc.

Will break this patch up for review.
Attachment #8589740 - Attachment is obsolete: true
Add UnboxedArrayObject class and associated functionality, and update UnboxedLayout to work with unboxed arrays as well as unboxed plain objects.
Attachment #8590319 - Flags: review?(jdemooij)
Modify JSOP_NEWARRAY in a similar fashion to what was done for JSOP_NEWOBJECT and plain objects, so that we can mark the initial objects created at an opcode as preliminary and use an unboxed representation later if desired.  This also splits out a JSOP_SPREADCALLARRAY opcode, as spread calls have some requirements imposed by baseline that the object definitely be a normal dense array.
Attachment #8590322 - Flags: review?(jdemooij)
A few macro assembler changes to support unboxed arrays.
Attachment #8590324 - Flags: review?(jdemooij)
Handle unboxed arrays in ReceiverGuard, and handle unboxed arrays in the various routines used for compiling property accesses through prototype objects.
Attachment #8590325 - Flags: review?(jdemooij)
Attached patch baseline ICsSplinter Review
Add baseline caches for unboxed array GETELEM, SETELEM, SETELEM_HOLE, and length accesses.
Attachment #8590327 - Flags: review?(jdemooij)
There are a lot more ways that Ion can compile accesses to unboxed arrays vs. unboxed plain objects.  This patch makes the mechanism used for identifying and triggering aborts during Ion compilation based on accesses to object groups with unanalyzed preliminary objects.
Attachment #8590331 - Flags: review?(jdemooij)
Attached patch ion fast pathsSplinter Review
Add fast paths to Ion for GETELEM, SETELEM, SETELEM_HOLE, and length accesses on unboxed arrays.
Attachment #8590332 - Flags: review?(jdemooij)
Comment on attachment 8590331 [details] [diff] [review]
fix preliminary object abort mechanism

The content of this patch is required to fix bug 1149498, so asking for review there.
Attachment #8590331 - Flags: review?(jdemooij)
Attachment #8590331 - Attachment is obsolete: true
Comment on attachment 8590319 [details] [diff] [review]
basic functionality

Review of attachment 8590319 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay, r=me with comments below addressed.

::: js/src/vm/UnboxedObject.cpp
@@ +496,5 @@
> +        // The length shape to use for arrays is cached via a modified initial
> +        // shape for array objects. Create an array now to make sure this entry
> +        // is instantiated.
> +        if (!NewDenseEmptyArray(cx))
> +            return false;

Hm, allocating a new, unused array whenever we convert an array to native seems a bit hackish. Is there a way to avoid this or do this only when necessary, maybe |if (shape->isEmptyShape())| below?

@@ +525,5 @@
>          return false;
>  
>      // Propagate all property types from the old group to the new group.
> +    if (layout.isArray()) {
> +        PropagatePropertyTypes(cx, JSID_VOID, group, nativeGroup);

PropagatePropertyTypes is fallible.

@@ +530,5 @@
> +    } else {
> +        for (size_t i = 0; i < layout.properties().length(); i++) {
> +            const UnboxedLayout::Property& property = layout.properties()[i];
> +            jsid id = NameToId(property.name);
> +            PropagatePropertyTypes(cx, id, group, nativeGroup);

And here.

@@ +1014,5 @@
> +
> +        size_t capacity = (GetGCKindBytes(allocKind) - offsetOfInlineElements()) / elementSize;
> +        res->setCapacityIndex(exactCapacityIndex(capacity));
> +    } else {
> +        uint8_t* elements = cx->zone()->pod_malloc<uint8_t>(length * elementSize);

I think this can be

UniquePtr<uint8_t[], JS::FreePolicy> elements(...);

Then we can remove the js_free below and use res->elements_ = elements.release();

@@ +1036,5 @@
> +
> +bool
> +UnboxedArrayObject::setElement(ExclusiveContext* cx, size_t index, const Value& v)
> +{
> +    uint8_t* p = elements() + index * elementSize();

Can we assert index < initLength here and in initElement, getElement?

@@ +1045,5 @@
> +UnboxedArrayObject::initElement(ExclusiveContext* cx, size_t index, const Value& v)
> +{
> +    uint8_t* p = elements() + index * elementSize();
> +    if (elementType() == JSVAL_TYPE_STRING || elementType() == JSVAL_TYPE_OBJECT)
> +        *reinterpret_cast<void**>(p) = nullptr;

Is this to avoid bogus pre-barriers? This is clearer:

if (UnboxedTypeNeedsPreBarrier(elementType()))

@@ +1220,5 @@
> +    MOZ_ASSERT(cap <= newCapacity);
> +
> +    uint8_t* newElements;
> +    if (hasInlineElements()) {
> +        newElements = cx->zone()->pod_malloc<uint8_t>(newCapacity * elementSize());

This can't overflow because MaximumCapacity doesn't come close to UINT32_MAX, right?

We can static_assert MaximumCapacity < UINT32_MAX / sizeof(double)

::: js/src/vm/UnboxedObject.h
@@ +112,5 @@
>          js_free(traceList_);
>      }
>  
> +    bool isArray() {
> +        return properties_.empty();

Maybe |return elementType_ != JSVAL_TYPE_MAGIC;|, for symmetry with initArray?
Attachment #8590319 - Flags: review?(jdemooij) → review+
Comment on attachment 8590322 [details] [diff] [review]
new array opcodes

Review of attachment 8590322 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/BaselineIC.h
@@ +1739,5 @@
>      friend class ICStubSpace;
>  
> +    HeapPtrObject templateObject_;
> +
> +    ICNewArray_Fallback(JitCode* stubCode)

Make this |explicit|.

::: js/src/jit/CodeGenerator.cpp
@@ +4075,5 @@
>      }
>  };
>  
> +typedef JSObject* (*NewArrayOperationFn)(JSContext*, HandleScript, jsbytecode*, uint32_t,
> +                                            NewObjectKind);

Nit: fix indentation.
Attachment #8590322 - Flags: review?(jdemooij) → review+
Comment on attachment 8590324 [details] [diff] [review]
macro assembler changes

Review of attachment 8590324 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the issue below fixed.

::: js/src/jit/MacroAssembler.cpp
@@ +1021,5 @@
> +    branchKey(Assembler::BelowOrEqual, lengthAddr, index, failure);
> +    jump(&done);
> +    bind(&capacityIsIndex);
> +    and32(Imm32(~0x3), temp);
> +    rshiftPtr(Imm32(UnboxedArrayObject::CapacityShift - 2), temp);

I think this will clear the 2 lowest bits of the initialized length, but we'll shift those off anyway. Seems like we should either s/~0x3/CapacityMask/ or swap these two lines.

We should probably static_assert sizeof(UnboxedArrayObject::CapacityArray[0]) == 4 to make it more obvious what's going on.
Attachment #8590324 - Flags: review?(jdemooij) → review+
Attachment #8590325 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #11)
> ::: js/src/vm/UnboxedObject.cpp
> @@ +496,5 @@
> > +        // The length shape to use for arrays is cached via a modified initial
> > +        // shape for array objects. Create an array now to make sure this entry
> > +        // is instantiated.
> > +        if (!NewDenseEmptyArray(cx))
> > +            return false;
> 
> Hm, allocating a new, unused array whenever we convert an array to native
> seems a bit hackish. Is there a way to avoid this or do this only when
> necessary, maybe |if (shape->isEmptyShape())| below?

This is a bit hackish, yeah, but this is actually only done once for each distinct group that has objects converted to natives, so trying to optimize this case more didn't seem to make much sense.
Attachment #8590327 - Flags: review?(jdemooij) → review+
Comment on attachment 8590332 [details] [diff] [review]
ion fast paths

Review of attachment 8590332 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/VMFunctions.cpp
@@ +1111,4 @@
>  {
>      // This function is called from Ion code for StoreElementHole's OOL path.
>      // In this case we know the object is native and we can use setDenseElement
>      // instead of setDenseElementWithType.

Nit: update the comment.
Attachment #8590332 - Flags: review?(jdemooij) → review+
It seems this patch regressed a lot of tests on AWFY
Flags: needinfo?(bhackett1024)
Backed out:

https://hg.mozilla.org/integration/mozilla-inbound/rev/15c5ec8215f7
Flags: needinfo?(bhackett1024)
(In reply to Nicolas B. Pierron [:nbp] from comment #19)
> Are you sure you did not backed out too many things?
> 
> http://arewefastyet.com/#machine=28&view=single&suite=misc&subtest=basic-
> strcat&start=1430321582&end=1430388445

This regression seems to have fixed itself.
Attached patch perf fixesSplinter Review
This patch fixes all the performance regressions on AWFY in ss/kraken/octane from the earlier patches.  It also includes some unboxed array improvements so that appending elements to an array in a loop is faster with unboxed arrays than normal arrays.
Attachment #8600197 - Flags: review?(jdemooij)
This is a patch I wrote some time back in concert with the patch just posted.  It adds an analysis pass when unboxed array setelem_holes are performed to see if it can reuse the computation of the array capacity done in each setelem, by adding a loop phi that is only recomputed when a VM call is made and the array capacity might change.  This improves setelem_hole loops but not that much, maybe 5%.  Since with the previous patch setelem_hole loops are a lot faster on unboxed arrays anyways, I don't think the complexity in this patch is necessary for now, and just posting it for posterity in case we revisit this issue later.
Comment on attachment 8600197 [details] [diff] [review]
perf fixes

Review of attachment 8600197 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/BaselineCompiler.cpp
@@ +1830,5 @@
>      if (key == JSProto_Array) {
>          // Pass length in R0.
>          masm.move32(Imm32(0), R0.scratchReg());
>  
> +        ObjectGroup* group = ObjectGroup::allocationSiteGroup(cx, script, pc, JSProto_Array);

Should we do the same for JSOP_NEWOBJECT?

::: js/src/jit/BaselineInspector.h
@@ +116,5 @@
>      JSObject* getTemplateObject(jsbytecode* pc);
>      JSObject* getTemplateObjectForNative(jsbytecode* pc, Native native);
>      JSObject* getTemplateObjectForClassHook(jsbytecode* pc, const Class* clasp);
>  
> +    // Sometimes the group a template object will have is known, even if the object itself isn't.

Nit: this is more than 80 chars I think.

::: js/src/vm/TypeInference.cpp
@@ +643,5 @@
>  void
>  TypeSet::print(FILE* fp)
>  {
> +    if (!fp)
> +        fp = stderr;

Nit: is this needed, considering the fp argument defaults to stderr?
Attachment #8600197 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #23)
> ::: js/src/jit/BaselineCompiler.cpp
> @@ +1830,5 @@
> >      if (key == JSProto_Array) {
> >          // Pass length in R0.
> >          masm.move32(Imm32(0), R0.scratchReg());
> >  
> > +        ObjectGroup* group = ObjectGroup::allocationSiteGroup(cx, script, pc, JSProto_Array);
> 
> Should we do the same for JSOP_NEWOBJECT?

Probably, though I haven't seen any tests that are affected by this.  The pattern here is where an array is created once and then written/read in some long loops, and that doesn't seem to happen much with plain objects.

> ::: js/src/vm/TypeInference.cpp
> @@ +643,5 @@
> >  void
> >  TypeSet::print(FILE* fp)
> >  {
> > +    if (!fp)
> > +        fp = stderr;
> 
> Nit: is this needed, considering the fp argument defaults to stderr?

I added this because my lldb is unable to figure out the default argument when calling print(), and doesn't seem to understand what stderr is either.
The second push above is a followup to fix a redundant check of the runtime unboxed object/array option, which was breaking the JS_OPTION_USE_UNBOXED_OBJECTS environment variable.
https://hg.mozilla.org/mozilla-central/rev/020c6a559e3a
https://hg.mozilla.org/mozilla-central/rev/5b82ebe679e6
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Awfy detected improvements \0/, but also some regressions:

mac 32bit, no asmjs: http://arewefastyet.com/regressions/#/regression/1457055
- none

mac 64bit, no amsjs: http://arewefastyet.com/regressions/#/regression/1457080
- asmjs-ubench-skinning: 0.39% regression

mac 32bit, ion: http://arewefastyet.com/regressions/#/regression/1457050
- kraken-crypto-pbkdf2: 1.81%

mac 64bit, ion: http://arewefastyet.com/regressions/#/regression/1457069
- kraken-crypto-pbkdf2: 1.81%
- ss 1.0.1: unpack-code: 4.46%
- ss 1.0.1: tagcloud: 3.05%
- misc 0.4: bugs-608733-trace-compiler: 1.98%
- misc 0.4 : bugs-1131099-lodash2: 2.49%
- misc 0.4 : bugs-1131099-underscore: 0.3%

Was this expected? And can we do something about these regressions?
I'll take a look at the regressions sometime in the next couple days.
Flags: needinfo?(bhackett1024)
Gary and decoder, this bug added a --unboxed-arrays option to the shell which would be great for fuzzing.  As with --unboxed-objects, this tests a feature that is not on by default anywhere.
Depends on: 1161346
Brian, coverity considers that this change can introduce a Division or modulo by zero (cid 1296731) on this line:
https://dxr.mozilla.org/mozilla-central/source/js/src/vm/UnboxedObject.cpp?from=js/src/vm/UnboxedObject.cpp#1035

If you think that is correct, I can report a new bug if you want.
(In reply to Sylvestre Ledru [:sylvestre] from comment #32)
> Brian, coverity considers that this change can introduce a Division or
> modulo by zero (cid 1296731) on this line:
> https://dxr.mozilla.org/mozilla-central/source/js/src/vm/UnboxedObject.
> cpp?from=js/src/vm/UnboxedObject.cpp#1035
> 
> If you think that is correct, I can report a new bug if you want.

This is a false positive.
(In reply to Brian Hackett (:bhackett) from comment #33)
> This is a false positive.
Thanks, ignored in coverity then. Thanks!
Attached patch patchSplinter Review
With cachegrind and perf I don't see a difference in instruction counts on string-tagcloud or string-unpack-code, but krake-stanford-crypto-pbkdf2 does execute about 2.5% more instructions.  This seems to be because of the hasUnanalyzedPreliminaryObjects() call in TryReuseArrayGroup, which calls the outlined function ObjectGroup::maybeSweep several times.  This patch avoids calling maybeSweep() and seems to fix the instruction count issue.

There are a couple bigger issues here this patch doesn't address.  It would be nice if the initial test in maybeSweep() was inlined, but this is annoying to do because of include file issues.  Also, it would be good to optimize the involved natives (array_slice mainly I think) from Ion.
Flags: needinfo?(bhackett1024)
Attachment #8603883 - Flags: review?(jdemooij)
Attachment #8603883 - Flags: review?(jdemooij) → review+
Depends on: 1193543
You need to log in before you can comment on or make changes to this bug.