Add a fast path for arrays to ForOfIterator

RESOLVED FIXED in mozilla30

Status

()

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: bzbarsky, Assigned: djvj)

Tracking

({perf})

unspecified
mozilla30
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 7 obsolete attachments)

I tried using ForOfIterator in DOM code for sequences instead of the current combination of JS_GetArrayLength and JS_GetElement.

The result is quite slow.  On the testcase at http://hg.mozilla.org/users/bjacob_mozilla.com/webgl-perf-tests/raw-file/tip/uniform-float-taking-js-array.html my time changes from about 30ms to about 260ms.  The profile says it's all being spent in the code that inits the iterator and traverses it, complete with init triggering jit bailouts, etc.

Is it possible to quickly detect cases in which an array object has no funny business going on wrt for-of iteration and just do something fast for it?  I know our impl just lets anyone stick a "@@iterator" prop on any array, but does the spec allow that sort of configurability for actual Array objects?
Flags: needinfo?(jorendorff)
Posted patch experiment (obsolete) — Splinter Review
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
That experiment makes performance look good for me.  No idea whether we can sanely make it faster (e.g. by using the fact that we know our object is an ArrayObject) or whether we should be using the ESClass_Array check instead or whatnot....
Assignee: bzbarsky → nobody
Posted patch experimentSplinter Review
This makes us only regress webidl sequence performance by 10%, instead of 6x, while allowing all iterables for them.

Some questions:

1)  Should we allow this optimization for anything ArrayValues as @@iterator or at least all subclasses of Array?  We'd need a side-effect-free way of getting the length for all objects that support this optimization, but otherwise the generic getElement call works great.

2)  If we restrict the optimization to arrays, should we replace the getElement call with something faster?

3)  Is is<ArrayObject>() the right test?  See comments earlier in this bug.

4)  The general "check that @@iterator is ArrayValues" thing is a bit fragile, but the only other option I see is flagging a compartment if its Array.prototype["@@iterator"] is modified and flagging array objects if they get a new proto or if their @@iterator is modified.  Or something.
Attachment #8355378 - Flags: review?(jorendorff)
Attachment #8351120 - Attachment is obsolete: true
Assignee: nobody → bzbarsky
(In reply to Boris Zbarsky [:bz] from comment #3)
> 1)  Should we allow this optimization for anything ArrayValues as @@iterator
> or at least all subclasses of Array?  We'd need a side-effect-free way of
> getting the length for all objects that support this optimization, but
> otherwise the generic getElement call works great.

Offhand, I'd say let's stick with arrays for now.

> 2)  If we restrict the optimization to arrays, should we replace the
> getElement call with something faster?

Yes.

> 3)  Is is<ArrayObject>() the right test?

Yes.

> 4)  The general "check that @@iterator is ArrayValues" thing is a bit
> fragile, but the only other option I see is flagging a compartment if its
> Array.prototype["@@iterator"] is modified and flagging array objects if they
> get a new proto or if their @@iterator is modified.  Or something.

Yes, this can be optimized if needed.

Unfortunately we must also guard against ArrayIteratorProto.next changing, and it can change at any time, if you call back into the engine. I don't think this is hard to fix or slow, just sad... rather a lot of code for a corner case that will basically never come up in practice.
Flags: needinfo?(jorendorff)
> Unfortunately we must also guard against ArrayIteratorProto.next changing

<sigh>.

What's the right replacement for getElement here, since we know we have an array?
(In reply to Boris Zbarsky [:bz] from comment #5)
> What's the right replacement for getElement here, since we know we have an
> array?

See GetElement in js/src/jsarray.cpp.
> See GetElement in js/src/jsarray.cpp.

I did consider that, but it's not exposed outside jsarray.cpp and I was slightly trying to minimize surgery... and wasn't sure what the right way to expose it is.
Posted patch pic-for-of-wip1.patch (obsolete) — Splinter Review
Started working on this with bz's and jorendorff's blessing.

WIP patch for a C++-land PIC-based optimization for this.  Needs to be hooked into logic, and also needs marking code to play well with GC.. but the meat should be there.

So.. much... guarding....
Updated patch and plugged into iterator behaviour.  Seems to be working.  Sample numbers on two shell tests:

function foo() {
    var n = 0;
    for (var i = 0; i < arguments.length; i++)
        n += arguments[i];
    return n;
}

Test 1:
    var sum = 0;
    for (var i = 0; i < 100000; i++) {
        sum += foo(0, ...[1, 2, 3]);
    }
    return sum;


Test 2:
    var sum = 0;
    var arr = new Array(200);
    for (var i = 0; i < 200; i++)
        arr[i] = i;

    for (var i = 0; i < 10000; i++)
        sum += foo(0, ...arr);

    return sum;

Numbers:
  Test 1 Without PIC - 410ms
  Test 1 With PIC - 120ms
    About 3.4x speedup

  Test 2 Without PIC - 593ms
  Test 2 With PIC - 182ms
    About 3.2x speedup
Posted patch pic-for-of-wip2.patch (obsolete) — Splinter Review
Updated patch.  Added jsiter integration, got prelim numbers that look good (see comment above).

Still need to add: ArrayIterator object reconstitution, GC tracing, control growth of PIC stub chain.
Attachment #8356851 - Attachment is obsolete: true
And tests for all the fun edge cases...
Assignee: bzbarsky → kvijayan
Posted patch pic-for-of-wip3.patch (obsolete) — Splinter Review
Added ArrayIterator object reconstitution (when ArrayIterator.prototype.next changes during iteration).

Added code to control growth of PIC stub chain (for now just clears the stub chain and starts adding new stubs).

Added GC marking support.  Stubs hold their shapes weakly and are removed if the referenced shapes become collected.

Did some manual testing, seems to hang together OK.  Should be good for prelim review after try run (and while I write test cases).

Seems to be about a 3x to 3.5x improvement in for-of iteration over arrays.

TODO: Change stub format to allow a single stub to reference multiple optimized shapes, so we don't have the inefficiency of a linked list of small stubs.
Attachment #8357350 - Attachment is obsolete: true
Attachment #8355378 - Flags: review?(jorendorff)
Posted patch pic-for-of-wip4.patch (obsolete) — Splinter Review
WIP patch 4.  Moves the PIC onto the global object instead of the compartment.  Clears chain on every GC.  Passing jit-tests, running on try:

https://tbpl.mozilla.org/?tree=Try&rev=f048ecf9f7f6
Attachment #8357481 - Attachment is obsolete: true
Re-ran the linux try since it looks like the first one went awry, looks good:

https://tbpl.mozilla.org/?tree=Try&rev=55af5c45f27e
Posted patch pic-for-of-wip5.patch (obsolete) — Splinter Review
Added more optimizations, marked some fastpath C++ functions as inline.  Generally improves performance from previous patch by about 20%.

Waiting on try:

https://tbpl.mozilla.org/?tree=Try&rev=544013c04db5
Attachment #8364749 - Attachment is obsolete: true
Posted patch pic-for-of-final.patch (obsolete) — Splinter Review
Finally.  Green on try (the windows breakage is a build farm issue):

https://tbpl.mozilla.org/?tree=Try&rev=60ac2345ccf0&pusher=kvijayan@mozilla.comsd
Attachment #8365342 - Attachment is obsolete: true
Attachment #8366821 - Flags: review?(jorendorff)
Kannan: with your green try results, is there anything blocking your ForOfIterator patch from landing?
OS: Mac OS X → All
Hardware: x86 → All
Summary: Can we add a fast path for arrays to ForOfIterator → Add a fast path for arrays to ForOfIterator
Waiting for review.
Comment on attachment 8366821 [details] [diff] [review]
pic-for-of-final.patch

Review of attachment 8366821 [details] [diff] [review]:
-----------------------------------------------------------------

Stylistically, I have an annoyingly vague criticism: the way this patch is factored means there are a lot of calls from one file to another, a lot of small functions, and a fair amount of scaffolding. Reading it, I kept finding that the code I wanted to look at was elsewhere.

I think it'd be easier to take in if it were factored to have more straight line code; I'll say more about this below. But YMMV. Use your judgment.

Below I ask for two additional tests, so I'm clearing the r? bit for now. Please ping me when you get this revised and I'll review it immediately.

::: js/src/jit-test/tests/collections/Map-constructor-5.js
@@ +5,3 @@
>  assertThrowsInstanceOf(function () { Map([undefined]); }, TypeError);
>  assertThrowsInstanceOf(function () { Map([null]); }, TypeError);
> +*/

Why is some stuff commented out in this test?

::: js/src/jsiter.cpp
@@ +1299,5 @@
> +            index = 0;
> +            iterator = iterableObj;
> +            return true;
> +        }
> +    }

OK. Here's an example of what I mean about factoring.  This code calls into the PIC three times, but it's really only one operation. As a result each call here is doing redundant processing to recover information that was known at some point during the previous call. It would read better like this:

    if (iterableObj->is<ArrayObject>()) {
        ForOfPIC::Chain pic = ForOfPIC::getOrCreate(cx);
        if (!pic)
            return false;

        bool canGoFast;
        if (!pic->tryOptimizedIteration(cx, iterableObj, &canGoFast))
            return false;
        if (canGoFast) {
            iterator = iterableObj;
            index = 0;
            return true;
        }
    }

even without comments, and the code in ForOfPIC could be more direct.

@@ +1373,5 @@
> +    if (index != NOT_ARRAY) {
> +        ForOfPIC::Chain *stubChain = ForOfPIC::getOrCreate(cx);
> +        if (!stubChain)
> +            return false;
> +        

Delete whitespace on this blank line.

@@ +1435,5 @@
> +    if (!Invoke(cx, args))
> +        return false;
> +
> +    // Result of call to ArrayValuesAt must be an object.
> +    JS_ASSERT(args.rval().isObject());

The comment is redundant. (The assertion is also redundant since we'll call args.rval().toObject() a few lines below.)

::: js/src/jsiter.h
@@ +236,4 @@
>   * and the failure is allowed to propagate on cx, as in this example if DoStuff
>   * fails. In that case, ForOfIterator's destructor does all necessary cleanup.
>   */
>  class ForOfIterator

If in unbitrotting this patch, you end up wondering where this went, it's in jsapi.h. But you probably knew about that...

@@ +239,5 @@
>  class ForOfIterator
>  {
>    private:
>      JSContext *cx;
>      RootedObject iterator;

Please comment this field, now that it's not necessarily an iterator anymore.

@@ +240,5 @@
>  {
>    private:
>      JSContext *cx;
>      RootedObject iterator;
> +    uint32_t arrayLength;

The array length can't be stored. Per spec, if it changes during the loop, we have to notice the change.

This is specified in %ArrayIteratorPrototype%.next, which gets the "length" property from the array each time it is called (step 8), rather than using a cached value from when the iterator was created:

http://people.mozilla.org/~jorendorff/es6-draft.html#sec-%arrayiteratorprototype%-next

(Add a pair of tests for this, one that truncates the array and one that grows it.)

@@ +241,5 @@
>    private:
>      JSContext *cx;
>      RootedObject iterator;
> +    uint32_t arrayLength;
> +    uint32_t index;

Same here, since this isn't necessarily an index.

As Fred Brooks said, "Show me your tables..."

::: js/src/vm/PIC.cpp
@@ +44,5 @@
> +    disabled_ = true;
> +
> +    // Look up '@@iterator' on Array.prototype, ensure it's a slotful shape.
> +    Shape *iterShape = arrayProto->nativeLookup(cx, cx->names().std_iterator);
> +    if (!iterShape || !iterShape->hasSlot())

Also check that iterShape->hasDefaultGetter().

It would be impossible for script alone to make that fail, but you never know what someone will do via the JSAPI... :-P

@@ +57,5 @@
> +        return true;
> +
> +    // Look up the 'next' value on ArrayIterator.prototype
> +    Shape *nextShape = arrayIteratorProto->nativeLookup(cx, cx->names().next);
> +    if (!nextShape || !nextShape->hasSlot())

Check nextShape->hasDefaultGetter().

@@ +81,5 @@
> +
> +js::ForOfPIC::Stub *
> +js::ForOfPIC::Chain::isArrayOptimized(JSObject *obj)
> +{
> +    JS_ASSERT(obj->is<ArrayObject>());

Consider ArrayObject * for the argument type here and elsewhere, instead of asserting. This is only called from ForOfIterator::init, which does an as<ArrayObject>() already anyway...

@@ +121,5 @@
> +    if (disabled_)
> +        return true;
> +
> +    // By the time we get here, we should have a sane array state to work with.
> +    JS_ASSERT(isArrayStateStillSane());

I think if refactored as proposed, this assertion wouldn't be necessary (the preceding check would clearly dominate this line of code), and therefore isArrayStateStillSane would only have one call site, and therefore it could be manually inlined here.

(Not so for isArrayNextStillSane, which must be called from ForOfIterator::nextFromOptimizedArray.)

@@ +219,5 @@
> +    initialized_ = false;
> +}
> +
> +void
> +js::ForOfPIC::Chain::eraseChain(JSContext *cx)

cx is unused.

@@ +228,5 @@
> +    // Free all stubs.
> +    Stub *stub = stubs_;
> +    while (stub) {
> +        Stub *next = stub->next();
> +        js_free(stub);

js_delete, even though

@@ +235,5 @@
> +    stubs_ = nullptr;
> +}
> +
> +
> +// Trace the pointers stored direclty on the stub.

spelling ("direclty")

@@ +238,5 @@
> +
> +// Trace the pointers stored direclty on the stub.
> +void
> +js::ForOfPIC::Chain::mark(JSTracer *trc)
> +{   

trailing whitespace

@@ +286,5 @@
> +const Class ForOfPIC::jsclass = {
> +    "ForOfPIC", JSCLASS_HAS_PRIVATE | JSCLASS_IMPLEMENTS_BARRIERS,
> +    JS_PropertyStub, JS_DeletePropertyStub, JS_PropertyStub, JS_StrictPropertyStub,
> +    JS_EnumerateStub, JS_ResolveStub, JS_ConvertStub, ForOfPIC_finalize,
> +    nullptr,              /* checkAccess */

checkAccess is gone, woo

::: js/src/vm/PIC.h
@@ +33,5 @@
> +{
> +  friend class PICChain<Category>;
> +  private:
> +    typedef typename Category::Stub CatStub;
> +    typedef typename Category::Chain CatChain;

Sigh. Well, I would say "you aren't gonna need it" to all the template stuff in here, but I hope you do need it. PICs make everything better.

@@ +108,5 @@
> +        } else {
> +            JS_ASSERT(stub == stubs_);
> +            stubs_ = stub->next();
> +        }
> +        js_free(stub);

js_free should be js_delete, I think.

I guess in practice, this is only used to kill all the stubs at once. Consider instead:

    void clear() {
        while (CatStub *stub = stubs_) {
            stubs_ = stub->next();
            js_delete(stub);
        }
    }

@@ +136,5 @@
> +    {
> +      private:
> +        // Shape of matching array object.
> +        Shape *shape_;
> +        

trailing whitespace

@@ +147,5 @@
> +        }
> +
> +        Shape *&shape() {
> +            return shape_;
> +        }

This doesn't need to return a reference. Only the value is used.

@@ +151,5 @@
> +        }
> +    };
> +
> +    /*
> +     * A ForOfPIC chain holds some additional flags.

Flags?

@@ +198,5 @@
> +
> +        // Initialize the canonical iterator function.
> +        bool initialize(JSContext *cx);
> +
> +        // Check if a given array object is optimized by theis PIC.

spelling ("theis")

@@ +220,5 @@
> +            if (arrayIteratorProto_->getSlot(arrayIteratorProtoNextSlot_) != canonicalNextFunc_)
> +                return false;
> +
> +            return true;
> +        }

I think I'd just put it all on 2 lines:

    return arrayIteratorProto_->lastProperty() == arrayIteratorProtoShape_ &&
           && arrayIteratorProto_->getSlot(arrayIteratorProtoNextSlot_) == canonicalNextFunc_;

I don't think the comments particularly help here; YMMV.

::: js/src/vm/SelfHosting.cpp
@@ +1105,5 @@
> +
> +bool
> +js::IsSelfHostedFunctionWithName(JSFunction *fun, JSAtom *name)
> +{
> +    return fun->isSelfHostedBuiltin() && fun->getExtendedSlot(0).toString() == name;

While you're in here, please replace this magic 0 with a constant SH_NAME_SLOT or something (also in 1 other place in this file, and in jsfun.cpp).
Attachment #8366821 - Flags: review?(jorendorff)
Actually I bet you won't need all that template stuff, even if we do add more C++ PICs. I'd be happier if you killed it. Again, your call.
(In reply to Jason Orendorff [:jorendorff] from comment #21)
> Actually I bet you won't need all that template stuff, even if we do add
> more C++ PICs. I'd be happier if you killed it. Again, your call.

I was going to talk with you on IRC about this.  I'm not married to the template boilerplate in the implementation, but this seemed to be the best future-proof approach.  My general reasoning:

For extensibility purposes (adding new PICs), we'd like to split out the specific structure of an IC (chain of stubs), from the type-specific details of what chains keep what kind of ICs.  The "canonical" OO implementation of this would use base classes with virtual methods, or a base class with a discriminator type field and manually implemented, switch-based dispatch methods on the base class.  These would be used for |trace| and |free| implementations, for example.

The template implementation I chose uses the base class approach, but parameterizes the base class such that the derived types are the ones used to define the shape of the chain (see CatStub and CatChain typedefs).

If we're trying to keep the PIC core reusable for later, I prefer the latter approach.  Dynamic dispatch feels like the wrong thing to do here.
Yeah, virtual functions would be crazy. By all means keep the templates if they're going to be useful down the road.
Updated with review comments addressed, some cleanup, and tests added.
Attachment #8366821 - Attachment is obsolete: true
Attachment #8374878 - Flags: review?(jimb)
Comment on attachment 8374878 [details] [diff] [review]
pic-for-of-postreview-1.patch

Review of attachment 8374878 [details] [diff] [review]:
-----------------------------------------------------------------

Stealing review. r=me with the comments addressed.

::: js/src/jsapi.h
@@ +4842,5 @@
> +     *  Case 2: Optimized Array Iteration
> +     *      iterator - pointer to the array object.
> +     *      index - current position in array.
> +     *
> +     * The cases are distinguished by whether or not |index| is equal to NOT_ARRAY.

Nice, thanks.

::: js/src/jsfun.cpp
@@ +1249,5 @@
>      }
>  
>      /* Lazily cloned self-hosted script. */
>      JS_ASSERT(fun->isSelfHostedBuiltin());
> +    RootedAtom funAtom(cx, &fun->getExtendedSlot(/*SH_NAME_SLOT*/0).toString()->asAtom());

Please either use a real `enum {SH_NAME_SLOT = 0};` or remind me why it isn't the right thing to do...

(If you feel like adding a getter-setter pair with appropriate assertions, r=me on that too.)

::: js/src/jsiter.cpp
@@ +1345,5 @@
> +
> +    JS_ASSERT(iterator->isNative());
> +    JS_ASSERT(iterator->is<ArrayObject>());
> +
> +    if (index == iterator->as<ArrayObject>().length()) {

>=, not ==

I asked for tests covering this case last time, but it seems you'd have to write a jsapi-test; none of the existing ForOfIterator call sites can cause the array's length to change from the body of the loop.

Your call whether that is worth your time.

::: js/src/tests/ecma_6/Array/for_of_1.js
@@ +1,2 @@
> +// Test corner cases of for-of iteration over Arrays.
> +// The current spidermonky JSOP_SPREAD implementation for function calls

Spelling ("spidermonky")
Attachment #8374878 - Flags: review?(jimb) → review+
(In reply to Jason Orendorff [:jorendorff] from comment #25)
> Please either use a real `enum {SH_NAME_SLOT = 0};` or remind me why it
> isn't the right thing to do...

It's absolutely the right thing to do :)  This was a hack to get the patch to work with a rev which didn't have SH_NAME_SLOT defined.  Thanks for noticing, will fix.

> (If you feel like adding a getter-setter pair with appropriate assertions,
> r=me on that too.)

I do.  I didn't think too much about it because changing literal 0 to a symbolic constant is a bit tangential to the patch.


> > +    if (index == iterator->as<ArrayObject>().length()) {
> 
> >=, not ==

I have internal conflicts regarding this.  On the one hand, '>=' is more robust.  On the other hand, it conveys the impression that it's possible for 'index > length' to hold.  Is it OK if I add a JS_ASSERT(index <= length) and leave it as EQ instead of LE?

> 
> I asked for tests covering this case last time, but it seems you'd have to
> write a jsapi-test; none of the existing ForOfIterator call sites can cause
> the array's length to change from the body of the loop.

This should be possible by defining an appropriate indexed getter that appends to the array.
(In reply to Kannan Vijayan [:djvj] from comment #26)
> > > +    if (index == iterator->as<ArrayObject>().length()) {
> > 
> > >=, not ==
> 
> I have internal conflicts regarding this.  On the one hand, '>=' is more
> robust.  On the other hand, it conveys the impression that it's possible for
> 'index > length' to hold.

Isn't that possible?

> > I asked for tests covering this case last time, but it seems you'd have to
> > write a jsapi-test; none of the existing ForOfIterator call sites can cause
> > the array's length to change from the body of the loop.
> 
> This should be possible by defining an appropriate indexed getter that
> appends to the array.

Darn it! Tests, then!
(In reply to Jason Orendorff [:jorendorff] from comment #27)
> > I have internal conflicts regarding this.  On the one hand, '>=' is more
> > robust.  On the other hand, it conveys the impression that it's possible for
> > 'index > length' to hold.
> 
> Isn't that possible?

Oh right, length mutation.  It's like a blind spot or something.  I keep forgetting about it.
After comments addressed, last try run looks good:
https://tbpl.mozilla.org/?tree=Try&rev=41006e20527f

I removed the code abstracting access to SelfHosted functions' names.  It's a bit tangential, and we need to resolve bug 972087 before doing that properly.  We shouldn't add accessors until we can add them with the proper asserts.
https://hg.mozilla.org/mozilla-central/rev/2aa181731593
https://hg.mozilla.org/mozilla-central/rev/0f18070eb5d6
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
You need to log in before you can comment on or make changes to this bug.