Closed Bug 1375505 Opened 7 years ago Closed 7 years ago

Improve for-in/GetIterator performance

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug)

Details

Attachments

(10 files, 1 obsolete file)

47.81 KB, patch
evilpie
: review+
Details | Diff | Splinter Review
3.16 KB, patch
anba
: review+
Details | Diff | Splinter Review
2.25 KB, patch
anba
: review+
Details | Diff | Splinter Review
5.10 KB, patch
anba
: review+
Details | Diff | Splinter Review
1.23 KB, patch
anba
: review+
Details | Diff | Splinter Review
1.48 KB, patch
anba
: review+
Details | Diff | Splinter Review
7.93 KB, patch
anba
: review+
Details | Diff | Splinter Review
3.96 KB, patch
anba
: review+
Details | Diff | Splinter Review
57.09 KB, patch
evilpie
: review+
Details | Diff | Splinter Review
19.06 KB, patch
jonco
: review+
Details | Diff | Splinter Review
We have an Ion fast path for when the object matches the last cached iterator but if it fails we fall back to GetIterator and that path is much slower than necessary. This affects Speedometer's Elm benchmark.

I have a stack of patches to improve this (it also simplifies a lot of the iterator code).
This changes a lot of the iteration/enumeration code to return JSObject* instead of returning bool and using an outparam. It makes it easier to reason about this code.

It also means we can remove GetIteratorObject: Ion can now call GetIterator directly.
Attachment #8881940 - Flags: review?(evilpies)
Comment on attachment 8881940 [details] [diff] [review]
Part 1 - Return JSObject* instead of using an outparam

Review of attachment 8881940 [details] [diff] [review]:
-----------------------------------------------------------------

Nice
Attachment #8881940 - Flags: review?(evilpies) → review+
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a257dace8dae
part 1 - Change iterator code to return JSObject* instead of returning bool + outparam. r=evilpie
Keywords: leave-open
In js::GetIterator there's a goto miss; that we can eliminate by restructuring the code a bit.

Another patch in this stack will (IIRC) move this code into its own function, which seems nicer, but for now this will do.
Attachment #8884817 - Flags: review?(andrebargull)
Comment on attachment 8884817 [details] [diff] [review]
Part 2 - Eliminate a goto

Review of attachment 8884817 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM!
Attachment #8884817 - Flags: review?(andrebargull) → review+
In js::GetIterator we do this:

(1) Handle some weird cases (PropertyIteratorObject, LegacyGeneratorObject, proxies).
(2) Check the iterator cache.
...

We can swap (1) and (2) because the objects special-cased by (1) will never be in the cache. This will make things a bit slower for these particular objects but that seems to be a good trade-off.
Attachment #8884819 - Flags: review?(andrebargull)
Comment on attachment 8884819 [details] [diff] [review]
Part 3 - Check the cache first

Review of attachment 8884819 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense!
Attachment #8884819 - Flags: review?(andrebargull) → review+
The ReceiverGuard(JSObject*) constructor shows up a lot under GetIteratorObject in profiles. Not surprising because we call it at least once for every object on the proto chain.

This patch inlines these ReceiverGuard constructors. I also changed the ReceiverGuard(JSObject*) constructor a little: (1) we no longer null-check the argument because it's never nullptr and (2) I added a !obj->isNative() check so the native-object case is as fast as possible.

Just this patch improves the micro-benchmark below from ~105 ms to ~91 ms.

function f() {
    var o1 = {x: 1};
    var o2 = {y: 2};
    var t = new Date;
    for (var i = 0; i < 1000000; i++) {
        for (var x in o1) {}
        for (var x in o2) {}
    }
    print(new Date - t);
}
f();
Attachment #8884847 - Flags: review?(andrebargull)
Hm it just occurred to me we don't need obj->maybeShape() - we can just use obj->as<ShapedObject>().shape() directly because we handle all the non-shape cases first.
Attachment #8884855 - Flags: review?(andrebargull)
Comment on attachment 8884847 [details] [diff] [review]
Part 4 - Inline ReceiverGuard constructors

Review of attachment 8884847 [details] [diff] [review]:
-----------------------------------------------------------------

Sure, if it helps to speed things up.
Attachment #8884847 - Flags: review?(andrebargull) → review+
Attachment #8884855 - Flags: review?(andrebargull) → review+
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6a1e09c0985d
part 2 - Refactor GetIterator to get rid of a goto. r=anba
https://hg.mozilla.org/integration/mozilla-inbound/rev/eea331779104
part 3 - Check iterator cache before handling some slow cases. r=anba
https://hg.mozilla.org/integration/mozilla-inbound/rev/90a1595be9c1
part 4 - Inline ReceiverGuard constructors. r=anba
https://hg.mozilla.org/integration/mozilla-inbound/rev/09c3d0a30928
part 5 - Optimize ReceiverGuard(JSObject*) constructor more. r=anba
This rewrites CanCacheIterableObject to use multiple if-statements and adds some comments/asserts.
Attachment #8885622 - Flags: review?(andrebargull)
Comment on attachment 8885622 [details] [diff] [review]
Part 6 - Clean up CanCacheIterableObject

Review of attachment 8885622 [details] [diff] [review]:
-----------------------------------------------------------------

Hopefully bug 1098412 lands soon, so we can also remove the __iterator__ lookup.
Attachment #8885622 - Flags: review?(andrebargull) → review+
When we do an iterator cache lookup, we currently check if the object is a typed array, if there's an __iterator__ property, etc. This is unnecessary when there's a cache hit: the shape/group guards guarantee these things still hold and we only need to check CanCompareIterableObjectToCache.

This patch splits the lookup in two parts: first we check CanCompareIterableObjectToCache for each object, and if there's no hit we call CanStoreInIteratorCache which does the additional, slower checks.

It improves the micro-benchmark in comment 9 from 90-100 ms to 60-65 ms so it's a pretty big win.
Attachment #8885640 - Flags: review?(andrebargull)
I have some smaller patches but this is probably most we can squeeze out on the VM side.

Next I want to make JSOP_ITER an IC, that should make things like the test in comment 9 much faster.
Comment on attachment 8885640 [details] [diff] [review]
Part 7 - Optimize iterator cache lookup

Review of attachment 8885640 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsiter.cpp
@@ +892,5 @@
> +            return nullptr;
> +        }
> +
> +        obj = obj->staticPrototype();
> +    } while (obj);

Oops, I need to use another variable, like pobj, instead of |obj| since the original object is needed below.
Fix the issue in comment 17.
Attachment #8885640 - Attachment is obsolete: true
Attachment #8885640 - Flags: review?(andrebargull)
Attachment #8885666 - Flags: review?(andrebargull)
We don't use the iterator cache if one of the objects has the hasUncacheableProto flag. This flag means the object's __proto__ can change without triggering a Shape change.

The iterator cache itself shape checks all objects anyway (and compares the number of guards), so it will do the right thing in the hasUncacheableProto case.

The only place where this matters is the lastCachedNativeIterator mechanism, since that assumes the receiver has a non-null proto. lastCachedNativeIterator's days are numbered (JSOP_ITER ICs will replace it) but for now we can just move the check to that code.

Iterating objects with the hasUncacheableProto flag is uncommon, but when it does hit it's a bad perf cliff so we might as well fix it. For the micro-benchmark below I get:

before: 598 ms
after:   63 ms

function f() {
    var t = new Date;
    var o = {x: 1};
    Object.setPrototypeOf(o, null);
    for (var i = 0; i < 2000000; i++) {
        for (var x in o) {}
    }
    print(new Date - t);
}
f();
Attachment #8885671 - Flags: review?(andrebargull)
Comment on attachment 8885666 [details] [diff] [review]
Part 7 - Optimize iterator cache lookup

Review of attachment 8885666 [details] [diff] [review]:
-----------------------------------------------------------------

Did you add the separate "StoreInIteratorCache" method to make the code more understandable? (Just curious, no need to inline this method into its sole caller.)
Attachment #8885666 - Flags: review?(andrebargull) → review+
Comment on attachment 8885671 [details] [diff] [review]
Part 8 - Use iterator cache for objects with uncacheable proto

Review of attachment 8885671 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable.
Attachment #8885671 - Flags: review?(andrebargull) → review+
(In reply to André Bargull from comment #20)
> Did you add the separate "StoreInIteratorCache" method to make the code more
> understandable? (Just curious, no need to inline this method into its sole
> caller.)

Yeah, it seemed nice to move most of the caching-related code into separate functions instead of having everything in GetIterator :)
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2f3159af0a3c
part 6 - Clean up CanCacheIterableObject a bit. r=anba
https://hg.mozilla.org/integration/mozilla-inbound/rev/ea3f1df83823
part 7 - Optimize iterator cache lookup. r=anba
https://hg.mozilla.org/integration/mozilla-inbound/rev/77447df746cc
part 8 - Use iterator cache also for objects with the hasUncacheableProto flag. r=anba
This replaces compartment->lastCachedNativeIterator with a CacheIR-based Baseline and Ion IC for JSOP_ITER. Ion had an inline path for this that was brittle, and Baseline didn't optimize JSOP_ITER at all.

The IC code is about as fast as the old Ion code for the monomorphic case.

For the microbenchmark in comment 9 I get:

before: 77 ms (yesterday I got better numbers, not sure why)
after:  21 ms

before --no-ion: 125 ms
after --no-ion:   52 ms

Things are especially fast when we iterate a small number of different objects:

function f() {
    var arr = [{x1: 0}, {x2: 0}, {x3: 0}, {x4: 0}];
    var t = new Date;
    for (var i = 0; i < 10000000; i++) {
	var o = arr[i % 4];
        for (var x in o) {}
    }
    print(new Date - t);
}
f();

before: 403 ms
after:  139 ms

There are cases here where the old code was faster (when the IC goes megamorphic). I think we should land this patch and see how it affects benchmarks, if it regresses something we can add a megamorphic stub for this.

|for (var x in null/undefined) {}| also shows up, that should be easier to optimize now as a follow-up.
Attachment #8886162 - Flags: review?(evilpies)
Comment on attachment 8886162 [details] [diff] [review]
Part 9 - Add Baseline/Ion IC for JSOP_ITER

Review of attachment 8886162 [details] [diff] [review]:
-----------------------------------------------------------------

Nice results! The rest of the patch looks good.

::: js/src/jit/CacheIR.cpp
@@ +3624,5 @@
> +    else if (expandoId)
> +        writer.guardNoDenseElements(*expandoId);
> +
> +    // Do the same for the objects on the proto chain.
> +    GeneratePrototypeHoleGuards(writer, obj, objId);

I really like how reuse all of this code again! However, I think there is a small bug here, we can cache objects with unboxed objects on the proto chain, but this doesn't seem support that.
(In reply to Limited Availabilty till Oct | Tom Schuster [:evilpie] from comment #26)
> However, I think there is a
> small bug here, we can cache objects with unboxed objects on the proto
> chain, but this doesn't seem support that.

We discussed this on IRC. Unboxed objects can't appear on prototype chains (because they can't have the is-delegate shape flag).

I was going to add an assert for this, but GeneratePrototypeHoleGuards already uses pobj->as<NativeObject>() which should assert if we ever get a non-native object there.
Comment on attachment 8886162 [details] [diff] [review]
Part 9 - Add Baseline/Ion IC for JSOP_ITER

Review of attachment 8886162 [details] [diff] [review]:
-----------------------------------------------------------------

Okay, I had no idea we couldn't have unboxed objects on the proto chain.
Attachment #8886162 - Flags: review?(evilpies) → review+
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1bd39c38c043
part 9 - Optimize for-in/JSOP_ITER with a Baseline/Ion IC. r=evilpie
The iterator cache is currently a fixed-size (256 slots) table per thread/JSContext that's purged on GC. It has a number of issues:

(1) Hashing is not perfect so it's pretty easy to get into a state where we keep hitting/replacing the same bucket and keep allocating iterator objects.

(2) The issue in (1) depends on pointer values, GC timing etc so the iterator cache is very non-deterministic. I've noticed this on Speedometer where the Elm benchmark either allocates > 40,000 iterators or < 500 depending on which other tests we run before it.

(3) Purging the cache on each GC requires a PodArrayZero that has to overwrite 2 KB on 64-bit platforms. This is some unfortunate overhead if we didn't cache anything.

(4) If we GC Zone A, there's no reason to spoil things (by purging the cache) for Zone B.

This patch replaces the global/fixed-size cache with a per-compartment HashSet. Like the old cache, this HashSet is cleared on GC.

The cache is a lot more deterministic now: I no longer see cases where we allocate > 40k iterators on Elm - it's now consistently < 250 which is much better than before.

I also cleaned up the code a bit and used mozilla::AddToHash and mozilla::PodEqual instead of hand-rolling our own versions.
Attachment #8887868 - Flags: review?(jcoppeard)
Comment on attachment 8887868 [details] [diff] [review]
Part 10 - Rewrite iterator cache

Review of attachment 8887868 [details] [diff] [review]:
-----------------------------------------------------------------

This looks much better.

::: js/src/jscompartment.h
@@ -674,5 @@
>  
>      js::WrapperMap               crossCompartmentWrappers;
>  
> -    using CCKeyVector = mozilla::Vector<js::CrossCompartmentKey, 0, js::SystemAllocPolicy>;
> -    CCKeyVector                  nurseryCCKeys;

Thanks for removing this.
Attachment #8887868 - Flags: review?(jcoppeard) → review+
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d649fe4b821a
part 10 - Replace fixed-size iterator cache with a per-compartment HashSet to improve hit rate. r=jonco
I think that's enough for this bug.
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/d649fe4b821a
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Depends on: 1383715
You need to log in before you can comment on or make changes to this bug.