Closed Bug 1328140 Opened 7 years ago Closed 7 years ago

Handle IC failures better

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla55
Performance Impact none
Tracking Status
firefox55 --- fixed

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [platform-rel-Facebook][platform-rel-ReactJS][platform-rel-Ember])

Attachments

(2 files)

"IC failures" here means:

(1) We attach too many optimized stubs so we can't optimize cases after that. (For Ion ICs the limit is currently 16 stubs, that's quite a lot. Changing this may regress some benchmarks that use polymorphic property accesses, but we should definitely try to do better.)

(2) We repeatedly fail to attach a stub. In this case we should stop trying after x failures so we don't spend too much time on something that's likely not going to succeed. As we work on optimizing more cases, this will become less of an issue. The Ion GetProp IC has a failedUpdates_ mechanism that we should probably use everywhere.

Maybe we can check for these cases in the CacheIR code, so we handle them the same way in Ion/Baseline. We can discard all stubs and then either attach a generic catch-all stub or set a flag on the fallback stub so it stops optimizing.
Bz brought up an interesting case while reviewing bug 1319087.

Because (currently) WrapResult will fail if we have to create a wrapper object we will end up in the fallback stub. If however we reached the max number of stubs we will attach a generic stub. This is of course silly, because the GetPropertyOperation afterwards would create the wrapper object and cache it, so the next time LookupWrapper would have succeeded. I kind of assume we might have similar issues in other places as well, just something we should maybe address in one go.
Replacing stubs with a generic stub can always cause performance issues, because it's possible one of the stubs we're replacing is actually very hot and handles 99% of incoming objects. I'm not sure there's much we can/should do about that...
The more I've been thinking about this, the more I'm convinced the 16-stubs limit in Ion (and Baseline getprop) is ridiculous and is very likely hurting us more than it helps (especially with W^X).

It probably makes more sense to do the following: the first time we reach the max stubs limit (let's say 4-8 stubs), we set a special polymorphic flag on the IC and IR generator. In this mode we would always attach the most generic proxy stub, readslot stubs would not be specialized to a particular receiver shape but do a callWithABI to a Shape::search function, etc. If we then again reach the max stubs limit, we give up and attach the most generic callVM stub. This should make us much more resilient when it comes to the polymorphic cases.

I'll do some measurements (Octane/Dromaeo/Speedometer/Browsermark) once bug 1341067 and a few other things are in. This is definitely P1.
Priority: P3 → P1
Do we need something more advanced? I can image having more classes than two? And what about having a particular type of stub that is polymorphic but the other is not polymorphic?

I was imagining we could do:
1) Keep a number on how many stubs have been added of a specific type.
2) Every time we want to add a stub, test if that numbers is lower than X. (and unlink all stubs from that type if seen).
3) Put the more generic stubs lower in our tryAttach function.

=> Positive: This would result in a stub system that would automatically take more generic stubs when the more specific stubs are getting blocked.
=> Negative: Imagine we added a generic stub since stubtype1 was used too much. If stubtype2 also becomes too generic we will hit the generic stub instead of going to the fallback stub and unlinking those stubs :(

Another idea (variation on yours) I had:
Instead of using a flag: "polymorphic". Add a flag "optimizeLevel", which we increase every time we hit the max stub limit. And let the most specific stubs only get added when "optimizeLevel == 0", more generic at "optimizeLevel == 1" ... This would give the some of the same benefits, but allow us to have more "levels"
(In reply to Hannes Verschore [:h4writer] from comment #4)
> Another idea (variation on yours) I had:
> Instead of using a flag: "polymorphic". Add a flag "optimizeLevel", which we
> increase every time we hit the max stub limit. And let the most specific
> stubs only get added when "optimizeLevel == 0", more generic at
> "optimizeLevel == 1" ... This would give the some of the same benefits, but
> allow us to have more "levels"

I like this because it's pretty simple but should be quite effective. Maybe an enum with 3 values, we start at the first level, if we attach too many stubs we switch to the second level and prefer attaching the more generic (but still pretty fast) stubs. The 3rd level would be "generic/megamorphic" and just means "give up and attach a single stub that handles everything with a VM call".

When we fail to attach a stub > X times, we could also change the IC level to most-generic so we stop wasting time trying to attach a stub.

I think this will fix a lot of perf issues and give us the same behavior in Ion and Baseline. We will have to see how Octane-TypeScript and browser benchmarks behave, they have some bad polymorphism.
This idea sounds great to me as well! This would handle the CCW case perfectly, at the 2nd level we would emit a generic proxy call and at the 3rd level we would fallback to the generic stub.
I prototyped this quickly for the Ion GetProp/GetElem IC with the following changes:

* We attach max 6 stubs per level.

* Instead of attaching a readslot/missing-prop stub in tryAttachNative, the second level attaches a stub that does a callWithABI to a function that handles simple slot reads or missing properties (separate stubs for GetProp and GetElem). I specialized this for PlainObjects so this function can be very tight (no Class hooks to check for), but we should probably make it a bit more generic.

* The 3rd (Generic) level just ensures we don't attach any stubs. We could do a bit better here by attaching a generic stub but not sure it's worth it.

This was a small win on Octane-pdfjs and Octane-TypeScript. It also improved Speedometer (57 -> 60) and some assorted tests like react-shell. No significant changes on SunSpider and Kraken.
Blocks: 1245279
Can you send this change to the Hasal team for testing?  It would be interesting to know if it moves any needles there.
(In reply to :Ehsan Akhgari from comment #8)
> Can you send this change to the Hasal team for testing?  It would be
> interesting to know if it moves any needles there.

It was just a partial, quick-and-dirty prototype. I have some ideas to improve things more and I'd prefer trying that first (and running tests) before I send it to them. We really want this in 54 though and I should have more polished patches next week.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Depends on: 1341067
Depends on: 1342856
I'm now thinking maybe we shouldn't land this less than a week before the merge. It should improve performance but changing heuristics is always a bit of a risk.

I should have a patch for this tomorrow so we can see if it has any effect on Talos, Hasal, etc.
I have this mostly working locally but I'm running into some perf issues on Dromaeo, especially the dom-traverse tests.

The old situation was that we attached up to 16 GetProp stubs per IC and then we just stopped attaching new stubs.

The new situation is that we attach 6 stubs and if we need more we discard them and we try again in "megamorphic mode". If we then again reach the limit we switch to "generic" mode and we stop attaching anything. This is pretty nice because it allows us to do optimizations we couldn't do before: for instance in megamorphic mode when we attach a getter/setter stub, instead of guarding on specific receiver/proto shapes, we call a little C++ helper function that checks the object (or its proto chain) still has this getter/setter property. So the idea is that we can optimize specific getter/setter calls even if we have hundreds of different receiver shapes.

Now the problem with this Dromaeo test is that it calls getters like Node.nextSibling on all kinds of DOM nodes. The current setup is better for this because the 16 stubs happened to handle the most common cases in a way that's faster than our megamorphic getter stub.

bz, it's not the first time we're running into problems because we have many DOM element Classes. What would it take to use the same Class for, say, <p> and <div>? For toString we could add a Class hook if needed...
Flags: needinfo?(bzbarsky)
> What would it take to use the same Class for, say, <p> and <div>?

Before I start thinking about that seriously, would that be enough?  They have different protos too, not just different classes.
Flags: needinfo?(bzbarsky)
I made the IonBuilder code work with the megamorphic Baseline stub, and this is now a nice win on the Dromaeo DOM tests \o/ The dom-traverse test is now 5-7% *faster* than before and query.html and attr.html show similar gains.

Still a Dromaeo-CSS jQuery regression to figure out. I actually get better scores on dromaeo.com with my patch because the scores are calculated differently there I assume.
WIP patch improves the Ember benchmark in bug 1097376 a lot:

* Render List:  448 ms -> 358 ms
* Complex List: 190 ms -> 140 ms

It's still a small win on Speedometer and at least a 5% win on the react-shell benchmark. It's pretty nice to see significant wins on frameworks from handling megamorphism properly.

Dromaeo DOM will be 3-7% faster. We may have to take a regression on Dromaeo-CSS - when changing heuristics like this it's hard to make stuff faster without regressing a few things.

What's nice about handling failures properly is that it uncovers certain performance issues. We have a SetElement stub for adding a new dense element but it fails if we have to realloc the elements. That was okay(-ish) until now but my patch counts that as an IC failure and disables the IC after that happens a few times. This caused a large Talos CanvasMark regression and after I fixed the stub to realloc the elements it's actually a small win.

Unfortunately this will be too late for 54, but this bug is going to help framework code a lot. I'll start posting patches tomorrow.
(In reply to Jan de Mooij [:jandem] from comment #14)
> Dromaeo DOM will be 3-7% faster. We may have to take a regression on
> Dromaeo-CSS

Surprising twist: the SetElement fix I mentioned actually improves Dromaeo CSS as well so this is now an improvement (1-2% or so). Retriggers are pretty clear about it. It makes some sense because these query functions have to fill arrays of elements.
Depends on: 1344195
Depends on: 1344198
Depends on: 1344218
Depends on: 1344691
Baseline's GetProp_Generic stub is the last non-CacheIR Baseline prop/elem stub and the next patch will use a different approach.
Attachment #8845369 - Flags: review?(hv1989)
Sorry, this is a pretty large patch but at this point I don't think splitting it up more would help much and most of the changes are pretty mechanical.

Both Baseline and Ion ICs will have an ICState field that stores the number of stubs, number of failures, and the current IC mode (Default/Megamorphic/Generic). An IC can transition from one mode to the next if we either attach too many stubs (6 in this patch) or failed to attach a stub too many times (5 in this patch).

In megamorphic mode we despecialize as follows:

* We attach megamorphic stubs (fast C++ call) for getting native object properties and setting an existing native property.

* Getter/setter stubs will use a megamorphic guard to check the object or its proto chain has the getter/setter property, instead of guarding on specific receiver shapes etc. This is especially useful for the DOM: different element types have different shapes but we can still optimize getters on the prototype chain and use that information in IonBuilder.

* For proxies we use the most generic stub that works for all proxies.

There are other things for which we could add megamorphic stubs later, for instance getting/setting dense elements, JSOP_IN, etc.

Typical megamorphic lookup micro-benchmarks are 2-3x faster. It's easy to come up with scripted getter/setter micro-benchmarks where we're more than 10x faster because Ion can now inline them in more cases.

As a follow-up I want to add a JIT option to make ICs default to megamorphic stubs, so we can test/fuzz that code better.
Attachment #8845382 - Flags: review?(hv1989)
Comment on attachment 8845382 [details] [diff] [review]
Part 2 - Handle IC failures better

Review of attachment 8845382 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/VMFunctions.cpp
@@ +1608,5 @@
> +    }
> +
> +    if (MOZ_UNLIKELY(JSID_IS_INT(id)))
> +        return false;
> +

Shouldn't we use IdIsIndex here?
(In reply to Tom Schuster [:evilpie] from comment #18)
> Shouldn't we use IdIsIndex here?

Hm good point. If JSID_IS_INT returns false and IdIsIndex returns true, it must be an index > JSID_INT_MAX. Such properties are not stored in dense elements because NativeObject::MAX_DENSE_ELEMENTS_COUNT < JSID_INT_MAX, so I think it's okay to handle such sparse elements here. I will add a static_assert + comment for this. Let me know if I'm missing something.
You are correct, I hadn't realized that AtomToId already handles atom->isIndex.

As discussed on IRC the MaxFailures limit of 6 is probably too low. We have to consider that after going Generic there is no going back.

We also talked about handling this number not as a strict limit when we already attached stubs. Maybe we should double or multiply by the number of attached stubs.

Another thing we discussed was attaching the megamorphic stubs as an additional stub instead of replacing all the existing stubs.
(In reply to Tom Schuster [:evilpie] from comment #20)
> We also talked about handling this number not as a strict limit when we
> already attached stubs. Maybe we should double or multiply by the number of
> attached stubs.

I can change it to something like 5 + 50 * numOptimizedStubs to deoptimize less eagerly if we have stubs.

> Another thing we discussed was attaching the megamorphic stubs as an
> additional stub instead of replacing all the existing stubs.

Unfortunately some guards are a bit expensive (GuardSpecificAtom with a non-atomized string for example) so we have to be careful to not add some overhead before reaching the megamorphic stub.

I'm hoping bug 1346178 will be useful to avoid overhead from attaching a bunch of stubs before going megamorphic.
I tested this patch on Speedometer and started looking into whether we run into some situation where we keep looking up the same id on the same shape, because I was mostly interested into adding some kind of caching.

However Ember seems to run into some kind of problem where we turn the stub megamorphic before the object is completely assigned, or maybe they keep adding properties, not sure. This mostly affects the uses of Ember.Descriptor, Ember.testing and Ember.assert, but also various other properties. I would assume that the global Ember namespace doesn't constantly change.

Object Shape        JSID               Name           Count 
(('0x7fa2f9d116a0', '140337983886464', 'Descriptor'), 4368)
(('0x7fa2f9d116a0', '140338400234880', 'assert'), 2115)
(('0x7fa2f9d116a0', '140338432722048', 'testing'), 1716)
(('0x7fa2fa619100', '140337983886464', 'Descriptor'), 1368)
(('0x7fa2fb56af60', '140337983865856', 'descs'), 1336)
(('0x7fa2fb539dd0', '140338431941728', 'hasOwnProperty'), 1287)
(('0x7fa2f9d35380', '140337983865856', 'descs'), 1101)
(('0x7fa2fe972218', '140337983866336', '__ember_observesBefore__'), 1030)
(('0x7fa2fe972218', '140337983683928', '__ember_observes__'), 1030)
(('0x7fa2fe972218', '140337983683968', '__ember_listens__'), 1030)
(('0x7fa2f9d35380', '140337983865888', 'watching'), 921)
(('0x7fa2f9d6d790', '140337983865856', 'descs'), 773)
(('0x7fa2fb5c9f10', '140337983886464', 'Descriptor'), 772)
(('0x7fa2f9d1f808', '140338399967392', 'content'), 770)
(('0x7fa2f9d1f998', '140338399967392', 'content'), 768)
(('0x7fa2f9d1f790', '140338399967392', 'content'), 759)
(('0x7fa2fb56a358', '140338431941728', 'hasOwnProperty'), 754)
(('0x7fa2f9d35380', '140338431947616', 'values'), 747)
(('0x7fa2f9da9ba0', '140337978140704', '__ember1489284317769_meta'), 721)

At around 300 of those tuples we start going down to about 100 occurrences.  This is after doing about two ember runs. This is also a global log and not per megamorphic stub, so those might be lower per stub.

I haven't tested this, but maybe we could check when adding a NativeSlot stub if the previous stubs are for shapes that are in the shape lineage of the new shape and do some kind of cleanup or delay the megamorphic stub. Sadly we don't have any runtime profiling information to decide which stubs we might remove.
(In reply to Tom Schuster [:evilpie] from comment #22)
> I haven't tested this, but maybe we could check when adding a NativeSlot
> stub if the previous stubs are for shapes that are in the shape lineage of
> the new shape and do some kind of cleanup or delay the megamorphic stub.

It's difficult for non-dictionary shapes because the previous shapes in the shape lineage could still be the lastProperty of some other objects.

> Sadly we don't have any runtime profiling information to decide which stubs
> we might remove.

Last week we talked about adding a counter to IC stubs (maybe only to non-monomorphic ones to avoid overhead) so we can use that information (also in BaselineInspector). It's worth investigating at some point.
Comment on attachment 8845369 [details] [diff] [review]
Part 1 - Remove Baseline GetProp_Generic stub

Review of attachment 8845369 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/SharedIC.cpp
@@ +2038,1 @@
>          attached = true;

Shouldn't the above two lines also get removed?
Attachment #8845369 - Flags: review?(hv1989) → review+
Comment on attachment 8845382 [details] [diff] [review]
Part 2 - Handle IC failures better

Review of attachment 8845382 [details] [diff] [review]:
-----------------------------------------------------------------

Nice and clean!

Got three follow-up questions.
1) We should probably include this information in the CachIC spewer?
2) Seems like the order of stubs sometimes makes a difference? See my example in the code.
3) Few of the emitXXX are split between BaselineCacheIRCompiler and IonCacheIRCompiler and might be possible to share with some helper functions?

::: js/src/jit/CacheIR.cpp
@@ +532,5 @@
>  
> +void
> +GetPropIRGenerator::attachMegamorphicNativeSlot(ObjOperandId objId, jsid id, bool handleMissing)
> +{
> +    MOZ_ASSERT(mode_ != ICState::Mode::Default);

MOZ_ASSERT(mode_ != ICState::Mode::Megamorphic);

@@ +564,5 @@
>      switch (type) {
>        case CanAttachNone:
>          return false;
>        case CanAttachReadSlot:
> +        if (mode_ != ICState::Mode::Default) {

For me it make more sense to check if the state is megamorphic.

if (mode_ == ICState::Mode::Megamorphic) {

@@ +608,5 @@
>  
> +    // If we're megamorphic prefer a generic proxy stub that handles a lot more
> +    // cases.
> +    if (mode_ != ICState::Mode::Default)
> +        return false;

if (mode_ == ICState::Mode::Megamorphic)

@@ +676,5 @@
>          return false;
>  
> +    // If we're megamorphic prefer a generic proxy stub that handles a lot more
> +    // cases.
> +    if (mode_ != ICState::Mode::Default)

if (mode_ == ICState::Mode::Megamorphic)

@@ +765,1 @@
>          MOZ_ASSERT(cacheKind_ == CacheKind::GetElem);

MOZ_ASSERT(cacheKind_ == CacheKind::GetElem && mode_ == ICState::Mode::Megamorphic);

@@ +945,5 @@
>  GetPropIRGenerator::tryAttachProxy(HandleObject obj, ObjOperandId objId, HandleId id)
>  {
> +    ProxyStubType type = GetProxyStubType(cx_, obj, id);
> +
> +    if (mode_ != ICState::Mode::Default && type != ProxyStubType::None)

Can we test for ProxyStubType::None immediately. Is easier to read.

if (type == ProxyStubType::None)
    return false;

if (mode == ICState::Mode::Megamorphic)
    return tryAttachGenericProxy(obj, objId, id, /* handleDOMProxies = */ true);

MOZ_ASSERT(mode == ICState::Mode::Generic);

switch (type) {
  case ProxyStubType::None:	
    MOZ_CRASH();

@@ +973,2 @@
>        case ProxyStubType::Generic:
> +        return tryAttachGenericProxy(obj, objId, id, /* handleDOMProxies = */ false);

These two statements show there is an issue.
The order of attaching makes a difference.

1) DOMExpando
2) DOMUnshadowed (without CanAttachNativeGetProp)
=> Ataches DOMProxyExpando
=> and GenericProxy

2) DOMUnshadowed (without CanAttachNativeGetProp)
1) DOMExpando
=> Ataches GenericProxy

the DOMProxyExpando is never attached! Since the genericProxy takes precedence.

my hunch would be that should "fail" to add the DOMUnshadowed in that case and wait for the megamorphic state?

@@ +2151,5 @@
>          *isTemporarilyUnoptimizable_ = true;
>          return false;
>      }
>  
> +    if (mode_ != ICState::Mode::Default &&

mode_ == ICState::Mode::Megamorphic

@@ +2806,5 @@
>  
> +    ProxyStubType type = GetProxyStubType(cx_, obj, id);
> +
> +    if (mode_ != ICState::Mode::Default && type != ProxyStubType::None)
> +        return tryAttachGenericProxy(obj, objId, id, rhsId, /* handleDOMProxies = */ true);

Same thing I said for GetPropIRGenerator

@@ +2825,1 @@
>          return tryAttachGenericProxy(obj, objId, id, rhsId, /* handleDOMProxies = */ true);

Same remark as in GetPropIRGenerator.

::: js/src/jit/ICState.h
@@ +23,5 @@
> +    // single stub that handles everything or stop attaching new stubs.
> +    //
> +    // We also transition to Generic when we repeatedly fail to attach a stub,
> +    // to avoid wasting time trying.
> +    enum class Mode : uint8_t { Default = 0, Megamorphic, Generic };

Should we call "Default" -> "Specialized" ?

::: js/src/jit/IonCacheIRCompiler.cpp
@@ +693,5 @@
> +    masm.loadTypedOrValue(Address(masm.getStackPointer(), 0), output);
> +    masm.adjustStack(sizeof(Value));
> +
> +    masm.branchIfFalseBool(scratch2, failure->label());
> +    return true;

We should try to make sure the code is duplicated in Baseline/IonMonkey. They are almost the same except for where to read the name. Can we create helperfunctions that do different things in different modes to have this in the generic IC Compiler?

::: js/src/vm/TypeInference-inl.h
@@ +413,5 @@
>      if (obj->group()->unknownProperties())
>          return true;
>  
>      if (HeapTypeSet* types = obj->group()->maybeGetProperty(IdToTypeId(id)))
> +        return types->hasType(type) && !types->nonConstantProperty();

Where did this happen? Do we have a testcase or can you add a testcase where this is important?
Attachment #8845382 - Flags: review?(hv1989) → review+
Whiteboard: [qf-] → [qf-] [platform-rel-Facebook][platform-rel-ReactJS][platform-rel-Ember]
Comment on attachment 8845382 [details] [diff] [review]
Part 2 - Handle IC failures better

Review of attachment 8845382 [details] [diff] [review]:
-----------------------------------------------------------------

After profiling Ember the second and third most time spent in C++ are the new GetNativeDataProperty functions, so I think we should definitely have a bug to optimize some of those cases that I pointed out.

::: js/src/jit/VMFunctions.cpp
@@ +1539,5 @@
> +
> +        // Property not found. Watch out for Class hooks.
> +        if (MOZ_UNLIKELY(!obj->is<PlainObject>())) {
> +            if (ClassMayResolveId(cx->names(), obj->getClass(), id, obj) ||
> +                obj->getClass()->getGetProperty())

Doing this makes sense to me, but somehow LookupOwnPropertyPure doesn't do it.
(In reply to Tom Schuster [:evilpie] from comment #26)
> After profiling Ember the second and third most time spent in C++ are the
> new GetNativeDataProperty functions, so I think we should definitely have a
> bug to optimize some of those cases that I pointed out.

I'll look into this today to see what Ember is doing exactly. Not landing this for a few days because I have some other things to deal with.
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f98a326bcf8d
Improve handling of IC failures, add megamorphic IC stubs. r=h4writer
This will likely regress a number of benchmarks because it changes heuristics. Let's see what AWFY says.

(In reply to Hannes Verschore [:h4writer] from comment #25)
> 1) We should probably include this information in the CachIC spewer?

Yeah, I'll file a follow-up bug.

> 2) Seems like the order of stubs sometimes makes a difference? See my
> example in the code.

True. It's pre-existing and I don't think we have to do anything here for now, but we can fix it if we see it's a problem somewhere.

> 3) Few of the emitXXX are split between BaselineCacheIRCompiler and
> IonCacheIRCompiler and might be possible to share with some helper functions?

Filed bug 1348792.

> For me it make more sense to check if the state is megamorphic.

I changed it. (I wrote the checks like that in case we ever add a new mode, but we can deal with it then.)

> Should we call "Default" -> "Specialized" ?

Good idea, done.
This regressed Octane-box2d, I filed bug 1348850 for that.

It also regressed Octane-TypeScript because HasTypePropertyId is too pessimistic. I'll fix that and hopefully it will be an improvement then.
Blocks: 1349035
https://hg.mozilla.org/mozilla-central/rev/f98a326bcf8d
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Performance Impact: --- → -
Whiteboard: [qf-] [platform-rel-Facebook][platform-rel-ReactJS][platform-rel-Ember] → [platform-rel-Facebook][platform-rel-ReactJS][platform-rel-Ember]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: