Closed Bug 1495519 Opened Last year Closed 7 months ago

Rethink the IC state transition story

Categories

(Core :: JavaScript Engine: JIT, enhancement, P3)

enhancement

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox64 --- affected

People

(Reporter: mgaudet, Assigned: mgaudet)

Details

Attachments

(1 file, 3 obsolete files)

Attached file benchmark.js
While investigating Bug 1494537 I realized that an IC which should apply (tryAttachMegamorphicSetElement) isn't being hit because we never attempt to attach while the IC state is in Megamorphic mode. 

I think it's worth reconsidering how maybeTransition works, as I think it currently is 1) Difficult to follow 2) possible to skip the megamorphic case un-necessarily.
I wanted to run this patch by you first because it is very much in the same neighbourhood as Bug 1448039. 

Afterwards, this will need feedback from Jan, and a -thorough- performance evaluation.
Attachment #9013377 - Flags: feedback?(khyperia)
Assignee: nobody → mgaudet
Status: NEW → ASSIGNED
Comment on attachment 9013377 [details] [diff] [review]
Rethink state transitions in ICState

Review of attachment 9013377 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great! Definitely agree with the changes, just have a few nitpicks.

::: js/src/jit/ICState.h
@@ +41,3 @@
>      static const size_t MaxOptimizedStubs = 6;
>  
>      void transition(Mode mode) {

Could we delete this function? I don't think it's being used anymore (`transition(Mode mode)`)

@@ +56,5 @@
> +        }
> +        numFailures_ = 0;
> +    }
> +
> +

Nit: extra blank line here.
Attachment #9013377 - Flags: feedback?(khyperia)
Attachment #9013377 - Attachment is obsolete: true
(In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #0)
> because we never attempt to attach
> while the IC state is in Megamorphic mode. 

Did you mean *not* in Megamorphic mode here?

If I understand correctly, the problem is that we have cases that are only handled by megamorphic stubs and not the "normal" stubs - IMO we should fix these cases when we find them. Megamorphic mode was intended for the "we had too many stubs" case and I'm not sure about conflating that with "we failed to attach too many times". (One issue with fixing the bug this way is that we still waste time trying to attach a number of times until we finally change the mode, I think?)
What do you think about renaming tryAttachMegamorphicSetElement and calling it both in the megamorphic case and as final/catch-all tryAttach*?

This is a good find, btw :)
(In reply to Jan de Mooij [:jandem] from comment #5)
> (In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #0)
> > because we never attempt to attach
> > while the IC state is in Megamorphic mode. 
> 
> Did you mean *not* in Megamorphic mode here?

Nope: In this case the stub transitions directly from specialized to generic [1]: 

        if (numFailures_ == maxFailures() || mode_ == Mode::Megamorphic) { 
            transition(Mode::Generic);
            return true;
        }

In this case, numFailures_ == maxFailures(). Every instance of the increment of numFailures_ happens while DoGetElemSuperFallbackInfo is on the stack (though, I'm not sure if it is the same invocation), but essentially we hit numFailures_ and directly transition to Generic. 

It's definitely curious, as it seems like the trackNotAttached() is getting called more times on the stub->state() than actually we even try to attach; the 1-1 correlation that seems implicit in maybeTransition ( i.e. |numFailures_ == maxFailures()|... what about |numFailures > maxFailures()?|) is broken for SetElem. 


> If I understand correctly, the problem is that we have cases that are only
> handled by megamorphic stubs and not the "normal" stubs. 

Correct. 

> IMO we should fix these cases when we find them. Megamorphic mode was intended for the "we had
> too many stubs" case and I'm not sure about conflating that with "we failed
> to attach too many times". 

There's definitely some philosophical questions about megamorphic mode. Today I think it's already been conflated between we've failed too many times and we've attached too many stubs. (Regardless of the outcome of this particular patch, I think the maybeTransition logic needs to be rewritten to be much more clear about the aims). 

We can definitely audit the remaining megamorphic cases to be sure, and perhaps open them up as 'generically available stubs'.

> (One issue with fixing the bug this way is that we still waste time trying to attach a number of times until we finally
> change the mode, I think?)

I don't think of it as a waste of time IMO; we're sampling the input space maxFailure() times before moving to a more coarse grained stub. The problem with more generic stubs is that they can handle cases where we have a more specific, higher performance stub available; so we'd like to prefer to try to attach the most specific stubs while we can, but still have the generic stubs in the back pocket for when they don't ever attach. 

So, to summarize: I still like this patch, but I'm OK with the idea that you'd prefer to instead make the generic stubs more available. I do think absolutely maybeTransition needs to be rewritten though -- I'm open to suggestions about what you'd like to see. 

[1]: https://searchfox.org/mozilla-central/rev/6ddb5fb144993fb5de044e2e8d900d7643b98a4d/js/src/jit/ICState.h#93
Priority: -- → P3
QA Contact: sdetar
QA Contact: sdetar
Flags: needinfo?(jdemooij)
(In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #7)
> I don't think of it as a waste of time IMO; we're sampling the input space
> maxFailure() times before moving to a more coarse grained stub. The problem
> with more generic stubs is that they can handle cases where we have a more
> specific, higher performance stub available; so we'd like to prefer to try
> to attach the most specific stubs while we can, but still have the generic
> stubs in the back pocket for when they don't ever attach.

This isn't unreasonable and the problem with heuristics is there's not a right/wrong answer :/ My current thinking is:

(1) "Megamorphic" implies we had too many stubs at some point and I don't want to change that part. We could rename Megamorphic => Generic and Generic => Disabled, but we have CacheIR instructions/optimizations with Megamorphic in the name where this would be more confusing.

(2) IMO, if we fail to attach our most specific stubs, then it's okay (and the right thing) to immediately use a more generic one. That way, failure to attach is always a real/potential performance issue we should consider fixing.

I hope this addresses your NI; let me know if I missed something.
Flags: needinfo?(jdemooij)
Thanks, it was definitely helpful in understanding your position, in particular you desire for Megamorphic to be the too-many-stubs case

After chewing on it a bit, I wrote two different patches: 

The first simply restructured maybeTransition in order to help improve readability. This patch should be semantically identical

   https://hg.mozilla.org/try/rev/2523803e881337a43b39f5a2d78e1324ecfb9dc8

The second renames MegamorphicSetSlot to GenericSetSlot, and attaches only after all the other object attempts have failed. I believe this is what you were alluding to in (2).

   https://hg.mozilla.org/try/rev/aa75f08f36742474b8770fdeb6a783d66328f1f4

Performance experimentation shows worse results overall relative to the benchmark results seen in Comment 3 [1]. I haven't done a deep investigation as to why the performance is worse, but my gut feeling says it may well be the pathology described in Bug 1494537, Comment 13

I still think the approach in the attached patch [2] makes the most sense, though, as you suggest there may be some renaming involved. I'm just increasingly convinced there's value in keeping a class of ICs away from hot code unless we have some sampled evidence they'd be more valuable than eating some fallbacks. The cost of a bad IC shadowing a more common potentially higher-performing IC seems to me to outweigh the value of reducing fallback hits as low as possible. 

Maybe we need to add a fourth IC state (and do the Generic rename as suggested), then make the state transitions look like this: 

                 +-----------+
           +-----+Specialized+-----+
           |     +-----------+     |
           |                       |
           v                       v
     Too Many Stubs        Too Many Failures
           |                       |
+----------v+                  +---v---+
|Megamorphic|                  |Generic|
+-------+---+                  +---+---+
        |                          |
        |         +--------+       |
        +--------->Disabled<-------+
                  +--------+

This helps keep the states distinct. 



[1]: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=ac631e7771a1&framework=1&selectedTimeRange=172800
[2]: https://phabricator.services.mozilla.com/D7356
Flags: needinfo?(jdemooij)
(In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #9)
> Maybe we need to add a fourth IC state (and do the Generic rename as
> suggested), then make the state transitions look like this: 

Just to make sure I understand: what would be the difference in behavior between Megamorphic and Generic in this system?
(In reply to Jan de Mooij [:jandem] from comment #10)
> (In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #9)
> > Maybe we need to add a fourth IC state (and do the Generic rename as
> > suggested), then make the state transitions look like this: 
> 
> Just to make sure I understand: what would be the difference in behavior
> between Megamorphic and Generic in this system?

The ability to distinguish allows us to choose different behaviours, though perhaps they mostly act similarly initially. 

My gut says that in 'Generic' mode, we'd attach slower stubs that we would prefer to keep away from hot paths, whereas in Megamorphic mode, we prefer not to attach stubs that will only handle one type. 

Right now these two groups overlap heavily, but we'd be able to use this extra bit of information to develop more sophisticated tuning. 

(I am honestly not sure how much I am advocating for this; while it seems like a useful primitive to have, the existence depends on further analysis of our IC attachment story that we have no one currently scheduled to work on, so may be more complexity than the topic deserves at hand. Nevertheless, I feel like we can make an improvement on our current state-of-the-art)
I like the distinction between the Megamorphic (too many stubs were attached) and Generic (we failed to attach too many times, do the most generic thing you can think of or we'll go to Disabled) states. Adding a new state does add a bit more complexity though, I'll think about it more.
Sorry for the slowness here :/ I think having Generic/Megamorphic/Disabled states makes sense if Try results show it's a win.
Flags: needinfo?(jdemooij)
Attachment #9013458 - Attachment is obsolete: true
Modify the ICState state machine to clarify the difference between an IC chain
that has failed too many times and an IC chain that has attached too many
stubs. This adds a new terminal state Disabled which corresponds to the IC chain
being disabled.

This will allow further investment in IC attachment heuristics.

The state-machine transitions are also gently modified, which in some
cirucmstances will allow megamorphic attachments where previously the IC would
directly transition to Disabled (then Generic).
The patch makes sense to me but:

(1) Instead of treating Generic == Megamorphic in CacheIR.cpp, I think we should do what you said in comment 11:

> My gut says that in 'Generic' mode, we'd attach slower stubs that we would prefer to keep away from hot paths, whereas in Megamorphic mode, we prefer not to attach stubs that will only handle one type. 

So all the tryAttachMegamorphic paths would still check mode == Megamorphic, and we would only use Generic as a catch-all at the end: if (mode == Generic) { ..attach the slowest and most generic thing we have.. }

(2) Was this bug initially motivated by a win on some benchmark? Did we lose it?
Flags: needinfo?(mgaudet)
(In reply to Jan de Mooij [:jandem] from comment #16)
> So all the tryAttachMegamorphic paths would still check mode == Megamorphic,
> and we would only use Generic as a catch-all at the end: if (mode ==
> Generic) { ..attach the slowest and most generic thing we have.. }

Ok, lemme see what I can do there. I wasn't super confident about getting them all right, but I'll do a pass and performance test too. 

> (2) Was this bug initially motivated by a win on some benchmark? Did we lose
> it?

More a performance pathology I saw while working on Bug 1494537. The core issue was that the SetElement cache was missing the tryAttachMegamorphicSetElement case, because of the peculiar structure of said cache w.r.t. state transitions that lead it to completely bypass the Megamorphic state, transitioning directly from Specialized to Generic. 

This patch preserves the property that we should always still attempt a round of attachment for either Generic or Megamorphic, which addresses that pathology.  

I suspect fixing this will be a win, on the order of 1-2% on a number of benchmarks (dromaeo and kraken to be specific), though the talos numbers are too noisy to get a high-confidence confirmation. The benefit is that we'll attach ICs where before we'd just go generic. The downside is... we'll attach ICs, where previously we'd just go generic. There are circumstances where this is a win, and circumstances where it's a loss. My gut says more of the former than the latter.
Flags: needinfo?(mgaudet)
(In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #17)
> (In reply to Jan de Mooij [:jandem] from comment #16)
> > So all the tryAttachMegamorphic paths would still check mode == Megamorphic,
> > and we would only use Generic as a catch-all at the end: if (mode ==
> > Generic) { ..attach the slowest and most generic thing we have.. }
> 
> Ok, lemme see what I can do there. I wasn't super confident about getting
> them all right, but I'll do a pass and performance test too. 
> 

So, getting this right is going to be hard. Probably worth it one day, but I'm not as sure this is the place.

   https://hg.mozilla.org/try/rev/35f4dcbecc3e 

Is an attempt at breaking out the Generic from the Megamorphic cases. Performance on TreeHerder [1] is generally worse than the patch (currently on phabricator) which punts on the problem and simply considers the new "Generic/Megamorphic" as the same. 

IMO this heuristic work should be a followup bug 


[1]: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c19406581938&newProject=try&newRevision=9db5e0e833f6&framework=10&showOnlyComparable=1
Essentially suggesting we land the current phabricator patch, then open followup bugs for heuristic investigation.
Flags: needinfo?(jdemooij)

Clearing NI because you'll be on leave still this month. Personally I think that if we don't see a clear win anywhere we shouldn't bother changing it right now.

Flags: needinfo?(jdemooij)
Attachment #9026130 - Attachment is obsolete: true

Well, despite a fair amount of effort, I'm going to close this as incomplete.

There are definitely cases where changing the heuristic used to control our transition to generic matter. I can gin up examples galore. However, seemingly no matter what heuristic I choose, it's hard to do reliably better than what we have.

Some of the more interesting ideas I tried include

  • Adding an 'exploration' parameter, inspired by Monte-Carlo Tree Search techniques. Similarly, I also tried Continuing to track the number of failures in generic mode, and resetting the IC state when it overflows, as a way of intermittently trying to find a state-change where ICs are beneficial again.
  • Preserving ICs that have a good hit rate, as measured by the entry counters.

Lots of microbenchmarks exist that show how these new techniques can kick our existing system's butt: Despite that, they invariably fall down on our real world benchmarks on raptor; speedometer, tp6 etc.

The root cause being there is no good single best heuristic, and so in the space of IC attachment, you'll always degrade some cases in favour others.

While my gut still says there's room for improvement here (Reinforcement learning?), I don't think it's worth spending more effort here right now.

Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.