Closed Bug 1599909 Opened 4 years ago Closed 2 years ago

Analyze baseline ICs to inform strategies for eventual IC -> MIR compilation

Categories

(Core :: JavaScript Engine: JIT, task, P2)

task

Tracking

()

RESOLVED FIXED

People

(Reporter: cfallin, Unassigned)

Details

Attachments

(3 files, 4 obsolete files)

In order to compile baseline ICs (CacheIR) to MIR, which is a possible path forward for the JIT, we want to understand the current behavior better in order to inform some design decisions.

In particular:

  • How hot are IC stubs in a chain? Are chains usually very short, with just one hot stub that always hits, or do we need to consider polymorphic cases more carefully? This will inform whether we inline (some, all) IC stubs directly and how we might dispatch between them.

  • How much in common do IC stubs in a chain have? Are many of the stubs identical up to a prefix, then differing in tails (with different guards to select)? Or are the stubs completely different? This will inform whether we need to think about hoisting/sharing guards if we inline stubs, and in the extreme case of identical stubs except data, whether we want to do something clever for data-polymorphism (Iain's "IC smooshing" idea).

To answer these questions, I built some instrumentation to dump a trace of information about stubs on every GC, and postprocess that trace. Will attach in subsequent comments.

This patch (not meant to be landed!) contains several bits of
instrumentation:

  • A dump (in JSON format) of data about every IC and every stub for
    every existing JitScript at GC time. This dump includes the CacheIR
    opcodes and arguments, associated field data, and entry and
    guard-failure counts.

  • A modification to the guard-failure path of baseline ICs to include
    failure counts for each guard.

This is meant to be paired with the analysis script in the above bug in
order to analyze IC behavior.

Attached file icstats6.txt.xz (obsolete) —

Attaching log of IC stub data collected from a browsing session over popular sites + new-tab Pocket links.

Attached file Analysis script: ic-stats.py (obsolete) —

Attaching analysis script.

Attached file analysis-output.txt (obsolete) —

Attaching analysis output.

I included four analyses in this initial analysis script:

  • IC stub count (length of chain): histograms where the bin (X-axis) is number of stubs in an IC chain, and Y-axis is either count of observed chain with this stub count (unweighted) or entries into chains with this stub count (weighted).

  • IC entry count (hotness across chain): histogram where the bin is the stub position on the chain, and Y-axis is sum of all entries into a stub in this position.

  • Identical stubs: histograms where X-axis is number of redundant stubs, i.e., stubs whose bodies are identical to some other stub in their chain, except for stub data (field values), and Y-axis is either count of observed cases (unweighted) or count of observed cases multiplied by chain entry counts (weighted).

  • Identical stub prefixes: the above identical-stubs analysis, but where for each stub, we consider only the k-prefix of CacheIR ops. Results are parameterized on k, from 1 to maximum observed stub length.

There's some more we could do with the raw data, but this seems like a good start. Some conclusions:

  • About 88% of IC chains have only 1 stub; 99% have 1 or 2 stubs, weighted by entry counts. This immediately tells us that any sort of fancy polymorphism is probably unnecessary.

  • The distribution of IC entry counts is even more heavily skewed: 95% of all stub invocations are to the first stub in the chain.

  • There are very few duplicate stubs (identical code, just different field data): only 0.28% of all stubs weighted by entry count, or 0.03% of all stubs unweighted (!).

  • Even if we consider only a k-prefix, with the most permissive k = 1, we have only 0.5% of stubs that are redundant (have the same first instruction, in this case, as another stub in the chain).

The above conclusions regarding redundant stubs are not too surprising given the initial context that most chains are very short. Overall, my takeaway from the above is that (i) we probably should support poly-inlining for at least two cases (88% is a clear majority but taking a hit to call a stub 1/8 times is still not great), but (ii) any of the more complex things we've talked about to optimize long chains or merge stubs together are probably relegated to corner-cases (e.g. for DOM nodes or the like) and not too important for the general web workload.

Happy to discuss more and/or add more analyses!

This is awesome. Thanks!

Is there any way to also track IC state? If the number of ICs in a chain exceeds a limit, we transition to megamorphic (and eventually generic). So these numbers have some level of hidden polymorphism, where chains got too long and transitioned. The number of megamorphic/generic chains is an upper bound on that hidden polymorphism.

Priority: -- → P2
Attached file icstats7.txt.xz
Attachment #9112102 - Attachment is obsolete: true
Attached file ic-stats.py
Attachment #9112114 - Attachment is obsolete: true
Attached file analysis-output.txt
Attachment #9112118 - Attachment is obsolete: true

Attached new versions of stats collection patch, analysis script, trace of browsing session, and analysis output, with IC state dumped as well.

In this particular browsing session (different from the previous trace, but similar mix of sites), we see, weighted by chain entry count, 1.3% of all ICs are megamorphic and 15.4% are generic (with the remaining 83.3% specialized). Unweighted, far fewer are generic or megamorphic -- 3.9% total. (It makes sense to me that the more frequently entered chains are more likely to have megamorphic/generic stubs as they will likely see more types.) So, indeed, there is more polymorphism than the raw numbers would suggest, because we're collapsing a long tail into the megamorphic/generic cases. For the specific question of which or how many stubs to inline, though, it seems to me that the type of stub is somewhat immaterial though, in the sense that we have N stubs and a chain (or tree) of decisions between them; we just need to choose which subset of the stubs to inline. (Though, perhaps with better heuristics here, we could get away with specializing more, creating more stubs.)

You are running into a bunch of cliff-mitigation strategies that are skewing your numbers - the generic stuff being just a example. The ICs we encounter have both chain-size thresholds as well as generic and megamorphic stubs to handle situations where too much polymorphism causes IC chains to be come degenerately long.

Turning these off (i.e. increasing thresholds to something ridiculously high, and disabling all generic and megamorphic stubs), would probably give a better idea of the true polymorphism present on the your various use cases.

15.4% is a pretty big number. It doesn't directly affect our inlining, but it does imply that we should look at our IC strategies and find ways to avoid going generic in the first place.

My pet idea for this continues to be creating megamorphic stubs by merging together specialized stubs with the same code but different stub fields. How hard would it be for you to modify this patch so that it dumps data about identical stubs when we transition to megamorphic, instead of after we've thrown the old IC chain away?

Also: can you do a little bit more analysis of the generic ICs with a high hitcount? I suspect that most of them will have fairly simple CacheIR (call this VM function), so just identifying the CacheIR sequences with the highest entry count could give helpful hints.

(In reply to Iain Ireland [:iain] from comment #11)

Also: can you do a little bit more analysis of the generic ICs with a high hitcount? I suspect that most of them will have fairly simple CacheIR (call this VM function), so just identifying the CacheIR sequences with the highest entry count could give helpful hints.

It seems that in every case in my dump, an IC in 'generic' state has only a fallback stub (hence no CacheIR). This perplexed me at first but I think this arises because of a combination of two things:

  1. The caller to ICState::maybeTransition() is supposed to discard all stubs when we transition to a new mode:

https://searchfox.org/mozilla-central/source/js/src/jit/ICState.h#72

and this indeed seems to be the case at the callsites I checked.

  1. The ICState::canAttachStub() heuristic immediately returns false if we're in generic mode:

https://searchfox.org/mozilla-central/source/js/src/jit/ICState.h#65

We could capture the chain just before we go generic (and likewise, just before megamorphic); this gets a little more complex wrt how I'm aggregating data, because I would need to separately dump the JitScript and then each IC as it has a temporal event, but it shouldn't be terrible. That said...

(In reply to Kannan Vijayan [:djvj] from comment #10)

You are running into a bunch of cliff-mitigation strategies that are skewing your numbers - the generic stuff being just a example. The ICs we encounter have both chain-size thresholds as well as generic and megamorphic stubs to handle situations where too much polymorphism causes IC chains to be come degenerately long.

Turning these off (i.e. increasing thresholds to something ridiculously high, and disabling all generic and megamorphic stubs), would probably give a better idea of the true polymorphism present on the your various use cases.

I tried this just now too, and things got pretty crashy pretty quickly. Turning off generic/megamorphic altogether (by overriding maybeTransition to never transition) actually causes the parent process to crash, rather than (or in addition to?) the web content process, which was pretty novel (out of memory? little bits of JS in the parent? no idea).

I'll work on debugging the above so that we can get clean 'all-specialized' data.

(I'd not be shocked to see memory usage skyrocket. A number of the ICs if not transitioning to megamorophic add an IC for each object or pair or obects, and so can very quickly produce very long IC chains)

It seems that in every case in my dump, an IC in 'generic' state has only a fallback stub (hence no CacheIR).

Oh, duh. Sorry, dumb request on my part. Hopefully better request: can we get a breakdown of how the frequency of each IC kind that goes generic corresponds to the overall frequency of ICs of that kind? (That is to say: which IC kinds are disproportionately likely to go generic?)

Attachment #9112101 - Attachment is obsolete: true

The bug assignee didn't login in Bugzilla in the last months and this bug has priority 'P2'.
:sdetar, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: chris → nobody
Flags: needinfo?(sdetar)

Given that we do now compile ICs to MIR, I think we can close this one as fixed.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(sdetar)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: