Closed
Bug 1494473
Opened 7 years ago
Closed 7 years ago
Collect statistics about CacheIR IC execution
Categories
(Core :: JavaScript Engine: JIT, enhancement, P3)
Core
JavaScript Engine: JIT
Tracking
()
RESOLVED
FIXED
mozilla65
People
(Reporter: mgaudet, Assigned: mgaudet)
References
(Blocks 2 open bugs)
Details
Attachments
(2 files, 2 obsolete files)
It may be valuable for us to gather detailed execution statistics about CacheIR stubs, to further our ability to make decisions.
For one could imagine providing execution weighted data in the BaselineInspector.
Assignee | ||
Comment 1•7 years ago
|
||
By adding a counter increment to the entry of an IC stub, as well as on
a successful-value-return Path, we are able to compute a hit-rate for
a particular stub (as implicitly the failure case is |entered - success|).
This information can be queried in the baseline inspector to provide
more information to Ion should it be desired.
Note: this version uses 64 bit counters to try to avoid overflow situations.
Attachment #9012378 -
Flags: feedback?(kvijayan)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → mgaudet
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•7 years ago
|
||
This allows us to extract information from the IC chains like this:
/Users/mgaudet/mozilla-unified-clean/js/src/octane/box2d.js:174:250 (sub)
CacheIR_Regular 1009 236 773
CacheIR_Regular 772 772 0
Which shows there are two CacheIR stubs attached for a JSOP_SUB bytecode, and
out of the 1009 usages of the IC Chain, 236 succeeded on the first IC, and 772
succeeded on the second IC.
Attachment #9012379 -
Flags: feedback?(kvijayan)
Assignee | ||
Comment 3•7 years ago
|
||
Comment on attachment 9012378 [details] [diff] [review]
Add success and entered counters to Baseline's CacheIR_Regular stubs
Before landing this, this patch would of course need to be try-built, and benchmarked appropriately.
This patch currently only handles Baseline CacheIR ICs. A similar approach can be taken in Ion, however, it will require some more complexity (patched addresses I believe) due to the lack of ICStubReg in the Ion case.
Also note: Currently uses 64 bit counters to try to avoid caring about overflow, but one could imagine shortening the counters if we're OK with 32 bits of accuracy.
Attachment #9012378 -
Flags: feedback?(jdemooij)
Assignee | ||
Comment 4•7 years ago
|
||
(one more comment about Patch 2: discardStubs/discardOptimizedStubs will hide data from that dump)
![]() |
||
Comment 5•7 years ago
|
||
Comment on attachment 9012378 [details] [diff] [review]
Add success and entered counters to Baseline's CacheIR_Regular stubs
Neat! Those are pretty big differences, and I can see these sorts of distribution unevenness showing up basically everywhere, and in even more exaggerated ways (e.g. a 95%/5% split on stubs). Ion utilizing this information to selectively inline certain paths and include bailout / offramp paths for others would be valuable to keep in mind.
Additionally, this approach can be extended to other areas of the engine as well:
1. Value-specialization for Ion to inject speculative path splitting based on predicted high-frequency constants.
2. Basic block outgoing-edge counts, to feed into PGO style compile decisions, including better selection of where to put bailouts / offramps into lower tiers.
Attachment #9012378 -
Flags: feedback?(kvijayan) → feedback+
![]() |
||
Comment 6•7 years ago
|
||
Comment on attachment 9012378 [details] [diff] [review]
Add success and entered counters to Baseline's CacheIR_Regular stubs
Review of attachment 9012378 [details] [diff] [review]:
-----------------------------------------------------------------
Nice.
::: js/src/jit/BaselineCacheIRCompiler.cpp
@@ +79,5 @@
> +static void
> +EmitUpdateSuccessCounter(MacroAssembler& masm)
> +{
> + Address success(ICStubReg, ICCacheIR_Regular::offsetOfSuccess());
> + masm.addPtr(Imm32(1), success);
Just for noting: if we end up going the route of using uint32_ts with periodic scaling, then the addPtr() can be followed by an `jif_overflow` to a the scaling code.
@@ +190,5 @@
> masm.setSecondScratchReg(BaselineSecondScratchReg);
> #endif
>
> + if (kind_ == BaselineCacheIRStubKind::Regular) {
> + EmitUpdateEnteredCounter(masm);
As noted before, if we only keep a success count on the stubs, and avoid the "entered" counter.. leaving that to be a sum of the hits across all stubs including fallback stub, we can eliminate this update from the optimized cases.
::: js/src/jit/BaselineIC.h
@@ +823,5 @@
> + // value.
> + uint64_t entered_;
> +
> + // Counts the nmber of times the stub has successfully provided a value.
> + uint64_t success_;
We don't need two fields here. We can keep a count on the fallback stub.
Additionally, this can be a uint32_t. The way to handle this is to periodically scale things
So if a particular IC's total hits become UINT32_MAX count (very rare), then we actually jump to some trampoline code into C++ that walks the IC chain, and scales all the values down by some constant factor (e.g. 2 or 4).
This loses some precision, in that very-low-frequency stubs will scale down to 0.. and we can adjust for that to ensure that a nonzero values never scales down to anything lower than 1.
This has the small imprecision of maybe over-representing small-frequency stubs in extreme cases.. but in practice it won't be a big issue.
![]() |
||
Comment 7•7 years ago
|
||
Comment on attachment 9012379 [details] [diff] [review]
Allow dumping information on script cleanup using DUMP_IC_INFO environment variable
Presumably this would be turned into Spew output.
Attachment #9012379 -
Flags: feedback?(kvijayan) → feedback+
Comment 8•7 years ago
|
||
Comment on attachment 9012378 [details] [diff] [review]
Add success and entered counters to Baseline's CacheIR_Regular stubs
Review of attachment 9012378 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/BaselineCacheIRCompiler.cpp
@@ +76,5 @@
> }
> };
>
> +static void
> +EmitUpdateSuccessCounter(MacroAssembler& masm)
If there's one caller maybe just inline there.
I agree with Kannan that one counter is probably sufficient. I sort of like the success one because "did this optimization succeed?" is the most useful bit of information I expect and answering that won't require more analysis.
I'd consider an uint32_t that we increment without worrying about overflow. Even if we hit this stub 1 million times a second, then it will still take more than an hour to overflow, and in practice GC will probably have purged the stub long before that. For profiling/heuristics I think that's still plenty useful and not worth the extra hit/overflow/complexity of handling overflow. If that makes us uncomfortable I'd say uint64_t for simplicity/perf.
Attachment #9012378 -
Flags: feedback?(jdemooij) → feedback+
Assignee | ||
Comment 9•7 years ago
|
||
I've been working on productizing these patches. The current barrier I'm trying to overcome is that it appears there is no guarantee that by the time emitReturnFromIC() runs ICStubReg is still actually pointed to the beginning of the ICStub.
I'm still trying to find what code right now is most responsible, but it can be seen by injecting a masm.printf and looking at the value of ICStubReg just before we update the 'counter'
ICStubReg: 6210003d7718
ICStubReg: 6210003af310
ICStubReg: 1fffdda13814
ICStubReg: 6210003af700
ICStubReg: 6210003af310
One of these things is not like the other.
My first analytic gut instinct was BaseineCompiler::callVM, but it -appears- that should be saving and restoring ICStubReg.
Assignee | ||
Updated•7 years ago
|
Priority: -- → P3
QA Contact: sdetar
Updated•7 years ago
|
QA Contact: sdetar
Assignee | ||
Comment 10•7 years ago
|
||
By adding a counter increment to the entry of an IC stub, it is possible to analyze
the IC chain for patterns that indicate how the ICs are performing, which can
influence the baseline inspector, and allow Ion to weight the choices it makes on IC
data.
Assignee | ||
Comment 11•7 years ago
|
||
The output for a particular IC chain looks like this:
box2d.js:174:250 (sub) 1009 -> 772 -> (fb) 2
Which is read like this: This sub opcode has two ICs attached. The first IC was
entered 1009 times, the second 772 and the fallback stub was hit twice.
There are some conclusions we can draw from this (and some we cannot)
- We can say with confidence the fallback stub was only hit twice, meaning 99.998%
of queries to the IC chain hit in the IC rather than the fallback.
- We cannot however say necessarily that the first IC failed to provide a value
771 times.
Since new ICs are added at the front, it is possible that there was a phase
transition that happened, and the first 771 requests were served by the first
IC, followed by a transition to a new type and addition of a new IC, and
subsequently 1009 queries were served by the newly added IC.
- We can conclude with confidence that there have been at least two different ICs
involved in computation across the lifetime of the IC chain, and they are almost
equally probable.
Though we can't do temporal reasoning with entry counters, we can still do some
reasoning.
Depends on D8859
Assignee | ||
Updated•7 years ago
|
Attachment #9012378 -
Attachment is obsolete: true
Assignee | ||
Updated•7 years ago
|
Attachment #9012379 -
Attachment is obsolete: true
Assignee | ||
Comment 12•7 years ago
|
||
I *think* as a side effect of adding counters to fallback stubs, it's possible to infer branch taken probabilities and code execution frequencies:
At a given branch, walk the bytecode forward on both the target and fallthrough looking for ops with ICEntries; When you encounter them check the IC chains:
1. If there are zero hits to the fallback, the code has never been executed, so you can infer the branch has only been taken the one way.
2. If there is an IC chain, you can sum the number of entries across the chain, and compare frequencies to give a rough branch biasing. If we wanted this to be more robust, on IC unlinking, we could roll the sum into a special field added to ICFallbackStub to help track counts more accurately.
A side note: it may make sense from a reasoning perspective to reset the IC counters on the chain to 1 on the addition of a new stub; this would mean that at any time you look at an IC chain you will be able to reason about the success rate of each stub by looking at it's entry count divided by the entry count of its successors; resetting restores some semblance of the temporal reasoning discussed in Comment 11, as you'd be able to clearly answer the phase transition question.
Comment 13•7 years ago
|
||
Pushed by mgaudet@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2ac1bca7094c
Add entered counters to Baseline's CacheIR_Regular and Fallback stubs r=jandem
https://hg.mozilla.org/integration/autoland/rev/4bdddb2c5100
Using IONFLAGS=bl-ic-stats dump baseline IC statistics on script cleanup r=jandem
![]() |
||
Comment 14•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/2ac1bca7094c
https://hg.mozilla.org/mozilla-central/rev/4bdddb2c5100
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox65:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Updated•7 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•