Consider pre-inlining common IC paths into baseline code
Categories
(Core :: JavaScript Engine: JIT, task, P2)
People
(Reporter: iain, Unassigned)
References
(Blocks 1 open bug)
Details
Our stub-based approach to ICs gives us huge flexibility. However, it does come at the cost of a lot of indirect calls. Every time we enter an IC chain, we're doing at least one indirect call.
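A simplified model of the status quo (field names are approximate, not the real data structures):

    // Each IC site points at a linked list of stubs. Entering the IC means an
    // indirect call through the first stub's code pointer, and each failing
    // guard falls through to the next stub in the chain.
    struct ICStub {
      uint8_t* stubCode;  // entry point of the compiled stub
      ICStub* next;       // fallback stub on guard failure
    };

    struct ICEntry {
      ICStub* firstStub;  // what the baseline code calls through
    };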
It seems likely that, in many cases, there's a single dominant approach. For example, GuardToObject/GuardShape/Load(Fixed|Dynamic)Slot is probably used for the majority of GetProp ICs. GuardGlobalGeneration/LoadObject/LoadDynamicSlotResult is similarly going to be the majority of GetGName ICs. More speculatively, I wouldn't be surprised if Int32+Int32 was the majority of Add ICs. We could do a rigorous survey to find the distribution.
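As a first cut at that survey, we could walk every attached stub, serialize its CacheIR op sequence into a key, and count how often each exact sequence occurs. A minimal sketch, with the stub-walking abstracted away (none of the names below are existing SpiderMonkey code):

    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical survey pass: the caller collects the decoded op sequence
    // of every attached IC stub; we count occurrences of each sequence.
    using OpSequence = std::vector<std::string>;

    std::map<std::string, uint64_t>
    surveyStubSequences(const std::vector<OpSequence>& attachedStubs) {
      std::map<std::string, uint64_t> counts;
      for (const OpSequence& ops : attachedStubs) {
        std::string key;
        for (const std::string& op : ops) {
          key += op;
          key += '/';  // e.g. "GuardToObject/GuardShape/LoadFixedSlotResult/"
        }
        ++counts[key];
      }
      return counts;
    }

Sorting the result by count would tell us which handful of sequences are worth inlining.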
Once we have that, we could consider speculatively inlining one copy of the dominant IC into the baseline code. For example, in addition to storing an ICStub* for each GetProp, we could have a shape slot and an index slot. When we reach that IC, we would:
- Guard that the shape slot is not null.
- Guard that the receiver is an object (GuardToObject).
- Guard the shape of that object (GuardShape).
- Load the index from the index slot, and use it to load the property (Load(Fixed|Dynamic)Slot).
If any of the guards failed, we would call into the IC chain, the same way we do today.
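In C-like pseudocode, the dispatch would look something like the following model (types and names are illustrative, not the real baseline data structures):

    #include <cstdint>

    struct Shape;  // stand-in for js::Shape

    // Two extra words of speculative state per GetProp IC site.
    struct SpeculativeGetPropIC {
      Shape* expectedShape = nullptr;  // null until we seed a speculation
      uint32_t slotIndex = 0;
    };

    struct ObjectModel {
      Shape* shape;
      uint64_t* slots;  // dynamic slots; fixed slots elided for brevity
    };

    // Returns true on a hit; on any guard failure the caller falls through
    // to the ordinary IC chain, exactly as we do today.
    bool tryInlineGetProp(const SpeculativeGetPropIC& ic,
                          ObjectModel* maybeObj,  // null if not an object
                          uint64_t* result) {
      if (!ic.expectedShape) {                    // speculation seeded?
        return false;
      }
      if (!maybeObj) {                            // GuardToObject
        return false;
      }
      if (maybeObj->shape != ic.expectedShape) {  // GuardShape
        return false;
      }
      *result = maybeObj->slots[ic.slotIndex];    // LoadDynamicSlot
      return true;
    }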
In cases where the IC is monomorphic and matches our guess, we get a fast inline path, and eliminate the indirect call completely. If we're wrong, we have one extra load/branch (for checking that the shape slot is not null). We also spend a couple of words storing the speculative state. If our hit-rate is high enough, this probably pays off.
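To make "high enough" concrete, here is a back-of-envelope break-even calculation with made-up cycle counts (the numbers are illustrative only):

    #include <cstdio>

    int main() {
      const double fastPath = 5.0;  // assumed: inlined guards + slot load
      const double check = 1.0;     // assumed: extra load/branch on a miss
      const double chain = 15.0;    // assumed: indirect call + stub chain
      // Inlining wins when p*fastPath + (1-p)*(check + chain) < chain,
      // i.e. when p > check / (chain - fastPath + check).
      double breakEven = check / (chain - fastPath + check);
      printf("pays off when hit rate exceeds %.1f%%\n", breakEven * 100.0);
      return 0;  // prints ~9.1% under these invented numbers
    }

Under those (invented) numbers the speculation only needs to hit about 9% of the time to break even, so even a modest hit rate should pay off.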
If we only do this for a few ICs, we can probably do it by hand. If we do it for more, we could consider implementing some sort of InlineCacheIRCompiler as a third subclass of CacheIRCompiler. If we do it right, we might be able to support additional ops by simply specifying the CacheIR sequence we want to inline.
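For concreteness, the third subclass might be shaped roughly like this (names and signatures are hypothetical; only BaselineCacheIRCompiler and IonCacheIRCompiler exist today):

    // Stand-ins for the real jit types, to keep the sketch self-contained.
    struct MacroAssembler;
    struct Label;

    class CacheIRCompiler { /* shared emit* helpers live here */ };

    class InlineCacheIRCompiler : public CacheIRCompiler {
     public:
      explicit InlineCacheIRCompiler(MacroAssembler& masm) : masm_(masm) {}

      // Emit a fixed CacheIR sequence directly into the baseline script.
      // Every guard failure jumps to |fallback| (the ordinary IC chain)
      // instead of chaining to a next-stub pointer.
      bool emitInline(Label* fallback);

     private:
      MacroAssembler& masm_;
    };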
This would let us get maximum performance in the most frequent cases, while still preserving our ability to flexibly optimize less common cases.