Open Bug 1689328 Opened 3 years ago Updated 2 years ago

IonMonkey: BacktrackingAllocator: use loop depths in computation of spill costs (weights)

Categories

(Core :: JavaScript Engine: JIT, enhancement, P2)

enhancement

Tracking

()

People

(Reporter: jseward, Assigned: jseward)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files)

The code produced for wasm-via-Ionmonkey on both x64 and arm64 shows cases where Ion generates unnecessary spilling in non-innermost loops, in particularly for the "embenchen" test rust-fannkuch. On digging around Ion's BacktrackingAllocator.cpp it transpires that the allocator only distinguishes between blocks in innermost loops ("hot blocks") and all others ("cold blocks"), when making spilling/splitting decisions.

This heuristic does appear to be effective at minimising (often completely avoiding) spilling in innermost loops. But it seems inherently suboptimal for any function that has loops nested more than one deep. Consider a function whose body is (in pseudo-C)

  A ; B ; loop { C ; D ; loop { E } }

Here, Ion will identify E as hot, and A, B, C and D as cold. Hence it will be unable to make use of the fact that C and D are likely much hotter than A and B.

In particular, this problem would be most extreme in the case where E is simple (small) and the innermost loop loop { E } typically has a low trip count, whilst C and D are relatively complex (large). Indeed, this is I suspect what happened with rust-fannkuch.

I believe spill-cost computations and hence spilling decisions would be more reasonable if they took loop depths into account. I attach a proof-of-concept patch, which is configured to use the best set of costs I have so far found, for wasm. The patch, while it works, is not yet suitable for production use, and needs reworking. Caveats are:

  • It slows the compiler down tremendously, especially for large functions. This is because of a naive implementation of LoopDepthScaling(). That scans the entire graph, looking for the instruction in question. A production implementation would build some form of instruction-number-to-loop-depth mapping at the start and query that as needed.

  • Per logic in LoopDepthScaling(), the up-scaling of spill cost values relative to the unmodified version can be up to 10^5 (100k). Hence in BacktrackingAllocator::computeSpillWeight() it is infeasible to use the values 1 or 2 million to denote "infinity", since real non-minimal bundles may have a higher cost. Those "infinity" values will need to be pushed upwards -- a lot -- and may require to be represented in 64 bits, not 32.

  • As it currently stands, BacktrackingAllocator::trySplitAcrossHotcode() uses different splitting strategies for wasm and JS. I experimented with using the JS scheme even for wasm, but that gives much worse results. I have not made any assessment of the effects of using the wasm scheme for JS. This is somewhat controlled by config option CONF_MODIFIED_SPLIT at the top of the patch.

Results, for wasm on x64, are shown in the next comment. For the most part they are neutral. However for rust-fannkuch 38% of stores and 24% of loads are removed. The bz2 test also shows around a 4% reduction in memory traffic. There are some smaller losses -- wasm_zlib shows a 2.3% increase in the number of stores. Overall it looks like a win though. Initial measurements indicate the results also carry over to arm64.

As the patch stands, loop-depths are considered for both wasm and JS compilation. I have made no assessment using JS benchmarks and hence it would be interesting (necessary) to find out:

  • whether using loop-depths for JS is a win. If not, it will entail additional complexity in the patch to keep the JS heuristics (behaviour) unchanged.

  • whether using the wasm splitting strategy for JS is a win, in which case we could cut out several tens of lines of splitting code.

Attached patch WIP patchSplinter Review
Assignee: nobody → jseward

For what is worth IonBuilder used to annotate blocks with the number of times they got encountered before compiling, I do not think this is yet the case in WarpBuilder, but I presume that this would be simple to add, and use that as a way to hint the Backtracking Allocator for JS.

Severity: -- → N/A
Priority: -- → P2
Blocks: sm-regalloc
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: