Cranelift: Split live ranges across function calls
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
()
People
(Reporter: lth, Assigned: lth)
References
Details
The Cranelift register allocator currently has a view of live values that they can either be assigned to a stack slot or to a register for their entire live range. At the same time, every value that is live across a function call must be spilled and is thus assigned to a stack slot. This leads to pessimal code in many situations.
The expedient fix for this is simple live range splitting around calls: a value live across a call is copied into a temporary before the call and out of the temporary after the call, and the temporary is assigned to a stack slot. This effectively splits the live range of the value and allows it to be kept in a register both before and after the call.
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 1•6 years ago
|
||
Work is happening here: https://github.com/lars-t-hansen/cranelift/tree/call_splitting
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 2•6 years ago
|
||
I've implemented this and have some preliminary reports.
Simply splitting live ranges across calls by inserting a copy as descirbed above works very well for some simple benchmark programs such as fib and wasmdot3 -- cranelift code beats ion code handily after this.
On larger programs, slowdowns are generally observed because of the increased stack traffic. To handle this, I implemented a basket of optimizations:
- don't spill constants but regenerate them after the call
- don't spill values that have been spilled in a dominator; reuse the spilled value
- don't fill values and regenerate constants that aren't referenced
These optimizations improve the situation but by and large, on our benchmark suite, code with splitting across calls is still slower than the original allocator. The reasons appear to be at least three: there are often many more values to save/restore than there are registers, so both save and restore end up copying from memory to memory for a number of values; values that have been hoisted out of loops and are live after will tend to be re-introduced in the loop if there's a call in the loop, but should be saved/restored across the loop instead; and the algorithm that inserts copies ends up creating redundant phi parameters in some cases, which will tend to obscure what the values are and reduce the efficacy of the optimizations, and to introduce more values to be saved.
We might consider additionally pruning the SSA for so as to remove the redundant names. As for saving/restoring too many values and re-introducing values in a loop, fixing this gets us into general live range splitting (bug 1552740).
Live range splitting across calls has a large negative impact on compiler speed, though probably in part due to using some naive data structures and algorithms.
In addition to that, I've experimented with a simple rematerialization pass where constants are sunk down to just before their uses; this improves some benchmark programs mildly (a couple of percentage points) and appears to have no detrimental effects apart from adding a pass to the compiler. The main effect appears to come from sinking integer constants and floating point zeroes; other floating point constants are apparently less important.
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•5 years ago
|
Description
•