regalloc: Unnecessary stores to argument slots
Categories
(Core :: JavaScript Engine: JIT, enhancement, P3)
Tracking
()
People
(Reporter: iain, Unassigned)
References
(Blocks 3 open bugs)
Details
(Whiteboard: [sp3])
Attachments
(2 files)
While staring at assembly in bug 1911160, I noticed that some of the MoveGroups were doing unnecessary work.
Here's a small testcase that replicates the same behaviour, moderately trimmed down from the original code:
let C = {};
let c = new Object(C);
C.costMatrix = [0];
C.costBetween = function(from) {
if (from >= this.costMatrix.length)
return 0;
return this.costMatrix[from];
}
with ({}) {}
for (var i = 0; i < 2000; i++) {
C.costBetween(0)
}
The only Ion compilation when running this testcase is for costBetween. It has three blocks: the entry block and the two returns. In the entry block, it loads two parameters (this and from). At the beginning of the second return block, we generate a MoveGroup with two moves, one per parameter, moving them back into the argument slot. We eventually generate a store for these. I believe these moves are being inserted because of this code, which "adds moves to resolve graph edges with different allocations at their source and target".
There's no reason for us to be doing these stores: the value in the argument slot will never be read again, and in any case we're overwriting the slot with the same value we just read out of it a block earlier.
To make it worse, we can apparently do this multiple times. If if (from >= this.costMatrix.length) is replaced with if (from >= this.costMatrix.length || from == 1 || from == 2|| from == 3 ...), then we generate an extra block for each of the from == N cases, and we generate the same MoveGroup at the beginning of each block. If we use any extra arguments in the first block, then we generate an extra store per MoveGroup.
I don't think any of these stores should be individually very expensive, since they're writing to the arguments slots (which are on the stack and should be in L1) and there are no dependencies waiting for the store to complete, but it still seems potentially worthwhile to clean them up.
Julian, you're currently our resident regalloc expert. On a scale from trivial to impossible, do you have any sense of how hard this might be to fix?
Comment 1•1 year ago
|
||
(In reply to Iain Ireland [:iain] from comment #0)
On a scale from trivial to impossible, do you have any sense of how hard this might be to fix?
To know what is really going on, we'd need to dump the LIR coming into the RA
and also the RA's splitting and allocation decisions (IONFLAGS=regalloc).
Generally speaking the RA is good at choosing registers to allocate live ranges
to, but if none are available, it may choose to split ranges and often makes
poor choices in doing so, even for trivial inputs. See bug 1752520.
In bug 1758274 I redid the splitting logic from scratch, which did improve a
lot of the bad cases. But that's a huge changeset we don't want to land, and
I've never been able to figure out how to land bits of it incrementally. Yulia
recently made some patches that improve spill cost estimation, particularly in
loops (bug 1896581), but I don't think that would have any effect in the test
case here since it has no loops.
Back in Feb I rebased the big-splitting-rework patch so we can compare against
it in cases like these. It might be worth trying applying it (bug 1758274
comment 18, then bug 1758274 comment 19) and seeing if you get anything better.
| Reporter | ||
Comment 2•1 year ago
|
||
| Reporter | ||
Comment 3•1 year ago
|
||
I've attached the regalloc output. I'm cautiously optimistic that this is something specific to how we handle argument slots, and won't require a complete overhaul of range splitting.
My reasoning is that AFAICT we can't actually modify the parameter (at least in JS); assigning to that variable will be reflected in the SSA graph, but the actual value of parameter 0 is immutable. So the register value that we are moving back into the argument slot is necessarily the same value that we loaded out of that slot earlier, and there's no reason to actually perform the store.
| Reporter | ||
Comment 4•1 year ago
|
||
Updated•1 year ago
|
Updated•1 year ago
|
Comment 5•1 year ago
|
||
Bug 1920951 part 5 gets rid of these moves.
Comment 6•1 year ago
•
|
||
For the version with the || from == 1 || from == 2|| from == 3 change that Iain suggested, we get this:
Before:
[Codegen] movq %rdi, 0x28(%rbp)
[Codegen] movq %rdx, 0x20(%rbp)
[Codegen] cmpl $0x1, %esi
[Codegen] je .Lfrom184
[Codegen] movq %rdi, 0x28(%rbp)
[Codegen] movq %rdx, 0x20(%rbp)
[Codegen] cmpl $0x2, %esi
[Codegen] je .Lfrom201
[Codegen] movq %rdi, 0x28(%rbp)
[Codegen] movq %rdx, 0x20(%rbp)
[Codegen] cmpl $0x3, %esi
[Codegen] jne .Lfrom218
[Codegen] movabsq $0xfff8800000000000, %rcx
[Codegen] jmp .Lfrom233
[Codegen] movq %rdi, 0x28(%rbp)
[Codegen] movq %rdx, 0x20(%rbp)
[Codegen] # LIR=InitializedLength
...
After:
[Codegen] cmpl $0x1, %esi
[Codegen] je .Lfrom176
[Codegen] cmpl $0x2, %esi
[Codegen] je .Lfrom185
[Codegen] cmpl $0x3, %esi
[Codegen] jne .Lfrom194
[Codegen] movabsq $0xfff8800000000000, %rcx
[Codegen] jmp .Lfrom209
[Codegen] # LIR=InitializedLength
Comment 7•1 year ago
|
||
Closing per previous two comments.
Description
•