Closed Bug 1912154 Opened 1 year ago Closed 1 year ago

regalloc: Unnecessary stores to argument slots

Categories

(Core :: JavaScript Engine: JIT, enhancement, P3)

enhancement

Tracking

()

RESOLVED DUPLICATE of bug 1920951

People

(Reporter: iain, Unassigned)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [sp3])

Attachments

(2 files)

While staring at assembly in bug 1911160, I noticed that some of the MoveGroups were doing unnecessary work.

Here's a small testcase that replicates the same behaviour, moderately trimmed down from the original code:

let C = {};
let c = new Object(C);
C.costMatrix = [0];

C.costBetween = function(from) {
  if (from >= this.costMatrix.length)
    return 0;
  return this.costMatrix[from];
}

with ({}) {}
for (var i = 0; i < 2000; i++) {
   C.costBetween(0)
}

The only Ion compilation when running this testcase is for costBetween. It has three blocks: the entry block and the two returns. In the entry block, it loads two parameters (this and from). At the beginning of the second return block, we generate a MoveGroup with two moves, one per parameter, moving them back into the argument slot. We eventually generate a store for these. I believe these moves are being inserted because of this code, which "adds moves to resolve graph edges with different allocations at their source and target".

There's no reason for us to be doing these stores: the value in the argument slot will never be read again, and in any case we're overwriting the slot with the same value we just read out of it a block earlier.

To make it worse, we can apparently do this multiple times. If if (from >= this.costMatrix.length) is replaced with if (from >= this.costMatrix.length || from == 1 || from == 2|| from == 3 ...), then we generate an extra block for each of the from == N cases, and we generate the same MoveGroup at the beginning of each block. If we use any extra arguments in the first block, then we generate an extra store per MoveGroup.

I don't think any of these stores should be individually very expensive, since they're writing to the arguments slots (which are on the stack and should be in L1) and there are no dependencies waiting for the store to complete, but it still seems potentially worthwhile to clean them up.

Julian, you're currently our resident regalloc expert. On a scale from trivial to impossible, do you have any sense of how hard this might be to fix?

Flags: needinfo?(jseward)

(In reply to Iain Ireland [:iain] from comment #0)

On a scale from trivial to impossible, do you have any sense of how hard this might be to fix?

To know what is really going on, we'd need to dump the LIR coming into the RA
and also the RA's splitting and allocation decisions (IONFLAGS=regalloc).

Generally speaking the RA is good at choosing registers to allocate live ranges
to, but if none are available, it may choose to split ranges and often makes
poor choices in doing so, even for trivial inputs. See bug 1752520.

In bug 1758274 I redid the splitting logic from scratch, which did improve a
lot of the bad cases. But that's a huge changeset we don't want to land, and
I've never been able to figure out how to land bits of it incrementally. Yulia
recently made some patches that improve spill cost estimation, particularly in
loops (bug 1896581), but I don't think that would have any effect in the test
case here since it has no loops.

Back in Feb I rebased the big-splitting-rework patch so we can compare against
it in cases like these. It might be worth trying applying it (bug 1758274
comment 18, then bug 1758274 comment 19) and seeing if you get anything better.

Flags: needinfo?(jseward)
Attached file regalloc.txt

I've attached the regalloc output. I'm cautiously optimistic that this is something specific to how we handle argument slots, and won't require a complete overhaul of range splitting.

My reasoning is that AFAICT we can't actually modify the parameter (at least in JS); assigning to that variable will be reflected in the SSA graph, but the actual value of parameter 0 is immutable. So the register value that we are moving back into the argument slot is necessarily the same value that we loaded out of that slot earlier, and there's no reason to actually perform the store.

I took a quick peek at the regalloc with the patches from bug 1758274. It does appear to be an improvement in this case. Specifically: we have a bundle that looks like this: ``` LB0(parent=none v1 3-27 { 3_def:F:arg:0 9_v1:KA 11_v1:R 13_v1:KA ... } ## v1 32-41 { 37_v1:KA 41_v1:KA }) ``` This represents a parameter to the function that is passed in a stack slot, used once in a register, and otherwise only used by resume points if we bail out. WIth the current splitting algorithm, we split this into two bundles: ``` LB17(parent=LB15 v1 10-27 { 11_v1:R }) LB15(parent=none v1 3-27 { 3_def:F:arg:0 9_v1:KA 13_v1:KA ... } ## v1 32-41 { 37_v1:KA 41_v1:KA }) ``` With the big splitting rework, we instead split like this: ``` LB15(parent=none) v1 3-9 { 3_def:F:arg:0 9_v1:KA } LB16(parent=none) v1 10-27 { 11_v1:R 13_v1:KA ... } ## v1 32-41 { 37_v1:KA 41_v1:KA } ``` Instead of one bundle nested inside another, we have a sequence of two non-overlapping bundles. Since the uses in the second block are part of the same bundle as the register use, they are automatically assigned the same allocation and we don't have to create a MoveGroup. It's not obvious to me whether this actually solves the issue I'd identified (redundant moves to allocations that already contain the required value), or just dodges it in this case. I'm attaching the full regalloc jitspew.
Whiteboard: [sp3]

Bug 1920951 part 5 gets rid of these moves.

For the version with the || from == 1 || from == 2|| from == 3 change that Iain suggested, we get this:

Before:

[Codegen] movq       %rdi, 0x28(%rbp)
[Codegen] movq       %rdx, 0x20(%rbp)
[Codegen] cmpl       $0x1, %esi
[Codegen] je         .Lfrom184
[Codegen] movq       %rdi, 0x28(%rbp)
[Codegen] movq       %rdx, 0x20(%rbp)
[Codegen] cmpl       $0x2, %esi
[Codegen] je         .Lfrom201
[Codegen] movq       %rdi, 0x28(%rbp)
[Codegen] movq       %rdx, 0x20(%rbp)
[Codegen] cmpl       $0x3, %esi
[Codegen] jne        .Lfrom218
[Codegen] movabsq    $0xfff8800000000000, %rcx
[Codegen] jmp        .Lfrom233
[Codegen] movq       %rdi, 0x28(%rbp)
[Codegen] movq       %rdx, 0x20(%rbp)
[Codegen]                                 # LIR=InitializedLength
...

After:

[Codegen] cmpl       $0x1, %esi
[Codegen] je         .Lfrom176
[Codegen] cmpl       $0x2, %esi
[Codegen] je         .Lfrom185
[Codegen] cmpl       $0x3, %esi
[Codegen] jne        .Lfrom194
[Codegen] movabsq    $0xfff8800000000000, %rcx
[Codegen] jmp        .Lfrom209
[Codegen]                                 # LIR=InitializedLength
Depends on: 1920951

Closing per previous two comments.

Status: NEW → RESOLVED
Closed: 1 year ago
Duplicate of bug: 1920951
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: