Optimize calls to tryAllocateRegister a bit in the register allocator
Categories
(Core :: JavaScript Engine: JIT, task, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox95 | --- | fixed |
People
(Reporter: jandem, Assigned: jandem)
References
(Blocks 1 open bug)
Details
(Keywords: perf-alert)
Attachments
(3 files)
The register allocator has a few places where it loops from 0 to AnyRegister::Total
to search for an available register. Because this includes floating point register aliases, there can be quite a lot of registers (128 on arm64).
tryAllocateRegister
doesn't do much work if the type doesn't match, but we can optimize this pretty easily, for example by considering only general or float registers depending on the bundle's type. This eliminates a lot of unnecessary calls to tryAllocateRegister
.
Comment 1•3 years ago
|
||
+1 on removing some of the constant factors in regalloc!
Those loops have been bugging me for a while and I guess first SIMD and then ARM64 really changed the assumptions for them.
Just fully unfolding the search a la Bentley's binary search should be able to find a register in log(register set size) operations and should not require dispatching on type first. Something like this:
uint64_t b = <low 64 bits of register set>
int code = 0
if (b == 0) { b = <high 64 bits of register set>; code += 64; }
if ((b & 0xFFFFFFFF) == 0) { b >>= 32; code += 32; }
if ((b & 0xFFFF) == 0) { b >>= 16; code += 16 }
if ((b & 0xFF) == 0) { b >>= 8; code += 8 }
if ((b & 0xF) == 0) { b >>= 4; code += 4 }
if ((b & 0x3) == 0) { b >>= 2; code += 2 }
if ((b & 1) == 0) { b >>= 1; code += 1 }
if (b) return code
return <notfound>
Comment 2•3 years ago
|
||
Post-coffee, /me sees that tryAllocateRegister is more involved than that...
Assignee | ||
Comment 3•3 years ago
|
||
For each virtual register id, the register allocator has a VirtualRegister instance.
LiveRange used to store the virtual register id, but it's more efficient for it to
point to the VirtualRegister because this lets us get rid of the array lookup to go
from id to VirtualRegister.
This doesn't increase the size of LiveRange, because conveniently it had four padding
bytes on 64-bit platforms.
Assignee | ||
Comment 4•3 years ago
|
||
tryAllocateRegister is one of the hottest functions in the register allocator.
We had a few places where we call this for every register until we find one.
Modern architectures have a lot of registers (ARM64 has 32 general + 96 float
because of all float aliases) so this could result in a lot of calls.
Optimize this to only try the general registers if we need a general register,
and only try the float registers if we need a float register. This eliminates
at least 65% of the calls to tryAllocateRegister.
This could probably be optimized more in the future by similarly separating the
Float32/Double/SIMD cases better.
Depends on D127653
Assignee | ||
Comment 5•3 years ago
|
||
Pushed by jdemooij@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c17a19015337 part 1 - Store VirtualRegister* instead of vreg id in LiveRange. r=iain https://hg.mozilla.org/integration/autoland/rev/79928c822bcc part 2 - Optimize calls to tryAllocateRegister a bit. r=iain https://hg.mozilla.org/integration/autoland/rev/c41e6aa3e428 part 3 - Add tryAllocateAnyRegister to deduplicate some code. r=iain
Comment 7•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c17a19015337
https://hg.mozilla.org/mozilla-central/rev/79928c822bcc
https://hg.mozilla.org/mozilla-central/rev/c41e6aa3e428
Assignee | ||
Comment 8•3 years ago
|
||
I noticed this alert that has the changes in this bug in the commit log:
== Change summary for alert #31768 (as of Fri, 08 Oct 2021 18:19:34 GMT) ==
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
3% | wasm-godot-optimizing | macosx1015-64-shippable-qr | webrender | 1,250.47 -> 1,218.50 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31768
Comment 9•3 years ago
|
||
Nice!
Updated•3 years ago
|
Description
•