Closed Bug 1734152 Opened 1 year ago Closed 1 year ago

Optimize calls to tryAllocateRegister a bit in the register allocator


(Core :: JavaScript Engine: JIT, task, P3)




95 Branch
Tracking Status
firefox95 --- fixed


(Reporter: jandem, Assigned: jandem)


(Blocks 1 open bug)


(Keywords: perf-alert)


(3 files)

The register allocator has a few places where it loops from 0 to AnyRegister::Total to search for an available register. Because this includes floating point register aliases, there can be quite a lot of registers (128 on arm64).

tryAllocateRegister doesn't do much work if the type doesn't match, but we can optimize this pretty easily, for example by considering only general or float registers depending on the bundle's type. This eliminates a lot of unnecessary calls to tryAllocateRegister.

+1 on removing some of the constant factors in regalloc!

Those loops have been bugging me for a while and I guess first SIMD and then ARM64 really changed the assumptions for them.

Just fully unfolding the search a la Bentley's binary search should be able to find a register in log(register set size) operations and should not require dispatching on type first. Something like this:

uint64_t b = <low 64 bits of register set>
int code = 0
if (b == 0) { b = <high 64 bits of register set>; code += 64; }
if ((b & 0xFFFFFFFF) == 0) { b >>= 32; code += 32; }
if ((b & 0xFFFF) == 0) { b >>= 16; code += 16 }
if ((b & 0xFF) == 0) { b >>= 8; code += 8 }
if ((b & 0xF) == 0) { b >>= 4; code += 4 }
if ((b & 0x3) == 0) { b >>= 2; code += 2 }
if ((b & 1) == 0) { b >>= 1; code += 1 }
if (b) return code
return <notfound>

Post-coffee, /me sees that tryAllocateRegister is more involved than that...

For each virtual register id, the register allocator has a VirtualRegister instance.

LiveRange used to store the virtual register id, but it's more efficient for it to
point to the VirtualRegister because this lets us get rid of the array lookup to go
from id to VirtualRegister.

This doesn't increase the size of LiveRange, because conveniently it had four padding
bytes on 64-bit platforms.

tryAllocateRegister is one of the hottest functions in the register allocator.
We had a few places where we call this for every register until we find one.
Modern architectures have a lot of registers (ARM64 has 32 general + 96 float
because of all float aliases) so this could result in a lot of calls.

Optimize this to only try the general registers if we need a general register,
and only try the float registers if we need a float register. This eliminates
at least 65% of the calls to tryAllocateRegister.

This could probably be optimized more in the future by similarly separating the
Float32/Double/SIMD cases better.

Depends on D127653

Pushed by
part 1 - Store VirtualRegister* instead of vreg id in LiveRange. r=iain
part 2 - Optimize calls to tryAllocateRegister a bit. r=iain
part 3 - Add tryAllocateAnyRegister to deduplicate some code. r=iain
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 95 Branch

I noticed this alert that has the changes in this bug in the commit log:

== Change summary for alert #31768 (as of Fri, 08 Oct 2021 18:19:34 GMT) ==


Ratio Test Platform Options Absolute values (old vs new)
3% wasm-godot-optimizing macosx1015-64-shippable-qr webrender 1,250.47 -> 1,218.50

For up to date results, see:


You need to log in before you can comment on or make changes to this bug.