Closed Bug 1734152 Opened 4 years ago Closed 4 years ago

Optimize calls to tryAllocateRegister a bit in the register allocator

Tracking

()

Status:

RESOLVED FIXED

Milestone:

95 Branch

Tracking Flags:

Tracking

Status

firefox95

---

fixed

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug)

Details

(Keywords: perf-alert)

Attachments

(3 files)

Bug 1734152 part 1 - Store VirtualRegister* instead of vreg id in LiveRange. r?iain! 4 years ago Jan de Mooij [:jandem] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1734152 part 2 - Optimize calls to tryAllocateRegister a bit. r?iain! 4 years ago Jan de Mooij [:jandem] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1734152 part 3 - Add tryAllocateAnyRegister to deduplicate some code. r?iain! 4 years ago Jan de Mooij [:jandem] 48 bytes, text/x-phabricator-request		Details \| Review

Jan de Mooij [:jandem]

Assignee

Description

•

4 years ago

The register allocator has a few places where it loops from 0 to AnyRegister::Total to search for an available register. Because this includes floating point register aliases, there can be quite a lot of registers (128 on arm64).

tryAllocateRegister doesn't do much work if the type doesn't match, but we can optimize this pretty easily, for example by considering only general or float registers depending on the bundle's type. This eliminates a lot of unnecessary calls to tryAllocateRegister.

Lars T Hansen [:lth]

Comment 1

•

4 years ago

+1 on removing some of the constant factors in regalloc!

Those loops have been bugging me for a while and I guess first SIMD and then ARM64 really changed the assumptions for them.

Just fully unfolding the search a la Bentley's binary search should be able to find a register in log(register set size) operations and should not require dispatching on type first. Something like this:

uint64_t b = <low 64 bits of register set>
int code = 0
if (b == 0) { b = <high 64 bits of register set>; code += 64; }
if ((b & 0xFFFFFFFF) == 0) { b >>= 32; code += 32; }
if ((b & 0xFFFF) == 0) { b >>= 16; code += 16 }
if ((b & 0xFF) == 0) { b >>= 8; code += 8 }
if ((b & 0xF) == 0) { b >>= 4; code += 4 }
if ((b & 0x3) == 0) { b >>= 2; code += 2 }
if ((b & 1) == 0) { b >>= 1; code += 1 }
if (b) return code
return <notfound>

Lars T Hansen [:lth]

Comment 2

•

4 years ago

Post-coffee, /me sees that tryAllocateRegister is more involved than that...

Jan de Mooij [:jandem]

Assignee

Comment 3

•

4 years ago

Attached file Bug 1734152 part 1 - Store VirtualRegister* instead of vreg id in LiveRange. r?iain! — Details

For each virtual register id, the register allocator has a VirtualRegister instance.

LiveRange used to store the virtual register id, but it's more efficient for it to
point to the VirtualRegister because this lets us get rid of the array lookup to go
from id to VirtualRegister.

This doesn't increase the size of LiveRange, because conveniently it had four padding
bytes on 64-bit platforms.

Jan de Mooij [:jandem]

Assignee

Comment 4

•

4 years ago

Attached file Bug 1734152 part 2 - Optimize calls to tryAllocateRegister a bit. r?iain! — Details

tryAllocateRegister is one of the hottest functions in the register allocator.
We had a few places where we call this for every register until we find one.
Modern architectures have a lot of registers (ARM64 has 32 general + 96 float
because of all float aliases) so this could result in a lot of calls.

Optimize this to only try the general registers if we need a general register,
and only try the float registers if we need a float register. This eliminates
at least 65% of the calls to tryAllocateRegister.

This could probably be optimized more in the future by similarly separating the
Float32/Double/SIMD cases better.

Depends on D127653

Jan de Mooij [:jandem]

Assignee

Comment 5

•

4 years ago

Attached file Bug 1734152 part 3 - Add tryAllocateAnyRegister to deduplicate some code. r?iain! — Details

Pulsebot

Comment 6

•

4 years ago

Pushed by jdemooij@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c17a19015337 part 1 - Store VirtualRegister* instead of vreg id in LiveRange. r=iain https://hg.mozilla.org/integration/autoland/rev/79928c822bcc part 2 - Optimize calls to tryAllocateRegister a bit. r=iain https://hg.mozilla.org/integration/autoland/rev/c41e6aa3e428 part 3 - Add tryAllocateAnyRegister to deduplicate some code. r=iain

Noemi Erli[:noemi_erli]

Comment 7

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/c17a19015337
https://hg.mozilla.org/mozilla-central/rev/79928c822bcc
https://hg.mozilla.org/mozilla-central/rev/c41e6aa3e428

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox95: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 95 Branch

Jan de Mooij [:jandem]

Assignee

Comment 8

•

4 years ago

I noticed this alert that has the changes in this bug in the commit log:

== Change summary for alert #31768 (as of Fri, 08 Oct 2021 18:19:34 GMT) ==

Improvements:

Ratio	Test	Platform	Options	Absolute values (old vs new)
3%	wasm-godot-optimizing	macosx1015-64-shippable-qr	webrender	1,250.47 -> 1,218.50

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31768

Lars T Hansen [:lth]

Comment 9

•

4 years ago

Nice!

Alexandru Ionescu (needinfo me) [:alexandrui]

Updated

•

4 years ago

Keywords: perf-alert

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Optimize calls to tryAllocateRegister a bit in the register allocator

Categories

(Core :: JavaScript Engine: JIT, task, P3)

Tracking

()

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug)

Details

(Keywords: perf-alert)

Crash Data

Security

(public)

User Story

Attachments

(3 files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Improvements:

Comment 9

Updated

Attachment

General

Description

File Name

Content Type