Closed Bug 814966 Opened 13 years ago Closed 13 years ago

IonMonkey: implement LLVM's greedy backtracking register allocator

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla20

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

Details

Attachments

(2 files, 2 obsolete files)

WIP (e5082df10222) 13 years ago Brian Hackett [Laid off!] 91.25 KB, patch		Details \| Diff \| Splinter Review
WIP (e5082df10222) 13 years ago Brian Hackett [Laid off!] 95.01 KB, patch		Details \| Diff \| Splinter Review
am3 microbenchmark 13 years ago Brian Hackett [Laid off!] 866 bytes, application/x-javascript		Details
patch (e5082df10222) 13 years ago Brian Hackett [Laid off!] 108.65 KB, patch	jandem : review+	Details \| Diff \| Splinter Review

Brian Hackett [Laid off!]

Assignee

Description

•

13 years ago

Last year a post was made to the LLVM project blog about the new register allocator used in LLVM 3.0 http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html Since this post there has been a lot of interest inside and outside the JS team in trying this allocator out in IonMonkey. The appealing features are (a) potentially faster generated code than a well tuned LSRA, and (b) less tuning required to generate good code than in LSRA. Summarizing and extrapolating a bit the linked post, the design of this allocator is very similar to LSRA, selecting an allocation based on the live ranges of a set of virtual registers and allowing those live ranges to be split during allocation. The key differences are: a) Assigning physical locations to vregs using a priority queue rather than visiting them in linear order. High priority vregs have high use densities in their live range. By itself this is kind of a wash and generates similar code to LSRA, but making this change enables (b). b) Allowing unassignment of previous allocations. This is different from spilling an existing allocation (as LSRA does), and works by simply freeing the vreg's physical register and placing it back in the priority queue for a second chance in a different register. The idea with the (b) is that visiting vregs in priority order can easily generate a suboptimal (in the graph coloring sense) allocation. For example, with live ranges such as: v1 .. v2 ... v3 ..... On a two register machine, visiting these in the order v1,v2,v3 may allocate different physical registers to v1 and v2, even though they do not interfere, and leaving no option for allocating v3. With this backtracking, however, allocating v3 could unassign v2, give v3 a physical register, then revisit v2 and give it the remaining one. If the example were larger and a physical register could not be found for v3, its live range can be split at arbitrary points while it is in the priority queue, to facilitate easier allocation. I like the flexibility and cleanliness of this design a lot. My main concern is running time of the allocator, as even an argument for termination cannot be made for the general approach. It seems like such an argument could be made with a few more constraints for when it is possible to split and unassign vregs, and without much tuning this allocator could do a good job even on large scripts by quickly allocating for the large boring parts and then focusing in on refining a good allocation for short loop bodies scattered about.

Brian Hackett [Laid off!]

Assignee

Updated

•

13 years ago

Depends on: 817213

Brian Hackett [Laid off!]

Assignee

Updated

•

13 years ago

Depends on: 817769

Brian Hackett [Laid off!]

Assignee

Comment 1

•

13 years ago

Attached patch WIP (e5082df10222) (obsolete) — Details — Splinter Review

WIP, generates an ok allocation on am3() but still a little worse than LSRA (on x86, 32.7% spill code vs. 31.6% for LSRA). Needs more tuning, including better handling for phis and MUST_REUSE_INPUT use/def chains.

Assignee: general → bhackett1024

Brian Hackett [Laid off!]

Assignee

Comment 2

•

13 years ago

Attached patch WIP (e5082df10222) (obsolete) — Details — Splinter Review

Patch with more tuning for am3/x86. On an am3 microbenchmark (adjusted so that type information / etc. is the same as on v8-crypto) this improves the spill code % from 31.6% to 26.4%, giving an icount reduction of 7.4% and a raw speedup of 5.1%. I think there's still more to be had here (plus examining behavior on other benchmarks and architectures) but for now want to get this initial version passing jit-tests and into the tree.

Attachment #690385 - Attachment is obsolete: true

Brian Hackett [Laid off!]

Assignee

Comment 3

•

13 years ago

Attached file am3 microbenchmark — Details

Actually, playing a bit more with the microbenchmark, the performance curve is very sensitive to the length of the input arrays. Varying this the raw speedup from the backtracking allocator is between 5% and 25% (!) though seems to stabilize around 10% for larger arrays. Microbenchmark attached; the input array lengths in the full benchmark are usually either 19 or 1, though I don't think it's worth trying to game the latter case. Mostly interested in am3 for the intense register pressure in its inner loop.

Brian Hackett [Laid off!]

Assignee

Updated

•

13 years ago

Depends on: 814396

Brian Hackett [Laid off!]

Assignee

Updated

•

13 years ago

Depends on: 813671

Brian Hackett [Laid off!]

Assignee

Updated

•

13 years ago

Depends on: 821151

Brian Hackett [Laid off!]

Assignee

Comment 4

•

13 years ago

Attached patch patch (e5082df10222) — Details — Splinter Review

Passes jit-tests --ion on x86 (with the dependent bug fixes), and preserves the regalloc improvements on am3. Still a good ways to go, but I'm happy with this allocator as a starting point.

Attachment #690593 - Attachment is obsolete: true

Attachment #691683 - Flags: review?(jdemooij)

Jan de Mooij [:jandem]

Comment 5

•

13 years ago

Comment on attachment 691683 [details] [diff] [review] patch (e5082df10222) Review of attachment 691683 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, from reading the code and comments it's pretty clear how the algorithm works. Can you check how this allocator does on the micro-benchmark in bug 688078? It's based on the safe_add function in SS crypto-md5/sha1 and has an inner loop with high register pressure. We should also see how running time compares to LSRA and how this allocator does on e.g. Kraken and Emscripen code, but we can do that after landing the initial version. Maybe also good to ask for fuzzing before improving it further. ::: js/src/ion/BacktrackingAllocator.cpp @@ +57,5 @@ > + if (block->mir()->isLoopHeader()) > + backedge = block->mir()->backedge()->lir(); > + > + if (block == backedge) { > + LBlock *header = block->mir()->getSuccessor(0)->lir(); Nit: s/getSuccessor(0)/loopHeaderOfBackedge() @@ +127,5 @@ > + if (IonSpewEnabled(IonSpew_RegAlloc)) > + dumpRegisterGroups(); > + > + // Allocate, spill and split register intervals until finished. > + while (!queuedIntervals.empty()) { Nit: check shouldCancel() @@ +218,5 @@ > + > +bool > +BacktrackingAllocator::groupAndQueueRegisters() > +{ > + for (size_t i = 0; i < graph.numVirtualRegisters(); i++) { Nit: check shouldCancel() @@ +232,5 @@ > + } > + } > + > + LDefinition *def = reg.def(); > + if (def && def->policy() == LDefinition::MUST_REUSE_INPUT) { Can def be NULL here? @@ +336,5 @@ > + return true; > + } > + > + // Spill intervals which have no hint or register requirement. > + if (interval->requirement()->kind() == Requirement::NONE) { Can we assert interval->hint()->kind() is NONE? @@ +379,5 @@ > + > +bool > +BacktrackingAllocator::isReusedInput(LUse *use, LInstruction *ins, bool considerCopy) > +{ > + if (ins->numDefs() == 0 || ins->getDef(0)->policy() != LDefinition::MUST_REUSE_INPUT) What about temps with MUST_REUSE_INPUT policy, as created by tempCopy? If we can ignore these it deserves a comment. @@ +653,5 @@ > + LiveInterval *prevInterval = reg->getInterval(k); > + if (prevInterval->start() != interval->start()) > + break; > + if (*prevInterval->getAllocation() == *interval->getAllocation()) > + skip = true; We can "break;" here too right? @@ +661,5 @@ > + > + LBlock *block = insData[interval->start()].block(); > + if (interval->start() > inputOf(block->firstId())) { > + CodePosition start = interval->start(); > + InstructionData *data = &insData[start]; Nit: can move these before the "if" so that you can use data->block() to get the block. @@ +1030,5 @@ > + size_t usesTotal = 0; > + > + if (interval->index() == 0) { > + VirtualRegister *reg = &vregs[interval->vreg()]; > + if (reg->def()->policy() == LDefinition::PRESET && reg->def()->output()->isRegister()) { Nit: no braces ::: js/src/ion/BacktrackingAllocator.h @@ +12,5 @@ > + > +#include "ds/PriorityQueue.h" > +#include "ds/SplayTree.h" > + > +// Backtracking priority queue based register allocator based on that described Can you add a comment somewhere explaining how the live ranges differ from LSRA? Ideally both allocators would use exactly the same liveness information, but if doing it this way simplifies the algorithm it should be fine, as long as we explain how and why. ::: js/src/ion/LIR-Common.h @@ +829,5 @@ > } > }; > > // Takes in either an integer or boolean input and tests it for truthiness. > +class LTestDAndBranch : public LInstructionHelper<0, 1, 0> Good catch. ::: js/src/ion/LIR.h @@ +200,5 @@ > > +#ifdef DEBUG > + const char * toString() const; > +#else > + const char * toString() const { return "???"; } Nit: no space after * ::: js/src/ion/LiveRangeAllocator.cpp @@ +617,5 @@ > + AnyRegister reg = temp->output()->toRegister(); > + if (!addFixedRangeAtHead(reg, inputOf(*ins), outputOf(*ins))) > + return false; > + } else { > + if (!vregs[temp].getInterval(0)->addRangeAtHead(inputOf(*ins), outputOf(*ins))) JS_ASSERT(!ins->isCall()); ::: js/src/ion/RegisterAllocator.cpp @@ +94,1 @@ > #ifdef DEBUG Nit: move this #ifdef before the if (IonSpewEnabled(...))

Attachment #691683 - Flags: review?(jdemooij) → review+

Brian Hackett [Laid off!]

Assignee

Comment 6

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/dc4887f61d2e

Ryan VanderMeulen [:RyanVM]

Comment 7

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/dc4887f61d2e

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla20

Brian Hackett [Laid off!]

Assignee

Updated

•

13 years ago

Blocks: 822116

Brian Hackett [Laid off!]

Assignee

Updated

•

12 years ago

Blocks: 825447

Brian Hackett [Laid off!]

Assignee

Updated

•

12 years ago

Depends on: 826734

Brian Hackett [Laid off!]

Assignee

Updated

•

12 years ago

Blocks: 826741

You need to log in before you can comment on or make changes to this bug.

Bugzilla

IonMonkey: implement LLVM's greedy backtracking register allocator

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files, 2 obsolete files)

Description

Updated

Updated

Comment 1

Comment 2

Comment 3

Updated

Updated

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Updated

Updated

Updated

Attachment

General

Description

File Name

Content Type