Closed
Bug 609899
Opened 14 years ago
Closed 14 years ago
JM: Allocate registers across branches and joins
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: bhackett1024, Assigned: bhackett1024)
References
Details
Attachments
(1 file, 3 obsolete files)
159.01 KB,
patch
|
Details | Diff | Splinter Review |
Using the information computed by the analysis in bug 608346, JM can compute a register allocation to use at join points, and maintain variables in registers when branching or joining.
Assignee | ||
Updated•14 years ago
|
Blocks: TypeInference
Assignee | ||
Comment 1•14 years ago
|
||
This works on toy examples. This does an analysis pass to precompute some live and soon-to-be-used variables at each bytecode and hands things off to the FrameState to maintain registers within basic blocks and pick a consistent register allocation at join points. I'm not sure this is the best approach. The basic problem is that this doesn't fully utilize the registers across branches --- with 5 registers total to use on x86, only 3 are carried across to ensure that 2 are available for testing the branch condition. This hurts on microbenchmarks: Everything fits in 3 registers. function foo(x, y) { var res = 0; for (var i = 0; i < x; i++) res++; } foo(10000000, 1); v8 28.8 jsc 32.1 js -j 22.2 js -m (old) 33.0 js -m (types) 28.8 js -m (new) 14.5 Larger loop, res is spilled. function foo(x, y) { var res = 0; for (var i = 0; i < x; i++) res += y; } foo(10000000, 1); v8 28.7 jsc 36.0 js -j 28.4 js -m (old) 43.0 js -m (types) 25.2 js -m (new) 24.7 I'd like to investigate using Linear Scan register allocation or a variant, which is the algorithm (apparently) used in the HotSpot client JVM. This is a backwards pass to compute live variables and ranges, then a forward pass to assign registers. Which is similar to what this patch already does, except it has full liveness information and would need info about intermediate stack temporaries and temporaries within a compiled opcode.
Assignee | ||
Comment 2•14 years ago
|
||
Rewrite. I like this design a lot more. This is based on a binpacking Linear Scan allocator ("Quality and Speed in Linear-scan Register Allocation", by Traub et. al.), combining a fast liveness prepass with a forward, allocate-as-you-go walk, as the FrameState does now. So processing of individual ops doesn't change much at all, and the FrameState just gets some eviction heuristics (not implemented yet). The tricky stuff is (still) carrying registers around loops, which Linear Scan work generally avoids talking about (hmm...). Instead of making a guess at a good loop head allocation, we defer its generation, watching for variables which are accessed early in the loop body and retroactively deciding they were in registers at the loop head. This requires patching up previous stub code rejoin paths (but not previous sync code --- registers at loop heads are always synced), not implemented yet. This approach lets any number of registers be carried around the loop, but only ones which will be used quickly end up getting carried. This gives a good allocation both on simple loops (both loops in comment 1 take 14.5ms) and on loops with lots of register pressure (am3 in v8-crypto, only simulated so far).
Assignee: general → bhackett1024
Attachment #490459 -
Attachment is obsolete: true
Assignee | ||
Comment 3•14 years ago
|
||
Passes trace-tests with --enable-type-inference. Still need to get methodjit/tracejit integration working.
Attachment #494548 -
Attachment is obsolete: true
Assignee | ||
Comment 4•14 years ago
|
||
Patch landed on JM. This needs more tuning in the Compiler (most conditional branches still syncAndForgetEverything()) before measuring effectiveness, but the FrameState and analysis changes are in good shape and things work with and without type inference. http://hg.mozilla.org/projects/jaegermonkey/rev/25d5598cbad9
Attachment #495445 -
Attachment is obsolete: true
Assignee | ||
Comment 5•14 years ago
|
||
Initial crop of regalloc bugfixes (and compiler bug workaround, blech), from broken mochitests. http://hg.mozilla.org/projects/jaegermonkey/rev/ac0a42d57813 http://hg.mozilla.org/projects/jaegermonkey/rev/9b8ceaa548c2 http://hg.mozilla.org/projects/jaegermonkey/rev/662b71c3ff24 http://hg.mozilla.org/projects/jaegermonkey/rev/136d6a93418e http://hg.mozilla.org/projects/jaegermonkey/rev/da4218029a5c
Updated•14 years ago
|
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
(In reply to comment #1) > I'd like to investigate using Linear Scan register allocation or a variant, > which is the algorithm (apparently) used in the HotSpot client JVM. Just to note, the server compiler uses a graph-coloring one, while it takes more time it yields more efficient results, which could be worthwhile once/if off-thread compiling gets implemented in JM.
You need to log in
before you can comment on or make changes to this bug.
Description
•