Closed Bug 948838 Opened 11 years ago Closed 8 years ago

Extra spill/reload with backtracking allocator

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jandem, Assigned: bhackett1024)

References

Details

Attachments

(2 files, 4 obsolete files)

See bz's testcase in bug 946426 comment 0: (function(count) { var start = new Date; while (count-- != 0); var stop = new Date; print((stop - start) / 10000000 * 1e6); })(10000000); On 32-bit I get this with LSRA: [Codegen] movl %eax, %ebp [Codegen] subl $0x1, %ebp [Codegen] jo ((705)) [Codegen] testl %eax, %eax [Codegen] je ((713)) [Codegen] movl %ebp, %eax [Codegen] jmp ((720)) And with --ion-regalloc=backtracking: [Codegen] movl %ebx, %eax [Codegen] subl $0x1, %eax [Codegen] jo ((711)) [Codegen] movl %eax, 0x78(%esp) [Codegen] testl %ebx, %ebx [Codegen] je ((723)) [Codegen] movl 0x78(%esp), %ebx [Codegen] jmp ((732)) It would be nice if we could get rid of this store/load inside the loop, like with LSRA. The problem is worse if I change |count-- != 0| to |--count != 0|: With LSRA: [Codegen] subl $0x1, %eax [Codegen] jo ((703)) [Codegen] testl %eax, %eax [Codegen] jne ((711)) Backtracking: [Codegen] subl $0x1, %ebx [Codegen] jo ((709)) [Codegen] movl %ebx, 0x78(%esp) [Codegen] testl %ebx, %ebx [Codegen] je ((721)) [Codegen] movl 0x78(%esp), %ebx [Codegen] jmp ((730))
LiveIntervals don't currently contain the uses for phi nodes. Consequently, the heuristics see no reason to extend the lifetime of the result register of the sub. Also, phi uses are LUse::ANY, which can be in memory but prefers a register, and the backtracking allocator often ignores this preference (isRegisterUse returns false for ANY in many cases). That explains the load. The store comes from the fact that there are LUse::KEEPALIVE uses after the loop, and the backtracking allocator is spilling the value as soon as it can. This may be trickier to fix, though it's also less important.
Attached patch patch (obsolete) — Splinter Review
I think another problem here is that the presence of the call to new Date after the loop is affecting register allocation within the loop. I don't think this should be happening --- when allocating we should consider the structure of the hot code first and fill in the details for cold code afterward. trySplitAcrossHotcode originally did this, but with bug 939824 its behavior changed somewhat (split points were still done based on the interval's register uses, which are not ideal here) and before that bug 826734 gave splitting across calls (as is done here) a higher priority than splitting across hot code. The attached patch restores the earlier behavior and fixes this testcase but for several octane tests increases the amount of spill code executed: OLD NEW richards.js: 0.082 .079 deltablue.js: 0.099 .103 crypto.js: 0.272 .29 raytrace.js: 0.095 .097 earley-boyer.js: 0.087 .114 splay.js: 0.063 .063 navier-stokes.js: 0.286 .288 pdfjs.js: 0.152 .148 mandreel.js: 0.252 .234 gbemu.js: 0.083 .136 box2d.js: 0.082 .083 zlib.js: 0.186 .186 typescript.js: 0.095 .103 So I'd like to not land this without a better understanding / fix for what's going on here.
Attached patch tweaks (obsolete) — Splinter Review
I think the main issue here is the disabling of spilling to argument slots which happened in bug 906858. Splitting at hot code boundaries presumably leads to more cold intervals without register uses, and when those are arguments we end up doing memory -> memory moves. If I restore the argument slot spilling and make the attached minor tweaks I get the following fractions of spill code executed, which are generally flat or an improvement vs. the current state. richards.js: 0.072 deltablue.js: 0.101 crypto.js: 0.269 raytrace.js: 0.098 earley-boyer.js: 0.085 splay.js: 0.055 navier-stokes.js: 0.287 pdfjs.js: 0.159 mandreel.js: 0.235 gbemu.js: 0.093 box2d.js: 0.083 zlib.js: 0.186 typescript.js: 0.097
I've been looking into this. I agree with bhackett's analysis; the splitAt mechanism introduced in bug 939824 currently tries to minimize ranges, but in testcases like the one here, this causes it to spill inside the loop. So far I've only seen this come up in non-asm.js code. I'm looking into improving it.
Blocks: 1063603
Attached patch patch (rebased) (obsolete) — Splinter Review
Rebased Brian's patch, no non-trivial changes.
Attachment #8395658 - Attachment is obsolete: true
Attached patch tweaks (rebased) (obsolete) — Splinter Review
Attachment #8395915 - Attachment is obsolete: true
Attached patch patchSplinter Review
The problem with landing this patch has been that it regressed some asm.js tests (bullet and lua_binarytrees on AWFY, at least). It's not obvious why this is the case and rather than dig around trying to understand it I would rather break this impasse by just preserving the current behavior for asm.js code. We really want to switch to the backtracking allocator for normal Ion compilation and doing this while preserving both identical behavior and performance for asm.js code will be pretty annoying to do. It would be better to make the necessary changes to fix Ion perf, include minor special casing like this in cases where asm.js perf is negatively affected, then try to remove these special cases once we're using the backtracking allocator exclusively.
Assignee: nobody → bhackett1024
Attachment #8526787 - Attachment is obsolete: true
Attachment #8548940 - Flags: review?(sunfish)
Attached patch tweaksSplinter Review
Tweaks patch. This modifies the allocator a bit to avoid artificial contention between registers, and to reduce the weight given to ANY uses of a vreg. On the asm.js AWFY tests on x86, this doesn't seem to move any of the apps much, and on the unbench tests copy slows down (10%) but several others speed up (1-10%).
Attachment #8526788 - Attachment is obsolete: true
Attachment #8549683 - Flags: review?(sunfish)
Comment on attachment 8549683 [details] [diff] [review] tweaks Review of attachment 8549683 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/BacktrackingAllocator.cpp @@ +451,5 @@ > + > + i = (i + 1) % AnyRegister::Total; > + } while (i != registerStart); > + > + registerStart = (registerStart + 1) % AnyRegister::Total; I don't currently see how this helps. Using the same register for two disjoint LiveIntervals should mean that a third LiveInterval which intersects with both has more choices. Having the allocator cycle around the registers here seems like it would increase contention overall. It's possible that this has a significant affect on asmjs-ubench benchmarks, because many of them use div and mod operators which have fixed inputs and outputs, and cycling around changes when we pick registers that later turn out to be needed by fixed intervals. @@ +1448,5 @@ > LUse *use = iter->use; > > switch (use->policy()) { > case LUse::ANY: > + usesTotal += 500; I have no intuition about what number is right here, so whatever looks good in benchmarks is cool with me :-).
Attachment #8549683 - Flags: review?(sunfish)
Comment on attachment 8548940 [details] [diff] [review] patch Review of attachment 8548940 [details] [diff] [review]: ----------------------------------------------------------------- Sounds good. There's more to learn here, but this patch looks good for now.
Attachment #8548940 - Flags: review?(sunfish) → review+
appears to have regressed Octane-deltablue and Octane-earleyBoyer on http://arewefastyet.com/#machine=28&view=breakdown&suite=octane on Backtracking allocator
Also looks like a 4% regression on x64 Octane-zlib.
I'm going to mark this as fixed. The big part has landed. Please open a new bug if you still want to land the tweaks.
Status: NEW → RESOLVED
Closed: 8 years ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: