Closed Bug 948838 Opened 12 years ago Closed 9 years ago

Extra spill/reload with backtracking allocator

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: jandem, Assigned: bhackett1024)

References

Details

Attachments

(2 files, 4 obsolete files)

patch 11 years ago Brian Hackett [Laid off!] 5.29 KB, patch		Details \| Diff \| Splinter Review
tweaks 11 years ago Brian Hackett [Laid off!] 2.60 KB, patch		Details \| Diff \| Splinter Review
patch (rebased) 11 years ago Jan de Mooij [:jandem] 5.50 KB, patch		Details \| Diff \| Splinter Review
tweaks (rebased) 11 years ago Jan de Mooij [:jandem] 2.73 KB, patch		Details \| Diff \| Splinter Review
patch 11 years ago Brian Hackett [Laid off!] 6.23 KB, patch	sunfish : review+	Details \| Diff \| Splinter Review
tweaks 11 years ago Brian Hackett [Laid off!] 2.73 KB, patch		Details \| Diff \| Splinter Review

Jan de Mooij [:jandem]

Reporter

Description

•

12 years ago

See bz's testcase in bug 946426 comment 0: (function(count) { var start = new Date; while (count-- != 0); var stop = new Date; print((stop - start) / 10000000 * 1e6); })(10000000); On 32-bit I get this with LSRA: [Codegen] movl %eax, %ebp [Codegen] subl $0x1, %ebp [Codegen] jo ((705)) [Codegen] testl %eax, %eax [Codegen] je ((713)) [Codegen] movl %ebp, %eax [Codegen] jmp ((720)) And with --ion-regalloc=backtracking: [Codegen] movl %ebx, %eax [Codegen] subl $0x1, %eax [Codegen] jo ((711)) [Codegen] movl %eax, 0x78(%esp) [Codegen] testl %ebx, %ebx [Codegen] je ((723)) [Codegen] movl 0x78(%esp), %ebx [Codegen] jmp ((732)) It would be nice if we could get rid of this store/load inside the loop, like with LSRA. The problem is worse if I change |count-- != 0| to |--count != 0|: With LSRA: [Codegen] subl $0x1, %eax [Codegen] jo ((703)) [Codegen] testl %eax, %eax [Codegen] jne ((711)) Backtracking: [Codegen] subl $0x1, %ebx [Codegen] jo ((709)) [Codegen] movl %ebx, 0x78(%esp) [Codegen] testl %ebx, %ebx [Codegen] je ((721)) [Codegen] movl 0x78(%esp), %ebx [Codegen] jmp ((730))

Dan Gohman [:sunfish]

Comment 1

•

12 years ago

LiveIntervals don't currently contain the uses for phi nodes. Consequently, the heuristics see no reason to extend the lifetime of the result register of the sub. Also, phi uses are LUse::ANY, which can be in memory but prefers a register, and the backtracking allocator often ignores this preference (isRegisterUse returns false for ANY in many cases). That explains the load. The store comes from the fact that there are LUse::KEEPALIVE uses after the loop, and the backtracking allocator is spilling the value as soon as it can. This may be trickier to fix, though it's also less important.

Brian Hackett [Laid off!]

Assignee

Comment 2

•

11 years ago

Attached patch patch (obsolete) — Details — Splinter Review

I think another problem here is that the presence of the call to new Date after the loop is affecting register allocation within the loop. I don't think this should be happening --- when allocating we should consider the structure of the hot code first and fill in the details for cold code afterward. trySplitAcrossHotcode originally did this, but with bug 939824 its behavior changed somewhat (split points were still done based on the interval's register uses, which are not ideal here) and before that bug 826734 gave splitting across calls (as is done here) a higher priority than splitting across hot code. The attached patch restores the earlier behavior and fixes this testcase but for several octane tests increases the amount of spill code executed: OLD NEW richards.js: 0.082 .079 deltablue.js: 0.099 .103 crypto.js: 0.272 .29 raytrace.js: 0.095 .097 earley-boyer.js: 0.087 .114 splay.js: 0.063 .063 navier-stokes.js: 0.286 .288 pdfjs.js: 0.152 .148 mandreel.js: 0.252 .234 gbemu.js: 0.083 .136 box2d.js: 0.082 .083 zlib.js: 0.186 .186 typescript.js: 0.095 .103 So I'd like to not land this without a better understanding / fix for what's going on here.

Brian Hackett [Laid off!]

Assignee

Comment 3

•

11 years ago

Attached patch tweaks (obsolete) — Details — Splinter Review

I think the main issue here is the disabling of spilling to argument slots which happened in bug 906858. Splitting at hot code boundaries presumably leads to more cold intervals without register uses, and when those are arguments we end up doing memory -> memory moves. If I restore the argument slot spilling and make the attached minor tweaks I get the following fractions of spill code executed, which are generally flat or an improvement vs. the current state. richards.js: 0.072 deltablue.js: 0.101 crypto.js: 0.269 raytrace.js: 0.098 earley-boyer.js: 0.085 splay.js: 0.055 navier-stokes.js: 0.287 pdfjs.js: 0.159 mandreel.js: 0.235 gbemu.js: 0.093 box2d.js: 0.083 zlib.js: 0.186 typescript.js: 0.097

Dan Gohman [:sunfish]

Comment 4

•

11 years ago

I've been looking into this. I agree with bhackett's analysis; the splitAt mechanism introduced in bug 939824 currently tries to minimize ranges, but in testcases like the one here, this causes it to spill inside the loop. So far I've only seen this come up in non-asm.js code. I'm looking into improving it.

Jan de Mooij [:jandem]

Reporter

Updated

•

11 years ago

Blocks: 1063603

Jan de Mooij [:jandem]

Reporter

Comment 5

•

11 years ago

Attached patch patch (rebased) (obsolete) — Details — Splinter Review

Rebased Brian's patch, no non-trivial changes.

Attachment #8395658 - Attachment is obsolete: true

Jan de Mooij [:jandem]

Reporter

Comment 6

•

11 years ago

Attached patch tweaks (rebased) (obsolete) — Details — Splinter Review

Attachment #8395915 - Attachment is obsolete: true

Brian Hackett [Laid off!]

Assignee

Comment 7

•

11 years ago

Attached patch patch — Details — Splinter Review

The problem with landing this patch has been that it regressed some asm.js tests (bullet and lua_binarytrees on AWFY, at least). It's not obvious why this is the case and rather than dig around trying to understand it I would rather break this impasse by just preserving the current behavior for asm.js code. We really want to switch to the backtracking allocator for normal Ion compilation and doing this while preserving both identical behavior and performance for asm.js code will be pretty annoying to do. It would be better to make the necessary changes to fix Ion perf, include minor special casing like this in cases where asm.js perf is negatively affected, then try to remove these special cases once we're using the backtracking allocator exclusively.

Assignee: nobody → bhackett1024

Attachment #8526787 - Attachment is obsolete: true

Attachment #8548940 - Flags: review?(sunfish)

Brian Hackett [Laid off!]

Assignee

Comment 8

•

11 years ago

Attached patch tweaks — Details — Splinter Review

Tweaks patch. This modifies the allocator a bit to avoid artificial contention between registers, and to reduce the weight given to ANY uses of a vreg. On the asm.js AWFY tests on x86, this doesn't seem to move any of the apps much, and on the unbench tests copy slows down (10%) but several others speed up (1-10%).

Attachment #8526788 - Attachment is obsolete: true

Attachment #8549683 - Flags: review?(sunfish)

Dan Gohman [:sunfish]

Comment 9

•

11 years ago

Comment on attachment 8549683 [details] [diff] [review] tweaks Review of attachment 8549683 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/BacktrackingAllocator.cpp @@ +451,5 @@ > + > + i = (i + 1) % AnyRegister::Total; > + } while (i != registerStart); > + > + registerStart = (registerStart + 1) % AnyRegister::Total; I don't currently see how this helps. Using the same register for two disjoint LiveIntervals should mean that a third LiveInterval which intersects with both has more choices. Having the allocator cycle around the registers here seems like it would increase contention overall. It's possible that this has a significant affect on asmjs-ubench benchmarks, because many of them use div and mod operators which have fixed inputs and outputs, and cycling around changes when we pick registers that later turn out to be needed by fixed intervals. @@ +1448,5 @@ > LUse *use = iter->use; > > switch (use->policy()) { > case LUse::ANY: > + usesTotal += 500; I have no intuition about what number is right here, so whatever looks good in benchmarks is cool with me :-).

Attachment #8549683 - Flags: review?(sunfish)

Dan Gohman [:sunfish]

Comment 10

•

11 years ago

Comment on attachment 8548940 [details] [diff] [review] patch Review of attachment 8548940 [details] [diff] [review]: ----------------------------------------------------------------- Sounds good. There's more to learn here, but this patch looks good for now.

Attachment #8548940 - Flags: review?(sunfish) → review+

Brian Hackett [Laid off!]

Assignee

Comment 11

•

11 years ago

Part 1: https://hg.mozilla.org/integration/mozilla-inbound/rev/ccca03a662e3

Keywords: leave-open

Mayank Bansal

Comment 12

•

11 years ago

appears to have regressed Octane-deltablue and Octane-earleyBoyer on http://arewefastyet.com/#machine=28&view=breakdown&suite=octane on Backtracking allocator

Phil Ringnalda (:philor)

Comment 13

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/ccca03a662e3

Luke Wagner [:luke]

Comment 14

•

11 years ago

Also looks like a 4% regression on x64 Octane-zlib.

Hannes Verschore [:h4writer]

Comment 15

•

9 years ago

I'm going to mark this as fixed. The big part has landed. Please open a new bug if you still want to land the tweaks.

Status: NEW → RESOLVED

Closed: 9 years ago

Keywords: leave-open

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Extra spill/reload with backtracking allocator

Categories

(Core :: JavaScript Engine: JIT, defect)

Tracking

()

People

(Reporter: jandem, Assigned: bhackett1024)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files, 4 obsolete files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Attachment

General

Description

File Name

Content Type