Closed Bug 822116 Opened 12 years ago Closed 12 years ago

IonMonkey: Backtracking allocator x86/x64 tuning

Categories

(Core :: JavaScript Engine, defect)

Other Branch
x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla20

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

Details

Attachments

(3 files, 1 obsolete file)

Bug 814966 focused on making an am3 microbenchmark faster on x86 than LSRA.  Currently, these results do not extrapolate at all to other benchmarks, or even to the am3() used in v8-crypto.  The only difference between the two is that the microbenchmark adds an argument 't' and substitutes a this.array for a t.array (outside the loop body), which manages to increase the spill code % from 25% to 36%!  The amount of spill code generated using the backtracking allocator is also greater than with LSRA on all the benchmarks.

The backtracking allocator needs to be far more robust against such changes, and to be able to consistently improve on the code generated using LSRA.  This bug is about the tuning work to get there, focusing on x86/x64 behavior on SS/v8/kraken.  The improvements here should hopefully carry over to ARM, but the architectures are different enough (memory vs. register operands, two vs. three operand arithmetic, register requirements, ...) to focus on the former for now.
Current comparison between the spill code generated by the LSRA and Backtracking allocators, for x86/x64 and benchmarks.  For each test, e.g.

check-3d-cube.js :: 4299964
x86 old 2.6 new 27.4
x64 old 0.4 new 23.5

After the :: is the number of non-spill Ion instructions executed on x86, for weighting the benchmark.  After 'old' is the % of Ion instructions which were spill code for LSRA, after 'new' is the same for Backtracking.  (Few benchmarks are as bad as this one.)  At the end of each suite, ratios are given for Backtracking spill code / LSRA spill code, totaled across all tests.  > 1 means LSRA did better, < 1 means Backtracking did better.

Current ratios:

sunspider
x86 ratio 1.069
x64 ratio 1.195

v8
x86 ratio 1.319
x64 ratio 1.109

kraken
x86 ratio 1.073
x64 ratio 1.287
Attached patch WIP (dc4887f61d2e) (obsolete) — Splinter Review
WIP after a few hours, putting up with new numbers so that comment 1 doesn't look so bleak.  New ratios:

sunspider
x86 ratio 0.837
x64 ratio 1.006

v8
x86 ratio 1.103
x64 ratio 1.266

kraken
x86 ratio 0.731
x64 ratio 0.849

Some outliers in ss/kraken to look at, but most remaining work needs to see what's going on with v8bench.
(In reply to Brian Hackett (:bhackett) from comment #2)
> Created attachment 692775 [details] [diff] [review]
> WIP (dc4887f61d2e)
> 
> WIP after a few hours, putting up with new numbers so that comment 1 doesn't
> look so bleak.  

I wasn't worried :-), I knew it would get better. It looks like this is going very well.
There is of course always more that could be done here, but this patch puts allocator performance in a much better place than it is now.  New ratios:

sunspider
x86 ratio 0.8
x64 ratio 0.874

v8
x86 ratio 0.809
x64 ratio 0.995

kraken
x86 ratio 0.716
x64 ratio 0.885
Assignee: general → bhackett1024
Attachment #692775 - Attachment is obsolete: true
Attachment #693390 - Flags: review?(jdemooij)
Comment on attachment 693390 [details] [diff] [review]
patch (dc4887f61d2e)

Review of attachment 693390 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/Lowering.cpp
@@ +429,5 @@
>          // The second operand has known null/undefined type, so just test the
>          // first operand.
>          if (IsNullOrUndefined(comp->specialization())) {
>              LIsNullOrUndefinedAndBranch *lir = new LIsNullOrUndefinedAndBranch(ifTrue, ifFalse);
> +            if (!useBoxAtStart(lir, LIsNullOrUndefinedAndBranch::Value, left))

Just curious, why does this make a difference?
Attachment #693390 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #6)
> ::: js/src/ion/Lowering.cpp
> @@ +429,5 @@
> >          // The second operand has known null/undefined type, so just test the
> >          // first operand.
> >          if (IsNullOrUndefined(comp->specialization())) {
> >              LIsNullOrUndefinedAndBranch *lir = new LIsNullOrUndefinedAndBranch(ifTrue, ifFalse);
> > +            if (!useBoxAtStart(lir, LIsNullOrUndefinedAndBranch::Value, left))
> 
> Just curious, why does this make a difference?

Maybe something to be fixed in the allocator itself, but if there are two intervals for the vreg, one ending at the end of the comparev and the other starting at the beginning of it (intervals can overlap, and the new one starts at the instruction's input in case the operand is clobbered within), we would generate spill code for the transition between the intervals at both the start of the comparev and at the start of any successor blocks.
Any numbers regarding the effects this new allocator has on our benchmark scores?
Oof, the one change this made to code that is actually run by default interacted badly with the bug 792108 changes.

Measuring ss/v8/kraken on x86 with off thread compilation I see a <1% difference between the allocators on all benchmarks.  Need to investigate this more.

https://hg.mozilla.org/integration/mozilla-inbound/rev/976f2d5ad400
https://hg.mozilla.org/mozilla-central/rev/976f2d5ad400
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Depends on: 826734
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: