AWFY shumway splay.swf is very sensitive to GC timing

NEW
Unassigned

defect
4 years ago
4 years ago

People

(Reporter: bhackett, Unassigned)

Tracking

Trunk
x86
macOS
Details

Reporter

Description

4 years ago
On x64, shumway splay.swf shows a large difference in score between LSRA and the backtracking register allocator due to what looks like a GC timing issue.  I can usually reproduce this locally, but the difference goes away if I use --no-threads.

Base score:                                9072
--ion-regalloc=backtracking:               6320
--no-threads:                              9478
--no-threads --ion-regalloc=backtracking:  9556
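
To put the four scores above on a common scale, here is a small sketch that expresses each one as a percentage change from the base score. The numbers are copied from the table; the `scores` dictionary and the `percent_change` helper are names made up for this illustration and are not part of AWFY or the shell.

```python
# Scores copied from the table above, keyed by the shell flags used.
scores = {
    "base": 9072,
    "--ion-regalloc=backtracking": 6320,
    "--no-threads": 9478,
    "--no-threads --ion-regalloc=backtracking": 9556,
}


def percent_change(score, baseline):
    """Return the change of `score` relative to `baseline`, in percent."""
    return 100.0 * (score - baseline) / baseline


# Change of every configuration relative to the base (LSRA, threads on) run.
changes = {flags: percent_change(s, scores["base"]) for flags, s in scores.items()}

# Backtracking vs. LSRA when both runs use --no-threads.
no_threads_delta = percent_change(
    scores["--no-threads --ion-regalloc=backtracking"],
    scores["--no-threads"],
)

for flags, delta in changes.items():
    print(f"{flags:45s} {delta:+6.1f}%")
print(f"backtracking vs. LSRA under --no-threads: {no_threads_delta:+.1f}%")
```

With threads enabled the backtracking allocator scores about 30% lower than the base run, while under --no-threads the two allocators are within about 1% of each other.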

If I use MOZ_GCTIMER=stdout, I see different points where we end up GC'ing using the different flags.

Base:

Running: ../benchmarks/shumway/splay.swf
Reading: ../benchmarks/shumway/splay.swf
SWF load time: 0.0040sec
Running tests ...
[object Splay],[object Splay],[object Splay]
NotifyStart Splay
54.616000 48.454000 4.410000
68.917000 62.230000 2.669000
70.655000 65.550000 2.253000
79.100000 71.807000 3.819000
runs / second: 897
81.560000 75.581000 2.990000
77.905000 71.652000 3.277000
81.682000 74.669000 2.587000
runs / second: 1113
NotifyStep Splay
NotifyResult Splay 9075
NotifyStart Splay
134.751000 125.451000 3.941000
runs / second: 969
134.996000 126.901000 2.307000
138.006000 129.119000 3.286000
runs / second: 1104
NotifyStep Splay
NotifyResult Splay 8926
NotifyStart Splay
134.605000 126.950000 2.992000
148.735000 140.792000 2.789000
runs / second: 1107
138.634000 131.071000 2.581000
159.415000 151.989000 2.327000
runs / second: 1137
NotifyStep Splay
NotifyResult Splay 9218
NotifyScore 9072
FSCommand: quit; 
45.707000 32.703000 11.092000
168.833000 0.022000 0.639000

--ion-regalloc=backtracking:

Running: ../benchmarks/shumway/splay.swf
Reading: ../benchmarks/shumway/splay.swf
SWF load time: 0.0040sec
Running tests ...
[object Splay],[object Splay],[object Splay]
NotifyStart Splay
58.532000 49.346000 4.899000
86.984000 61.540000 2.837000
112.833000 89.282000 2.481000
runs / second: 745
123.891000 89.528000 2.925000
114.003000 89.980000 2.887000
runs / second: 766
NotifyStep Splay
NotifyResult Splay 6245
NotifyStart Splay
193.662000 164.513000 4.054000
runs / second: 859
204.734000 166.249000 2.559000
runs / second: 689
NotifyStep Splay
NotifyResult Splay 5616
NotifyStart Splay
209.311000 189.207000 3.776000
209.488000 186.386000 2.279000
runs / second: 674
182.505000 166.938000 2.371000
runs / second: 883
NotifyStep Splay
NotifyResult Splay 7199
NotifyScore 6320
FSCommand: quit; 
39.462000 9.181000 9.944000
142.805000 0.018000 0.549000
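
To compare where the two logs differ, the following sketch pulls out the three-number lines that MOZ_GCTIMER=stdout prints and counts them. It assumes every line consisting of exactly three decimal numbers is one GC record; the meaning of the three columns is not stated in this report, so the code only counts the records and totals the first column without interpreting it. The excerpts are the first Splay run from each log above, and `GC_LINE` and `summarize` are hypothetical names for this illustration.

```python
import re

# Assumption: a GC timer record is a line of exactly three decimal numbers.
GC_LINE = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")

# First Splay run from the base log above.
BASE_LOG = """\
NotifyStart Splay
54.616000 48.454000 4.410000
68.917000 62.230000 2.669000
70.655000 65.550000 2.253000
79.100000 71.807000 3.819000
runs / second: 897
81.560000 75.581000 2.990000
77.905000 71.652000 3.277000
81.682000 74.669000 2.587000
runs / second: 1113
NotifyStep Splay
NotifyResult Splay 9075
"""

# First Splay run from the --ion-regalloc=backtracking log above.
BACKTRACKING_LOG = """\
NotifyStart Splay
58.532000 49.346000 4.899000
86.984000 61.540000 2.837000
112.833000 89.282000 2.481000
runs / second: 745
123.891000 89.528000 2.925000
114.003000 89.980000 2.887000
runs / second: 766
NotifyStep Splay
NotifyResult Splay 6245
"""


def summarize(log_text):
    """Count GC timer records and total their first column."""
    rows = []
    for line in log_text.splitlines():
        match = GC_LINE.match(line)
        if match:
            rows.append(tuple(float(value) for value in match.groups()))
    return {
        "gc_count": len(rows),
        "first_column_total": sum(row[0] for row in rows),
    }


base_summary = summarize(BASE_LOG)
backtracking_summary = summarize(BACKTRACKING_LOG)
print("base:        ", base_summary)
print("backtracking:", backtracking_summary)
```

On these excerpts the base run has 7 GC records and the backtracking run has 5, with visibly larger first-column values in the backtracking run from the second record onward.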

It's strange that changing the register allocator should have such an effect on performance, given that (a) the only effects this change has are minor ones on off-thread Ion compilation time, Ion code runtime, and the size of the generated code, and (b) the effect is reproducible on multiple machines.

I'm mostly just filing this to make a note of it, and am mainly hoping this discrepancy just goes away as GC trigger heuristics are changed.  All the GCs are ALLOC_TRIGGER, FWIW.