Closed Bug 848152 Opened 11 years ago Closed 10 years ago

BaselineCompiler: Investigate AWFY x86-32 performance instability on Octane benches

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

RESOLVED DUPLICATE of bug 849526

People

(Reporter: djvj, Unassigned)

Attachments

(1 file)

On Box2D, CodeLoad, PdfJS and GameBoy, the Baseline compiler scores show a very high degree of variance.

This should be investigated and resolved as either an AWFY issue or a general issue.
As an initial test, I ran these on my Linux box with a 32-bit build: 10 runs each, measuring the sample standard deviation (as a percentage of the mean score):

            Richards ( 3.15) [12426.95 - 13235.65]
           DeltaBlue ( 2.88) [15109.92 - 16006.68]
              Crypto ( 1.36) [19290.40 - 19821.60]
            RayTrace ( 1.48) [27195.32 - 28014.08]
         EarleyBoyer ( 0.42) [23991.99 - 24196.21]
              RegExp ( 2.45) [3141.31 - 3299.29]
               Splay ( 3.60) [16357.36 - 17580.44]
        NavierStokes ( 0.12) [24963.04 - 25021.96]
               PdfJS ( 1.46) [12182.69 - 12544.91]
            Mandreel ( 0.74) [18954.98 - 19238.82]
             Gameboy ( 2.58) [17782.02 - 18722.58]
            CodeLoad ( 0.31) [13002.36 - 13083.04]
               Box2D ( 6.31) [22317.16 - 25321.24]
               Score ( 0.86) [15922.92 - 16198.28]

This is with --ion-parallel-compile=on and --no-jm.
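
For anyone wanting to reproduce the measurement, here is a minimal sketch of how each row could be computed from a list of per-run scores. The bracketed ranges in the table appear to be the mean plus or minus one standard deviation (for Richards, the midpoint is 12831.30 and the half-width 404.35, which is 3.15% of the midpoint). The function name `variability` and the input scores are hypothetical; they are not taken from the actual runs.

```python
import statistics


def variability(scores):
    """Return (stddev as % of mean, mean - stddev, mean + stddev) for one benchmark."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    return 100.0 * sd / mean, mean - sd, mean + sd


# Hypothetical per-run scores, for illustration only (not real Octane data).
example_scores = [100.0, 102.0, 98.0, 101.0, 99.0]
pct, low, high = variability(example_scores)

# Same layout as the rows above: name, (stddev %), [low - high].
print(f"{'Example':>20} ({pct:5.2f}) [{low:.2f} - {high:.2f}]")
```

Running the same function over the per-run scores collected on the AWFY machines would give numbers directly comparable to the local ones.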

There is definitely high variation in Box2D and Gameboy, but nothing close to what AWFY displays. Also, PdfJS has a fairly low standard deviation here, yet shows up more prominently on the AWFY charts.
Attached file AWFY pdf
Something else I noticed: the overall variation is correlated between the different benches. The attachment is a PDF of AWFY with the relevant graphs moved to be the top 3 (clumsily done with a DOM editor and print-to-PDF).

Those peaks definitely correlate.  So whatever is happening on AWFY is affecting the engine for the entire run, not just bad luck on some particular benchmark.
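
The correlation is judged by eye from the graphs, but it could be quantified if the per-run scores were exported. A rough sketch, assuming two equal-length lists of scores from the same sequence of runs; the `pearson` helper and the sample numbers are hypothetical, not AWFY data:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for two benchmarks over the same five runs.
bench_a = [100.0, 80.0, 101.0, 79.0, 100.0]
bench_b = [200.0, 150.0, 205.0, 155.0, 198.0]
r = pearson(bench_a, bench_b)
print(f"correlation: {r:.3f}")
```

A coefficient near 1 across several benchmark pairs would support the idea that something affects the whole run rather than individual benchmarks.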
I happened to take a look at the graphs and noticed something that might be interesting. On AWFY v8real-Splay is also all over the place, whereas octane-Splay only has a couple spikes.
That is interesting, thanks for pointing it out. Looking around, AWFY ARM's Octane Splay breakdown shows instability for JM+TI, JM+TI+Ion, and BC+TI+Ion alike, but BC has tended to be more stable than the others (except for the last couple of runs).

I haven't had time to dig into this in a real way yet, but when I do, the three things that I'm thinking of focusing my attention on are: useCount behaviour, background compilation, and bailout behaviour.

Another thing would be to look into what is similar about the three benches that are displaying this behaviour.  Is there some particular feature or action (e.g. object allocation, typed arrays, GC activity, etc.) that they all have in common?
Depends on: 849526
Things look to be much more stable now. Bug 849526 and I think a few other optimizations might have largely fixed this. Could you re-run the same tests on the same machine as comment 1 for comparison, when you get the time?

Also, bug 849526 was compiler-specific. I find it odd that AWFY is primarily Mac with a couple of ARM machines. If compiler- or platform-specific bugs like that can pop up, wouldn't it be a good idea to replace the two older Mac machines in the list with Windows and Linux desktop boxes? (The Mac Mini graphs don't appear to have been updated since September.)
Out of the Octane tests, the main offenders in recent spikes appear to be CodeLoad, Gameboy and PdfJS, with some variance in Box2D on the same runs. Is the conservative stack scanner still implicated here?
Apparently Bug 753203 solved the problem.
Yup, this is a duplicate of bug 849526.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE