1373396 - Reduce variances for AWFY speedometer benchmark

Assignee

Description

•

7 years ago

I’ve run the benchmark with different configurations:

Build                                                        1st run  2nd run  3rd run
Nightly (inbound build - non-PGO - AWFY/proxy)               60.84    54.75    64.02
Nightly (inbound build - non-PGO)                            62.44    62.26    62.23
Nightly (inbound build - non-PGO) with config changes [1]    64.94    63.93    62.77
Canary                                                       103.50

These numbers were obtained on the Quantum reference laptop. I restarted the browser between runs to simulate conditions similar to AWFY.

[1] https://github.com/mozilla/arewefastyet/blob/master/slave/configs.py#L20-L22
Setting JSGC_DISABLE_POISONING=1 is required in addition to the changes to about:config. How can I verify that manually setting the env variable and then starting the browser worked as expected?

I believe the scores indicate that running speedometer via AWFY adds unwanted score variance. I also believe that running with the configuration recommendations from the following site improves the scores; however, it also increases variance a bit:
https://developer.mozilla.org/en-US/docs/Mozilla/Benchmarking

WARNING: Three runs are not enough to say with confidence that all my deductions are correct.

At the moment I have a few ideas to investigate to reduce the current variance with AWFY:
1) Read closely what the automation code does
2) Run the harness within a RAM disk
   * I wonder if the proxy is slowing us down
3) Switch AWFY to use the real website
   * So far it’s proven to yield better results
4) Set up the speedometer site on a host we control
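To put a number on "variance" for the three Nightly configurations reported above, here is a minimal sketch using only the Python standard library. The scores are copied from this description; the shortened labels and the `summarize` helper are illustrative names, not part of AWFY.

```python
from statistics import mean, stdev

# Scores copied from the description; labels are shortened for display.
runs = {
    "non-PGO, AWFY/proxy": [60.84, 54.75, 64.02],
    "non-PGO": [62.44, 62.26, 62.23],
    "non-PGO + config changes": [64.94, 63.93, 62.77],
}


def summarize(scores):
    """Return (mean, sample standard deviation, max - min) for one configuration."""
    return mean(scores), stdev(scores), max(scores) - min(scores)


summary = {label: summarize(scores) for label, scores in runs.items()}

for label, (avg, sd, spread) in summary.items():
    print(f"{label:26s} mean={avg:6.2f} stdev={sd:5.2f} range={spread:5.2f}")
```

With only three runs per configuration these figures are indicative at best, as the warning above already says.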

Joel Maher ( :jmaher ) (UTC -8)

Comment 1

•

7 years ago

We need to decide whether to set up AWFY automation to use environment variables and prefs, similar to what we do with other unit tests and perf tests. I am leaning towards the fewer the better, as we would be comparing against the default values of other browsers, but the numbers need to be reliable.

Whiteboard: [PI:June]

Armen [:armenzg]

Assignee

Comment 2

•

7 years ago

It seems that the score on the Quantum reference laptop actually hits a consistent score of 51: https://arewefastyet.com/#machine=36&view=single&suite=speedometer-misc&subtest=score I need to determine why my reference laptop has a score of about 60. Trying to determine this over email with Sean.

Armen [:armenzg]

Assignee

Comment 3

•

7 years ago

The latest hypothesis is that running speedometer together with all the other benchmarks can cause the machine to overheat, in which case the machine may reduce the CPU frequency to bring the heat down. This could account for the variance.

Running all benchmarks on my reference laptop (scores in the 50s match automation):
1. 55.98
2. 63.84
3. 58.74 (run right after the previous run)
4. 66.40 (today; after machine had been off all night)
5. 66.20

I've thought of outputting CPU frequencies (and temperature if possible) at the end of each run to determine whether there's a correlation between low scores and low CPU frequencies. I was hoping to use psutil; unfortunately, it does not work under Cygwin.
https://github.com/giampaolo/psutil/issues/82
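For reference, recording the CPU frequency next to each score could look roughly like the sketch below. It assumes a Python where psutil imports and `psutil.cpu_freq()` is supported, which per the issue linked above is not the case under Cygwin, so it falls back to `None` there. `sample_cpu_freq` and `log_run` are hypothetical helpers, not part of the AWFY harness.

```python
import time

try:
    import psutil  # third-party; may be missing or unsupported (e.g. under Cygwin)
except ImportError:
    psutil = None


def sample_cpu_freq():
    """Return the current CPU frequency reading as a dict, or None if unavailable."""
    if psutil is None or not hasattr(psutil, "cpu_freq"):
        return None
    try:
        freq = psutil.cpu_freq()
    except Exception:
        # Some platforms raise instead of returning None.
        return None
    if freq is None:
        return None
    return {"current_mhz": freq.current, "min_mhz": freq.min, "max_mhz": freq.max}


def log_run(benchmark, score):
    """Build one log record pairing a benchmark score with the CPU frequency."""
    return {
        "timestamp": time.time(),
        "benchmark": benchmark,
        "score": score,
        "cpu_freq": sample_cpu_freq(),
    }


# Example with the first score listed above.
record = log_run("speedometer-misc", 55.98)
print(record)
```

Temperature is left out of the sketch: as far as I know, `psutil.sensors_temperatures()` is only available on Linux and FreeBSD, so it would not help on a Windows reference laptop.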

Armen [:armenzg]

Assignee

Comment 4

•

7 years ago

My current hypothesis, which has given 3 positive results in a row, is that the preference "shut off display after 15 minutes" is set on the production machine.
* 56.06
* 55.49
* 55.51

All benchmarks against opt m-i (like production). I'm now narrowing it down to just speedometer-misc to see if I can gather data points faster.

Armen [:armenzg]

Assignee

Comment 5

•

7 years ago

If I run dromaeo and speedometer together I also hit the reduced score (speedometer finishes in less than 15 minutes, so I added dromaeo). Running speedometer by itself with a 15-minute timer does *not* show the reduced score. Running speedometer by itself with a 5-minute timer *does* show the reduced score. I would like to change the setting on machine 16. I'm asking where to announce this change.

Armen [:armenzg]

Assignee

Comment 6

•

7 years ago

I've changed the setting on the machine. I had reached out to the Quantum team before doing so. I will wait for official results before closing this.
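This comment does not say how the display setting was changed. One way to script it on Windows, as a sketch only, is the `powercfg /change monitor-timeout-ac` and `monitor-timeout-dc` options, where a value of 0 means the display is never turned off. The helper names below are hypothetical, and the commands are only executed when `dry_run` is false and Python reports a native Windows platform (a Cygwin Python reports a different platform name and would only print them).

```python
import platform
import subprocess


def display_timeout_command(minutes=0, on_battery=False):
    """Build a powercfg command that sets the display-off timeout (0 = never)."""
    setting = "monitor-timeout-dc" if on_battery else "monitor-timeout-ac"
    return ["powercfg", "/change", setting, str(minutes)]


def disable_display_timeout(dry_run=True):
    """Disable the display timeout on AC and battery; only executes on Windows."""
    commands = [display_timeout_command(0, on_battery=b) for b in (False, True)]
    if not dry_run and platform.system() == "Windows":
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


# Dry run: show what would be executed without touching the power plan.
commands = disable_display_timeout(dry_run=True)
for cmd in commands:
    print(" ".join(cmd))
```

Keeping the default as a dry run makes it possible to review the exact commands before applying them to a shared benchmarking machine.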

Chris Peterson [:cpeterson]

Updated

•

7 years ago

Blocks: Speedometer_V2

Armen [:armenzg]

Assignee

Comment 7

•

7 years ago

On automation, we're now hitting scores around 67 (non-PGO) instead of 56.

ehsan posted some results from his reference laptop here:
https://ehsanakhgari.org/blog/2017-06-23/quantum-flow-engineering-newsletter-14

A couple of differences from AWFY are that he runs it in full screen and did not apply the GC poisoning setting:
https://developer.mozilla.org/en-US/docs/Mozilla/Benchmarking

Running on my local machine:
AWFY    67.20 - with all perf changes - **not** maximized
benj.me 68.05 - clean profile - maximized
benj.me 69.47 - no GC poisoning but other perf changes - maximized
benj.me 70.53 - with all perf changes - maximized

I believe that if we manage to make AWFY run in full screen we would get those last 2-3 points. I'm looking for an answer on how to make Firefox run in full screen for automation:
https://groups.google.com/forum/#!topic/mozilla.dev.platform/w2AoQLeD-Ss

(no longer active)

Comment 8

•

7 years ago

FWIW I have been getting a variance of a few points locally for a few weeks now; I don't think that is unusual.

Armen [:armenzg]

Assignee

Comment 9

•

7 years ago

I've run a PGO build via AWFY after this PR [1] and got a score of 71.50. Automation with inbound builds is at 68.69.

STR:
python download.py --repo mozilla-central -o ~/repos/mozilla-central-pgo/ -c 64bit -b pgo
python execute.py -s remote -b remote.speedometer-misc -e ~/repos/mozilla-central-pgo/ -c default

[1] https://github.com/mozilla/arewefastyet/pull/134

Armen [:armenzg]

Assignee

Comment 10

•

7 years ago

We started running PGO twice in a row once a day:
https://arewefastyet.com/#machine=36&view=single&suite=speedometer-misc&subtest=score

We're getting satisfactory results.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Bob Clary [:bc] (inactive)

Updated

•

7 years ago

Component: General → AWFY

BMO Automation

Updated

•

5 years ago

Product: Testing → Testing Graveyard

Categories

(Testing Graveyard :: AWFY, enhancement)

Tracking

(Not tracked)

People

(Reporter: armenzg, Assigned: armenzg)

References

(Blocks 1 open bug)

Details

(Whiteboard: [PI:June])

Crash Data

Security

(public)
