Closed Bug 1373396 Opened 7 years ago Closed 7 years ago

Reduce variances for AWFY speedometer benchmark

Categories

(Testing Graveyard :: AWFY, enhancement)

Version: 3
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: armenzg)

References

(Blocks 1 open bug)

Details

(Whiteboard: [PI:June])

I’ve run the benchmark with different configurations:

             1st run    2nd run    3rd run
Nightly        60.84      54.75      64.02    (inbound build, non-PGO, AWFY/proxy)
Nightly        62.44      62.26      62.23    (inbound build, non-PGO)
Nightly        64.94      63.93      62.77    (inbound build, non-PGO, with config changes [1])
Canary        103.50

These numbers were obtained on the Quantum reference laptop.
I restarted the browser in between runs to simulate similar conditions to AWFY.

[1] https://github.com/mozilla/arewefastyet/blob/master/slave/configs.py#L20-L22
Setting JSGC_DISABLE_POISONING=1 is required in addition to the about:config changes.
How can I verify that manually setting the env and then starting the browser worked as expected?
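One way to make this deterministic (a minimal sketch; the install and profile paths are hypothetical) is to launch the browser from a small Python wrapper so the child process inherits a known environment:

  import os
  import subprocess

  # Hypothetical paths; adjust to the build and profile under test.
  FIREFOX = r"C:\builds\inbound\firefox.exe"
  PROFILE = r"C:\awfy-profile"

  env = os.environ.copy()
  env["JSGC_DISABLE_POISONING"] = "1"  # disable GC poisoning for benchmarking

  # The child inherits env, so the variable is set for this launch
  # regardless of what the interactive shell had.
  subprocess.Popen([FIREFOX, "-no-remote", "-profile", PROFILE], env=env)

To verify after the fact: on Linux one can read /proc/<pid>/environ; on Windows, Process Explorer shows a process's environment block in its properties dialog.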

I believe the scores indicate that running speedometer via AWFY adds unwanted score variance.
I also believe that running with the configuration recommendations from the following site improves the scores; however, it also increases variance a bit:
https://developer.mozilla.org/en-US/docs/Mozilla/Benchmarking

WARNING: Three runs are not enough to say with confidence that all my deductions are correct.

At the moment I have a few ideas to investigate for improving the current variance with AWFY:
1) Read closely what the automation code does
2) Run the harness within a RAM disk
   * I wonder if the proxy is slowing us down
3) Switch AWFY to use the real website
   * So far it’s proven to yield better results
4) Set up the speedometer site on a host we control
We need to decide whether to set up the AWFY automation to use environment variables and prefs similar to what we do with other unit tests and perf tests. I am leaning towards the fewer the better, since we would be comparing against other browsers' default values, but the numbers need to be reliable.
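If we do go with prefs, a minimal sketch of pinning them per profile (the two prefs below are common automation prefs, shown for illustration rather than the exact AWFY set from [1]):

  # Pin prefs in the test profile's user.js so every run starts identically.
  PREFS = {
      "browser.shell.checkDefaultBrowser": "false",  # skip the default-browser prompt
      "datareporting.healthreport.uploadEnabled": "false",  # quiet background telemetry
  }
  with open(r"C:\awfy-profile\user.js", "w") as f:
      for name, value in PREFS.items():
          f.write('user_pref("%s", %s);\n' % (name, value))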
Whiteboard: [PI:June]
It seems that the score on the Quantum reference laptop actually hits a consistent score of 51:
https://arewefastyet.com/#machine=36&view=single&suite=speedometer-misc&subtest=score

I need to determine why my reference laptop has a score of about 60.

Trying to determine this over email with Sean.
The latest hypothesis is that running speedometer as part of the full benchmark run can cause the machine to overheat, at which point it reduces the CPU frequency to shed heat. This could account for the variance.

Running all benchmarks on my reference laptop (scores in the 50s match automation):
1. 55.98
2. 63.84
3. 58.74 (run right after the previous run)
4. 66.40 (today; after machine had been off all night)
5. 66.20

I've thought of outputting CPU frequencies (and temperature, if possible) at the end of each run to determine if there's a correlation between low scores and low CPU frequencies.

I was hoping to use psutil; unfortunately, it does not work under Cygwin.
https://github.com/giampaolo/psutil/issues/82
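The logging I had in mind would be roughly the following; it assumes a native (non-Cygwin) Python with a recent psutil, per the issue above:

  import psutil

  def log_thermal_state(label):
      # cpu_freq() returns a namedtuple with current/min/max in MHz
      # (it may be None on platforms without support).
      freq = psutil.cpu_freq()
      if freq:
          print("%s: cpu frequency %.0f MHz" % (label, freq.current))
      # Temperature sensors are only exposed on some platforms (mostly Linux),
      # so fall back to an empty dict where the API is missing.
      temps = getattr(psutil, "sensors_temperatures", lambda: {})()
      for chip, readings in temps.items():
          for r in readings:
              print("  %s/%s: %.1f C" % (chip, r.label or "core", r.current))

  log_thermal_state("after speedometer run")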
My current hypothesis, which has produced 3 positive results in a row, is that the "turn off display after 15 minutes" setting on the production machine is responsible.
* 56.06
* 55.49
* 55.51

All benchmarks against opt m-i (like production).

I'm now narrowing it down to just speedometer-misc to see if I can gather data points faster.
If I run dromaeo and speedometer together I also hit the reduced score (speedometer finishes in less than 15 minutes, so I added dromaeo to push the total run time past the display timeout).

Running speedometer by itself with a 15-minute display timer does *not* show the reduced score.
Running speedometer by itself with a 5-minute display timer *does* show the reduced score.

I would like to change the setting on machine 16. I'm asking where to announce this change.
I've changed the setting on the machine. I had reached out to the Quantum team before doing so.
I will wait for official results before closing this.
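For the record, the change is equivalent to disabling the display timeout on AC power, e.g. from a harness script (powercfg ships with Windows; 0 means never):

  import subprocess

  # Set "turn off display after" to never while on AC power.
  subprocess.check_call(["powercfg", "/change", "monitor-timeout-ac", "0"])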
On automation, we're now hitting scores around 67 (non-PGO) instead of 56.

ehsan posted some results with his reference laptop in here:
https://ehsanakhgari.org/blog/2017-06-23/quantum-flow-engineering-newsletter-14

A couple of differences from AWFY are that he runs it in full screen and did not apply the GC poisoning change:
https://developer.mozilla.org/en-US/docs/Mozilla/Benchmarking

Running on my local machine:
AWFY     67.20 - with all perf changes - **not** maximized
benj.me  68.05 - clean profile - maximized
benj.me  69.47 - no GC poisoning but other perf changes - maximized
benj.me  70.53 - with all perf changes - maximized

I believe that if we manage to make AWFY run in fullscreen we would gain those last 2-3 points.

I'm still waiting for an answer on how to make Firefox run in full screen in automation:
https://groups.google.com/forum/#!topic/mozilla.dev.platform/w2AoQLeD-Ss
FWIW, I have been getting a variance of a few points locally for a few weeks now; I don't think that is unusual.
I've run a PGO build via AWFY after this PR [1] and got a score of 71.50.
Automation with inbound builds is at 68.69.

STR:
  python download.py --repo mozilla-central -o ~/repos/mozilla-central-pgo/ -c 64bit -b pgo
  python execute.py -s remote -b remote.speedometer-misc -e ~/repos/mozilla-central-pgo/ -c default

[1] https://github.com/mozilla/arewefastyet/pull/134
We've started running PGO builds twice in a row, once a day:
https://arewefastyet.com/#machine=36&view=single&suite=speedometer-misc&subtest=score

We're getting satisfactory results.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: General → AWFY
Product: Testing → Testing Graveyard