Closed Bug 1547135 Opened 6 months ago Closed 5 months ago

Android Raptor noise reduction investigation

Categories

(Testing :: Raptor, enhancement, P1)

Tracking

(firefox68 fixed)

RESOLVED FIXED
mozilla68

People

(Reporter: egao, Assigned: egao)

References

(Depends on 1 open bug, Blocks 1 open bug)

Related to bug #1525017 and bug #1536090, this bug tracks work to reduce the Android Raptor noise/jitter.

Android Raptor tests show a lot of jitter, and some test suites, such as tp6m-1 amazon, are particularly prone to significant variation between runs.

A document has been prepared that outlines the problem, the approaches taken, and the outcome of each approach. It can be viewed at https://docs.google.com/document/d/152G_x5HhsfQkI0V-q3aqRdpJKKrKDrhdfwnHI95KSXQ/edit#.

Any further work to implement permanent changes will be tracked against this bug, or children of this bug.

Priority: -- → P1

Related to this broader raptor noise bug (Bug 1502138)

Experiment of increasing the raptor delay between pageloads from 1 second to 10 seconds (android):

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c059749d487e8668fb006b1f247de5f34edc5897&newProject=try&newRevision=20258fcc8dd2d755c82cac08955595552db554dd&framework=10

I don't know if this would be practical in the lab, but there do appear to be some large perf wins and some regressions.

(In reply to Andrew Creskey from comment #4)

Your results, considered alongside the browser settle time, paint an interesting picture.

For some reason, letting the browser settle gives significantly worse results for some tests, such as Instagram, but it does reduce the overall noise significantly.

A similar pattern shows up in your run with the modified pageload delay: Instagram sees a noticeable regression in performance, but noise is reduced significantly. I almost wonder if there is something up with some of the sites.

Yes, I'm scratching my head at these results as well.

I did start a new baseline from the same revision as the one I was using.

As a sanity check, I need to compare the two baselines against each other.

Flags: needinfo?(acreskey)

(In reply to Andrew Creskey from comment #6)

I discussed the results with :jmaher today.

My tweaks have a positive impact on both pageload times and the noise metric, due to the increased and more stable performance levels.

For browser settle, pageload times for some tests do see a regression, sometimes noticeably.

With that said, it was noted that if our goal is to minimize the noise metric, then the browser settle time has a net positive impact, and the combination of browser settle time and my package of tweaks significantly improves both pageload times and the noise metric (against baseline).

Ultimately I think it is worth going forward with both the ADB tweaks and browser settle modifications. Since the performance team had been looking at the browser settle time prior to my work being started, the plan is for my patch to only contain ADB tweaks.

Thanks Edwin.

I'm looking forward to seeing how the ADB tweaks improve noise in our test environment.

In case you weren't aware, these are some techniques from Chromium, raised in the Automationeers Assemble talk last week, that they found helpful in reducing noise:

• Wait until Android battery temperature has cooled to a given level before starting tests
• Ensure screen is enabled during testing
• Always run all repeats of a given test on the same device since performance differs device to device
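
For illustration only, the first technique in the list above could be sketched roughly as follows. This is not Chromium's or Raptor's implementation. It assumes the temperature is read from the output of `adb shell dumpsys battery`, which reports it in tenths of a degree Celsius. The function names, the 35 °C threshold and the polling intervals are all hypothetical.

```python
import re
import time


def parse_battery_temp_celsius(dumpsys_output):
    """Extract the battery temperature from `dumpsys battery` text.

    Android reports the value in tenths of a degree Celsius,
    so "temperature: 312" means 31.2 degrees.
    """
    match = re.search(r"^\s*temperature:\s*(-?\d+)\s*$", dumpsys_output, re.MULTILINE)
    if match is None:
        raise ValueError("no temperature line found in dumpsys output")
    return int(match.group(1)) / 10.0


def wait_for_cool_battery(read_dumpsys, max_celsius=35.0, poll_seconds=30,
                          timeout_seconds=600, sleep=time.sleep):
    """Poll until the battery is at or below max_celsius.

    read_dumpsys is a caller-supplied function (hypothetical) that returns
    the text of `adb shell dumpsys battery`. The threshold and timings are
    assumptions, not values taken from this bug.
    Returns True once the device has cooled, False if the timeout expires.
    """
    waited = 0
    while True:
        if parse_battery_temp_celsius(read_dumpsys()) <= max_celsius:
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
```

A harness would call `wait_for_cool_battery` before each test and decide for itself whether a timeout should skip the run or merely be logged.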

FYI, this is a performance comparison of two pushes based on the same revision:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c059749d487e8668fb006b1f247de5f34edc5897&newProject=try&newRevision=2ff6be5092202f8b43b1757505a4c77a8c33ae15&framework=10

Outside of noise, the biggest standout is the ~180% change in raptor-tp6m-google-maps-geckoview opt.
That looks like something changed in our test infrastructure.

Flags: needinfo?(acreskey)

I suspect that 180% regression is a random outlier, as there is only 1 data point; this is why we typically run multiple times to see what the range is. Edwin has been looking at ~20 data points for each test/change in order to see the difference. Even 3 runs will give you a better idea; ideally 5+.
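
As a rough illustration of why a single data point says little, here is a minimal sketch that summarizes repeated runs of one test. It is not Perfherder's noise metric; it uses the sample standard deviation and the coefficient of variation as stand-ins, and `summarize_runs` is a hypothetical helper name.

```python
import statistics


def summarize_runs(values):
    """Summarize replicate results for one test.

    With a single value there is no spread to report, so stdev and
    cv_percent are None. That is exactly the one-data-point situation
    in which an outlier cannot be told apart from a real change.
    """
    if not values:
        raise ValueError("need at least one run")
    mean = statistics.mean(values)
    if len(values) < 2:
        return {"n": 1, "mean": mean, "stdev": None, "cv_percent": None}
    stdev = statistics.stdev(values)  # sample standard deviation
    cv_percent = 100.0 * stdev / mean if mean else None
    return {"n": len(values), "mean": mean, "stdev": stdev, "cv_percent": cv_percent}
```

Comparing `cv_percent` between the baseline push and the new push, each retriggered several times, gives a first impression of whether a difference is larger than the run-to-run spread.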

Regarding the need to use the same device: I agree. :bc has done a great job in the past of dedicating specific devices to specific tests in previous versions of Android perf testing. We could switch back to that and assign a device to each test type, though there is a risk: if the device is offline, we get no results. In fact, doing something similar for desktop would be nice; maybe pools of 2-3 devices per test job would be more adequate.

Attachment #9063055 - Attachment description: Bug 1547135 - reduce Andoid tp6m test result jitter → Bug 1547135 - reduce tp6m test result jitter for Android (Pixel 2)
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/44f3132aaed8
reduce tp6m test result jitter for Android (Pixel 2) r=jmaher,acreskey,rwood

:igoldan, :bebe: I suspect this may cause baseline performance changes/improvements on android FYI

Flags: needinfo?(igoldan)
Flags: needinfo?(fstrugariu)

I was looking at Android Speedometer scores for autoland and m-c (for changes from AArch64 PGO bug 1543215) and see this bug's change does appear to have impacted Speedometer results. The autoland results, in particular, look noisier than before.

https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=mozilla-central,1776166,1,10&series=mozilla-central,2008594,1,10&series=mozilla-central,1955058,1,10&series=autoland,1795984,1,10&series=autoland,2008165,1,10&selected=autoland,1795984,471903,805392654,10

(In reply to Chris Peterson [:cpeterson] from comment #14)

Interesting. In my testing before this patch landed (https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3b746ffb5d132ce80e442ac59a48d4ebd0776b0d&newProject=try&newRevision=24358abe9cc59a6283229dbcc95415bfaf586ca7&framework=10), some tests had execution time and/or variance go up slightly, but your perfherder data shows otherwise.

I'm in the process of executing 25 runs of each tp6m test on this revision: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&searchStr=tp6m%2Candroid&revision=e63ac3728fe5c14ba4e092408e7a144911228b8c&selectedJob=245678138 which is on try. This is the final revision in this series, which bundles tweaks for both Google Pixel 2 and Motorola G5. The results from this revision should be directly comparable to your perfherder data.

Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

I confirm the harness update
== Change summary for alert #20892 (as of Fri, 10 May 2019 00:43:32 GMT) ==

Improvements:

46% raptor-tp6m-instagram-geckoview-cold fcp android-hw-p2-8-0-android-aarch64 opt 494.67 -> 267.42
45% raptor-tp6m-instagram-geckoview-cold fcp android-hw-p2-8-0-android-aarch64 pgo 477.17 -> 262.67
39% raptor-tp6m-facebook-geckoview-cold fcp android-hw-p2-8-0-android-aarch64 opt 1,290.23 -> 787.00
37% raptor-tp6m-instagram-geckoview-cold android-hw-p2-8-0-android-aarch64 opt 780.54 -> 492.56
36% raptor-tp6m-instagram-geckoview-cold android-hw-p2-8-0-android-aarch64 pgo 749.24 -> 476.83
34% raptor-tp6m-facebook-geckoview-cold android-hw-p2-8-0-android-aarch64 opt 1,565.60 -> 1,029.28
30% raptor-tp6m-amazon-geckoview-cold fcp android-hw-p2-8-0-android-aarch64 pgo 719.54 -> 505.58
30% raptor-tp6m-bing-geckoview-cold android-hw-p2-8-0-android-aarch64 opt 364.24 -> 256.34
29% raptor-tp6m-bing-geckoview-cold android-hw-p2-8-0-android-aarch64 pgo 350.22 -> 247.44
29% raptor-tp6m-amazon-geckoview-cold android-hw-p2-8-0-android-aarch64 pgo 733.70 -> 520.71
29% raptor-tp6m-bing-geckoview-cold loadtime android-hw-p2-8-0-android-aarch64 opt 380.17 -> 271.17
28% raptor-tp6m-google-geckoview-cold android-hw-p2-8-0-android-aarch64 pgo 396.36 -> 284.59
28% raptor-tp6m-bing-geckoview-cold loadtime android-hw-p2-8-0-android-aarch64 pgo 365.25 -> 262.83
28% raptor-tp6m-google-geckoview-cold fcp android-hw-p2-8-0-android-aarch64 pgo 377.58 -> 273.33
27% raptor-tp6m-bing-geckoview-cold fcp android-hw-p2-8-0-android-aarch64 opt 400.33 -> 292.25
27% raptor-tp6m-bing-geckoview-cold fcp android-hw-p2-8-0-android-aarch64 pgo 387.08 -> 283.00
27% raptor-tp6m-google-geckoview-cold android-hw-p2-8-0-android-aarch64 opt 402.54 -> 295.29
26% raptor-tp6m-amazon-geckoview-cold loadtime android-hw-p2-8-0-android-aarch64 pgo 981.96 -> 726.08
26% raptor-tp6m-youtube-geckoview-cold android-hw-p2-8-0-android-aarch64 pgo 541.01 -> 400.62
26% raptor-tp6m-google-geckoview-cold fcp android-hw-p2-8-0-android-aarch64 opt 379.83 -> 281.42
25% raptor-tp6m-amazon-geckoview-cold fcp android-hw-p2-8-0-android-aarch64 opt 715.96 -> 540.12
24% raptor-tp6m-youtube-geckoview-cold fcp android-hw-p2-8-0-android-aarch64 pgo 558.25 -> 424.08
24% raptor-tp6m-bing-geckoview loadtime android-hw-p2-8-0-android-aarch64 pgo 105.81 -> 80.75
23% raptor-tp6m-amazon-geckoview-cold android-hw-p2-8-0-android-aarch64 opt 731.79 -> 560.12
23% raptor-tp6m-youtube-geckoview-cold android-hw-p2-8-0-android-aarch64 opt 555.01 -> 426.57
23% raptor-tp6m-bing-geckoview android-hw-p2-8-0-android-aarch64 pgo 111.07 -> 85.70
22% raptor-tp6m-facebook-geckoview-cold loadtime android-hw-p2-8-0-android-aarch64 opt 2,528.46 -> 1,972.17
21% raptor-tp6m-youtube-geckoview-cold fcp android-hw-p2-8-0-android-aarch64 opt 564.54 -> 445.42
21% raptor-tp6m-bing-geckoview fcp android-hw-p2-8-0-android-aarch64 pgo 129.15 -> 102.29
20% raptor-tp6m-google-geckoview-cold loadtime android-hw-p2-8-0-android-aarch64 pgo 664.25 -> 531.33
20% raptor-tp6m-wikipedia-geckoview android-hw-p2-8-0-android-aarch64 opt 151.05 -> 121.10
19% raptor-tp6m-wikipedia-geckoview fcp android-hw-p2-8-0-android-aarch64 opt 156.25 -> 125.79
19% raptor-tp6m-reddit-geckoview fcp android-hw-p2-8-0-android-aarch64 opt 163.58 -> 132.33
19% raptor-tp6m-bing-geckoview loadtime android-hw-p2-8-0-android-aarch64 opt 108.40 -> 87.88
19% raptor-tp6m-reddit-geckoview fcp android-hw-p2-8-0-android-aarch64 pgo 164.19 -> 133.12
19% raptor-tp6m-bing-restaurants-geckoview android-hw-p2-8-0-android-aarch64 pgo 147.06 -> 119.37
19% raptor-tp6m-google-geckoview-cold loadtime android-hw-p2-8-0-android-aarch64 opt 682.79 -> 554.75
19% raptor-tp6m-wikipedia-geckoview android-hw-p2-8-0-android-aarch64 pgo 143.68 -> 116.97
18% raptor-tp6m-instagram-geckoview fcp android-hw-p2-8-0-android-aarch64 pgo 131.54 -> 107.25
18% raptor-tp6m-wikipedia-geckoview loadtime android-hw-p2-8-0-android-aarch64 opt 163.12 -> 133.00
18% raptor-tp6m-bing-geckoview android-hw-p2-8-0-android-aarch64 opt 112.68 -> 91.89
18% raptor-tp6m-wikipedia-geckoview fcp android-hw-p2-8-0-android-aarch64 pgo 149.88 -> 122.33
18% raptor-tp6m-bing-restaurants-geckoview fcp android-hw-p2-8-0-android-aarch64 pgo 168.25 -> 137.96
18% raptor-tp6m-wikipedia-geckoview loadtime android-hw-p2-8-0-android-aarch64 pgo 156.56 -> 128.46
17% raptor-tp6m-bing-geckoview fcp android-hw-p2-8-0-android-aarch64 opt 130.31 -> 107.92
16% raptor-tp6m-bing-restaurants-geckoview loadtime android-hw-p2-8-0-android-aarch64 pgo 161.67 -> 136.12
16% raptor-tp6m-facebook-geckoview-cold fcp android-hw-g5-7-0-arm7-api-16 opt 1,849.50 -> 1,557.96
15% raptor-tp6m-bing-restaurants-geckoview android-hw-p2-8-0-android-aarch64 opt 149.66 -> 126.66
15% raptor-tp6m-instagram-geckoview-cold fcp android-hw-g5-7-0-arm7-api-16 opt 702.08 -> 594.67
15% raptor-tp6m-bing-restaurants-geckoview fcp android-hw-p2-8-0-android-aarch64 opt 170.58 -> 145.67
14% raptor-tp6m-youtube-geckoview-cold loadtime android-hw-p2-8-0-android-aarch64 opt 880.42 -> 752.92
14% raptor-tp6m-facebook-geckoview-cold android-hw-g5-7-0-arm7-api-16 opt 2,290.97 -> 1,972.62
14% raptor-tp6m-stackoverflow-geckoview android-hw-p2-8-0-android-aarch64 pgo 252.67 -> 217.73
14% raptor-tp6m-stackoverflow-geckoview android-hw-p2-8-0-android-aarch64 opt 266.44 -> 230.18
14% raptor-tp6m-bing-restaurants-geckoview loadtime android-hw-p2-8-0-android-aarch64 opt 166.52 -> 144.00
13% raptor-tp6m-instagram-geckoview fcp android-hw-p2-8-0-android-aarch64 opt 131.08 -> 113.42
13% raptor-tp6m-stackoverflow-geckoview fcp android-hw-p2-8-0-android-aarch64 pgo 248.29 -> 214.92
13% raptor-tp6m-stackoverflow-geckoview fcp android-hw-p2-8-0-android-aarch64 opt 261.69 -> 227.00
13% raptor-tp6m-instagram-geckoview-cold android-hw-g5-7-0-arm7-api-16 opt 1,127.27 -> 984.55
13% raptor-tp6m-stackoverflow-geckoview loadtime android-hw-p2-8-0-android-aarch64 pgo 373.75 -> 326.75
13% raptor-tp6m-stackoverflow-geckoview loadtime android-hw-p2-8-0-android-aarch64 opt 392.98 -> 343.54
13% raptor-tp6m-ebay-kleinanzeigen-geckoview fcp android-hw-p2-8-0-android-aarch64 pgo 258.38 -> 226.08
12% raptor-tp6m-instagram-geckoview-cold loadtime android-hw-p2-8-0-android-aarch64 opt 2,551.00 -> 2,238.83
11% raptor-tp6m-ebay-kleinanzeigen-geckoview fcp android-hw-p2-8-0-android-aarch64 opt 270.48 -> 239.46
10% raptor-tp6m-facebook-geckoview-cold loadtime android-hw-g5-7-0-arm7-api-16 opt 3,713.42 -> 3,326.42
10% raptor-tp6m-ebay-kleinanzeigen-geckoview android-hw-p2-8-0-android-aarch64 pgo 361.52 -> 324.68
10% raptor-tp6m-youtube-watch-geckoview fcp android-hw-p2-8-0-android-aarch64 opt 191.12 -> 172.33
9% raptor-tp6m-ebay-kleinanzeigen-geckoview android-hw-p2-8-0-android-aarch64 opt 378.31 -> 342.57
6% raptor-tp6m-instagram-geckoview loadtime android-hw-p2-8-0-android-aarch64 pgo 1,066.04 -> 997.21
6% raptor-tp6m-bbc-geckoview fcp android-hw-p2-8-0-android-aarch64 pgo 354.35 -> 332.67
6% raptor-tp6m-youtube-watch-geckoview loadtime android-hw-p2-8-0-android-aarch64 opt 502.67 -> 473.12
5% raptor-tp6m-instagram-geckoview loadtime android-hw-p2-8-0-android-aarch64 opt 1,135.75 -> 1,080.83
5% raptor-speedometer-geckoview android-hw-p2-8-0-android-aarch64 pgo 24.53 -> 25.66
4% raptor-speedometer-geckoview android-hw-p2-8-0-android-aarch64 opt 22.04 -> 23.01

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=20892

Flags: needinfo?(igoldan)
Flags: needinfo?(fstrugariu)

I didn't realize this, but the calls to set the no-op scheduler require root on the Android device; otherwise Raptor fails:

15:44:49    ERROR -  Traceback (most recent call last):
15:44:49     INFO -    File "/Users/acreskey/dev/src/mozilla-central/testing/raptor/raptor/raptor.py", line 1211, in <module>
15:44:49     INFO -      main()
15:44:49     INFO -    File "/Users/acreskey/dev/src/mozilla-central/testing/raptor/raptor/raptor.py", line 1169, in main
15:44:49     INFO -      raptor.tune_performance()
15:44:49     INFO -    File "/Users/acreskey/dev/src/mozilla-central/testing/raptor/raptor/raptor.py", line 675, in tune_performance
15:44:49     INFO -      self.set_scheduler()
15:44:49     INFO -    File "/Users/acreskey/dev/src/mozilla-central/testing/raptor/raptor/raptor.py", line 703, in set_scheduler
15:44:49     INFO -      self._set_value_and_check_exitcode(scheduler_location, 'noop')
15:44:49     INFO -    File "/Users/acreskey/dev/src/mozilla-central/testing/raptor/raptor/raptor.py", line 689, in _set_value_and_check_exitcode
15:44:49     INFO -      process = self.device.shell(' '.join(['echo', str(value), '>', str(file_name)]), root=True)
15:44:49     INFO -    File "/Users/acreskey/dev/src/build/obj-release/testing/raptor-venv/lib/python2.7/site-packages/mozdevice/adb.py", line 1352, in shell
15:44:49     INFO -      raise ADBRootError('Can not run command %s as root!' % cmd)
15:44:49     INFO -  mozdevice.adb.ADBRootError: Can not run command echo noop > /sys/block/sda/queue/scheduler as root!
15:44:49    ERROR - Return code: 1

Raptor is used by developers on their local hardware and, prior to this change, it did not require rooted devices.

So what if we made the set_scheduler() command fail gracefully without root?

(In reply to Andrew Creskey from comment #18)

Interesting. I had thought the requirement for root was accounted for here:
https://searchfox.org/mozilla-central/source/testing/raptor/raptor/raptor.py#692

if (self.device._have_su or self.device._have_android_su):

I had misread my own code - somehow the scheduler tuning method ended up outside of the if/else statement meant to skip unrooted devices. I've created a bug to address this.
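
To make the intended behaviour concrete, here is a minimal sketch of keeping the scheduler tweak inside the root check. It is not the actual patch: `FakeDevice` is a hypothetical stand-in for the mozdevice object that mimics only the two flags and the `shell(..., root=True)` call seen above, and the functions are simplified to take the device as an argument.

```python
import logging

LOG = logging.getLogger("raptor-sketch")

# Path taken from the traceback earlier in this bug.
SCHEDULER_PATH = "/sys/block/sda/queue/scheduler"


class FakeDevice:
    """Hypothetical stand-in for the real device object (illustration only)."""

    def __init__(self, rooted):
        self._have_su = rooted
        self._have_android_su = False
        self.commands = []

    def shell(self, cmd, root=False):
        if root and not (self._have_su or self._have_android_su):
            raise RuntimeError("Can not run command %s as root!" % cmd)
        self.commands.append(cmd)


def set_scheduler(device):
    """Switch the I/O scheduler to noop; requires root."""
    device.shell("echo noop > %s" % SCHEDULER_PATH, root=True)


def tune_performance(device):
    """Apply root-only tweaks when possible; skip them on unrooted devices.

    Returns True if the tweaks were applied, False if they were skipped.
    """
    if not (device._have_su or device._have_android_su):
        LOG.info("device is not rooted; skipping performance tuning")
        return False
    # Every root-only call stays inside this branch.
    set_scheduler(device)
    return True
```

With this shape, an unrooted developer device logs a message and continues with the test run instead of aborting with an error.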

egao: I notice that we never invoke disable_animations -- oversight? See https://searchfox.org/mozilla-central/search?q=disable_animations&case=false&regexp=false&path=*.py.

Flags: needinfo?(egao)

:nalexander - it is not an oversight. I wrote the method but chose at the last minute not to invoke it, due to concerns about restoring the device to its initial state after a crash. Animation settings are persistent (unlike the other tweaks, such as CPU frequency), so there is no guarantee that the device will be restored to animation=100.0 if it crashes.

Investigating whether disabling animations even makes a difference was one of the things on my plate, and I have not gotten around to it. Given that it is easy to re-implement, this method can be removed for the time being.
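
If the method is revisited, one possible shape is a save-and-restore wrapper, sketched below under several assumptions. It uses the Android `settings` shell command with the three global animation scale keys, `run_shell` is a hypothetical callable that runs a command on the device and returns its output, and the cleanup only helps when the harness itself survives long enough to run it. A device that reboots mid-test would still need a separate restore step.

```python
from contextlib import contextmanager

# Global settings keys that control Android UI animations.
ANIMATION_KEYS = (
    "window_animation_scale",
    "transition_animation_scale",
    "animator_duration_scale",
)


@contextmanager
def animations_disabled(run_shell):
    """Disable animations for the duration of a block, then restore them.

    run_shell is a caller-supplied function (hypothetical) that executes a
    shell command on the device and returns its standard output as text.
    """
    # Remember whatever the device had before we touch anything.
    saved = {
        key: run_shell("settings get global %s" % key).strip()
        for key in ANIMATION_KEYS
    }
    try:
        for key in ANIMATION_KEYS:
            run_shell("settings put global %s 0" % key)
        yield
    finally:
        # Runs even if the test body raises, so the harness does not leave
        # the device with animations switched off.
        for key, value in saved.items():
            if value and value != "null":
                run_shell("settings put global %s %s" % (key, value))
            else:
                # The key was unset before; remove it rather than guess a value.
                run_shell("settings delete global %s" % key)
```

Saving the previous values rather than hard-coding a default avoids having to know what the device's initial animation setting was.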

Flags: needinfo?(egao)