Closed Bug 1713238 Opened 3 years ago Closed 2 years ago

fenix perftest VIEW test has bimodal replicates running locally

Categories

(Testing :: mozperftest, defect, P2)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: mcomella, Unassigned)

Details

Attachments

(4 files)

Locally, I ran 10 test suites with 15 iterations each and another 10 test suites with 25 iterations each. Both sets of replicates were bimodal: half the results clustered around one value and the other half clustered around another (see the attached screenshots). I also attached a screenshot of results from a script that replicates perftest but doesn't display the bimodal behavior. The raw data and light analysis from these tests can be found on this sheet.
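For anyone who wants to check their own replicates for this kind of grouping, here is a minimal sketch. It is not how the sheet was analyzed: it simply splits the sorted values at the widest gap and compares that gap to the spread inside each group. The function names, the `ratio` and `min_share` thresholds, and the sample numbers in the usage note are all made up for illustration.

```python
def split_at_largest_gap(values):
    """Split the sorted values into two groups at the widest gap between neighbours."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


def looks_bimodal(values, ratio=3.0, min_share=0.2):
    """Heuristic: two sizeable groups separated by a gap much wider than either group.

    ratio and min_share are arbitrary illustrative thresholds, not tuned values.
    """
    low, high = split_at_largest_gap(values)
    if min(len(low), len(high)) / len(values) < min_share:
        return False  # a few stragglers look like outliers, not a second mode
    spread = max(low[-1] - low[0], high[-1] - high[0])
    gap = high[0] - low[-1]
    return gap > ratio * spread if spread > 0 else gap > 0
```

Usage, with invented millisecond values: `looks_bimodal([1510, 1498, 1502, 1505, 1795, 1802, 1790, 1808])` returns `True`, while an evenly spread run such as `[1500, 1510, 1520, 1530, 1540, 1550]` returns `False`. A histogram of the replicates is still the better first check; this only gives a quick yes/no.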

We've been experiencing noise in perftest results: perhaps this bimodality is the cause, and the replicates themselves are not that noisy.

For my test, I used a local fenix nightly build on 72ac23ddb9f219355885ddf24ce97b357871db2f and mozilla-central 4db2640cb0cd.

Here are some ideas from acreskey and sparky about how to debug this issue.

acreskey suggested that start up profiling during a mozperftest run could help us understand the problem: that's bug 1664857, which contains some approaches/workarounds.

sparky suggested trying my new script without "pre-warming" the app to ensure we don't hit a first run (he wasn't sure if perftest does this and my script does). He also suggested completely removing prefs from browsertime (as my script doesn't change any prefs).

acreskey referenced previous bimodal results as another possible insight (emphasis mine):

The other bimodal results were for pageload.
The biggest cause is that the definition of what's in the loadgroup for the document can be expanded by early JS calls. If the calls run late, the loadgroup stays smaller.
Mozperftest startup tests don't do any conditioning, so it may be something that's in the startup flow.
We had tried the conditioned profiles, but they are too problematic.
Running with a profile that's conditioned on that device, just before the run, could shed some light on this.

Attached are the arguments I submit to mach perftest.

My new script can be found in PR form: https://github.com/mozilla-mobile/fenix/pull/19659 It's subject to change upon review.

Could this be related to having the CPU clock locked? I vaguely remember the results of perftest being inconsistent when running with locked clocks locally. To test, you can run my standalone tools/measure_start_up.py script in the fenix repo or something simpler like this: https://medium.com/androiddevelopers/testing-app-startup-performance-36169c27ee55 (that my script is based on).
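For the "something simpler" route, a minimal sketch of timing a VIEW start with `adb shell am start -W` and reading the `TotalTime` line it prints. This is not the contents of measure_start_up.py. The package name `org.mozilla.fenix`, the URL, and both function names are assumptions for illustration, and the sketch does nothing about CPU clocks: lock or unlock them separately before comparing runs.

```python
import re
import subprocess

# `am start -W` prints a line such as "TotalTime: 1234" (milliseconds).
TOTAL_TIME = re.compile(r"^TotalTime:\s*(\d+)\s*$", re.MULTILINE)


def parse_total_time_ms(am_output):
    """Extract the TotalTime value, in milliseconds, from `am start -W` output."""
    match = TOTAL_TIME.search(am_output)
    if match is None:
        raise ValueError("no TotalTime line in am output")
    return int(match.group(1))


def measure_view_start_ms(package="org.mozilla.fenix", url="https://example.com"):
    """Force-stop the app, then time one VIEW start on the connected device.

    The package and URL defaults are assumptions; adjust them for your build.
    """
    subprocess.run(["adb", "shell", "am", "force-stop", package], check=True)
    result = subprocess.run(
        ["adb", "shell", "am", "start", "-W",
         "-a", "android.intent.action.VIEW", "-d", url, package],
        check=True, capture_output=True, text=True,
    )
    return parse_total_time_ms(result.stdout)
```

Calling `measure_view_start_ms()` in a loop, once with clocks locked and once without, would show whether the two groups appear in only one of the configurations.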

Priority: -- → P2
Whiteboard: [perftest:triage]

:mcomella, you could try playing with the setup we have here to see if that's the cause: https://searchfox.org/mozilla-central/source/python/mozperftest/mozperftest/system/android_perf_tuner.py#89

Severity: -- → S3
Whiteboard: [perftest:triage]

We've also seen bimodal results in our other fenix start up VIEW benchmark (https://github.com/mozilla-mobile/fenix/issues/22144), so this could be a behavior in fenix that the test framework is uncovering.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX