Closed Bug 1649511 Opened 4 years ago Closed 8 months ago

Verify that performance tuning on G5/P2 is still working as expected

Categories

(Testing :: Performance, task, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: sparky, Unassigned)

This bug is for verifying whether the performance tuning we do on P2/G5 devices is still helping us reduce noise and, if not, finding a way to fix it.

From :egao in bug 1646368:
(In reply to Edwin Takahashi (:egao) from comment #7)

Yes, I did implement the tuning features in Raptor suites, though I admit I am not surprised that Motorola G5 units are not behaving as well as they should be.

I tuned the Pixel 2 using local hardware I had on hand, so I was able to experiment quickly, and where integer values were required (e.g. frequency) I could get them quickly.

I did not have a Motorola G5 on hand, so I had to try them on tryserver and this was a pain.

Let's evaluate whether the perf tuning is still meeting our requirements. Is there a way to see the performance trendline for applink on both fenix and fennec going back a year? If not, let's run 20+ instances of a handful of suites and compare the variants with and without the tuning.

One possible explanation for the worse results is that, while device tuning worked well at this time last year, it caused extra wear and tear on the devices. The other possible explanation is that fenix is much lighter and better behaved than fennec, so it requires less fiddling with the hardware.

:egao, I'll get some data and make some graphs of what we're seeing with tuning on and off so we can see the effects of tuning at the moment.

This is very interesting! I did a run with amazon through browsertime on Fenix and I found that (for this test at least) the G5 tuning helped reduce variance significantly on average (p < 0.05)* and the same was true for P2. From the figures below, you can see that many pageload/vismet subtests are affected by this tuning, and some much more than others (these tests used ~150 trials/replicates).

Standard deviation differences and Levene's test p-values for warm: https://mozilla.modular.im/_matrix/media/r0/download/mozilla.modular.im/138acb09ba6ec6648c266a54d35ca1065d73f2b4
Standard deviation differences and Levene's test p-values for cold: https://mozilla.modular.im/_matrix/media/r0/download/mozilla.modular.im/6a0d50ea33fc1516c7f9dc438841237b2642d504
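
For anyone wanting to run a similar variance comparison, here is a minimal sketch. It is not the script that produced the figures above. It computes the median-centered (Brown-Forsythe) form of Levene's W statistic using only the Python standard library and, because the standard library has no F distribution, it swaps the usual F-based p-value for a permutation p-value. The function names, the synthetic "untuned"/"tuned" samples, and the permutation count are all assumptions made for illustration.

```python
import random
import statistics


def levene_w(*groups):
    """Median-centered (Brown-Forsythe) form of Levene's W statistic."""
    # Absolute deviation of each observation from its own group's median.
    devs = []
    for g in groups:
        med = statistics.median(g)
        devs.append([abs(x - med) for x in g])
    k = len(devs)
    n = sum(len(d) for d in devs)
    grand_mean = sum(sum(d) for d in devs) / n
    group_means = [statistics.fmean(d) for d in devs]
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(devs, group_means))
    within = sum((z - m) ** 2 for d, m in zip(devs, group_means) for z in d)
    if within == 0:
        # Degenerate case: every deviation within each group is identical.
        return float("inf") if between > 0 else 0.0
    return (n - k) / (k - 1) * between / within


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Approximate p-value: share of label shuffles with W >= the observed W."""
    rng = random.Random(seed)
    observed = levene_w(a, b)
    # Center each group on its median so that only spread, not location,
    # differs between the pooled values being shuffled.
    med_a, med_b = statistics.median(a), statistics.median(b)
    pooled = [x - med_a for x in a] + [x - med_b for x in b]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if levene_w(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic stand-ins for ~150 replicates each (hypothetical values, not CI data).
data_rng = random.Random(42)
untuned = [data_rng.gauss(1000, 100) for _ in range(150)]
tuned = [data_rng.gauss(1000, 70) for _ in range(150)]
print("W =", levene_w(untuned, tuned))
print("p =", permutation_p_value(untuned, tuned))
```

If SciPy is available, `scipy.stats.levene` gives the same statistic with an F-based p-value and would normally be the simpler choice.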

Breakdown of the significant changes:

P2 - cold
Average noise diff: 0.7324947932242337
Significance: 0.006203328400687897
G5 - cold
Average noise diff: 0.9206374496589014
Significance: 0.04946482701654589

P2 - warm
Average noise diff: 0.6895326576007672
Significance: 0.01480210338650273
G5 - warm
Average noise diff: 0.8623252088414859
Significance: 0.0034517589585389393
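
As a quick arithmetic check, the sketch below converts the values above into percentage changes. It assumes each "Average noise diff" is a tuned/untuned noise ratio, which is an interpretation rather than something stated in the numbers themselves; `percent_change` is a hypothetical helper name.

```python
def percent_change(ratio):
    """Percentage change implied by a tuned/untuned ratio (negative = reduction)."""
    return (ratio - 1.0) * 100.0


# Values copied from the breakdown above; reading them as ratios is an assumption.
noise_ratio = {
    ("P2", "cold"): 0.7324947932242337,
    ("G5", "cold"): 0.9206374496589014,
    ("P2", "warm"): 0.6895326576007672,
    ("G5", "warm"): 0.8623252088414859,
}

for (device, mode), ratio in noise_ratio.items():
    print(f"{device} {mode}: {percent_change(ratio):+.1f}%")
# P2 cold: -26.8%
# G5 cold: -7.9%
# P2 warm: -31.0%
# G5 warm: -13.8%
```

Under that reading, P2 lands at roughly a 27-31% reduction and G5 at roughly 8-14%.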

So on pageload tests, P2 benefits the most from the tuning, with reductions in variance of ~30%. G5 sees a smaller reduction of ~10%. Note that outliers were removed before analysis, on the assumption that the data follow a Gaussian distribution.
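
The exact outlier rule is not spelled out here. One common rule that relies on a Gaussian assumption is to drop points more than a fixed number of standard deviations from the mean. The sketch below shows that rule only as an illustration: the `drop_outliers` name and the cutoff of 3 are assumptions, not necessarily what was used for this analysis.

```python
import statistics


def drop_outliers(samples, z_cutoff=3.0):
    """Keep points within z_cutoff sample standard deviations of the mean.

    The cutoff of 3.0 is a hypothetical default, not taken from this bug.
    """
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)
    if stdev == 0:
        return list(samples)
    return [x for x in samples if abs(x - mean) <= z_cutoff * stdev]


# 19 well-behaved replicates plus one extreme value (synthetic data).
replicates = [100 + i for i in range(-9, 10)] + [1000]
cleaned = drop_outliers(replicates)
print(len(replicates), "->", len(cleaned))
```

Note that a single extreme point inflates the standard deviation it is measured against, so with very few replicates this rule may never flag anything.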

Now on to the applink tests. These run on the same devices, so the noise level should be similar and the tuning should also decrease variance. Not so: we actually see a ~200% (p < 0.001)*** increase in variance with tuning on the G5, and a 70% (p < 0.001)*** reduction in variance on the P2. So P2 is still the most positively impacted by tuning, while G5 is adversely impacted.

Graph of data distribution for applink fenix: https://mozilla.modular.im/_matrix/media/r0/download/mozilla.modular.im/56973a992b0b99d4241b1ebd5b167eba6b6d0fe9

These results are in stark contrast to pageload tests and suggest that the tuning we do needs to be device-specific and also somewhat test-specific.

Greg, great analysis. I'm so glad you did this.

Should we attempt to tune the G5 for applink, or disable and then tune it?

Flags: needinfo?(gmierz2)

:acreskey, we should remove the applink tuning for G5 at least until we can get it working better.

:egao would you be able to look into the applink G5 tuning issues?

Flags: needinfo?(gmierz2) → needinfo?(egao)

(In reply to Greg Mierzwinski [:sparky] from comment #4)

:acreskey, we should remove the applink tuning for G5 at least until we can get it working better.

Agreed -- I'll do this now: https://bugzilla.mozilla.org/show_bug.cgi?id=1653293

(In reply to Greg Mierzwinski [:sparky] from comment #4)

:acreskey, we should remove the applink tuning for G5 at least until we can get it working better.

:egao would you be able to look into the applink G5 tuning issues?

I can make time for that, though I am going on parental leave as of Monday, July 20 for a month. Could it wait until then? If not, I suggest you assign someone to work on this.

Flags: needinfo?(egao)
See Also: → 1653855

(In reply to Greg Mierzwinski [:sparky] from comment #4)

:acreskey, we should remove the applink tuning for G5 at least until we can get it working better.

:egao would you be able to look into the applink G5 tuning issues?

:bc, is this something you might be able to look into in this quarter?

Flags: needinfo?(bob)
Severity: S1 → S3
Priority: P2 → P3

I'll ask.

I won't be able to take this.

Flags: needinfo?(bob)

Hey sparky, I am closing this bug now that the G5 and P2 have been removed from CI.

Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → WONTFIX