Closed Bug 1966228 Opened 10 months ago Closed 10 months ago

Baseline profile runs have stopped detecting startup functions entirely

Categories

(Firefox for Android :: Performance, defect)

All
Android
defect

Tracking

()

RESOLVED FIXED
140 Branch
Tracking Status
firefox138 --- unaffected
firefox139 --- unaffected
firefox140 --- fixed

People

(Reporter: mstange, Assigned: npoon)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: regression, Whiteboard: [fxdroid] [group4])

Attachments

(3 files)

Steps to reproduce:

  1. Look at the baseline-prof.txt artifact of a generate-baseline-profile-firebase-fenix run, for example from this range of autoland pushes.

Expected results:
Many lines should start with "HSP". The "S" is for "Startup".

Actual results:
After the landing of bug 1961852, none of the baseline-prof.txt artifacts have lines starting with "HSP" any more.
Except one: The generate-baseline-profile-firebase-fenix job that ran on the landing of bug 1961852 itself has HSP lines! I don't know why. My only guess is that it was a fluke - it's not just intermittent, literally none of the later jobs have HSP in them, as far as I can tell.

Without the "S", the baseline profiles are useless for startup performance.
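For anyone who wants to check an artifact locally, here is a minimal sketch that counts the method rules carrying the startup flag. It assumes the human-readable baseline profile layout in which a method rule is an optional run of flags (H, S, P) followed by a class descriptor starting with "L" and then "->method"; the function name `count_startup_methods` and the sample rules are made up for illustration.

```python
def count_startup_methods(profile_text: str) -> int:
    """Count method rules whose flag prefix contains 'S' (startup).

    Assumes each method rule looks like '<flags>L<class>;-><method>',
    and that class rules start with 'L' and carry no flags.
    """
    count = 0
    for line in profile_text.splitlines():
        line = line.strip()
        if not line or "->" not in line:
            continue  # blank line or class rule, not a method rule
        flags = line.partition("L")[0]  # everything before the class descriptor
        if "S" in flags:
            count += 1
    return count


# Hypothetical sample rules, not taken from a real Fenix artifact.
SAMPLE_PROFILE = """\
HSPLorg/example/Main;->onCreate(Landroid/os/Bundle;)V
PLorg/example/Main;->onStop()V
Lorg/example/Main;
HLorg/example/Util;->helper()I
"""

if __name__ == "__main__":
    print(count_startup_methods(SAMPLE_PROFILE))
```

A result of zero on a real baseline-prof.txt would correspond to the symptom described in this bug.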

from this job

from this job

Blocks: perf-android
See Also: → 1966234

Set release status flags based on info from the regressing bug 1961852

:royang, since you are the author of the regressor, bug 1961852, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

Flags: needinfo?(royang)

I noticed this bug by seeing that the simpleperf profiles collected in the various startup perf tests (e.g. perftest-android-hw-a55-aarch64-shippable-startup-fenix-homeview-startup-simpleperf, example job, example simpleperf profile) showed that all of Fenix's Java / Kotlin functions were running in the ART interpreter instead of being ahead-of-time compiled ("ART OAT"). Digging deeper, I saw that the adb shell dumpsys package dexopt output in these perf tests no longer contained [status=speed-profile] for the Fenix package. So these tests could have actually detected this bug. I've filed bug 1966234 so that we can catch this issue in the future, by failing startup perf CI tests if we don't see [status=speed-profile].
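As a rough illustration of the check proposed for bug 1966234, the sketch below scans saved `adb shell dumpsys package dexopt` output for the compilation status of one package. The layout it expects (a `[package.name]` header line followed by indented lines containing `[status=...]`) is an assumption based on typical output, and `dexopt_statuses` is a hypothetical helper, not part of any existing test harness.

```python
import re


def dexopt_statuses(dumpsys_output: str, package: str) -> list[str]:
    """Return every [status=...] value listed under the given package.

    Assumes a '[package.name]' header line starts each package section.
    """
    statuses = []
    in_package = False
    for line in dumpsys_output.splitlines():
        stripped = line.strip()
        header = re.fullmatch(r"\[([\w.]+)\]", stripped)
        if header:
            # A new package section begins; track whether it is ours.
            in_package = header.group(1) == package
            continue
        if in_package:
            statuses.extend(re.findall(r"\[status=([\w-]+)\]", stripped))
    return statuses


# Hypothetical sample output, trimmed to the relevant shape.
SAMPLE_DUMPSYS = """\
Dexopt state:
  [org.mozilla.fenix]
    path: /data/app/example/base.apk
      arm64: [status=speed-profile] [reason=install-dm]
  [org.example.other]
    path: /data/app/other/base.apk
      arm64: [status=verify] [reason=install]
"""

if __name__ == "__main__":
    statuses = dexopt_statuses(SAMPLE_DUMPSYS, "org.mozilla.fenix")
    print("speed-profile" in statuses)
```

A CI test could fail when "speed-profile" is missing from the returned list, which is the condition that would have flagged this regression.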

Assignee: nobody → npoon
Status: NEW → ASSIGNED
Keywords: regression → leave-open
No longer regressed by: 1961852
Whiteboard: [fxdroid] [group4]

I think the regressor is actually the landing of the different CUJs in Bug 1887820, although the problem doesn't lie with that patch itself. Titouan and I paired recently and realized that the baseline profile returned after generation is not the combination of the per-CUJ profiles, as it should be. Instead, Taskcluster retrieves only the baseline profile of the last CUJ that runs, rather than merging them into one profile covering all of the CUJs. In many cases the last CUJ is no longer the startup profile or launch intent CUJ, which is why the returned baseline-prof.txt doesn't have the HSP lines. The job for Roger's patch probably contained HSP lines because either the launch intent CUJ or the startup profile CUJ happened to run last.

The toolbar patch in Bug 1961852 changed the order of the CUJ tests by ignoring some of them, so it makes sense that we got the impression that the toolbar patch caused this regression.
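To make the difference between "last CUJ wins" and "merge all CUJs" concrete, here is a small sketch that unions the rules of several human-readable profiles and combines the flags of any method that appears more than once. It only illustrates the idea and is not how the task or the Android tooling actually combines profiles; `merge_profiles` and the sample rules are hypothetical, and the rule layout (flags H, S, P in front of an "L..." class descriptor) is assumed.

```python
FLAG_ORDER = "HSP"  # canonical order used when writing flags back out


def merge_profiles(profiles: list[str]) -> str:
    """Union the rules of several profiles, combining flags per method."""
    method_flags: dict[str, set[str]] = {}
    other_rules: dict[str, None] = {}  # dict used as an ordered set
    for text in profiles:
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            flags, sep, rest = line.partition("L")
            if "->" in line and sep:
                # Method rule: remember every flag seen for this method.
                method_flags.setdefault("L" + rest, set()).update(flags)
            else:
                other_rules.setdefault(line, None)  # class rule or anything else
    merged = [
        "".join(f for f in FLAG_ORDER if f in flags) + rule
        for rule, flags in method_flags.items()
    ]
    merged.extend(other_rules)
    return "\n".join(merged)


# Hypothetical per-CUJ outputs: only the startup CUJ marks onCreate with "S".
STARTUP_CUJ = "HSPLorg/example/Main;->onCreate()V\nLorg/example/Main;"
TABS_CUJ = "PLorg/example/Main;->onCreate()V\nHPLorg/example/Tabs;->open()V"

if __name__ == "__main__":
    print(merge_profiles([STARTUP_CUJ, TABS_CUJ]))
```

With merging, the startup flag survives regardless of which CUJ runs last, whereas keeping only `TABS_CUJ` would drop every "S" rule.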

Flags: needinfo?(royang)
Blocks: 1924726
Severity: -- → S3

As a result, Markus and I discussed this and decided to temporarily disable CUJ generation until we can figure this out. I have filed a follow-up bug for re-enabling CUJ generation in Bug 1966496.

(In reply to Nicholas Poon [:Nick] from comment #5)

The toolbar patch in Bug 1961852 changed the order of the CUJ tests by ignoring some of them, so it makes sense that we got the impression that the toolbar patch caused this regression.

Well, the specific problem of "no more HSP in the baseline-prof.txt artifacts" started once the CUJ test order was switched, so marking this as a regression from bug 1961852 is still correct. I'll put the annotation back. It doesn't mean bug 1961852 is at fault for the problem, it just means that circumstances came together in such a way that the specific problem from comment 0 started happening with the landing of bug 1961852.

Thanks for digging into this!

Keywords: regression
Regressed by: 1961852
Pushed by npoon@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2f8d1dbbacf6 Disable baseline profile CUJ generation from shipping with fenix builds r=android-reviewers,mstange,calu
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → 140 Branch

This seems to have worked! Here's a task from a recent mozilla-central push; its baseline-prof.txt artifact starts with "HSP". Thanks!

And here's an imported simpleperf profile from the homeview-startup test from Nick's try push: https://share.firefox.dev/3GVFgri
A lot less red than before.

Regressions: 1967439
See Also: → 1966181
Duplicate of this bug: 1966626
Blocks: 1966181
See Also: 1966181