Closed Bug 1864194 Opened 11 months ago Closed 10 months ago

93.03 - 37.73% welcome loadtime / welcome fcp + 9 more (OSX, Windows) regression on Tue Nov 07 2023

Categories

(Firefox :: Messaging System, defect, P1)

Firefox 121
Desktop
Unspecified
defect
Points:
5

Tracking

()

RESOLVED FIXED
121 Branch
Iteration:
122.2 - Dec 4 - Dec 15
Tracking Status
firefox-esr115 --- unaffected
firefox119 --- unaffected
firefox120 --- unaffected
firefox121 --- fixed

People

(Reporter: afinder, Assigned: nsauermann)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Attachments

(1 obsolete file)

Perfherder has detected a browsertime performance regression from push 0bda9015bace079f555c7ad49ffe196459218ee0. As author of one of the patches included in that push, we need your help to address this regression.

== Change summary for alert #40149 (as of Tue, 07 Nov 2023 14:18:31 GMT) ==

Regressions:

Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles
--- | --- | --- | --- | --- | ---
93% | welcome loadtime | linux1804-64-shippable-qr | cold fission webrender | 72.64 -> 140.27 | Before/After
84% | welcome loadtime | windows10-64-shippable-qr | cold fission webrender | 53.49 -> 98.22 | Before/After
59% | welcome loadtime | linux1804-64-shippable-qr | fission warm webrender | 44.90 -> 71.48 | Before/After
58% | welcome loadtime | macosx1015-64-shippable-qr | fission warm webrender | 28.36 -> 44.86 | Before/After
38% | welcome loadtime | windows10-64-shippable-qr | fission warm webrender | 32.88 -> 45.29 | Before/After
32% | welcome fcp | windows10-64-shippable-qr | fission warm webrender | 39.98 -> 52.85 | Before/After
11% | welcome fcp | windows10-64-shippable-qr | cold fission webrender | 80.44 -> 88.90 | Before/After
2% | welcome ContentfulSpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 1,195.23 -> 1,224.41 | Before/After
2% | welcome PerceptualSpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 1,197.07 -> 1,221.73 | Before/After
2% | welcome SpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 1,194.73 -> 1,218.82 | Before/After
2% | welcome FirstVisualChange | linux1804-64-shippable-qr | cold fission webrender | 1,157.48 -> 1,180.73 | Before/After

Improvements:

Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles
--- | --- | --- | --- | --- | ---
9% | welcome fcp | linux1804-64-shippable-qr | fission warm webrender | 59.07 -> 53.99 | Before/After
6% | welcome fcp | linux1804-64-shippable-qr | cold fission webrender | 130.94 -> 122.95 | Before/After
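
For readers unfamiliar with the Ratio column, the sketch below shows how those percentages appear to follow from the old and new absolute values. It assumes a plain relative change, (new - old) / old. The function name `percent_change` and the row layout are illustrative and not taken from Perfherder. Because the alert lists rounded values, results can differ slightly from Perfherder's own figures (the bug title reports 93.03% where the rounded inputs give about 93.1%).

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent.

    For time-based metrics such as loadtime and fcp, a positive
    result means the new build is slower (a regression).
    """
    return (new - old) / old * 100.0


# A few rows copied from the alert tables: (test, platform, options, old, new).
rows = [
    ("welcome loadtime", "linux1804-64-shippable-qr", "cold", 72.64, 140.27),
    ("welcome loadtime", "windows10-64-shippable-qr", "cold", 53.49, 98.22),
    ("welcome fcp", "windows10-64-shippable-qr", "warm", 39.98, 52.85),
    ("welcome fcp", "linux1804-64-shippable-qr", "warm", 59.07, 53.99),
]

for test, platform, options, old, new in rows:
    change = percent_change(old, new)
    kind = "regression" if change > 0 else "improvement"
    print(f"{test:18} {platform:27} {options:5} {change:+7.2f}%  ({kind})")
```

The sign convention here (positive means slower) only holds for lower-is-better metrics, which covers every test listed in this alert.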

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=40149

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(nsauermann)

Set release status flags based on info from the regressing bug 1859583

Hi Alex! I'm a bit unsure of how that patch caused these regressions. Is it possible to re-trigger the jobs to confirm the before-and-after behaviour? I'm not familiar with these regressions so I might be overlooking something, thanks!

If it retriggers with the same severity, feel free to backout the patch and I can investigate.

Flags: needinfo?(nsauermann) → needinfo?(afinder)

(In reply to Negin from comment #2)

> Hi Alex! I'm a bit unsure of how that patch caused these regressions. Is it possible to re-trigger the jobs to confirm the before-and-after behaviour? I'm not familiar with these regressions so I might be overlooking something, thanks!
>
> If it retriggers with the same severity, feel free to backout the patch and I can investigate.

Hi Negin, I triggered some extra data points before and after the revision, as visible in the graph, but it looks like a pretty clear regression. Let me know with a needinfo if I can help with other details.

Flags: needinfo?(afinder)

Thanks for confirming! My team is away at a work week this week and I still need to do some investigation, so it's unlikely that this will be addressed in a timely manner. Please feel free to back out the patch, as it's not user-facing.

Flags: needinfo?(afinder)

Fixed by backout.

Status: NEW → RESOLVED
Closed: 11 months ago
Flags: needinfo?(afinder)
Resolution: --- → FIXED
Target Milestone: --- → 121 Branch

Hi Alex! Sorry for the multiple NIs, this is my first rodeo with performance regressions. I wanted to confirm if I'm going about this correctly and clarify if I'm looking at the correct metrics.

I've done several try runs with browsertime-first-install-firefox-welcome, each with bits of the code removed, and compared them to the regressions chart you created above.

I then compared one of the cases, linux1804-64-shippable-qr, with the corresponding "before" load time values. Each reversion still showed values similar to the "after" values, e.g.:

134.8 ms, welcome fcp opt cold fission webrender

But what's confusing is that when comparing the completely reverted patch with the original patch, I'm not seeing the same before ratios. As a sanity check, I did another try run with the original patch and then with it completely reverted.

original patch: 127.9 ms, welcome fcp opt cold fission webrender

completely reverted try run: 130.2 ms, welcome fcp opt cold fission webrender

So I'm wondering if I'm debugging this incorrectly and if you have any pointers? Or if there's an explanation for the difference in ratios for the before/afters.
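
As a rough illustration of why these two try runs look alike, the sketch below compares their fcp values with the loadtime change from the alert for the same platform. The helper name `relative_diff_percent` is hypothetical, and a single value per run says nothing about run-to-run noise, so this is only a sanity check and not a statistical comparison.

```python
def relative_diff_percent(a: float, b: float) -> float:
    """Absolute difference between a and b, as a percentage of a."""
    return abs(b - a) / a * 100.0


# fcp values (ms) from the two try runs quoted in this comment.
fcp_original_patch = 127.9
fcp_fully_reverted = 130.2

# loadtime values (ms) from the alert: linux1804-64-shippable-qr, cold.
loadtime_before = 72.64
loadtime_after = 140.27

fcp_gap = relative_diff_percent(fcp_original_patch, fcp_fully_reverted)
loadtime_gap = relative_diff_percent(loadtime_before, loadtime_after)

print(f"fcp, original vs reverted try run: {fcp_gap:.1f}%")
print(f"loadtime, before vs after (alert):  {loadtime_gap:.1f}%")
```

The fcp gap is under 2%, while the alerted loadtime change is roughly 93%, so the two runs barely differ on fcp even though loadtime is where the alert reports the large shift.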

Flags: needinfo?(afinder)

Ah, actually, disregard the above. I was looking at the wrong value: welcome fcp opt cold fission webrender instead of welcome loadtime opt cold fission webrender. I can see the regression now!

The culprit seems to be the changes removing the ShellService calls, as a previous try run shows a reduction in ms. I just wasn't looking at the right values initially!

Flags: needinfo?(afinder)

Reopening this bug. I'll be using it to reintroduce the ShellService call removal and investigate next steps (changing the load time metric vs. other potential improvements).

Assignee: nobody → nsauermann
Iteration: --- → 122.2 - Dec 4 - Dec 15
Points: --- → 3
Priority: -- → P1
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Points: 3 → 5

Note that the set-default code on mozilla-central changed from synchronous to asynchronous in https://bugzilla.mozilla.org/show_bug.cgi?id=1863980 within the last few days, and this has been uplifted to 121, so any testing done in 121 or later is likely to reflect different behavior & performance characteristics than described earlier in this bug.

Note that the perf characteristics will change again in 122 in bug 1868410, about which I expect to have more details soon. I've added it to See Also. I wonder if re-landing the backed-out stuff would be easier to track in either the (re-opened) bug where they originally landed or in a new bug altogether...

See Also: → 1868410

Which is to say, it seems like it doesn't make sense to have this bug open, since the perf regression here was fixed by backout, which is complete.

(In reply to Dan Mosedale (:dmosedale, :dmose) from comment #12)

> Which is to say, it seems like it doesn't make sense to have this bug open, since the perf regression here was fixed by backout, which is complete.

Ah that makes sense to me! Sorry I misunderstood your first comment. I'll close this ticket and make a new one for the ShellService calls.

Status: REOPENED → RESOLVED
Closed: 11 months ago → 10 months ago
Resolution: --- → FIXED
See Also: → 1868662
Attachment #9367304 - Attachment is obsolete: true