Closed Bug 1851578 Opened 9 months ago Closed 8 months ago

26 - 19% nytimes ContentfulSpeedIndex / pinterest FirstVisualChange + more (Windows, Linux, Android, OSX) regression on Tue August 29 2023

Categories

(Core :: DOM: Core & HTML, defect)


RESOLVED INVALID
Tracking Status
firefox-esr102 --- unaffected
firefox-esr115 --- unaffected
firefox117 --- unaffected
firefox118 --- unaffected
firefox119 + wontfix

People

(Reporter: aesanu, Unassigned)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Attachments

(1 file)

Perfherder has detected a browsertime performance regression from push 0b374bcf3727e39fb3a5bbcc66005c9dc4690f8d. As you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles
26% | nytimes ContentfulSpeedIndex | windows10-64-shippable-qr | fission warm webrender | 281.34 -> 355.82 |
19% | pinterest FirstVisualChange | linux1804-64-shippable-qr | cold fission webrender | 696.89 -> 830.78 |
19% | pinterest fcp | linux1804-64-shippable-qr | cold fission webrender | 642.70 -> 763.83 |
13% | imdb ContentfulSpeedIndex | android-hw-a51-11-0-aarch64-shippable-qr | cold webrender | 2,472.08 -> 2,798.38 |
13% | nytimes fcp | windows10-64-shippable-qr | cold fission webrender | 329.27 -> 371.17 |
12% | nytimes FirstVisualChange | windows10-64-shippable-qr | cold fission webrender | 363.39 -> 407.69 |
12% | pinterest SpeedIndex | macosx1015-64-shippable-qr | fission warm webrender | 331.65 -> 371.17 | Before/After
10% | pinterest ContentfulSpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 1,105.71 -> 1,213.02 |
9% | buzzfeed fcp | windows10-64-shippable-qr | cold fission webrender | 569.00 -> 618.82 |
9% | buzzfeed fcp | linux1804-64-shippable-qr | cold fission webrender | 714.04 -> 775.02 |
... | ... | ... | ... | ... | ...
5% | linkedin fcp | macosx1015-64-shippable-qr | cold fission webrender | 395.43 -> 416.65 | Before/After
5% | expedia LastVisualChange | linux1804-64-shippable-qr | fission warm webrender | 1,366.68 -> 1,432.20 |
4% | netflix SpeedIndex | linux1804-64-shippable-qr | fission warm webrender | 855.97 -> 893.63 |
4% | netflix PerceptualSpeedIndex | linux1804-64-shippable-qr | fission warm webrender | 736.14 -> 767.90 |
2% | nytimes SpeedIndex | windows10-64-shippable-qr | fission warm webrender | 1,005.02 -> 1,027.34 |

Improvements:

Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles
31% | linkedin loadtime | windows10-64-shippable-qr | cold fission webrender | 2,177.42 -> 1,513.38 |
25% | twitch loadtime | windows10-64-shippable-qr | cold fission webrender | 1,662.34 -> 1,241.32 |
24% | linkedin loadtime | windows10-64-shippable-qr | fission warm webrender | 1,303.12 -> 986.86 |
11% | linkedin loadtime | macosx1015-64-shippable-qr | fission warm webrender | 1,034.76 -> 917.82 | Before/After
11% | linkedin ContentfulSpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 2,188.23 -> 1,941.16 |
... | ... | ... | ... | ... | ...
3% | facebook-cristiano loadtime | android-hw-a51-11-0-aarch64-shippable-qr | warm webrender | 667.65 -> 650.78 |
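
The Ratio column in the tables above appears to be the relative change between the old and new absolute values. The following is a minimal sketch of that arithmetic, assuming the ratio is (new - old) / old; the function name `percent_change` is hypothetical and not part of Perfherder, and the percentages shown on the alert may differ slightly because of rounding or the exact samples used.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (assumed formula)."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


# A few rows copied from the tables above: (test, old, new).
rows = [
    ("nytimes ContentfulSpeedIndex", 281.34, 355.82),
    ("pinterest FirstVisualChange", 696.89, 830.78),
    ("twitch loadtime", 1662.34, 1241.32),
]

for name, old, new in rows:
    # Positive means the metric got larger (slower), negative means smaller.
    print(f"{name}: {percent_change(old, new):+.1f}%")
```

For these lower-is-better metrics, a positive result corresponds to a regression and a negative result to an improvement.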

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs, you can trigger them yourself from the Treeherder job view or ask a sheriff to do that for you.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(arai.unmht)

Set release status flags based on info from the regressing bug 1845668

These patches can change the ordering of JS execution and layout.

Most of the regressions and improvements come from part 1 of the bug 1845668 patch, which changes how the main-thread task is resumed when off-thread compilation finishes:
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=6451393187384a4f56faf8024e8c22c2b54e7dde&newProject=try&newRevision=f235bdba461a5fe1515a43d74fbc23f5c2dcbee1&framework=13&page=3&showOnlyImportant=1

The regressions are mostly in the visual metrics, which frequently happens when there is a race between JS and other resources. I'll verify that this week and next week.
The improvements are in loadtime; why that happened also needs investigation.
In any case, I don't think these are serious enough to need a backout.

I don't observe the improvements or the regressions in local runs of any suite.

In the nytimes case, there is certainly a race between resources (JS, fonts, images, etc.), and the visual score depends on which one becomes available first.

(Apart from the benchmark score, I noticed that it would be better to separate the [A,B] and [C,D] ranges explained in https://phabricator.services.mozilla.com/D184897#6113651 from the off-thread compilation/decode marker, given that the [C,D] part dominates the time while it is mostly just waiting on the main thread's other tasks. Of course, it's still nice to visualize the range.)

Tracking for Fx119 for a resolution on the investigation - RE: comment 2 and comment 6

I'm looking at the time series results and I don't see any notable regressions at all. I would have expected to see the 30% regression in CSI for nytimes, but I don't see any real movement in the attached graph. I think this was perhaps incorrectly flagged due to noise.

I also tried to reproduce this on try and could not. I think I'm going to mark this as invalid, but please reopen if there is more evidence. Thanks!
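
One way to make the "is this just noise?" judgment less subjective is to compare the shift in the mean against the run-to-run variation. The following is a minimal sketch using a Welch-style t statistic; it is not necessarily how Perfherder flags alerts, the function name `welch_t` is hypothetical, and the sample values are made up for illustration rather than taken from this bug's data.

```python
import math
import statistics


def welch_t(before: list[float], after: list[float]) -> float:
    """Welch's t statistic for the difference in means (after - before)."""
    mean_diff = statistics.mean(after) - statistics.mean(before)
    # Standard error of the difference, allowing unequal variances.
    std_err = math.sqrt(
        statistics.variance(before) / len(before)
        + statistics.variance(after) / len(after)
    )
    return mean_diff / std_err


# Illustrative, made-up samples (NOT measurements from this bug).
before = [280.0, 350.0, 300.0, 360.0, 290.0, 340.0]
noisy_after = [285.0, 355.0, 295.0, 365.0, 300.0, 335.0]  # same level, noisy
shifted_after = [x + 100.0 for x in before]               # a genuine shift

print(f"noisy:   t = {welch_t(before, noisy_after):.2f}")
print(f"shifted: t = {welch_t(before, shifted_after):.2f}")
```

A t value near zero means the difference is small relative to the spread of the samples, which is consistent with noise; a large value suggests a real change worth investigating.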

Flags: needinfo?(arai.unmht)
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → INVALID