2 - 50.73% raptor-tp6-* regression on push b0adb7067dffabf9c6e5b1255a8ac4be8768d32b (Thu October 10 2019)
Categories
(Firefox Build System :: Toolchains, defect)
Tracking
(firefox71 affected)
| Tracking | Status | |
|---|---|---|
| firefox71 | --- | affected |
People
(Reporter: marauder, Unassigned)
Details
(Keywords: perf, perf-alert, regression)
Raptor has detected a Firefox performance regression from push:
raptor-tp6-binast-instagram-firefox loadtime / raptor-tp6-facebook-firefox loadtime / raptor-tp6-facebook-firefox-cold / raptor-tp6-facebook-firefox-cold fcp / raptor-tp6-facebook-firefox-cold loadtime / raptor-tp6-fandom-firefox / raptor-tp6-fandom-firefox fcp / raptor-tp6-fandom-firefox loadtime / raptor-tp6-fandom-firefox-cold loadtime / raptor-tp6-google-mail-firefox loadtime / raptor-tp6-linkedin-firefox / raptor-tp6-linkedin-firefox fcp / raptor-tp6-outlook-firefox-cold / raptor-tp6-outlook-firefox-cold fcp / raptor-tp6-outlook-firefox-cold loadtime / raptor-tp6-pinterest-firefox loadtime / raptor-tp6-pinterest-firefox-cold loadtime / raptor-tp6-sheets-firefox fcp / raptor-tp6-sheets-firefox loadtime / raptor-tp6-twitch-firefox / raptor-tp6-yahoo-mail-firefox loadtime / raptor-tp6-yahoo-news-firefox / raptor-tp6-yahoo-news-firefox fcp / raptor-tp6-yandex-firefox loadtime / raptor-webaudio-firefox (linux64-shippable, linux64-shippable-qr, windows7-32-shippable)
You are the author of one of the patches included in that push, so we need your help to address this regression.
Regressions:
51% raptor-tp6-outlook-firefox-cold fcp windows7-32-shippable opt 245.75 -> 370.42
24% raptor-tp6-outlook-firefox-cold windows7-32-shippable opt 251.59 -> 311.59
21% raptor-tp6-outlook-firefox-cold windows7-32-shippable opt 248.76 -> 300.40
9% raptor-tp6-fandom-firefox fcp linux64-shippable opt 170.31 -> 185.46
7% raptor-tp6-twitch-firefox linux64-shippable-qr opt 138.49 -> 147.49
6% raptor-tp6-linkedin-firefox fcp linux64-shippable-qr opt 566.75 -> 603.42
6% raptor-tp6-linkedin-firefox linux64-shippable-qr opt 708.11 -> 751.35
6% raptor-tp6-linkedin-firefox fcp linux64-shippable opt 545.58 -> 577.46
5% raptor-tp6-yahoo-mail-firefox loadtime linux64-shippable opt 403.29 -> 424.83
5% raptor-tp6-google-mail-firefox loadtime linux64-shippable-qr opt 418.00 -> 439.58
5% raptor-tp6-linkedin-firefox linux64-shippable opt 681.31 -> 715.16
5% raptor-tp6-outlook-firefox-cold fcp linux64-shippable-qr opt 390.50 -> 408.25
4% raptor-tp6-fandom-firefox linux64-shippable opt 158.53 -> 165.30
4% raptor-tp6-fandom-firefox loadtime linux64-shippable opt 182.15 -> 189.50
4% raptor-tp6-google-mail-firefox loadtime linux64-shippable opt 399.92 -> 415.33
4% raptor-tp6-yahoo-news-firefox fcp linux64-shippable-qr opt 348.12 -> 361.00
3% raptor-tp6-yandex-firefox loadtime linux64-shippable opt 206.44 -> 213.38
3% raptor-tp6-facebook-firefox-cold loadtime linux64-shippable-qr opt 850.21 -> 877.25
3% raptor-tp6-binast-instagram-firefox loadtime linux64-shippable opt 455.77 -> 469.38
3% raptor-tp6-sheets-firefox fcp linux64-shippable-qr opt 403.33 -> 414.71
3% raptor-tp6-facebook-firefox-cold fcp linux64-shippable opt 355.42 -> 364.92
3% raptor-tp6-outlook-firefox-cold loadtime linux64-shippable-qr opt 399.92 -> 410.50
3% raptor-tp6-yahoo-news-firefox linux64-shippable-qr opt 384.48 -> 394.57
3% raptor-tp6-binast-instagram-firefox loadtime linux64-shippable-qr opt 471.35 -> 483.42
3% raptor-tp6-facebook-firefox-cold loadtime linux64-shippable opt 833.08 -> 854.00
2% raptor-tp6-facebook-firefox-cold linux64-shippable opt 426.27 -> 436.70
2% raptor-tp6-facebook-firefox loadtime linux64-shippable opt 399.23 -> 408.71
2% raptor-tp6-fandom-firefox-cold loadtime linux64-shippable-qr opt 423.08 -> 432.92
2% raptor-tp6-pinterest-firefox-cold loadtime linux64-shippable-qr opt 1,044.25 -> 1,068.50
2% raptor-tp6-pinterest-firefox-cold loadtime linux64-shippable opt 1,025.33 -> 1,048.58
2% raptor-webaudio-firefox linux64-shippable-qr opt 149.04 -> 152.33
2% raptor-tp6-sheets-firefox loadtime linux64-shippable-qr opt 854.77 -> 872.21
2% raptor-tp6-pinterest-firefox loadtime linux64-shippable opt 958.85 -> 978.04
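Each percentage in the list above appears to be the relative change between the before and after values, rounded to a whole percent. For example, 245.75 -> 370.42 is the 50.73% in the bug title. The sketch below checks this for a few rows. It assumes the line format shown above; the regex and the helper names `parse_alert_line` and `relative_change` are hypothetical and not part of Perfherder or Treeherder.

```python
import re

# Hypothetical parser for lines of the form "<pct>% <test and platform> <before> -> <after>".
ALERT_LINE = re.compile(r"^(\d+)% (.+) ([\d,.]+) -> ([\d,.]+)$")


def relative_change(before: float, after: float) -> float:
    """Return (after - before) / before, expressed in percent."""
    return (after - before) / before * 100.0


def parse_alert_line(line: str):
    """Split one alert line into (listed percent, name, before, after)."""
    m = ALERT_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized alert line: {line!r}")
    listed_pct = int(m.group(1))
    name = m.group(2)
    # Values such as "1,044.25" use a thousands separator, so strip commas first.
    before = float(m.group(3).replace(",", ""))
    after = float(m.group(4).replace(",", ""))
    return listed_pct, name, before, after


lines = [
    "51% raptor-tp6-outlook-firefox-cold fcp windows7-32-shippable opt 245.75 -> 370.42",
    "24% raptor-tp6-outlook-firefox-cold windows7-32-shippable opt 251.59 -> 311.59",
    "2% raptor-tp6-pinterest-firefox-cold loadtime linux64-shippable-qr opt 1,044.25 -> 1,068.50",
]

for line in lines:
    listed_pct, name, before, after = parse_alert_line(line)
    print(f"{name}: listed {listed_pct}%, computed {relative_change(before, after):.2f}%")
```

Note that the size of the percentage says nothing about *why* the value moved. The reporter's comment below argues that the largest one comes from the median switching between clusters.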
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=23429
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Raptor jobs in a pushlog format.
To learn more about the regressing test(s) or reproducing them, please see: https://wiki.mozilla.org/TestEngineering/Performance/Raptor
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/TestEngineering/Performance/Talos/RegressionBugsHandling
Reporter • Updated • 6 years ago
This came up before on the first landing of the patch, and we've been investigating for the past two weeks. There isn't anything actionable to go on.
An alarming "51% regression" suggests that the cause should be obvious in a profile, but the score is a misleading artifact of how the test is set up. The tp6 test reports the median of 25 runs, which causes problems when the data points are not spread evenly but clustered around certain values. With fcp (first contentful paint), depending on the timing of the refresh tick, we might load N, N+1, or N+2 (etc.) scripts before we paint. The values on the outlook-cold test end up clustered around 240ms, 400ms, and 500ms. Here they are, sorted for comparison:
Before {233, 233, 235, 235, 235, 235, 235, 236, 237, 237, 239, 240, 240, 241, 246, 246, 251, 251, 385, 387, 393, 409, 498, 501, 535}
After {218, 222, 224, 225, 235, 236, 237, 240, 241, 242, 242, 250, 370, 372, 379, 383, 385, 386, 387, 393, 394, 401, 502, 504, 518}
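A minimal sketch of the clustering effect, using the 25 sorted samples above. The bucket edges (300 ms and 450 ms) are an assumption picked by eye from the gaps in the data; they are not a Raptor setting, and `bucket_counts` is a hypothetical helper.

```python
from statistics import median

# Sorted fcp samples (ms) for raptor-tp6-outlook-firefox-cold, copied from the comment.
BEFORE = [233, 233, 235, 235, 235, 235, 235, 236, 237, 237, 239, 240, 240,
          241, 246, 246, 251, 251, 385, 387, 393, 409, 498, 501, 535]
AFTER = [218, 222, 224, 225, 235, 236, 237, 240, 241, 242, 242, 250, 370,
         372, 379, 383, 385, 386, 387, 393, 394, 401, 502, 504, 518]

# Assumed bucket edges: below 300 ms, 300-450 ms, and 450 ms and above.
EDGES = (300, 450)


def bucket_counts(samples, edges=EDGES):
    """Count how many samples fall into each bucket delimited by `edges`."""
    counts = [0] * (len(edges) + 1)
    for s in samples:
        # The number of edges a sample reaches is its bucket index.
        idx = sum(1 for e in edges if s >= e)
        counts[idx] += 1
    return counts


for label, samples in (("before", BEFORE), ("after", AFTER)):
    print(label, "buckets:", bucket_counts(samples), "median:", median(samples))
```

The median of 25 values is the 13th sorted value. Only 7 of the "before" runs land outside the first bucket, while exactly 13 of the "after" runs do, which is just enough to move the median from 240 into the second cluster at 370.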
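The "after" build generally produces faster numbers within a bucket but hits the larger buckets more often. Because the median is the 13th of the 25 sorted values, that is enough to move it from the first bucket into the second: from 240 to 370 in the runs above (245.75 to 370.42 in the alert summary). This might be a counterintuitive effect where loading the first few scripts faster leaves enough time to start an extra one before the refresh tick comes in.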
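The hypothesis in the last sentence can be illustrated with a toy model. This is an assumption for illustration only, not how Gecko schedules paints: scripts run back to back from t=0, and a paint requested by the refresh tick cannot happen until the script running at that moment finishes. The durations are made-up numbers, not measurements, and `paint_time` is a hypothetical helper.

```python
import bisect
from itertools import accumulate
from statistics import median


def paint_time(script_durations, tick_time):
    """Toy model: return the paint time (ms) for a refresh tick at `tick_time`.

    Scripts run back to back from t=0; if a script is running when the tick
    fires, the paint waits until that script ends.
    """
    ends = list(accumulate(script_durations))
    # Index of the first script that has not finished before the tick.
    i = bisect.bisect_left(ends, tick_time)
    if i == len(ends):
        return tick_time  # all scripts finished before the tick; paint on time
    return ends[i]


# Made-up durations (ms): three small scripts followed by one large script.
BEFORE = [60, 60, 60, 150]
AFTER = [55, 55, 55, 150]  # each small script is 5 ms faster

# Sweep the tick over a 50 ms window to mimic run-to-run variation in tick timing.
ticks = range(150, 200)
before_fcp = [paint_time(BEFORE, t) for t in ticks]
after_fcp = [paint_time(AFTER, t) for t in ticks]

print("before median:", median(before_fcp))
print("after median:", median(after_fcp))
```

Every script in AFTER is as fast or faster than in BEFORE, yet the median paint time in this sweep jumps from 180 to 315 ms. The faster small scripts let the large script start before the tick more often, so the paint waits for it, producing the same bucket-switching pattern as the real data.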
While the above accounts for the fcp regressions (and for the "overall" scores, since they include fcp as a component), it does not account for the loadtime subtest, which captures the execution of all scripts and shouldn't depend on where the refresh tick slices it. However, when I do try runs with profiling enabled, I can't reproduce a regression of more than about 1%, which is too small for me to spend any more time on at this point.
Comment 2 • Reporter • 6 years ago
Thanks for the details!