44 - 2.97% twitch FirstVisualChange / tumblr PerceptualSpeedIndex + 7 more (Linux) regression on Tue June 15 2021
Categories
(Firefox Build System :: Toolchains, defect)
Tracking
(firefox-esr78 unaffected, firefox89 unaffected, firefox90 unaffected, firefox91 affected)
Flag | Tracking | Status |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox89 | --- | unaffected |
firefox90 | --- | unaffected |
firefox91 | --- | affected |
People
(Reporter: Bebe, Assigned: mhentges)
References
(Regression)
Details
(Keywords: perf, perf-alert, regression)
Attachments
(1 obsolete file)
Perfherder has detected a browsertime performance regression from push cd54d705a55dcb037a99b25d89475fba2bafe795. Since you are the author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
44% | twitch | FirstVisualChange | linux1804-64-shippable | cold | 83.33 -> 120.00 |
44% | twitch | FirstVisualChange | linux1804-64-shippable | cold | 83.33 -> 120.00 |
18% | | FirstVisualChange | linux1804-64-shippable | cold | 203.33 -> 240.00 |
14% | | FirstVisualChange | linux1804-64-shippable-qr | cold webrender | 280.00 -> 320.00 |
11% | netflix | FirstVisualChange | linux1804-64-shippable-qr | cold webrender | 251.67 -> 280.00 |
5% | | fcp | linux1804-64-shippable | cold | 209.75 -> 220.96 |
4% | | SpeedIndex | linux1804-64-shippable-qr | cold webrender | 457.46 -> 477.25 |
4% | | fnbpaint | linux1804-64-shippable-qr | cold webrender | 271.46 -> 282.71 |
3% | tumblr | PerceptualSpeedIndex | linux1804-64-shippable | cold | 727.42 -> 749.00 |
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
17% | netflix | fcp | linux1804-64-shippable-qr | warm webrender | 95.62 -> 79.46 |
17% | netflix | fnbpaint | linux1804-64-shippable-qr | warm webrender | 96.85 -> 80.58 |
6% | tumblr | fnbpaint | linux1804-64-shippable-qr | warm webrender | 280.60 -> 264.21 |
5% | youtube | fnbpaint | linux1804-64-shippable-qr | warm webrender | 256.29 -> 243.42 |
5% | | fcp | linux1804-64-shippable | warm | 188.18 -> 178.88 |
... | ... | ... | ... | ... | ... |
2% | microsoft | loadtime | linux1804-64-shippable-qr | warm webrender | 336.90 -> 329.33 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.
For more information on performance sheriffing please see our FAQ.
Comment 1•3 years ago
Set release status flags based on info from the regressing bug 1690377
Comment 2•3 years ago
FWIW, the chances of these regressions really having been caused by the NodeJS upgrade strike me as pretty low.
Unfortunately, I'm out on PTO, so I'm going to have to depend on some folks here.
First off, my sense is that backing this out should be avoided if at all possible, because we've just asked all the developers to upgrade their toolchains to Node 12. Since --enable-bootstrap landed after this patch, if we back it out, everybody will have to downgrade by running mach bootstrap manually, which sounds exceedingly painful. mhentges, would you agree? If so, can you help prevent that from happening?
The main thing that the NodeJS upgrade would affect in terms of what gets built on the build machines, as far as I know, is the bundling of devtools. That strikes me as quite unlikely to have caused the perf regressions linked to here, but maybe there's something I don't know about how the devtools code works. Honza, what are your thoughts on this?
One possibility that springs to mind is if Perfherder itself uses the in-tree Node toolchain (e.g. a Node script does the testing and timing using WebDriver or Puppeteer or something), and somehow changing versions of Node changed that timing. The chances of this strike me as fairly low, but I've been wrong before! :Bebe, would you please reach out to a Perfherder owner and ask them if that theory is even slightly plausible, or if they have any better suggestions of things to look into?
Mardak, Standard8, can you guys think of anything that could explain this?
mhentges, would you be willing to take this bug in my absence and drive it?
Thanks to all for any and all help/input.
Comment 3•3 years ago
> since --enable-bootstrap was landed after this patch
Only for mac right now.
> everybody will have to downgrade by running mach bootstrap manually
Actually, with --enable-bootstrap, the downgrade would be automatic. The pain would be when --enable-bootstrap is not used.
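To make that trade-off concrete, here is a minimal sketch of the two workflows being discussed; the mozconfig option and the bootstrap command are standard mach mechanisms, but treat the details of how they pin the node version as an assumption rather than something verified for this bug:

```sh
# Manual path: after a hypothetical backout, every developer without
# --enable-bootstrap would have to move their local toolchain back by hand:
./mach bootstrap

# Automatic path: with build-time bootstrapping enabled in the mozconfig,
# configure keeps the pinned toolchains in sync on its own (per the comment
# above, this was only wired up for mac at the time):
#   ac_add_options --enable-bootstrap
```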
Comment 4•3 years ago
> One possibility that springs to mind is if Perfherder itself uses the in-tree Node toolchain (e.g. a Node script does the testing and timing using WebDriver or Puppeteer or something), and somehow changing versions of Node changed that timing.
An easy way to check if it's the node version for tests would be to change the fetches definitions for tests to use the node 10 toolchains instead of node 12.
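A rough sketch of that experiment, assuming the browsertime test tasks pull node in through toolchain/fetch references in the in-tree taskcluster configuration; the paths, the grep pattern, and the toolchain alias names are assumptions rather than values checked against the tree at this revision:

```sh
# Hypothetical: locate where the test tasks reference the node toolchain...
grep -rn "node" taskcluster/ci/ | grep -i -e toolchain -e fetch
# ...point the matching fetches entries back at the node 10 toolchain
# (roughly linux64-node-12 -> linux64-node-10), then push the affected
# browsertime jobs to try and compare against an unmodified baseline push:
./mach try fuzzy --no-artifact --query "'linux-browsertime"
```

This appears to be what the patch mentioned in comment 11 below does.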
Comment 5•3 years ago
I can't think of anything else apart from the possible effect of test harnesses.
Comment 6•3 years ago
I am not seeing anything related to DevTools in the alert summary
https://treeherder.mozilla.org/perfherder/alerts?id=30404
Julian, is there anything I am missing?
Honza
Comment 7•3 years ago
At runtime, nothing in DevTools or Remote uses node. However, the harness itself uses node, I think (https://searchfox.org/mozilla-central/source/testing/raptor/browsertime). If there's any performance regression linked to updating node, I guess it's related to the harness?
Regarding bundling, node is used to bundle the DevTools debugger UI, but this is only loaded if you open DevTools, which is not the case here.
Comment 8•3 years ago
:Honza, :jdescottes: I was just imagining some exceedingly unlikely world where the bundling step generated some bit of code that got run at startup, despite the fact that devtools itself doesn't start at that point, and somehow the version of node affected what got generated into the bundle. In other words, I was grasping at straws. :-)
Comment 9•3 years ago
(In reply to Dan Mosedale (:dmose, :dmosedale) from comment #8)
> :Honza, :jdescottes: I was just imagining some exceedingly unlikely world where the bundling step generated some bit of code that got run at startup, despite the fact that devtools itself doesn't start at that point, and somehow the version of node affected what got generated into the bundle. In other words, I was grasping at straws. :-)
That's understandable; DevTools has a bad record of slowing down browser startup unexpectedly :)
FWIW, we mentioned this to the rest of the devtools peers, in case we missed something.
Comment 10•3 years ago
Comment 11•3 years ago
(In reply to Mike Hommey [:glandium] from comment #3)
> since --enable-bootstrap was landed after this patch
> Only for mac right now.
It sounds like we should definitely try to avoid a backout.
(In reply to Mike Hommey [:glandium] from comment #4)
> An easy way to check if it's the node version for tests would be to change the fetches definitions for tests to use the node 10 toolchains instead of node 12.
I've put up a simple patch which should make perftest run using node-10 on Linux. To test it, I pushed with:
./mach try --no-artifact --rebuild 3 fuzzy
and then selected the jobs that matched linux-browsertime. Unfortunately, this keeps failing with connection errors, almost certainly because I'm on a flight with a very flaky internet connection.
I think the next step is to run two try builds like that, one with my patch, and one without, and compare the jobs. I've been using https://wiki.mozilla.org/TestEngineering/Performance/RunningTests as my playbook, as I haven't done this before.
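For reference, a non-interactive variant of that push, with the fuzzy selection expressed as a query; the query string is an assumption about how the linux-browsertime jobs were matched:

```sh
# Full (non-artifact) builds, three runs of each selected task, and the
# browsertime jobs pre-selected instead of being picked in the fuzzy UI.
./mach try fuzzy --no-artifact --rebuild 3 --query "'linux-browsertime"
```

(As comment 16 later points out, --rebuild does not play well with the browsertime/vismet tasks, so the eventual pushes in comment 17 drop it.)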
Assignee
Comment 12•3 years ago
I'll be the unlucky soul who picks up this ticket :)
I've taken Dan's patch and have pushed the following to try:
- All linux-browsertime on central (2ad6d79): treeherder
- All linux-browsertime on D118387: treeherder
I'll keep an eye on this and continue based on results.
Assignee
Comment 13•3 years ago
Comparison is here.
Note that the twitch FirstVisualChange tests did not have the same level of regression (I searched for "twitch FirstVisualChange" to spot them in the big comparison list).
However, I'm seeing a lot of "jitter" here with all sorts of results that don't match the alert as reported.
I'm going to do a second try run off central to compare against - if it also significantly varies from the original central run, then I'll have increased confidence that this NodeJS change didn't cause the perf regression.
Assignee
Comment 14•3 years ago
OK, I have some solid evidence that this regression may not be due to the NodeJS change.
Take a look at this comparison between ea659615bc03dde68a1fd9f486b4b7b6cb4450e8 (central + try push revision) and 6c5f580a8d5f7c39cb36ecfb8464e0d1e4dd5b5a (central + a try push revision made at a different time). They vary pretty greatly (amazon ContentfulSpeedIndex opt cold has an 11% regression?) despite the fact that they have the same code.
I'm going to bug :sparky to see how I can wrangle these numbers.
Updated•3 years ago
Reporter
Comment 15•3 years ago
Removing the ni? and waiting for :sparky for more information.
Comment 16•3 years ago
(In reply to Mitchell Hentges [:mhentges] 🦀 from comment #14)
> OK, I have some solid evidence that this regression may not be due to the NodeJS change.
> Take a look at this comparison between ea659615bc03dde68a1fd9f486b4b7b6cb4450e8 (central + try push revision) and 6c5f580a8d5f7c39cb36ecfb8464e0d1e4dd5b5a (central + a try push revision made at a different time). They vary pretty greatly (amazon ContentfulSpeedIndex opt cold has an 11% regression?) despite the fact that they have the same code.
> I'm going to bug :sparky to see how I can wrangle these numbers.
:mhentges, the reason your comparison numbers are bad is that the vismet tasks all ran on the same data. I noticed that you used --rebuild, which doesn't work for browsertime and vismet. You'll need to use something like the add-new-jobs action or the retrigger to add more trials.
Assignee
Comment 17•3 years ago
Ok, sending these again, but this time without --rebuild.
My command for each of these was ./mach try fuzzy --no-artifact.
- All linux-browsertime on central (70be81): treeherder
- All linux-browsertime on central (70be81), again: treeherder
- All linux-browsertime on central + D118387: treeherder
(also clearing obsolete NI for edilee)
Assignee
Comment 18•3 years ago
The results are in!
- central-to-central comparison: only one "important" change. Still a -77% delta, but still, only one that fell out of line.
- central-to-D118387 comparison: only two "important" changes, and both are improvements, if anything.
:sparky, do these results and my conclusions seem accurate to you? Can we mark this ticket as invalid?
Comment 19•3 years ago
I would accept these changes and resolve this as WONTFIX rather than INVALID, since the changes are valid; we just shouldn't be concerned about metric changes caused by infrastructure changes.
The differences seem to be caused by changes in the variance of the pageload - the variance decreased, so we're seeing improvements in the pageload numbers.
Assignee
Comment 20•3 years ago
Works for me, thanks :) 👍
Updated•3 years ago
Updated•3 years ago