Closed Bug 1717198 Opened 3 years ago Closed 3 years ago

44 - 2.97% twitch FirstVisualChange / tumblr PerceptualSpeedIndex + 7 more (Linux) regression on Tue June 15 2021

Categories

(Firefox Build System :: Toolchains, defect)

defect

Tracking

(firefox-esr78 unaffected, firefox89 unaffected, firefox90 unaffected, firefox91 affected)

RESOLVED WONTFIX
Tracking Status
firefox-esr78 --- unaffected
firefox89 --- unaffected
firefox90 --- unaffected
firefox91 --- affected

People

(Reporter: Bebe, Assigned: mhentges)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Attachments

(1 obsolete file)

Perfherder has detected a browsertime performance regression from push cd54d705a55dcb037a99b25d89475fba2bafe795. Since you authored one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
44% twitch FirstVisualChange linux1804-64-shippable cold 83.33 -> 120.00
44% twitch FirstVisualChange linux1804-64-shippable cold 83.33 -> 120.00
18% twitter FirstVisualChange linux1804-64-shippable cold 203.33 -> 240.00
14% reddit FirstVisualChange linux1804-64-shippable-qr cold webrender 280.00 -> 320.00
11% netflix FirstVisualChange linux1804-64-shippable-qr cold webrender 251.67 -> 280.00
5% twitter fcp linux1804-64-shippable cold 209.75 -> 220.96
4% reddit SpeedIndex linux1804-64-shippable-qr cold webrender 457.46 -> 477.25
4% reddit fnbpaint linux1804-64-shippable-qr cold webrender 271.46 -> 282.71
3% tumblr PerceptualSpeedIndex linux1804-64-shippable cold 727.42 -> 749.00

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
17% netflix fcp linux1804-64-shippable-qr warm webrender 95.62 -> 79.46
17% netflix fnbpaint linux1804-64-shippable-qr warm webrender 96.85 -> 80.58
6% tumblr fnbpaint linux1804-64-shippable-qr warm webrender 280.60 -> 264.21
5% youtube fnbpaint linux1804-64-shippable-qr warm webrender 256.29 -> 243.42
5% twitter fcp linux1804-64-shippable warm 188.18 -> 178.88
... ... ... ... ... ...
2% microsoft loadtime linux1804-64-shippable-qr warm webrender 336.90 -> 329.33

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(dmosedale)

Set release status flags based on info from the regressing bug 1690377

FWIW, the chances of these regressions really having been caused by the NodeJS upgrade strike me as pretty low.

Unfortunately, I'm out on PTO, so I'm going to have to depend on some folks here.

First off, my sense is that backing this out should be avoided if at all possible, because we've just asked all the developers to upgrade their toolchains to Node 12, and since --enable-bootstrap was landed after this patch, if we back it out, everybody will have to downgrade by running mach bootstrap manually, which sounds exceedingly painful. mhentges, would you agree? If so, can you help prevent that from happening?
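
For context, the manual step every developer would otherwise need is roughly the following (just a sketch; the exact flag spelling is an assumption to double-check against mach bootstrap --help):

  # Re-run bootstrap so the local toolchains match the backed-out revision;
  # --application-choice keeps the run non-interactive.
  ./mach bootstrap --application-choice browser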

The main thing that the NodeJS upgrade would affect in terms of what gets built on the build machines, as far as I know, is the bundling of devtools. That strikes me as quite unlikely to have caused the perf regressions linked here, but maybe there's something I don't know about how the devtools code works. Honza, what are your thoughts on this?

One possibility that springs to mind is if Perfherder itself uses the in-tree Node toolchain (e.g., a Node script does the testing and timing using WebDriver or Puppeteer or something), and somehow changing versions of Node changed that timing. The chances of this strike me as fairly low, but I've been wrong before! :Bebe, would you please reach out to a Perfherder owner and ask them whether that theory is even slightly plausible, or if they have any better suggestions of things to look into?

Mardak, Standard8, can you guys think of anything that could explain this?

mhentges, would you be willing to take this bug in my absence and drive it?

Thanks to all for any and all help/input.

Flags: needinfo?(standard8)
Flags: needinfo?(odvarko)
Flags: needinfo?(mhentges)
Flags: needinfo?(fstrugariu)
Flags: needinfo?(edilee)
Flags: needinfo?(dmosedale)

since --enable-bootstrap was landed after this patch

Only for mac right now.

everybody will have to downgrade by running mach bootstrap manually

Actually, with --enable-bootstrap, the downgrade would be automatic. The pain would be when --enable-bootstrap is not used.
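
For reference, a minimal sketch of what enabling that looks like in a developer's mozconfig (assuming a standard mozconfig setup):

  # Let configure fetch and manage toolchain artifacts (including node)
  # automatically, instead of requiring a manual `mach bootstrap` after
  # toolchain bumps.
  ac_add_options --enable-bootstrap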

One possibility that springs to mind is if Perfherder itself uses the in-tree Node toolchain (e.g., a Node script does the testing and timing using WebDriver or Puppeteer or something), and somehow changing versions of Node changed that timing.

An easy way to check if it's the node version for tests would be to change the fetches definitions for tests to use the node 10 toolchains instead of node 12.
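
A rough sketch of how one might locate those definitions (the paths and toolchain names here are assumptions to verify in-tree):

  # Find where test tasks fetch the node toolchain, then point the node 12
  # reference (e.g. something like linux64-node-12) back at the node 10
  # toolchain for a try push.
  grep -rn "node-1" taskcluster/ | grep -iE "fetch|toolchain"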

I can't think of anything else apart from the possible effect of test harnesses.

Flags: needinfo?(standard8)

I am not seeing anything related to DevTools in the alert summary
https://treeherder.mozilla.org/perfherder/alerts?id=30404
Julian, is there anything I am missing?

Honza

Flags: needinfo?(odvarko) → needinfo?(jdescottes)

At runtime, nothing in DevTools or Remote uses node. However, the harness itself uses node, I think (https://searchfox.org/mozilla-central/source/testing/raptor/browsertime). If there's any performance regression linked to updating node, I'd guess it's related to the harness?

Regarding bundling, node is used to bundle the DevTools debugger UI, but that bundle is only loaded if you open DevTools, which is not the case here.

Flags: needinfo?(jdescottes)

:Honza, :jdescottes: I was just imagining some exceedingly unlikely world where the bundling step generated some bit of code that got run at startup, despite the fact that devtools itself doesn't start at that point, and somehow the version of node affected what got generated into the bundle. In other words, I was grasping at straws. :-)

(In reply to Dan Mosedale (:dmose, :dmosedale) from comment #8)

:Honza, :jdescottes: I was just imagining some exceedingly unlikely world where the bundling step generated some bit of code that got run at startup, despite the fact that devtools itself doesn't start at that point, and somehow the version of node affected what got generated into the bundle. In other words, I was grasping at straws. :-)

That's understandable; DevTools has a bad track record of slowing down browser startup unexpectedly :)
FWIW, we mentioned this to the rest of the devtools peers, in case we missed something.

(In reply to Mike Hommey [:glandium] from comment #3)

since --enable-bootstrap was landed after this patch

Only for mac right now.

It sounds like we should definitely try to avoid a backout.

(In reply to Mike Hommey [:glandium] from comment #4)

An easy way to check if it's the node version for tests would be to change the fetches definitions for tests to use the node 10 toolchains instead of node 12.

I've put up a simple patch which should make perftest run using node-10 on Linux. To test it, I ran:

  ./mach try fuzzy --no-artifact --rebuild 3

and then I selected the jobs that matched linux-browsertime. Unfortunately, this keeps failing with connection errors, almost certainly because I'm on a flight with a very flaky internet connection.

I think the next step is to run two try builds like that, one with my patch, and one without, and compare the jobs. I've been using https://wiki.mozilla.org/TestEngineering/Performance/RunningTests as my playbook, as I haven't done this before.
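
A sketch of what those two pushes could look like (the fuzzy query string here is an assumption, not the exact selection used):

  # Push once from plain central and once with the node-10 patch applied;
  # -q preselects the Linux browsertime jobs so both pushes run the same set,
  # and the two pushes can then be compared in Perfherder's Compare view.
  ./mach try fuzzy --no-artifact -q "'browsertime 'linux"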

I'll be the unlucky soul who picks up this ticket :)
I've taken Dan's patch and have pushed the following to try:

  • All linux-browsertime on central (2ad6d79): treeherder
  • All linux-browsertime on D118387: treeherder

I'll keep an eye on this and continue based on results.

Assignee: nobody → mhentges
Status: NEW → ASSIGNED
Flags: needinfo?(mhentges)

Comparison is here.
Note that the twitch FirstVisualChange tests did not have the same level of regression (I searched for "twitch FirstVisualChange" to spot them in the big comparison list).

However, I'm seeing a lot of "jitter" here with all sorts of results that don't match the alert as reported.
I'm going to do a second try run off central to compare against - if it also significantly varies from the original central run, then I'll have increased confidence that this NodeJS change didn't cause the perf regression.

OK, I have some solid evidence that this regression may not be due to the NodeJS change.
Take a look at this comparison between ea659615bc03dde68a1fd9f486b4b7b6cb4450e8 (central + try push revision) and 6c5f580a8d5f7c39cb36ecfb8464e0d1e4dd5b5a (central + a try push revision made at a different time). They vary pretty greatly (amazon ContentfulSpeedIndex opt cold has an 11% regression?) despite the fact that they have the same code.

I'm going to bug :sparky to see how I can wrangle these numbers.

Flags: needinfo?(gmierz2)

removing the ni? and waiting for :sparky for more information

Flags: needinfo?(fstrugariu)

(In reply to Mitchell Hentges [:mhentges] 🦀 from comment #14)

OK, I have some solid evidence that this regression may not be due to the NodeJS change.
Take a look at this comparison between ea659615bc03dde68a1fd9f486b4b7b6cb4450e8 (central + try push revision) and 6c5f580a8d5f7c39cb36ecfb8464e0d1e4dd5b5a (central + a try push revision made at a different time). They vary pretty greatly (amazon ContentfulSpeedIndex opt cold has an 11% regression?) despite the fact that they have the same code.

I'm going to bug :sparky to see how I can wrangle these numbers.

:mhentges, the reason your comparison numbers are bad is that the vismet tasks all ran on the same data. I noticed that you used --rebuild, which doesn't work for browsertime and vismet. You'll need to use something like the add-new-jobs action or the retrigger action to add more trials.

Flags: needinfo?(gmierz2)

Ok, sending these again, but this time without --rebuild.
My command for each of these was ./mach try fuzzy --no-artifact

  • All linux-browsertime on central (70be81): treeherder
  • All linux-browsertime on central (70be81), again: treeherder
  • All linux-browsertime on central + D118387: treeherder

(also clearing obsolete NI for edilee)

Flags: needinfo?(edilee)

The results are in!

:sparky, do these results and my conclusions seem accurate to you? Can we mark this ticket as invalid?

Flags: needinfo?(gmierz2)

I would accept these changes and resolve this as WONTFIX rather than INVALID: the metric changes are valid, but we shouldn't be concerned about changes caused by infrastructure changes.

The differences seem to be caused by changes in the variance of the pageload - the variance decreased, so we're seeing improvements in the pageload numbers.

Flags: needinfo?(gmierz2)

Works for me, thanks :) 👍

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
Attachment #9228175 - Attachment is obsolete: true
Has Regression Range: --- → yes