Closed Bug 1256667 Opened 5 years ago Closed 5 years ago

2.18 - 5.4% sessionrestore_no_auto_restore / tart / ts_paint (windows7-32, windows8-64, windowsxp) regression on push b5621f52feab (Mon Mar 7 2016)

Categories

(Firefox :: Session Restore, defect)

Version: 48 Branch
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED
Firefox 48
Tracking Status
firefox48 --- fixed

People

(Reporter: jmaher, Assigned: myk)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Attachments

(1 file)

Talos has detected a Firefox performance regression from push b5621f52feab. As the author of one of the patches in that push, we need your help to address this regression.

This is a list of all known regressions and improvements related to the push:

https://treeherder.mozilla.org/perf.html#/alerts?id=299

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see:

https://wiki.mozilla.org/Buildbot/Talos/Tests#ts_paint
https://wiki.mozilla.org/Buildbot/Talos/Tests#TART.2FCART
https://wiki.mozilla.org/Buildbot/Talos/Tests#sessionrestore.2Fsessionrestore_no_auto_restore

Reproducing and debugging the regression:

If you would like to re-run this Talos test on a potential fix, use try with the following syntax:

try: -b o -p win32,win64 -u none -t other-e10s[Windows 7,Windows 8,Windows XP],svgr-e10s[Windows 7,Windows 8,Windows XP] --rebuild 5  # add "mozharness: --spsProfile" to generate profile data

(we suggest --rebuild 5 to be more confident in the results)

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:

https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:

talos --develop -e [path]/firefox -a ts_paint:tart:sessionrestore_no_auto_restore

(add --e10s to run tests in e10s mode)

Making a decision:

As the patch author, we need your feedback to help us handle this regression.
*** Please let us know your plans by Friday, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations:

https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
:myk, I love code removal!  I am surprised that we ended up with some regressions.  The ts_paint ones caught me at first, then I found session_restore, and then tart.  What is nice is that our startup file I/O bytes have been reduced!

Here is a breakdown of the tests; you can see the subtests from here:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=31655275a522&newProject=mozilla-inbound&newRevision=b5621f52feab&framework=1

Do you have any thoughts on this?  It seems to affect e10s more than non-e10s.
Flags: needinfo?(myk)
Component: Untriaged → Session Restore
(In reply to Joel Maher (:jmaher) from comment #1)
> :myk, I love code removal!  I am surprised that we ended up with some
> regressions.  the ts_paint ones caught me at first, then I found
> session_restore, and then tart.  What is nice is that our startup file io
> bytes have been reduced!
> 
> here is a breakdown of the tests, you can see the subtests from here:
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-
> inbound&originalRevision=31655275a522&newProject=mozilla-
> inbound&newRevision=b5621f52feab&framework=1
> 
> do you have any thoughts on this?  this seems to affect e10s, and not the
> non-e10s as much.

Hmm, it's puzzling, as I wouldn't have expected that patch to produce any perf regressions, much less those specific ones!  I'll investigate further, and I've cc:ed two colleagues who may also be able to help.
(In reply to Joel Maher (:jmaher) from comment #0)
> If you would like to re-run this Talos test on a potential fix, use try with
> the following syntax:
> 
> try: -b o -p win32,win64 -u none -t other-e10s[Windows 7,Windows 8,Windows
> XP],svgr-e10s[Windows 7,Windows 8,Windows XP] --rebuild 5  # add
> "mozharness: --spsProfile" to generate profile data

I re-read through the patch, and I didn't see anything that could account for the regressions.  In the meantime, I also scheduled a tryserver run, using your suggested syntax, on the tip of central (as of a couple days ago, the last time I pulled it):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9cf657accbe8

And then I scheduled one on a branch off the tip that reverts the removal of the desktop web runtime:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b4a790dc4ff

Only a single run for each test suite has completed so far, but the comparison between the two revisions shows no significant difference on any of the tests:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9cf657accbe8&newProject=try&newRevision=0b4a790dc4ff&framework=1&showOnlyImportant=0

If the rest of the test runs confirm those results, then the desktop web runtime removal is unlikely to have caused this regression.  Unfortunately, I don't have a good alternative explanation!  Perhaps the runtime's removal did cause these regressions, but one of the half-dozen-or-so followup patches we've landed since then has clawed back that performance?

I also wonder if there's any chance of an infrastructure issue here.  I don't know talos/treeherder/perfherder well, and the perfherder comparison in question clearly compares the runtime removal revision to its parent revision, which should mean that no other change can be involved.  But its grandparent revision sure looks like the kind of change that could affect paint perf, upon superficial examination, anyway:

https://hg.mozilla.org/integration/mozilla-inbound/rev/65f225f7d0db

Is there any chance that the comparison might have entrained that change?

(FWIW, I'm also building and testing the original revision and its parent on my Windows 7 x64 VM to see if I can reproduce the regression there, although I'm not sure how reliable its results can be.)
Assignee: nobody → myk
Status: NEW → ASSIGNED
Flags: needinfo?(myk)
Myk, thanks for jumping on this and getting try pushes.  Now that we have data, I see the backout shows a clear win on ts_paint for win7/win8 with e10s.

There were concerns that this could be infrastructure or random noise; thanks for thinking of that.  To rule that out, I retriggered the jobs on the original push a few times, as well as on the push before it.  This gives us a full set of Talos data with plenty of data points tied to just the change in question.

What I don't see are the larger sessionrestore or tart improvements.  I do see what might be related improvements on the backout, but either there isn't enough data, or other things have landed in the last week that improved whatever this change influenced.

As to why this removal of code is causing ts_paint issues, I am not clear.
(In reply to Joel Maher (:jmaher) from comment #4)
> As to why this removal of code is causing ts_paint issues, I am not clear.

IIRC there were some concerns that ts_paint wasn't being measured correctly with e10s; was that fixed or dismissed?
:avih, could you comment on e10s + ts_paint?
Flags: needinfo?(avihpit)
(In reply to Marco Castelluccio [:marco] from comment #5)
> (In reply to Joel Maher (:jmaher) from comment #4)
> > As to why this removal of code is causing ts_paint issues, I am not clear.
> 
> IIRC there were some concerns that ts_paint wasn't correctly measured on
> e10s, was that fixed/dismissed?

I'm not really familiar with ts_paint, but according to bug 1250620 comment 11, I don't think there are such concerns, unless you know otherwise.
Flags: needinfo?(avihpit)
(In reply to Avi Halachmi (:avih) from comment #7)
> I'm not really familiar with ts_paint, but according to bug 1250620 comment
> 11, I don't think there are such concerns, unless you know otherwise.

Indeed.  I just read through bug 1174767 and bug 1254628, and there was a question about what ts_paint should be measuring (time to paint just browser chrome or time to paint both browser chrome and the initial web page).  Currently, it measures the latter, but some folks think it should measure the former (or that there should be measures of both).  And that might affect the meaning of ts_paint in the future, but it shouldn't affect a comparison between ts_paint results that are both using e10s today.

There was also a concern about whether the use of MozAfterPaint in ts_paint (and other Talos tests) would correctly measure the time to paint the initial web page with e10s.  And jimm confirmed that it does in bug 1254628, comment 2.  So ts_paint should function as expected with e10s.
Although I removed browser chrome and didn't add any new chrome, I theorized that the removal might somehow trigger a chrome paint regression.  So I reverted those browser chrome removals:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9d5fd199c2a5

So far only Windows 8 has run tests six times, but it throws cold water on the theory, showing no change in ts_paint:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9cf657accbe8&newProject=try&newRevision=9d5fd199c2a5&framework=1&showOnlyImportant=0

Another theory is that removing WebappManager triggered the regression.  nsBrowserGlue.js, which is evaluated on app-startup, lazily imports WebappManager.jsm on final-ui-startup in order to call WebappManager.init().

WebappManager.jsm then eagerly imports a variety of other modules.  I'm not sure when Talos starts timing ts_paint, but if it somehow manages to happen during/after final-ui-startup without e10s and before final-ui-startup with e10s (because it happens in the child process with e10s), then that could account for this.
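The timing theory above can be illustrated with a small sketch (plain Node.js, not Firefox chrome code; the names are hypothetical stand-ins for WebappManager.jsm and its init): a lazy module getter defers a module's evaluation cost until the first property access, so removing such an import shifts *when* that work happens during startup relative to what a test like ts_paint measures.

```javascript
"use strict";

// Minimal lazy-getter pattern: the property loads its module on first access,
// then replaces itself with the loaded value so the cost is paid exactly once.
function defineLazyGetter(obj, name, loader) {
  Object.defineProperty(obj, name, {
    configurable: true,
    get() {
      const value = loader(); // the "module evaluation" cost happens here, on demand
      Object.defineProperty(obj, name, { value, writable: false });
      return value;
    },
  });
}

let loads = 0;
const globals = {};
defineLazyGetter(globals, "WebappManager", () => {
  loads += 1; // stands in for evaluating WebappManager.jsm and its own imports
  return { init() { return "initialized"; } };
});

console.log(loads);                        // 0: defining the getter costs nothing
console.log(globals.WebappManager.init()); // first access triggers the load
console.log(loads);                        // 1: loaded exactly once
globals.WebappManager.init();
console.log(loads);                        // still 1: subsequent accesses are free
```

With a pattern like this, deleting the `init()` call doesn't just remove work; it removes a load that used to happen at a specific point in startup, which is consistent with a timing-sensitive test moving even though code was only removed.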

So I reverted WebappManager removal in this tryserver run:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f4e7db088e59

I'm still waiting on the results, which will show up here:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9cf657accbe8&newProject=try&newRevision=f4e7db088e59&framework=1&showOnlyImportant=0
(In reply to Myk Melez [:myk] [@mykmelez] from comment #9)
> I'm still waiting on the results, which will show up here:
> 
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=9cf657accbe8&newProject=try&newR
> evision=f4e7db088e59&framework=1&showOnlyImportant=0

It looks like reintroducing WebappManager.jsm does claw back the perf hit, although it still isn't clear why.  It could be that it eagerly imports a module that is otherwise imported at a time when it affects ts_paint more.

Here's a try run on a branch that makes BrowserGlue._finalUIStartup() import the same modules that WebappManager.jsm used to import (modulo the ones that nsBrowserGlue.js has already imported):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=15a51c0d5921
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9cf657accbe8&newProject=try&newRevision=15a51c0d5921&framework=1&showOnlyImportant=0

If that doesn't claw back the regression, then I'll look into WebappManager.init() next.
(In reply to Myk Melez [:myk] [@mykmelez] from comment #10)
> Here's a try run on a branch that makes BrowserGlue._finalUIStartup() import
> the same modules that WebappManager.jsm used to import (modulo the ones that
> nsBrowserGlue.js has already imported):
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=15a51c0d5921
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=9cf657accbe8&newProject=try&newR
> evision=15a51c0d5921&framework=1&showOnlyImportant=0

The results confirm that importing those modules claws back the regression.  Now to figure out which one(s) in particular are responsible:

AppsUtils:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7c40e45bacdc
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9cf657accbe8&newProject=try&newRevision=7c40e45bacdc&framework=1&showOnlyImportant=0

PrivateBrowsingUtils:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=037a9ba8c776
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9cf657accbe8&newProject=try&newRevision=037a9ba8c776&framework=1&showOnlyImportant=0

Promise:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ff19a56665de
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9cf657accbe8&newProject=try&newRevision=ff19a56665de&framework=1&showOnlyImportant=0

Task:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=16777f4ac80e
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9cf657accbe8&newProject=try&newRevision=16777f4ac80e&framework=1&showOnlyImportant=0

Webapps:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f41524351204
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9cf657accbe8&newProject=try&newRevision=f41524351204&framework=1&showOnlyImportant=0
(In reply to Myk Melez [:myk] [@mykmelez] from comment #11)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=f41524351204&filter-tier=1&filter-tier=2&filter-tier=3
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9cf657accbe8&newProject=try&newRevision=f41524351204&framework=1

The various try runs all came back inconclusive except for this one, which appears to account for all (or at least a significant portion) of the regression.  So here's a patch that evaluates Webapps.jsm early in order to claw it back.

I'd still like to understand why the regression occurs.  And I wonder whether it's possible to do more than claw it back and actually improve ts_paint by avoiding the evaluation of Webapps.jsm entirely.  But that's a separate concern from the regression itself, so I'm requesting review on this patch to get us back to where we were, after which I'll investigate Webapps.jsm evaluation further in a followup.
Attachment #8731889 - Flags: review?(dtownsend)
Thanks Myk for looking into this and finding an area to focus on!
Comment on attachment 8731889 [details] [diff] [review]
evaluate Webapps.jsm early

Review of attachment 8731889 [details] [diff] [review]:
-----------------------------------------------------------------

Confusing :s
Attachment #8731889 - Flags: review?(dtownsend) → review+
Blocks: 1257898
(In reply to Dave Townsend [:mossop] from comment #14)
> Confusing :s

Indeed!  I filed followup bug 1257898 to figure out and resolve the underlying issue.
https://hg.mozilla.org/mozilla-central/rev/e4efc904501f
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 48
As a note, we see the improvements we expected:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=77caf1897d7a&newProject=mozilla-inbound&newRevision=e4efc904501f&framework=1

and a few regressions, but these effectively back out the improvements seen when the original patch landed.  So I think there are no concerns here.
Version: unspecified → 48 Branch