Closed Bug 1194343 Opened 10 years ago Closed 10 years ago

2.84% regression on Win8 Ts, Paint e10s (v.43) on August 12, 2015 from push 9300342ec9f0

Categories

(Testing :: Talos, defect)

defect
Not set
normal

Tracking

(e10s-)

RESOLVED FIXED
Tracking Status
e10s - ---

People

(Reporter: vaibhav1994, Unassigned)

There is a regression from push 9300342ec9f0 on fx-team for ts_paint e10s. I have done a bunch of retriggers in
https://treeherder.mozilla.org/#/jobs?repo=fx-team&fromchange=076372b0b680&tochange=410f6c78d226&filter-searchStr=Windows%208%2064-bit%20fx-team%20talos%20other
The value goes from 392-396 -> 404-407. This regression is then picked up by mozilla-inbound too. Here is the graph:
http://graphs.mozilla.org/graph.html#tests=[[83,132,49],[83,131,49]]&sel=1439117272999.033,1439440983971.3792,367.6715176715177,424.74012474012477&displayrange=30&datatype=geo
There are 3 revisions in this range (9300342ec9f0, a922f1e1150b and f1c8c8ddd899), so we need to find out which one is the root cause.
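As a rough sanity check on the size of the change, the two value ranges quoted above can be turned into a percentage. This is only a sketch: it assumes the midpoint of each range is a fair stand-in for the typical value, and the helper names `midpoint` and `percent_change` are made up for illustration. The 2.84% in the summary was presumably computed from the underlying data points, so a midpoint estimate will not match it exactly.

```python
def midpoint(low, high):
    """Midpoint of an observed value range (an assumption, not a Talos metric)."""
    return (low + high) / 2.0


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Ranges quoted in the description: 392-396 before the push, 404-407 after.
before = midpoint(392, 396)
after = midpoint(404, 407)
change = percent_change(before, after)

# ts_paint is a startup timing, so a positive change means slower.
print(f"{before} -> {after}: {change:+.2f}%")
```

The estimate lands close to 3%, in line with the reported figure.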
Needinfo'ing Cykesiopka, Tracy and Yury who are the respective patch authors. It looks like 9300342ec9f0 and a922f1e1150b change some part of tests, so they might not be the root cause. Cykesiopka and Tracy can you verify this?
Flags: needinfo?(ydelendik)
Flags: needinfo?(twalker)
Flags: needinfo?(cykesiopka.bmo)
Blocks: 1194348
No longer blocks: 1165349
Summary: 2.84% regression on Win8 Ts, Paint e10s (v.41) on August 12, 2015 from push 9300342ec9f0 → 2.84% regression on Win8 Ts, Paint e10s (v.43) on August 12, 2015 from push 9300342ec9f0
(In reply to Vaibhav (:vaibhav1994) from comment #1)
> Needinfo'ing Cykesiopka, Tracy and Yury who are the respective patch authors.
>
> It looks like 9300342ec9f0 and a922f1e1150b change some part of tests, so
> they might not be the root cause. Cykesiopka and Tracy can you verify this?

Most of the code in f1c8c8ddd899 is not enabled for default content and is not executed during testing, except for the code in ShumwayBootstrapUtils.jsm, which runs once during Gecko process startup. So I don't see how f1c8c8ddd899 could cause the regression.
Flags: needinfo?(ydelendik)
Startup time is exactly what ts_paint measures. So it actually makes some sense.
(In reply to Vaibhav (:vaibhav1994) from comment #1)
> Needinfo'ing Cykesiopka, Tracy and Yury who are the respective patch authors.
>
> It looks like 9300342ec9f0 and a922f1e1150b change some part of tests, so
> they might not be the root cause. Cykesiopka and Tracy can you verify this?

Yes, my patch was to enable some e10s browser chrome tests only.
Flags: needinfo?(twalker)
(In reply to Vaibhav (:vaibhav1994) from comment #1)
> Needinfo'ing Cykesiopka, Tracy and Yury who are the respective patch authors.
>
> It looks like 9300342ec9f0 and a922f1e1150b change some part of tests, so
> they might not be the root cause. Cykesiopka and Tracy can you verify this?

My patch adds a "Learn More" link to an uncommon web console warning, so it doesn't seem likely that the patch could have caused a paint/startup time regression.
Flags: needinfo?(cykesiopka.bmo)
To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code
Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a ts_paint

Looks like f1c8c8ddd899 is the cause, since :yury says that his patch affects Gecko startup, which is exactly what ts_paint measures. :yury, can you take out some time to investigate this more?
Flags: needinfo?(ydelendik)
(In reply to Vaibhav (:vaibhav1994) from comment #6)
> Looks like f1c8c8ddd899 is the cause, since :yury says that his patch
> affects Gecko startup, which is exactly what ts_paint measures. :yury, can
> you take out some time to investigate this more?

Does it affect other platforms? I only have Mac OS X 10.10.4 available at the moment. The patch is not OS-specific. Is Windows 7 affected as well?
Flags: needinfo?(vaibhavmagarwal)
I checked the graphs; the regression is only seen on Windows. Yes, Windows 7 is affected as well, as seen in
http://graphs.mozilla.org/graph.html#tests=[[83,131,47],[83,132,47]]&sel=1439008573887.5957,1439484233462.0637,410.3950103950104,523.9085239085239&displayrange=30&datatype=geo
WINNT 6.1 (ix) (e10s) == Win 7 32-bit e10s
Flags: needinfo?(vaibhavmagarwal)
I ran `talos --develop -e <path>/firefox -a ts_paint --e10s` three times on Windows 7 (32-bit). At 4d75e29fff38 the mean values for three runs were (485, 444, 470), and at 9300342ec9f0 the mean values for three different runs were (483, 461, 458). How can I tell if it's a regression using these values?
Flags: needinfo?(ydelendik) → needinfo?(vaibhavmagarwal)
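One way to approach the question in the previous comment is to compare the difference between the two group means with the run-to-run spread. The sketch below computes a Welch-style t statistic using only the Python standard library. It is an illustration, not how Talos or Treeherder decide on regressions, and the helper name `welch_t` is hypothetical. With only three runs per revision the result is a rough signal at best.

```python
import math
import statistics


def welch_t(a, b):
    """Welch t statistic for two independent samples (unequal variances)."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / std_err


# Per-run mean values reported in the comment above (Windows 7 32-bit, e10s).
before = [485, 444, 470]  # at 4d75e29fff38
after = [483, 461, 458]   # at 9300342ec9f0

diff = statistics.mean(after) - statistics.mean(before)
t = welch_t(before, after)
print(f"difference in means: {diff:.2f}")
print(f"stdev before: {statistics.stdev(before):.2f}, after: {statistics.stdev(after):.2f}")
print(f"Welch t statistic: {t:.3f}")
```

For these numbers the difference in means is about 1 while each group varies by well over 10 between runs, so the local data cannot show a regression of roughly 3% either way. More runs per revision would be needed to separate a change of that size from noise.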
Passing the needinfo to :jmaher, since he knows more about running e10s tests locally.
Flags: needinfo?(vaibhavmagarwal) → needinfo?(jmaher)
This doesn't look like a culprit. Let me do some try pushes and see if I can figure out which is causing problems for us.
Flags: needinfo?(jmaher)
Pushing to try:
hg update f1c8c8ddd899: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9b5189c7a174
hg update 4d75e29fff38: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ea2c030849a1

You can see that when f1c8c8ddd899 lands we see a noticeable bump in ts_paint numbers:
https://treeherder.mozilla.org/perf.html?newProject=try&newRevision=9b5189c7a174#/compare?originalProject=try&originalRevision=ea2c030849a1&newProject=try&newRevision=9b5189c7a174

I guess this is seen on the specific hardware on try vs locally. To me that lessens the severity of this, but it still means we should try to figure out what really happened.
Thanks for the comment Joel. :yury, you might have to test it out on try with your changes or request a loan machine.
(In reply to Vaibhav (:vaibhav1994) from comment #13)
> Thanks for the comment Joel. :yury, you might have to test it out on try
> with your changes or request a loan machine.

I don't understand "request a loan machine" in relation to "I guess this is seen on the specific hardware on try vs locally". Do I need to request my own machine, or direct access to a try server with Windows 7 32-bit installed? (I have a Windows 7 32-bit machine, see comment 9.) Can somebody reproduce this on a specific physical configuration? Also, please provide STR and tell me what I should look for to reproduce the issue.
Here are the specs of the machines:
https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation

I am not sure why this is not easy to reproduce locally; as I said, if you cannot do that then it does reduce the severity of the regression. We clearly see the regression between a try push on the previous revision and the single Shumway change that landed in this revision:
https://hg.mozilla.org/integration/fx-team/rev/f1c8c8ddd899

Maybe we should look at the code? This is all in browser/extensions/. I am not convinced that any code is really executing upon startup, but I could be wrong. Maybe we load the extensions into the browser and don't initialize them fully? Maybe it is now loaded and initialized by accident? I am just trying to think of what might be the cause, in case it is something simple to fix.
Quick explanation of the changes made in f1c8c8ddd899:
- In isJSPluginSupported, we refactored the code to add a check at https://hg.mozilla.org/integration/fx-team/diff/f1c8c8ddd899/browser/extensions/shumway/content/ShumwayBootstrapUtils.jsm#l1.103 that tests whether the API exists and reads a boolean pref, and we moved all other logic into the else branch (including global imports and variables).
- For the main process (non-e10s), ShumwayBootstrapUtils.js is called from http://mxr.mozilla.org/mozilla-central/source/browser/extensions/shumway/chrome/bootstrap-content.js#39 (then http://mxr.mozilla.org/mozilla-central/source/browser/extensions/shumway/content/ShumwayUtils.jsm#102); for the child process (e10s), it is called from http://mxr.mozilla.org/mozilla-central/source/browser/extensions/shumway/chrome/bootstrap-content.js#39

The old and new code in ShumwayBootstrapUtils differ only in one getBoolPref call and the order of imports. Am I missing something?
Do we need to set shumway.jsplugins=False in Talos? Otherwise it seems like something small is causing this. I am not sure what to do; we might have to pass on this. Maybe :avih can take a brief look.
Flags: needinfo?(avihpit)
(In reply to Joel Maher (:jmaher) from comment #17)
> do we need to set shumway.jsplugins=False in talos?

I think not. The outcome of this bug should be to fix the regression or to explain why it's an acceptable one. Making the regression invisible to Talos doesn't help anyone, and will prevent us from detecting further regressions when Shumway is enabled.
Flags: needinfo?(avihpit)
This week I'm planning to land bug 1192450, which will temporarily disable Shumway for Nightly users (Shumway code is not available for release, beta or aurora). I think we can close this bug by landing the above.
See Also: → 1192450
tracking-e10s: --- → -
Vaibhav, can you verify whether the values for ts_paint (e10s?) went down after bug 1192450 landed? If yes, we can close this bug (Shumway will be disabled). Once Shumway is enabled again, we will track it in some other bug.
Flags: needinfo?(vaibhavmagarwal)
Yes, the values have gone down for ts_paint (e10s) after the landing of bug 1192450. For people interested, here is the graph:
http://graphs.mozilla.org/graph.html#tests=[[83,132,49],[83,131,49]]&sel=1439101803198,1440099435198,355.01066098081026,442.32409381663115&displayrange=30&datatype=geo
Also, we can see in Treeherder:
https://treeherder.mozilla.org/#/jobs?repo=fx-team&fromchange=c6ebcbe6189e&tochange=4e04908c15a3&filter-searchStr=windows%208%2064-bit%20fx-team%20talos%20other
that the values went from 409-413 -> 398-400 after the patch 89776b90cb3b landed. :yury thanks for fixing it!
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(vaibhavmagarwal)
Resolution: --- → FIXED