Closed Bug 1213364 Opened 6 years ago Closed 6 years ago

9% Linux 64/Win* ts_paint regression on Fx-Team-Non-PGO on October 08, 2015 from push 5d1ec6ca937447eabc914c463a107142be4d092d

Categories

(Toolkit :: Password Manager, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: wlach, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression][e10s])

Talos has detected a Firefox performance regression from your commit 5d1ec6ca937447eabc914c463a107142be4d092d in bug 1193341.  We need you to address this regression.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=5d1ec6ca937447eabc914c463a107142be4d092d&showAll=1

On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#ts_paint

Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p linux64,win64,win32 -u none -t other  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a ts_paint

Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Tuesday, or the offending patch will be backed out! ***

Our wiki page oulines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Flags: needinfo?(paolo.mozmail)
Perfherder compare view:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=fx-team&originalRevision=ee116a783005&newProject=fx-team&newRevision=5d1ec6ca9374&e10s=1

You can see the same regression across three platforms in perfherder graphs:

https://treeherder.mozilla.org/perf.html#/graphs?series=[fx-team,3476c73a12c03397de3c2c251219ed6ce54c0615,1]&series=[fx-team,fca65650a72680f9068c4834da886b5573069d1e,1]&series=[fx-team,39edc1e5add718095ac5c876673f736954d54c49,1]

Normally I would do more retriggers to validate this regression, but the results are so consistent across platforms that I think it's unnecessary in this case (though I did do 2 extra on either side for linux64)

Please feel to reach out to either me (I'm :wlach) or jmaher on irc if you need help.
Hi Paolo, 

As you may know, we're trying out a new 24hour backout policy for large Talos regressions (10%+) on Windows. Can you get back to us on this bug as soon as possible?

Thanks

More info on the new policy:

https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling#Policy_.231:_Backouts_within_48-hours_for_large_regressions

https://groups.google.com/forum/#!topic/mozilla.dev.platform/QHdn-ogf8kQ
Adding needinfo for reviewer MattN since the weekend has already started in Europe, although I see that MattN is on PTO :(
Flags: needinfo?(MattN+bmo)
(In reply to William Lachance (:wlach) from comment #4)
> Try run for backout if we don't hear back:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=3295a22c3018

Unfortunately the backout doesn't apply cleanly, probably because stuff that depends on the bug has landed since (e.g. https://treeherder.mozilla.org/logviewer.html#?job_id=12463288&repo=try#L1663).

We really do need this addressed asap though. A 10% performance regression on this benchmark just isn't acceptable.
This also seems to have regressed the session restore test, probably for similar reasons (?). See e.g.:

https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=[fx-team,878b980d0b7f428578bca3666fc80f130d0709bf,1]&highlightedRevisions=578fbd8d8b3c

The regression is much smaller here though (~2.5%)
Please go ahead with the backout - the other dependency you're looking for is bug 1179961.

I'll spend some time to set up a local environment following the references in comment 0 next week. I suspect the solution is to avoid any reaction to the "pageshow" event unless a password field has been previously detected, but it will require a few attempts to get right.
Flags: needinfo?(paolo.mozmail)
Flags: needinfo?(MattN+bmo)
I did another try push with both bugs backed out: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cd25add99a60
I manually verified that the backed out patch on try would give us a ~10% ts_paint improvement on Linux x64 opt, Windows XP opt, Windows 7 opt and Windows 8 x64 opt

TreeHerder perf-compare didn't work for some reason (see bug 1213635)
Blocks: 1179961
It appears that the cause of the significant regression was loading the additional InsecurePasswordUtils.jsm module in the child process. This probably caused additional I/O to happen.

I started new Talos tests on an updated patch following the suggested try syntax from comment 0:

Base revision: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4fae6ad6b84e
With bug 1193341: https://treeherder.mozilla.org/#/jobs?repo=try&revision=55946d42eb22
With bug 1179961: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab7919d420f7
I did some retriggers and look over the results of the try pushes- I saw no differences, then I realized that this is e10s only- I added some linux64 jobs for e10s and will wait for results.
(In reply to Joel Maher (:jmaher) from comment #12)
> I did some retriggers and look over the results of the try pushes- I saw no
> differences, then I realized that this is e10s only- I added some linux64
> jobs for e10s and will wait for results.

Thanks! Does it mean the suggested try syntax didn't include e10s tests, or just the results weren't available yet?
oh, it was the suggested try syntax- we are behind the times in our tools/templates when it comes to e10s.
looking at the e10s jobs, ts_paint is not showing any change on the 3 try pushes.  This is for linux64.  I see in the original bug we went from ~515 -> ~550, here we stay in the ~520 range.
Thanks for checking the Talos results. I've re-landed the first of the two bugs, let's see if results are confirmed.
I am not seeing improvements:
https://treeherder.allizom.org/perf.html#/compare?originalProject=fx-team&originalRevision=398514bdc83f&newProject=fx-team&newRevision=cfcbb20f317d&e10s=1

with a 10% difference, I would expect this regression to be easy to reproduce- as in the tools/tests should be reliable enough to use for this larger regression.

there is another patch, maybe that will do the trick!
wait, unless this was backed out and then we are relanding- therefore if that is the case we don't have a new regression coming in- so this might be good news.
So, I landed bug 1193341 a few days ago and it seems didn't cause any performance alerts.

I've now landed bug 1179961, which depended no the previous one but was probably not the cause of the regression. I guess that if we don't see any alerts for it too, we could close this bug?
correct, looking forward to closing this bug!
Seems fixed in the latest version of the patches. Thanks for the help!
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.