13.62 - 14.72% sessionrestore / sessionrestore_no_auto_restore (osx-10-10) regression on push 51ed3d07846d9a133b901a99c4430feabe33cd8a (Tue May 2 2017)

RESOLVED INVALID

Status

()

defect
RESOLVED INVALID
2 years ago
2 years ago

People

(Reporter: jmaher, Unassigned)

Tracking

({perf, regression, talos-regression})

53 Branch
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

2 years ago
Talos has detected a Firefox performance regression from push 51ed3d07846d9a133b901a99c4430feabe33cd8a. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 15%  sessionrestore osx-10-10 opt e10s     894.75 -> 1026.42
 14%  sessionrestore_no_auto_restore osx-10-10 opt e10s930.75 -> 1057.5


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=6328

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
(Reporter)

Comment 1

2 years ago
this regression happened when talos was broken for many commits:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=5c15d347dabbd647954d5266606420e15fbf6823&tochange=51ed3d07846d9a133b901a99c4430feabe33cd8a

so this will require some bisecting where we either:
1) update to the original source and backout the offending patch
2) backout patch by patch

this will be a series of 10 try pushes (ignoring 1 servo merge, 1 servo test expectations in bug 1341102, eslint bug 1359540, root cause bug 1352204+backout, and bug 1357116+backout).

This means these bugs are suspect:
Bug 1359655
Bug 1361417
Bug 1360597 
Bug 1360863 
Bug 1352305 
Bug 1360299
Bug 1348278 
Bug 1354079
Bug 1361072


if patch authors of the above bugs could help determine if they caused what appears to be osx only startup regression that would help.

Ionut, can you do these try pushes with this syntax: |mach try -b o -p macosx64 -u none -t other-e10s --rebuild 5|
Flags: needinfo?(ionut.goldan)
jmaher: you should ?needinfo people if you need info from them, as many people get too much bugmail to read all of it. You got lucky that I noticed. ;)

Bug 1360863 (re-)adds a string, so I don't see how that could cause a talos regression.
Joel, you listed bug 1361417 as a potential cause, this is a bugzilla.mozilla.org administrative change to a component name. Are there hard coded dependencies in Talos on component name? If so, this could become a problem because we're trying to clean up a lot of components, naming schemes, and hierarchies this quarter.
(Reporter)

Comment 4

2 years ago
I listed all bugs that showed up in the range where we had no data due to failed tests- that looks like an obvious candidate to skip.

Thanks for the info!
No longer blocks: 1361417

Comment 5

2 years ago
Bug 1348278 adds some behavior to mousedown to speed up pageload latency. The chances of it affecting session restore are somewhere between slim and none.
Bug 1352305 changes behavior of sizing dialog windows when 'privacy.resistFingerprinting' is true. There is no chance for causing this problem since 'privacy.resistFingerprinting' is off by default.
Bug 1361072 is a simple cleanup of existing logic of AccessibleCaret, and is enabled only on platforms supporting touch events. It's unlikely to be the cause of this regression.
No longer blocks: 1361072
(In reply to Joel Maher ( :jmaher) from comment #1)
> this regression happened when talos was broken for many commits:
> https://hg.mozilla.org/integration/autoland/
> pushloghtml?fromchange=5c15d347dabbd647954d5266606420e15fbf6823&tochange=51ed
> 3d07846d9a133b901a99c4430feabe33cd8a

How confident are you that this range is correct?

It looks like the regression is gone again. I think it likely was fallout from bug 1054740, which has been backed out for other reasons, although it's weird that it would affect sessionrestore_no_auto_restore, since that bug and its patch are about actually restoring a session.
(In reply to Joel Maher ( :jmaher) from comment #1)

> Ionut, can you do these try pushes with this syntax: |mach try -b o -p
> macosx64 -u none -t other-e10s --rebuild 5|

Yes, will do that.
Flags: needinfo?(ionut.goldan)
I do not believe Bug 1360299 is the cause; it applies to a non-default compilation flag.
(Reporter)

Comment 12

2 years ago
:mkelly, can you comment on the possibility of the code from bug 1359655 causing a session restore regression on osx?
Flags: needinfo?(mkelly)
(Reporter)

Comment 13

2 years ago
:dao, can you comment on the possibility of the code from bug 1360597 causing a session restore regression on osx?
Flags: needinfo?(dao+bmo)
(Reporter)

Comment 14

2 years ago
:gijs, can you comment on the possibility of the code from bug 1354079 causing a session restore regression on osx?
Flags: needinfo?(gijskruitbosch+bugs)

Comment 15

2 years ago
(In reply to Joel Maher ( :jmaher) from comment #14)
> :gijs, can you comment on the possibility of the code from bug 1354079
> causing a session restore regression on osx?

Anything is possible, but it seems unlikely. Ditto for the other changes here.

Dão is right though, the regression seems fixed based on the last points on the graph you linked.
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(dao+bmo)
(Reporter)

Comment 16

2 years ago
while it is fixed, the fix is clearly bug 1353571, and not related to anything in this list.  So there is a lingering regression to figure out... :)  I am still not sure about bug 1054740 which Dao mentioned, in that case it landed and was backed out in a couple hours difference, this regression was May 2 and fixed May 4 (really 32 hours difference)

Comment 17

2 years ago
(In reply to Joel Maher ( :jmaher) from comment #16)
> while it is fixed, the fix is clearly bug 1353571, and not related to
> anything in this list.  So there is a lingering regression to figure out...
> :)  I am still not sure about bug 1054740 which Dao mentioned, in that case
> it landed and was backed out in a couple hours difference, this regression
> was May 2 and fixed May 4 (really 32 hours difference)

Then I don't know. If we're down to those 3 bugs, can someone do trypushes backing each of them out and seeing which reverts the perf regression, or something? Did the trypushes from comment #1 already happen?

I'd rather we measure and know, than just guess which patches are/aren't responsible here.
(In reply to Joel Maher ( :jmaher) from comment #16)
> I am still not sure about bug 1054740 which Dao mentioned, in that case
> it landed and was backed out in a couple hours difference, this regression
> was May 2 and fixed May 4 (really 32 hours difference)

Hmm, okay. In that case I'm going to attempt to re-land this.

As for bug 1360597, that code may be in the startup path, but nothing expensive is (or at least should be) happening there.
(In reply to Joel Maher ( :jmaher) from comment #12)
> :mkelly, can you comment on the possibility of the code from bug 1359655
> causing a session restore regression on osx?

I'll echo comment 18: The SHIELD sync adds code to the startup path, although our startup pretty quickly jumps to some async code, which I think(?) makes add-on startup move on, and our startup isn't terribly expensive anyway.

To see if it affects anything, I pushed a backout of the SHIELD changes to try with the command from comment 1: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fd9a18ca7ad0ef9d90ac8dafdc881908dce5d038
Flags: needinfo?(mkelly)
With jmaher's help, I did 6 local Talos runs: 3 with mozilla-central as it is, and 3 with the Normandy sync backed out: https://gist.github.com/Osmose/e5ee0a0d55e2622494eab7d7ffc5764a

> <jmaher> I only question without_normandy1.json, all the values seem higher
> <jmaher> but it could be random stuff
> <Osmose> That was my first run after a rebuild
> <jmaher> ok
> <jmaher> so I see 1238 vs 1268, really <3% difference
Component: Untriaged → Session Restore
(Reporter)

Comment 21

2 years ago
Ionut did a lot of try runs:
https://treeherder.mozilla.org/#/jobs?repo=try&author=ionut.goldan@softvision.ro&filter-searchStr=talos%20other&selectedJob=96697223&group_state=expanded

none of them showed the culprit, so this is not really solvable.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.