Closed Bug 1385915 Opened 7 years ago Closed 7 years ago

4.04% sessionrestore_no_auto_restore (windows7-32) regression on push 691b88d4102b920415847447b679e2505fb2307a (Sat Jul 29 2017)

Categories

(Firefox :: Normandy Client, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox55 --- unaffected
firefox56 --- wontfix
firefox57 --- wontfix

People

(Reporter: igoldan, Assigned: rhelmer)

References

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=691b88d4102b920415847447b679e2505fb2307a

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  4%  sessionrestore_no_auto_restore windows7-32 opt e10s     832.33 -> 866.00


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=8424

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Push https://hg.mozilla.org/integration/autoland/rev/c424ad1c5f9441f43f3f792d853170f3fd8216e9 broke a lot builds, paralyzing the Talos tests for about 3 hours. When that push got backed out, the regression above was noticed.
The usual procedure for this involves me ni?ing all bug owners which pushed during that time window.

The bugs pushed during the window:
bug 1383338
bug 1379270
bug 1367551
bug 1384792
bug 1384680
bug 1255227
bug 1376306
bug 1385391
bug 1356271
bug 1384695
bug 1172574
bug 1384241
bug 1385137
bug 1385201

For all those I will needinfo, please state if your bug is related to the regression and why/why not. Thanks in advance!
Flags: needinfo?(nalexander)
Flags: needinfo?(mconley)
Flags: needinfo?(lchang)
Flags: needinfo?(hurley)
Flags: needinfo?(cmanchester)
Flags: needinfo?(catlee)
Flags: needinfo?(canaltinova)
Flags: needinfo?(bwinton)
Flags: needinfo?(bgrinstead)
Bug 1172574 and Bug 1384241 were mach changes, so should be unrelated
Flags: needinfo?(nalexander)
Flags: needinfo?(bgrinstead)
Bug 1356271 is unrelated.

Bug 1379270 _might_ be related.
Flags: needinfo?(mconley)
What :mconley said.  (I don't _think_ it's related, but it could be…)
Flags: needinfo?(bwinton)
Bug 1367551 should not be related, it's an http/2 only change, shouldn't touch session restore.
Flags: needinfo?(hurley)
Bug 1385391 is not related, that's an artifact build only change.
Flags: needinfo?(cmanchester)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #1)
> Push
> https://hg.mozilla.org/integration/autoland/rev/
> c424ad1c5f9441f43f3f792d853170f3fd8216e9 broke a lot builds, paralyzing the
> Talos tests for about 3 hours. When that push got backed out, the regression
> above was noticed.
> The usual procedure for this involves me ni?ing all bug owners which pushed
> during that time window.
> 
> The bugs pushed during the window:

> bug 1384680
> bug 1255227
> bug 1376306
> bug 1384695

These are all Android specific.

> bug 1384241

This is build-system specific and would not impact Win 32 tests.
Bug 1385137 shouldn't be related, it's just a test removal.
Flags: needinfo?(canaltinova)
bug 1384792 is only for xpcshell tests.
Flags: needinfo?(catlee)
Bug 1385201 shouldn't be related as it only renamed a pref and changed its type from boolean to string.
Flags: needinfo?(lchang)
I pushed a backout patch for bug 1379270 to try at https://treeherder.mozilla.org/#/jobs?repo=try&revision=4cb1e553bce2f0bab1a84a09f7824219aaaf5132 so if someone wants to figure out what the appropriate commit to diff against in Perfherder is, we can see whether backing out that bug will fix the regression or not…
I did a 3rd (exhaustive this time) attempt to bisect this regression. The last attempt only showed a ~1.8% improvement after backing out bug 1367551 and nothing for bug 1379270.

This is the baseline, which is more recent than all the bugs listed and its tests are all passing:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6c2bd47d32ef79afb91c9d211d751ae21c37526a
It includes the regression. I then pushed by backing out every bug from the list, except bug 1376306, bug 1255227 and bug 1384680 (backout failed here).

Bug 1385201 backout:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e42ee577ad2bb234df0962c479296bb364af3fb1

Bug 1385137 backout:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d732bcf2c7b9408296512324b1c70e53b6f2b0c1

Bug 1384241 backout:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6d45336323175901a99b8363b991b00757f9ef2c

Bug 1172574 backout:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=052176433f173d0b7ccccc5eceae34adc5124d2d

Bug 1384695 backout:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e117e06224bf43f794d9761ff16219ef3f5f9a8e

Bug 1356271 backout:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=87edabae5e6ea039ae6ffa0837de9e09cd870a4a

Bug 1385391 backout:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8d861c03b877d252fd59a6dd27ead06145bfc5d7

Bug 1384792 backout:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=466c411e6d5c48b817f5977ff45e6b6bdd04e0ec

Bug 1367551 backout:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=20425fa192218c490ab774f6ca0ad1bf165d8495

Bug 1379270 backout:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8f5d6421fcb1c93867734ab23f5bd944aa789a29

Bug 1383338 backout:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=aafc2b8333a5519cebb460445d9b71ae610c9ffc
I think I found the origin of the 4% regression. Looks like backing out bug 1383338 brought a 3.63% improvement.
:rhelmer I see you are the owner of this bug. Can you please look for a fix? Or is this something we can accept?
Flags: needinfo?(rhelmer)
Component: Untriaged → Add-on
Product: Firefox → Shield
Blocks: 1383338
So originally we thought https://bugzilla.mozilla.org/show_bug.cgi?id=1383338#c13 was a perf win, but turned out it wasn't running on first-run... is this just a reversion of that supposed win?
Flags: needinfo?(rhelmer) → needinfo?(ionut.goldan)
I don't understand what you mean by reversion. If you refer to the backout technique from comment 12, then yes; this should have reverted that win.
Flags: needinfo?(ionut.goldan)
rhelmer, can you suggest next steps here?
Flags: needinfo?(rhelmer)
(In reply to Jim Mathies [:jimm] from comment #16)
> rhelmer, can you suggest next steps here?

Sorry, I had misread the name of the test here when I wrote comment 14.

The shield add-on now waits until after the session restore notification has been observed, so I am surprised it'd cause a regression here, but we could most likely delay it further so it doesn't.

I would expect bug 1385177 to possibly cause this (since shield wasn't running on first-run before then, because it needed to disable the old built-in self-repair feature first), not bug 1383338... however I will take the bug and investigate.
Assignee: nobody → rhelmer
Flags: needinfo?(rhelmer)
Hey Rob, any updates here?
Flags: needinfo?(rhelmer)
Mike/Mike - do you have any spare cycles to take a look at this? My suspicion is that this is showing up because shield-recipe-client formerly exited early on first run (to disable the old built-in self-support feature) that this didn't show up in startup tests before but I haven't had time to sit down and take a look yet.
Flags: needinfo?(rhelmer)
Flags: needinfo?(mkelly)
Flags: needinfo?(mcooper)
I think that this regression is an artifact of the test, and not something that real users will be affected by.

Prior to bug 1383338, the Shield add-on would be abundantly cautious about it's predecessor, Self-Repair. Shield was designed to detect that Self-Repair was enabled, disable it, and then wait until the next time the browser started until it actually acted. As a result, Shield was never active the first time a profile was run.

In bug 1383338, :rhelmer removed this first-run delay, so that Shield could act during the first few minutes of a profile's first start. Because of this, we are now doing more work near the beginning of the first run of a new profile.

However, this same work would have been done for most users on subsequent runs of a given Firefox profile. I believe this regression represents the difference in Shield's impact between first run and later runs. Because of this, I think that we should accept this regression.
Flags: needinfo?(mkelly)
Flags: needinfo?(mcooper)
Harald are we OK with this regression in 56?
Flags: needinfo?(hkirschner)
Per :mythmom's description, if this really just impacts first run and removes overhead from subsequent runs this should improve our overall startup performance. We also need to run experiments right after first launch. Given that, I am OK!
Flags: needinfo?(hkirschner)
Thanks, Mike and Harald.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Product: Shield → Firefox
You need to log in before you can comment on or make changes to this bug.