Closed Bug 1385915 Opened 2 years ago Closed 2 years ago

4.04% sessionrestore_no_auto_restore (windows7-32) regression on push 691b88d4102b920415847447b679e2505fb2307a (Sat Jul 29 2017)


(Firefox :: Normandy Client, defect)

Not set



Tracking Status
firefox55 --- unaffected
firefox56 --- wontfix
firefox57 --- wontfix


(Reporter: igoldan, Assigned: rhelmer)



(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push:

As author of one of the patches included in that push, we need your help to address this regression.


  4%  sessionrestore_no_auto_restore windows7-32 opt e10s     832.33 -> 866.00

You can find links to graphs and comparison views for each of the above tests at:

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see:

For information on reproducing and debugging the regression, either on try or locally, see:

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations:
Push broke a lot builds, paralyzing the Talos tests for about 3 hours. When that push got backed out, the regression above was noticed.
The usual procedure for this involves me ni?ing all bug owners which pushed during that time window.

The bugs pushed during the window:
bug 1383338
bug 1379270
bug 1367551
bug 1384792
bug 1384680
bug 1255227
bug 1376306
bug 1385391
bug 1356271
bug 1384695
bug 1172574
bug 1384241
bug 1385137
bug 1385201

For all those I will needinfo, please state if your bug is related to the regression and why/why not. Thanks in advance!
Flags: needinfo?(nalexander)
Flags: needinfo?(mconley)
Flags: needinfo?(lchang)
Flags: needinfo?(hurley)
Flags: needinfo?(cmanchester)
Flags: needinfo?(catlee)
Flags: needinfo?(canaltinova)
Flags: needinfo?(bwinton)
Flags: needinfo?(bgrinstead)
Bug 1172574 and Bug 1384241 were mach changes, so should be unrelated
Flags: needinfo?(nalexander)
Flags: needinfo?(bgrinstead)
Bug 1356271 is unrelated.

Bug 1379270 _might_ be related.
Flags: needinfo?(mconley)
What :mconley said.  (I don't _think_ it's related, but it could be…)
Flags: needinfo?(bwinton)
Bug 1367551 should not be related, it's an http/2 only change, shouldn't touch session restore.
Flags: needinfo?(hurley)
Bug 1385391 is not related, that's an artifact build only change.
Flags: needinfo?(cmanchester)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #1)
> Push
> c424ad1c5f9441f43f3f792d853170f3fd8216e9 broke a lot builds, paralyzing the
> Talos tests for about 3 hours. When that push got backed out, the regression
> above was noticed.
> The usual procedure for this involves me ni?ing all bug owners which pushed
> during that time window.
> The bugs pushed during the window:

> bug 1384680
> bug 1255227
> bug 1376306
> bug 1384695

These are all Android specific.

> bug 1384241

This is build-system specific and would not impact Win 32 tests.
Bug 1385137 shouldn't be related, it's just a test removal.
Flags: needinfo?(canaltinova)
bug 1384792 is only for xpcshell tests.
Flags: needinfo?(catlee)
Bug 1385201 shouldn't be related as it only renamed a pref and changed its type from boolean to string.
Flags: needinfo?(lchang)
I pushed a backout patch for bug 1379270 to try at so if someone wants to figure out what the appropriate commit to diff against in Perfherder is, we can see whether backing out that bug will fix the regression or not…
I did a 3rd (exhaustive this time) attempt to bisect this regression. The last attempt only showed a ~1.8% improvement after backing out bug 1367551 and nothing for bug 1379270.

This is the baseline, which is more recent than all the bugs listed and its tests are all passing:
It includes the regression. I then pushed by backing out every bug from the list, except bug 1376306, bug 1255227 and bug 1384680 (backout failed here).

Bug 1385201 backout:

Bug 1385137 backout:

Bug 1384241 backout:

Bug 1172574 backout:

Bug 1384695 backout:

Bug 1356271 backout:

Bug 1385391 backout:

Bug 1384792 backout:

Bug 1367551 backout:

Bug 1379270 backout:

Bug 1383338 backout:
I think I found the origin of the 4% regression. Looks like backing out bug 1383338 brought a 3.63% improvement.
:rhelmer I see you are the owner of this bug. Can you please look for a fix? Or is this something we can accept?
Flags: needinfo?(rhelmer)
Component: Untriaged → Add-on
Product: Firefox → Shield
So originally we thought was a perf win, but turned out it wasn't running on first-run... is this just a reversion of that supposed win?
Flags: needinfo?(rhelmer) → needinfo?(ionut.goldan)
I don't understand what you mean by reversion. If you refer to the backout technique from comment 12, then yes; this should have reverted that win.
Flags: needinfo?(ionut.goldan)
rhelmer, can you suggest next steps here?
Flags: needinfo?(rhelmer)
(In reply to Jim Mathies [:jimm] from comment #16)
> rhelmer, can you suggest next steps here?

Sorry, I had misread the name of the test here when I wrote comment 14.

The shield add-on now waits until after the session restore notification has been observed, so I am surprised it'd cause a regression here, but we could most likely delay it further so it doesn't.

I would expect bug 1385177 to possibly cause this (since shield wasn't running on first-run before then, because it needed to disable the old built-in self-repair feature first), not bug 1383338... however I will take the bug and investigate.
Assignee: nobody → rhelmer
Flags: needinfo?(rhelmer)
Hey Rob, any updates here?
Flags: needinfo?(rhelmer)
Mike/Mike - do you have any spare cycles to take a look at this? My suspicion is that this is showing up because shield-recipe-client formerly exited early on first run (to disable the old built-in self-support feature) that this didn't show up in startup tests before but I haven't had time to sit down and take a look yet.
Flags: needinfo?(rhelmer)
Flags: needinfo?(mkelly)
Flags: needinfo?(mcooper)
I think that this regression is an artifact of the test, and not something that real users will be affected by.

Prior to bug 1383338, the Shield add-on would be abundantly cautious about it's predecessor, Self-Repair. Shield was designed to detect that Self-Repair was enabled, disable it, and then wait until the next time the browser started until it actually acted. As a result, Shield was never active the first time a profile was run.

In bug 1383338, :rhelmer removed this first-run delay, so that Shield could act during the first few minutes of a profile's first start. Because of this, we are now doing more work near the beginning of the first run of a new profile.

However, this same work would have been done for most users on subsequent runs of a given Firefox profile. I believe this regression represents the difference in Shield's impact between first run and later runs. Because of this, I think that we should accept this regression.
Flags: needinfo?(mkelly)
Flags: needinfo?(mcooper)
Harald are we OK with this regression in 56?
Flags: needinfo?(hkirschner)
Per :mythmom's description, if this really just impacts first run and removes overhead from subsequent runs this should improve our overall startup performance. We also need to run experiments right after first launch. Given that, I am OK!
Flags: needinfo?(hkirschner)
Thanks, Mike and Harald.
Closed: 2 years ago
Resolution: --- → WONTFIX
Product: Shield → Firefox
You need to log in before you can comment on or make changes to this bug.