Closed Bug 1250362 Opened 4 years ago Closed 4 years ago

2.08 - 19.14% sessionrestore / sessionrestore_no_auto_restore / ts_paint (linux64, windows7-32, windows8-64) e10s regression on push 135340a254f4 (Mon Feb 22 2016)

Categories

(Firefox :: Session Restore, defect, P2)

46 Branch
defect

Tracking

()

RESOLVED FIXED
Firefox 47
Tracking Status
e10s + ---
firefox45 --- unaffected
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: wlach, Assigned: mak)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Attachments

(1 file)

Hi Marco, could you investigate this please? It seems like a fairly serious regression.

--

Talos has detected a Firefox performance regression from push 135340a254f4. As author of one of the patches included in that push, we need your help to address this regression.

This is a list of all known regressions and improvements related to the push:

https://treeherder.allizom.org/perf.html#/alerts?id=237

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see:

https://wiki.mozilla.org/Buildbot/Talos/Tests#sessionrestore.2Fsessionrestore_no_auto_restore

Reproducing and debugging the regression:

If you would like to re-run this Talos test on a potential fix, use try with the following syntax:

try: -b o -p linux64,win64 -u none -t other-e10s[Windows 8] --rebuild 5  # add "mozharness: --spsProfile" to generate profile data

(we suggest --rebuild 5 to be more confident in the results)

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:

https://wiki.mozilla.lorg/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:

talos --develop -e <path>/firefox -a sessionrestore_no_auto_restore:sessionrestore

(add --e10s to run tests in e10s mode)

Making a decision:

As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Thursday, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations:

https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Flags: needinfo?(mak77)
David, any idea about these session restore tests?
it looks "fancy", at a minimum.
Flags: needinfo?(mak77) → needinfo?(dteller)
Flags: needinfo?(mak77)
ochameau, it looks like Bug 1248600 is also in the changeset, and that's one of yours. It seems to be touching startup, too. Could you take a look on your side?

wlach, any chance you could bissect further and find out whether it's Bug 1249786 or Bug 1248600?
Flags: needinfo?(wlachance)
Flags: needinfo?(poirot.alex)
Flags: needinfo?(dteller)
Bug 1248600 has been backed out, so I imagine we will get new numbers with just bug 1249786?

Otherwise, bug 1248600 is moving slightly the very precise moment when the devtools modules are loaded.
It moves from in middle of gBrowserInit.onLoad to browser-delayed-startup-finished notification.
If this perf test happen to watch something during browser-delayed-startup-finished, it may have some impact.
But if that happen to be that, this is just a move. We are going to free some computation time during earlier stages. Don't we also see a win in some other tests?
Flags: needinfo?(poirot.alex)
(In reply to Alexandre Poirot [:ochameau] from comment #3)
> But if that happen to be that, this is just a move. We are going to free
> some computation time during earlier stages. Don't we also see a win in some
> other tests?

Note that this Talos test is somewhat broken (see bug 1250362). I therefore believe that, if the regression is due to mak's patch, we shouldn't block on it, because they are quite important for stability.
I'd still like to know the reason the change is moving the timing on this test, it may be useful to understand more of the sanitize impact on start.
Flags: needinfo?(mak77)
here is a compare view (still waiting on more data):
https://treeherder.mozilla.org/perf.html#/compare?originalProject=fx-team&originalRevision=1bd2197b4d48&newProject=fx-team&newRevision=135340a254f4&framework=1

two data points on there are false alarms:
* winxp a11y e10s
* win7 svg_opacity e10s
* win7 tresize 

and this seems to be fixed by the backout or false alarm:
* win7 session restore e10s


which means we have:
session restore: win8/linux64
session restore no auto restore: win8/linux64
ts paint: win8/linux64

^ some mix of e10s/pgo/opt

:Yoric, if we feel that sessionrestore test is not valid, lets update it (yes, I see a patch) and delete the old test.  Having trust in our tests is important!
Component: Untriaged → Session Restore
(In reply to Alexandre Poirot [:ochameau] from comment #3)
> Bug 1248600 has been backed out, so I imagine we will get new numbers with
> just bug 1249786?

Yes, and the numbers haven't improved after the backout, so it looks like we've confirmed that the problem is bug 1249786.
Flags: needinfo?(wlachance)
(In reply to William Lachance (:wlach) from comment #8)
> (In reply to Alexandre Poirot [:ochameau] from comment #3)
> > Bug 1248600 has been backed out, so I imagine we will get new numbers with
> > just bug 1249786?
> 
> Yes, and the numbers haven't improved after the backout, so it looks like
> we've confirmed that the problem is bug 1249786.

Sorry, meant to link to a graph (though you can really just pick any in the alert summary above: https://treeherder.allizom.org/perf.html#/alerts?id=237):

https://treeherder.allizom.org/perf.html#/graphs?series=[fx-team,cc4be2ad4c55588a87acfe014c9c55cdde0528f6,1]
Summary: 4.02 - 10.52% sessionrestore / sessionrestore_no_auto_restore (linux64, windows8-64) e10s regression on push 135340a254f4 (Mon Feb 22 2016) → 2.08 - 19.14% sessionrestore / sessionrestore_no_auto_restore / ts_paint (linux64, windows7-32, windows8-64) e10s regression on push 135340a254f4 (Mon Feb 22 2016)
David, does this test sanitize or set a sanitize pref? Do we use a clean profile at every run?
I think I may have figured something, Sanitizer.onStartup is invoked at final-ui-startup and it now does the work that before was done by browser.js::_initializeSanitizer in delayedStartup

Among other things it was calling gPrefService.savePrefFile(null). In bug 1249786 I added back that call in Sanitizer.onStartup, unconditionally. That could not be a good time to save the prefs. We can reset the pref just before starting shutdown sanitization and only when it's set.

I can try to push a patch doing that to try and see if that's it.
The test starts with a clean profile (except for the session itself). But, as you mention, flushing preferences on startup may have a performance hit.
marco, can you take this?
Blocks: e10s-perf
tracking-e10s: --- → +
Flags: needinfo?(mak77)
Priority: -- → P2
I'm not 100% sure that we need to flush. For one thing, bug 789945 will ensure that the flush takes place OMT without having to call `savePrefFile`, hopefully very soon. Even before that, the worst that can happen is having sanitization run once again on next startup, if there is a crash before something else causes a flush. I suspect that we can live with that.

But yes, mak, as you mention, just by restricting the flush to the case when we actually touch the preference, we should grab back our performance loss.
well, I'm going to try my idea and see if the Tryserver agrees with my theory.

(In reply to David Rajchenbach-Teller [:Yoric] (please use "needinfo") from comment #14)
Even before that, the worst that can happen is having
> sanitization run once again on next startup, if there is a crash before
> something else causes a flush. I suspect that we can live with that.

No, it's the opposite, the reason to clear the pref is so that if we are supposed to clear on shutdown, but we crash before shutdown, we clear on startup. So we must be sure the pref is stored before an eventual crash.
I got something wrong in the previous code comments, indeed the cases where sanitizeInProgress is not set may be more common than I expected. Will take care of updating the comments.
Assignee: nobody → mak77
Flags: needinfo?(mak77)
(In reply to Marco Bonardo [::mak] from comment #15)
> So we must be sure the pref is stored before an eventual crash.

And here I mean, even before we start a shutdown sanitize.
So the scenario is browser starts, it is supposed to clear on shutdown but it crashes (at any time), shutdown won't ever happen. We want to clear on startup.
Blocks: 1250424
Comment on attachment 8722711 [details]
MozReview Request: Bug 1250362 - don't flush prefs on startup unless it's really needed. r=yoric

https://reviewboard.mozilla.org/r/36183/#review32773

Looks good to me.
Attachment #8722711 - Flags: review?(dteller) → review+
Ah, right.
Well, now let's see if this improves the perf.
Try is taking ages to run 5 talos jobs...
Btw, the first job shows sessionrestore opt e10s -3.15%, sessionrestore_no_auto_restore opt e10s -9.40%, ts_paint -0.9%... that makes me think the fix is solving this.
I'll still wait a little bit for the missing jobs but I think we're fine.
https://hg.mozilla.org/mozilla-central/rev/cab091009e28
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 47
:yoric, can we uplift this to aurora?  the perf regression is on aurora as well
Flags: needinfo?(dteller)
I'm 99% sure that we can, but mak should have the final word.
Flags: needinfo?(dteller) → needinfo?(mak77)
already uplifted along with bug 1249786.
Flags: needinfo?(mak77)
[bugday-20160323]

Status: RESOLVED,FIXED -> UNVERIFIED

Comments:
STR: Not clear.
Developer specific testing

Component: 
Name			Firefox
Version			46.0b9
Build ID		20160322075646
Update Channel          beta
User Agent		Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0
OS			Windows 7 SP1 x86_64

Expected Results: 
Developer specific testing

Actual Results: 
As expected
Version: unspecified → 46 Branch
You need to log in before you can comment on or make changes to this bug.