Regression in Win*/Linux* sessionrestore_no_auto_restore/ts_paint on Fx-Team-Non-PGO (v.40) on May 7th, 2015 from 584c7f8c6e5b

RESOLVED FIXED

Status

Testing
Talos
P2
normal
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: vaibhav1994, Assigned: jaws)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [fixed per investigation in comment #35])

Attachments

(1 attachment)

(Reporter)

Description

3 years ago
Talos has detected regressions in:
http://alertmanager.allizom.org:8080/alerts.html?rev=584c7f8c6e5b&showAll=1&testIndex=0&platIndex=0

This is a slightly unusual bug, as even though there was a original big regression by the revision, it got corrected by a later patch, though not fully.

For example, we can see in the graph: http://graphs.mozilla.org/graph.html#tests=[[83,132,25],[83,131,25]]&sel=1430724679805,1431329479805&displayrange=7&datatype=geo

the values goes from baseline ~1000, to >1200 by push 584c7f8c6e5b, and reduces to 1030-1040 after push 663616cbe447 (4% regression in the end).

Joel, what are your thoughts on this?
Flags: needinfo?(jmaher)
(Reporter)

Comment 1

3 years ago
Mozilla-Inbound and Mozilla-Inbound-Non-PGO regressions: http://alertmanager.allizom.org:8080/alerts.html?rev=ee863cf58138&showAll=1&testIndex=0&platIndex=0
oh, this is interesting, we have the regression (pocket) related, then we have a random fix improving it.  I did some retriggers in case there was some other reason:
https://treeherder.mozilla.org/#/jobs?repo=fx-team&startdate=2015-05-08&enddate=2015-05-08&filter-searchStr=talos%20other%20windows%207%20opt

looking at win7 ts_paint, we can see the drop from 1200 -> 1050 on this set of changes:
https://hg.mozilla.org/integration/fx-team/pushloghtml?changeset=663616cbe447

these changes are telemetry changes, unrelated to pocket.  So this is great we have a win!  It puts less overall pressure on the impact of the pocket regressions (at least in this single instance).  The original regression is still valid to track and investigate.

Thanks for looking into this!
Flags: needinfo?(jmaher)
All ts_paint had a similar issue!  We regression 20%+ (which is pretty serious), and now with the telemetry fix we are ~5% regressed (still serious enough to care about).

There are some session restore issues- these are smaller (barely 2%) and appear to not be fixed by the telemetry fix, although they trend slightly down.
I know there is a pending tpaint regression related to pocket.  This landing set is either readermode or pocket- hard to tell with the resolution we have.  Now we have a large ts_paint regression introduced.  Gavin, could you help us determine what the root cause is here or help redirect us to someone who can and make a decision on this?  For reference, lets keep this discussion on trunk trees (although this set of regressions as outlined above will be on Aurora today).
Flags: needinfo?(gavin.sharp)
This was caused by the pocket pref-flip, which is something I verified on m-c, where it landed in a single push directly on central, and noted in the other (tpaint) perf bug for it.

AIUI it won't be on aurora, because the pref flip was backed out there: https://bugzilla.mozilla.org/show_bug.cgi?id=1161881#c9
(In reply to :Gijs Kruitbosch from comment #6)
> This was caused by the pocket pref-flip, which is something I verified on
> m-c, where it landed in a single push directly on central, and noted in the
> other (tpaint) perf bug for it.

Specifically: https://bugzilla.mozilla.org/show_bug.cgi?id=1161593#c19

> AIUI it won't be on aurora, because the pref flip was backed out there:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1161881#c9

it will hit release (38.0.5) though...
(Reporter)

Comment 8

3 years ago
On mozilla-inbound-non-pgo Ts, Paint test on win: http://graphs.mozilla.org/graph.html#tests=[[83,132,31],[83,63,31],[83,131,31]]&sel=1428759277514,1431351277514&displayrange=30&datatype=geo

Here the regression rises then reduces after 77d92f6d7679, which is: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=b6b1185fcb09&tochange=77d92f6d7679

There are revisions related to pocket there like 39dc888ce14c, e29d46c8d9b3 and adadd752ff08 which might be causing a reduction in values.
No longer blocks: 1160775
ok, good info to have!  we are uplifting today which is why I mentioned aurora- glad to know the current aurora and future beta will not be of concern atm.

What could I do to help reduce the perf impact of the pocket pref?
(In reply to Joel Maher (:jmaher) from comment #3)

> looking at win7 ts_paint, we can see the drop from 1200 -> 1050 on this set
> of changes:
> https://hg.mozilla.org/integration/fx-team/pushloghtml?changeset=663616cbe447
> 
> these changes are telemetry changes, unrelated to pocket.  So this is great
> we have a win!

Hmm. That seems pretty surprising, I don't see anything in there (663616cbe447 / 5add1e8ab1e2 / 1815927ac2fc) that I would expect to have any impact on performance, let alone a major improvement. Maybe the nsExceptionHandler.cpp changes are better, but that should only be running when we're crashing, not during a normal startup. Although I do remember Benjamin fighting with some telemetry(?)-related performance issue recently, so maybe I'm missing something? Benjamin -- anything in your changeset that you would expect to have a positive impact on Ts Paint?

Has anyone tried doing a comparison run with that backed out? I worry that either that's not the right changeset for the improvement or something else is wonky.


This regression sounds a lot like what we found in Australis (bug 880611), and specifically bug 888800. Would be worth a quick check to see if a similar fix (adding an appropriate <toolbarbutton> to the navbar in browser.xul) has an effect here. But could be something totally different.

A few ideas that can be quickly investigated:
  - Is the iframe involved? (especially since it spins up an about:blank document inside)
    Does just removing it improve Ts Paint, as a quick hack?
  - Is it possible the extra button is causing the toolbar to go into overflow on Talos,
    ala bug 1163231? Should be easy to verify with a local run or screenshot on Talos?
  - Bug 1162283 added a stringbundle load to CustomizableWidgets.jsm. Could that be causing file IO
    and thus this regression? Easy enough to comment out to check.

Failing any quick discoveries from that, we're probably going to need to do the slow and tedious route of bisection and profiler diffs to find out what's happening. 

One other random note: it's interesting that the conditionally-added Hello button either didn't cause a similar regression, or was missed.
Flags: needinfo?(benjamin)
Gijs: can you help with the initial investigation here?
Assignee: nobody → gijskruitbosch+bugs
Flags: needinfo?(gijskruitbosch+bugs)
I spent this morning and the beginning of the afternoon profiling this.

Unfortunately the remaining regression is tricky to work out in terms of where it's coming from. Comparing profiles with/without pocket on my machine, the difference is pretty slight (on the order of 20-odd ms on a total of 1260 when profiling, 760 when not profiling) and spread out across a number of places.

There appear to be at least the following things going on:

- we load a bunch of extra script that does Stuff straight from the XUL file.
- the button being inserted is indeed not free. We cause extra XBL bindings and style work etc. when we create the button and insert it.
- the onLocationChange stuff we added is loading Pocket.jsm and causing time spent there which is not useful because the user is never logged in to pocket, and as long as they aren't this is all a no-op. I will explore making this a push-state rather than pull-state thing so that we don't lose cycles on it. I'll be doing this in bug 1161593.

There seems to be other relatively low-hanging fruit as well though, like the gDevTools module which gets loaded and created in delayedStartup and loads a bunch of files. I may look into this further if the current approach does not yield the results we need in the appropriate timeframe.
Status: NEW → ASSIGNED
Depends on: 1161593
Flags: needinfo?(gijskruitbosch+bugs)
great info Gijs, thanks for looking into it.
Flags: needinfo?(gavin.sharp)
running notes from a shared debugging poking-around with Jared:

- the locale fallback stuff seems to not be free, mostly for the CustomizableWidgets case (the other two callsites won't get exercised in any of our talos tests). I don't know how much just fixing the exception but keeping the separate file will help.
- the iframe that is removed by the other file may also be involved.
- the social API / add-on checking is also not free, but I don't know how much of a difference those are making. Certainly for the add-on we could defer to the add-on to just destroyWidget the builtin widget. If we can also move the socialAPI stuff into nsBrowserGlue that may help.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0cb547b76634
https://treeherder.mozilla.org/#/jobs?repo=try&revision=84630e0871bc
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b48c881e7566

telemetry thing backed out:

 https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa078d4a8d2c


this is all on fx-team tip of a few minutes ago, and all with PGO. There's 2 baselines, one with and one without the pocket pref set to true, and a check to see if fixing the l10n stuff makes a noticeable difference (it does for me locally, but we'll see).
one fix, and one thing that potentially will make things worse again (but it was a correctness fix, so we kind of had to take it anyway)

So here's some trypushes to see where we're at with that landed:

with and without pocket pref enabled:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=188165e33ad0
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc32d89d53b0

with the onLocationChange handler disabled:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=a4d17a0bb4c8

with the pocket scripts not loaded and not called from said onLocationChange handler:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=4661471d4248

both of the above:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=8324d84dc629


seeing as the landed patch avoids some stringbundle awkwardness, let's see what happens if we avoid those altogether:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=b72a1df3acc9

that + the onlocationchange + script stuff:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=ad375294b124


then there's this iframe thingy:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=71fb956a9bbb

that + everything else:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=b1e49137e864



Joel, I hear you have scripts to retrigger things using mozci so as not to spend forever clicking around treeherder. Can you share those with me somehow, please?
Flags: needinfo?(jmaher)
Here is what I use to trigger:
$ git clone https://github.com/armenzg/mozilla_ci_tools
$ cd mozilla_ci_tools/scripts
$ python alltalos.py --repo-name try  --times 6 --rev 188165e33ad0

a few notes:
* this will ask for ldap credentials so it can trigger stuff
* it will schedule all talos- osx 10.10, osx 10.6, linux32, linux64, winxp, win7, win8, AND android
* --times 6 = 6 retriggers, this means that it will be 7 jobs for the original ones; and 6 jobs for ones that don't normally get schedule (i.e. osx 10.10, android)
* if this is run and a build has not completed, it will schedule a build.
** NOTE: this means that it will schedule an android build
* there is no magical server sitting around (yet) that accepts retrigger commands and executes them when a build is finished- it is best to wait for the builds to finish
* be patient- this downloads a lot of info about jobs and status, and then iterates through the jobs.  This is pretty reliable, just let it work and wait for the jobs to show up in treeherder (it can take up to 5 minutes to see all the jobs in treeherder)
* it is acceptable to hack the script to do what you want locally, here is a local hack to avoid android builds/jobs:
--- a/mozci/scripts/alltalos.py
+++ b/mozci/scripts/alltalos.py
@@ -72,6 +72,14 @@ def main():
     buildernames = build_talos_buildernames_for_repo(options.repo_name, pgo)
 
     for buildername in buildernames:
+
+        # skip android jobs
+        try:
+            buildername.index('ndroid')
+            continue
+        except:
+            pass
+
         trigger_job(revision=options.revision,
                     buildername=buildername,
                     times=options.times,

I will check these out in the morning and retrigger if needed!  Soon (later this month), this will be available via try syntax, there are a lot of kinks to sort out- but that is actively being worked on by a couple folks.
Flags: needinfo?(jmaher)
I'm in the process of retriggering, but apparently nobody had ever tried using it on Windows before (didn't work, needed to switch to my mac to do that). I also found it retriggered a lot of suites I don't care about, in addition to the android issue, so I submitted a pull request to add commandline options for those:

https://github.com/armenzg/mozilla_ci_tools/pull/198


Now I can do:

$ python mozci/scripts/alltalos.py --repo-name try --rev a4d17a0bb4c8 --times 14 --filter 'other,svgr' --platform desktop
(In reply to :Gijs Kruitbosch from comment #16)
> seeing as the landed patch avoids some stringbundle awkwardness, let's see
> what happens if we avoid those altogether:
> remote:  
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b72a1df3acc9
> 
> that + the onlocationchange + script stuff:
> remote:  
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=ad375294b124

> that + everything else:
> remote:  
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b1e49137e864

These three did not work because I made a mistake in the stringbundle patch.

New push:

remove stringbundle use:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=95fa9fcca143
(In reply to :Gijs Kruitbosch from comment #16)
> one fix, and one thing that potentially will make things worse again (but it
> was a correctness fix, so we kind of had to take it anyway)
> 
> So here's some trypushes to see where we're at with that landed:
> 
> with and without pocket pref enabled:
> remote:  
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=188165e33ad0
> remote:  
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc32d89d53b0


So this shows that right now, unsetting the pref produces a modest gain in tpaint, ts_paint and sessionstore, of about 1%, all except ts_paint on winXP which has a comparatively high 3% difference:

https://compare-talos.paas.mozilla.org/?oldBranch=Try&oldRevs=dc32d89d53b0&newBranch=Try&newRev=188165e33ad0&submit=true

> with the onLocationChange handler disabled:
> remote:  
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=a4d17a0bb4c8
> 
> with the pocket scripts not loaded and not called from said onLocationChange
> handler:
> remote:  
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=4661471d4248
> 
> both of the above:
> remote:  
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=8324d84dc629


https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=dc32d89d53b0&newProject=try&newRevision=8324d84dc629

Seems to provide modest ts_paint and sessionstore improvements of up to 1% (on XP, particularly). This will need some work in order for it to work well, though. It is unclear which of these changesets provided the most value here, the numbers conflict between OSX and WinXP, but the effects are bigger when combining both csets, so I'll file a followup to work on that.
 
> then there's this iframe thingy:
> remote:  
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=71fb956a9bbb


https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=dc32d89d53b0&newProject=try&newRevision=71fb956a9bbb

shows that this should get us back the remainder of the tpaint and ts_paint things, so I'll file a followup for that as well.
Priority: -- → P2
As I'm going to stop poking at this, here's a summary of where I think we're currently at:


- when this originally landed on nightly, there was a "big" regression in ts_paint, on the order of 20-30%. It got fixed almost immediately by the landing of the telemetry changes. The try pushes from comment 15 show exactly the same regression recurring when backing out the telemetry change, so this isn't a "fluke" in the "not reliably reproducible" sense, though it still makes little sense;
- it's not clear why the telemetry thing (or, really, the pref flip from bug 1161881) would have quite such drastic effects
- it isn't clear to what extent that same regression happened as this got uplifted - if this is timing-related, it's quite possible the interaction of the patch with other stuff is non-obvious and the same doesn't happen on beta right now. On the other hand, it could also be the same but without the telemetry "fix" meaning we now have a serious regression on beta. It's hard to be sure, see below.
- in terms of assessing pocket perf overall, esp. when also considering bug 1161593, it's not clear what the current state of things is. There are a number of causes of uncertainty:
  - there are no perf measurements for mozilla-release because talos doesn't run there, so there's no info at all
  - things didn't land on beta-38 for a few weeks because we already knew we were having a 38 and then 38.0.5, and chose to do both on mozilla-release early
  - graphs of branch things (which are interesting because they didn't have the telemetry "fix" that fixed most of the issue on trunk) got messed up by branch uplifts about a week ago, further complicating the assessment of what overall perf impact was
  - things landed over the course of a week or two, and not everything was behind the pref that was flipped by bug 1161881 which caused the immediate ts_paint impact (see the first point).
  - however, the pref kicked a bunch of things into action which had already landed, both directly (ie things that explicitly check for the pref) and indirectly (things that check for the button which is gated by the pref).

In terms of mitigating the issue, bug 1164410 and bug 1164940 have both landed. The former was a correctness fix that was good to have even without any perf improvements, though try did predict a tiny perf gain. It's unclear if that's borne out by data from m-c/fx-team, though it would make sense if it was. The latter got me at least one email regarding tpaint improvements. Finally, bug 1164942 has now also landed on fx-team after bouncing twice, which will probably mess with any perf info we would otherwise have gotten. Maybe the merge will tell, once it happens.

All in all I would expect those three issues to fix most of the tpaint issue in bug 1161593, and potentially some of the ts_paint stuff here, though as noted above it's really hard to be sure about any of that.


Regarding further steps, really the most important thing right now to me would seem to be assessing what's going on with beta and if it has the same without-telemetry-changes regression, and if so, to get to the bottom of that. The easiest way to check that might be to do a trypush of beta tip, and beta-tip with the pref flipped back to off, and beta-tip with the telemetry changes uplifted (assuming that's reasonably easy to do), and to retrigger a bunch and check what the numbers say.

Secondly, it'd be good to know if we're out of the woods on trunk / generally. This is harder because, as noted, the pref flip doesn't cover everything. The trivial trypush + trypush with the pref flipped might not be enough to ascertain we've not regressed stuff with the other landings in small bite-size pieces. I've thought that a try push with all the pocket changes backed out would be too time-consuming to set up, and/or too hard to draw conclusions from, but perhaps I'm wrong. I'm not sure how else you would assess this without having the 20% thing to work off of (and assuming that's fixable in a way that would further benefit trunk beyond what's been fixed by the telemetry change itself).

Dolske has noted that perhaps we need a loaner machine, though I'm not sure what OS we'd need for that to be effective, either...

If this is a rambly/not-super-clear note, then that's because sadly, it's not 100% clear where we stand right now (and never was, see comment #0). Sorry, but feel free to ping me with questions.
Assignee: gijskruitbosch+bugs → nobody
No longer blocks: 1161838
Status: ASSIGNED → NEW
(In reply to :Gijs Kruitbosch from comment #21)
> Finally, bug 1164942 has now also landed on fx-team after bouncing twice,
> which will probably mess with any perf info we would otherwise have gotten.
> Maybe the merge will tell, once it happens.

FWIW, having just looked at graphserver, it seems initial signs for tpaint are pretty positive, but we're still not quite back where we were on Windows - though that could of course be because of other changes that landed in the meantime...
When we turned on pocket for mozilla-beta (http://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=c20c4ef55f08&tochange=f75252949669 on May 11th after the uplift), we ended up with a 22.1% windows 8 ts,paint regression as seen on the graph:
http://graphs.mozilla.org/graph.html#tests=[[83,53,31]]&sel=none&displayrange=30&datatype=geo

I know we have done some fixes on trunk for pocket, could we consider uplifting these to mozilla-beta?
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Joel Maher (:jmaher) from comment #23)
> When we turned on pocket for mozilla-beta
> (http://hg.mozilla.org/releases/mozilla-beta/
> pushloghtml?fromchange=c20c4ef55f08&tochange=f75252949669 on May 11th after
> the uplift), we ended up with a 22.1% windows 8 ts,paint regression as seen
> on the graph:
> http://graphs.mozilla.org/graph.html#tests=[[83,53,
> 31]]&sel=none&displayrange=30&datatype=geo
> 
> I know we have done some fixes on trunk for pocket, could we consider
> uplifting these to mozilla-beta?

Everything is already on beta except bug 1164942 which I just requested uplift for.

It would be interesting to know what the 22.1% regression is really about and why the telemetry change "fixed" it on trunk. Without that information, I do not imagine it likely that any of the 3 fixes on nightly will fix the regression on beta (I mean, 2 are there already and have apparently done sod all by comparison to the 22% regression).
Flags: needinfo?(gijskruitbosch+bugs)

Comment 25

3 years ago
https://hg.mozilla.org/integration/fx-team/rev/1815927ac2fc has the side effect of loading the addon manager at a slightly earlier point in startup, which also initializes NSS at a slightly earlier point in startup. My understanding is that either was it was before tpaint, so it shouldn't have changed tpaint. But I've been wrong before, and been yak-shaving parts of this for a while now.
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #25)
> https://hg.mozilla.org/integration/fx-team/rev/1815927ac2fc has the side
> effect of loading the addon manager at a slightly earlier point in startup,
> which also initializes NSS at a slightly earlier point in startup. My
> understanding is that either was it was before tpaint, so it shouldn't have
> changed tpaint. But I've been wrong before, and been yak-shaving parts of
> this for a while now.

This could be it because the pocket button checks with the add-on manager whether the pocket add-on is present or not.
(In reply to Joel Maher (:jmaher) from comment #27)
> does it make sense to request
> https://hg.mozilla.org/integration/fx-team/rev/1815927ac2fc on beta?

I can't speak as to that, but if we're just doing it for this perf issue, probably not before verifying that it fixes the perf issue. If the risk is too high, there might be ways of working around it. We should really verify that "this is it" first, though.

Comment 29

3 years ago
(In reply to :Gijs Kruitbosch from comment #26)
> This could be it because the pocket button checks with the add-on manager
> whether the pocket add-on is present or not.

Thought to toss out there if this is causing the issue:
Would checking a pref be a lighter weight call then loading the add-on manager? In the upgrade path, we could look to see if one of the extensions preferences exist as a way to verify the add-on exists instead. The only thing you wouldn't be able to do that way is to detect that it was removed. But Pocket could push an update to the add-on to add a hook on uninstall to set some pref (or could handle that on the Mozilla side too presumably).
(In reply to Nate Weiner from comment #29)
> (In reply to :Gijs Kruitbosch from comment #26)
> > This could be it because the pocket button checks with the add-on manager
> > whether the pocket add-on is present or not.
> 
> Thought to toss out there if this is causing the issue:
> Would checking a pref be a lighter weight call then loading the add-on
> manager?

I'd think so. A try push with few retriggers could help verify it.
... seeing as nobody was actually doing the trypushes and I was bored on a train, I accidentally:

baseline:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=ea916c195bba

pref off:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=21ef47ca0f83

get rid of the conditional destroy promise based on the add-on manager:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=d075c9853647

(these are all based on current beta tip)
Joel, thanks for the retriggers.

Good news:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=21ef47ca0f83&newProject=try&newRevision=d075c9853647

this is a comparison between "pocket pref flipped to false" and "remove the add-on manager calls".

It's a wash, and the same cannot be said for the two other comparisons, which show the 20% difference one way or the other. So removing the reliance on the add-on manager should fix this.

Nate, are there extant prefs that we can rely on existing versions of the add-on changing?
Flags: needinfo?(nate)
yay, this is good news!
Yes, `browser.pocket.settings.installed`. I already have try pushes that check for that pref. I will post my response shortly, no need for more work here.
Assignee: nobody → jaws
Status: NEW → ASSIGNED
Flags: needinfo?(nate)
Created attachment 8608801 [details] [diff] [review]
Patch A

I spent time yesterday investigating this situation and have the following results.

I attempted three different patches and combinations thereof. The three patches that I attempted are:
A.) Checking the `browser.pocket.settings.installed` pref before invoking the Add-ons Manager
B.) Checking `SocialService.hasEnabledProviders` before calling `SocialService.getProvider`
C.) Placing the Pocket button in the customization palette by default

=== Background and motivation behind the three patches ===

A.) As Benjamin's patch has shown (and other discussion in this bug), we have seen a high correlation between tspaint and the loading of the Add-ons Manager. Benjamin's patch brought a big win on tspaint, and we would like to realize this same win by not invoking the Add-ons Manager. During start-up, we are currently asking the Add-ons Manager if the Pocket add-on is installed. This is always false for Talos, and will be false for the extreme majority of Release users.

B.) There exists a note in SocialService.jsm that says to use `hasEnabledProviders` as an optimization before doing other work with SocialService. During start-up, we are currently asking SocialService if the Pocket Social-API plugin is installed, and if so uninstalling it. This is always false for Talos, and will be false for the extreme majority of Release users.

C.) This patch is used as a way to see what the costs are of placing a new widget in the toolbar. This is done to measure the costs of the reflows/layout work with other flex items (URL bar + Search bar).

=== Results ===

I compared the above three patches with two baselines: the clean tip of the tree; and a backout of Benjamin's work. These comparisons are run with try syntax of: "try: -b do -p linux -u none -t other_nol64,other_l64". Baseline revision is 6d3a597b0935. The extended data can be found at [1].

When comparing with the clean tip of the tree (positive numbers are slower, negative numbers are faster):

A.)
-0.54% session_restore_no_auto_restore
-0.12% tspaint

B.)
0.61% session_restore_no_auto_restore
-0.15% tspaint

C.)
-0.53% session_restore_no_auto_restore
-0.15% tspaint

Combining the above patches with the backout of Benjamin's patch:

A.)
1.18% session_restore_no_auto_restore
-3.11% tspaint

B.)
1.19% session_restore_no_auto_restore
23.05% tspaint

C.)
1.08% session_restore_no_auto_restore
21.8% tspaint

I also did a performance run with only Benjamin's patch backed out. This provides us a baseline for the backout.

Backout:
2.16% session_restore_no_auto_restore
23.32% tspaint

=== Interpreting the data ===

Percentage differences less than 1% are within a noisy range that is basically negligible for our investigations.

In the first set of comparisons, all three patches (A, B, and C) fail to generate a large difference in any direction.

In the second set of comparisons, a large variation can be seen by patch A compared to patches {B, C} as well as the pure backout of the Benjamin's patch. In fact, between patch A and the backout, we see a 26.43% difference in tspaint performance.

=== Looking deeper between patch A and the backout ===

As mentioned earlier, patch A does a basic preference check before invoking the Add-ons Manager. In the case of Talos, this prevents the Add-ons Manager from being loaded.

Benjamin's patch has the effect of loading the Add-ons Manager earlier in start-up.

=== Conclusion ===

Benjamin's patch causes the Add-ons Manager to be loaded earlier in start-up. It is possible that there is less contention at that time for the Add-ons Manager, and by the time the code in patch A gets run, the Add-ons Manager is loaded, primed, and ready to respond quickly.

The removal of Benjamin's patch (the backout baseline) shows that there is a considerable cost to being the first consumer of the Add-ons Manager. Patch A confirms this, since preventing the load of the Add-ons Manager removes all of the negative performance impacts.

Therefore, we are left in the following situation:

Firefox 41 and Firefox 40 contain Benjamin's telemetry change. This loads the Add-ons Manager much earlier in start-up, causing the usage of the Add-ons Manager by CustomizableUI to be small (just barely non-negligible). '''With this known, we can rest assured that no change is needed on mozilla-central or mozilla-aurora to fix this performance regression.'''

Firefox 38.0.5 and Firefox 39 do not contain Benjamin's telemetry change. These branches could benefit from checking the preference before invoking the Add-ons Manager. It is now too late in the release cycle to make this change to Firefox 38.0.5. Firefox 39 has about two weeks left in the release cycle that this change can be made, but this change is high in risk due to the reliance on preferences to check if an add-on installed, the unverified code of the add-on which sets this preference, and the ability for users to reset this preference and end up in a broken state (two Pocket buttons appearing on the toolbar as well as weirdness in menus).

It is in my opinion (as well as Dolske's) that we not uplift patch A to mozilla-beta due to the risk detailed in the previous paragraph.

[1] https://docs.google.com/spreadsheets/d/1-s4-VKWt-__8BM8XIO5f78BYKsPWA8s0GG6Zc5ydoUQ/edit?usp=sharing
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Whiteboard: [fixed per investigation in comment #35]
Thanks for the good investigation.

@jaws, would you be able to summarize which known perf regressions exist (if at all) on each version?
Flags: needinfo?(jaws)
Sure thing (clarified over IRC that a short summary of what versions are affected by the regression is what was requested),

Version 38.0.5, roughly 20% regression on sessionrestore_no_auto_restore
Version 39, same as 38.0.5
Version 40, regression fixed
Version 41, regression fixed, and so on for future versions.
Flags: needinfo?(jaws)
(In reply to Jared Wein [:jaws] (please needinfo? me) from comment #37)
> Version 38.0.5, roughly 20% regression on sessionrestore_no_auto_restore
> Version 39, same as 38.0.5
> Version 40, regression fixed
> Version 41, regression fixed, and so on for future versions.

Thanks.

Just adding for the sake of summary, that the regressions at 38.0.9 and 39 were not fixed since it was too risky this late. See comment 35 for more details.
(In reply to Jared Wein [:jaws] (please needinfo? me) from comment #37)
> Sure thing (clarified over IRC that a short summary of what versions are
> affected by the regression is what was requested),
> 
> Version 38.0.5, roughly 20% regression on sessionrestore_no_auto_restore
> Version 39, same as 38.0.5
> Version 40, regression fixed
> Version 41, regression fixed, and so on for future versions.

@jaws, looking again at the reported regressions, the "alert manager" link from comments 1/2/3, it shows:

2-3% session restore no auto restore (win/linux)
20-30% regressions in ts_paint (win/linux)

So actually the meaningful regression is ts_paint, while your summary doesn't mention it at all.

Help?
(In reply to Avi Halachmi (:avih) from comment #39)
> So actually the meaningful regression is ts_paint, while your summary
> doesn't mention it at all.

In comment #37 I meant to say ts_paint where I said sessionrestore. Please refer to comment #35 and the spreadsheet linked to at the end of that comment.
You need to log in before you can comment on or make changes to this bug.