Bug 1393910 (Open)

39.89% sessionrestore_many_windows (windows7-32) regression on push 3cbad3a6fc6e (Thu Aug 24 2017)

Categories: Toolkit :: Safe Browsing, defect, P3
Version: 53 Branch
Tracking Status: firefox57 --- fix-optional
People: Reporter: jmaher; Assignee: Unassigned
Keywords: perf, regression, talos-regression

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=3cbad3a6fc6e

You authored one of the patches included in that push, so we need your help to address this regression.

Regressions:

 40%  sessionrestore_many_windows windows7-32 pgo e10s     2,597.00 -> 3,632.83

Improvements:

 25%  sessionrestore_many_windows windows10-64 opt e10s     4,107.08 -> 3,079.17


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=9006

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
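
For reference, the regression and improvement percentages above appear to be the relative change between the old and new values; a quick check of that reading in Python:

# Relative change between the before/after values quoted in the summary
# (assuming that is how the percentages are computed).
old, new = 2597.00, 3632.83             # sessionrestore_many_windows, windows7-32 pgo e10s
print((new - old) / old * 100)          # ~39.89 -> the 40% regression in the title

old, new = 4107.08, 3079.17             # sessionrestore_many_windows, windows10-64 opt e10s
print((old - new) / old * 100)          # ~25.03 -> the 25% improvement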
I am not sure why we see a large improvement on win10 opt (non-PGO) and a large regression on win7 PGO.

I know we only changed some preferences in the test harnesses. I see a few options here:
* the preferences really do affect session restore and we need to document this
* the regression is related to a PGO-specific issue, which is a black box to us
* there are possibly differences between 32-bit and 64-bit builds, but I doubt it, as win7 opt didn't change and neither did win10 pgo

:francois, can you look into this and help figure out if there is a good explanation for this?
Flags: needinfo?(francois)
Component: Untriaged → Safe Browsing
Product: Firefox → Toolkit
The first thing would be to figure out which of these prefs actually changes the sessionrestore Talos numbers:

https://hg.mozilla.org/integration/autoland/diff/3cbad3a6fc6e/testing/talos/talos/config.py

We can ignore browser.safebrowsing.downloads.enabled since it won't do anything during session restore.

It's one of these two:

- plugins.flashBlock.enabled
- privacy.trackingprotection.annotate_channels

Is it just a matter of changing one of these at a time and pushing to Try or is there something else to worry about?

What Try chooser settings should I use to replicate the results from comment 0?
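
For illustration, the bisection being proposed would look roughly like this; the dict name and the boolean values below are assumptions for the sketch, not the real contents of config.py:

# Hypothetical sketch of the experiment: start from the prefs the regressing
# push added to the Talos config, and generate one variant per candidate pref
# with that single pref dropped, to be pushed to try separately.
BASE_PREFS = {
    "browser.safebrowsing.downloads.enabled": False,        # can be ignored for session restore
    "plugins.flashBlock.enabled": False,                     # candidate 1 (value assumed)
    "privacy.trackingprotection.annotate_channels": False,   # candidate 2 (value assumed)
}

CANDIDATES = ["plugins.flashBlock.enabled",
              "privacy.trackingprotection.annotate_channels"]

def variants(prefs, candidates):
    """Yield (dropped_pref, remaining_prefs) pairs, one per try push."""
    for name in candidates:
        yield name, {k: v for k, v in prefs.items() if k != name}

for dropped, prefs in variants(BASE_PREFS, CANDIDATES):
    print("try push without", dropped, "->", sorted(prefs))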
Flags: needinfo?(francois) → needinfo?(jmaher)
./mach try -b o -p win32,win64 -u none -t other-e10s --rebuild 3
Flags: needinfo?(jmaher)
From the Try pushes above, the one with annotations enabled seems to show a slightly bigger regression. I did another set of pushes, using https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=07c2203ca05e5fd802f61e740b4511b02aafbe48&tochange=072ae808c7b3fdd26ff74f5b9221c44a31f21224 as a baseline.

Baseline, before the settings were added: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0e06c7ca29a2dacce8a46925bd206b4a8fcbb606

privacy.trackingprotection.annotate_channels disabled: https://treeherder.mozilla.org/#/jobs?repo=try&revision=20a250bb8cfc52a4f673c0b581e8aae1af6719ec

plugins.flashBlock.enabled disabled: https://treeherder.mozilla.org/#/jobs?repo=try&revision=619533164980d8d0397b7a435bb31d442e8e00e6

Once these reruns finish, I'll come back with the proper conclusion.
OK, results are now in. It looks like by commenting out plugins.flashBlock.enabled and leaving the privacy.trackingprotection.annotate_channels setting on, we can reproduce a >38% regression on the same test.

In other words, the privacy.trackingprotection.annotate_channels pref is the culprit for this bug. François, please review the Try pushes I did to confirm my findings; I want to be sure I pushed the code changes correctly.
Flags: needinfo?(francois)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #6)
> In other words, the privacy.trackingprotection.annotate_channels is the
> culprit for this bug. François, please review the Try pushes I did, to
> confirm my findings. I want to be sure I pushed the code changes correctly.

These commits look good to me:

https://hg.mozilla.org/try/rev/20a250bb8cfc52a4f673c0b581e8aae1af6719ec
https://hg.mozilla.org/try/rev/619533164980d8d0397b7a435bb31d442e8e00e6

However, I have no idea why we are seeing both a regression and an improvement in session_restore. This pref controls a few perf-related features from Quantum that depend on the tracking protection list. I could understand a perf impact one way or the other:

1. session restore is slower because the tracking annotations cause more work (CPU) to happen, or
2. session restore is faster because unnecessary resources are throttled or loaded in a more efficient way

but if I understand the results correctly, we're seeing both of these at the same time on different platforms. That doesn't make sense to me.

I disabled it in bug 1388938 assuming it would be better not to rely on an external service (and a list that may not be fully downloaded by the time the Talos tests start), similar to how we disable tracking protection and Safe Browsing in these tests. Do we want to have Talos results rely on a feature with a network dependency?
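
To make the trade-off concrete, here is a rough sketch of the two options; the update URL pref and the local endpoint are illustrative assumptions, not verified against the tree:

# Option A: disable the list-dependent features in the Talos profile, as the
# regressing push did (values shown here are assumed for the sketch).
PREFS_DISABLE_FEATURES = {
    "privacy.trackingprotection.annotate_channels": False,
    "plugins.flashBlock.enabled": False,
    "browser.safebrowsing.downloads.enabled": False,
}

# Option B: keep the features on, but serve the lists from a local fixture so
# the result does not depend on whether a download from an external service
# finished before the test started.  The pref name and endpoint below are
# assumptions used only for illustration.
PREFS_LOCAL_LISTS = {
    "privacy.trackingprotection.annotate_channels": True,
    "browser.safebrowsing.provider.mozilla.updateURL":
        "http://127.0.0.1:8000/safebrowsing-dummy/update",
}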
Flags: needinfo?(francois)
(In reply to François Marier [:francois] from comment #7)
> Do we want to have Talos results rely on a
> feature with a network dependency?
Ni?ing test authors and Joel on that.
Flags: needinfo?(mdeboer)
Flags: needinfo?(jmaher)
Flags: needinfo?(dteller)
(In reply to François Marier [:francois] from comment #7)
> I disabled it in bug 1388938 assuming it would be better not to rely on an
> external service (and a list that may not be fully downloaded by the time
> the Talos tests start), similar to how we disable tracking protection and
> Safe Browsing in these tests. Do we want to have Talos results rely on a
> feature with a network dependency?

I think that question applies to Talos in its entirety, so it's better left for Joel to answer.

However, my gut tells me that the only network dependencies these tests should rely on are LAN and loopback. WAN should be out of the equation, because the WWW infrastructure is obviously not reliable.
Localhost network traffic in particular is something we can include and treat as a constant, because we control that segment of the infrastructure.
Flags: needinfo?(mdeboer)
Mmmmh... I thought we had managed to remove network dependencies from that test. If not, that's not intended.
Flags: needinfo?(dteller)
All unit tests and Talos tests run in our CI system should not rely on external servers outside of the original setup and cleanup.

this means:
* setting preferences/env variables to set server names/ports to be localhost or invalid
* setting preferences to disable services where possible
* ensuring the tests themselves have all the resources in-tree (or in a pageset bundle)

Typically I would load a test and look at the network monitor in devtools, but for sessionrestore I am not sure that will be possible. We could use tcpdump or similar software to look for external network access and filter it on the Firefox processes.
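
For example, something along these lines could approximate the tcpdump idea without packet capture; this is a rough sketch, not something the harness does today, and it assumes psutil is available on the test machine:

# Rough sketch: list non-loopback network connections opened by Firefox
# processes while a Talos test runs, to spot external network access.
import ipaddress
import psutil

def external_connections():
    for proc in psutil.process_iter(["name"]):
        if not (proc.info["name"] or "").lower().startswith("firefox"):
            continue
        try:
            conns = proc.connections(kind="inet")
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            continue
        for conn in conns:
            # Skip sockets with no remote address and anything bound to loopback.
            if conn.raddr and not ipaddress.ip_address(conn.raddr.ip).is_loopback:
                yield proc.pid, conn.raddr

for pid, raddr in external_connections():
    print(pid, "->", raddr.ip, raddr.port)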
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #11)
> all unittests and talos tests run in our CI system should not rely on
> external servers outside of original setup and cleanup.

Ok, so my assumption was correct.

Based on that, it sounds like the next step here is to do the first option from comment 1:

- document that privacy.trackingprotection.annotate_channels affects session_restore
Mike, when you get back from PTO, do you by any chance have cycles to get a profile from a Talos run with the annotate_channels pref turned on? It's pretty worrying that this pref causes a regression, since we're shipping with it.  :-(
Flags: needinfo?(mconley)
Wasn't able to reproduce locally, so I've got try pushes cooking that should hopefully result in profiles:

annotate disabled: https://treeherder.mozilla.org/#/jobs?repo=try&revision=80d431811c60bda9c0dfac01b631da19da3feddd
annotate enabled (regression): https://treeherder.mozilla.org/#/jobs?repo=try&revision=d0226ffad0cd9c13bf8f3f6305433645507bd77c

Leaving ni? so that I can come back and post the profiles when these things are done.
Frustratingly, neither of those pushes resulted in profiles, which is especially annoying because it worked for me locally. Classic Talos.

I have a Windows 7 machine that I might be able to reproduce the problem on locally, so I'm trying that next.
One thing to keep in mind here (and this is just my theory) is that the same cset caused a large regression on one platform and a large improvement on another (unless I'm misunderstanding comment 0).

With that in mind, I think it's more likely that the change shifted the timing of when something particularly expensive happens, making it fall inside the timed portion of the test on one platform and outside it on another. IOW, it's _probably_ not a regression in the sense that more work is being done; it's just a matter of the timing of things.

(I could be wrong, of course.)
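
A toy illustration of that theory, with made-up numbers that are not from the test itself:

# Toy illustration: the same fixed amount of extra work, shifted slightly in
# time, can land inside or outside the measured window and flip the number.
WINDOW = (0, 3000)        # ms during which the test is measuring
EXTRA_WORK = 600          # ms of expensive work whose timing shifted

def measured_score(base, work_start):
    start, end = WINDOW
    # Only the part of the extra work overlapping the window shows up in the score.
    overlap = max(0, min(end, work_start + EXTRA_WORK) - max(start, work_start))
    return base + overlap

print(measured_score(2600, work_start=2500))  # overlaps the window -> looks like a regression
print(measured_score(2600, work_start=3100))  # falls after the window -> score unchanged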
Priority: -- → P3
mconley: Kevin's perf report listed this ticket as an open/deferred item from 57. Can you comment on the priority of this regression so the P3 decision is documented? Thank you.
Hey elan, and everybody,

Sorry about the delay here. I haven't really had much luck determining what's going wrong here yet. I've got profiles working on Try, so that's good. I'm trying to get PGO profiles out now.

I think this is worth investigating, but I think P3 is the right call - specifically because this test appears to be bi-modal:

https://treeherder.mozilla.org/perf.html#/graphs?timerange=7776000&series=mozilla-inbound,1514529,1,1&series=autoland,1514549,1,1

I suspect something about tracking protection annotation just happened to push us off of some kind of cliff.

But if you look at the graph I linked to, we seem to hop between these two modes, and we should probably figure out why to make this test more reliable.

So I'm 70% sure this is a test issue, and the test needs to be made less bi-modal. I could be wrong, but that's my read on this.
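
For what it's worth, a quick way to eyeball that from the raw replicates would be something like this (the values below are made up; the real ones come from the graphs linked above):

# Quick, illustrative look at whether the replicates cluster into two modes.
from collections import Counter

runs = [2590, 2610, 2605, 3630, 3640, 2598, 3625, 2602, 3650, 2595]

buckets = Counter(round(value, -2) for value in runs)  # bucket to the nearest 100 ms
for bucket in sorted(buckets):
    print(bucket, "#" * buckets[bucket])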

Is that sufficient?

I'm going to keep the needinfo? on me to get those PGO profiles. Soon!
Flags: needinfo?(elancaster)
:mconley, thank you so much. 70% is acceptable; we just wanted to make sure we are being deliberate in our choices for perf-related tickets. You've done due diligence, thanks again! If something changes, we'll see your comment, or you can ping me.
Wow, that took a while. I finally have some data here:

No regression profile: https://perfht.ml/2jXE5gj
Regression profile: https://perfht.ml/2jWoL3y

It's still not super clear what's happening, but it looks like in the regressing case there's a longer gap before the test starts (and completes) restoring tab content. The actual loading doesn't appear to take longer; it's just that the onStateChange that ContentRestore.jsm is waiting for seems to arrive later.

Hey francois, are there other threads besides the main threads in the parent and content processes that I should gather profiles from to help shed light on what the delay is here?
Flags: needinfo?(mconley) → needinfo?(francois)
(In reply to Mike Conley (:mconley) (:⚙️) - Backlogged on reviews from comment #21)
> Hey francois, are there other threads besides the main threads in the parent
> and content processes that I should gather profiles from to help shed light
> on what the delay is here?

The classification occurs on the main thread:

https://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#1771

but I wonder whether the Safe Browsing update thread is interfering with this. Henry Chang did a lot of work in the past few months to mitigate that (bug 1338017) so it's probably fine, but it might be worth a quick profile.
Flags: needinfo?(francois)
Flags: needinfo?(elancaster)
Hey Mike, for 57 we're thinking fix-optional based on your feedback. Are you ok with that?
Flags: needinfo?(mconley)
I think that's the right call.
Flags: needinfo?(mconley)
Severity: normal → S3