Closed Bug 1484633 Opened 6 years ago Closed 6 years ago

12.55 - 38.48% tp5o responsiveness / tp5o_webext responsiveness (linux64, linux64-qr, windows10-64, windows10-64-qr, windows7-32) regression on push e7869a533fac (Fri Aug 17 2018)

Categories

(Web Compatibility :: Interventions, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: igoldan, Unassigned)

References

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=18b999f1a4c141c31d83d13842de6231a774a34c&tochange=e7869a533fac3a2adef73f4b81c29a92dcf453ec

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 38%  tp5o responsiveness windows10-64 pgo e10s stylo     0.44 -> 0.62
 35%  tp5o responsiveness windows10-64-qr opt e10s stylo  0.53 -> 0.72
 33%  tp5o responsiveness windows7-32 opt e10s stylo      0.45 -> 0.60
 28%  tp5o responsiveness windows7-32 pgo e10s stylo      0.45 -> 0.57
 20%  tp5o responsiveness linux64 pgo e10s stylo          0.67 -> 0.81
 18%  tp5o responsiveness linux64 opt e10s stylo          0.77 -> 0.91
 16%  tp5o_webext responsiveness linux64 pgo e10s stylo   1.13 -> 1.31
 16%  tp5o_webext responsiveness linux64 opt e10s stylo   1.27 -> 1.47
 14%  tp5o_webext responsiveness linux64-qr opt e10s stylo1.70 -> 1.93
 13%  tp5o responsiveness linux64-qr opt e10s stylo       1.17 -> 1.32


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=15097

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: General → Go Faster
Product: Testing → Web Compatibility Tools
Flags: needinfo?(dschubert)
tps regression are also associated with bug 1481395:

== Change summary for alert #15097 (as of Thu, 16 Aug 2018 20:12:33 GMT) ==

Regressions:

  4%  tps windows10-64 pgo e10s stylo                     11.29 -> 11.77
  4%  tps windows7-32 opt e10s stylo                      11.43 -> 11.91

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=15097
Thanks, Ionuț! This looks just like the discussion we've had about the same extension in bug 1457538, and a bit like the discussion in bug 1466575, and I think we should handle those in the same way by marking those as WONTFIX. I looked at the profiles (thanks for linking those!) and also made some in a local build, and there is nothing obvious in terms of extension overhead, so there isn't really anything I can do besides digging into the underlying APIs and doing tuning there, but I am not really qualified for that, since I have no clue about how thee works.

The percentage values, once again, look bad, but that's simply caused by the absolute number being as small as they are.

This issue will come again with bug 1451484, when we convert the entire extension to a webextension and have more logic inside it, and at that point, we probably have to figure out what we can do about this. But for now, I still feel like having small regressions like this is okay, especially since we now are fixing actual websites with the code.

Mike, can I have your opinion here? :)
Flags: needinfo?(dschubert) → needinfo?(miket)
See Also: → 1457538
Hm. This is actually pretty surprising.

I wouldn't expect that change to have any effect, since none of the new content scripts should run on any sites that we load in tp5o. And I don't see any extension framework code running in the tps profile other than the tps extension.

The only change I see that I might expect to have an effect is the user agent override for google.com. And I'd only really expect that if it changes the behavior of the Google pages we load.
Hm. Looking at the logs, I don't understand why bug 1481395 is being blamed for this. It looks like the values for that revision were pretty normal, and increased in subsequent revisions.
According to the graph at https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=autoland,1646436,1&series=mozilla-inbound,d03cdc477d48982ab6bd17158beb7497c8afee75,0, there is a pretty significant jump when we landed. Otherwise, I am as surprised as you, but we had the same jump the last time we landed changes, ..
(In reply to Dennis Schubert [:denschub] from comment #7)
> According to the graph at
> https://treeherder.mozilla.org/perf.html#/
> graphs?timerange=1209600&series=autoland,1646436,1&series=mozilla-inbound,
> d03cdc477d48982ab6bd17158beb7497c8afee75,0, there is a pretty significant
> jump when we landed. Otherwise, I am as surprised as you, but we had the
> same jump the last time we landed changes, ..

Yeah, sorry, I did some backfilling and it became more obvious.
I don't have anything super meaningful to add, other than the regression is unfortunate, but we should probably accept it (unless someone has an amazing idea of how to fix).
Flags: needinfo?(miket)
(In reply to Mike Taylor [:miketaylr] (62 Regression Engineering Owner)(PTO Aug 15 - 17) from comment #9)
> I don't have anything super meaningful to add, other than the regression is
> unfortunate, but we should probably accept it (unless someone has an amazing
> idea of how to fix).

I don't even have any idea of what's *causing* it. The before and after profiles look pretty much the same, and neither of them show any code related to this extension.
from reading the comments, this bug is probably a wontfix as it is not obvious what caused the regression.  :kmag, care to add more or resolve this bug?
Flags: needinfo?(kmaglione+bmo)
:vchin we need your help on concluding this bug, as it's >2 weeks since it got stuck.
Flags: needinfo?(vchin)
(In reply to Kris Maglione [:kmag] from comment #5)
> Hm. This is actually pretty surprising.
> 
> I wouldn't expect that change to have any effect, since none of the new
> content scripts should run on any sites that we load in tp5o. And I don't
> see any extension framework code running in the tps profile other than the
> tps extension.
> 
> The only change I see that I might expect to have an effect is the user
> agent override for google.com. And I'd only really expect that if it changes
> the behavior of the Google pages we load.

Is that UA override on when we ran these tests? What is the override exactly?
Flags: needinfo?(vchin)
I don't believe we're shipping any UA overrides for google.com anymore -- certainly not in Fennec (we retired the Nightly-only Google Search UA spoofing experiment).
Sounds like we can try landing this again?
> I don't believe we're shipping any UA overrides for google.com anymore

We do have a UA override in place for sites.google.com, see bug 755590. I am not sure how `browser.webRequest.onBeforeSendHeaders.addListener` is matching addresses internally, but if it is doing a naive base-domain based matching first, it might have an impact.

> Sounds like we can try landing this again?

We didn't get backed out, so there is nothign to land. :) The question here is whether we can eat the regressions or not, and if the regressions are even caused by us (especially comment 10), and if so, if we can do something about this or if this is just the cost of migrating system extensions to Web Extensions.
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #11)
> from reading the comments, this bug is probably a wontfix as it is not
> obvious what caused the regression.  :kmag, care to add more or resolve this
> bug?

I agree we should WONTFIX here. Someone please reopen if you disagree. :)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Flags: needinfo?(kmaglione+bmo)
You need to log in before you can comment on or make changes to this bug.