2.23 - 3.47% tp5o / tp5o_webext (linux64, linux64-qr, osx-10-10, windows10-64, windows10-64-qr, windows7-32) regression on push 209ef31926beef24da82d91223a89d26c712eccc (Tue Nov 20 2018)

VERIFIED FIXED in Firefox 65

Status

()

defect
P2
normal
VERIFIED FIXED
8 months ago
5 months ago

People

(Reporter: igoldan, Assigned: Ehsan)

Tracking

(Blocks 1 bug, {perf, regression, talos-regression})

unspecified
Firefox 65
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox63 unaffected, firefox64 unaffected, firefox65 fixed)

Details

Attachments

(1 attachment)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=fd9a2224eb523e00beccac04e958719eae6b478a&tochange=209ef31926beef24da82d91223a89d26c712eccc

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  3%  tp5o windows10-64-qr opt e10s stylo         129.63 -> 134.14
  3%  tp5o windows10-64 opt e10s stylo            133.47 -> 137.43
  3%  tp5o osx-10-10 opt e10s stylo               229.17 -> 235.77
  3%  tp5o_webext windows10-64 opt e10s stylo     196.32 -> 201.94
  3%  tp5o windows10-64 pgo e10s stylo            119.80 -> 123.10
  2%  tp5o_webext windows7-32 pgo e10s stylo      174.54 -> 178.72
  2%  tp5o_webext windows7-32 opt e10s stylo      193.78 -> 198.21
  2%  tp5o linux64 pgo e10s stylo                 127.14 -> 130.01
  2%  tp5o linux64 opt e10s stylo                 141.44 -> 144.60
  2%  tp5o_webext linux64-qr opt e10s stylo       213.89 -> 218.65


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=17702

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling
Component: General → Tracking Protection
Product: Testing → Firefox
Flags: needinfo?(ehsan)
I'll soon post the Gecko profiles.
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #2)

> For tp5o on OS X:
> 
> after:
> https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.
> net%2Fv1%2Ftask%2FQHDqWRFKT8eTRG8C4zhjPg%2Fruns%2F0%2Fartifacts%2Fpublic%2Fte
> st_info%2Fprofile_tp5o.zip
> 
> before:
> https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.
> net%2Fv1%2Ftask%2FW84l9lDVRBuUPRJzGF0vVw%2Fruns%2F0%2Fartifacts%2Fpublic%2Fte
> st_info%2Fprofile_tp5o.zip

Oh, sorry. Got this wrong. For OS X, the correct profiles are these:

before: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FQHDqWRFKT8eTRG8C4zhjPg%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_tp5o.zip

after: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FW84l9lDVRBuUPRJzGF0vVw%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_tp5o.zip
Thanks for the report.  This is expected in the sense that we are now doing more work, but it's also a huge regression which I'm not in a rush to just accept.  I'm discussing how to deal with this with my colleagues.  Maintaining the needinfo.
The patch attached is my attempt at fixing this regression without backing out the offending patch itself.  The approach is simple: if the top-level page already has the STATE_COOKIES_LOADED flag, just record the flag on our document and don't notify anything.  I came up with this approach when I noticed the content processes in the profiles after the change experiencing additional time running the OnSecurityChange callback coming from the nsGlobalWindowOuter::NotifyContentBlockingState() call site.

I pushed a couple of try pushes to verify that the regression does indeed go away:

Before: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8d4fdcedd7ae7d0996ddbf5040113c72d33da73
After: https://treeherder.mozilla.org/#/jobs?repo=try&revision=59cf9a5ac928b010047b7c70eaf8f87a363e7fb3

Talos compare: https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=59cf9a5ac928b010047b7c70eaf8f87a363e7fb3&oldRevision=d8d4fdcedd7ae7d0996ddbf5040113c72d33da73
OK, the first approach was promising:

<https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d8d4fdcedd7ae7d0996ddbf5040113c72d33da73&newProject=try&newRevision=59cf9a5ac928b010047b7c70eaf8f87a363e7fb3&framework=1&showOnlyImportant=1>

But this wasn't completely recovering the performance we lost due to my original landing.  Then I decided to even go one step further and apply the same idea to everything that this function does and that indeed recovers all of the lost perf, and indeed buys a little bit more here or there too:

<https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d8d4fdcedd7ae7d0996ddbf5040113c72d33da73&newProject=try&newRevision=f8ae453e84a881aa24c5fc922b9e71e3e082b407&framework=1&showOnlyImportant=1>

The updated patch is ready for review now.
Flags: needinfo?(ehsan)
Assignee: nobody → ehsan
Status: NEW → ASSIGNED
Priority: -- → P2
Attachment #9026829 - Attachment description: Bug 1508944 - Avoid dispatching the OnSecurityChange notification repeatedly for STATE_COOKIES_LOADED notifications → Bug 1508944 - Avoid dispatching the OnSecurityChange notification repeatedly when nothing has changed
See Also: → 1509228
Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d8cec61d53e8
Avoid dispatching the OnSecurityChange notification repeatedly when nothing has changed r=baku
Increased the timeout from 500ms to 1s, seems to make the test pass...
Flags: needinfo?(ehsan)
Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8ab07484e939
Avoid dispatching the OnSecurityChange notification repeatedly when nothing has changed r=baku
https://hg.mozilla.org/mozilla-central/rev/8ab07484e939
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 65
perf alerts are in:
== Change summary for alert #17780 (as of Fri, 23 Nov 2018 23:58:06 GMT) ==

Improvements:

  6%  tsvg_static windows7-32 pgo e10s stylo      45.61 -> 43.02
  3%  tp5o windows7-32 opt e10s stylo             135.05 -> 130.70
  3%  tp5o windows10-64-qr opt e10s stylo         133.55 -> 129.44
  3%  tp5o_webext windows7-32 pgo e10s stylo      178.56 -> 173.46
  3%  tp5o windows7-32 pgo e10s stylo             121.51 -> 118.21
  3%  tp5o windows10-64 opt e10s stylo            137.12 -> 133.52
  2%  tp5o_webext windows10-64 pgo e10s stylo     180.69 -> 176.47
  2%  tp5o_webext windows10-64 opt e10s stylo     201.43 -> 196.90

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=17780
👍

But why did tsvg_static improve?
oh, that is marked as "invalid", it is a noisy test affected by lower/higher volumes of pushes.
Depends on: 1511477
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.