Closed Bug 1374725 Opened 7 years ago Closed 7 years ago

2.93 - 14.92% glterrain / tp5o_scroll / tscrollx / tsvgx (osx-10-10) regression on push d521c1d8d60503c088aa4127300a83161852f297 (Tue Jun 20 2017)

Categories

(Firefox :: Theme, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

RESOLVED FIXED
Firefox 56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- unaffected
firefox56 --- fixed

People

(Reporter: jmaher, Assigned: nhnt11)

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=d521c1d8d60503c088aa4127300a83161852f297

As you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 15%  tp5o_scroll summary osx-10-10 opt e10s     2.87 -> 3.29
 15%  tscrollx summary osx-10-10 opt e10s        2.99 -> 3.43
 10%  glterrain summary osx-10-10 opt e10s       3.79 -> 4.17
  3%  tsvgx summary osx-10-10 opt e10s           404.03 -> 415.86
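
For reference, the percentages in the table can be approximated from the before/after values with a plain relative-change calculation. This is only a sketch: the function name `percent_change` and the dictionary layout are illustrative and not part of Talos or Perfherder, and the inputs are the rounded summary values shown above. The alert itself is presumably computed from unrounded data, so small differences from the reported figures (such as the 14.92% in the bug title) are to be expected.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed as a percentage."""
    return (after - before) / before * 100.0


# Rounded summary values copied from the regression table in this alert.
# The alert reports an increase as a regression for all four tests.
regressions = {
    "tp5o_scroll": (2.87, 3.29),
    "tscrollx": (2.99, 3.43),
    "glterrain": (3.79, 4.17),
    "tsvgx": (404.03, 415.86),
}

for test, (before, after) in regressions.items():
    change = percent_change(before, after)
    print(f"{test:12s} {before:>7.2f} -> {after:>7.2f}  {change:+.2f}%")
```

The same function gives negative values for the reverse direction, for example when comparing scores before and after a backout.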


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=7423

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
:nhnt11, I see you are the author of the patch for bug 1367385, which looks to be the root cause of these regressions. Can you take a look and determine whether there is a fix we can make, or whether this is something we were expecting and can document?
Flags: needinfo?(nhnt11)
Component: Untriaged → Theme
The patches have been backed out because of this as well as broken colors: https://irccloud.mozilla.com/file/VngoNF9B/Screen%20Shot%202017-06-20%20at%2018.09.13.png
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(nhnt11)
Resolution: --- → FIXED
Assignee: nobody → nhnt11
Target Milestone: --- → Firefox 56
Version: 53 Branch → Trunk
And these are the backout's improvement notifications:

== Change summary for alert #7431 (as of June 20 2017 18:34 UTC) ==

Improvements:

 14%  tp5o_scroll summary osx-10-10 opt e10s     3.33 -> 2.87
 11%  tscrollx summary osx-10-10 opt e10s        3.36 -> 2.99
  9%  glterrain summary osx-10-10 opt e10s       4.18 -> 3.80
  3%  tsvgx summary osx-10-10 opt e10s           414.76 -> 402.62

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7431
I've started to look into this. I have profiles from automation:

Good case (CSSOM.163.com - avg: 2.8): https://perfht.ml/2tuecFI

Bad case (CSSOM.163.com - avg: 3.42): https://perfht.ml/2tu4Fig

I think what I'm seeing is the content process's main thread spending a bit more time waiting for events in the bad case. This makes me think that requestAnimationFrame isn't firing as frequently.

I'm reminded a little bit of bug 924415, and I'm wondering if we're hitting a similar bug here in our Talos infrastructure. I'm going to see what happens if we minimize the pageloader window to see if that affects the test at all.
Okay, I've pushed some patches to try which hide the pageloader window, both with and without vibrancy.

If these two pushes have similar tp5o_scroll performance (so, no regression detected), that supports the hypothesis that a visible pageloader window is affecting the test, in a similar way to how bug 924415 was affected.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=b523d48f91e9&newProject=try&newRevision=dcafd9aa601f&framework=1&showOnlyImportant=0
Flags: needinfo?(mconley)
Hiding the pageloader window had no effect whatsoever, so that hypothesis was bunk.

I looked at the profiles with jrmuizel during the last episode of The Joy of Profiling[1], and our new hypothesis is that this test is sensitive enough to detect slightly slower composites due to us waiting on the OS X WindowServer (the WindowServer is the thing that's actually doing the work to make the vibrancy effect happen).

If that's the case, there's not a whole lot we can do to improve this.

If we're dead set on proving this case though, we need to use something like Instruments to profile the WindowServer on a machine that this regression is reproducible on. So far, I've had no luck reproducing it locally on my MacBook. I have a 10.6 Mac Mini on my desk that roughly resembles the hardware specs of the Talos testing machines[2]. I've ServiceNow'd a license for Mountain Lion, and will attempt to reproduce there.

[1]: https://air.mozilla.org/the-joy-of-profiling-episode-5
[2]: https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation
Flags: needinfo?(mconley)
Blocks: 1377284