Closed Bug 1519064 Opened 5 years ago Closed 5 years ago

5.26 - 10.97% tp5o_scroll / tscrollx (linux64-qr, windows10-64-qr) regression on push 95324d36ded5b08991bd78245823931430066298 (Wed Jan 9 2019)

Categories

(Core :: Graphics: WebRender, defect, P3)

Unspecified
All
defect

RESOLVED WONTFIX
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- disabled
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- fix-optional

People

(Reporter: igoldan, Unassigned)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=20bb124acab30f5e69717c45dd15e116518de504&tochange=95324d36ded5b08991bd78245823931430066298

As you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

11% tscrollx windows10-64-qr opt e10s stylo 0.92 -> 1.02
8% tscrollx linux64-qr opt e10s stylo 1.81 -> 1.95
6% tp5o_scroll windows10-64-qr opt e10s stylo 1.98 -> 2.11
6% tp5o_scroll linux64-qr opt e10s stylo 2.80 -> 2.96
5% tp5o_scroll linux64-qr opt e10s stylo 2.82 -> 2.97
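The percentages in the list above can be roughly reproduced from the before/after values. The following is only a sketch: `percent_change` and the `regressions` list are illustrative names, not part of Talos or Perfherder, and the inputs are the rounded values shown in this comment. The 5.26 - 10.97% range in the bug title suggests the alert itself was computed from unrounded data, so the recomputed figures may differ slightly from the reported ones.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


# (test, platform, before, after), copied from the regression list above.
# These are the rounded values as displayed, not the raw Talos data.
regressions = [
    ("tscrollx", "windows10-64-qr", 0.92, 1.02),
    ("tscrollx", "linux64-qr", 1.81, 1.95),
    ("tp5o_scroll", "windows10-64-qr", 1.98, 2.11),
    ("tp5o_scroll", "linux64-qr", 2.80, 2.96),
    ("tp5o_scroll", "linux64-qr", 2.82, 2.97),
]

for test, platform, before, after in regressions:
    change = percent_change(before, after)
    print(f"{test:12s} {platform:16s} {before:.2f} -> {after:.2f}  {change:+.2f}%")
```

For these tests a higher value is treated as worse, so a positive change is listed as a regression and a negative one as an improvement.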

You can find links to graphs and comparison views for each of the above tests at: http://localhost:5000/perf.html#/alerts?id=18654

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Component: General → Graphics: WebRender
Product: Testing → Core
Flags: needinfo?(kats)
Flags: needinfo?(gwatson)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #1)

> I'm afraid I cannot provide any kind of Gecko profiles. Both my attempts
> have resulted in errors [1] [2].

Anecdotally, if you enable picture caching and try to take a profile with the profiler, the browser hangs when you try to symbolicate/analyse the captured profile. This is on Win10 x64.

Dropping needinfo on me; I think Glenn is going to look into this. However, in the short term the plan is to turn picture caching back off soon, so that will also resolve this regression.

Flags: needinfo?(kats)

Yup, for now we intend to switch picture caching off until a couple of the known performance issues are resolved. Jeff, are you planning to make that change later today?

Flags: needinfo?(gwatson) → needinfo?(jmuizelaar)

Matt landed the backout on inbound last night, it should merge to m-c at some point today.

Flags: needinfo?(jmuizelaar)

(In reply to Kartikaya Gupta (away Jan 12-26; email:kats@mozilla.com) from comment #5)

> Matt landed the backout on inbound last night, it should merge to m-c at some point today.

All regressions from comment 0 have been canceled. This is just a subset of the alerts:

== Change summary for alert #18711 (as of Fri, 11 Jan 2019 00:53:22 GMT) ==

Improvements:

9% tscrollx windows10-64-qr opt e10s stylo 1.01 -> 0.92
6% tp5o_scroll windows10-64-qr opt e10s stylo 2.12 -> 2.00
5% tp5o_scroll linux64-qr opt e10s stylo 2.98 -> 2.82

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=18711

Priority: -- → P3

Seems like we still have perf regressions after relanding bug 1518405. tp5o_scroll got fixed, but the glterrain regression became evident. It looks like the glterrain problems were present even on the initial landing; they weren't detected automatically because of the noise.

== Change summary for alert #18767 (as of Wed, 16 Jan 2019 20:12:59 GMT) ==

Regressions:

11% tscrollx windows10-64-qr opt e10s stylo 0.92 -> 1.02
5% tscrollx linux64-qr opt e10s stylo 1.81 -> 1.90
4% glterrain windows10-64-qr opt e10s stylo 1.02 -> 1.06

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=18767

Flags: needinfo?(kats)
Flags: needinfo?(kats) → needinfo?(gwatson)

Yea, we landed with the knowledge that we'd have some regression here.

It's important to get these patches landed, since we need to get them stable and they fix many other WR-related issues we currently have.

Having said that, there are some easy optimizations we can make to this code which will hopefully remove these regressions. I will be working on these next week and will provide links here to the bugs as I work through the profiles.

Flags: needinfo?(gwatson)

It looks like the reland also caused this memory regression, according to AWSY. I'm not entirely sure about this, given the noisy results. Will do some extra retriggers so it's more obvious.

== Change summary for alert #18768 (as of Wed, 16 Jan 2019 19:12:29 GMT) ==

Regressions:

3% Heap Unclassified linux64-qr opt stylo 161,881,755.64 -> 166,022,512.29

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=18768
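For anyone wanting to double-check an alert summary line such as the one above, here is a minimal sketch of parsing it and recomputing the change. It assumes the line layout seen in this bug (`<pct>% <description> <before> -> <after>`, with commas as thousands separators); `AlertLine` and `parse_alert_line` are hypothetical helpers, not part of Perfherder or AWSY.

```python
import re
from typing import NamedTuple


class AlertLine(NamedTuple):
    reported_pct: int
    description: str
    before: float
    after: float


# A number with optional thousands separators and an optional fraction.
_NUMBER = r"[\d,]+(?:\.\d+)?"
_LINE_RE = re.compile(rf"^(\d+)%\s+(.+?)\s+({_NUMBER})\s+->\s+({_NUMBER})$")


def parse_alert_line(line: str) -> AlertLine:
    """Parse one '<pct>% <description> <before> -> <after>' summary line."""
    match = _LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised alert line: {line!r}")
    pct, description, before, after = match.groups()
    return AlertLine(
        int(pct),
        description,
        float(before.replace(",", "")),
        float(after.replace(",", "")),
    )


entry = parse_alert_line(
    "3% Heap Unclassified linux64-qr opt stylo 161,881,755.64 -> 166,022,512.29"
)
recomputed = (entry.after - entry.before) / entry.before * 100.0
print(f"{entry.description}: reported {entry.reported_pct}%, recomputed {recomputed:.2f}%")
```

The recomputed figure comes out a little below the reported 3%, which is consistent with the summary rounding to whole percent.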

:gw if you've filed some related bugs, please link them here.

Flags: needinfo?(gwatson)

Hi, I don't have anything specific to report for this - the push in question (enabling picture caching) resolves so many other performance bugs, that we'll just need to accept this as a regression on these tests, for now.

Once I finish up the current list of correctness bugs, I do intend to return to general CPU profiling and optimization, so there will likely be improvements coming to this in the future.

Including Matt on this bug, just for completeness.

Flags: needinfo?(gwatson)

:gw any updates on this? Have you had a chance to look into the general CPU profiling and optimization?

Flags: needinfo?(gwatson)

I'm still (!) working on picture caching functionality. We're finally getting to the point now where we should soon start seeing some significant performance wins (at least on real sites, if not on our talos tests) from all the behind the scenes work.

In particular, https://phabricator.services.mozilla.com/D47395 and https://phabricator.services.mozilla.com/D46567 should be landing in the next week or two, which will hopefully be very noticeable wins.

Once I do complete the picture caching functionality for OS compositor integration, I do still intend to do a general profiling and optimization pass, but it's fairly low priority compared to the feature work for now. I think we could probably mark this as WONTFIX, or similar - what do you think?

Flags: needinfo?(gwatson)

Yeah, that seems fine with me.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX