1469258 - 12.6 - 51.94% displaylist_mutate / glterrain (linux64-qr, windows10-64-qr) regression on push 7fbf8a9126e5 (Fri Jun 15 2018)

Reporter

Description

•

6 years ago

Talos has detected a Firefox performance regression from push: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c1d664fefdff80e066dde0140c22e8ff74fe8694&tochange=7fbf8a9126e54addb86d01ff6ec1e62e7fda18e5 As author of one of the patches included in that push, we need your help to address this regression. Regressions: 52% glterrain windows10-64-qr opt e10s stylo 1.41 -> 2.14 19% displaylist_mutate windows10-64-qr opt e10s stylo4,115.04 -> 4,911.79 13% displaylist_mutate linux64-qr opt e10s stylo 4,832.74 -> 5,441.80 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=13840 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running *** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling

Ionuț Goldan [:igoldan]

Reporter

Updated

•

6 years ago

Component: General → Graphics: WebRender

Product: Testing → Core

Ionuț Goldan [:igoldan]

Reporter

Comment 1

•

6 years ago

And here are the Gecko profiles: For Windows 10 Quantum Render: Before: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FEHVHvaCgQoeQz1n-47LfOQ%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_displaylist_mutate.zip After: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FAZh-2Z3rQE-7pUKEkbVwZg%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_displaylist_mutate.zip

Ionuț Goldan [:igoldan]

Reporter

Comment 2

•

6 years ago

The perf regressions were caused by this landing: https://bugzilla.mozilla.org/show_bug.cgi?id=1467096#c13 :kats What are the plans for these? Can we fix them now?

Flags: needinfo?(bugmail)

Ionuț Goldan [:igoldan]

Reporter

Comment 3

•

6 years ago

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #1) > And here are the Gecko profiles: > > For Windows 10 Quantum Render: > > Before: > https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster. > net%2Fv1%2Ftask%2FEHVHvaCgQoeQz1n- > 47LfOQ%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_displaylist_muta > te.zip > > After: > https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster. > net%2Fv1%2Ftask%2FAZh-2Z3rQE- > 7pUKEkbVwZg%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_displaylist > _mutate.zip These are for the displaylist_mutate perf test.

Ionuț Goldan [:igoldan]

Reporter

Comment 4

•

6 years ago

Also, these are the Gecko profiles for displaylist_mutate, from Linux 64 Quantum Render: Before: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FINfV0yriQLuse_7dj01QfA%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_displaylist_mutate.zip After: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FQ7pUiFe7QsOlHOUn_y1y2g%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_displaylist_mutate.zip

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 5

•

6 years ago

The regression was expected (see bug 1467096 comment 10) and the webrender issue at servo/webrender#2807 will help with this. I plan to work on that this week.

Blocks: 1426770, 1416645

Flags: needinfo?(bugmail)

Rares Doghi, Desktop QA

Updated

•

6 years ago

status-firefox60: --- → unaffected

status-firefox61: --- → unaffected

status-firefox62: --- → affected

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 6

•

6 years ago

Quick update: I wrote a patch for servo/webrender#2807 and it helped with displaylist_mutate but regressed tresize. I'm going to investigate that a bit more.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 7

•

6 years ago

I looked into the tresize regression and found a the root of the problem. tresize doesn't run in ASAP mode, so the thing it's measuring (average time between window.resize call to the following MozAfterPaint) is affected in large part by the amount of work that happens between the vsync that triggers composition, and the end of the composition. If we spend extra time doing stuff *before* that vsync it doesn't really factor into the end result, except if it causes a whole frame delay. So, currently the sequence of events is approximately this: (a) window.resize (b) DL build (c) WR scene build (d) schedule composite (e) WR render (for rebuilding hit-testing info) (f) (wait for vsync) (g) WR render (triggered via GenerateFrame transaction in WRBP::CompositeToTarget) (h) composite and fire MozAfterPaint With my patches, step (e) changes so that instead of doing a full render we just rebuild the hit-testing info. I verified that this does take an order of magnitude less time at step (e). However, it also *increases* the time spent in step (g), presumably because that render has to do more work and can't just reuse the results from (e). So while the amount of work done *before* the vsync is reduced, that doesn't affect the end result, but the small increase in work done *after* the vsync has a disproportionate impact on the test result. I verified that turning on ASAP mode for this test locally removes the regression, but I'll do a try push to confirm that since I don't know how far I can trust my local results.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 8

•

6 years ago

On tryserver there's still a small regression with the test in ASAP mode when comparing with my patches [1] and without [2]. Not sure why yet. [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=52fc13269c5d050d325526f649590d063957eaaf [2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=57055a302cb756ff5935ae80b7a6d46a2c5ede72

Aria Beingessner [:Gankra]

Comment 9

•

6 years ago

assuming this is high priority since kats is working on it

Assignee: nobody → bugmail

Blocks: stage-wr-nightly

Priority: -- → P1

Whiteboard: [gfx-noted]

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

6 years ago

Depends on: 1471962

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 10

•

6 years ago

It's not that high.

Blocks: stage-wr-trains
No longer blocks: stage-wr-nightly

Priority: P1 → P2

Mike Taylor [:miketaylr]

Comment 11

•

6 years ago

(flipping bits, on the assumption this WR nightly-only for the time being)

status-firefox62: affected → disabled

status-firefox63: --- → affected

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 12

•

6 years ago

I rebased my patches for servo/webrender#2807 and kicked off another try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca9520ec0b1ea14f97f7260d58ff5ccfd7860200. I'll see if there's still talos regressions from this.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 13

•

6 years ago

It still hurts tresize on windows.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

6 years ago

Depends on: 1479075

Maire Reavy [:mreavy]

Comment 14

•

6 years ago

Kat is on leave for 3 more weeks. Un-assigning so we know it needs a new owner.

Assignee: kats → nobody

Maire Reavy [:mreavy]

Updated

•

6 years ago

Priority: P2 → P3

Maire Reavy [:mreavy]

Updated

•

6 years ago

Priority: P3 → P2

Jeff Muizelaar [:jrmuizel]

Comment 15

•

6 years ago

Kats, is there anything we want to still do with this bug?

Flags: needinfo?(kats)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 16

•

6 years ago

Not raelly, I think we can close it. We have existing bugs tracking the talos regressions compared to non-WR. And the patches I was working on earlier were obsoleted by stuff nical landed which accomplished the same thing, I believe.

Status: NEW → RESOLVED

Closed: 6 years ago

Flags: needinfo?(kats)

Resolution: --- → WONTFIX

Bugzilla

12.6 - 51.94% displaylist_mutate / glterrain (linux64-qr, windows10-64-qr) regression on push 7fbf8a9126e5 (Fri Jun 15 2018)

Categories

(Core :: Graphics: WebRender, defect, P2)

Tracking

()

People

(Reporter: igoldan, Unassigned)

References

Details

(Keywords: perf, regression, talos-regression, Whiteboard: [gfx-noted])

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Updated

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Comment 10

Comment 11

Comment 12

Comment 13

Updated

Comment 14

Updated

Updated

Comment 15

Comment 16