Closed Bug 1483610 Opened 6 years ago Closed 6 years ago

70.62 - 92.53% displaylist_mutate (linux64-qr, windows10-64-qr) regression on push 3a0e1fb203fad7e435ab65e1b78a5ceb5998bdc0 (Tue Aug 14 2018)

Categories

(Core :: Graphics: WebRender, defect, P2)

Tracking

RESOLVED FIXED
mozilla64
Tracking Status
firefox-esr60 --- unaffected
firefox62 --- unaffected
firefox63 --- disabled
firefox64 --- fixed

People

(Reporter: jmaher, Assigned: jrmuizel)

References

(Depends on 1 open bug)

Details

(Keywords: perf, regression, talos-regression, Whiteboard: [gfx-noted])

Attachments

(3 files)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=3a0e1fb203fad7e435ab65e1b78a5ceb5998bdc0

Since you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 93%  displaylist_mutate windows10-64-qr opt e10s stylo     4,151.73 -> 7,993.22
 71%  displaylist_mutate linux64-qr opt e10s stylo          4,849.90 -> 8,274.71

Improvements:

  6%  tp5o_scroll linux64-qr opt e10s stylo     0.52 -> 0.49


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=14984

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running
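
For reference, this is roughly how the regressing test can be run against a local build from a Firefox checkout. It is a hedged sketch, assuming the mach talos-test command and its --activeTests flag accept the test name as shown (check ./mach talos-test --help), and that matching the -qr jobs additionally requires WebRender to be enabled in the test profile (e.g. the gfx.webrender.all pref):

  # Build Firefox, then run only the displaylist_mutate Talos test locally.
  ./mach build
  ./mach talos-test --activeTests displaylist_mutate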

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
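
If a fix needs verifying, a before/after comparison on try is another option. This is a hedged sketch, assuming ./mach try fuzzy with its --query and --rebuild options; the query string is only an example, pick the Talos QR job that runs displaylist_mutate:

  # Push to try and retrigger the Talos job several times so Perfherder
  # has enough data points for a stable comparison against the baseline.
  ./mach try fuzzy --query "talos qr" --rebuild 5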
:jrmuizel, I see you landed the code in bug 1481570; it has caused a regression for displaylist_mutate (the test is also very noisy now instead of very stable). Can you look at the regression to see if there is a fix we can make, or help decide whether we need to back out or accept this regression?
Component: General → Graphics: WebRender
Flags: needinfo?(jmuizelaar)
Product: Testing → Core
The MotionMark score became much worse on the latest nightly. It might also be related to this bug.
I am going to check which change caused the regression.
I confirmed that Bug 1481570 also regressed HTML5 Fish Bowl on my P50 (Win10). Before Bug 1481570, the WR profiler showed 25-30 fps, but since Bug 1481570 it shows 7-15 fps. It might be a simpler test case for reproducing the regression.

  https://testdrive-archive.azurewebsites.net/performance/fishbowl/

I used the following command to confirm it.

>  mozregression --good 2018-08-12 --pref gfx.webrender.all:true gfx.webrender.debug.compact-profiler:true gfx.webrender.debug.profiler:true -a https://testdrive-archive.azurewebsites.net/performance/fishbowl/


mozregression showed the following result.

----------------------------------------------

 8:17.28 INFO: No more inbound revisions, bisection finished.
 8:17.28 INFO: Last good revision: 08bf805f6f0ef61f68686ef1ca2cc6f750a2cfa0
 8:17.28 INFO: First bad revision: 7846bdd3762cf494ec24efc9cfc3472dc715ce4f
 8:17.28 INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=08bf805f6f0ef61f68686ef1ca2cc6f750a2cfa0&tochange=7846bdd3762cf494ec24efc9cfc3472dc715ce4f
In the HTML5 Fish Bowl case, the GPU usage shown in Windows Task Manager seems to have increased from around 80% to around 95%.
Capture of WebRender profiler at current nightly
Comparing attachment 9001871 [details] with attachment 9001870 [details], the C_CLIP task time seems to have increased since the Bug 1481570 fix.
(In reply to Sotaro Ikeda [:sotaro] from comment #4)
> By using
> https://treeherder.mozilla.org/#/jobs?repo=try&author=kgupta@mozilla.com the
> regression seems to happen within
> https://github.com/servo/webrender/compare/e4750616750f20fcbb278df0e324ec26aebd0a3c...c68118517dfcb81090139ea9acaff4f6c8b26431

Within the above changes, the changes by :gw seem like the culprit.
:gw, can you comment on this bug?
Flags: needinfo?(gwatson)
See Also: → 1484027
Blocks: 1474583
I suspected we might see a few regressions like this when that patch landed. What's probably happening is that there is a case, no longer handled, where we should be removing a redundant clip mask. I will investigate these test cases tomorrow, thanks for looking into it!
Flags: needinfo?(gwatson)
Assignee: nobody → gwatson
Whiteboard: [gfx-noted]
I have a WIP patch that resolves the performance regression (on fishbowl, at least).

Unfortunately it breaks a couple of reftests on try, so I need to investigate and fix those before the patch is ready for review.
The patch linked to above fixes the regression on the fishbowl performance test. I expect it will fix the other tests too (and perhaps improve over the original performance), although I haven't confirmed that yet.
(In reply to Glenn Watson [:gw] from comment #14)
> The patch linked to above fixes the regression on the fishbowl performance
> test. I expect it will fix the other tests too (and perhaps improve over the
> original performance), although I haven't confirmed that yet.

I confirmed that the MotionMark regression was addressed by using the following command. Bug 1481570 dropped the score to 7-10 on my P50 (Win10); with the fix, the score went back up to 100-130.

> mozregression --repo try --launch a9d93837c673748ce01d0f45ccfc150d5f4fb036 -B release --pref gfx.webrender.all:true -a https://browserbench.org/MotionMark/
Depends on: 1485791
Flags: needinfo?(jmuizelaar)
Priority: -- → P2
No longer blocks: 1474583
Sotaro, is this fixed based on your comment in #15? Or is there still work to be done here?
Flags: needinfo?(sotaro.ikeda.g)
The perf problems (heavy clipping tasks) in MotionMark and fishbowl were addressed, but displaylist_mutate seems to be a different problem :(

The following graph shows that displaylist_mutate is still bad.

https://treeherder.mozilla.org/perf.html#/graphs?timerange=2592000&series=autoland,1663497,1,1&series=autoland,1683784,1,1&series=autoland,1663687,1,1&series=autoland,1683809,1,1
Flags: needinfo?(sotaro.ikeda.g)
Priority: P2 → P1
attachment 9005100 [details] could be used to check the regression. With the following mozregression commands, I confirmed that the fps regressed: [1] was 24-25 fps and [2] was 39-40 fps on my P50 (Win10).

[1] latest webrender build

mozregression --repo try --launch 796b97711e9db6ee09dfe9d67d4baaf7bd3277b4 -B release --pref gfx.webrender.all:true gfx.webrender.debug.compact-profiler:true gfx.webrender.debug.profiler:true

[2] build before regression

mozregression --repo try --launch e895882944543b1d12bcfb940552af76a1569ef9 -B release --pref gfx.webrender.all:true gfx.webrender.debug.compact-profiler:true gfx.webrender.debug.profiler:true
In the WebRender profiler, CPU(backend) seemed to become busier. I checked it with the following command.

mozregression  --good 2018-08-10 --pref gfx.webrender.all:true gfx.webrender.debug.profiler:true gfx.webrender.debug.gpu-sample-queries:true gfx.webrender.debug.gpu-time-queries:true -a https://bug1483610.bmoattachments.org/attachment.cgi?id=9005100

Before the regression, the mean CPU(backend) time was 6-7 ms. But since the regression, it has been 8-10 ms.
The latest profile for this looks a lot like bug 1487864. We're painting frames in pairs, and no threads are being maxed out.
Depends on: frame-scheduling
We need to fix this before WR goes to the field, but this shouldn't block WR riding to beta.
Priority: P1 → P2
Assignee: gwatson → nobody
Blocks: 1490788
Part of the regression might be addressed by https://github.com/servo/webrender/pull/3117.
Depends on: 1494042
After bug 1494042, it seems these scores are back to normal. :jrmuizel, do you want to resolve this?
Flags: needinfo?(jmuizelaar)
Sure.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(jmuizelaar)
Resolution: --- → FIXED
Assignee: nobody → jmuizelaar
Target Milestone: --- → mozilla64