Closed Bug 1465784 Opened Last year Closed Last year

2.1 - 136.89% displaylist_mutate / glterrain / sessionrestore_many_windows / tart / tp5o_scroll / tresize / tscrollx / tsvgx (linux64-qr, windows10-64-qr) regression on push ede2ec96fe1e (Wed May 30 2018)

Categories

(Core :: Graphics: WebRender, defect)

Other Branch
defect
Not set

Tracking

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox60 --- unaffected
firefox61 --- unaffected
firefox62 --- fixed

People

(Reporter: igoldan, Assigned: kats)

References

(Blocks 4 open bugs)

Details

(Keywords: perf, regression, talos-regression)

Attachments

(1 file)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=5df5e745ce6e0d76568c91e34ee57cf4594988e9&tochange=ede2ec96fe1ed2b8bf418027bde42bdd23c69505

You are the author of one of the patches included in that push, and we need your help to address this regression.

Regressions:

137%  glterrain windows10-64-qr opt e10s stylo     0.91 -> 2.16
 46%  tresize windows10-64-qr opt e10s stylo       9.23 -> 13.43
 45%  displaylist_mutate linux64-qr opt e10s stylo 3,735.70 -> 5,413.45
 32%  tresize linux64-qr opt e10s stylo            13.48 -> 17.76
 15%  tsvgx linux64-qr opt e10s stylo              385.84 -> 442.26
  8%  tsvgx windows10-64-qr opt e10s stylo         448.66 -> 484.99
  4%  tp5o_scroll linux64-qr opt e10s stylo        0.52 -> 0.54
  4%  tart linux64-qr opt e10s stylo               1.61 -> 1.67
  3%  tart windows10-64-qr opt e10s stylo          1.53 -> 1.57
  3%  glterrain linux64-qr opt e10s stylo          6.03 -> 6.21
  3%  tscrollx linux64-qr opt e10s stylo           0.41 -> 0.42
  2%  sessionrestore_many_windows windows10-64-qr opt e10s stylo 3,459.38 -> 3,532.12

Improvements:

 70%  ts_paint_heavy windows10-64-qr opt e10s stylo              1,167.58 -> 353.83
 70%  ts_paint windows10-64-qr opt e10s stylo                    1,164.67 -> 353.42
 70%  ts_paint_webext windows10-64-qr opt e10s stylo             1,177.08 -> 358.00
 58%  tp5o responsiveness windows10-64-qr opt e10s stylo         1.25 -> 0.52
 46%  tpaint windows10-64-qr opt e10s stylo                      434.33 -> 236.55
 42%  tp5o_webext responsiveness windows10-64-qr opt e10s stylo  1.82 -> 1.06
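
For reference, the percentages in the two lists above appear to be the relative change of the "after" value against the "before" value. The sketch below shows that arithmetic; the function name is made up for illustration and this is not the actual Perfherder code. Because the listed before/after values are already rounded, recomputing from them can differ slightly from the alert's own figure (for example the 136.89% in the bug title).

```python
def percent_change(before: float, after: float) -> float:
    """Relative change of `after` against `before`, as a positive percentage.

    Hypothetical helper for illustration only; not taken from Perfherder.
    """
    if before == 0:
        raise ValueError("before must be non-zero")
    return abs(after - before) / before * 100.0


# Values copied from the lists above (before -> after).
glterrain_win = percent_change(0.91, 2.16)        # listed as a 137% regression
ts_paint_win = percent_change(1164.67, 353.42)    # listed as a 70% improvement
print(round(glterrain_win), round(ts_paint_win))
```

Whether a change counts as a regression or an improvement depends on the test: for the tests listed here, lower values are better.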


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=13563

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: General → Graphics: WebRender
Product: Testing → Core
Flags: needinfo?(bugmail)
I'll look into this, thanks. Some regression is expected because of the higher latency to getting frames to the screen. I'll make sure there's nothing unexpected going on.
Assignee: nobody → bugmail
Flags: needinfo?(bugmail)
I can reproduce the displaylist_mutate perf regression on linux (first one I tried). I'll dig into it to see if there's anything we can improve here.
Unfortunately the profiles don't contain any of the interesting webrender threads. However, I'm making some progress with my local setup (just loading the displaylist_mutate page standalone, without ASAP mode). Without async scene building we were doing one rAF call per vsync, whereas now a rAF iteration sometimes takes two vsyncs. It's not yet apparent to me why that's happening, so I'm still investigating.
At least part of the problem is that we're still doing an unnecessary ScheduleGenerateFrame() call at [1]. With async-scene-building enabled we shouldn't need this, because the post-swap hook will call ScheduleGenerateFrame instead, at [2].

Even with that fixed, I'm seeing cases where a vsync triggers the rAF, which sends a display list to the compositor, and we start building that scene on the scene builder thread, but the build doesn't complete before the next vsync. When this happens, things get bunched up and we don't do one iteration per vsync, which results in slowness.

However, this might get washed away in ASAP mode, so I'll try and see if removing the unnecessary ScheduleGenerateFrame() is sufficient to fix the problem.

[1] https://searchfox.org/mozilla-central/rev/3737701cfab93ccea04c0e9cab211ad10f931d87/gfx/layers/wr/WebRenderBridgeParent.cpp#680
[2] https://searchfox.org/mozilla-central/rev/3737701cfab93ccea04c0e9cab211ad10f931d87/gfx/webrender_bindings/src/bindings.rs#731
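
To make the bunching effect concrete, here is a toy model of it. It is only a sketch under simplifying assumptions: a fixed 60 Hz vsync, one scene build per rAF iteration, and the next iteration starting only on the first vsync after the build finishes. None of the names below exist in Gecko or WebRender.

```python
import math

VSYNC_MS = 1000.0 / 60.0  # assumed 60 Hz display


def vsyncs_per_iteration(scene_build_ms: float, vsync_ms: float = VSYNC_MS) -> int:
    """Number of vsync intervals one rAF iteration spans in this toy model.

    If the scene build finishes within one interval, the iteration takes one
    vsync; if it overruns, the iteration slips to a later vsync.
    """
    return max(1, math.ceil(scene_build_ms / vsync_ms))


def iterations_per_second(scene_build_ms: float, vsync_ms: float = VSYNC_MS) -> float:
    """Sustained rAF iterations per second under the same assumptions."""
    return 1000.0 / (vsyncs_per_iteration(scene_build_ms, vsync_ms) * vsync_ms)


# A 10 ms build fits inside one vsync; a 20 ms build slips to the second one,
# which halves the iteration rate in this model.
for build_ms in (10.0, 20.0):
    print(build_ms, vsyncs_per_iteration(build_ms), iterations_per_second(build_ms))
```

In this model a scene build that only slightly overruns the vsync interval costs a whole extra vsync, which is consistent with the "sometimes takes two vsyncs" behaviour described earlier.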
There's another change coming in bug 1466549 which will also help fix some of the perf regressions as it shuffles around when the rendering steps happen.
Depends on: 1466549
Comment on attachment 8983935 [details]
Bug 1465784 - Remove unnecessary render step with async-scene-building.

https://reviewboard.mozilla.org/r/249790/#review256034

Looks good!
Attachment #8983935 - Flags: review?(sotaro.ikeda.g) → review+
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/553ca5e06047
Remove unnecessary render step with async-scene-building. r=sotaro
https://hg.mozilla.org/mozilla-central/rev/553ca5e06047
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Let's see how things stand now on the tests from comment 0:

> Regressions:
> 
> 137%  glterrain windows10-64-qr opt e10s stylo     0.91 -> 2.16

This has improved a bit and is now at around 1.40. This is comparable to the windows10-64 (non-QR) numbers, so I think it is acceptable.

> 46%  tresize windows10-64-qr opt e10s stylo       9.23 -> 13.43

This is back down to the original levels, so regression fixed.

> 45%  displaylist_mutate linux64-qr opt e10s stylo 3,735.70 -> 5,413.45

This is now around 4800. So it improved a bit, but it's still worse than non-QR, so there's still more to do here.

> 32%  tresize linux64-qr opt e10s stylo            13.48 -> 17.76

This got better with bug 1466549 and then worse again with the patch on this bug. Currently around 17.50, worse than non-QR, so still more to do here.

> 15%  tsvgx linux64-qr opt e10s stylo              385.84 -> 442.26

This is now at around 349, so better than it was originally. i.e. regression fixed. It's still worse than non-QR but we're tracking that in bug 1416652.

>  8%  tsvgx windows10-64-qr opt e10s stylo         448.66 -> 484.99

This is around 427, also better than it was originally. regression fixed.

>  4%  tp5o_scroll linux64-qr opt e10s stylo        0.52 -> 0.54

It's not clear to me that this was an actual regression. It looks like the "regression" was just removing a perf improvement that arrived a few days ago (alert #13494). But there's been no change since the "regression". I'm going to ignore this one.

>  4%  tart linux64-qr opt e10s stylo               1.61 -> 1.67

This is back down to 1.62 or so, around where it was originally. regression fixed. Also this is doing much better than linux64 (non-QR) so we can accept any minor regressions here.

>  3%  tart windows10-64-qr opt e10s stylo          1.53 -> 1.57

This is back down to 1.53 or so, where it was originally. regression fixed. Again, doing much better than windows10-64 (non-QR).

>  3%  glterrain linux64-qr opt e10s stylo          6.03 -> 6.21

This one is still around 6.21, so the regression is still present. Needs more work.

>  3%  tscrollx linux64-qr opt e10s stylo           0.41 -> 0.42

This one is back down to the original levels, regression fixed. Also much better than linux64 (non-QR).

>  2%  sessionrestore_many_windows windows10-64-qr opt e10s stylo 3,459.38 -> 3,532.12

This was a minor regression to begin with, but there was a big improvement on Jun 01. The current value is around 3000 which is still worse than non-QR. Let's track improvements to this with bug 1467190 which we have on file already.

> Improvements:
> 
> 70%  ts_paint_heavy windows10-64-qr opt e10s stylo              1,167.58 -> 353.83
> 70%  ts_paint windows10-64-qr opt e10s stylo                    1,164.67 -> 353.42
> 70%  ts_paint_webext windows10-64-qr opt e10s stylo             1,177.08 -> 358.00
> 58%  tp5o responsiveness windows10-64-qr opt e10s stylo         1.25 -> 0.52
> 46%  tpaint windows10-64-qr opt e10s stylo                      434.33 -> 236.55
> 42%  tp5o_webext responsiveness windows10-64-qr opt e10s stylo  1.82 -> 1.06

These improvements are all still there.

I'll clone this bug into a new bug to deal with the "still needs work" issues above.
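
The triage above can be approximated mechanically. The sketch below is a rough illustration only: the 2% tolerance and the three labels are assumptions made for this example and are not how Perfherder classifies alerts. It assumes lower values are better, as is the case for the regressed tests listed here.

```python
def classify(baseline: float, regressed: float, current: float,
             tolerance: float = 0.02) -> str:
    """Rough status of a lower-is-better test after a fix attempt.

    baseline  -- value before the regressing push
    regressed -- value right after the regressing push
    current   -- value now
    tolerance -- assumed noise band (2% here, chosen arbitrarily)
    """
    if current <= baseline * (1 + tolerance):
        return "fixed"
    if current < regressed * (1 - tolerance):
        return "improved"
    return "still regressed"


# (baseline, regressed, current) taken from the numbers quoted above.
results = {
    "tresize windows10-64-qr": classify(9.23, 13.43, 9.23),
    "displaylist_mutate linux64-qr": classify(3735.70, 5413.45, 4800.0),
    "tresize linux64-qr": classify(13.48, 17.76, 17.50),
    "glterrain linux64-qr": classify(6.03, 6.21, 6.21),
}
for test, status in results.items():
    print(f"{test}: {status}")
```

This ignores the comparison against non-QR numbers, which is what decided whether a partially improved result was acceptable in the notes above.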
Here are some of the perf results after the push from comment 10:

== Change summary for alert #13686 (as of Wed, 06 Jun 2018 22:08:56 GMT) ==

Improvements:

 16%  tsvgx linux64-qr opt e10s stylo                    415.82 -> 350.35
 10%  displaylist_mutate linux64-qr opt e10s stylo       5,361.90 -> 4,852.24
  8%  displaylist_mutate windows10-64-qr opt e10s stylo  4,531.50 -> 4,184.74
  7%  tsvgx windows10-64-qr opt e10s stylo               460.57 -> 427.18
  3%  tart windows10-64-qr opt e10s stylo                1.58 -> 1.54
  2%  tart linux64-qr opt e10s stylo                     1.66 -> 1.62

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=13686