Bug 1500101 (Closed) - Opened 6 years ago, Closed 6 years ago

2.65 - 5.88% sessionrestore / sessionrestore_no_auto_restore / tart / tp5o_webext responsiveness / ts_paint / ts_paint_webext (linux64-qr, windows10-64-qr) regression on push a1f350ca24173934b0633bfff039d672d0187bbb (Wed Oct 17 2018)

Categories

(Core :: Graphics: WebRender, defect, P2)

64 Branch, x86_64, All

Tracking

Status: VERIFIED FIXED
Target Milestone: mozilla65
Tracking Status
firefox-esr60 --- unaffected
firefox63 --- unaffected
firefox64 --- wontfix
firefox65 + fixed

People

(Reporter: igoldan, Assigned: gw)

References

Details

(Keywords: perf, regression, talos-regression)

Attachments

(1 file)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=a1f350ca24173934b0633bfff039d672d0187bbb

Since you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  6%  tart windows10-64-qr opt e10s stylo                          3.85 -> 4.07
  6%  tp5o_webext responsiveness linux64-qr opt e10s stylo         1.76 -> 1.85
  5%  sessionrestore linux64-qr opt e10s stylo                     841.50 -> 881.00
  5%  sessionrestore_no_auto_restore linux64-qr opt e10s stylo     860.17 -> 899.00
  4%  ts_paint linux64-qr opt e10s stylo                           857.25 -> 895.17
  3%  ts_paint_webext linux64-qr opt e10s stylo                    867.92 -> 890.92


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=16918

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: General → Graphics: WebRender
Product: Testing → Core
Flags: needinfo?(jmuizelaar)
I took a look at some of the profiles.

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #1)
> Here are the Gecko profiles on Linux 64bit QR:
> 
> 
> for tp5o_webext:

I didn't look at the tp5o_webext profiles: they are split up per page, but the Talos regression report doesn't break the results down by page, so there's no way to tell which pages actually got slower (and therefore which page profiles to compare).

> for ts_paint_webext:

before: https://perfht.ml/2RYomuj
after: https://perfht.ml/2RYosSH

_init in i965_dri.so has an increase in self time from 230ms to 265ms. The FlushRendering call on the main thread blocks for 40ms longer than before.

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #3)
> Also, these are the Gecko profiles for tart, on Windows 10 QR:

before: https://perfht.ml/2RXhmOh
after: https://perfht.ml/2RWMgWV

Composite times increase on average from 1.3ms to 1.5ms, roughly.
Here's the range of webrender commits:

https://github.com/servo/webrender/compare/98d507003c07c003ef0e0297dc4d29ee896a5868...a0a36d9b416ca3295f8def384814ffef60903a60

I guess we need to bisect the regression across these changes.
Flags: needinfo?(jmuizelaar)
Kats, do you want to do the bisection?
Flags: needinfo?(kats)
It's at least plausible that it could be related to the shared depth targets patch [1] - either because creating the extra set of FBOs is more expensive than we thought, or because the sharing causes contention somehow. But it'd be somewhat surprising.

[1] https://github.com/servo/webrender/commit/f2c94a6ff1a6e62421992e84bf8e64391631ddf8
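For context, "shared depth targets" here means one depth renderbuffer attached to several FBOs, instead of each render target owning its own depth buffer. Below is a minimal sketch of that pattern, using the raw `gl` crate bindings rather than WebRender's actual device code, and assuming an already-current GL context and pre-created color textures (the function name is made up for illustration):

```rust
use gl::types::GLuint;

// Create one shared depth renderbuffer and attach it to two FBOs, so both
// render targets reuse the same depth storage. The hypothesis above is that
// either the extra FBO setup or contention on the shared attachment could
// cost more than expected on some drivers.
unsafe fn create_fbos_with_shared_depth(color_tex: [GLuint; 2], w: i32, h: i32) -> [GLuint; 2] {
    let mut depth_rb: GLuint = 0;
    gl::GenRenderbuffers(1, &mut depth_rb);
    gl::BindRenderbuffer(gl::RENDERBUFFER, depth_rb);
    gl::RenderbufferStorage(gl::RENDERBUFFER, gl::DEPTH_COMPONENT24, w, h);

    let mut fbos: [GLuint; 2] = [0; 2];
    gl::GenFramebuffers(2, fbos.as_mut_ptr());
    for (i, &fbo) in fbos.iter().enumerate() {
        gl::BindFramebuffer(gl::FRAMEBUFFER, fbo);
        gl::FramebufferTexture2D(gl::FRAMEBUFFER, gl::COLOR_ATTACHMENT0,
                                 gl::TEXTURE_2D, color_tex[i], 0);
        // The same renderbuffer is attached to every FBO: this is the sharing.
        gl::FramebufferRenderbuffer(gl::FRAMEBUFFER, gl::DEPTH_ATTACHMENT,
                                    gl::RENDERBUFFER, depth_rb);
    }
    fbos
}
```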
Yup I can do the bisection.
Flags: needinfo?(kats)
  6%  tp5o_webext responsiveness linux64-qr opt e10s stylo         1.76 -> 1.85
  5%  sessionrestore linux64-qr opt e10s stylo                     841.50 -> 881.00
  5%  sessionrestore_no_auto_restore linux64-qr opt e10s stylo     860.17 -> 899.00

These at least seem to be coming from servo/webrender#3208.

Still waiting on the Windows results.
  6%  tart windows10-64-qr opt e10s stylo                          3.85 -> 4.07

Also from servo/webrender#3208. Glenn, any thoughts?
Flags: needinfo?(gwatson)
Yep, that patch is likely to make sites with a lot of borders a bit slower, for now. It can result in more work being done per border - this is unlikely to be a problem on real-world sites, but I can imagine it showing up if those tests have a lot of borders. Is that the case here?

I expect this regression to go away once we take advantage of interning border primitives. I'm hoping to have that done in the next week or so; the change referenced above is a prerequisite for making that happen, which is why I wanted to land it incrementally.

Are we happy to live with that regression for a couple of weeks and revisit once the border interning lands?
Flags: needinfo?(gwatson)
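To illustrate what "opaque border segments" means here: a segment whose brush color is fully opaque can be drawn in the opaque pass with blending disabled (and benefit from z-rejection), while translucent segments still go through the blended alpha pass. This is a simplified sketch, not WebRender's actual batching code; `BorderSegment` and `is_opaque` are made-up names for illustration.

```rust
// Hypothetical, simplified segment type; WebRender's real structures differ.
struct BorderSegment {
    color: [f32; 4], // RGBA
}

fn is_opaque(seg: &BorderSegment) -> bool {
    seg.color[3] >= 1.0
}

fn main() {
    let segments = vec![
        BorderSegment { color: [0.2, 0.2, 0.2, 1.0] }, // solid edge
        BorderSegment { color: [0.2, 0.2, 0.2, 0.5] }, // translucent edge
    ];

    // Split segments between the opaque and alpha batch lists.
    let (opaque, alpha): (Vec<_>, Vec<_>) =
        segments.iter().partition(|s| is_opaque(s));

    println!("opaque pass: {} segment(s), alpha pass: {} segment(s)",
             opaque.len(), alpha.len());
}
```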
(In reply to Glenn Watson [:gw] from comment #12)
> Are we happy to live with that regression for a couple weeks and revisit
> once the border interning lands?

I am, assuming you're willing to take an action item to verify that we get a corresponding decrease in this metric once your followup bits land.
Assignee: nobody → gwatson
Both the initialization time regression and the compositing time regression affect a browser window that is mostly empty, so it's not a "site with a lot of borders" - it's the browser UI that's affected.

And the additional 30ms in browser initialization seems unexpected to me.
Are we certain it is https://github.com/servo/webrender/issues/3208?

I misread the original patch and thought it was the patch that introduced multiple render tasks per border segment. In fact, it's a patch that should strictly be an optimization (adding more border pixels to the opaque pass). Mea culpa.

So, on closer inspection, that patch seems unlikely to cause this regression.
Flags: needinfo?(kats)
The try pushes in comment 9 are based on the same inbound revision that bug 1499494 landed on (i.e. inbound revision 16ee6006e57c). On that base I then applied a WR update for each merged PR in the range and pushed to try. The Talos numbers can be seen in the try pushes; it seems clear that all the numbers increase on the last (topmost) try push, which corresponds to PR 3208. I'm fairly certain I did it correctly, but please feel free to double-check.
Flags: needinfo?(kats)
Yep, your analysis seems correct to me. I'll do some investigation today to see if I can find out what's going on.
Assuming that the patch itself isn't completely broken (I just double-checked myself, but perhaps someone could do a sanity check to make sure I didn't break all opacity calculations!), I wonder if it might be a driver thing where the border shader gets rebuilt / re-linked, as it's probably now used both with blend enabled in the transparent pass and blend disabled in the opaque pass.

Matt, Dan, is there any easy way to check if this regression is related to driver time?
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(dglastonbury)
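For reference, the state pattern :gw describes looks roughly like the sketch below: the same program object is drawn once with blending disabled (opaque pass) and once with blending enabled (alpha pass). This is a hedged illustration using raw `gl` crate bindings rather than WebRender's renderer code; some drivers can re-specialize or re-link a program internally the first time it runs under a new fixed-function state combination, which would show up as driver time rather than in our own code.

```rust
// Sketch: one program used in both passes. If the driver specializes compiled
// programs on blend state, the first draw after toggling GL_BLEND may trigger
// extra work inside the driver. Assumes an already-current GL context and
// bound vertex data.
unsafe fn draw_borders(program: gl::types::GLuint,
                       opaque_vertex_count: i32,
                       alpha_vertex_count: i32) {
    gl::UseProgram(program);

    // Opaque pass: blending off, depth test on (front-to-back, z-rejection).
    gl::Disable(gl::BLEND);
    gl::Enable(gl::DEPTH_TEST);
    gl::DrawArrays(gl::TRIANGLES, 0, opaque_vertex_count);

    // Alpha pass: same program, blending now enabled (premultiplied alpha).
    gl::Enable(gl::BLEND);
    gl::BlendFunc(gl::ONE, gl::ONE_MINUS_SRC_ALPHA);
    gl::DrawArrays(gl::TRIANGLES, 0, alpha_vertex_count);
}
```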
(In reply to Glenn Watson [:gw] from comment #18)
> Assuming that the patch itself isn't completely broken (I just double
> checked myself, but perhaps someone could do a sanity check to make sure I
> didn't break all opacity calculations!), I wonder if it might be a driver
> thing where the border shader gets rebuilt / re-linked as it's probably now
> used both with blend enabled in transparent pass, and blend disabled in the
> opaque pass.
> 
> Matt, Dan, is there any easy way to check if this regression is related to
> driver time?

The callstacks in the Linux profiles are a bit broken, but it looks like the first composite time went up by ~60ms. That's mostly time in _init in i965_dri.so; it's hard to know exactly what that time is being spent on, though.

The TART profiles above just seem to contain perf-html for me. The regressions do affect Windows, though, and TART explicitly discards results from the first few frames, only measuring frame times once the animation is running.

That suggests that there's an actual throughput regression, as well as the startup one.
Flags: needinfo?(matt.woodrow)
I'll run the test on my Windows box to see what I can see.
Flags: needinfo?(dglastonbury)
I did some experiments / sanity checks on talos:

(1) WR git:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5be6e54ed2c6fadc9fcdcfec07a85a6a1a6c6e66
 -> slow (as expected)

(2) WR git + revert the opaque border patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=11dab5a3e0b5745fb6b661f73ae33f026ce2fd65
 -> fast (as expected)

(3) WR git + disable one line in the patch that allows opaque border segments:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1d2a5c28d051f6d96f92b12da5a50a7ac857bd61
 -> slow

---

This suggests that there is nothing wrong with the patch per se (or at least, it's doing what's intended), and that the regression is occurring due to the change in drawing some border segments as opaque and some as transparent.

I would not be too surprised if drivers end up having to do something internally to handle the shader being used both with and without blending, but it's surprising to me that this seems to actually affect compositor time. It can introduce a small number of extra batches, but I'd expect that cost to be small, and outweighed by the GPU time gains from drawing borders as opaque. Certainly, in the synthetic test cases I came up with, this is indeed a GPU time win.

I don't know much about talos - is it possible to run some subset of those tests locally, so I can try to get some GPU profile numbers in each case?
Errr, of course I meant *fast* for (3)
It seems that this AWSY regression is related to this bug. However, this one got fixed after one day.

== Change summary for alert #16917 (as of Wed, 17 Oct 2018 13:53:18 GMT) ==

Regressions:

  3%  Heap Unclassified linux64-qr opt stylo     275,262,815.12 -> 284,412,126.63

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16917
Yes, you should be able to run all these tests locally. Dan can probably help; ping me if y'all get stuck.
Blocks: 1500846
No longer blocks: 1488324
Priority: -- → P2
(In reply to Glenn Watson [:gw] from comment #21)
> I don't know much about talos - is it possible to run some subset of those
> tests locally, so I can try to get some GPU profile numbers in each case?

Yes, it is possible. For any of these tests:

tp5o_webext
sessionrestore
sessionrestore_no_auto_restore
ts_paint
ts_paint_webext

run the command below:

| ./mach talos-tests --activeTests <test_name> |
You can do the same for tart, by using | ./mach talos-tests --activeTests tart |.
:gw any updates here?
Flags: needinfo?(gwatson)
(Glenn is swamped, might make sense to hand off to Dan. Glenn, feel free to redirect the NI if so.)
Dan, have you got cycles to look at this? Feel free to ping me for further information / context.
Flags: needinfo?(gwatson) → needinfo?(dglastonbury)
I'll take a look.
Flags: needinfo?(dglastonbury)
I did some investigation: with the opaque/alpha split enabled, one extra shader is created at startup. This might explain the regression in ts_paint_webext.
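For what it's worth, one extra shader at startup costing measurable time is plausible: each additional program means an extra compile per stage plus a link, all inside the driver during renderer initialization. A rough, hedged sketch with raw `gl` crate bindings (not WebRender's shader cache code; assumes a current GL context):

```rust
use std::ffi::CStr;
use std::ptr;

// Compile and link one program. LinkProgram in particular is where drivers
// tend to spend most of the time, which is why an extra shader created during
// startup can move a startup-paint benchmark.
unsafe fn compile_program(vs_src: &CStr, fs_src: &CStr) -> gl::types::GLuint {
    let vs = gl::CreateShader(gl::VERTEX_SHADER);
    gl::ShaderSource(vs, 1, &vs_src.as_ptr(), ptr::null());
    gl::CompileShader(vs);

    let fs = gl::CreateShader(gl::FRAGMENT_SHADER);
    gl::ShaderSource(fs, 1, &fs_src.as_ptr(), ptr::null());
    gl::CompileShader(fs);

    let program = gl::CreateProgram();
    gl::AttachShader(program, vs);
    gl::AttachShader(program, fs);
    gl::LinkProgram(program);
    program
}
```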
In discussion on irc, :gw said he would revert this change.
Flags: needinfo?(gwatson)
Flags: needinfo?(gwatson)
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8c20f4551080
Update webrender to commit 347e66c2aa117724ac6b0f391b346f9c6898ad11 (WR PR 3259). r=kats
https://hg.mozilla.org/mozilla-central/rev/8c20f4551080
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Status: RESOLVED → VERIFIED