Closed Bug 1465098 Opened 6 years ago Closed 6 years ago

2.72 - 4.56% Explicit Memory / Heap Unclassified (linux64-qr) regression on push 6e93883796aaffd17c2e0b387bd758749640ce66 (Mon May 28 2018)

Categories

(Core :: Graphics: WebRender, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox60 --- unaffected
firefox61 --- unaffected
firefox62 --- fixed

People

(Reporter: jmaher, Assigned: gw)

References

Details

(Keywords: perf, regression)

We have detected an awsy regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=6e93883796aaffd17c2e0b387bd758749640ce66

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  5%  Heap Unclassified linux64-qr opt stylo     307,329,253.07 -> 321,352,310.02
  3%  Explicit Memory linux64-qr opt stylo       550,564,336.02 -> 565,528,766.34


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=13485

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/AWSY/Tests
:kats, can you take a look at this regression and see if this is expected, can be fixed/reduced, or should be backed out?
Flags: needinfo?(bugmail)
From the try pushes in bug 1463416, here are the values for the numbers from the awsy jobs at various WR csets:

WR CSET                                     Heap Unclassified   Explicit
bb354abbf84602d3d8357c63c4f0b1139ec4deb1    311124635.15        553599985.83
                                            308868033.8         551666750.92
                                            311434851.21        552805567.45
                                            307824704.77        561039556.43
                                            310384111.82        552871241.51
5084aae77720ce6f3aedb43a31a054b729fdb398    306429669.98        547224445
523575ecb10126070903abb8215cafcaa9ec05a6    339520822.87        583581575.72
4b9263088971b43e386b0ce9db92e9edf0abff99    319825338.34        573531277.33
63c71ca9bbe4dec0ebc9c9bc8ab65b06a6b40641    322387523.97        566265388.93
                                            320369099.98        574808078.41
                                            317354985.55        562828236.17
                                            320784845.4         566687992.2
                                            328797452.49        572450226.34

From eyeballing this it looks like the regression was introduced between 5084aae77720ce6f3aedb43a31a054b729fdb398 and 523575ecb10126070903abb8215cafcaa9ec05a6, which is this range:

* 523575ec Auto merge of #2767 - gw3583:border-as-brush, r=kvark
* 9e1d61d2 Draw corners with a single pass + color mix technique.
* ac66a43f Address review comments
* 88326439 Add initial port of borders to brush primitives, with task caching.

i.e. servo/webrender#2767

Glenn, do you know if an increase in memory usage from that PR is expected? Looking at the subtests on the perf.html link in comment 0, the largest increase seems to be in the "Fresh Start" and "Fresh Start +30s" subtests, which mean that the memory is used on startup before the test even loads pages. If we can find a way to reduce this it would be nice.
Flags: needinfo?(bugmail) → needinfo?(gwatson)
Component: General → Graphics: WebRender
Product: Testing → Core
It's not at all clear to me how this change could cause this kind of regression. The only thing I can think of is that this change introduces a new shader, and the GPU driver may be allocating a lot of memory to compile it? If that was the case (it seems unlikely), it should be mitigated in the next week or so when I remove the old shader that is being obsoleted by this one.
Flags: needinfo?(gwatson)
Bug 1463416 brought tons of big perf wins according to Talos, but also a damp regression. Out of all the damp subtests, damp cold.inspector.open.settle.DAMP reports a 233% increase.

== Change summary for alert #13494 (as of Mon, 28 May 2018 09:33:40 GMT) ==

Regressions:

  7%  damp windows10-64-qr opt e10s stylo     84.26 -> 90.41

Improvements:

 38%  tp5o responsiveness windows10-64-qr opt e10s stylo     2.00 -> 1.25
 31%  tp5o_webext responsiveness windows10-64-qr opt e10s stylo2.59 -> 1.80
 25%  sessionrestore windows10-64-qr opt e10s stylo          1,441.33 -> 1,077.58
 24%  sessionrestore_no_auto_restore windows10-64-qr opt e10s stylo1,517.00 -> 1,150.50
 24%  ts_paint_webext windows10-64-qr opt e10s stylo         1,560.67 -> 1,185.58
 24%  ts_paint windows10-64-qr opt e10s stylo                1,534.58 -> 1,168.17
 24%  ts_paint_heavy windows10-64-qr opt e10s stylo          1,531.08 -> 1,170.08
 17%  tp5o responsiveness linux64-qr opt e10s stylo          1.43 -> 1.19
 13%  tpaint windows10-64-qr opt e10s stylo                  500.23 -> 435.39
 12%  sessionrestore_many_windows windows10-64-qr opt e10s stylo3,922.17 -> 3,442.33
 11%  sessionrestore_many_windows linux64-qr opt e10s stylo  1,663.75 -> 1,484.92
 11%  tp5o_webext responsiveness linux64-qr opt e10s stylo   2.33 -> 2.08
  5%  tresize linux64-qr opt e10s stylo                      14.23 -> 13.48
  4%  tp5o_scroll linux64-qr opt e10s stylo                  0.54 -> 0.52

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=13494
Interesting. I spot-checked a few of these and again from the try pushes it looks like these are from servo/webrender#2767, the same change that brought the memory usage regression.

Also I'm not convinced the damp regression is "real" since based on the recent m-c graph [1] it seems to fluctuate quite a lot. Even if it is a real regression, it's probably worth accepting that to get all the other improvements.

Regarding the memory regression this bug was originally filed for, let's wait and see if Glenn's upcoming changes to remove the other shader helps with that. If not we can try to investigate why the memory usage went up. I'm going to assign this to Glenn for now - Glenn, please let me know which PR (when you have it) removes the old shader, and I can look at the numbers to see if the memory usage goes back down.

[1] https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1685306,1,1&selected=mozilla-central,1685306,339881,474137961
Assignee: nobody → gwatson
This PR (https://github.com/servo/webrender/pull/2795) removes the three old shaders that are superseded by the shader referenced above.
Ionut, has this problem gone away from the changes mentioned in comment 6 ?
Flags: needinfo?(igoldan)
The PR mentioned in comment 6 is landing in m-c as part of bug 1466549.
Depends on: 1466549
The graph does show memory usage going back down today after the WR update landed: https://treeherder.mozilla.org/perf.html#/graphs?series=autoland,1660770,1,4&series=autoland,1660780,1,4

Should be good to close this bug now. Feel free to do it, or I'll do it tomorrow if there's no objections.
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #7)
> Ionut, has this problem gone away from the changes mentioned in comment 6 ?

The memory regressions from comment 0 have totally gone away. The damp has improved only half way.
Flags: needinfo?(igoldan)
So I guess we can close this bug, given the big wins we got.
Yes. settle DAMP tests can be ignored. It most likely means that the test isn't waiting correctly for a layout/paint/event correctly.
settle are run after each test and wait for any pending paint (mozAfterPaint) event, so that we see if we suddently miss some after a patch. Unfortunately, this practice is easily subject to intermittent and can be changes by subtle platform tweaks.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #8)
> The PR mentioned in comment 6 is landing in m-c as part of bug 1466549.

I confirm the AWSY fixes with these results:

== Change summary for alert #13667 (as of Wed, 06 Jun 2018 11:19:59 GMT) ==

Improvements:

  5%  Heap Unclassified linux64-qr opt stylo     321,533,337.82 -> 306,795,557.89
  3%  Explicit Memory linux64-qr opt stylo       565,867,776.26 -> 551,624,772.93

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=13667
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.