Closed Bug 1424325 Opened 7 years ago Closed 6 years ago

2.01 - 38.83% Explicit Memory / Heap Unclassified (linux32-stylo-disabled, osx-10-10) regression on push ebf4af04d10cc42f613015aafc225002c8c273c1 (Thu Dec 7 2017)

Categories

(Core :: Graphics: Layers, defect)

53 Branch
defect
Not set
normal

Tracking


RESOLVED WONTFIX
mozilla59
Tracking Status
firefox-esr52 --- unaffected
firefox57 --- unaffected
firefox58 --- unaffected
firefox59 --- fixed

People

(Reporter: jmaher, Assigned: rhunt)

References

Details

(Keywords: perf, regression, Whiteboard: [stylo-])

Attachments

(1 file)

We have detected an awsy regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=ebf4af04d10cc42f613015aafc225002c8c273c1

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 39%  Heap Unclassified osx-10-10 opt      72,216,685.28 -> 100,261,791.96
 11%  Explicit Memory osx-10-10 opt        331,290,762.10 -> 368,764,252.35
  2%  Explicit Memory linux32-stylo-disabled opt stylo-disabled  243,384,238.46 -> 248,264,147.76


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=10945

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/AWSY/Tests
:rhunt, I see you authored the patches that caused these memory regressions. Can you take a look and help determine whether there is a fix we can make?
Component: Untriaged → Graphics: Layers
Flags: needinfo?(rhunt)
Product: Firefox → Core
And here are the Talos regressions:
== Change summary for alert #10939 (as of Thu, 07 Dec 2017 23:25:40 GMT) ==

Regressions:

 63%  tsvgr_opacity windows7-32 opt e10s     277.32 -> 451.00
 59%  tsvgr_opacity windows7-32 pgo e10s     227.42 -> 361.61
 54%  rasterflood_svg windows7-32 pgo e10s   15,620.78 -> 24,066.68
 53%  rasterflood_svg windows7-32 opt e10s   16,500.47 -> 25,318.29
 30%  tsvgr_opacity windows10-64 opt e10s    233.25 -> 303.47
 28%  tsvgr_opacity windows10-64 pgo e10s    220.56 -> 281.68
 20%  rasterflood_svg windows10-64 pgo e10s  13,936.10 -> 16,726.39
 17%  tsvgx windows10-64 opt e10s            236.91 -> 277.85
 17%  rasterflood_svg windows10-64 opt e10s  14,688.57 -> 17,166.94
 15%  tsvgx windows10-64 pgo e10s            220.00 -> 254.05
 13%  tsvgx windows7-32 opt e10s             477.20 -> 538.91
 11%  displaylist_mutate linux64 opt e10s    5,289.90 -> 5,849.51
 10%  tsvgx windows7-32 pgo e10s             408.87 -> 451.64
 10%  rasterflood_gradient windows7-32 opt e10s  143.08 -> 128.50
  9%  rasterflood_gradient windows7-32 pgo e10s  141.33 -> 128.33

Improvements:

 34%  rasterflood_svg osx-10-10 opt e10s     19,522.52 -> 12,902.20
  6%  rasterflood_gradient osx-10-10 opt e10s  175.00 -> 185.92

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=10939

Please ensure that we look at both the memory and performance issues!
This regression is bad enough across the board that we should back it out until a proper investigation can be done.
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #0)
> We have detected an awsy regression from push:
> 
> https://hg.mozilla.org/integration/mozilla-inbound/
> pushloghtml?changeset=ebf4af04d10cc42f613015aafc225002c8c273c1
> 
> As author of one of the patches included in that push, we need your help to
> address this regression.
> 
> Regressions:
> 
>  39%  Heap Unclassified osx-10-10 opt      72,216,685.28 -> 100,261,791.96
>  11%  Explicit Memory osx-10-10 opt        331,290,762.10 -> 368,764,252.35
>   2%  Explicit Memory linux32-stylo-disabled opt
> stylo-disabled  243,384,238.46 -> 248,264,147.76
> 
> 
> You can find links to graphs and comparison views for each of the above
> tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=10945
> 
> On the page above you can see an alert for each affected platform as well as
> a link to a graph showing the history of scores for this test. There is also
> a link to a treeherder page showing the jobs in a pushlog format.
> 
> To learn more about the regressing test(s), please see:
> https://wiki.mozilla.org/AWSY/Tests

This memory issue is most likely related to bug 1424172, which is on inbound right now.

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #2)
> and here are talos regressions:
> == Change summary for alert #10939 (as of Thu, 07 Dec 2017 23:25:40 GMT) ==
> 
> Regressions:
> 
>  63%  tsvgr_opacity windows7-32 opt e10s     277.32 -> 451.00
>  59%  tsvgr_opacity windows7-32 pgo e10s     227.42 -> 361.61
>  54%  rasterflood_svg windows7-32 pgo e10s   15,620.78 -> 24,066.68
>  53%  rasterflood_svg windows7-32 opt e10s   16,500.47 -> 25,318.29
>  30%  tsvgr_opacity windows10-64 opt e10s    233.25 -> 303.47
>  28%  tsvgr_opacity windows10-64 pgo e10s    220.56 -> 281.68
>  20%  rasterflood_svg windows10-64 pgo e10s  13,936.10 -> 16,726.39
>  17%  tsvgx windows10-64 opt e10s            236.91 -> 277.85
>  17%  rasterflood_svg windows10-64 opt e10s  14,688.57 -> 17,166.94
>  15%  tsvgx windows10-64 pgo e10s            220.00 -> 254.05
>  13%  tsvgx windows7-32 opt e10s             477.20 -> 538.91
>  11%  displaylist_mutate linux64 opt e10s    5,289.90 -> 5,849.51
>  10%  tsvgx windows7-32 pgo e10s             408.87 -> 451.64
>  10%  rasterflood_gradient windows7-32 opt e10s  143.08 -> 128.50
>   9%  rasterflood_gradient windows7-32 pgo e10s  141.33 -> 128.33
> 
> Improvements:
> 
>  34%  rasterflood_svg osx-10-10 opt e10s     19,522.52 -> 12,902.20
>   6%  rasterflood_gradient osx-10-10 opt e10s  175.00 -> 185.92
> 
> For up to date results, see:
> https://treeherder.mozilla.org/perf.html#/alerts?id=10939
> 
> Please ensure that we look at both the memory and performance issues!

The rasterflood improvements on OS X are expected, but the Windows regressions are not.

Most of the patches change how tiling works, which is only enabled on OS X. There is a patch that does some refactoring, but I can't yet see how that could have caused this. I'll keep investigating.
Flags: needinfo?(rhunt)
Okay, the regression is from OMTP being disabled on Windows. I'll have a fix in a moment.
Assignee: nobody → rhunt
Comment on attachment 8935877 [details]
Disable OMTP only if edge padding is enabled and tiling is also enabled (bug 1424325, )

https://reviewboard.mozilla.org/r/206738/#review212560
Attachment #8935877 - Flags: review?(bas) → review+
Pushed by rhunt@eqrion.net:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ced781e57166
Disable OMTP only if edge padding is enabled and tiling is also enabled (bug 1424325, r=bas)
https://hg.mozilla.org/mozilla-central/rev/ced781e57166
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
and this is fixed:
== Change summary for alert #10970 (as of Sat, 09 Dec 2017 03:11:55 GMT) ==

Improvements:

 40%  tsvgr_opacity windows7-32 opt e10s     449.19 -> 269.02
 37%  tsvgr_opacity windows7-32 pgo e10s     366.00 -> 231.68
 35%  rasterflood_svg windows7-32 opt e10s   25,403.03 -> 16,507.73
 35%  rasterflood_svg windows7-32 pgo e10s   23,995.32 -> 15,595.81
 31%  rasterflood_svg windows10-64 pgo e10s  16,624.48 -> 11,476.36
 31%  rasterflood_svg windows10-64 opt e10s  17,015.42 -> 11,769.51
 26%  tsvgr_opacity windows10-64 opt e10s    301.08 -> 223.08
 22%  tsvgr_opacity windows10-64 pgo e10s    281.20 -> 219.18
 15%  tsvgx windows10-64 opt e10s            276.91 -> 236.62
 13%  tsvgx windows10-64 pgo e10s            252.89 -> 220.52
 11%  tsvgx windows7-32 opt e10s             537.98 -> 476.98
 11%  rasterflood_gradient windows7-32 opt e10s  129.21 -> 143.00
 10%  rasterflood_gradient windows7-32 pgo e10s  128.53 -> 141.58
  9%  tsvgx windows7-32 pgo e10s             449.80 -> 409.55

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=10970
:rhunt All alerts mentioned here have been fixed. Still, I noticed an 8% Resident Memory regression on OS X.
At the moment, I'm inclined to attribute it to bug 1422392. I'm not yet sure, so I'll investigate further and come back tomorrow with the bisection results.
Flags: needinfo?(rhunt)
This is the regression:

  8%  Resident Memory osx-10-10 opt        648,598,603.83 -> 698,330,293.13

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=10945
I did a comparison on Try with the backout [1]. It looks like this bug is indeed responsible for it.

[1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=114cc5f2489b&newProject=try&newRevision=000de5d27f8f&framework=4&filter=resident
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #11)
> :rhunt All alerts mentioned here have been fixed. Still, I noticed an 8%
> Resident Memory regression on OS X.
Oh, forgot to show the updated AWSY results:

== Change summary for alert #10965 (as of Fri, 08 Dec 2017 14:33:46 GMT) ==

Improvements:

 27%  Heap Unclassified osx-10-10 opt      100,293,998.22 -> 73,185,628.78
  8%  Explicit Memory osx-10-10 opt        369,035,382.52 -> 339,693,220.90

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=10965
I'm actively investigating this. Currently I'm seeing a small difference in gfx-textures and a larger difference in gfx-textures-peak. I'm guessing that is responsible for the difference in resident memory.

I'm still not sure why we are using more graphics memory at the peak with OMTP enabled; that's not intended.
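
For context on what those two numbers mean: a reporter pair like gfx-textures / gfx-textures-peak conceptually tracks a current byte count plus its high-water mark. The sketch below is a generic illustration of that idea (TextureMemoryCounter is a hypothetical name, not Gecko's actual reporter); it also shows why the peak can regress even when the steady-state number barely moves.

```cpp
// Generic current-vs-peak byte counter (conceptually what a "gfx-textures" /
// "gfx-textures-peak" pair reports); not Gecko's actual memory reporter.
#include <atomic>
#include <cstddef>

class TextureMemoryCounter {
 public:
  void OnAlloc(std::size_t bytes) {
    std::size_t now = mCurrent.fetch_add(bytes) + bytes;
    // Record the high-water mark: this is the number that regresses when more
    // buffers are simultaneously alive, even if they are all freed later.
    std::size_t peak = mPeak.load();
    while (now > peak && !mPeak.compare_exchange_weak(peak, now)) {
    }
  }
  void OnFree(std::size_t bytes) { mCurrent.fetch_sub(bytes); }

  std::size_t Current() const { return mCurrent.load(); }
  std::size_t Peak() const { return mPeak.load(); }

 private:
  std::atomic<std::size_t> mCurrent{0};
  std::atomic<std::size_t> mPeak{0};
};

int main() {
  TextureMemoryCounter gfxTextures;
  gfxTextures.OnAlloc(4 << 20);  // tile A
  gfxTextures.OnAlloc(4 << 20);  // tile B, alive at the same time
  gfxTextures.OnFree(4 << 20);   // tile A released
  // Current() is back to 4 MiB, but Peak() stays at 8 MiB.
  return 0;
}
```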
Tagging this bug [stylo-] to get it out of Stylo bug triage queries (because the bug summary mentions "stylo").
Whiteboard: [stylo-]
After investigating this further, this appears to me to be timing related and not due to a leak. I did a try run with a debug feature (SYNC_OMTP) enabled, which forces the OMTP code path to be used but synchronously, and the regression basically went away [1].

My best hypothesis is that the extra memory usage is due to the fact that source surfaces live for the duration of a paint, and now that paints are async, they can live longer and overlap with other allocations. There isn't a way around that, as that is the point of OMTP.

Given that this is just a resident memory regression and seems to me to be relatively minor, I would recommend accepting this regression.

[1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=0dfc6edc1974&newProject=try&newRevision=c210f24bc08417dc7431e9b37fa494e17da8e6c6&framework=4
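
To make the timing argument concrete, here is a minimal, self-contained C++ sketch (not Gecko code; PaintTask and rasterize are hypothetical names) of how an async paint keeps its surfaces alive while the main thread moves on to its next allocations, so peak memory rises even though nothing leaks.

```cpp
// Standalone illustration only; PaintTask and rasterize stand in for the
// buffers captured by a single paint and the work done on the paint thread.
#include <future>
#include <memory>
#include <vector>

struct PaintTask {
  // Stands in for the source surfaces captured for one paint (~8 MiB here).
  std::shared_ptr<std::vector<unsigned char>> surfaces =
      std::make_shared<std::vector<unsigned char>>(8 * 1024 * 1024);
};

void rasterize(const PaintTask& task) {
  // Pretend to paint using task.surfaces.
  (void)task;
}

int main() {
  // Synchronous paint: the surfaces are destroyed before the caller makes its
  // next allocation, so the two allocations never coexist.
  {
    PaintTask task;
    rasterize(task);
  }  // surfaces freed here

  // Async (OMTP-style) paint: the captured task keeps its surfaces alive on
  // the paint thread while the main thread allocates for the next frame.
  // Nothing leaks, but both allocations are live at once, so peak resident
  // memory is higher.
  PaintTask task;
  auto pending = std::async(std::launch::async,
                            [t = std::move(task)] { rasterize(t); });
  std::vector<unsigned char> nextFrame(8 * 1024 * 1024);  // overlaps with t.surfaces
  pending.wait();  // the captured surfaces live at least until the paint completes
  return 0;
}
```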
Flags: needinfo?(rhunt)
(In reply to Ryan Hunt [:rhunt] from comment #17)
> After investigating this further, this appears to me to be timing related
> and not due to a leak. I did a try run with a debug feature enabled
> (SYNC_OMTP) which forces the OMTP code path to be used but synchronously and
> the regression seemed to basically go away [1].
> 
> My best hypothesis is that the extra memory usage is due to the fact that
> source surfaces live through a duration of a paint, and now that paints are
> async, they can live longer and overlap with other allocations. There
> isn't a way around that as that is the feature of OMTP.
> 
> Given that this is just a resident memory regression and seems to me to be
> relatively minor, I would recommend accepting this regression.

The original regression looks to be ~100 - 200 MiB. This is not minor. Please note that resident memory is our main way of comparing our memory usage against other browsers.

> [1]
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=0dfc6edc1974&newProject=try&newR
> evision=c210f24bc08417dc7431e9b37fa494e17da8e6c6&framework=4

Ryan, can you do multiple AWSY try runs with and without this feature enabled? I'd like to know what the actual regression is at this point. Or is that essentially what [1] was? That looks more like a 50-100 MiB regression (or improvement).

I'm not saying we definitely shouldn't take a regression on OS X; I just want to get a clearer picture.
Flags: needinfo?(rhunt)
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #18)
> (In reply to Ryan Hunt [:rhunt] from comment #17)
> > After investigating this further, this appears to me to be timing related
> > and not due to a leak. I did a try run with a debug feature enabled
> > (SYNC_OMTP) which forces the OMTP code path to be used but synchronously and
> > the regression seemed to basically go away [1].
> > 
> > My best hypothesis is that the extra memory usage is due to the fact that
> > source surfaces live through a duration of a paint, and now that paints are
> > async, they can live longer and overlapped with other allocations. There
> > isn't a way around that as that is the feature of OMTP.
> > 
> > Given that this is just a resident memory regression and seems to me to be
> > relatively minor, I would recommend accepting this regression.
> 
> The original regression looks to be ~100 - 200 MiB. This is not minor.
> Please note that resident memory is our main way of comparing our memory
> usage against other browsers.

The original (~100-200 MiB) regression reported here was due to a memory leak and has been resolved.

The remaining memory regression here is about 5.4%, or 35 MiB [1], which I understand can be significant, but it is not ~100-200 MiB and is unrelated to the cause of the original regression.

> > [1]
> > https://treeherder.mozilla.org/perf.html#/
> > compare?originalProject=try&originalRevision=0dfc6edc1974&newProject=try&newR
> > evision=c210f24bc08417dc7431e9b37fa494e17da8e6c6&framework=4
> 
> Ryan, can you do multiple AWSY try runs with and without this feature
> enabled? I'd like to know what that actual regression is at this point. Or
> is that essentially what [1] was? That looks more like a 50 - 100 MiB
> regression (or improvement).
> 
> I'm not saying we definitely shouldn't take a regression on OSX, I just want
> to get clearer picture.

I believe that :IGoldan did a try run for that [1]. It looks like resident memory went from 644,905,595 bytes (615 MiB) with OMTP disabled to 681,846,066 bytes (650 MiB) with it enabled, an increase of about 35 MiB.

The try run I referenced in comment 17 [2] enabled sync-OMTP, which uses the new code path but not asynchronously (it blocks). This means any leaking or other behavioral differences from the new code path should be visible, but any timing differences from being asynchronous will not be.

From that try run, it appears that forcing the painting to be synchronous eliminated the regression. My working theory is that the regression comes from holding onto the graphics memory used for painting while the main thread continues running and possibly allocating, which is inherent to OMTP.

There's a chance there is something actionable here to reduce the memory usage, but unfortunately I have been unable to reproduce any meaningful difference between OMTP enabled and disabled in my local AWSY test runs or through basic browsing, only on try, which makes it difficult to debug potential issues. Any suggestions are welcome.

I am willing to continue looking into this as I don't want to unduly increase the memory usage. I also should have presented all this information better in my previous comment.

[1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=114cc5f2489b&newProject=try&newRevision=000de5d27f8f&framework=4&filter=resident
[2] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=0dfc6edc1974&newProject=try&newRevision=c210f24bc08417dc7431e9b37fa494e17da8e6c6&framework=4
Flags: needinfo?(rhunt)
So I think I have finally figured out what is going on here.

It looks like async painting is lowering the rate [1] at which we are able to reuse the front buffer for tiles, causing us to double buffer tiles more frequently. This increases pressure on the TextureClientPool, raising our peak graphics memory usage.

This is happening because we are able to paint faster, giving the compositor less time to upload each tile to a TextureGL before we paint again. Once the compositor uploads the tile, we are free to reuse it in the content process. If the compositor doesn't have enough time before we begin painting again, the front buffer will be locked and content will need to get a new texture from the pool.

Note that this does not affect OMTP on Windows, because we don't use tiling there and don't need to upload textures on the compositor side.

I am not sure of the best course for mitigating this regression, because it is caused by us painting faster, which was a goal of OMTP.

A potential mitigation could be bug 1265824, which I believe would allow us to remove an extra buffer in the GL driver.
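
As a rough illustration of the reuse-or-allocate decision described above, here is a simplified, standalone C++ sketch; it is not the actual TiledContentClient/TextureClientPool code, and Texture, TexturePool, Tile, and lockedByCompositor are hypothetical names.

```cpp
// Standalone illustration of why painting faster than the compositor can
// upload forces more double buffering and a larger texture pool.
#include <memory>
#include <vector>

struct Texture {
  bool lockedByCompositor = false;  // true until the compositor has uploaded it
};

struct TexturePool {
  std::vector<std::unique_ptr<Texture>> free;
  std::unique_ptr<Texture> Acquire() {
    if (!free.empty()) {
      auto t = std::move(free.back());
      free.pop_back();
      return t;
    }
    // The pool has to grow, which is what raises peak graphics memory.
    return std::make_unique<Texture>();
  }
};

struct Tile {
  std::unique_ptr<Texture> front;

  // Content wants to paint this tile again. If the compositor has already
  // uploaded (and released) the front buffer we can reuse it; otherwise we
  // pull a fresh buffer from the pool and double buffer.
  void BeginPaint(TexturePool& pool, int& reused, int& allocated) {
    if (!front || front->lockedByCompositor) {
      ++allocated;
      // In reality the old front buffer would return to the pool once the
      // compositor releases it; it is simply dropped here for brevity.
      front = pool.Acquire();
    } else {
      ++reused;
    }
    front->lockedByCompositor = true;  // locked until the next composite uploads it
  }
};

int main() {
  TexturePool pool;
  Tile tile;
  int reused = 0, allocated = 0;
  for (int frame = 0; frame < 100; ++frame) {
    tile.BeginPaint(pool, reused, allocated);
    // Model a compositor that only keeps up every other frame: when painting
    // outpaces the upload, the still-locked front buffer forces an allocation.
    if (frame % 2 == 0) {
      tile.front->lockedByCompositor = false;
    }
  }
  // With this toy compositor, roughly half the paints reuse the front buffer
  // and half allocate a new one from the pool.
  return 0;
}
```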

[1] Front Buffer Reuse Rates

The tests are four trials of `./mach awsy-test --iterations=1 --per-tab-pause=1 --settle-wait-time=1`.
Reuse is the number of times during the test we could reuse the front buffer; allocate is the number of times we could not.

OMTP Enabled:

Reuse | Allocate | %      |
======|==========|========|
3461  | 5612     | 38%    |
3358  | 5739     | 37%    |
3587  | 5712     | 38.5%  |
3412  | 5625     | 37.7%  |

OMTP Synchronized:

Reuse | Allocate | %      |
======|==========|========|
4196  | 5271     | 44%    |
4192  | 5222     | 44.5%  |
4300  | 4835     | 47%    |
4180  | 5216     | 44.5%  |

I was surprised by how consistent the numbers were.
One thing I forgot to address: the possibility of a leak, given that the regression is skewed towards the end of the AWSY test. The biggest difference I was able to see in about:memory reports has been only in the parent process, under 'gfx-textures' and 'gfx-textures-peak', which corresponds to the texture client/host/source tree of objects.

I've counted constructors and destructors of these objects over multiple runs and have not seen any leaks, so I don't see any evidence that it's a leak. Still, I'm not sure why higher texture allocation rates alone would have a larger impact at the end of AWSY. I don't know how much of an impact fragmentation could have, but it's the only guess I have. Any insight or opinions are welcome.
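
For reference, the constructor/destructor counting described above can be done with simple static atomic counters; this is a generic sketch, not the instrumentation actually used here, and TextureLikeObject is a hypothetical stand-in for the texture client/host/source classes.

```cpp
// Generic leak-check instrumentation: bump a counter in the constructor and
// destructor of the class under suspicion, then compare the totals at exit.
#include <atomic>
#include <cstddef>
#include <cstdio>

class TextureLikeObject {
 public:
  TextureLikeObject() { ++sCreated; }
  ~TextureLikeObject() { ++sDestroyed; }

  // Compare totals at the end of a run: live == 0 means this type isn't leaked.
  static void Report() {
    std::printf("created=%zu destroyed=%zu live=%zu\n",
                sCreated.load(), sDestroyed.load(),
                sCreated.load() - sDestroyed.load());
  }

 private:
  static std::atomic<std::size_t> sCreated;
  static std::atomic<std::size_t> sDestroyed;
};

std::atomic<std::size_t> TextureLikeObject::sCreated{0};
std::atomic<std::size_t> TextureLikeObject::sDestroyed{0};

int main() {
  {
    TextureLikeObject a, b;  // construct and destroy a few instances
  }
  TextureLikeObject::Report();  // prints created=2 destroyed=2 live=0
  return 0;
}
```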

Looking some more at the client tiling code today, I'm not seeing any obvious waste of memory we could try to eliminate. I think the best bet for reducing the impact of the extra double buffering would be to eliminate the intermediate buffer we use, which is tracked in bug 1265824.

Unfortunately that bug is no longer staffed, and the graphics team would like to prioritize other work ahead of it, as I think this regression is a speed-memory trade-off.
Flags: needinfo?(erahm)
Thanks for the detailed analysis, Ryan. I can definitely see how there would be more fragmentation now that we're allocating more buffers. Given that there's bug 1365708 (staffed or not) for improving things I think it's okay to accept the regression for now. If it ends up being a leak we should start seeing beta users report rather large RSS and gfx-textures; at that point we can revisit the priority of bug 1365708.

I'm going to close this as WONTFIX for now.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 6 years ago
Flags: needinfo?(erahm)
Resolution: --- → WONTFIX
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #22)
> Thanks for the detailed analysis, Ryan. I can definitely see how there would
> be more fragmentation now that we're allocating more buffers. Given that
> there's bug 1365708 (staffed or not) for improving things I think it's okay
> to accept the regression for now. If it ends up being a leak we should start
> seeing beta users report rather large RSS and gfx-textures; at that point we
> can revisit the priority of bug 1365708.
> 
> I'm going to close this as WONTFIX for now.

Sorry that should have been bug 1265824.
Depends on: 1265824