1499509 - Increasing the content process count triggered a larger Resident AWSY regression in WR than in non-WR

Reporter

Description

6 years ago

The push at [1] included bug 1470280, which bumped the process count in Nightly up to 8. Looking at the graphs [2] of Resident Memory for windows10-64 versus windows10-64-qr, I see an increase of ~15 megabytes for regular Firefox, but ~30 megabytes with WebRender enabled.

This runs contrary to my expectations, since WR generally moves graphics stuff out of the content processes, and so should be more immune to content process count increases. We should investigate this.

[1] https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b89a744deccb5be6113036d95c5c208e1ae2b59f&tochange=e4220fa7a191903a814e8cf473cf544fe9762625
[2] https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1784207,1,4&series=mozilla-central,1653930,1,4&zoom=1539217689063.889,1539239543726.389,400000000,922985019.9685168&selected=mozilla-central,1784207,389615,606029731,4

Jeff Muizelaar [:jrmuizel]

Comment 1

6 years ago

Some thoughts:
- Do we know which processes the increases are happening in?
- We cache a buffer of the size of the last display list in the content process. So we'll have more of these. They can be 1MB+
- Image handling is considerably different with WebRender. This could cause a change.

Bobby Holley (:bholley)

Reporter

Updated

6 years ago

Blocks: wr-memory

Bobby Holley (:bholley)

Reporter

Comment 2

6 years ago

The comparison view on the subtests is enlightening, with WR [1] and without WR [2]. The percentages aren't really relevant but the absolute value of the mean difference is.

If I read that right, and assuming the top-level score is just a geometric mean of the subtests, then the difference between WR/nonWR is primarily because there's less _improvement_ in the tabs-closed tests. I'm not really sure what to make of that. Eric?

[1] https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&originalRevision=b89a744deccb5be6113036d95c5c208e1ae2b59f&newProject=mozilla-central&newRevision=e4220fa7a191903a814e8cf473cf544fe9762625&originalSignature=d8db7e6d1a47f4fdee8baf0bc25c568de796c1f9&newSignature=d8db7e6d1a47f4fdee8baf0bc25c568de796c1f9&framework=4
[2] https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&originalRevision=b89a744deccb5be6113036d95c5c208e1ae2b59f&newProject=mozilla-central&newRevision=e4220fa7a191903a814e8cf473cf544fe9762625&originalSignature=c5bf2e43033c90a9bb6e510d438274900fe867f4&newSignature=c5bf2e43033c90a9bb6e510d438274900fe867f4&framework=4
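To make the geometric-mean reasoning concrete, here is a minimal Python sketch. It assumes, as the comment does, that the top-level score is a plain geometric mean of the subtests. The subtest names and megabyte values are hypothetical and are not taken from the perfherder data; `geometric_mean` is an illustrative helper, not part of AWSY.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a sequence of positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one positive value")
    # Average the logs, then exponentiate, to avoid overflow on long products.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical subtest values in MB; names and numbers are illustrative only.
baseline = {"tabs_open": 800.0, "tabs_closed": 500.0}
# Same tabs-open regression in both runs, different tabs-closed improvement.
large_drop = {"tabs_open": 830.0, "tabs_closed": 470.0}
small_drop = {"tabs_open": 830.0, "tabs_closed": 495.0}

for label, subtests in [("large drop", large_drop), ("small drop", small_drop)]:
    delta = geometric_mean(subtests.values()) - geometric_mean(baseline.values())
    print(f"{label}: top-level change {delta:+.1f} MB")
```

With an identical tabs-open regression in both runs, the run whose tabs-closed subtest improves less ends up with the higher top-level score, which is the pattern described above.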

Flags: needinfo?(erahm)

Eric Rahm [:erahm]

Comment 3

6 years ago

(In reply to Bobby Holley (:bholley) from comment #2)
> The comparison view on the subtests is enlightening, with WR [1] and without
> WR [2]. The percentages aren't really relevant but the absolute value of the
> mean difference is.
>
> If I read that right, and assuming the top-level score is just a geometric
> mean of the subtests,

This is correct.

> then the difference between WR/nonWR is primary
> because there's less _improvement_ in the tabs-closed tests. I'm not really
> sure what to make of that. Eric?

For the tabs closed measure we close all tabs but one, then navigate to about:blank. That one remaining process still has whatever cruft has accumulated. By having more processes we expect less cruft to accumulate per process, so when 7/8 are closed the remaining one should have less overhead than the 4 process case.

In the QR case, if we look at the memory reports you can see there's a GPU process whereas non-QR does not have one. It seems like this could be the main reason we didn't see as large of a decrease (the stuff that got moved out to the GPU process w/ QR is sticking around now).

I'm actually a little surprised we don't have a GPU process in automation for the non-QR case -- I thought that was enabled by default now.
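The hypothesis above can be sketched as a toy model. This is only an illustration under strong assumptions: accumulated memory is spread evenly across content processes, and anything held by a GPU process survives tab closure unchanged. The function names, the `gpu_fraction` parameter, and the 80 MB figure are hypothetical and do not come from the memory reports.

```python
def tabs_closed_residual(total_cruft_mb, content_processes, gpu_fraction=0.0):
    """Memory left after closing all tabs but one, under the toy model.

    gpu_fraction is the share of accumulated memory assumed to live in a
    GPU process, which is not released when content processes are closed.
    """
    if content_processes < 1:
        raise ValueError("need at least one content process")
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    gpu_part = total_cruft_mb * gpu_fraction
    content_part = total_cruft_mb - gpu_part
    # Only the surviving content process keeps its even share of the cruft.
    return content_part / content_processes + gpu_part


def improvement(total_cruft_mb, old_count, new_count, gpu_fraction=0.0):
    """Drop in tabs-closed memory when the process count is raised."""
    before = tabs_closed_residual(total_cruft_mb, old_count, gpu_fraction)
    after = tabs_closed_residual(total_cruft_mb, new_count, gpu_fraction)
    return before - after


# Hypothetical 80 MB of accumulated memory, going from 4 to 8 processes.
print(improvement(80.0, 4, 8))                    # no GPU process
print(improvement(80.0, 4, 8, gpu_fraction=0.5))  # half held by a GPU process
```

In this model the improvement from raising the process count scales with the share of memory held in content processes, so any share that sits in a GPU process reduces the decrease seen in the tabs-closed measure.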

Flags: needinfo?(erahm)

Bobby Holley (:bholley)

Reporter

Updated

6 years ago

Depends on: 1499554

Bobby Holley (:bholley)

Reporter

Comment 4

6 years ago

(In reply to Eric Rahm [:erahm] from comment #3)
> (In reply to Bobby Holley (:bholley) from comment #2)
> > The comparison view on the subtests is enlightening, with WR [1] and without
> > WR [2]. The percentages aren't really relevant but the absolute value of the
> > mean difference is.
> >
> > If I read that right, and assuming the top-level score is just a geometric
> > mean of the subtests,
>
> This is correct.
>
> > then the difference between WR/nonWR is primary
> > because there's less _improvement_ in the tabs-closed tests. I'm not really
> > sure what to make of that. Eric?
>
> For the tabs closed measure we close all tabs but one, then navigate to
> about:blank. That one remaining process still has whatever cruft has
> accumulated. By having more processes we expect less cruft to accumulate per
> process, so when 7/8 are closed the remaining one should have less overhead
> than the 4 process case.

That is very helpful analysis, thank you!

> In the QR case, if we look at the memory reports you can see there's a GPU
> process whereas non-QR does not have one. It seems like this could be the
> main reason we didn't see as large of a decrease (the stuff that got moved
> out to the GPU process w/ QR is sticking around now).
>
> I'm actually a little surprised we don't have a GPU process in automation
> for the non-QR case -- I thought that was enabled by default now.

Indeed, that seems like a fairly serious problem! Filed bug 1499554 to continue the investigation.

Maire Reavy [:mreavy]

Updated

6 years ago

Priority: -- → P3

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 5

6 years ago

Since bug 1499554 is fixed, I took another look at the Resident memory comparison on perfherder:

https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1653930,1,4&series=mozilla-central,1784207,1,4

Since around December 8, WR-enabled has been doing better than WR-disabled. Is there anything else we should investigate here, or can we close this out?

Flags: needinfo?(bobbyholley)

Bobby Holley (:bholley)

Reporter

Comment 6

6 years ago

There's still some more memory work in the pipeline, but I think this specific bug can be closed out.

Status: NEW → RESOLVED

Closed: 6 years ago

Flags: needinfo?(bobbyholley)

Resolution: --- → WORKSFORME

Categories

(Core :: Graphics: WebRender, defect, P3)

People

(Reporter: bholley, Unassigned)
