Closed Bug 1499509 Opened 6 years ago Closed 6 years ago

Increasing the content process count triggered a larger Resident AWSY regression in WR than in non-WR

Categories

(Core :: Graphics: WebRender, defect, P3)

RESOLVED WORKSFORME

People

(Reporter: bholley, Unassigned)

The push at [1] included bug 1470280, which bumped the process count in Nightly up to 8. Looking at the graphs [2] of Resident Memory for windows10-64 versus windows10-64-qr, I see an increase of ~15 megabytes for regular Firefox, but ~30 megabytes with WebRender enabled.

This runs contrary to my expectations, since WR generally moves graphics stuff out of the content processes, and so should be more immune to content process count increases. We should investigate this.

[1] https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b89a744deccb5be6113036d95c5c208e1ae2b59f&tochange=e4220fa7a191903a814e8cf473cf544fe9762625
[2] https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1784207,1,4&series=mozilla-central,1653930,1,4&zoom=1539217689063.889,1539239543726.389,400000000,922985019.9685168&selected=mozilla-central,1784207,389615,606029731,4
Some thoughts:
- Do we know which processes the increases are happening in?
- We cache a buffer of the size of the last display list in the content process. So we'll have more of these. They can be 1MB+
- Image handling is considerably different with WebRender. This could cause a change.
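To put a rough number on the display-list point, here is a back-of-the-envelope sketch. It assumes the content process count went from 4 to 8 (the 4-process baseline is mentioned later in this bug), that every added content process holds exactly one cached buffer, and that each buffer is about 1MB. The function name is made up for illustration and none of this is measured data.

```python
MB = 1024 * 1024


def extra_display_list_cache_bytes(old_count, new_count, buffer_bytes):
    """Extra memory if each added content process caches one display-list buffer.

    Assumption (not measured): one buffer per content process, all the same size.
    """
    if new_count < old_count:
        raise ValueError("new_count must be >= old_count")
    return (new_count - old_count) * buffer_bytes


# Hypothetical inputs: 4 -> 8 content processes, ~1MB per cached buffer.
extra = extra_display_list_cache_bytes(4, 8, 1 * MB)
print(extra / MB)  # 4.0
```

If those assumptions hold, the cached buffers alone would account for a few megabytes of the additional WR increase, and more if the display lists are larger than 1MB.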
Blocks: wr-memory
(In reply to Bobby Holley (:bholley) from comment #2)
> The comparison view on the subtests is enlightening, with WR [1] and without
> WR [2]. The percentages aren't really relevant but the absolute value of the
> mean difference is.
>
> If I read that right, and assuming the top-level score is just a geometric
> mean of the subtests,

This is correct.

> then the difference between WR/nonWR is primary
> because there's less _improvement_ in the tabs-closed tests. I'm not really
> sure what to make of that. Eric?

For the tabs closed measure we close all tabs but one, then navigate to about:blank. That one remaining process still has whatever cruft has accumulated. By having more processes we expect less cruft to accumulate per process, so when 7/8 are closed the remaining one should have less overhead than the 4 process case.

In the QR case, if we look at the memory reports you can see there's a GPU process whereas non-QR does not have one. It seems like this could be the main reason we didn't see as large of a decrease (the stuff that got moved out to the GPU process w/ QR is sticking around now).

I'm actually a little surprised we don't have a GPU process in automation for the non-QR case -- I thought that was enabled by default now.
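A small sketch of why a smaller improvement in the tabs-closed subtests shows up as a larger top-level regression when the score is a geometric mean of the subtests. The subtest names and megabyte values below are invented for illustration and are not real AWSY data.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logs for stability."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


# Hypothetical subtest results in MB after the process count bump.
# Both configurations regress equally with tabs open; only the
# tabs-closed improvement differs.
non_wr = {"tabs_open": 530.0, "tabs_closed": 290.0}
wr = {"tabs_open": 530.0, "tabs_closed": 298.0}

non_wr_score = geometric_mean(non_wr.values())
wr_score = geometric_mean(wr.values())

# Less improvement in one subtest means a higher (worse) overall score.
print(wr_score > non_wr_score)  # True
```

With these made-up numbers the only difference between the two configurations is how much memory is released once tabs are closed, which is the effect the GPU process explanation above would produce.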
Flags: needinfo?(erahm)
Depends on: 1499554
(In reply to Eric Rahm [:erahm] from comment #3)
> (In reply to Bobby Holley (:bholley) from comment #2)
> > The comparison view on the subtests is enlightening, with WR [1] and without
> > WR [2]. The percentages aren't really relevant but the absolute value of the
> > mean difference is.
> >
> > If I read that right, and assuming the top-level score is just a geometric
> > mean of the subtests,
>
> This is correct.
>
> > then the difference between WR/nonWR is primary
> > because there's less _improvement_ in the tabs-closed tests. I'm not really
> > sure what to make of that. Eric?
>
> For the tabs closed measure we close all tabs but one, then navigate to
> about:blank. That one remaining process still has whatever cruft has
> accumulated. By having more processes we expect less cruft to accumulate per
> process, so when 7/8 are closed the remaining one should have less overhead
> than the 4 process case.

That is very helpful analysis, thank you!

> In the QR case, if we look at the memory reports you can see there's a GPU
> process whereas non-QR does not have one. It seems like this could be the
> main reason we didn't see as large of a decrease (the stuff that got moved
> out to the GPU process w/ QR is sticking around now).
>
> I'm actually a little surprised we don't have a GPU process in automation
> for the non-QR case -- I thought that was enabled by default now.

Indeed, that seems like a fairly serious problem! Filed bug 1499554 to continue the investigation.
Priority: -- → P3
Since bug 1499554 is fixed, I took another look at the Resident memory comparison on perfherder:
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1653930,1,4&series=mozilla-central,1784207,1,4

Since around December 8, WR-enabled has been doing better than WR-disabled. Is there anything else we should investigate here, or can we close this out?
Flags: needinfo?(bobbyholley)
There's still some more memory work in the pipe, but this specific bug can be closed out I think.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(bobbyholley)
Resolution: --- → WORKSFORME