Closed Bug 1495936 Opened 7 years ago Closed 6 years ago

Higher startup private bytes and VRAM usage under WebRender

Categories

(Core :: Graphics: WebRender, enhancement, P2)

RESOLVED FIXED

People

(Reporter: bholley, Assigned: bholley)

Attachments

(5 files)

I did some testing today around memory usage in a clean startup of Firefox with and without WebRender. I loaded the browser and then opened a single page [1].

The private working sets, which are what the task manager reports, are about the same summed over all the processes. However, private bytes are around 150-200MB higher.

The working set size is more important, since it represents usage of physical RAM and also corresponds to what users see in the task manager. However, private bytes count against the commit limit, which is RAM+swap. Eric Rahm says that swap on Windows defaults to 1.5x RAM size, so a machine with 2GB of RAM would have a system-wide limit of 5GB. An extra 200MB there is not great.

The net value of "GPU committed bytes" is also 140MB higher, and "GPU dedicated bytes" is 100MB higher (I haven't worked out what the difference between the two is).

I suspect the private bytes and VRAM usage are related, so I'm filing them together. We can split them out later if need be.

[1] https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer
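For reference, the commit-limit arithmetic above can be sketched as follows. This is only an illustration of the numbers quoted in this comment: the function name and the 1.5x default are assumptions taken from the description, not values read from any Windows API.

```python
def commit_limit_gb(ram_gb: float, swap_ratio: float = 1.5) -> float:
    """Commit limit = physical RAM + swap, with swap assumed to be swap_ratio * RAM."""
    return ram_gb + swap_ratio * ram_gb


# The 2GB machine from the comment: 2GB RAM + 3GB swap.
limit_gb = commit_limit_gb(2.0)

# Share of that limit consumed by an extra 200MB of private bytes.
extra_mb = 200
share = extra_mb / (limit_gb * 1024)

print(f"commit limit: {limit_gb:.1f} GB, extra 200 MB is {share:.1%} of it")
```

Under those assumptions, the extra 200MB is roughly 4% of the system-wide commit budget on a 2GB machine.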
Attached image memory usage screenshot
Attaching a screenshot with all the stats in one place. PID 6764 is the WR GPU process. PID 5676 is the Fx GPU process.
Here's a screenshot while performing a similar workload on the Quantum reference laptop. On that device, the higher private bytes _do_ translate into a higher working set. This may be a difference with the driver, or with overall system resources. We don't need to worry about this one for the MVP (it's not NVIDIA), but we should keep it in mind, so I'm posting it here.
Using WPR, I recorded all the large VirtualAlloc stacks in the GPU process during startup. This one is for WR.
Here are the non-WR stacks for comparison. It's not really a fair comparison because the GPU heavy lifting happens in the content process for non-WR.
I'll try to get back to this tomorrow, and investigate the largest stacks from comment 3.
Depends on: 1495977
Depends on: 1496838
Priority: -- → P2
Just re-did these measurements, and WR is now significantly better - 348.3 rather than 433.6, or 20% better. \o/
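As a quick check on the "20% better" figure, the relative reduction can be computed directly from the two numbers quoted above. The helper name is made up for illustration, and the comment does not state the units of the two values, so none are assumed here.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


# The two measurements quoted in the comment (units not stated there).
reduction = relative_reduction(433.6, 348.3)

print(f"reduction: {reduction:.1%}")
```

This works out to a little under 20%, consistent with the rounded figure in the comment.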
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED