Huge graphics memory fragmentation on Windows

RESOLVED WORKSFORME

Status

Product: Core
Component: Graphics
Opened: 4 years ago
Closed: 3 years ago

People

(Reporter: dmajor, Unassigned)

Tracking

(Blocks: 2 bugs)

Version: 36 Branch

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink:P2])

(Reporter)

Description

4 years ago
This is a separate issue from bug 1062065 ("Huge graphics memory usage in Windows").

In bp-7715dd07-2489-4426-a90f-693812141118, we OOMed with 3GB "free" memory because the free blocks were too small to be usable. The memory pattern looked like this (WC = write-combine):
[1K WC][15K free][1K WC][15K free][1K WC][15K free][...]

I assume the WC regions are graphics-related. Roughly 1 in 5 OOMs on Nightly this week have obscenely large "free" values. I still need to confirm whether they all follow this WC pattern.
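The failure mode described above, a large total of "free" memory but no block big enough to satisfy a request, can be illustrated with a toy model. This is a minimal sketch under stated assumptions: the region list, the helper names (`build_pattern`, `largest_free_block`, `can_allocate`) and the 1 MiB request are hypothetical, and sizes follow the 1K/15K pattern exactly as written in the description.

```python
# Toy model of a fragmented address space: small write-combine (WC) regions
# alternating with small free gaps, as in the pattern above. Sizes are in KiB
# and the layout is illustrative, not taken from a real minidump.

def build_pattern(wc_kib: int, free_kib: int, repeats: int) -> list[tuple[str, int]]:
    """Return (kind, size_kib) regions alternating WC and free."""
    regions: list[tuple[str, int]] = []
    for _ in range(repeats):
        regions.append(("WC", wc_kib))
        regions.append(("free", free_kib))
    return regions


def total_free(regions: list[tuple[str, int]]) -> int:
    """Sum of all free space, which is what a naive "free" figure reports."""
    return sum(size for kind, size in regions if kind == "free")


def largest_free_block(regions: list[tuple[str, int]]) -> int:
    """Largest contiguous free span; adjacent free regions are merged."""
    best = run = 0
    for kind, size in regions:
        if kind == "free":
            run += size
            best = max(best, run)
        else:
            run = 0
    return best


def can_allocate(regions: list[tuple[str, int]], request_kib: int) -> bool:
    """A single allocation needs one contiguous free block that is big enough."""
    return largest_free_block(regions) >= request_kib


regions = build_pattern(wc_kib=1, free_kib=15, repeats=200_000)
print(total_free(regions) // 1024, "MiB free in total")  # 2929 MiB free in total
print(largest_free_block(regions), "KiB largest block")   # 15 KiB largest block
print(can_allocate(regions, 1024))                        # False (1 MiB request)
```

Nearly 3 GB is reported as free, yet nothing larger than 15 KiB can be allocated, which is the shape of the OOM in the crash report.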
(Reporter)

Comment 1

4 years ago
> Roughly 1 in 5 OOMs on Nightly this week have obscenely large "free" values.
It's more like 1 in 8, after removing the cases that ran out of physical/pagefile.

Comment 2

As for the ones that ran out of physical memory/pagefile: are they all on graphics drivers that keep GPU-committed memory in their own process space instead of in the Firefox process?

If so, it could be exactly the same problem. It's a pity that it's close to impossible to prove from crash reports...
(Reporter)

Comment 3

4 years ago
At the present time I'm deliberately ignoring physical memory exhaustion reports. They are a relatively small fraction of OOMs and we have much bigger fish to fry.
(Reporter)

Updated

4 years ago
Blocks: 965936
Whiteboard: [MemShrink] → [MemShrink:P2]
(Reporter)

Comment 4

4 years ago
I fail at hex. The pattern is actually 4K WC followed by 60K free, which makes sense given that Windows VirtualAlloc reserves address space in 64K chunks (its allocation granularity).
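The arithmetic behind the corrected pattern can be checked directly. A minimal sketch, assuming the raw sizes were 0x1000 and 0xF000 bytes (which would explain the "1K"/"15K" misreading), a 64 KiB allocation granularity and a 4 KiB page size; `stranded_bytes` is a hypothetical helper name.

```python
# Sizes as hexadecimal byte counts. 0x10000 (64 KiB) is the usual Windows
# allocation granularity, i.e. the boundary a new VirtualAlloc reservation must
# start on; 0x1000 (4 KiB) is the x86 page size. Both are assumptions here.

ALLOCATION_GRANULARITY = 0x10000  # 65536 bytes = 64 KiB
PAGE_SIZE = 0x1000                # 4096 bytes = 4 KiB


def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple."""
    return -(-value // multiple) * multiple


def stranded_bytes(reserved: int) -> int:
    """Free bytes left in the last 64 KiB granule after a reservation of
    `reserved` bytes. They are free, but no new reservation can start there
    because new reservations begin on a granule boundary."""
    used = round_up(reserved, PAGE_SIZE)
    return round_up(used, ALLOCATION_GRANULARITY) - used


wc_bytes = 0x1000    # plausibly misread as "1K": it is 4 KiB
free_bytes = 0xF000  # plausibly misread as "15K": it is 60 KiB

print(wc_bytes // 1024, free_bytes // 1024)               # 4 60
print(stranded_bytes(wc_bytes) == free_bytes)             # True
print(stranded_bytes(wc_bytes) / ALLOCATION_GRANULARITY)  # 0.9375
```

Under these assumptions, each 4K reservation strands 93.75% of its 64K granule as unusable "free" space.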

In bug 1097262 I added a field to crash reports that measures the amount of space lost to fragmentation. I want to see if this correlates with a particular graphics driver.
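One way such a "space lost to fragmentation" field could be computed is sketched below. It is hypothetical: the 1 MiB cutoff and the function shape are assumptions for illustration, not the actual definition used by the crash-report field.

```python
# Hypothetical version of a "space lost to fragmentation" metric: total bytes
# in free blocks too small to be useful. The 1 MiB cutoff is an assumption for
# illustration; the real tiny_block_size computation may use a different rule.

TINY_BLOCK_THRESHOLD = 1 << 20  # assumed cutoff: 1 MiB


def tiny_block_size(free_block_sizes: list[int], threshold: int = TINY_BLOCK_THRESHOLD) -> int:
    """Sum of free-block sizes (bytes) strictly below the threshold."""
    return sum(size for size in free_block_sizes if size < threshold)


# 49152 stranded 60 KiB gaps plus one large 512 MiB free block.
free_blocks = [0xF000] * 49152 + [512 << 20]
print(tiny_block_size(free_blocks) / (1 << 30))  # 2.8125 (GiB)
```

The large block is excluded, so the metric isolates the free space that exists only as unusable slivers.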
(Reporter)

Updated

4 years ago
Depends on: 1085823
(Reporter)

Comment 5

3 years ago
The "tiny_block_size" processing is now enabled on Socorro. Here's a week's worth of crashes (all channels) broken down by graphics card vendor:

All OOMs for which we have block size data
  Vendor ID          OOMs     Share
  0x8086 (Intel)     91414    61.03 %
  0x1002 (AMD)       28401    18.96 %
  0x10de (NVIDIA)    28195    18.82 %

OOMs where tiny_block_size > 1GB
  Vendor ID          OOMs     Share
  0x8086 (Intel)     1181     89.54 %
  0x1002 (AMD)       84        6.37 %
  0x10de (NVIDIA)    49        3.71 %

OOMs where tiny_block_size > 2GB
  Vendor ID          OOMs     Share
  0x8086 (Intel)     636      93.39 %
  0x10de (NVIDIA)    25        3.67 %
  0x1002 (AMD)       16        2.35 %

It's pretty clear that Intel drivers are to blame for most of the fragmentation crashes. However, those crashes are overall very rare, roughly 1% of OOMs.
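The "roughly 1% of OOMs" estimate can be cross-checked from the tables. Each table lists only three vendors, so this sketch infers each full total from a vendor's count and its percentage; the results are estimates derived from the figures above, not numbers reported by Socorro, and `implied_total` is a hypothetical helper.

```python
# Back-of-envelope check using the Intel rows above. Totals are inferred as
# count / share, so they include vendors not listed in the tables.

def implied_total(count: int, percent: float) -> float:
    """Total number of crashes implied by one row's count and percentage."""
    return count / (percent / 100)


all_ooms = implied_total(91414, 61.03)  # OOMs with block size data
over_1gb = implied_total(1181, 89.54)   # OOMs with tiny_block_size > 1GB
over_2gb = implied_total(636, 93.39)    # OOMs with tiny_block_size > 2GB

print(round(all_ooms), round(over_1gb), round(over_2gb))  # 149785 1319 681
print(f"{over_1gb / all_ooms:.2%}")                       # 0.88%
```

Under this inference, OOMs with more than 1GB lost to tiny blocks are a little under 1% of all OOMs with block size data.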

That is very different from what I saw in comment 0, when these crashes were a large fraction of OOMs. It's not a matter of channel; restricting the searches to nightly doesn't make a difference. I also used my modified minidump-memorylist tool to confirm that the tiny_block_size numbers are accurate.

I don't have a good explanation. All I can say is that these fragmentation crashes no longer seem like a priority for the OOM effort.

Comment 6

I have no data showing that it's related, but it may be worth tracking whether this changes once bug 985193 is fixed.

Comment 7

David, is this still a problem?
Flags: needinfo?(dmajor)
(Reporter)

Comment 8

3 years ago
No; these are so rare now that it's not worth bothering at all.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Flags: needinfo?(dmajor)
Resolution: --- → WORKSFORME