Closed Bug 1062065 Opened 10 years ago Closed 9 years ago

Huge graphics memory usage in Windows

Categories

(Core :: Graphics, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking


RESOLVED FIXED
Tracking Status
firefox32 --- wontfix
firefox33 --- wontfix
firefox34 + wontfix
firefox35 + wontfix
firefox36 - affected

People

(Reporter: away, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [crashkill:P1][MemShrink:P1])

Our Windows crash dumps contain the page status bits for the entire address space. In a number of OOM crashes, I see more than 3GB of pages marked as PAGE_WRITECOMBINE. I assume that means graphics memory. What causes this and what can we do about it?

31.0 bp-522ad68b-6d00-4325-8fec-7f38a2140901 3309MB
32.0b bp-4e8408f5-c6cc-469b-8a9c-f52c72140901 3302MB
34.0a1 bp-1379d5dc-0263-42dc-a0b0-ec2c02140901 2947MB
34.0a1 bp-bdb4c0af-8159-4628-93ff-2281a2140901 3121MB
34.0a1 bp-a3b212d7-505f-44c7-8f62-0d41b2140831 3384MB
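
For anyone wanting to repeat the tally on their own dumps, here is a minimal sketch of the arithmetic, not the actual tooling used for the numbers above. It assumes the dump's memory regions have already been parsed into (base address, region size, protection flags) tuples; that input format, the function name, and the sample data are all hypothetical. The only Windows-specific facts relied on are the protection constants PAGE_READONLY (0x02), PAGE_READWRITE (0x04), and the PAGE_WRITECOMBINE modifier (0x400).

```python
# Windows memory-protection constants (values from winnt.h).
PAGE_READONLY = 0x02
PAGE_READWRITE = 0x04
PAGE_WRITECOMBINE = 0x400  # modifier flag, OR'ed with a base protection


def write_combine_bytes(regions):
    """Sum the sizes of all regions whose protection includes PAGE_WRITECOMBINE.

    `regions` is a hypothetical list of (base_address, region_size, protect)
    tuples, e.g. extracted from a minidump's memory info list.
    """
    return sum(
        size for _base, size, protect in regions if protect & PAGE_WRITECOMBINE
    )


# Made-up sample data for illustration only.
sample_regions = [
    (0x10000000, 16384, PAGE_READWRITE | PAGE_WRITECOMBINE),
    (0x10004000, 16384, PAGE_READWRITE | PAGE_WRITECOMBINE),
    (0x20000000, 65536, PAGE_READWRITE),
    (0x30000000, 4096, PAGE_READONLY),
]

total = write_combine_bytes(sample_regions)
print(f"{total / (1024 * 1024):.3f} MB write-combine")
```

Because PAGE_WRITECOMBINE is a modifier rather than a standalone protection value, the check is a bitwise AND and not an equality test.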
Flags: needinfo?(bjacob)
Flags: needinfo?(bjacob) → needinfo?(bas)
I'm not sure that is graphics memory. If it's Intel drivers, I've noticed they sometimes shadow VRAM in regular RAM, which could contribute in theory. In general, the only cause I've seen for this sort of behavior is lots and lots of surfaces. For example, Google Maps (the WebGL version) is very prone to this (although it also uses a lot of JS memory), but there are many other examples.
Flags: needinfo?(bas)
Whiteboard: [crashkill:P1]
Whiteboard: [crashkill:P1] → [crashkill:P1][MemShrink]
(In reply to Bas Schouten (:bas.schouten) from comment #1)
> I'm not sure if that is graphics memory? If it's Intel drivers I've noticed
> they sometimes shadow VRAM in regular RAM, that could contribute in theory.

Yeah I suspect that's what's happening. 4/5 of those were Intel devices.
Assuming the Intel driver connection, what should we do about it? Not use accelerated graphics for these drivers? Change the way we handle textures?

David can you provide QA with some precise combos of card/driver details so that we could try to reproduce this in the QA lab?
Flags: needinfo?(dmajor)
From comment 0 the only ones I'm reasonably certain of are:

DeviceID: 0x0412 (Intel HD 4000) DriverVersion: 10.18.10.3621, Win7SP1
DeviceID: 0x0412 (Intel HD 4000) DriverVersion: 10.18.10.3412, Win8.1
DeviceID: 0x0166 (3rd Generation Intel® HD Graphics 4000) DriverVersion: 10.18.10.3345, Win8.1

There may be other GPUs hitting this, but the HD 4000 seems like a good start. Marc, got any in the lab?
Flags: needinfo?(dmajor) → needinfo?(mschifer)
Sheesh, every report I opened today is this issue. Here are a few, including some NVIDIA:

bp-d71b2aed-f497-4527-a07f-cfe002140903 3245M 0x10de 0x0f00
bp-a7d011a7-5067-42df-a149-93e9e2140904 3224M 0x8086 0x0166
bp-5bf44180-8308-4a58-acea-b2a422140905 3460M 0x10de 0x1184
bp-23c6aa8d-6608-4605-a888-4e1452140904 3297M 0x8086 0x0a16

They're all size=16384 coming via PushNewDT. These are something like 60% of our small-OOM crashes.
Bas, any thoughts on this?

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> Assuming the intel driver connection, what should we do about it? Not use
> accelerated graphics for these drivers? Change the way we handle textures?
Flags: needinfo?(bas)
[Tracking Requested - why for this release]: This is a good fraction of the OOM|small crashes.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> Assuming the intel driver connection, what should we do about it? Not use
> accelerated graphics for these drivers? Change the way we handle textures?
> 
> David can you provide QA with some precise combos of card/driver details so
> that we could try to reproduce this in the QA lab?

Dumping RAM copies of images would be a very good start. That way non-Intel machines will be better off than they are non-accelerated, and Intel machines will at least be no worse off than they are non-accelerated. This actually shouldn't be that hard; the work just needs to be prioritized.
Flags: needinfo?(bas)
Nothing in our lab matches the specs for this; most of our lab machines are much older.
Flags: needinfo?(mschifer)
(In reply to Bas Schouten (:bas.schouten) from comment #1)
> In general the only causes I've seen on this sort of behavior is lots and
> lots of surfaces. For example Google Maps (the WebGL version) is very prone
> to this(although it also uses a lot of JS memory), but there's many other
> examples.

So it turns out I have an Intel HD 4000 on my desk (Mac Mini running Boot Camp). Could you share some of these other example websites? I'd prefer to avoid WebGL since I'm not seeing webgl-buffers in the about:memory logs.

I'll also try to dig some URLs out of the corresponding Breakpad reports.
Flags: needinfo?(bas)
(In reply to David Major [:dmajor] from comment #10)
> (In reply to Bas Schouten (:bas.schouten) from comment #1)
> > In general the only causes I've seen on this sort of behavior is lots and
> > lots of surfaces. For example Google Maps (the WebGL version) is very prone
> > to this(although it also uses a lot of JS memory), but there's many other
> > examples.
> 
> So it turns out I have an Intel HD 4000 on my desk (Mac Mini running Boot
> Camp). Could you share some of these other example websites? I'd prefer to
> avoid WebGL since I'm not seeing webgl-buffers in the about:memory logs.
> 
> I'll also try to dig some URLs out of the corresponding Breakpad reports.

Pages with big transformed elements are probably a good place to start; the CSS3D periodic table might produce some load. Other than that, just websites with big images! Although that won't have gotten worse with OMTC, it's probably the easiest way.
Flags: needinfo?(bas)
> http://congressoamericano.blogspot.com/p/fotos-do-congresso-americano-iii.html

I was about to suggest that myself. I've never seen a page with more images.
Whiteboard: [crashkill:P1][MemShrink] → [crashkill:P1][MemShrink:P1]
bas/dmajor - This bug looks to have stalled. What's the next step here and can one of you take this bug?
Flags: needinfo?(dmajor)
Flags: needinfo?(bas)
I've been looking into this, slowly, in between other things. The links in this bug have given me somewhat bad memory usage (on the order of 1200MB vsize for 350MB explicit) but I haven't been able to climb into the horrific range (3GB+) even with multiple windows. Bas, I can look into an xperf trace of these VirtualAllocs if you think it would help, but I don't know if I'm actually reproducing the same conditions as in the crash reports.
Flags: needinfo?(dmajor)
Depends on: 1097262
This is not going to be fixed in 34.
Depends on: 1085823
I don't really see anything specifically actionable or reproducible for us here to work on, sadly.
Flags: needinfo?(bas)
Wontfixing and untracking for future releases based on comment 17.
We recently made "total write combine size" searchable in crash-stats, and the results are pretty alarming.

OOM reports with >2GB WC as a percentage of all OOM reports:
6% on Release 34
2% on Beta 35
28% on Aurora 36
18% on Nightly 37 (but few data points)

So write-combine memory usage is a real problem on all channels, but v36 looks especially bad.
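
For reference, each percentage above is the share of OOM reports whose total write-combine size exceeds 2GB. A minimal sketch of that calculation follows, assuming the per-report sizes have already been exported as a list of byte counts; the function name and the sample values are made up for illustration and are not crash-stats data.

```python
TWO_GB = 2 * 1024**3


def percent_over_threshold(wc_sizes, threshold=TWO_GB):
    """Return the percentage of reports whose write-combine size exceeds threshold.

    `wc_sizes` is a hypothetical list of per-report write-combine totals in bytes.
    """
    if not wc_sizes:
        return 0.0
    over = sum(1 for size in wc_sizes if size > threshold)
    return 100.0 * over / len(wc_sizes)


# Made-up sample: two of the four reports exceed 2GB.
MB = 1024**2
sample_sizes = [3000 * MB, 500 * MB, 100 * MB, 2500 * MB]
print(f"{percent_over_threshold(sample_sizes):.0f}% of reports have >2GB WC")
```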
In bug 985193 comment 13 I was able to reproduce one scenario with huge WC memory.
Depends on: 985193
Depends on: 1123465
Depends on: 1127925
dmajor says: "that's done. gold star to mattwoodrow."
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED