Closed Bug 1962738 Opened 26 days ago Closed 11 days ago

WebGPU demos on DX12 quickly run into OOM errors

Categories

(Core :: Graphics: WebGPU, defect, P1)

Tracking

Status: RESOLVED FIXED
Target Milestone: 140 Branch

Tracking Status
firefox140 --- fixed
People

(Reporter: ErichDonGubler, Assigned: teoxoy)

References

Details

Attachments

(1 file)

No description provided.
See Also: → 1962706
Blocks: 1952428
Blocks: 1962752
Blocks: 1911255
Blocks: 1962753
No longer blocks: 1911255
Blocks: 1962806
Blocks: 1962844
Assignee: nobody → ttanasoaia
Status: NEW → ASSIGNED
Priority: -- → P1
Blocks: 1888749
Blocks: 1927126

As I mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1962753#c4, all the examples hitting this sudden OOM were already hitting it before the wgpu update that brought in memory pressure detection; what that change made somewhat worse is that we now lose the device on such unexpected OOMs (i.e. OOMs outside of createBuffer/createTexture, etc.).

The OOM is triggered here. wgpu users can't (easily) run into this because wgpu's default descriptor heap size is 1M descriptors, whereas Firefox's is 10k.

Some tests:

The descriptor heap size was lowered to 10k in Bug 1852723 to hopefully reduce the number of OOMs in CI that were caused by lots of devices being created by the CTS.

From my tests, the memory usage of a descriptor heap is nr_of_descriptors * 64 bytes, i.e. 1M descriptors take 64MB.
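The arithmetic above can be sketched as follows; this is a back-of-envelope check, not wgpu code, and the 64-bytes-per-descriptor figure is the measurement quoted in the comment above, not a spec-guaranteed value:

```rust
// Descriptor heap memory cost, per the measurement above:
// each descriptor costs roughly 64 bytes of memory.
const DESCRIPTOR_SIZE_BYTES: u64 = 64;

fn heap_size_bytes(num_descriptors: u64) -> u64 {
    num_descriptors * DESCRIPTOR_SIZE_BYTES
}

fn main() {
    // 1M descriptors -> 64 MB, matching the figure above.
    assert_eq!(heap_size_bytes(1_000_000), 64_000_000);
    // Firefox's previous 10k limit costs only 640 KB per device.
    assert_eq!(heap_size_bytes(10_000), 640_000);
    println!("ok");
}
```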

We might be able to get away with not increasing the limit at all or only increasing it to 100k by copying the descriptors into the descriptor heap at a later stage than at bind group creation. I will look into doing this.

See Also: → 1963161

I wonder what Chrome does to manage its descriptor heap size? 🤔

I also wonder whether tactics like our test splitting wouldn't be a better fix for the OOMs, combined with a higher descriptor limit? It seems like a priority conflict to make our defaults fit CI's limitations, but not applications already authored and in the wild.

(In reply to Erich Gubler [:ErichDonGubler] from comment #2)

> I wonder what Chrome does to manage its descriptor heap size? 🤔

When they run out of slots in the descriptor heap, they throw it away and allocate a bigger one. Every time they use a bind group, they have to check whether its underlying descriptor heap still exists and, if not, allocate its descriptors again in the current heap.
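The grow-and-recheck scheme described above can be sketched roughly like this. This is an illustrative model, not Chrome/Dawn or wgpu code; all type and function names (`DescriptorHeap`, `Allocator`, `ensure_resident`, etc.) are hypothetical:

```rust
use std::cell::Cell;
use std::rc::Rc;

// A slot-counting stand-in for a D3D12 shader-visible descriptor heap.
struct DescriptorHeap {
    capacity: usize,
    used: Cell<usize>,
}

// Owns the *current* heap; heaps that were "thrown away" are dropped
// once no bind group holds a reference to them anymore.
struct Allocator {
    heap: Rc<DescriptorHeap>,
}

// Remembers which heap a bind group's descriptors were written into.
struct BindGroup {
    heap: Rc<DescriptorHeap>,
    offset: usize,
    num_descriptors: usize,
}

impl Allocator {
    fn new(capacity: usize) -> Self {
        Self { heap: Rc::new(DescriptorHeap { capacity, used: Cell::new(0) }) }
    }

    /// Reserve `n` slots; if the heap is full, discard it and allocate
    /// a bigger one (doubling, in this sketch).
    fn allocate(&mut self, n: usize) -> (Rc<DescriptorHeap>, usize) {
        if self.heap.used.get() + n > self.heap.capacity {
            let cap = (self.heap.capacity * 2).max(n);
            self.heap = Rc::new(DescriptorHeap { capacity: cap, used: Cell::new(0) });
        }
        let offset = self.heap.used.get();
        self.heap.used.set(offset + n);
        (Rc::clone(&self.heap), offset)
    }

    /// Called every time a bind group is used: if its heap has been
    /// replaced, re-copy its descriptors into the current heap.
    fn ensure_resident(&mut self, bg: &mut BindGroup) {
        if !Rc::ptr_eq(&bg.heap, &self.heap) {
            let (heap, offset) = self.allocate(bg.num_descriptors);
            bg.heap = heap;
            bg.offset = offset;
        }
    }
}

fn main() {
    let mut alloc = Allocator::new(4);
    let (heap, offset) = alloc.allocate(3);
    let mut bg = BindGroup { heap, offset, num_descriptors: 3 };
    assert_eq!(bg.offset, 0);

    // This allocation overflows the 4-slot heap, so it is replaced by an
    // 8-slot one; bg's descriptors now live in a discarded heap.
    let _ = alloc.allocate(3);
    assert!(!Rc::ptr_eq(&bg.heap, &alloc.heap));

    // Using bg re-copies its descriptors into the current heap.
    alloc.ensure_resident(&mut bg);
    assert!(Rc::ptr_eq(&bg.heap, &alloc.heap));
    println!("ok");
}
```

The per-use residency check is the cost Chrome pays for never hitting a hard heap-size ceiling.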

> I also wonder whether tactics like our test splitting wouldn't be a better fix for the OOMs, combined with a higher descriptor limit? It seems like a priority conflict to make our defaults fit CI's limitations, but not applications already authored and in the wild.

We also want to keep the per-device overhead in check in general (not just for CI). There is room to increase the limit, but I suspect the real issue (beyond a couple of demos that allocate an unreasonable number of bind groups) is that we don't garbage collect when we get close to running out of bindings. If the JS runtime doesn't think a GC is useful, it won't happen; it doesn't know about the memory cost of buffers/textures or the limits of the descriptor heaps on Windows.

Even if we increase the heap size to 1M, which is the maximum on resource binding tiers 1 & 2, users of the WebGPU API will still be able to hit it if the number of views and buffers across all their bind groups exceeds 1M.

I think the problem in this case is that this is a backend limit that the WebGPU spec doesn't cover (I guess because we never brought it up, and the Chrome team is doing something else: they have a more complex system with multiple heaps that they swap between).

Resolving this is more complicated than I thought.

Ideally we should set descriptor heaps via SetDescriptorHeaps once per command list (the d3d12 command buffer). To do this we should move encoding to CommandEncoder.finish(), so that we have all the knowledge required to either reuse the last used heap (if it's big enough) or create a new one. Each d3d12 command buffer would then hold an Arc to the descriptor heap it used.

Even with this model it's still possible to hit the 1M limit; we could address that by splitting the work into multiple command buffers, but that can be done separately if necessary.
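The finish()-time heap selection proposed above could be sketched like this. A minimal model only, assuming the descriptor count is known once encoding is deferred to finish(); the `Device`/`CommandBuffer` shapes and the power-of-two growth policy are illustrative, not wgpu's actual types:

```rust
use std::sync::Arc;

// Stand-in for a shader-visible descriptor heap of a fixed slot count.
struct DescriptorHeap {
    capacity: usize,
}

struct Device {
    // The most recently created heap, reused when it is big enough.
    last_heap: Arc<DescriptorHeap>,
}

struct CommandBuffer {
    // Keeps the heap alive for as long as the GPU may reference it.
    heap: Arc<DescriptorHeap>,
}

impl Device {
    /// Called from CommandEncoder::finish(), once the total number of
    /// descriptors used by the recorded commands is known.
    fn finish(&mut self, descriptors_needed: usize) -> CommandBuffer {
        if descriptors_needed > self.last_heap.capacity {
            // Last heap is too small: create a bigger one and remember it.
            self.last_heap = Arc::new(DescriptorHeap {
                capacity: descriptors_needed.next_power_of_two(),
            });
        }
        // SetDescriptorHeaps would be recorded once here, then the
        // command buffer holds an Arc to the heap it used.
        CommandBuffer { heap: Arc::clone(&self.last_heap) }
    }
}

fn main() {
    let mut dev = Device { last_heap: Arc::new(DescriptorHeap { capacity: 1024 }) };
    let cb1 = dev.finish(500);  // fits: reuses the 1024-slot heap
    let cb2 = dev.finish(3000); // too big: a 4096-slot heap is created
    assert!(!Arc::ptr_eq(&cb1.heap, &dev.last_heap));
    assert!(Arc::ptr_eq(&cb2.heap, &dev.last_heap));
    assert_eq!(dev.last_heap.capacity, 4096);
    println!("ok");
}
```

Note how cb1 keeps the old heap alive via its Arc even after the device has moved on to a bigger one, which is exactly why each d3d12 command buffer needs to hold that reference.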

Moving encoding to CommandEncoder.finish() is quite a large change, so for now I will put up a patch increasing the limit to 500k, which I've tested to be enough for all the examples.

I also did a try push to test if increasing the limit to 1M caused any issues in the CTS and all looked good.

Pushed by ttanasoaia@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a0582edf9309 increase descriptor heap size to 500k r=webgpu-reviewers,nical

I opened https://github.com/gfx-rs/wgpu/issues/7680 to track further work.

Status: ASSIGNED → RESOLVED
Closed: 11 days ago
Resolution: --- → FIXED
Target Milestone: --- → 140 Branch
