WebGPU demos on DX12 quickly run into OOM errors
Categories: Core :: Graphics: WebGPU, defect, P1

| | Tracking | Status |
|---|---|---|
| firefox140 | --- | fixed |

People: Reporter: ErichDonGubler, Assigned: teoxoy
Attachments: 1 file
Comment 1•21 days ago (Assignee)
As I mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1962753#c4, all examples that are running into this sudden OOM were already running into it prior to the wgpu update that brought in the memory pressure detection; the change that made it somewhat worse is that we now lose the device on such unexpected OOMs (i.e. OOMs outside of `createBuffer`/`createTexture`, etc.).
The OOM is triggered here. wgpu users can't (easily) run into this because the default descriptor heap size is 1M, whereas Firefox's is 10k.
Some tests:
- Increasing the descriptor heap size to 100k:
  - https://www.babylonjs.com/Demos/WebGPU/forestWebGPU.html works now and runs at almost 2x the FPS compared to Chrome
  - https://webgpu.github.io/webgpu-samples/sample/animometer works, but increasing `numTriangles` can still trigger the OOM
  - https://webgpu.github.io/webgpu-samples/sample/renderBundles works
  - https://www.crazygames.com/game/project-prismatic works but is not really playable, as it runs 4-5x slower compared to Chrome
  - the other examples still run into the OOM, but it takes longer for them to do so:
    - https://linebender.org/velato
    - https://linebender.org/vello (use the arrow keys to switch to other scenes)
    - https://webgpufundamentals.org/webgpu/webgpu-scene-graphs-hand.html (click on the animate checkbox)
    - https://fluid.loga.nz (click and drag the mouse around)
    - https://ulucode.com/random/webgputests/linked
- Increasing the descriptor heap size to 1M:
  - All examples work now, but the most demanding of them (e.g. https://fluid.loga.nz and https://ulucode.com/random/webgputests/linked) pause for a few frames every X seconds (probably due to GC; related: Bug 1963161)
The descriptor heap size was lowered to 10k in Bug 1852723 to hopefully reduce the number of OOMs in CI that were caused by lots of devices being created by the CTS.
From my tests, the memory usage of a descriptor heap is `nr_of_descriptors * 64` bytes, i.e. 1M descriptors take 64 MB.
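As a quick sanity check of those numbers, here is a minimal sketch (Rust, since wgpu is Rust) of what the heap sizes under discussion would cost; the 64 bytes/descriptor constant is the figure measured above, not a spec guarantee:

```rust
// Rough cost of a shader-visible descriptor heap, using the ~64 bytes per
// descriptor figure measured above (the real size is driver-dependent).
const BYTES_PER_DESCRIPTOR: u64 = 64;

fn heap_size_bytes(num_descriptors: u64) -> u64 {
    num_descriptors * BYTES_PER_DESCRIPTOR
}

fn main() {
    // The sizes discussed in this bug: current default, the two test values,
    // the proposed patch value, and the D3D12 tier 1/2 maximum.
    for n in [10_000u64, 100_000, 500_000, 1_000_000] {
        println!(
            "{:>9} descriptors ~= {:.1} MB",
            n,
            heap_size_bytes(n) as f64 / 1_000_000.0
        );
    }
}
```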
We might be able to get away with not increasing the limit at all or only increasing it to 100k by copying the descriptors into the descriptor heap at a later stage than at bind group creation. I will look into doing this.
Comment 2•21 days ago (Reporter)
I wonder what Chrome does to manage its descriptor heap size? 🤔
Comment 3•21 days ago (Reporter)
I also wonder if tactics like our test splitting wouldn't be a better fix for the OOMs with a higher descriptor limit? It seems like a priority conflict to make our defaults fit CI's limitations, but not applications already authored and in the wild.
Comment 4•21 days ago
(In reply to Erich Gubler [:ErichDonGubler] from comment #2)
> I wonder what Chrome does to manage its descriptor heap size? 🤔
When they run out of slots in the descriptor heap, they throw it away and allocate a bigger one. Every time they use a bind group, they have to check that the underlying descriptor heap still exists and, if not, allocate it again in the current heap.
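A minimal sketch of that strategy, as I understand it; the types and names below are illustrative, not Chrome's (or wgpu's) actual code:

```rust
// Hypothetical types illustrating "grow the heap and re-validate on bind".
struct DescriptorHeap {
    generation: u64,
    capacity: u32,
    next_free: u32,
}

struct BindGroup {
    descriptor_count: u32,
    // Which heap generation this bind group's descriptors were last copied
    // into, and at what offset within that heap.
    heap_generation: u64,
    base_offset: u32,
}

struct Device {
    heap: DescriptorHeap,
}

impl Device {
    /// Called every time a bind group is set on a command list.
    fn ensure_resident(&mut self, bg: &mut BindGroup) {
        if bg.heap_generation == self.heap.generation {
            return; // Descriptors are still in the live heap; nothing to do.
        }
        if self.heap.next_free + bg.descriptor_count > self.heap.capacity {
            // Out of slots: throw the heap away and allocate a bigger one.
            // Other bind groups get re-copied lazily the next time they are used.
            self.heap = DescriptorHeap {
                generation: self.heap.generation + 1,
                capacity: (self.heap.capacity * 2).max(bg.descriptor_count),
                next_free: 0,
            };
        }
        // Copy this bind group's descriptors into the current heap
        // (CopyDescriptorsSimple in real D3D12 code).
        bg.base_offset = self.heap.next_free;
        bg.heap_generation = self.heap.generation;
        self.heap.next_free += bg.descriptor_count;
        // ...then bind the table at bg.base_offset...
    }
}

fn main() {
    let mut device = Device {
        heap: DescriptorHeap { generation: 1, capacity: 4, next_free: 0 },
    };
    let mut bg = BindGroup { descriptor_count: 8, heap_generation: 0, base_offset: 0 };
    device.ensure_resident(&mut bg); // forces a grow, then copies the descriptors
    assert_eq!(bg.heap_generation, device.heap.generation);
}
```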
(In reply to Erich Gubler [:ErichDonGubler] from comment #3)
> I also wonder if tactics like our test splitting wouldn't be a better fix for the OOMs with a higher descriptor limit? It seems like a priority conflict to make our defaults fit CI's limitations, but not applications already authored and in the wild.
We also want to keep the per-device overhead in check in general (not just for CI). There is room for increasing the limit, but I suspect that the real issue (beyond a couple of demos that allocate an unreasonable number of bind groups) is that we don't garbage collect when we get close to running out of bindings. If the JS runtime does not think that a GC is useful, it won't happen, but it doesn't know about the memory cost of buffers/textures or the limits of the descriptor heaps on Windows.
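A minimal sketch of the kind of pressure hook being described here; the names are entirely hypothetical, and a real integration would go through the JS engine's own memory-pressure/GC machinery:

```rust
// Hypothetical hook, not actual Firefox/wgpu code: the WebGPU implementation
// reports when descriptor slots are close to exhausted, so the embedder can
// ask the JS engine for a GC it would otherwise not consider worthwhile.
struct DescriptorHeapStats {
    used: u32,
    capacity: u32,
}

fn maybe_request_gc(stats: &DescriptorHeapStats, request_gc: impl Fn()) {
    // Assumed threshold: above 80% occupancy, hint that collecting
    // unreachable bind groups/views would free descriptor slots.
    if stats.used as f64 / stats.capacity as f64 > 0.8 {
        request_gc();
    }
}

fn main() {
    let stats = DescriptorHeapStats { used: 9_500, capacity: 10_000 };
    maybe_request_gc(&stats, || println!("requesting a GC cycle"));
}
```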
Comment 5•21 days ago (Assignee)
Even if we increase the heap size to 1M, which is the max on tier 1 & 2, users of the WebGPU API will still be able to hit it if the number of views and buffers across all their bind groups exceeds 1M.
I think the problem in this case is that this is a backend limit that the WebGPU spec doesn't cover (I guess because we never brought it up and the Chrome team is doing something else; they have a more complex system with multiple heaps that they swap).
Comment 6•12 days ago (Assignee)
Resolving this is more complicated than I thought.
Ideally we should set descriptor heaps via `SetDescriptorHeaps` once per command list (d3d12 command buffer). To do this we should move encoding to `CommandEncoder.finish()`, so that we have all the knowledge required to either reuse the last used heap (if it's big enough) or create a new one. Each d3d12 command buffer will then hold an `Arc` to the descriptor heap it used.
Even with this model it's still possible to hit the 1M limit; we could address that by splitting the work into multiple command buffers, but this can be done separately if necessary.
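A minimal sketch of that model, using hypothetical types rather than actual wgpu internals: the encoder defers heap selection to `finish()`, where the total descriptor count is known, and the finished command buffer keeps its heap alive via an `Arc`:

```rust
use std::sync::Arc;

// Hypothetical types sketching the model above (not actual wgpu code).
struct DescriptorHeap {
    capacity: u32,
}

struct Device {
    current_heap: Arc<DescriptorHeap>,
}

struct CommandBuffer {
    // Keeps the heap that SetDescriptorHeaps was called with alive for as
    // long as this command buffer can still be submitted and executed.
    _heap: Arc<DescriptorHeap>,
}

impl Device {
    // With encoding deferred to finish(), the total number of descriptors
    // the command list needs is known up front.
    fn finish(&mut self, descriptors_needed: u32) -> CommandBuffer {
        if descriptors_needed > self.current_heap.capacity {
            // Too small: allocate a bigger heap. Older command buffers still
            // hold an Arc to whichever heap they were encoded against.
            self.current_heap = Arc::new(DescriptorHeap {
                capacity: descriptors_needed.next_power_of_two(),
            });
        }
        // ...call SetDescriptorHeaps once for this command list, copy all
        // bind group descriptors in, then encode the recorded commands...
        CommandBuffer {
            _heap: Arc::clone(&self.current_heap),
        }
    }
}

fn main() {
    let mut device = Device {
        current_heap: Arc::new(DescriptorHeap { capacity: 10_000 }),
    };
    let _cb = device.finish(500_000); // forces a larger heap to be created
    assert!(device.current_heap.capacity >= 500_000);
}
```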
Moving encoding to `CommandEncoder.finish()` is quite a large change, so for now I will put up a patch to increase the limit to 500k, since I've tested that this is enough for all the examples.
I also did a try push to test if increasing the limit to 1M caused any issues in the CTS and all looked good.
Comment 7•12 days ago (Assignee)
Comment 9•11 days ago (Assignee)
I opened https://github.com/gfx-rs/wgpu/issues/7680 to track further work.
Comment 10•11 days ago (bugherder)