Implement `GPUDevice.destroy`
Categories
(Core :: Graphics: WebGPU, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox120 | --- | fixed |
People
(Reporter: jimb, Assigned: bradwerth)
References
(Blocks 2 open bugs)
Details
Attachments
(5 files, 2 obsolete files)
At the moment, mozilla::webgpu::Device::Destroy
is a placeholder.
This causes WebGPU CTS tests like webgpu:api,operation,adapter,requestDevice:invalid:
to hang, because it says:
device.destroy();
const lostInfo = await device.lost;
t.expect(lostInfo.reason === 'destroyed');
Reporter | ||
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Comment 1•1 year ago
|
||
Depends on D183053
Comment 2•1 year ago
|
||
WARNING: There's a few things that @erichdongubler has noticed that may
impact the landability of this patch:
- The condition this patch is "fixing" is actually already covered by
Buffer.cpp
's implementation ofBuffer::Unmap
, which prevents the
"device timeline" (i.e., the GPU process andwgpu_bindings
) from
ever receiving a device unmap request when the device is known to be
lost in the content process. - The test added by this patch is not testable without an implementation
ofGPUDevice::destroy
. Right now, we stub it out, and it needs to be
working to actually achieve coverage.
Updated•1 year ago
|
Comment 3•1 year ago
|
||
I've saved off some WIP patches for this, but I don't consider myself to be actively working on this. Unassigning.
Updated•1 year ago
|
Updated•11 months ago
|
Comment 4•11 months ago
|
||
Method to implement: https://gpuweb.github.io/gpuweb/#dom-gpudevice-destroy
Updated•11 months ago
|
Updated•10 months ago
|
Assignee | ||
Comment 5•10 months ago
|
||
From reading the spec for https://gpuweb.github.io/gpuweb/#dom-gpudevice-destroy, it seems like there are some dependencies that we'll need to implement to accomplish this.
- Implement https://gpuweb.github.io/gpuweb/#lose-the-device in wgpu-core.
- Implement the concept of Device validity in wgpu-core.
- Cleanup the semantics of resolving the GPUDevice.lost Promise, which should be triggered from wgpu-core during "lose the device", not from Device::Cleanup() which is invoked by the cycle collector.
I'll take this Bug for now and try to address these dependencies. Not sure if I'll file more issues here or in wgpu.
Comment 6•10 months ago
|
||
A few handwavey notes about the wgpu internals:
- All objects of the API are held in a hub of global registries: https://github.com/gfx-rs/wgpu/blob/012304ea111a06b574fcd7863946acef917581f8/wgpu-core/src/hub.rs#L451
- If you look at the underlying
Storage
type, it already has something to represent when an object exists but is not in a usable state: https://github.com/gfx-rs/wgpu/blob/012304ea111a06b574fcd7863946acef917581f8/wgpu-core/src/storage.rs#L21 - That means code that access the objects typically needs to handle the object being in an unusable state, for example: https://github.com/gfx-rs/wgpu/blob/012304ea111a06b574fcd7863946acef917581f8/wgpu-core/src/device/global.rs#L147
- Before deleting the actual device (for example the vulkan one), all associated resources must have been deleted first.
- Because there might be a frame in flight while the GPU is using the device, we can't actually destroy the device as soon as we receive the device.destroy API call. There is a mechanism for deferring deletion of texture, buffers, etc. to a safe point in the future where we know that the objects are not being used by the gpu. I don't know if it is already used for the device or if we need to plumb it in. Jim is a good person to ask about the details of the async deletion mechanisms.
- My intuition is that much of the work for supporting device destroy (triggered by JS or ourselves under some low memory conditions) and device lost (triggered by the driver at any time) is the same, but if it makes it simpler, it's definitely reasonable to split it in two projects and focus on device destroy first.
Assignee | ||
Comment 7•10 months ago
|
||
(In reply to Nicolas Silva [:nical] from comment #6)
Thank you for this detailed write-up. WIP at https://github.com/bradwerth/wgpu/pull/new/deviceValid
- If you look at the underlying
Storage
type, it already has something to represent when an object exists but is not in a usable state: https://github.com/gfx-rs/wgpu/blob/012304ea111a06b574fcd7863946acef917581f8/wgpu-core/src/storage.rs#L21
I didn't fully internalize it when I started my patch so I didn't follow your advice here. I will probably change all of this to store the Error
, and do a match device_guard.get
instead of the proposed is_valid
function.
- That means code that access the objects typically needs to handle the object being in an unusable state, for example: https://github.com/gfx-rs/wgpu/blob/012304ea111a06b574fcd7863946acef917581f8/wgpu-core/src/device/global.rs#L147
These safety checks appear to largely be missing at the moment. If nothing else, the WIP code highlights all the callsites I could find that will need to check the device before using it.
Assignee | ||
Comment 8•10 months ago
|
||
It does seem possible that we might need to get a Device
from storage and operate on it even when it is invalid, which would make storing the Error
object inadvisable. I'll see if there are any cases that require operations on an invalid Device
.
Assignee | ||
Comment 9•10 months ago
|
||
(In reply to Brad Werth [:bradwerth] from comment #8)
It does seem possible that we might need to get a
Device
from storage and operate on it even when it is invalid, which would make storing theError
object inadvisable. I'll see if there are any cases that require operations on an invalidDevice
.
Well, the major one is that poll_devices
needs to be able to iterate the devices in order to wait on their queue completions and eventually drop the device. Representing invalid by storing an Error
makes that challenging since the Error
would have to also store an Opt<T>
for cases where we're using it to represent invalid. I don't think it's worth changing to this scheme.
Assignee | ||
Comment 10•10 months ago
|
||
This is dependent on the unmerged wgpu changes at
https://github.com/gfx-rs/wgpu/commit/0d4825c6b75563e5b2b1fb3a8347eff1a9a6ec22
This creates a WebGPUParent::LoseDevice entry point, which both tells
wgpu to lose the device, and informs the WebGPUChild that the device has
been lost. The child side is largely a stub, as a later part will
rationalize the handling of the device.lost promise in the child.
This also invokes LoseDevice from RecvAdapterRequestDevice on error,
fulfilling one of the triggers in the spec for "lose the device".
Comment 11•9 months ago
|
||
:bradwerth: Are the patches I have up still useful? Seeing if I can clean out some of my own patches in review here.
Updated•9 months ago
|
Assignee | ||
Comment 12•9 months ago
|
||
(In reply to Erich Gubler [:ErichDonGubler] from comment #11)
:bradwerth: Are the patches I have up still useful? Seeing if I can clean out some of my own patches in review here.
Yes, they are probably obsolete, though later parts in this patch stack may need to do some similar things. And possibly the test you supplied will "just work" when all this is done. Go ahead and obsolete them and I'll cherry-pick the test later.
Updated•9 months ago
|
Updated•9 months ago
|
Updated•9 months ago
|
Assignee | ||
Comment 13•9 months ago
|
||
This ensures that both internal and external triggers of "lose the
device" resolve the lost promise using the same code path.
Depends on D188388
Assignee | ||
Comment 14•9 months ago
|
||
This treats destroy as a 2-step process: wait on the destroy, then drop.
Depends on D190129
Updated•9 months ago
|
Updated•9 months ago
|
Assignee | ||
Comment 15•9 months ago
|
||
Depends on D190238
Assignee | ||
Comment 16•9 months ago
|
||
This tracks DeviceIds that have been sent DeviceLost in a hashmap that
is pruned when the Device is dropped. This reduces IPC traffic with the
expected use case of many calls returning after a device loss.
Depends on D190238
Updated•9 months ago
|
Comment 17•9 months ago
|
||
Pushed by bwerth@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/843c7d19787b Part 1: Stub in 'lose the device' and trigger it on device lost errors. r=webgpu-reviewers,nical https://hg.mozilla.org/integration/autoland/rev/f214290956c1 Part 2: Rationalize GPUDevice lost promise handling. r=webgpu-reviewers,nical https://hg.mozilla.org/integration/autoland/rev/41d7ed93df76 Part 3: Make GPUDevice::Destroy() trigger wgpu device_destroy, then device_drop. r=webgpu-reviewers,nical https://hg.mozilla.org/integration/autoland/rev/b8733146f070 Part 4: Ensure that each Device is sent DeviceLost only once. r=webgpu-reviewers,nical https://hg.mozilla.org/integration/autoland/rev/9abfbd2f5892 Part 5: Update tests and test expectations. r=webgpu-reviewers,nical
![]() |
||
Comment 18•9 months ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/843c7d19787b
https://hg.mozilla.org/mozilla-central/rev/f214290956c1
https://hg.mozilla.org/mozilla-central/rev/41d7ed93df76
https://hg.mozilla.org/mozilla-central/rev/b8733146f070
https://hg.mozilla.org/mozilla-central/rev/9abfbd2f5892
Description
•