Closed Bug 1838693 Opened 1 year ago Closed 9 months ago

Implement `GPUDevice.destroy`

Categories

(Core :: Graphics: WebGPU, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
120 Branch
Tracking Status
firefox120 --- fixed

People

(Reporter: jimb, Assigned: bradwerth)

References

(Blocks 2 open bugs)

Details

Attachments

(5 files, 2 obsolete files)

At the moment, mozilla::webgpu::Device::Destroy is a placeholder.

This causes WebGPU CTS tests like webgpu:api,operation,adapter,requestDevice:invalid: to hang, because it says:

      device.destroy();
      const lostInfo = await device.lost;
      t.expect(lostInfo.reason === 'destroyed');
Severity: -- → S3
Type: task → enhancement
Priority: -- → P3
Priority: P3 → P2
Priority: P2 → P1
Summary: WebGPU GPUDevice.destroy not implemented, causing webgpu:api,operation,adapter,requestDevice:invalid to hang → Implement WebGPU `GPUDevice.destroy`
Summary: Implement WebGPU `GPUDevice.destroy` → Implement `GPUDevice.destroy`

WARNING: There's a few things that @erichdongubler has noticed that may
impact the landability of this patch:

  • The condition this patch is "fixing" is actually already covered by
    Buffer.cpp's implementation of Buffer::Unmap, which prevents the
    "device timeline" (i.e., the GPU process and wgpu_bindings) from
    ever receiving a device unmap request when the device is known to be
    lost in the content process.
  • The test added by this patch is not testable without an implementation
    of GPUDevice::destroy. Right now, we stub it out, and it needs to be
    working to actually achieve coverage.
Assignee: nobody → egubler
Status: NEW → ASSIGNED

I've saved off some WIP patches for this, but I don't consider myself to be actively working on this. Unassigning.

Assignee: egubler → nobody
Status: ASSIGNED → NEW
Blocks: webgpu-v1
No longer blocks: webgpu-v1-cts-blockers
Assignee: nobody → jimb
Status: NEW → ASSIGNED
Priority: P1 → P2
Priority: P2 → P1

From reading the spec for https://gpuweb.github.io/gpuweb/#dom-gpudevice-destroy, it seems like there are some dependencies that we'll need to implement to accomplish this.

  1. Implement https://gpuweb.github.io/gpuweb/#lose-the-device in wgpu-core.
  2. Implement the concept of Device validity in wgpu-core.
  3. Cleanup the semantics of resolving the GPUDevice.lost Promise, which should be triggered from wgpu-core during "lose the device", not from Device::Cleanup() which is invoked by the cycle collector.

I'll take this Bug for now and try to address these dependencies. Not sure if I'll file more issues here or in wgpu.

Assignee: jimb → bwerth

A few handwavey notes about the wgpu internals:

(In reply to Nicolas Silva [:nical] from comment #6)

Thank you for this detailed write-up. WIP at https://github.com/bradwerth/wgpu/pull/new/deviceValid

I didn't fully internalize it when I started my patch so I didn't follow your advice here. I will probably change all of this to store the Error, and do a match device_guard.get instead of the proposed is_valid function.

These safety checks appear to largely be missing at the moment. If nothing else, the WIP code highlights all the callsites I could find that will need to check the device before using it.

It does seem possible that we might need to get a Device from storage and operate on it even when it is invalid, which would make storing the Error object inadvisable. I'll see if there are any cases that require operations on an invalid Device.

(In reply to Brad Werth [:bradwerth] from comment #8)

It does seem possible that we might need to get a Device from storage and operate on it even when it is invalid, which would make storing the Error object inadvisable. I'll see if there are any cases that require operations on an invalid Device.

Well, the major one is that poll_devices needs to be able to iterate the devices in order to wait on their queue completions and eventually drop the device. Representing invalid by storing an Error makes that challenging since the Error would have to also store an Opt<T> for cases where we're using it to represent invalid. I don't think it's worth changing to this scheme.

This is dependent on the unmerged wgpu changes at
https://github.com/gfx-rs/wgpu/commit/0d4825c6b75563e5b2b1fb3a8347eff1a9a6ec22

This creates a WebGPUParent::LoseDevice entry point, which both tells
wgpu to lose the device, and informs the WebGPUChild that the device has
been lost. The child side is largely a stub, as a later part will
rationalize the handling of the device.lost promise in the child.

This also invokes LoseDevice from RecvAdapterRequestDevice on error,
fulfilling one of the triggers in the spec for "lose the device".

:bradwerth: Are the patches I have up still useful? Seeing if I can clean out some of my own patches in review here.

Flags: needinfo?(bwerth)

(In reply to Erich Gubler [:ErichDonGubler] from comment #11)

:bradwerth: Are the patches I have up still useful? Seeing if I can clean out some of my own patches in review here.

Yes, they are probably obsolete, though later parts in this patch stack may need to do some similar things. And possibly the test you supplied will "just work" when all this is done. Go ahead and obsolete them and I'll cherry-pick the test later.

Flags: needinfo?(bwerth)
Attachment #9343926 - Attachment is obsolete: true
Attachment #9343927 - Attachment is obsolete: true
Attachment #9353488 - Attachment description: WIP: Bug 1838693 Part 1: Stub in 'lose the device' and device.valid. → Bug 1838693 Part 1: Stub in 'lose the device' and device.valid.

This ensures that both internal and external triggers of "lose the
device" resolve the lost promise using the same code path.

Depends on D188388

This treats destroy as a 2-step process: wait on the destroy, then drop.

Depends on D190129

Attachment #9356945 - Attachment description: Bug 1838693 Part 3: Make GPUDevice.Destroy() trigger wgpu device_destroy, then drop the device. → Bug 1838693 Part 3: Make GPUDevice::Destroy() trigger wgpu device_destroy, then device_drop.
Attachment #9353488 - Attachment description: Bug 1838693 Part 1: Stub in 'lose the device' and device.valid. → Bug 1838693 Part 1: Stub in 'lose the device' and trigger it on device lost errors.

This tracks DeviceIds that have been sent DeviceLost in a hashmap that
is pruned when the Device is dropped. This reduces IPC traffic with the
expected use case of many calls returning after a device loss.

Depends on D190238

Attachment #9358111 - Attachment description: Bug 1838693 Part 4: Update tests and test expectations. → Bug 1838693 Part 5: Update tests and test expectations.
Pushed by bwerth@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/843c7d19787b
Part 1: Stub in 'lose the device' and trigger it on device lost errors. r=webgpu-reviewers,nical
https://hg.mozilla.org/integration/autoland/rev/f214290956c1
Part 2: Rationalize GPUDevice lost promise handling. r=webgpu-reviewers,nical
https://hg.mozilla.org/integration/autoland/rev/41d7ed93df76
Part 3: Make GPUDevice::Destroy() trigger wgpu device_destroy, then device_drop. r=webgpu-reviewers,nical
https://hg.mozilla.org/integration/autoland/rev/b8733146f070
Part 4: Ensure that each Device is sent DeviceLost only once. r=webgpu-reviewers,nical
https://hg.mozilla.org/integration/autoland/rev/9abfbd2f5892
Part 5: Update tests and test expectations. r=webgpu-reviewers,nical
Regressions: 1859974
Regressions: 1859825
Regressions: 1860113
Regressions: 1861751
Regressions: 1860577
Regressions: 1860570
Regressions: 1862499
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: