Bugzilla

Updated

•

1 year ago

Priority: P3 → P2

Updated

•

1 year ago

Blocks: 1812352

Updated

•

1 year ago

Priority: P2 → P1

Updated

•

1 year ago

Summary: WebGPU GPUDevice.destroy not implemented, causing webgpu:api,operation,adapter,requestDevice:invalid to hang → Implement WebGPU `GPUDevice.destroy`

Updated

•

1 year ago

Summary: Implement WebGPU `GPUDevice.destroy` → Implement `GPUDevice.destroy`

Comment 1

•

1 year ago

Attached file WIP: Bug 1838693: implement `GPUDevice::destroy` r?#webgpu-reviewers (obsolete) — Details

Depends on D183053

Comment 2

•

1 year ago

Attached file WIP: Bug 1838693: fix(webgpu): don't return an error for `GPUBuffer.unmap` when `this.[[device]]` is invalid r?#webgpu-reviewers (obsolete) — Details

WARNING: There's a few things that @erichdongubler has noticed that may
impact the landability of this patch:

The condition this patch is "fixing" is actually already covered by
Buffer.cpp's implementation of Buffer::Unmap, which prevents the
"device timeline" (i.e., the GPU process and wgpu_bindings) from
ever receiving a device unmap request when the device is known to be
lost in the content process.
The test added by this patch is not testable without an implementation
of GPUDevice::destroy. Right now, we stub it out, and it needs to be
working to actually achieve coverage.

Updated

•

1 year ago

Assignee: nobody → egubler

Status: NEW → ASSIGNED

Comment 3

•

1 year ago

I've saved off some WIP patches for this, but I don't consider myself to be actively working on this. Unassigning.

Updated

•

1 year ago

Assignee: egubler → nobody

Status: ASSIGNED → NEW

Updated

•

1 year ago

Blocks: 1843570

Updated

•

11 months ago

Blocks: webgpu-v1
No longer blocks: webgpu-v1-cts-blockers

Comment 4

•

11 months ago

Method to implement: https://gpuweb.github.io/gpuweb/#dom-gpudevice-destroy

Updated

•

11 months ago

Assignee: nobody → jimb

Status: NEW → ASSIGNED

Priority: P1 → P2

Updated

•

10 months ago

Priority: P2 → P1

Assignee

Comment 5

•

10 months ago

From reading the spec for https://gpuweb.github.io/gpuweb/#dom-gpudevice-destroy, it seems like there are some dependencies that we'll need to implement to accomplish this.

Implement https://gpuweb.github.io/gpuweb/#lose-the-device in wgpu-core.
Implement the concept of Device validity in wgpu-core.
Cleanup the semantics of resolving the GPUDevice.lost Promise, which should be triggered from wgpu-core during "lose the device", not from Device::Cleanup() which is invoked by the cycle collector.

I'll take this Bug for now and try to address these dependencies. Not sure if I'll file more issues here or in wgpu.

Assignee: jimb → bwerth

Nicolas Silva [:nical]

Comment 6

•

10 months ago

A few handwavey notes about the wgpu internals:

All objects of the API are held in a hub of global registries: https://github.com/gfx-rs/wgpu/blob/012304ea111a06b574fcd7863946acef917581f8/wgpu-core/src/hub.rs#L451
If you look at the underlying Storage type, it already has something to represent when an object exists but is not in a usable state: https://github.com/gfx-rs/wgpu/blob/012304ea111a06b574fcd7863946acef917581f8/wgpu-core/src/storage.rs#L21
That means code that access the objects typically needs to handle the object being in an unusable state, for example: https://github.com/gfx-rs/wgpu/blob/012304ea111a06b574fcd7863946acef917581f8/wgpu-core/src/device/global.rs#L147
Before deleting the actual device (for example the vulkan one), all associated resources must have been deleted first.
Because there might be a frame in flight while the GPU is using the device, we can't actually destroy the device as soon as we receive the device.destroy API call. There is a mechanism for deferring deletion of texture, buffers, etc. to a safe point in the future where we know that the objects are not being used by the gpu. I don't know if it is already used for the device or if we need to plumb it in. Jim is a good person to ask about the details of the async deletion mechanisms.
My intuition is that much of the work for supporting device destroy (triggered by JS or ourselves under some low memory conditions) and device lost (triggered by the driver at any time) is the same, but if it makes it simpler, it's definitely reasonable to split it in two projects and focus on device destroy first.

Assignee

Comment 7

•

10 months ago

(In reply to Nicolas Silva [:nical] from comment #6)

Thank you for this detailed write-up. WIP at https://github.com/bradwerth/wgpu/pull/new/deviceValid

If you look at the underlying Storage type, it already has something to represent when an object exists but is not in a usable state: https://github.com/gfx-rs/wgpu/blob/012304ea111a06b574fcd7863946acef917581f8/wgpu-core/src/storage.rs#L21

I didn't fully internalize it when I started my patch so I didn't follow your advice here. I will probably change all of this to store the Error, and do a match device_guard.get instead of the proposed is_valid function.

That means code that access the objects typically needs to handle the object being in an unusable state, for example: https://github.com/gfx-rs/wgpu/blob/012304ea111a06b574fcd7863946acef917581f8/wgpu-core/src/device/global.rs#L147

These safety checks appear to largely be missing at the moment. If nothing else, the WIP code highlights all the callsites I could find that will need to check the device before using it.

Assignee

Comment 8

•

10 months ago

It does seem possible that we might need to get a Device from storage and operate on it even when it is invalid, which would make storing the Error object inadvisable. I'll see if there are any cases that require operations on an invalid Device.

Assignee

Comment 9

•

10 months ago

(In reply to Brad Werth [:bradwerth] from comment #8)

It does seem possible that we might need to get a Device from storage and operate on it even when it is invalid, which would make storing the Error object inadvisable. I'll see if there are any cases that require operations on an invalid Device.

Well, the major one is that poll_devices needs to be able to iterate the devices in order to wait on their queue completions and eventually drop the device. Representing invalid by storing an Error makes that challenging since the Error would have to also store an Opt<T> for cases where we're using it to represent invalid. I don't think it's worth changing to this scheme.

Assignee

Comment 10

•

10 months ago

Attached file Bug 1838693 Part 1: Stub in 'lose the device' and trigger it on device lost errors. — Details

This is dependent on the unmerged wgpu changes at
https://github.com/gfx-rs/wgpu/commit/0d4825c6b75563e5b2b1fb3a8347eff1a9a6ec22

This creates a WebGPUParent::LoseDevice entry point, which both tells
wgpu to lose the device, and informs the WebGPUChild that the device has
been lost. The child side is largely a stub, as a later part will
rationalize the handling of the device.lost promise in the child.

This also invokes LoseDevice from RecvAdapterRequestDevice on error,
fulfilling one of the triggers in the spec for "lose the device".

Comment 11

•

9 months ago

:bradwerth: Are the patches I have up still useful? Seeing if I can clean out some of my own patches in review here.

Updated

•

9 months ago

Flags: needinfo?(bwerth)

Assignee

Comment 12

•

9 months ago

(In reply to Erich Gubler [:ErichDonGubler] from comment #11)

:bradwerth: Are the patches I have up still useful? Seeing if I can clean out some of my own patches in review here.

Yes, they are probably obsolete, though later parts in this patch stack may need to do some similar things. And possibly the test you supplied will "just work" when all this is done. Go ahead and obsolete them and I'll cherry-pick the test later.

Flags: needinfo?(bwerth)

Updated

•

9 months ago

Attachment #9343926 - Attachment is obsolete: true

Updated

•

9 months ago

Attachment #9343927 - Attachment is obsolete: true

Updated

•

9 months ago

Attachment #9353488 - Attachment description: WIP: Bug 1838693 Part 1: Stub in 'lose the device' and device.valid. → Bug 1838693 Part 1: Stub in 'lose the device' and device.valid.

Assignee

Comment 13

•

9 months ago

Attached file Bug 1838693 Part 2: Rationalize GPUDevice lost promise handling. — Details

This ensures that both internal and external triggers of "lose the
device" resolve the lost promise using the same code path.

Depends on D188388

Assignee

Comment 14

•

9 months ago

Attached file Bug 1838693 Part 3: Make GPUDevice::Destroy() trigger wgpu device_destroy, then device_drop. — Details

This treats destroy as a 2-step process: wait on the destroy, then drop.

Depends on D190129

Updated

•

9 months ago

Attachment #9356945 - Attachment description: Bug 1838693 Part 3: Make GPUDevice.Destroy() trigger wgpu device_destroy, then drop the device. → Bug 1838693 Part 3: Make GPUDevice::Destroy() trigger wgpu device_destroy, then device_drop.

Updated

•

9 months ago

Attachment #9353488 - Attachment description: Bug 1838693 Part 1: Stub in 'lose the device' and device.valid. → Bug 1838693 Part 1: Stub in 'lose the device' and trigger it on device lost errors.

Assignee

Comment 15

•

9 months ago

Attached file Bug 1838693 Part 5: Update tests and test expectations. — Details

Depends on D190238

Assignee

Comment 16

•

9 months ago

Attached file Bug 1838693 Part 4: Ensure that each Device is sent DeviceLost only once. — Details

This tracks DeviceIds that have been sent DeviceLost in a hashmap that
is pruned when the Device is dropped. This reduces IPC traffic with the
expected use case of many calls returning after a device loss.

Depends on D190238

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

9 months ago

Attachment #9358111 - Attachment description: Bug 1838693 Part 4: Update tests and test expectations. → Bug 1838693 Part 5: Update tests and test expectations.

Pulsebot

Comment 17

•

9 months ago

Pushed by bwerth@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/843c7d19787b
Part 1: Stub in 'lose the device' and trigger it on device lost errors. r=webgpu-reviewers,nical
https://hg.mozilla.org/integration/autoland/rev/f214290956c1
Part 2: Rationalize GPUDevice lost promise handling. r=webgpu-reviewers,nical
https://hg.mozilla.org/integration/autoland/rev/41d7ed93df76
Part 3: Make GPUDevice::Destroy() trigger wgpu device_destroy, then device_drop. r=webgpu-reviewers,nical
https://hg.mozilla.org/integration/autoland/rev/b8733146f070
Part 4: Ensure that each Device is sent DeviceLost only once. r=webgpu-reviewers,nical
https://hg.mozilla.org/integration/autoland/rev/9abfbd2f5892
Part 5: Update tests and test expectations. r=webgpu-reviewers,nical

Comment 18

•

9 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/843c7d19787b
https://hg.mozilla.org/mozilla-central/rev/f214290956c1
https://hg.mozilla.org/mozilla-central/rev/41d7ed93df76
https://hg.mozilla.org/mozilla-central/rev/b8733146f070
https://hg.mozilla.org/mozilla-central/rev/9abfbd2f5892

Status: ASSIGNED → RESOLVED

Closed: 9 months ago

status-firefox120: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 120 Branch

Calixte Denizet (:calixte)

Updated

•

9 months ago

Regressions: 1859974

Assignee

Updated

•

9 months ago

Regressions: 1859825

Tyson Smith [:tsmith]

Updated

•

9 months ago

Regressions: 1860113