[webgpu] crash on a webgpu demo (https://cx20.github.io/webgpu-test/examples/webgpu_glsl/teapot/index.html )
Categories
(Core :: Graphics: WebGPU, defect, P2)
Tracking
()
Release | Tracking | Status |
---|---|---|
firefox-esr115 | --- | unaffected |
firefox122 | --- | unaffected |
firefox123 | --- | disabled |
firefox124 | --- | disabled |
firefox125 | --- | disabled |
People
(Reporter: mayankleoboy1, Unassigned)
References
(Blocks 1 open bug, Regression, )
Details
(Keywords: crash, regression)
Crash Data
Attachments
(3 files)
Go to https://cx20.github.io/webgpu-test/examples/webgpu_glsl/teapot/index.html
AR: Crash
ER: Not so
Comment 2•1 year ago
Set release status flags based on info from the regressing bug 1873164
Comment 3•1 year ago
This is an internal issue where wgpu-hal implementations are making assumptions about the order of bind group and bind group layout entries in the Vecs passed to them, and wgpu-core does not ensure that order. In particular, DX12 and Metal (I've yet to confirm Vulkan) appear to assume that the shader-declared order of bindings will match the order of the API-bound resources provided to a call to GPUDevice.createBindGroup.
Marking this issue as P1. This is an ugly and confusing bug that prevents valid WebGPU programs from working, and it's obviously already being run into in demos. This will need to be resolved upstream first (see wgpu #5421), and then consumed in a subsequent iteration of webgpu-update-wgpu.
This issue also seems to apply to the GLES backend in WGPU upstream, but that doesn't apply to Firefox.
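To illustrate the ordering assumption described in comment 3, here is a minimal sketch. It does not use wgpu's real types and is not the change made in wgpu #5421; `LayoutEntry`, `BindGroupEntry`, and `align_to_layout` are hypothetical names invented for this example. The idea is only that resources should be matched to layout slots by binding number rather than by their position in the caller's list.

```rust
/// Hypothetical stand-in for a bind group layout entry (not a wgpu type).
#[derive(Debug, Clone, PartialEq)]
struct LayoutEntry {
    binding: u32,
    kind: &'static str,
}

/// Hypothetical stand-in for one resource handed to createBindGroup.
#[derive(Debug, Clone, PartialEq)]
struct BindGroupEntry {
    binding: u32,
    resource: &'static str,
}

/// Reorders `entries` so that position `i` holds the entry whose binding
/// number equals `layout[i].binding`. Returns `None` if any layout binding
/// has no matching entry.
fn align_to_layout(
    layout: &[LayoutEntry],
    entries: &[BindGroupEntry],
) -> Option<Vec<BindGroupEntry>> {
    layout
        .iter()
        .map(|slot| entries.iter().find(|e| e.binding == slot.binding).cloned())
        .collect()
}

fn main() {
    let layout = vec![
        LayoutEntry { binding: 0, kind: "uniform buffer" },
        LayoutEntry { binding: 1, kind: "sampler" },
        LayoutEntry { binding: 2, kind: "texture" },
    ];
    // The application is free to list entries in any order.
    let entries = vec![
        BindGroupEntry { binding: 2, resource: "teapot_texture" },
        BindGroupEntry { binding: 0, resource: "uniforms" },
        BindGroupEntry { binding: 1, resource: "linear_sampler" },
    ];

    let aligned = align_to_layout(&layout, &entries).expect("every binding is provided");
    for (slot, entry) in layout.iter().zip(&aligned) {
        println!("binding {} ({}) <- {}", slot.binding, slot.kind, entry.resource);
    }
}
```

A backend that paired `layout[i]` with `entries[i]` directly would, in this example, bind the texture where the uniform buffer is expected.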
Comment 5•1 year ago
wgpu #5421 is now merged upstream, and awaiting webgpu-update-wgpu.
Comment 6•1 year ago
WGPU has been re-vendored on mozilla-central, and we should now be able to consume the fix.
Comment 10•1 year ago
After testing on some of our more commodity-tier CI hardware, it's unclear what the spread of this issue is. Gonna demote to P2 for now, but it's entirely possible that this will get bumped to P1 as we discover this issue to be more widespread.
Comment 11•1 year ago
An attachment that I'll explain momentarily.
Comment 12•1 year ago
At least in the case of my own machine on which I could reproduce this (a top-of-the-line-ish laptop 7 years ago with the latest Windows 10 currently on it, using an Intel HD Graphics 530 driver with its iGPU), this workload appears to provoke a timeout in the DX12 runtime, which subsequently disconnected the DX12 device from wgpu with error code DXGI_ERROR_DEVICE_HUNG. I was able to capture a trace (see attached fx-teapot.zip) that consistently reproduces on that machine using wgpu's player binary, but not on my current daily driver for Windows.
We may or may not be able to work around this; we fundamentally have limitations in what WebGPU back ends give us. We obviously aren't handling this very well, though.
For the next assignee (likely myself): It's clear that there are at least two problems at play from my own reproductions of this issue:
1. wgpu's DX12 backend (and likely others) may not be cutting off access to the DX12 backend fast enough. It appears that a subsequent allocation of a texture after receiving the failure succeeds in terms of the HRESULT (error code) returned by the texture initialization, but in fact, the API (ID3D12Device::CreatePlacedResource) returns a null pointer. We need to ensure that we are invalidating all device-related WGPU resources once we detect that the underlying DX12 device has disconnected. We may already do this, but we should take this opportunity to double-check.
2. Later, this null pointer causes an access violation exception when wgpu tries to call IUnknown::AddRef on it. This isn't surprising, given problem (1), but it indicates that there are cases where Windows APIs may not report failure and still return a null pointer for COM resources. We need to handle this case by explicitly checking that the returned COM pointer of each resource we're attempting to initialize is not null before we accept it into tracked WGPU resources. I have some previous WIP work to make null checks in wgpu earlier and more stringent, but nothing that I've filed upstream yet.
Unassigning from myself, since there are yet higher priorities for the WebGPU team, ATM.
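The two checks described in comment 12 could look roughly like the sketch below. This is an illustration under stated assumptions, not wgpu's actual code: `Device`, `CreateError`, and `accept_resource` are hypothetical names, the COM pointer is modeled as a plain `*mut c_void`, and no Windows API is called. The only value taken from the Windows SDK is the numeric code of DXGI_ERROR_DEVICE_HUNG (0x887A0006).

```rust
use std::ffi::c_void;
use std::ptr::NonNull;
use std::sync::atomic::{AtomicBool, Ordering};

/// A Windows HRESULT is a signed 32-bit value; negative values mean failure.
type Hresult = i32;

/// Numeric value of DXGI_ERROR_DEVICE_HUNG from the Windows SDK.
const DXGI_ERROR_DEVICE_HUNG: Hresult = 0x887A_0006_u32 as i32;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum CreateError {
    /// The device was already marked lost, or this call reported the hang.
    DeviceLost,
    /// Any other failing HRESULT.
    Failed(Hresult),
    /// The call reported success but handed back a null pointer.
    NullOnSuccess,
}

/// Hypothetical device wrapper that remembers whether the device was lost.
struct Device {
    lost: AtomicBool,
}

impl Device {
    /// Validates the (HRESULT, raw pointer) pair returned by a creation call
    /// before the resource is accepted into tracking.
    fn accept_resource(
        &self,
        hr: Hresult,
        raw: *mut c_void,
    ) -> Result<NonNull<c_void>, CreateError> {
        // Problem (1): once the device is known to be lost, refuse everything.
        if self.lost.load(Ordering::Acquire) {
            return Err(CreateError::DeviceLost);
        }
        if hr == DXGI_ERROR_DEVICE_HUNG {
            self.lost.store(true, Ordering::Release);
            return Err(CreateError::DeviceLost);
        }
        if hr < 0 {
            return Err(CreateError::Failed(hr));
        }
        // Problem (2): a successful HRESULT does not prove the pointer is valid.
        NonNull::new(raw).ok_or(CreateError::NullOnSuccess)
    }
}

fn main() {
    let device = Device { lost: AtomicBool::new(false) };
    // "Success" with a null pointer is rejected instead of being tracked.
    let result = device.accept_resource(0, std::ptr::null_mut());
    println!("{:?}", result);
}
```

With a check of this shape, the null pointer is turned into an error at creation time, so nothing later attempts to call AddRef on it.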
Comment 14•1 year ago
Closing because no crashes reported for 12 weeks.