[WebGPU] with no-readback dx12 enabled, crash on https://wgpu-game-of-life.fornwall.net/#rule=1&size=2048&seed=0&density=29&gps=30
Categories
(Core :: Graphics: WebGPU, defect, P1)
Tracking
Release | Tracking | Status
---|---|---
firefox-esr115 | --- | unaffected
firefox119 | --- | unaffected
firefox120 | --- | unaffected
firefox121 | --- | disabled
firefox122 | --- | disabled
firefox123 | --- | disabled
firefox124 | --- | disabled
firefox125 | --- | disabled
firefox126 | --- | disabled
People
(Reporter: mayankleoboy1, Assigned: ErichDonGubler)
References
(Blocks 2 open bugs, Regression)
Details
(Keywords: regression)
Attachments
(2 files, 7 obsolete files)
16.62 KB, patch
85.94 KB, image/png
1. Enable WebGPU.
2. Enable the no-readback DX12 path (pref dom.webgpu.swap-chain.external-texture-dx12 = true).
3. Go to https://wgpu-game-of-life.fornwall.net/#rule=1&size=2048&seed=0&density=29&gps=30
4. Let the demo run actively for 30-45 seconds.
AR: Crash
I get two crashes:
https://crash-stats.mozilla.org/report/index/1687e34a-6e54-4d66-a302-1a9210231109
https://crash-stats.mozilla.org/report/index/c5b8dce5-19b3-450b-86db-215ee0231109
Comment 1 • 1 year ago
Set release status flags based on info from the regressing bug 1856787.
Comment 2 (Reporter) • 1 year ago
Additional STR:
1. Enable WebGPU.
2. Enable the no-readback DX12 path (pref dom.webgpu.swap-chain.external-texture-dx12 = true).
3. Go to https://wgpu-game-of-life.fornwall.net/#rule=1&size=2048&seed=0&density=29&gps=30
4. Rapidly and repeatedly click the "Image" button.
After about 20-30 clicks, I get a driver hang and a Firefox tab crash:
https://crash-stats.mozilla.org/report/index/5bbc1fdc-519e-4da0-8acf-771ac0231109
https://crash-stats.mozilla.org/report/index/d8967991-ce7b-4b2a-8521-f360e0231109
https://crash-stats.mozilla.org/report/index/d874ff15-bba4-4cc0-813c-25a9d0231109#tab-bugzilla
Comment 3 (Reporter) • 1 year ago
STR 3:
1. Follow the steps of comment 0 and let the tab crash.
2. Open the demo again.
AR: The whole browser crashes.
https://crash-stats.mozilla.org/report/index/fc092041-d9fe-41e4-a660-cae200231109
Comment 4 • 1 year ago
A D3D11Device reset happened during the STR, and it caused the crash.
Comment 5 • 1 year ago
Reverting bug 1860801 addressed the crash.
Comment 6 • 1 year ago
Comment 7 • 1 year ago
With the patch, a live TextureView keeps its Texture alive, and the crash happened when about 2020 d3d12::Resources were open.
This suggests there may be a limit on the number of d3d12::Resources of D3D11 textures.
Comment 8 • 1 year ago
Comment 9 • 1 year ago
I filed a bug upstream in https://github.com/gfx-rs/wgpu/issues/4700.
Edit: Actually, there was already an issue filed at https://github.com/gfx-rs/wgpu/issues/3350
I think that (short of implementing the whole descriptor cache solution that Dawn has to the sampler descriptor limit, which is a fair amount of work) this should be addressed in wgpu by eagerly destroying texture views when a texture is destroyed.
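A minimal Rust sketch of the eager-destruction idea, under stated assumptions: the types below are hypothetical stand-ins, not wgpu-core's. `RawResource` represents the backend allocation, and a texture tracks its views through `Weak` references so that `destroy()` can release the views' allocations while the (JS-visible) view handles themselves stay valid.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical stand-in for a backend allocation (e.g. an ID3D12Resource).
struct RawResource;

/// A view handle that content can keep holding after its texture is destroyed.
struct TextureView {
    // Becomes `None` once the parent texture has been destroyed.
    raw: Mutex<Option<Arc<RawResource>>>,
}

/// Hypothetical texture that remembers the views created from it.
struct Texture {
    raw: Mutex<Option<Arc<RawResource>>>,
    // Weak references, so tracking a view does not keep it alive.
    views: Mutex<Vec<Weak<TextureView>>>,
}

impl Texture {
    fn new() -> Texture {
        Texture {
            raw: Mutex::new(Some(Arc::new(RawResource))),
            views: Mutex::new(Vec::new()),
        }
    }

    /// Returns `None` if the texture has already been destroyed.
    fn create_view(&self) -> Option<Arc<TextureView>> {
        let raw = self.raw.lock().unwrap().clone()?;
        let view = Arc::new(TextureView { raw: Mutex::new(Some(raw)) });
        self.views.lock().unwrap().push(Arc::downgrade(&view));
        Some(view)
    }

    /// Releases the texture's allocation and, eagerly, the allocations held
    /// by any views that are still alive. The view handles remain valid.
    fn destroy(&self) {
        self.raw.lock().unwrap().take();
        for weak in self.views.lock().unwrap().drain(..) {
            if let Some(view) = weak.upgrade() {
                view.raw.lock().unwrap().take();
            }
        }
    }
}

fn main() {
    let texture = Texture::new();
    let view = texture.create_view().unwrap();
    // Observe the allocation without keeping it alive.
    let probe = Arc::downgrade(view.raw.lock().unwrap().as_ref().unwrap());

    texture.destroy();

    let view_released = view.raw.lock().unwrap().is_none();
    let memory_freed = probe.upgrade().is_none();
    println!("view released: {view_released}, memory freed: {memory_freed}");
}
```

Holding `Weak` rather than `Arc` references means the texture never extends a view's lifetime; the trade-off in this sketch is that dead entries accumulate in `views` until `destroy()` drains them.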
Comment 10 • 1 year ago
(In reply to Nicolas Silva [:nical] from comment #9)
> I filed a bug upstream in https://github.com/gfx-rs/wgpu/issues/4700.
> Edit: Actually, there was already an issue filed at https://github.com/gfx-rs/wgpu/issues/3350
> I think that (short of implementing the whole descriptor cache solution that Dawn has to the sampler descriptor limit, which is a fair amount of work) this should be addressed in wgpu by eagerly destroying texture views when a texture is destroyed.
:nical, is there a plan for you to work on this issue?
Comment 11 • 1 year ago
We landed improvements to the memory reclamation of buffers and textures (including the ability to destroy a texture while a texture view is alive). So it would be good to check back and see if the problem is still happening. The linked demo does not start in Firefox for me; it looks like something is failing at initialization, unrelated to the original issue.
Regardless, we still haven't made progress with the 2k samplers limit and don't have a good concrete plan; I'll try to get this prioritized soon.
Comment 12 • 1 year ago
I still see the same crash with pref dom.webgpu.swap-chain.external-texture-dx12 = true.
https://crash-stats.mozilla.org/report/index/55a89806-440c-4baa-8e47-028bb0240117
Comment 13 • 1 year ago
(In reply to Nicolas Silva [:nical] from comment #11)
> We landed improvements to the memory reclamation of buffers and textures (including the ability to destroy a texture while a texture view is alive). So it would be good to check back and see if the problem is still happening.
:nical, which pull request made these improvements?
Comment 14 • 1 year ago
With the latest m-c, the crash happened at texture.resource.clone() in create_texture_view(), because texture.resource was nullptr.
Comment 15 • 1 year ago
Comment 16 • 1 year ago
With attachment 9373170, the crash happened when the number of TextureViews reached 2420.
Comment 17 • 1 year ago
With the patch, the crash did not happen. So TextureViews seem to consume d3d12 resources.
Comment 18 (Reporter) • 1 year ago
This is also reproducible with the demos at https://usegpu.live/demo/geometry/data. There is a dropdown at the bottom right of the page; selecting other demos from it also causes a crash or hang.
Comment 19 • 1 year ago
(In reply to Sotaro Ikeda [:sotaro] from comment #17)
> Created attachment 9373171
> temporal patch - Call TextureView::Cleanup() from Texture::Destroy()
> With the patch, the crash did not happen. Then TextureView seemed to consume d3d12 resources.
The problem with this patch is that we cannot call TextureView's drop method until the very last texture view reference is destroyed. Otherwise JS code could try to use a dead texture view in some command, and that would be the equivalent of a use-after-free.
The commit that improved texture memory reclamation is https://github.com/gfx-rs/wgpu/commit/4b82121501a61c2c2e11cb472d70ba54af3aa12d, which makes it so that if the user of the API calls texture.destroy(), wgpu internally manages to deallocate the texture memory safely even if references to the texture still exist. That would not help, though, if the user does not call texture.destroy() (memory reclamation would still be at the whims of the garbage collector).
Besides that, there are issues with the number of live samplers, which is limited to about 2048 (that should actually be affected by the number of bind groups rather than texture views, though, so I'm less sure about how it relates to this bug).
> With latest m-c, crash happened at texture.resource.clone() in create_texture_view(), since texture.resource was nullptr.
The linked crash reports show the content process crashing (as a result of the GPU process crashing). The specific issue of the content process crashing should be fixed once bug 1873047 lands.
If you have a crash stack that shows some details of what's going on in the GPU process, that would be handy.
I'm going to spend some time to better understand this in the coming days.
Comment 20 • 1 year ago
> If you have a crash stack that shows some details of what's going on in the GPU process, that would be handy.
With the STR, after attaching a debugger to the GPU process, I got the following stack:
[Inline Frame] xul.dll!d3d12::com::impl$2::clone(d3d12::com::ComPtr<winapi::um::d3d12::ID3D12Resource> * self) Line 69 Rust
xul.dll!wgpu_hal::dx12::device::impl$1::create_texture_view(wgpu_hal::dx12::Device * self, wgpu_hal::dx12::Texture * texture, wgpu_hal::TextureViewDescriptor * desc) Line 469 Rust
xul.dll!wgpu_core::device::resource::Device<wgpu_hal::dx12::Api>::create_texture_view<wgpu_hal::dx12::Api>(alloc::sync::Arc<wgpu_core::resource::Texture<wgpu_hal::dx12::Api>> * self, wgpu_core::resource::TextureViewDescriptor * texture) Line 1174 Rust
xul.dll!wgpu_core::global::Global<wgpu_bindings::identity::IdentityRecyclerFactory>::texture_create_view<wgpu_bindings::identity::IdentityRecyclerFactory,wgpu_hal::dx12::Api>(wgpu_core::id::Id<wgpu_core::resource::Texture<wgpu_hal::empty::Api>> self, wgpu_core::resource::TextureViewDescriptor * texture_id, wgpu_core::id::Id<wgpu_core::resource::TextureView<wgpu_hal::empty::Api>> desc) Line 811 Rust
xul.dll!wgpu_bindings::server::Global::texture_action<wgpu_hal::dx12::Api>(wgpu_core::id::Id<wgpu_core::resource::Texture<wgpu_hal::empty::Api>> self, enum2$<wgpu_bindings::TextureAction> self_id, wgpu_bindings::error::ErrorBuffer action) Line 771 Rust
xul.dll!wgpu_bindings::server::wgpu_server_texture_action(wgpu_bindings::server::Global * global, wgpu_core::id::Id<wgpu_core::resource::Texture<wgpu_hal::empty::Api>> self_id, wgpu_bindings::ByteBuf * byte_buf, wgpu_bindings::error::ErrorBuffer error_buf) Line 933 Rust
xul.dll!mozilla::webgpu::WebGPUParent::RecvTextureAction(unsigned __int64 aTextureId, unsigned __int64 aDeviceId, const mozilla::ipc::ByteBuf & aByteBuf) Line 1297 C++
xul.dll!mozilla::webgpu::PWebGPUParent::OnMessageReceived(const IPC::Message & msg) Line 420 C++
xul.dll!mozilla::gfx::PCanvasManagerParent::OnMessageReceived(const IPC::Message & msg) Line 279 C++
xul.dll!mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy * aProxy, const IPC::Message & aMsg) Line 1813 C++
xul.dll!mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecycleProxy * aProxy, mozilla::UniquePtr<IPC::Message,mozilla::DefaultDelete<IPC::Message>> aMsg) Line 1736 C++
xul.dll!mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy * aProxy, mozilla::ipc::MessageChannel::MessageTask & aTask) Line 1526 C++
xul.dll!mozilla::ipc::MessageChannel::MessageTask::Run() Line 1632 C++
xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1194 C++
xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 480 C++
xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate * aDelegate) Line 300 C++
[Inline Frame] xul.dll!MessageLoop::RunInternal() Line 370 C++
xul.dll!MessageLoop::RunHandler() Line 364 C++
xul.dll!MessageLoop::Run() Line 346 C++
xul.dll!nsThread::ThreadFunc(void * aArg) Line 372 C++
nss3.dll!_PR_NativeRunThread(void * arg) Line 421 C
nss3.dll!pr_root(void * arg) Line 140 C
[External Code]
[Inline Frame] mozglue.dll!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,void (*)(int, void *, void *)>::operator()(int & aArgs, void * & aArgs, void * & aArgs) Line 150 C++
mozglue.dll!patched_BaseThreadInitThunk(int aIsInitialThread, void * aStartAddress, void * aThreadParam) Line 561 C++
[External Code]
texture.resource.clone() failed because texture.resource was nullptr.
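As a hypothetical Rust sketch (not wgpu-hal's actual code), the failing clone could be guarded so that creating a view from a destroyed texture produces an error instead of cloning a null resource. `RawResource`, `Texture`, and `CreateViewError` are illustrative names; the error type is an assumption, and how such an error would be reported back to content is not addressed here.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the backend resource (ID3D12Resource).
struct RawResource;

/// Hypothetical texture whose raw resource is taken by `destroy()`.
struct Texture {
    raw: RwLock<Option<Arc<RawResource>>>,
}

#[derive(Debug, PartialEq)]
enum CreateViewError {
    /// The texture was destroyed before the view was requested.
    TextureDestroyed,
}

struct TextureView {
    #[allow(dead_code)] // held only to keep the allocation alive
    raw: Arc<RawResource>,
}

impl Texture {
    fn destroy(&self) {
        self.raw.write().unwrap().take();
    }

    /// Checks for a destroyed texture and returns an error instead of
    /// cloning a resource that is no longer there.
    fn create_view(&self) -> Result<TextureView, CreateViewError> {
        let guard = self.raw.read().unwrap();
        let raw = guard.as_ref().ok_or(CreateViewError::TextureDestroyed)?;
        Ok(TextureView { raw: Arc::clone(raw) })
    }
}

fn main() {
    let texture = Texture { raw: RwLock::new(Some(Arc::new(RawResource))) };
    println!("before destroy, view created: {}", texture.create_view().is_ok());
    texture.destroy();
    match texture.create_view() {
        Ok(_) => println!("after destroy: unexpectedly got a view"),
        Err(e) => println!("after destroy: {e:?}"),
    }
}
```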
Comment 21 • 1 year ago
I reproduced the issue. I see this in the log:
D3D12 ERROR: ID3D12CommandQueue::ExecuteCommandLists: Command lists must be successfully closed before execution. [ EXECUTION ERROR #838: EXECUTECOMMANDLISTS_FAILEDCOMMANDLIST]
Exception thrown at 0x00007FFA9C07CF19 in firefox.exe: Microsoft C++ exception: _com_error at memory location 0x0000007CA7FF5FF8.
D3D12: Removing Device.
D3D12 ERROR: ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_INVALID_CALL: There is strong evidence that the application has performed an illegal or undefined operation, and such a condition could not be returned to the application cleanly through a return code). [ EXECUTION ERROR #232: DEVICE_REMOVAL_PROCESS_AT_FAULT]
Note: I initially suspected we were not properly calling destroy on the swap chain textures, but I verified that we do and that wgpu-core is internally destroying all of the swap chain textures. So my next guess is that it has something to do with the error above that causes the device to be removed. If not, fixing the device removal will at least help with getting a cleaner repro. I'll keep digging.
Comment 22 • 1 year ago
A command list isn't successfully closed: close returns an out-of-memory error. After spending a bit more time in the d3d12 backend, I better understand what's going on. Internally, both the texture and the texture view point to a reference-counted ID3D12Resource, which holds the GPU memory allocation. It has been right under my nose the whole time; the reason I kept missing it is that the problem only occurs when the suballocation feature is disabled, which is the case in Gecko but not the default, so the language server kept pointing me to another implementation that does not have this issue.
The solution is either to enable suballocation or to track texture views in textures and deallocate them eagerly (like you did, Sotaro, but from inside wgpu-core, where it can be done safely).
I'll give it a go.
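The reference-counting problem described above can be modelled in a few lines of Rust. This is a hypothetical sketch, not wgpu-hal's code: `Allocation` stands in for the ID3D12Resource and `Arc` for its COM reference count. It shows that dropping the texture alone does not free the memory while any view still holds a reference.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the reference-counted ID3D12Resource that owns
/// the GPU memory when suballocation is disabled.
struct Allocation;

/// Both the texture and each view hold a strong reference to the allocation.
struct Texture {
    resource: Arc<Allocation>,
}

struct TextureView {
    #[allow(dead_code)] // held only to keep the allocation alive
    resource: Arc<Allocation>,
}

impl Texture {
    fn create_view(&self) -> TextureView {
        TextureView { resource: Arc::clone(&self.resource) }
    }
}

fn main() {
    let texture = Texture { resource: Arc::new(Allocation) };
    let views: Vec<TextureView> = (0..3).map(|_| texture.create_view()).collect();
    let probe = Arc::downgrade(&texture.resource); // observes without owning

    drop(texture); // e.g. the swap chain texture is destroyed
    println!("owners after texture drop: {}", probe.strong_count());

    drop(views); // e.g. the views are finally cleaned up
    println!("owners after views drop: {}", probe.strong_count());
}
```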
Comment 23 • 1 year ago
The wgpu-core side change for destroying texture views associated with destroyed textures is up for review in https://github.com/gfx-rs/wgpu/pull/5131
While discussing this with Jeff and Jim, we noted another thing we should do for performance that would also have an impact here: reuse the canvas textures, and their associated texture views, more aggressively. Jim filed bug 1876114.
Comment 24 (Assignee) • 1 year ago
gfx-rs/wgpu#5131 has been reviewed and merged, and is awaiting re-vendoring of WGPU (see bug 1876389).
Comment 25 • 1 year ago
With the latest m-c, the STR with pref dom.webgpu.swap-chain.external-texture-dx12 = true did not cause the crash, but global.queue_submit() in wgpu_server_queue_submit() did not return.
Comment 26 • 1 year ago
:nical, do you have any idea about comment 25?
Comment 27 (Reporter) • 1 year ago
The demo does not animate with the latest Nightly (containing bug 1876389) and the dx12 no-readback pref enabled. A lot of other WebGPU demos are not working either (https://react-webgpu-samples.vercel.app/, https://webgpu.github.io/webgpu-samples/samples/resizeCanvas, etc.).
Comment 28 • 1 year ago
I heard that the recent wgpu update caused several deadlock problems.
Comment 29 • 1 year ago
Call stack from when the deadlock in comment 25 happened:
xul.dll!parking_lot_core::thread_parker::imp::waitaddress::WaitAddress::wait_on_address(core::sync::atomic::AtomicUsize * self, unsigned int key) Line 100 Rust
xul.dll!parking_lot_core::thread_parker::imp::waitaddress::WaitAddress::park(core::sync::atomic::AtomicUsize * self) Line 53 Rust
xul.dll!parking_lot_core::thread_parker::imp::impl$1::park(parking_lot_core::thread_parker::imp::ThreadParker * self) Line 119 Rust
xul.dll!parking_lot_core::parking_lot::park::closure$0(parking_lot_core::parking_lot::park::closure_env$0<parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$0,parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$1,parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$2> thread_data, parking_lot_core::parking_lot::ThreadData *) Line 635 Rust
xul.dll!parking_lot_core::parking_lot::with_thread_data(parking_lot_core::parking_lot::park::closure_env$0<parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$0,parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$1,parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$2> f) Line 207 Rust
xul.dll!parking_lot_core::parking_lot::park(unsigned __int64 key, parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$0 validate, parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$1) Line 600 Rust
xul.dll!parking_lot::raw_rwlock::RawRwLock::wait_for_readers(enum2$<core::option::Option<std::time::Instant>> self, unsigned __int64 prev_value) Line 1013 Rust
xul.dll!parking_lot::raw_rwlock::RawRwLock::lock_exclusive_slow(enum2$<core::option::Option<std::time::Instant>> self) Line 645 Rust
xul.dll!parking_lot::raw_rwlock::impl$0::lock_exclusive(parking_lot::raw_rwlock::RawRwLock * self) Line 73 Rust
xul.dll!lock_api::rwlock::RwLock<parking_lot::raw_rwlock::RawRwLock,tuple$<>>::write() Line 480 Rust
xul.dll!wgpu_core::snatch::SnatchLock::write() Line 90 Rust
xul.dll!wgpu_core::resource::impl$19::drop(wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api> * self) Line 1045 Rust
xul.dll!core::ptr::drop_in_place<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>(wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api> *) Line 497 Rust
xul.dll!alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>::drop_slow<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>() Line 1266 Rust
xul.dll!alloc::sync::impl$27::drop(alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>> * self) Line 1897 Rust
xul.dll!core::ptr::drop_in_place(alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>> *) Line 497 Rust
xul.dll!core::ptr::drop_in_place(tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>> *) Line 497 Rust
xul.dll!core::ptr::mut_ptr::impl$0::drop_in_place(tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>> * self) Line 1431 Rust
xul.dll!hashbrown::raw::Bucket<tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>>>::drop() Line 581 Rust
xul.dll!hashbrown::raw::RawTable<tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>>,alloc::alloc::Global>::drop_elements<tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>>,alloc::alloc::Global>() Line 1038 Rust
xul.dll!hashbrown::raw::impl$17::drop(hashbrown::raw::RawTable<tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>>,alloc::alloc::Global> * self) Line 2699 Rust
xul.dll!core::ptr::drop_in_place(hashbrown::raw::RawTable<tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>>,alloc::alloc::Global> *) Line 497 Rust
xul.dll!core::ptr::drop_in_place(hashbrown::map::HashMap<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>,core::hash::BuildHasherDefault<rustc_hash::FxHasher>,alloc::alloc::Global> *) Line 497 Rust
xul.dll!core::ptr::drop_in_place(std::collections::hash::map::HashMap<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>,core::hash::BuildHasherDefault<rustc_hash::FxHasher>> *) Line 497 Rust
xul.dll!core::ptr::drop_in_place<wgpu_core::device::life::ResourceMaps<wgpu_hal::dx12::Api>>(wgpu_core::device::life::ResourceMaps<wgpu_hal::dx12::Api> *) Line 497 Rust
xul.dll!wgpu_core::device::life::LifetimeTracker<wgpu_hal::dx12::Api>::triage_submissions<wgpu_hal::dx12::Api>(unsigned __int64 self, wgpu_core::device::CommandAllocator<wgpu_hal::dx12::Api> * last_done) Line 368 Rust
xul.dll!wgpu_core::device::resource::Device<wgpu_hal::dx12::Api>::maintain<wgpu_hal::dx12::Api>(wgpu_hal::dx12::Fence * self, enum2$<wgpu_types::Maintain<wgpu_core::device::queue::WrappedSubmissionIndex>> fence) Line 357 Rust
xul.dll!wgpu_core::global::Global::queue_submit<wgpu_hal::dx12::Api>(wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Queue>> self, ref$<slice2$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::CommandBuffer>>>> queue_id) Line 1546 Rust
xul.dll!wgpu_bindings::server::wgpu_server_queue_submit(wgpu_bindings::server::Global * global, wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Queue>> self_id, wgpu_core::id::Id<enum2$<wgpu_core::id::markers::CommandBuffer>> * command_buffer_ids, unsigned __int64 command_buffer_id_length, wgpu_bindings::error::ErrorBuffer error_buf) Line 1099 Rust
xul.dll!mozilla::webgpu::WebGPUParent::RecvQueueSubmit(unsigned __int64 aQueueId, unsigned __int64 aDeviceId, const nsTArray<unsigned long long> & aCommandBuffers, const nsTArray<unsigned long long> & aTextureIds) Line 760
Comment 30 • 1 year ago
It seems like SnatchLock::write is hanging, probably because an older frame is holding a read lock.
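The hang in comment 29 has SnatchLock::write() called from DestroyedTexture's drop, inside queue_submit. A minimal Rust model of that shape, assuming the read guard is held by a caller further up the same thread's stack: `SnatchLock` here is a hypothetical wrapper around `std::sync::RwLock`, not wgpu-core's type, and `try_write` is used so the sketch reports the conflict instead of hanging. Freeing only after the read guard is released is one way to avoid the problem; it is not necessarily what the upstream fix does.

```rust
use std::sync::{RwLock, TryLockError};

/// Hypothetical stand-in for wgpu-core's snatch lock: code that *uses* raw
/// resources holds a read guard, code that *frees* them takes the write guard.
struct SnatchLock(RwLock<()>);

impl SnatchLock {
    /// Tries to free one destroyed resource. `try_write` reports the conflict
    /// instead of blocking forever the way a plain `write()` could.
    fn try_free(&self) -> bool {
        match self.0.try_write() {
            Ok(_guard) => true, // the write guard is released at the end of this arm
            Err(TryLockError::WouldBlock) => false,
            Err(TryLockError::Poisoned(_)) => panic!("snatch lock poisoned"),
        }
    }
}

/// Problematic ordering: destroyed resources are freed while this thread
/// still holds a read guard taken earlier (e.g. by the submit path).
fn free_while_reading(lock: &SnatchLock, destroyed: usize) -> usize {
    let _read = lock.0.read().unwrap();
    (0..destroyed).filter(|_| lock.try_free()).count()
}

/// Alternative ordering: finish the read-locked work, release the guard,
/// and only then free the destroyed resources.
fn free_after_reading(lock: &SnatchLock, destroyed: usize) -> usize {
    {
        let _read = lock.0.read().unwrap();
        // ... work that needs the raw resources happens here ...
    }
    (0..destroyed).filter(|_| lock.try_free()).count()
}

fn main() {
    let lock = SnatchLock(RwLock::new(()));
    println!("freed while reading: {}", free_while_reading(&lock, 3));
    println!("freed after reading: {}", free_after_reading(&lock, 3));
}
```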
Comment 31 • 1 year ago
Here we go: https://github.com/gfx-rs/wgpu/pull/5216
Comment 32 (Reporter) • 1 year ago
I still get the crash with the latest Nightly; it just takes longer. I opened the testcase and let it run for 60-120 seconds, and the browser crashed. https://crash-stats.mozilla.org/report/index/265b8bd0-4ba3-4c56-bedb-43a840240210
(ni? :nical so they see my comment)
Comment 33 • 1 year ago
After attaching a debugger, the log output contained the following:
D3D11: Removing Device.
Exception thrown at 0x00007FFE1EE94D8C (KernelBase.dll) in firefox.exe: WinRT originate error - 0x887A0005 : 'The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.'
Comment 34 • 1 year ago
Comment 35 • 1 year ago
With the patch from attachment 9379901, a device reset happened when the dx12 TextureView count reached 1753.
Comment 36 • 1 year ago
Erich will pick this up going forward.
Comment 37 (Reporter) • 1 year ago
I still get a crash on the testcase (open the testcase and maximize the "initial density" slider): https://crash-stats.mozilla.org/report/index/73a2c52f-5029-4ed9-b1b3-a88be0240229
Comment 38 • 1 year ago
I also get a crash. With the log patch from attachment 9388300, a device reset happened when the dx12 TextureView count reached 1582.
Comment 39 • 1 year ago
Comment 40 (Assignee) • 1 year ago
Passing investigation on to :sotaro.
Comment 41 • 1 year ago
With the patch on the latest m-c, a device reset happened when the count of dx12 TextureViews reached 1978. This suggests that the large number of dx12 TextureViews triggers the device reset.
The TextureView count became large because the dx12 TextureViews stayed alive until texture_view_drop() was called.
texture_view_drop() was triggered from TextureView::Cleanup(), which was not called often, since it was driven by cycle collection.
As a result, the count of dx12 TextureViews grew large.
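The mechanism described above can be illustrated with a small simulation. The numbers are purely hypothetical (two views per frame, collection every 1000 frames), and real cycle collection does not run on a fixed frame interval; the point is only that the peak live count scales with how long views wait for cleanup, not with how many views a frame actually needs.

```rust
/// Hypothetical model: `views_per_frame` TextureViews are created each frame,
/// but they are only released when cycle collection runs, every
/// `cc_interval` frames (assumed here to release all of them at once).
/// Returns the largest number of views alive at any point.
fn peak_live_views(frames: u32, views_per_frame: u32, cc_interval: u32) -> u32 {
    assert!(cc_interval > 0, "cc_interval must be positive");
    let mut live: u32 = 0;
    let mut peak: u32 = 0;
    for frame in 1..=frames {
        live += views_per_frame; // views created for this frame
        peak = peak.max(live);
        if frame % cc_interval == 0 {
            live = 0; // cycle collection: TextureView::Cleanup() -> texture_view_drop()
        }
    }
    peak
}

fn main() {
    // Releasing views every frame keeps the live count flat...
    println!("released every frame: {}", peak_live_views(600, 2, 1));
    // ...while infrequent collection lets it climb into the thousands.
    println!("released every 1000 frames: {}", peak_live_views(3000, 2, 1000));
}
```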
Comment 42 • 1 year ago
With D193511, the problem did not happen for me, but it is not the correct fix for wgpu.
:nical wants a correct fix in wgpu. The following is a comment from :nical:
> We can't drop a texture view while the JS object still exists, so we can't take this patch in its current state. That said, wgpu internally does the same thing: textures have a list of texture views, and when destroy is called on a texture, the internal resources of its views are removed.
> ErichDonGubler: it would be good to double-check that this system is working as expected and, more generally, to instrument the number and size of all hal resources over time. Maybe we are incorrectly tracking the views of a texture, or maybe the number of texture views is just a correlation.
Comment 44 • 1 year ago
For posterity, the link in the description of this report has expired. Here is a working link to the WebGPU demo: https://usegpu.live/demo/geometry/data
Comment 45 (Assignee) • 1 year ago
After pairing with :jimb and :nical, I believe we have a fix for the device getting reset by having too many textures: wgpu#5378. I've included more narrative about the fix there.
:sotaro, can you confirm that this resolves the crash for you? You should be able to re-vendor WGPU in your local checkout with mach vendor --ignore-modified --force gfx/wgpu_bindings/moz.yaml --revision 27991d1b272b3d367d446daece6fde58d3cdfb5d.
Assuming that the fix works and we get it landed promptly, it should arrive with the next iteration of webgpu-update-wgpu (which I'm responsible for this week).
Comment 46 • 1 year ago
:ErichDonGubler, thank you! I confirmed that the problem is addressed for me!!!
Comment 47 (Assignee) • 1 year ago
wgpu#5378 has merged upstream. Now awaiting the next iteration of webgpu-update-wgpu.
Comment 48 (Reporter) • 1 year ago
In my testing, autoland builds from bug 1884946 seem to fix this bug.
Comment 49 (Reporter) • 1 year ago
This is fixed on the latest Nightly.
Comment 50 • 11 months ago
I've replicated this issue using Nightly 121.0a1 (2023-11-08) on Windows 10 x64, following the STR from comment 0 with pref dom.webgpu.swap-chain.external-texture-dx12=true.
However, I'm unable to verify the fix in Firefox 125.0b5, as the provided test case does not work as expected (warning message "WebGPU is not working in our browser"). Please refer to the attached screenshot for details.
I can confirm that the issue no longer occurs in the latest Nightly 126.0a1.
Comment 51 (Assignee) • 11 months ago
Ina: I presume you're using a beta build rather than a Nightly build. That's expected behavior in beta and stable: we don't expose the navigator.gpu variable, because we don't want to expose WebGPU anywhere other than Nightly yet. We're using webgpu-v1 to track WebGPU's readiness for this.
Comment 53 • 11 months ago
Thank you for the clarification.
Marking this as "Verified Fixed" as the issue is no longer present in the latest Nightly 126.0a1.