
Reporter

Updated

•

1 year ago

Crash Signature: @ d3d12::com::ComPtr<T>::as_unknown ] [@ mozilla::webgpu::CommandEncoder::CommandEncoder ]

Summary: [WebGPU] with n-readback dx12 enabled, crash on https://wgpu-game-of-life.fornwall.net/#rule=1&size=2048&seed=0&density=29&gps=8 → [WebGPU] with no-readback dx12 enabled, crash on https://wgpu-game-of-life.fornwall.net/#rule=1&size=2048&seed=0&density=29&gps=8

Reporter

Updated

•

1 year ago

Keywords: regression

Regressed by: 1856787

Reporter

Updated

•

1 year ago

Flags: needinfo?(sotaro.ikeda.g)

Reporter

Updated

•

1 year ago

Summary: [WebGPU] with no-readback dx12 enabled, crash on https://wgpu-game-of-life.fornwall.net/#rule=1&size=2048&seed=0&density=29&gps=8 → [WebGPU] with no-readback dx12 enabled, crash on https://wgpu-game-of-life.fornwall.net/#rule=1&size=2048&seed=0&density=29&gps=30

BugBot [:suhaib / :marco/ :calixte]

Reporter

Updated

•

1 year ago

Crash Signature: @ d3d12::com::ComPtr<T>::as_unknown ] [@ mozilla::webgpu::CommandEncoder::CommandEncoder ] → [@ d3d12::com::ComPtr<T>::as_unknown ] [@ mozilla::webgpu::CommandEncoder::CommandEncoder ]

Comment 1

•

1 year ago

Set release status flags based on info from the regressing bug 1856787

status-firefox119: --- → unaffected

status-firefox120: --- → unaffected

status-firefox121: --- → affected

status-firefox-esr115: --- → unaffected

Reporter

Comment 2

•

1 year ago

Additional secondary STR:

Enable webgpu
enable the no-readback dx12 thingy
Go to https://wgpu-game-of-life.fornwall.net/#rule=1&size=2048&seed=0&density=29&gps=30
Rapidly repeatedly click on the "Image" button.

After maybe 20-30 clicks, I get a driver hang + Firefox tab crash
https://crash-stats.mozilla.org/report/index/5bbc1fdc-519e-4da0-8acf-771ac0231109
https://crash-stats.mozilla.org/report/index/d8967991-ce7b-4b2a-8521-f360e0231109
https://crash-stats.mozilla.org/report/index/d874ff15-bba4-4cc0-813c-25a9d0231109#tab-bugzilla

Updated

•

1 year ago

Assignee: nobody → sotaro.ikeda.g

Flags: needinfo?(sotaro.ikeda.g)

Reporter

Comment 3

•

1 year ago

STR3 :
Follow the steps of comment 0 and let the tab crash.
Open the demo again

AR: The whole browser crashes.
https://crash-stats.mozilla.org/report/index/fc092041-d9fe-41e4-a660-cae200231109

Reporter

Updated

•

1 year ago

Crash Signature: [@ d3d12::com::ComPtr<T>::as_unknown ] [@ mozilla::webgpu::CommandEncoder::CommandEncoder ] → [@ d3d12::com::ComPtr<T>::as_unknown ] [@ mozilla::webgpu::CommandEncoder::CommandEncoder ] [@ wgpu_core::storage::Storage<T>::get_mut<T> ] [@ core::result::unwrap_failed | wgpu_core::command::CommandEncoder<T>::open<T> ]

Assignee

Updated

•

1 year ago

Blocks: 1859780

Severity: -- → S2

Priority: -- → P2

Assignee

Updated

•

1 year ago

Blocks: webgpu-apps

Comment 4

•

1 year ago

•

Edited

A D3D11Device reset happened during the STR, and it caused the crash.

Comment 5

•

1 year ago

Reverting Bug 1860801 made the crash go away.

Ryan VanderMeulen [:RyanVM]

Comment 6

•

1 year ago

Attached file WIP: Bug 1863872 - Re-add cache d3d12::Resource for swap chain textures (obsolete) — Details

Updated

•

1 year ago

status-firefox121: affected → fix-optional

Updated

•

1 year ago

Attachment #9363194 - Attachment is obsolete: true

Comment 7

•

1 year ago

Attached patch patch - add log around Texture and TextureView (obsolete) — Details — Splinter Review

With the patch, a valid TextureView keeps its Texture alive, and the crash happened when the number of open d3d12::Resources was around 2020.

This suggests there may be a limit on the number of d3d12::Resources of D3D11 textures.

Comment 8

•

1 year ago

Attached file Bug 1863872 - Drop TextureView in Texture::Destroy() (obsolete) — Details

Jim Blandy :jimb

Updated

•

1 year ago

Severity: S2 → S4

Comment 9

•

1 year ago

•

Edited

I filed a bug upstream in https://github.com/gfx-rs/wgpu/issues/4700.
Edit: Actually there was already an issue filed at: https://github.com/gfx-rs/wgpu/issues/3350

I think that (short of implementing the whole descriptor cache solution that Dawn has to the sampler descriptor limit, which is a fair amount of work) this should be addressed in wgpu by eagerly destroying texture views when a texture is destroyed.
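
A minimal sketch of the "eagerly destroy views with the texture" idea, in Rust with only the standard library. All names here (`Texture`, `TextureView`, `RawView`) are hypothetical stand-ins and do not reflect wgpu's actual internals. The assumption is that a texture keeps weak references to its views, so that `destroy()` can release each view's backend resource while the view handle itself stays alive.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Stand-in for a backend object such as a D3D12 descriptor (hypothetical).
struct RawView;

/// A view handle. The handle may outlive its backend resource.
struct TextureView {
    raw: RefCell<Option<RawView>>,
}

impl TextureView {
    /// True while the backend resource has not been released.
    fn is_valid(&self) -> bool {
        self.raw.borrow().is_some()
    }
}

struct Texture {
    // Weak references: the texture does not keep view handles alive,
    // it only needs to find the live ones at destroy time.
    views: RefCell<Vec<Weak<TextureView>>>,
}

impl Texture {
    fn new() -> Self {
        Texture { views: RefCell::new(Vec::new()) }
    }

    fn create_view(&self) -> Rc<TextureView> {
        let view = Rc::new(TextureView { raw: RefCell::new(Some(RawView)) });
        self.views.borrow_mut().push(Rc::downgrade(&view));
        view
    }

    /// Releases the backend resource of every view that is still alive
    /// and returns how many were released.
    fn destroy(&self) -> usize {
        let mut released = 0;
        for weak in self.views.borrow_mut().drain(..) {
            if let Some(view) = weak.upgrade() {
                if view.raw.borrow_mut().take().is_some() {
                    released += 1;
                }
            }
        }
        released
    }
}
```

In this sketch a view handle that survives `destroy()` simply reports `is_valid() == false`; a real implementation would have to turn any later use of such a view into a validation error.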

Updated

•

1 year ago

See Also: → https://github.com/gfx-rs/wgpu/issues/3350

Ryan VanderMeulen [:RyanVM]

Updated

•

1 year ago

See Also: → https://github.com/gfx-rs/wgpu/issues/4700

Updated

•

1 year ago

status-firefox121: fix-optional → disabled

status-firefox122: --- → fix-optional

Updated

•

1 year ago

Attachment #9363420 - Attachment is obsolete: true

Donal Meehan [:dmeehan]

Updated

•

1 year ago

status-firefox122: fix-optional → disabled

Comment 10

•

1 year ago

(In reply to Nicolas Silva [:nical] from comment #9)

I filed a bug upstream in https://github.com/gfx-rs/wgpu/issues/4700.
Edit: Actually there was already an issue filed at: https://github.com/gfx-rs/wgpu/issues/3350

I think that (short of implementing the whole descriptor cache solution that Dawn has to the sampler descriptor limit, which is a fair amount of work) this should be addressed in wgpu by eagerly destroying texture views when a texture is destroyed.

:nical, do you plan to work on this issue?

Flags: needinfo?(nical.bugzilla)

Comment 11

•

1 year ago

We landed improvements to the memory reclamation of buffers and textures (including the ability to destroy a texture while a texture view is alive). So it would be good to check back and see if the problem is still happening. The linked demo does not start in Firefox for me; it looks like something is failing at initialization, unrelated to the original issue.
Regardless, we still haven't made progress on the 2k samplers limit and don't have a good concrete plan. I'll try to get this prioritized soon.

Flags: needinfo?(nical.bugzilla)

https://crash-stats.mozilla.org/report/index/55a89806-440c-4baa-8e47-028bb0240117

Comment 12

•

1 year ago

I still see the same crash with pref dom.webgpu.swap-chain.external-texture-dx12 = true.

Comment 13

•

1 year ago

(In reply to Nicolas Silva [:nical] from comment #11)

We landed improvements to the memory reclamation of buffers and textures (including the ability to destroy a texture while a texture view is alive). So it would be good to check back and see if the problem is still happening.

:nical, which pull request made the improvements?

Flags: needinfo?(nical.bugzilla)

Comment 14

•

1 year ago

•

Edited

With the latest m-c, the crash happened at texture.resource.clone() in create_texture_view(), because texture.resource was nullptr.

Comment 15

•

1 year ago

Attached patch patch - add log around Texture and TextureView (obsolete) — Details — Splinter Review

Attachment #9363418 - Attachment is obsolete: true

Comment 16

•

1 year ago

With Attachment 9373170 [details] [diff], the crash happened when the number of TextureViews reached 2420.

Comment 17

•

1 year ago

Attached file temporal patch - Call TextureView::Cleanup() from Texture::Destroy() (obsolete) — Details

With the patch, the crash did not happen. So TextureViews seem to consume d3d12 resources.

Reporter

Comment 18

•

1 year ago

This is also reproducible on demos on https://usegpu.live/demo/geometry/data . There is a dropdown at the bottom right of the page. You can select other demos for crash or hang

Comment 19

•

1 year ago

(In reply to Sotaro Ikeda [:sotaro] from comment #17)

Created attachment 9373171 [details]
temporal patch - Call TextureView::Cleanup() from Texture::Destroy()

With the patch, the crash did not happen. So TextureViews seem to consume d3d12 resources.

The problem with this patch is that we cannot call TextureView's drop method until the very last texture view reference is destroyed. Otherwise JS code could try to use a dead texture view in some command, and that would be the equivalent of a use-after-free.

The commit that improved texture memory reclamation is https://github.com/gfx-rs/wgpu/commit/4b82121501a61c2c2e11cb472d70ba54af3aa12d which makes it so that if the user of the API calls texture.destroy(), wgpu internally manages to deallocate the texture memory safely even if references to the texture still exist. That would not help, though, if the user is not calling texture.destroy() (memory reclamation would still be at the whims of the garbage collector).
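
To illustrate "deallocate safely even if references still exist", here is a rough sketch in Rust. It is not the code from the commit above; `Texture`, `Allocation` and `UseError` are hypothetical names. The assumed idea is that `destroy()` takes the allocation out of the shared object immediately, and every later use checks for that instead of touching freed memory.

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug, PartialEq)]
enum UseError {
    Destroyed,
}

/// Stand-in for the GPU memory behind a texture (hypothetical).
struct Allocation {
    bytes: usize,
}

struct Texture {
    raw: RefCell<Option<Allocation>>,
}

impl Texture {
    fn new(bytes: usize) -> Rc<Self> {
        Rc::new(Texture { raw: RefCell::new(Some(Allocation { bytes })) })
    }

    /// Frees the allocation right away, regardless of how many handles
    /// still point at this texture. Returns the number of bytes released.
    fn destroy(&self) -> usize {
        self.raw.borrow_mut().take().map_or(0, |a| a.bytes)
    }

    /// Any later use reports an error rather than reading freed memory.
    fn size(&self) -> Result<usize, UseError> {
        match &*self.raw.borrow() {
            Some(a) => Ok(a.bytes),
            None => Err(UseError::Destroyed),
        }
    }
}
```

If `destroy()` is never called, the allocation in this sketch lives until the last `Rc` handle is dropped, which matches the caveat above about relying on the garbage collector.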

Besides that there are issues with the number of live samplers which is limited to about 2048 (that should actually be affected by the number of bind groups rather than texture views, though, so I'm less sure about how it relates to this bug).

With the latest m-c, the crash happened at texture.resource.clone() in create_texture_view(), because texture.resource was nullptr.

The linked crash reports show the content process crashing (as a result of the GPU process crashing). The specific issue of the content process crashing should be fixed once bug 1873047 lands.
If you have a crash stack that shows some details of what's going on on the GPU process it would be handy.

I'm going to spend some time to better understand this in the coming days.

Flags: needinfo?(nical.bugzilla)

Comment 20

•

1 year ago

•

Edited

If you have a crash stack that shows some details of what's going on on the GPU process it would be handy.

Following the STR with a debugger attached to the GPU process, I got the following stack.

[Inline Frame] xul.dll!d3d12::com::impl$2::clone(d3d12::com::ComPtr<winapi::um::d3d12::ID3D12Resource> * self) Line 69 Rust
xul.dll!wgpu_hal::dx12::device::impl$1::create_texture_view(wgpu_hal::dx12::Device * self, wgpu_hal::dx12::Texture * texture, wgpu_hal::TextureViewDescriptor * desc) Line 469 Rust
xul.dll!wgpu_core::device::resource::Device<wgpu_hal::dx12::Api>::create_texture_view<wgpu_hal::dx12::Api>(alloc::sync::Arc<wgpu_core::resource::Texture<wgpu_hal::dx12::Api>> * self, wgpu_core::resource::TextureViewDescriptor * texture) Line 1174 Rust
xul.dll!wgpu_core::global::Global<wgpu_bindings::identity::IdentityRecyclerFactory>::texture_create_view<wgpu_bindings::identity::IdentityRecyclerFactory,wgpu_hal::dx12::Api>(wgpu_core::id::Id<wgpu_core::resource::Texture<wgpu_hal::empty::Api>> self, wgpu_core::resource::TextureViewDescriptor * texture_id, wgpu_core::id::Id<wgpu_core::resource::TextureView<wgpu_hal::empty::Api>> desc) Line 811 Rust
xul.dll!wgpu_bindings::server::Global::texture_action<wgpu_hal::dx12::Api>(wgpu_core::id::Id<wgpu_core::resource::Texture<wgpu_hal::empty::Api>> self, enum2$<wgpu_bindings::TextureAction> self_id, wgpu_bindings::error::ErrorBuffer action) Line 771 Rust
xul.dll!wgpu_bindings::server::wgpu_server_texture_action(wgpu_bindings::server::Global * global, wgpu_core::id::Id<wgpu_core::resource::Texture<wgpu_hal::empty::Api>> self_id, wgpu_bindings::ByteBuf * byte_buf, wgpu_bindings::error::ErrorBuffer error_buf) Line 933 Rust
xul.dll!mozilla::webgpu::WebGPUParent::RecvTextureAction(unsigned int64 aTextureId, unsigned int64 aDeviceId, const mozilla::ipc::ByteBuf & aByteBuf) Line 1297 C++
xul.dll!mozilla::webgpu::PWebGPUParent::OnMessageReceived(const IPC::Message & msg) Line 420 C++
xul.dll!mozilla::gfx::PCanvasManagerParent::OnMessageReceived(const IPC::Message & msg) Line 279 C++
xul.dll!mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy * aProxy, const IPC::Message & aMsg) Line 1813 C++
xul.dll!mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecycleProxy * aProxy, mozilla::UniquePtr<IPC::Message,mozilla::DefaultDelete<IPC::Message>> aMsg) Line 1736 C++
xul.dll!mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy * aProxy, mozilla::ipc::MessageChannel::MessageTask & aTask) Line 1526 C++
xul.dll!mozilla::ipc::MessageChannel::MessageTask::Run() Line 1632 C++
xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1194 C++
xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 480 C++
xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate * aDelegate) Line 300 C++
[Inline Frame] xul.dll!MessageLoop::RunInternal() Line 370 C++
xul.dll!MessageLoop::RunHandler() Line 364 C++
xul.dll!MessageLoop::Run() Line 346 C++
xul.dll!nsThread::ThreadFunc(void * aArg) Line 372 C++
nss3.dll!_PR_NativeRunThread(void * arg) Line 421 C
nss3.dll!pr_root(void * arg) Line 140 C
[External Code]
[Inline Frame] mozglue.dll!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,void (*)(int, void *, void *)>::operator()(int & aArgs, void * & aArgs, void * & aArgs) Line 150 C++
mozglue.dll!patched_BaseThreadInitThunk(int aIsInitialThread, void * aStartAddress, void * aThreadParam) Line 561 C++
[External Code]

texture.resource.clone() failed because texture.resource was nullptr.

Updated

•

1 year ago

Flags: needinfo?(nical.bugzilla)

Comment 21

•

1 year ago

I reproduced the issue. I see this in the log:

D3D12 ERROR: ID3D12CommandQueue::ExecuteCommandLists: Command lists must be successfully closed before execution. [ EXECUTION ERROR #838: EXECUTECOMMANDLISTS_FAILEDCOMMANDLIST]
Exception thrown at 0x00007FFA9C07CF19 in firefox.exe: Microsoft C++ exception: _com_error at memory location 0x0000007CA7FF5FF8.
D3D12: Removing Device.
D3D12 ERROR: ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_INVALID_CALL: There is strong evidence that the application has performed an illegal or undefined operation, and such a condition could not be returned to the application cleanly through a return code). [ EXECUTION ERROR #232: DEVICE_REMOVAL_PROCESS_AT_FAULT]

Note: I initially suspected we were not properly calling destroy on the swap chain textures, but I verified that we do and that wgpu-core is internally destroying all of the swap chain textures. So my next guess is that it has something to do with the error causing the device to be removed above. If not, fixing the device removal will at least help with getting a cleaner repro. I'll keep digging.

Comment 22

•

1 year ago

A command list isn't successfully closed, with close returning an out-of-memory error. After spending a bit more time in the d3d12 backend I better understand what's going on. Internally both the texture and texture view point to a reference-counted ID3D12Resource which holds the GPU memory allocation. It has been right under my nose the whole time; the reason I kept missing it is that the problem only occurs when the suballocation feature is disabled, which is the case in Gecko but not by default, so the language server was always pointing me to another implementation that does not have this issue.

The solution is to either get suballocation or track texture views in textures and deallocate them eagerly (like you did Sotaro, but from inside wgpu-core where it can be done safely).
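
The shared-ownership problem described above can be sketched with plain reference counting. This is only an illustration under assumed names (`Resource`, `Texture`, `TextureView`, and the `freed` flag are all hypothetical), not wgpu-hal code: as long as a view holds a reference, dropping the texture alone does not release the allocation.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Stand-in for a reference-counted GPU allocation (hypothetical).
struct Resource {
    freed: Rc<Cell<bool>>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        // Runs only when the last reference goes away.
        self.freed.set(true);
    }
}

struct Texture {
    resource: Rc<Resource>,
}

struct TextureView {
    resource: Rc<Resource>,
}

fn create_texture(freed: Rc<Cell<bool>>) -> Texture {
    Texture { resource: Rc::new(Resource { freed }) }
}

/// The view takes its own reference to the texture's allocation.
fn create_view(texture: &Texture) -> TextureView {
    TextureView { resource: Rc::clone(&texture.resource) }
}
```

With this layout, freeing memory promptly requires either dropping every view or making the views give up their reference when the texture is destroyed.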

I'll give it a go.

Assignee: sotaro.ikeda.g → nical.bugzilla

Flags: needinfo?(nical.bugzilla)

Updated

•

1 year ago

See Also: → https://github.com/gfx-rs/wgpu/issues/5079

Comment 23

•

1 year ago

The wgpu-core side change for destroying texture views associated with destroyed textures is up for review in https://github.com/gfx-rs/wgpu/pull/5131

While discussing this with Jeff and Jim we noted that another thing we should do for performance, and which would also have an impact here, is to reuse the canvas textures more aggressively and reuse the associated texture views. Jim filed Bug 1876114.

Assignee

Updated

•

1 year ago

Depends on: 1876389

Assignee

Comment 24

•

1 year ago

gfx-rs/wgpu#5131 has been reviewed and merged, and is awaiting re-vendoring of WGPU (see bug 1876389).

Updated

•

1 year ago

Blocks: 1843891

Comment 25

•

1 year ago

•

Edited

With latest m-c, the STR with pref dom.webgpu.swap-chain.external-texture-dx12 = true did not cause the crash. But global.queue_submit() in wgpu_server_queue_submit() did not return.

Comment 26

•

1 year ago

:nical, do you have any idea about comment 25?

Flags: needinfo?(nical.bugzilla)

Reporter

Comment 27

•

1 year ago

The demo does not animate with the latest Nightly (containing bug 1876389) and dx12-no-readback thingy enabled... A lot of other webgpu demos are not working either (https://react-webgpu-samples.vercel.app/ , https://webgpu.github.io/webgpu-samples/samples/resizeCanvas, etc.)

Comment 28

•

1 year ago

I heard that a recent wgpu update caused several deadlock problems.

Comment 29

•

1 year ago

Call stack when the deadlock from comment 25 happened:

xul.dll!parking_lot_core::thread_parker::imp::waitaddress::WaitAddress::wait_on_address(core::sync::atomic::AtomicUsize * self, unsigned int key) Line 100 Rust
xul.dll!parking_lot_core::thread_parker::imp::waitaddress::WaitAddress::park(core::sync::atomic::AtomicUsize * self) Line 53 Rust
xul.dll!parking_lot_core::thread_parker::imp::impl$1::park(parking_lot_core::thread_parker::imp::ThreadParker * self) Line 119 Rust
xul.dll!parking_lot_core::parking_lot::park::closure$0(parking_lot_core::parking_lot::park::closure_env$0<parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$0,parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$1,parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$2> thread_data, parking_lot_core::parking_lot::ThreadData *) Line 635 Rust
xul.dll!parking_lot_core::parking_lot::with_thread_data(parking_lot_core::parking_lot::park::closure_env$0<parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$0,parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$1,parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$2> f) Line 207 Rust
xul.dll!parking_lot_core::parking_lot::park(unsigned __int64 key, parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$0 validate, parking_lot::raw_rwlock::impl$10::wait_for_readers::closure_env$1) Line 600 Rust
xul.dll!parking_lot::raw_rwlock::RawRwLock::wait_for_readers(enum2$<core::option::Option<std::time::Instant>> self, unsigned __int64 prev_value) Line 1013 Rust
xul.dll!parking_lot::raw_rwlock::RawRwLock::lock_exclusive_slow(enum2$<core::option::Option<std::time::Instant>> self) Line 645 Rust
xul.dll!parking_lot::raw_rwlock::impl$0::lock_exclusive(parking_lot::raw_rwlock::RawRwLock * self) Line 73 Rust
xul.dll!lock_api::rwlock::RwLock<parking_lot::raw_rwlock::RawRwLock,tuple$<>>::write() Line 480 Rust
xul.dll!wgpu_core::snatch::SnatchLock::write() Line 90 Rust
xul.dll!wgpu_core::resource::impl$19::drop(wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api> * self) Line 1045 Rust
xul.dll!core::ptr::drop_in_place<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>(wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api> *) Line 497 Rust
xul.dll!alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>::drop_slow<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>() Line 1266 Rust
xul.dll!alloc::sync::impl$27::drop(alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>> * self) Line 1897 Rust
xul.dll!core::ptr::drop_in_place(alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>> *) Line 497 Rust
xul.dll!core::ptr::drop_in_place(tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>> *) Line 497 Rust
xul.dll!core::ptr::mut_ptr::impl$0::drop_in_place(tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>> * self) Line 1431 Rust
xul.dll!hashbrown::raw::Bucket<tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>>>::drop() Line 581 Rust
xul.dll!hashbrown::raw::RawTable<tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>>,alloc::alloc::Global>::drop_elements<tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>>,alloc::alloc::Global>() Line 1038 Rust
xul.dll!hashbrown::raw::impl$17::drop(hashbrown::raw::RawTable<tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>>,alloc::alloc::Global> * self) Line 2699 Rust
xul.dll!core::ptr::drop_in_place(hashbrown::raw::RawTable<tuple$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>>,alloc::alloc::Global> *) Line 497 Rust
xul.dll!core::ptr::drop_in_place(hashbrown::map::HashMap<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>,core::hash::BuildHasherDefault<rustc_hash::FxHasher>,alloc::alloc::Global> *) Line 497 Rust
xul.dll!core::ptr::drop_in_place(std::collections::hash::map::HashMap<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Texture>>,alloc::sync::Arc<wgpu_core::resource::DestroyedTexture<wgpu_hal::dx12::Api>>,core::hash::BuildHasherDefault<rustc_hash::FxHasher>> *) Line 497 Rust
xul.dll!core::ptr::drop_in_place<wgpu_core::device::life::ResourceMaps<wgpu_hal::dx12::Api>>(wgpu_core::device::life::ResourceMaps<wgpu_hal::dx12::Api> *) Line 497 Rust
xul.dll!wgpu_core::device::life::LifetimeTracker<wgpu_hal::dx12::Api>::triage_submissions<wgpu_hal::dx12::Api>(unsigned __int64 self, wgpu_core::device::CommandAllocator<wgpu_hal::dx12::Api> * last_done) Line 368 Rust
xul.dll!wgpu_core::device::resource::Device<wgpu_hal::dx12::Api>::maintain<wgpu_hal::dx12::Api>(wgpu_hal::dx12::Fence * self, enum2$<wgpu_types::Maintain<wgpu_core::device::queue::WrappedSubmissionIndex>> fence) Line 357 Rust
xul.dll!wgpu_core::global::Global::queue_submit<wgpu_hal::dx12::Api>(wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Queue>> self, ref$<slice2$<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::CommandBuffer>>>> queue_id) Line 1546 Rust
xul.dll!wgpu_bindings::server::wgpu_server_queue_submit(wgpu_bindings::server::Global * global, wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Queue>> self_id, wgpu_core::id::Id<enum2$<wgpu_core::id::markers::CommandBuffer>> * command_buffer_ids, unsigned __int64 command_buffer_id_length, wgpu_bindings::error::ErrorBuffer error_buf) Line 1099 Rust
xul.dll!mozilla::webgpu::WebGPUParent::RecvQueueSubmit(unsigned __int64 aQueueId, unsigned __int64 aDeviceId, const nsTArray<unsigned long long> & aCommandBuffers, const nsTArray<unsigned long long> & aTextureIds) Line 760

Jim Blandy :jimb

Comment 30

•

1 year ago

It seems like SnatchLock::write is hanging, probably because an older frame is holding a read lock.
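
The suspected mechanism can be shown with a plain reader-writer lock. The stack above goes through parking_lot's `RwLock`; the sketch below uses `std::sync::RwLock` as a stand-in, and `writer_can_proceed` is a hypothetical helper. A writer cannot acquire the lock while any read guard is still held, so a blocking `write()` in that situation waits indefinitely if the reader never releases.

```rust
use std::sync::RwLock;

/// Returns true if a writer could take the lock right now.
/// `try_write` never blocks: it fails immediately while a read guard
/// (or another write guard) is outstanding.
fn writer_can_proceed(lock: &RwLock<u32>) -> bool {
    lock.try_write().is_ok()
}
```

Calling the blocking `write()` instead of `try_write()` while the same code path still holds a read guard would hang, which is the shape of the deadlock suspected here.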

Comment 31

•

1 year ago

Here we go: https://github.com/gfx-rs/wgpu/pull/5216

Flags: needinfo?(nical.bugzilla)

Reporter

Comment 32

•

1 year ago

•

Edited

I still get the crash with the latest Nightly; it just takes longer. I opened the testcase and let it run for 60-120 seconds. The browser crashed. https://crash-stats.mozilla.org/report/index/265b8bd0-4ba3-4c56-bedb-43a840240210
(ni? :nical so they see my comment)

Reporter

Updated

•

1 year ago

Flags: needinfo?(nical.bugzilla)

Comment 33

•

1 year ago

With a debugger attached, the log output contained the following.

D3D11: Removing Device.
Exception thrown at 0x00007FFE1EE94D8C (KernelBase.dll) in firefox.exe: WinRT originate error - 0x887A0005 : 'The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.'

Comment 34

•

1 year ago

Attached patch patch - Add log (obsolete) — Details — Splinter Review

Attachment #9373170 - Attachment is obsolete: true

Comment 35

•

1 year ago

With the patch in Attachment 9379901 [details] [diff], a device reset happened when the dx12 TextureView count reached 1753.

Comment 36

•

1 year ago

Erich will pick this up going forward.

Flags: needinfo?(nical.bugzilla) → needinfo?(egubler)

Updated

•

1 year ago

Depends on: 1879989

Reporter

Comment 37

•

1 year ago

I still get a crash on the testcase (open the testcase and maximize the "initial density" slider) : https://crash-stats.mozilla.org/report/index/73a2c52f-5029-4ed9-b1b3-a88be0240229

Comment 38

•

1 year ago

•

Edited

I also get a crash. With the log patch in Attachment 9388300 [details] [diff], a device reset happened when the dx12 TextureView count reached 1582.

Comment 39

•

1 year ago

Attached patch patch - Add log (obsolete) — Details — Splinter Review

Attachment #9379901 - Attachment is obsolete: true

Assignee

Updated

•

1 year ago

Assignee: nical.bugzilla → egubler

Status: NEW → ASSIGNED

Flags: needinfo?(egubler)

Assignee

Updated

•

1 year ago

Priority: P2 → P1

Assignee

Comment 40

•

1 year ago

Passing investigation on to :sotaro.

Assignee: egubler → sotaro.ikeda.g

Updated

•

1 year ago

Depends on: 1881518

Updated

•

1 year ago

See Also: → https://github.com/gfx-rs/wgpu/pull/5131

Comment 41

•

1 year ago

Attached patch patch - Add log — Details — Splinter Review

With the patch on the latest m-c, a device reset happened when the count of dx12 TextureViews reached 1978. This suggests that the large number of dx12 TextureViews triggered the device reset.

The TextureView count grew large because the dx12 TextureViews stayed alive until texture_view_drop() was called.

texture_view_drop() is triggered from TextureView::Cleanup(), which is not called often because it is triggered by cycle collection.

As a result, the count of dx12 TextureViews became large.
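
The growth pattern described above can be modelled with a few lines of Rust. This is a hypothetical model, not a measurement: it assumes one new view per frame and a collector that releases all views every `collect_every` frames.

```rust
/// Peak number of live views over `frames` frames (hypothetical model).
/// One view is created per frame; views are only released when the
/// collector runs, which happens every `collect_every` frames.
fn peak_live_views(frames: usize, collect_every: usize) -> usize {
    assert!(collect_every > 0, "collect_every must be nonzero");
    let mut live = 0usize;
    let mut peak = 0usize;
    for frame in 1..=frames {
        live += 1;
        peak = peak.max(live);
        if frame % collect_every == 0 {
            // The collector ran: every view created so far is released.
            live = 0;
        }
    }
    peak
}
```

Under these assumptions, releasing views every frame keeps the peak at 1, while a collector that runs only every 2000 frames lets the peak reach 2000, the same order of magnitude as the counts logged in this bug.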

Attachment #9388300 - Attachment is obsolete: true

Updated

•

1 year ago

Attachment #9363420 - Attachment is obsolete: false

Updated

•

1 year ago

Attachment #9363420 - Attachment description: Bug 1863872 - Call TextureView::Cleanup() in Texture::ForceDestroy() → Bug 1863872 - Drop TextureView in Texture::Destroy()

Comment 42

•

1 year ago

•

Edited

With D193511, the problem did not happen for me. But it is not a correct fix for wgpu.

:nical wants a correct fix in wgpu. The following is a comment from :nical.

We can't drop a texture view while the JS object still exists, so we can't take this patch in its current state. That said wgpu internally does the same thing: textures have a list of texture views and when destroy is called on a texture, the internal resource of its views are internally removed.
ErichDonGubler: it would be good to double check that this system is working as expected and more generally instrument the number and size of all hal resources over time. Maybe we are incorrectly tracking the views of a texture or maybe the number of texture views is just a correlation.
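
One possible shape for the instrumentation suggested above, as a sketch only: a drop guard that counts live resources of one kind and remembers the peak. `ResourceStats` and `Tracked` are hypothetical names, not part of wgpu; a real version would presumably also record sizes and log them over time.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Shared counters for one resource kind (hypothetical instrumentation).
#[derive(Default)]
struct ResourceStats {
    live: AtomicUsize,
    peak: AtomicUsize,
}

/// Guard stored inside a resource; it updates the counters when the
/// resource is created and again when it is dropped.
struct Tracked {
    stats: Arc<ResourceStats>,
}

impl Tracked {
    fn new(stats: &Arc<ResourceStats>) -> Self {
        let now = stats.live.fetch_add(1, Ordering::Relaxed) + 1;
        stats.peak.fetch_max(now, Ordering::Relaxed);
        Tracked { stats: Arc::clone(stats) }
    }
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.stats.live.fetch_sub(1, Ordering::Relaxed);
    }
}
```

Embedding one `Tracked` field per texture view (and per texture, buffer, and so on) would show whether the view count really tracks the device resets or is only correlated with them.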

Comment 43

•

1 year ago

Re-assign to :ErichDonGubler.

Assignee: sotaro.ikeda.g → egubler

Jim Blandy :jimb

Updated

•

1 year ago

Attachment #9390097 - Attachment is patch: true

Attachment #9390097 - Attachment mime type: application/octet-stream → text/plain

Bob Hood [:bhood]

Comment 44

•

1 year ago

•

Edited

For posterity, the link in the description for this report has expired/disappeared. Here is a functioning link to the WebGPU demo: https://usegpu.live/demo/geometry/data

Assignee

Comment 45

•

1 year ago

After pairing with :jimb and :nical, I believe we have a fix for the device getting reset by having too many textures: wgpu#5378. I've included more narrative about the fix there.

:sotaro, can you confirm that this resolves the crash for you? You should be able to re-vendor WGPU in your local checkout with mach vendor --ignore-modified --force gfx/wgpu_bindings/moz.yaml --revision 27991d1b272b3d367d446daece6fde58d3cdfb5d.

Assuming that the fix works, and we get it landed promptly, this fix should arrive with the next iteration of webgpu-update-wgpu (which I'm responsible for this week).

Assignee

Updated

•

1 year ago

Flags: needinfo?(sotaro.ikeda.g)

Comment 46

•

1 year ago

:ErichDonGubler, thank you! I confirmed that the problem is addressed for me!!!

Flags: needinfo?(sotaro.ikeda.g)

Updated

•

1 year ago

Attachment #9363420 - Attachment is obsolete: true

Updated

•

1 year ago

Attachment #9373171 - Attachment is obsolete: true

Assignee

Comment 47

•

1 year ago

wgpu#5378 has merged upstream. Now awaiting the next iteration of webgpu-update-wgpu.

Depends on: webgpu-update-wgpu

Assignee

Updated

•

1 year ago

No longer depends on: webgpu-update-wgpu

Assignee

Updated

•

1 year ago

Depends on: 1884946

Reporter

Comment 48

•

1 year ago

testing autoland builds from bug 1884946 seems to fix this bug

Reporter

Comment 49

•

1 year ago

This is fixed on the latest Nightly.

Ryan VanderMeulen [:RyanVM]

Assignee

Updated

•

1 year ago

Status: ASSIGNED → RESOLVED

Closed: 1 year ago

Resolution: --- → FIXED

Updated

•

1 year ago

status-firefox123: --- → disabled

status-firefox124: --- → disabled

status-firefox125: --- → fixed

Target Milestone: --- → 125 Branch

Catalin Sasca, Desktop Test Engineering [:csasca]

Updated

•

11 months ago

Flags: qe-verify+

Ina Popescu, Desktop QA

Comment 50

•

11 months ago

Attached image Fx125.0b5.png — Details

I've replicated this issue using Nightly 121.0a1 (2023-11-08) on Windows 10 x64 following the STR from Comment 0, while pref dom.webgpu.swap-chain.external-texture-dx12=true.
However, I'm unable to verify this in Firefox 125.0b5 as the provided test case does not work as expected (warning message "WebGPU is not working in our browser"). Please refer to the attached screenshot for details.
I can confirm that the issue no longer occurs in the latest Nightly 126.0a1 version.

Flags: needinfo?(egubler)