Hit MOZ_CRASH(PipelineLayout[119] does not exist) at /third_party/rust/wgpu-core/src/storage.rs:125
Categories
(Core :: Graphics: WebGPU, defect, P2)
Tracking
()
Release | Tracking | Status |
---|---|---|
firefox-esr115 | --- | unaffected |
firefox119 | --- | unaffected |
firefox120 | --- | disabled |
firefox121 | --- | verified |
People
(Reporter: jkratzer, Assigned: bradwerth)
References
(Blocks 2 open bugs, Regression)
Details
(Keywords: regression, testcase, Whiteboard: [bugmon:bisected,confirmed][fuzzblocker])
Attachments
(2 files)
Testcase found while fuzzing mozilla-central rev ffe93e4e0835 (built with: --enable-debug --enable-fuzzing).
Testcase can be reproduced using the following commands:
$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build ffe93e4e0835 --debug --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.html
Hit MOZ_CRASH(PipelineLayout[119] does not exist) at /third_party/rust/wgpu-core/src/storage.rs:125
==232244==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f35c89e1b55 bp 0x7f35a069ae00 sp 0x7f35a069adf0 T232368)
==232244==The signal is caused by a WRITE memory access.
==232244==Hint: address points to the zero page.
#0 0x7f35c89e1b55 in MOZ_Crash /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:281:3
#1 0x7f35c89e1b55 in RustMozCrash /mozglue/static/rust/wrappers.cpp:18:3
#2 0x7f35c89e1aea in mozglue_static::panic_hook::habfbf582d66d5c86 /mozglue/static/rust/lib.rs:96:9
#3 0x7f35c89e14eb in core::ops::function::Fn::call::h081d0c2d4ea076dc /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/ops/function.rs:79:5
#4 0x7f35c9a4d1fd in _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..Fn$LT$Args$GT$$GT$::call::hb3a915ffd78277c6 /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/alloc/src/boxed.rs:2007:9
#5 0x7f35c9a4d1fd in std::panicking::rust_panic_with_hook::h75cd912a39a34e8a /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/std/src/panicking.rs:709:13
#6 0x7f35c9a4cf86 in std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h1498b46f7849e167 /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/std/src/panicking.rs:597:13
#7 0x7f35c9a4a245 in std::sys_common::backtrace::__rust_end_short_backtrace::hd36a39b27b98086b /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/std/src/sys_common/backtrace.rs:151:18
#8 0x7f35c9a4ccd1 in rust_begin_unwind /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/std/src/panicking.rs:593:5
#9 0x7f35c9aac9b2 in core::panicking::panic_fmt::h98ef273141454c23 /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/panicking.rs:67:14
#10 0x7f35c7bb7cec in wgpu_server_pipeline_layout_drop /gfx/wgpu_bindings/src/server.rs:982:5
#11 0x7f35c1d7f3f9 in mozilla::webgpu::WebGPUParent::RecvImplicitLayoutDestroy(unsigned long, nsTArray<unsigned long> const&) /dom/webgpu/ipc/WebGPUParent.cpp:834:3
#12 0x7f35c1d90dd0 in mozilla::webgpu::PWebGPUParent::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PWebGPUParent.cpp:2002:80
#13 0x7f35bfe0bc0d in mozilla::gfx::PCanvasManagerParent::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PCanvasManagerParent.cpp:269:32
#14 0x7f35bf37fc1f in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) /ipc/glue/MessageChannel.cpp:1800:25
#15 0x7f35bf37c972 in mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message>>) /ipc/glue/MessageChannel.cpp:1725:9
#16 0x7f35bf37d5f2 in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::ipc::MessageChannel::MessageTask&) /ipc/glue/MessageChannel.cpp:1525:3
#17 0x7f35bf37e73f in mozilla::ipc::MessageChannel::MessageTask::Run() /ipc/glue/MessageChannel.cpp:1623:14
#18 0x7f35be6c7c4d in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1192:16
#19 0x7f35be6cebdd in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:480:10
#20 0x7f35bf386e4e in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:300:20
#21 0x7f35bf29fc41 in RunHandler /ipc/chromium/src/base/message_loop.cc:363:3
#22 0x7f35bf29fc41 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:345:3
#23 0x7f35be6c2f33 in nsThread::ThreadFunc(void*) /xpcom/threads/nsThread.cpp:370:10
#24 0x7f35d38d4d0f in _pt_root /nsprpub/pr/src/pthreads/ptthread.c:201:5
#25 0x7f35d4175ac2 in start_thread nptl/pthread_create.c:442:8
#26 0x7f35d4207a3f misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:281:3 in MOZ_Crash
==232244==ABORTING
Reporter
Comment 1•9 months ago
Comment 2•9 months ago
Verified bug as reproducible on mozilla-central 20231023141548-0dce3814f2ad.
The bug appears to have been introduced in the following build range:
Start: e0dd0b10e8fd0ea751f11fb0a6548ad9b6780e16 (20231016153418)
End: fa12efd7ca249d06b27ea86690ae0d0478f5dcce (20231016182434)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=e0dd0b10e8fd0ea751f11fb0a6548ad9b6780e16&tochange=fa12efd7ca249d06b27ea86690ae0d0478f5dcce
Comment 3•8 months ago
This bug has been marked as a regression. Setting status flag for Nightly to affected.
Comment 4•8 months ago
This bug prevents fuzzing from making progress; however, it has low severity. It is important for fuzz blocker bugs to be addressed in a timely manner (see here why?).
:jimb, could you consider increasing the severity?
For more information, please visit BugBot documentation.
Assignee
Comment 5•8 months ago
This is happening because when we create a pipeline on an invalid device, we don't respond adequately to the error and we throw away the error-generated id. What we need to do is one or both of:
- Respond to the error by invalidating the newly-created pipeline.
- Take in the error-generated id and change the newly-created pipeline to that id.
First and foremost, we need to respond better to the error, so I'll try to build a patch that pursues strategy #1, above.
Assignee
Comment 6•8 months ago
Ugh, this is tricky. Since creation of a render pipeline gets encoded as a device action and sent in a way that won't get an immediate failure on a lost device, it's not easy to invalidate the content-side pipeline object. There may need to be an invalidation message sent from parent to child when the error has been generated. Alternatively, maybe the parent can first check if the id maps to an error in the registry, before asking wgpu to pipeline_layout_drop it through the gfx_select! macro.
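The "check the registry before dropping" idea can be sketched roughly as follows. This is a toy model only: `ToyRegistry`, `Slot`, `DropOutcome` and `guarded_drop` are hypothetical names invented for illustration and are not part of wgpu-core or the Firefox bindings. It only shows the shape of a drop path that inspects the slot first instead of assuming a valid entry.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a registry slot; not wgpu-core's real type.
enum Slot {
    Valid,
    Error,
}

/// What the guarded drop decided to do with the id.
#[derive(Debug, PartialEq)]
enum DropOutcome {
    /// A valid resource was found and released.
    Dropped,
    /// The id mapped to an error entry, which was simply cleared.
    ErrorEntryCleared,
    /// No entry at all: nothing to do, and no panic.
    NothingToDrop,
}

/// Toy registry keyed by raw ids (assumption: ids are plain u64 values).
#[derive(Default)]
struct ToyRegistry {
    slots: HashMap<u64, Slot>,
}

impl ToyRegistry {
    /// Inspect the slot before releasing it, so an error entry or a
    /// missing entry is reported to the caller rather than causing a panic.
    fn guarded_drop(&mut self, id: u64) -> DropOutcome {
        match self.slots.remove(&id) {
            Some(Slot::Valid) => DropOutcome::Dropped,
            Some(Slot::Error) => DropOutcome::ErrorEntryCleared,
            None => DropOutcome::NothingToDrop,
        }
    }
}
```

In this model, calling `guarded_drop` with an id that was never registered returns `DropOutcome::NothingToDrop`, whereas the crash in this bug corresponds to a lookup that panics in that case.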
Assignee
Comment 7•8 months ago
Actually, our panic is that there is no entry for the id, so that implies that the error setting in device_create_render_pipeline is not getting executed.
Assignee
Comment 8•8 months ago
Okay, the id is getting correctly set to an error in hub.render_pipelines but it is being retrieved from hub.pipeline_layouts, where it doesn't exist. Perhaps we are calling the wrong function in response to the pipeline drop?
Assignee
Comment 9•8 months ago
Okay, wgpu_client_create_compute_pipeline sets an implicit pipeline layout id in the child, without knowing that the pipeline creation itself will eventually fail. When that fails in device_create_render_pipeline, the pipeline layout id is never inserted by create_render_pipeline, so it doesn't exist when the pipeline is eventually dropped and tries to also drop its implicit pipeline layout.
Not sure what would be the best solution here. wgpu won't tolerate the retrieval of a non-existent id (that's the panic that motivates this Bug). That's not going to change in wgpu -- it's part of the design choice. Our child view of the render pipeline assumes all is well and is never notified that its creation failed. It also assumes that the render pipeline will have an implicit pipeline layout with the same id.
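The failure sequence described above can be modelled with a small toy, shown here as a sketch. `ToyHub`, `Entry` and both methods are hypothetical and only imitate the behaviour described in this comment; the real code lives in wgpu-core and gfx/wgpu_bindings. The strict drop returns `Err` where the real code panics.

```rust
use std::collections::HashMap;

/// Hypothetical registry entry; not wgpu-core's real type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Entry {
    Valid,
    Error,
}

/// Toy hub with one map per resource kind (assumption: ids are u64).
#[derive(Default)]
struct ToyHub {
    render_pipelines: HashMap<u64, Entry>,
    pipeline_layouts: HashMap<u64, Entry>,
}

impl ToyHub {
    /// Models the early return on an invalid device: the pipeline id gets
    /// an error entry, but the implicit layout id chosen by the child is
    /// never registered.
    fn create_render_pipeline(
        &mut self,
        device_valid: bool,
        pipeline_id: u64,
        implicit_layout_id: u64,
    ) {
        if !device_valid {
            self.render_pipelines.insert(pipeline_id, Entry::Error);
            return;
        }
        self.pipeline_layouts.insert(implicit_layout_id, Entry::Valid);
        self.render_pipelines.insert(pipeline_id, Entry::Valid);
    }

    /// Strict removal: reports a missing id as an error. The real code
    /// panics at this point instead of returning.
    fn drop_pipeline_layout(&mut self, id: u64) -> Result<Entry, String> {
        self.pipeline_layouts
            .remove(&id)
            .ok_or_else(|| format!("PipelineLayout[{id}] does not exist"))
    }
}
```

Creating a pipeline with `device_valid = false` and then dropping its implicit layout id yields the `Err` case in this model, which mirrors the message in the crash title.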
Possible fixes:
- wgpu could be made to supply an invalid pipeline layout id when failing to create a pipeline. But why should it?
- WebGPUParent::RecvDeviceAction could notify the child when something in the SendDeviceAction fails. But that would involve reparsing the byte buffer to see what actions were being attempted and then unwinding them. This would be nasty.
I'm going to think about this for a while and see if I can come up with a more palatable fix, because both of these options are bad.
Assignee
Comment 10•8 months ago
Alright, I think the fix will need to be in wgpu, but we'll need to build a temporary remediation in Firefox until wgpu is re-vendored with the fix. I've confirmed that moving the check of device.valid in device_create_render_pipeline further down, past the call to device.create_render_pipeline, is sufficient to fix the Bug. That's because device.create_render_pipeline sets the error value for the implicit pipeline layout id, ensuring that when the pipeline is later dropped and the client assumes that layout id exists, it will be found in wgpu.
So, a four-stage fix, the first part of which will be done in this Bug:
- Stop destroying the implicit pipeline layout id, which will leak memory. Add a comment explaining why, referencing a new Bug that will revert this behavior.
- Build a fix in wgpu and get it accepted.
- Re-vendor wgpu, tracked in Bug 1851881.
- In a to-be-filed Bug, revert the changes in Step 1.
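The effect of moving the device.valid check past the inner creation call can be illustrated with a toy model. Everything here is an assumption made for illustration: `ToyHub`, `CheckOrder`, `inner_create` and the rule that the inner call always records an entry for the implicit layout id are hypothetical, and the actual change is the one in the wgpu pull request, not this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical registry entry; not wgpu-core's real type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Entry {
    Valid,
    Error,
}

/// Where the device validity check sits relative to the inner call.
#[derive(Clone, Copy, PartialEq)]
enum CheckOrder {
    BeforeCreate,
    AfterCreate,
}

#[derive(Default)]
struct ToyHub {
    render_pipelines: HashMap<u64, Entry>,
    pipeline_layouts: HashMap<u64, Entry>,
}

impl ToyHub {
    /// Assumed behaviour of the inner call: it always records an entry for
    /// the implicit layout id, using an error entry when creation fails.
    /// Returns whether creation succeeded.
    fn inner_create(&mut self, device_valid: bool, implicit_layout_id: u64) -> bool {
        let entry = if device_valid { Entry::Valid } else { Entry::Error };
        self.pipeline_layouts.insert(implicit_layout_id, entry);
        device_valid
    }

    fn create_render_pipeline(
        &mut self,
        order: CheckOrder,
        device_valid: bool,
        pipeline_id: u64,
        implicit_layout_id: u64,
    ) {
        // Early check: bail out before the implicit layout id is recorded.
        if order == CheckOrder::BeforeCreate && !device_valid {
            self.render_pipelines.insert(pipeline_id, Entry::Error);
            return;
        }
        // Late check: the inner call runs first, so the implicit layout id
        // has an entry even when the pipeline itself ends up as an error.
        let ok = self.inner_create(device_valid, implicit_layout_id);
        let entry = if ok { Entry::Valid } else { Entry::Error };
        self.render_pipelines.insert(pipeline_id, entry);
    }
}
```

With `CheckOrder::BeforeCreate` and an invalid device, the layout map stays empty, so a later drop of the implicit layout id finds nothing. With `CheckOrder::AfterCreate`, the same call leaves an error entry for that id, which a later drop can find.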
Assignee
Comment 11•8 months ago
Assignee
Comment 12•8 months ago
The wgpu-side fix is https://github.com/gfx-rs/wgpu/pull/4624.
Comment 13•8 months ago
Pushed by bwerth@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b361ef38ca02 Stop destroying implicit pipeline layouts and implicit bind group layouts. r=webgpu-reviewers,ErichDonGubler
Comment 14•8 months ago
bugherder
Comment 15•8 months ago
Verified bug as fixed on rev mozilla-central 20231103051812-8be76292bf3f.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.