Closed Bug 1877046 Opened 4 months ago Closed 4 months ago

Crash in [@ <wgpu_core::device::DeviceLostClosureC as core::ops::drop::Drop>::drop]

Categories

(Core :: Graphics: WebGPU, defect, P2)

Firefox 124
Other
Linux
defect

Tracking

()

RESOLVED FIXED
124 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox122 --- unaffected
firefox123 --- unaffected
firefox124 + fixed

People

(Reporter: release-mgmt-account-bot, Assigned: bradwerth)

References

(Blocks 2 open bugs, Regression)

Details

(4 keywords, Whiteboard: [fuzzblocker])

Crash Data

Attachments

(1 file, 2 obsolete files)

Crash report: https://crash-stats.mozilla.org/report/index/dc07d2f7-7850-4058-aa1f-6e5360240128

MOZ_CRASH Reason: DeviceLostClosureC must be consumed before it is dropped.

Top 10 frames of crashing thread:

0  libxul.so  MOZ_Crash  mfbt/Assertions.h:301
0  libxul.so  RustMozCrash  mozglue/static/rust/wrappers.cpp:18
1  libxul.so  mozglue_static::panic_hook  mozglue/static/rust/lib.rs:96
2  libxul.so  core::ops::function::Fn::call  library/core/src/ops/function.rs:79
3  libxul.so  <alloc::boxed::Box<F, A> as core::ops::function::Fn<Args>>::call  library/alloc/src/boxed.rs:2021
3  libxul.so  std::panicking::rust_panic_with_hook  library/std/src/panicking.rs:783
4  libxul.so  std::panicking::begin_panic_handler::{{closure}}  library/std/src/panicking.rs:649
5  libxul.so  std::sys_common::backtrace::__rust_end_short_backtrace  library/std/src/sys_common/backtrace.rs:170
6  libxul.so  rust_begin_unwind  library/std/src/panicking.rs:645
7  libxul.so  core::panicking::panic_fmt  library/core/src/panicking.rs:72

By querying Nightly crashes reported within the last 2 months, here are some insights about the signature:

  • First crash report: 2024-01-27
  • Process type: Parent
  • Is startup crash: No
  • Has user comments: Yes
  • Is null crash: Yes - all crashes happened on null or near null memory address

By analyzing the backtrace, the regression may have been introduced by a patch [1] to fix Bug 1875543.

[1] https://hg.mozilla.org/mozilla-central/rev?node=ed135d395181

:ErichDonGubler, since you are the author of the potential regressor, could you please take a look?

Flags: needinfo?(egubler)
Duplicate of this bug: 1877045
Severity: -- → S3
Crash Signature: [@ <wgpu_core::device::DeviceLostClosureC as core::ops::drop::Drop>::drop] → [@ <wgpu_core::device::DeviceLostClosureC as core::ops::drop::Drop>::drop] [@ wgpu_core::device::impl$6::drop ]

Pretty sure the regressor set here also involves the patch stack for bug 1865921; NI'ing :bradwerth and :nical, who worked on this.

Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(egubler)
Flags: needinfo?(bwerth)
Regressed by: 1865921

I will figure this out.

Assignee: nobody → bwerth
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(bwerth)

This is happening because we can bail on the wgpu connection without first dropping/destroying all of its resources, which makes sense, that's fine. The panic in DeviceLostClosureC is well-intended but not strictly necessary because we defensively reclaim the memory in the WebGPUParent destructor just in case some calling pattern like this was possible. So now we need to go back and chill out wgpu and eliminate this panic.

One of the fixes will be to do the "right thing" during device.prepare_to_die. That right thing is probably to invoke some version of device.lose.

Attached file testcase.html

I can get a Pernosco session if needed.

Blocks: domino
Flags: in-testsuite?
Keywords: testcase
Whiteboard: [fuzzblocker]

Can also repro with the testcase from bug 1877600 (https://bugzilla.mozilla.org/attachment.cgi?id=9377313) - open the testcase, let it sit for 2-3 seconds and then close the tab.

The bug is marked as tracked for firefox124 (nightly). However, the bug still has low severity.

:bhood, could you please increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(bhood)

:bradwerth now that https://github.com/gfx-rs/wgpu/pull/5168 is merged, what are the next steps? Is there anything that needs to happen mozilla-central?

Flags: needinfo?(bwerth)

We need the next re-vendoring of wgpu into Firefox. That happens on a kinda-weekly basis, and the meta Bug for re-vendoring is Bug 1851881. There's one re-vendoring being done now (Bug 1876389), but I'm not sure that it will include the work that was merged on January 30. If it does, then this Bug will be resolved when that Bug lands. If not, then it'll be resolved with the next one. I'll mark the appropriate Bug as a blocker when I know.

Flags: needinfo?(bwerth)
Flags: needinfo?(bhood)

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 desktop browser crashes on nightly

:bradwerth, could you consider increasing the severity of this top-crash bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(bwerth)
Keywords: topcrash
Flags: needinfo?(bwerth)
Severity: S3 → S2
Priority: -- → P2

The fix is part of Bug 1876389.

Depends on: 1876389
Attached patch childIsAlive.patch (obsolete) — Splinter Review

This is a needed fixup that may need to be part of the re-vendoring, or may be landed as part of this Bug.

Comment on attachment 9378330 [details] [diff] [review]
childIsAlive.patch

># HG changeset patch
># User Brad Werth <bwerth@mozilla.com>
># Date 1706920151 28800
>#      Fri Feb 02 16:29:11 2024 -0800
># Node ID a2a010eda0dc67a42ee4d53db8a238df27c71f38
># Parent  9ca12d444230411e2168420a584a7f95e90c1f97
>Bug XXX: Make WebGPUParent::LoseDevice check for child existence before attempt to send a message.
>
>diff --git a/dom/webgpu/ipc/WebGPUParent.cpp b/dom/webgpu/ipc/WebGPUParent.cpp
>--- a/dom/webgpu/ipc/WebGPUParent.cpp
>+++ b/dom/webgpu/ipc/WebGPUParent.cpp
>@@ -335,8 +335,9 @@ void WebGPUParent::MaintainDevices() {
> 
> void WebGPUParent::LoseDevice(const RawId aDeviceId, Maybe<uint8_t> aReason,
>                               const nsACString& aMessage) {
>-  // Check to see if we've already sent a DeviceLost message to aDeviceId.
>-  if (mLostDeviceIds.Contains(aDeviceId)) {
>+  // Check to see if we've lost the child, or already sent a DeviceLost
>+  // to the child.
>+  if (!ChildIsAlive(aDeviceId) || mLostDeviceIds.Contains(aDeviceId)) {
>     return;
>   }
> 
>@@ -395,6 +396,11 @@ void WebGPUParent::ReportError(const May
>   }
> }
> 
>+bool WebGPUParent::ChildIsAlive(RawId aDeviceId) {
>+  return mErrorScopeStackByDevice.find(aDeviceId) !=
>+         mErrorScopeStackByDevice.end();
>+}
>+
> ipc::IPCResult WebGPUParent::RecvInstanceRequestAdapter(
>     const dom::GPURequestAdapterOptions& aOptions,
>     const nsTArray<RawId>& aTargetIds,
>diff --git a/dom/webgpu/ipc/WebGPUParent.h b/dom/webgpu/ipc/WebGPUParent.h
>--- a/dom/webgpu/ipc/WebGPUParent.h
>+++ b/dom/webgpu/ipc/WebGPUParent.h
>@@ -190,6 +190,8 @@ class WebGPUParent final : public PWebGP
>   void ReportError(Maybe<RawId> aDeviceId, GPUErrorFilter,
>                    const nsCString& message);
> 
>+  bool ChildIsAlive(RawId aDeviceId);
>+
>   static Maybe<ffi::WGPUFfiLUID> GetCompositorDeviceLuid();
> 
>   UniquePtr<ffi::WGPUGlobal> mContext;
Attachment #9378330 - Attachment is obsolete: true
Attached file childIsAlive.patch (obsolete) —

Needed fixup.

:bradwerth just so i understand correctly, you still need to land childIsAlive.patch in order for all the crashes here to go away?

Flags: needinfo?(bwerth)

(In reply to Dianna Smith [:diannaS] from comment #18)

:bradwerth just so i understand correctly, you still need to land childIsAlive.patch in order for all the crashes here to go away?

Fortunately, no. A sufficient fix was landed as part of the re-vendoring: https://searchfox.org/mozilla-central/rev/57516f1ab5660f9abf459300bc625279eb323213/dom/webgpu/ipc/WebGPUParent.cpp#219. I will obsolete the patch and resolve this Bug.

Flags: needinfo?(bwerth)
Attachment #9378335 - Attachment is obsolete: true

Fixed as part of Bug 1876389.

Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Version: unspecified → Firefox 124
Target Milestone: --- → 124 Branch
See Also: → 1883481
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: