Closed Bug 1907367 Opened 2 months ago Closed 1 month ago

Crash in [@ <audioipc2_client::stream::CallbackServer as audioipc2::rpccore::Server>::process]

Categories

(Core :: Audio/Video: cubeb, defect, P1)

Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED
131 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- fixed
firefox128 + wontfix
firefox129 + fixed
firefox130 + fixed
firefox131 --- fixed

People

(Reporter: RyanVM, Assigned: pehrsons)

References

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(4 files, 1 obsolete file)

Looks like a continuation of bug 1902989, but I'm filing a new bug for better ease of tracking.

Crash report: https://crash-stats.mozilla.org/report/index/a12366d4-489e-46ab-967f-4ba060240711

MOZ_CRASH Reason: assertion failed: input_nbytes > 0

Top 10 frames:

0  XUL  MOZ_Crash(char const*, int, char const*)  mfbt/Assertions.h:317
0  XUL  RustMozCrash  mozglue/static/rust/wrappers.cpp:18
1  XUL  mozglue_static::panic_hook  mozglue/static/rust/lib.rs:98
2  XUL  core::ops::function::Fn::call  library/core/src/ops/function.rs:79
3  XUL  <alloc::boxed::Box<F, A> as core::ops::function::Fn<Args>>::call  library/alloc/src/boxed.rs:2034
3  XUL  std::panicking::rust_panic_with_hook  library/std/src/panicking.rs:783
4  XUL  std::panicking::begin_panic_handler::{{closure}}  library/std/src/panicking.rs:649
5  XUL  std::sys_common::backtrace::__rust_end_short_backtrace  library/std/src/sys_common/backtrace.rs:171
6  XUL  rust_begin_unwind  library/std/src/panicking.rs:645
7  XUL  core::panicking::panic_fmt  library/core/src/panicking.rs:72

Hey Matthew, this seems audio-ipc issue, would you mind take a look on this? Thanks!

Severity: -- → S3
Flags: needinfo?(kinetik)
Priority: -- → P2

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 5 desktop browser crashes on Mac on release

:karlt, could you consider increasing the severity of this top-crash bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(karlt)
Keywords: topcrash

i wonder if there's a chance that changes for bug 1895787 made this spike on Nightly.
Some of those will be disabled for bug 1908539.

Severity: S3 → S2
Flags: needinfo?(karlt)
OS: Unspecified → macOS
Priority: P2 → P1
See Also: → 1908539

This is the topcrash for Firefox 128.0 on macOS 14 (no crash reports for earlier versions), ~60% of crashes in first 5 minutes after launch. Submitted urls show a correlation with video or audio chats.

Bug 1908539 was uplifted for the 128.0.2 dot release, so with any luck we'll see the volume drop soon if the hypothesis in comment 3 is correct.

https://hg.mozilla.org/releases/mozilla-beta/rev/478bd07bd5b3 landed for bug 1908539 and is in 129.0b6, which is still showing some crashes.

We still need to identify the cause of this.

The bug is marked as tracked for firefox128 (release), tracked for firefox129 (beta) and tracked for firefox130 (nightly). We have limited time to fix this, the soft freeze is in 9 days. However, the bug still isn't assigned.

:jimm, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(jmathies)

Still pretty early in the rollout for 128.0.2, but it does appear to have made a positive impact on the crash rate here so far.

Calling this wontfix for Release 128 at this point as I'm not intending to uplift any more fixes this cycle, but as mentioned in comment 8, things seem improved with 128.0.2 as it stands.

This is a reminder regarding comment #7!

The bug is marked as tracked for firefox129 (beta) and tracked for firefox130 (nightly). We have limited time to fix this, the soft freeze is in 3 days. However, the bug still isn't assigned.

If this is a 128 regression in the backend it is most likely from this PR.
There's a non-zero crash rate prior to 128 though, so most likely it exacerbates a latent issue.
The PR is about device enumeration which isn't obviously related to the assertion failure. I could guess we hit some transitional state where a device is now considered available where previously it wasn't -- but it isn't usable?

Looking at the code paths that can lead here,

Assignee: nobody → apehrson
Status: NEW → ASSIGNED
Flags: needinfo?(kinetik)
Flags: needinfo?(jmathies)

Looking at resamplers, we know from above that the output_frames argument to the resampler's fill method is non-zero but the data callback is called with a nframes that is zero:

  • passthrough_resampler::fill: the output_frames arg is passed to the data callback directly. It is modified in one path, when we get an input buffer but no output buffer. We know from the callsite that we have an output buffer, unless there's a platform bug breaking the API contract. We disregard this for now.
  • cubeb_resampler_speex::fill comes in a few flavors, but because we know it's a duplex stream per above we're in fill_internal_duplex.

(In reply to Andreas Pehrson [:pehrsons] from comment #13)

for the cubeb_resampler_speex_one_way processor input_needed_for_output depends on the size of two buffers that may contain some leftovers from the last resampling pass and may return zero. This seems like the most likely culprit. Easiest way to trigger it should be with a low latency stream. I'll see if I can find any changes to default latency in 128.

I'm not convinced this is the culprit, fwiw, because on macOS in the vast majority of cases I'd expect input and output callback buffers to be identical in size and in running in lockstep in terms of timing.

Gecko uses the same sample rate (the native rate of the default output device) for both input and output, and the requested latency is the same on both sides of a duplex stream (and clamped to [128, 512]).
What could happen is the native rate of the input device is different from that of the output device so we end up resampling once from input to data callback. In the case of VPIO we request the native rate of the default output for the data callback, but the backend is forced to use the native rate of the input device so there may be two resampling passes: both from input to data callback and from data callback to output.

Even with an aggregate device (the common non-VPIO case) it appears that input and output callbacks are consistent on buffer size and timing.

I'll write up something speculative either way, probably to make the resampler skip the data callback when there are no frames to hand it.

Attachment #9418472 - Flags: approval-mozilla-beta?
Attachment #9418290 - Attachment is obsolete: true
Pushed by pehrsons@gmail.com: https://hg.mozilla.org/integration/autoland/rev/342f097fe5c5 Update audioipc revision to e6f44a2bd1. r=cubeb-reviewers,padenot

beta Uplift Approval Request

  • User impact if declined: Parent process may crash
  • Code covered by automated testing: no
  • Fix verified in Nightly: no
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: Unclear STR
  • Risk associated with taking this patch: Very low
  • Explanation of risk level: Trivial; skips failing an assertion by returning early
  • String changes made/needed: None
  • Is Android affected?: no
Attachment #9418492 - Flags: approval-mozilla-esr128?

esr128 Uplift Approval Request

  • User impact if declined: Parent process may crash
  • Code covered by automated testing: no
  • Fix verified in Nightly: no
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: Unclear STR
  • Risk associated with taking this patch: Very low
  • Explanation of risk level: Trivial; skips failing an assertion by returning early
  • String changes made/needed: None
  • Is Android affected?: no
Attachment #9418494 - Flags: approval-mozilla-release?

release Uplift Approval Request

  • User impact if declined: Parent process may crash
  • Code covered by automated testing: no
  • Fix verified in Nightly: no
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: Unclear STR
  • Risk associated with taking this patch: Very low
  • Explanation of risk level: Trivial; skips failing an assertion by returning early
  • String changes made/needed: None
  • Is Android affected?: no
Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → 131 Branch
Attachment #9418472 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9418492 - Flags: approval-mozilla-esr128? → approval-mozilla-esr128+
Attachment #9418494 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #9418494 - Flags: approval-mozilla-release+ → approval-mozilla-release-
Attachment #9418494 - Flags: approval-mozilla-release- → approval-mozilla-release?
Attachment #9418494 - Flags: approval-mozilla-release? → approval-mozilla-release+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: