Crash in [@ <audioipc2_client::stream::CallbackServer as audioipc2::rpccore::Server>::process]
Categories
(Core :: Audio/Video: cubeb, defect, P1)
Tracking
()
People
(Reporter: RyanVM, Assigned: pehrsons)
References
Details
(Keywords: crash, topcrash)
Crash Data
Attachments
(4 files, 1 obsolete file)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-beta+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-esr128+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
phab-bot
:
approval-mozilla-release+
|
Details | Review |
Looks like a continuation of bug 1902989, but I'm filing a new bug for better ease of tracking.
Crash report: https://crash-stats.mozilla.org/report/index/a12366d4-489e-46ab-967f-4ba060240711
MOZ_CRASH Reason: assertion failed: input_nbytes > 0
Top 10 frames:
0 XUL MOZ_Crash(char const*, int, char const*) mfbt/Assertions.h:317
0 XUL RustMozCrash mozglue/static/rust/wrappers.cpp:18
1 XUL mozglue_static::panic_hook mozglue/static/rust/lib.rs:98
2 XUL core::ops::function::Fn::call library/core/src/ops/function.rs:79
3 XUL <alloc::boxed::Box<F, A> as core::ops::function::Fn<Args>>::call library/alloc/src/boxed.rs:2034
3 XUL std::panicking::rust_panic_with_hook library/std/src/panicking.rs:783
4 XUL std::panicking::begin_panic_handler::{{closure}} library/std/src/panicking.rs:649
5 XUL std::sys_common::backtrace::__rust_end_short_backtrace library/std/src/sys_common/backtrace.rs:171
6 XUL rust_begin_unwind library/std/src/panicking.rs:645
7 XUL core::panicking::panic_fmt library/core/src/panicking.rs:72
Comment 1•2 months ago
|
||
Hey Matthew, this seems audio-ipc issue, would you mind take a look on this? Thanks!
Comment 2•2 months ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 5 desktop browser crashes on Mac on release
:karlt, could you consider increasing the severity of this top-crash bug?
For more information, please visit BugBot documentation.
Comment 3•2 months ago
|
||
i wonder if there's a chance that changes for bug 1895787 made this spike on Nightly.
Some of those will be disabled for bug 1908539.
Comment 4•2 months ago
|
||
This is the topcrash for Firefox 128.0 on macOS 14 (no crash reports for earlier versions), ~60% of crashes in first 5 minutes after launch. Submitted urls show a correlation with video or audio chats.
Reporter | ||
Comment 5•2 months ago
|
||
Bug 1908539 was uplifted for the 128.0.2 dot release, so with any luck we'll see the volume drop soon if the hypothesis in comment 3 is correct.
Comment 6•2 months ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/478bd07bd5b3 landed for bug 1908539 and is in 129.0b6, which is still showing some crashes.
We still need to identify the cause of this.
Comment 7•2 months ago
|
||
The bug is marked as tracked for firefox128 (release), tracked for firefox129 (beta) and tracked for firefox130 (nightly). We have limited time to fix this, the soft freeze is in 9 days. However, the bug still isn't assigned.
:jimm, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
Reporter | ||
Comment 8•2 months ago
|
||
Still pretty early in the rollout for 128.0.2, but it does appear to have made a positive impact on the crash rate here so far.
Reporter | ||
Comment 9•2 months ago
|
||
Calling this wontfix for Release 128 at this point as I'm not intending to uplift any more fixes this cycle, but as mentioned in comment 8, things seem improved with 128.0.2 as it stands.
Comment 10•2 months ago
|
||
This is a reminder regarding comment #7!
The bug is marked as tracked for firefox129 (beta) and tracked for firefox130 (nightly). We have limited time to fix this, the soft freeze is in 3 days. However, the bug still isn't assigned.
Comment 11•2 months ago
|
||
The crash volume is unchanged.
Assignee | ||
Comment 12•2 months ago
|
||
If this is a 128 regression in the backend it is most likely from this PR.
There's a non-zero crash rate prior to 128 though, so most likely it exacerbates a latent issue.
The PR is about device enumeration which isn't obviously related to the assertion failure. I could guess we hit some transitional state where a device is now considered available where previously it wasn't -- but it isn't usable?
Looking at the code paths that can lead here,
- We know that input_unit is set if
has_input()
istrue
(i.e. input stream params has a non-zero samplerate).input_unit
is what is used to decide whether to pass an input buffer to the data callback (might be handy later). - By failing the assert in
CallbackServer
we know thatself.duplex_input
isSome
(happens when both input and output stream params are present) eithernframes
orinput_frame_size
is zero.input_frame_size
isServerStreamCallbacks::input_frame_size
which is from the input stream params so we can assume non-zero.nframes
is from the backend's data callback, which goes through the resampler where we knowoutput_frames
is non-zero.
- I'll follow up with an audit of the resampler paths.
Assignee | ||
Comment 13•2 months ago
|
||
Looking at resamplers, we know from above that the output_frames
argument to the resampler's fill
method is non-zero but the data callback is called with a nframes
that is zero:
- passthrough_resampler::fill: the
output_frames
arg is passed to the data callback directly. It is modified in one path, when we get an input buffer but no output buffer. We know from the callsite that we have an output buffer, unless there's a platform bug breaking the API contract. We disregard this for now. - cubeb_resampler_speex::fill comes in a few flavors, but because we know it's a duplex stream per above we're in
fill_internal_duplex
.nframes
isoutput_frames_before_processing
, which isoutput_processor->input_needed_for_output(output_frames_needed)
.- for the
delay_line
processorinput_needed_for_output
is the identity function, andoutput_frames_needed
is theoutput_frames
arg tofill
, i.e. non-zero. - for the
cubeb_resampler_speex_one_way
processorinput_needed_for_output
depends on the size of two buffers that may contain some leftovers from the last resampling pass and may return zero. This seems like the most likely culprit. Easiest way to trigger it should be with a low latency stream. I'll see if I can find any changes to default latency in 128.
- for the
Assignee | ||
Comment 14•2 months ago
•
|
||
(In reply to Andreas Pehrson [:pehrsons] from comment #13)
for the
cubeb_resampler_speex_one_way
processorinput_needed_for_output
depends on the size of two buffers that may contain some leftovers from the last resampling pass and may return zero. This seems like the most likely culprit. Easiest way to trigger it should be with a low latency stream. I'll see if I can find any changes to default latency in 128.
I'm not convinced this is the culprit, fwiw, because on macOS in the vast majority of cases I'd expect input and output callback buffers to be identical in size and in running in lockstep in terms of timing.
Gecko uses the same sample rate (the native rate of the default output device) for both input and output, and the requested latency is the same on both sides of a duplex stream (and clamped to [128, 512]).
What could happen is the native rate of the input device is different from that of the output device so we end up resampling once from input to data callback. In the case of VPIO we request the native rate of the default output for the data callback, but the backend is forced to use the native rate of the input device so there may be two resampling passes: both from input to data callback and from data callback to output.
Even with an aggregate device (the common non-VPIO case) it appears that input and output callbacks are consistent on buffer size and timing.
Assignee | ||
Comment 15•2 months ago
|
||
I'll write up something speculative either way, probably to make the resampler skip the data callback when there are no frames to hand it.
Assignee | ||
Comment 16•2 months ago
|
||
Let's see how this fares https://github.com/mozilla/audioipc/pull/178
Assignee | ||
Comment 17•2 months ago
|
||
Assignee | ||
Comment 18•2 months ago
|
||
Assignee | ||
Comment 19•1 month ago
|
||
Original Revision: https://phabricator.services.mozilla.com/D218819
Updated•1 month ago
|
Updated•1 month ago
|
Comment 20•1 month ago
|
||
Comment 21•1 month ago
|
||
beta Uplift Approval Request
- User impact if declined: Parent process may crash
- Code covered by automated testing: no
- Fix verified in Nightly: no
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: Unclear STR
- Risk associated with taking this patch: Very low
- Explanation of risk level: Trivial; skips failing an assertion by returning early
- String changes made/needed: None
- Is Android affected?: no
Assignee | ||
Comment 22•1 month ago
|
||
Updated•1 month ago
|
Comment 23•1 month ago
|
||
esr128 Uplift Approval Request
- User impact if declined: Parent process may crash
- Code covered by automated testing: no
- Fix verified in Nightly: no
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: Unclear STR
- Risk associated with taking this patch: Very low
- Explanation of risk level: Trivial; skips failing an assertion by returning early
- String changes made/needed: None
- Is Android affected?: no
Assignee | ||
Comment 24•1 month ago
|
||
Original Revision: https://phabricator.services.mozilla.com/D218924
Updated•1 month ago
|
Comment 25•1 month ago
|
||
release Uplift Approval Request
- User impact if declined: Parent process may crash
- Code covered by automated testing: no
- Fix verified in Nightly: no
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: Unclear STR
- Risk associated with taking this patch: Very low
- Explanation of risk level: Trivial; skips failing an assertion by returning early
- String changes made/needed: None
- Is Android affected?: no
Comment 26•1 month ago
|
||
bugherder |
Updated•1 month ago
|
Comment 27•1 month ago
|
||
uplift |
Updated•1 month ago
|
Updated•1 month ago
|
Comment 28•1 month ago
|
||
uplift |
Updated•1 month ago
|
Updated•1 month ago
|
Updated•1 month ago
|
Updated•1 month ago
|
Updated•1 month ago
|
Updated•1 month ago
|
Comment 29•1 month ago
|
||
uplift |
Description
•