Crash in [@ mozilla::AudioData::SetTrimWindow]
Categories
(Core :: Audio/Video: Playback, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox113 | --- | unaffected |
firefox114 | --- | unaffected |
firefox115 | + | fixed |
firefox116 | --- | fixed |
People
(Reporter: aryx, Assigned: padenot)
References
(Regression)
Details
(Keywords: crash, regression, topcrash)
Crash Data
Attachments
(6 files, 1 obsolete file)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
2.64 KB,
text/plain
|
chutten
:
data-review+
|
Details |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
dmeehan
:
approval-mozilla-beta+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
dmeehan
:
approval-mozilla-beta+
|
Details | Review |
122 crashes from 75 machines for the last week.
Crash report: https://crash-stats.mozilla.org/report/index/d9c77d0b-169b-438a-8af0-8c01a0230530
MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(mDataOffset <= mAudioData.Length()) (Data offset outside original buffer)
Top 10 frames of crashing thread:
0 xul.dll mozilla::AudioData::SetTrimWindow dom/media/MediaData.cpp:109
1 xul.dll mozilla::MediaDecoderStateMachine::AccurateSeekingState::DropAudioUpToSeekTarget dom/media/MediaDecoderStateMachine.cpp:2012
2 xul.dll mozilla::MediaDecoderStateMachine::AccurateSeekingState::HandleAudioDecoded dom/media/MediaDecoderStateMachine.cpp:1747
3 xul.dll mozilla::MediaDecoderStateMachine::RequestAudioData::<lambda_23>::operator dom/media/MediaDecoderStateMachine.cpp:3909
3 xul.dll mozilla::MozPromise<RefPtr<mozilla::AudioData>, mozilla::MediaResult, 1>::InvokeMethod xpcom/threads/MozPromise.h:654
3 xul.dll mozilla::MozPromise<RefPtr<mozilla::AudioData>, mozilla::MediaResult, 1>::InvokeCallbackMethod xpcom/threads/MozPromise.h:685
3 xul.dll mozilla::MozPromise<RefPtr<mozilla::AudioData>, mozilla::MediaResult, 1>::ThenValue<`lambda at /builds/worker/checkouts/gecko/dom/media/MediaDecoderStateMachine.cpp:3894:11', `lambda at /builds/worker/checkouts/gecko/dom/media/MediaDecoderStateMachine.cpp:3911:11'>::DoResolveOrRejectInternal xpcom/threads/MozPromise.h:870
4 xul.dll mozilla::MozPromise<mozilla::net::RemoteStreamInfo, nsresult, 0>::ThenValueBase::ResolveOrRejectRunnable::Run xpcom/threads/MozPromise.h:490
5 xul.dll mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run xpcom/threads/TaskDispatcher.h:230
6 xul.dll mozilla::TaskQueue::Runner::Run xpcom/threads/TaskQueue.cpp:259
Updated•1 years ago
|
Comment 1•1 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 desktop browser crashes on nightly
For more information, please visit BugBot documentation.
Comment 2•1 years ago
|
||
The bug is marked as tracked for firefox115 (nightly). We have limited time to fix this, the soft freeze is in a day. However, the bug still isn't assigned.
:jimm, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
Assignee | ||
Updated•1 years ago
|
Assignee | ||
Comment 3•1 years ago
|
||
Assignee | ||
Comment 4•1 years ago
|
||
Depends on D179691
Assignee | ||
Comment 5•1 years ago
|
||
Depends on D179692
Assignee | ||
Comment 6•1 years ago
|
||
Comment on attachment 9337002 [details]
Bug 1835866 - Add information to crash reports when setting the trimming window for an AudioBuffer fails. r?#media-playback-reviewers data-review=chutten
Request for data collection review form
All questions are mandatory. You must receive review from a data steward peer on your responses to these questions before shipping new data collection.
- What questions will you answer with this data?
Why some code crashes.
- Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:
Because we want to fix the crash.
- What alternative methods did you consider to answer these questions? Why were they not sufficient?
This is a rare crash, and manual code inspection hasn't proved successful and is too costly in engineering time, without any chance of success.
- Can current instrumentation answer these questions?
No.
- List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.
The categories don't apply, those numbers aren't related to the user, they are from numbers produced by the media stack of Firefox.
There is a single "measurement", that is a few numbers of internal values that are inconsistent with each others. Those values come from our implementation, based on values that come from the media being played back. Those values are the time at which data packet need to be rendered. It is not possible to know the content of the media or any other identifiable characteristics (data like: "23 milliseconds at 44100Hz is larger than 16 milliseconds at 44100Hz").
- Please provide a link to the documentation for this data collection which describes the ultimate data set in a public, complete, and accurate way.
No documentation, this is for debugging a crash.
- How long will this data be collected? Choose one of the following:
Until I fix the crash (a matter of days -- depending on how many crashes I have).
- What populations will you measure?
Same as the population that reports crashes.
- If this data collection is default on, what is the opt-out mechanism for users?
Users can decide to not share crash reports.
- Please provide a general description of how you will analyze this data.
This is a few numbers that are highly domain specific, they are time values that come from within our implementation.
- Where do you intend to share the results of your analysis?
I'm not sharing the data I'm just going to use the data to fix a crash.
- Is there a third-party tool (i.e. not Glean or Telemetry) that you are proposing to use for this data collection? If so:
No I'm going to look at the crash reports on soccoro.
Updated•1 years ago
|
Comment 7•1 years ago
|
||
Placing as an attachment to match workflow
Comment 8•1 years ago
|
||
Comment on attachment 9337013 [details]
data collection review request
PRELIMINARY NOTES:
Even technical data should be described and categorized (as Category 1, Technical). Description might be:
frameOffset * mChannels - An opaque value of an offset within the audio data
mAudioData.Length() - The length in frames * channels of the audio data stream
A string representation of the time interval of the trim window being set
A string representation of the time interval encompassing the ... y'know, I'm not sure what mOriginalTime is. The original start? The current playback time before trim? (I don't know this code)
Documentation is required and is provided by crash-stats
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes. This collection can be controlled through Firefox's Preferences and as part of crash report reporting.
If the request is for permanent data collection, is there someone who will monitor the data over time?
:padenot is responsible for removing this collection when it has outlived its usefulness.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 1, Technical.
Is the data collection request for default-on or default-off?
Default off (opt-in) for all channels.
Does the instrumentation include the addition of any new identifiers?
No.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does the data collection use a third-party collection tool?
No.
Result: datareview+
Updated•1 years ago
|
Updated•1 years ago
|
Assignee | ||
Comment 10•1 years ago
|
||
Updated•1 years ago
|
Comment 11•1 years ago
|
||
Comment 12•1 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/d2842f44e20c
https://hg.mozilla.org/mozilla-central/rev/baf519214e46
https://hg.mozilla.org/mozilla-central/rev/e3f071c3e639
Assignee | ||
Comment 13•1 years ago
•
|
||
I meant to mark this leave-open.
Updated•1 years ago
|
Comment 14•1 years ago
|
||
Set release status flags based on info from the regressing bug 1817997
Comment 15•1 years ago
|
||
(In reply to Paul Adenot (:padenot) from comment #13)
I meant to mark this leave-open.
:padenot, not seeing any crashes since the patches landed.
Anything left to track under this, should it still be leave-open?
Assignee | ||
Comment 16•1 year ago
|
||
It's a bit annoying. I must have fixed it with another patch I landed last week, but I'm not sure which one. I'll revert the data collection and we can close this, watching closely reports.
Assignee | ||
Comment 17•1 year ago
|
||
Comment 18•1 year ago
|
||
Comment 19•1 year ago
|
||
bugherder |
Comment 20•1 year ago
|
||
The patch landed in nightly and beta is affected.
:padenot, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox115
towontfix
.
For more information, please visit BugBot documentation.
Assignee | ||
Comment 21•1 year ago
|
||
We need to reopen this.
Assignee | ||
Comment 22•1 year ago
•
|
||
mozilla::AudioData::SetTrimWindow
is called with weird values. The printf I put in prints something like:
Data offset outside original buffer: 2354 > 2048 ([{1686645352037755,1000000}, {9223372036854775807,1000000}]({0,1000000}), [{16866453520132332,10000000}, {9223372036854775807,1000000}]({0,1000000}))
In this code we want to trim an audio buffer to start a bit later. But here we instruct it to start after its end.
The trim region is:
[{1686645352037755,1000000}, {9223372036854775807,1000000}]({0,1000000})
It's the seek target on the left bound, and the audio packet end on the right bound.
The audio packet is:
[{16866453520132332,10000000}, {9223372036854775807,1000000}]({0,1000000}))
I note that its base is in hundreds of nanoseconds, which are generally used on Windows, but never propagated down the code. It must be from an mp4 file that uses hns as a base, and then maybe there's an overflow somewhere. In any case, if we take typical packet sizes and typical sample rates, the seek target can lie in the correct packet, it's a bit more than 1024 frames later, and the buffers are 2048 or 4096 frames according to the printf (example: https://crash-stats.mozilla.org/report/index/117ad0df-8f3d-4cad-9af6-5398e0230613):
> (1686645352037755/1000000. - 16866453520132332/10000000.) * 44100
1081.4126014709473
The right bounds of the intervals are +infinity
. The previous code would trim the entire buffer in this case and do nothing else: https://hg.mozilla.org/mozilla-central/rev/78068c6cc38f8ba0535d51af0f502078d7c0b02e#l1.20 (because infinity == infinity in this case), I'm going to restore this behaviour, but it's weird. I'll be adding some code in Nightly to catch those at the site of creation so we can get to the bottom of this.
I'll get to the bottom of this in bug 1838275.
Assignee | ||
Comment 23•1 year ago
|
||
No need in fact. It all stems from the fact that this is using hns base:
A timeunit {16866453520132332,10000000} (base in hns, 1e7) added to its duration, for example {2048, 48000}, is greater than std::pow(2, std::numeric_limits<double>::digits) - 1
. This means that there's going to be some rounding if we keep the TimeUnit
in this base, but worst case we're off by a few hns.
The bug in the code is that I've made it return +infinity in this case: https://searchfox.org/mozilla-central/rev/f76d80e486028313488d05f4e7fe2509cec11777/dom/media/TimeUnits.cpp#41. This is incorrect, we should be accepting the slight loss in precision.
Assignee | ||
Comment 24•1 year ago
|
||
Comment 25•1 year ago
|
||
Assignee | ||
Comment 26•1 year ago
|
||
Comment on attachment 9337854 [details]
Bug 1835866 - Backout the data collection part. r?#media-playback-reviewers
Beta/Release Uplift Approval Request
- User impact if declined: None, this is unnecessary data collection that has served its purpose, we can remove it so it doesn't hit release.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This is just a backout of an assert to restore the previous assert.
- String changes made/needed:
- Is Android affected?: Yes
Comment 27•1 year ago
|
||
bugherder |
Comment 28•1 year ago
|
||
(In reply to Noemi Erli[:noemi_erli] from comment #27)
:padenot is this final patch also needed in beta for Fx115?
Comment 29•1 year ago
|
||
Comment on attachment 9337854 [details]
Bug 1835866 - Backout the data collection part. r?#media-playback-reviewers
Approved for 115.0b6.
Assignee | ||
Comment 30•1 year ago
|
||
(In reply to Donal Meehan [:dmeehan] from comment #28)
(In reply to Noemi Erli[:noemi_erli] from comment #27)
:padenot is this final patch also needed in beta for Fx115?
Yes, I was waiting for a nightly or two, writing the approval request right now.
Assignee | ||
Comment 31•1 year ago
|
||
Comment on attachment 9338944 [details]
Bug 1835866 - Accept a small loss of precision when converting large number of seconds to a TimeUnit in a high base. r?alwu,#media-playback-reviewers
Beta/Release Uplift Approval Request
- User impact if declined: Crashes on an assertion, likely on live-streaming or live radio websites
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): We're reverting to a previous behavior we've had in Nightly for some time, and we've understood the root cause of this problem thanks to data collection in the field.
- String changes made/needed:
- Is Android affected?: Yes
Comment 32•1 year ago
|
||
bugherder uplift |
Comment 33•1 year ago
|
||
bugherder uplift |
Comment 34•1 year ago
|
||
Comment on attachment 9338944 [details]
Bug 1835866 - Accept a small loss of precision when converting large number of seconds to a TimeUnit in a high base. r?alwu,#media-playback-reviewers
Approved for 115.0b6.
Comment 35•1 year ago
|
||
:padenot seems like some crashes here have popped up on nightly? could you take a look?
Assignee | ||
Updated•1 year ago
|
Description
•