Closed Bug 1835866 Opened 1 years ago Closed 1 year ago

Crash in [@ mozilla::AudioData::SetTrimWindow]

Categories

(Core :: Audio/Video: Playback, defect)

defect

Tracking

()

RESOLVED FIXED
115 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox113 --- unaffected
firefox114 --- unaffected
firefox115 + fixed
firefox116 --- fixed

People

(Reporter: aryx, Assigned: padenot)

References

(Regression)

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(6 files, 1 obsolete file)

122 crashes from 75 machines for the last week.

Crash report: https://crash-stats.mozilla.org/report/index/d9c77d0b-169b-438a-8af0-8c01a0230530

MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(mDataOffset <= mAudioData.Length()) (Data offset outside original buffer)

Top 10 frames of crashing thread:

0  xul.dll  mozilla::AudioData::SetTrimWindow  dom/media/MediaData.cpp:109
1  xul.dll  mozilla::MediaDecoderStateMachine::AccurateSeekingState::DropAudioUpToSeekTarget  dom/media/MediaDecoderStateMachine.cpp:2012
2  xul.dll  mozilla::MediaDecoderStateMachine::AccurateSeekingState::HandleAudioDecoded  dom/media/MediaDecoderStateMachine.cpp:1747
3  xul.dll  mozilla::MediaDecoderStateMachine::RequestAudioData::<lambda_23>::operator  dom/media/MediaDecoderStateMachine.cpp:3909
3  xul.dll  mozilla::MozPromise<RefPtr<mozilla::AudioData>, mozilla::MediaResult, 1>::InvokeMethod  xpcom/threads/MozPromise.h:654
3  xul.dll  mozilla::MozPromise<RefPtr<mozilla::AudioData>, mozilla::MediaResult, 1>::InvokeCallbackMethod  xpcom/threads/MozPromise.h:685
3  xul.dll  mozilla::MozPromise<RefPtr<mozilla::AudioData>, mozilla::MediaResult, 1>::ThenValue<`lambda at /builds/worker/checkouts/gecko/dom/media/MediaDecoderStateMachine.cpp:3894:11', `lambda at /builds/worker/checkouts/gecko/dom/media/MediaDecoderStateMachine.cpp:3911:11'>::DoResolveOrRejectInternal  xpcom/threads/MozPromise.h:870
4  xul.dll  mozilla::MozPromise<mozilla::net::RemoteStreamInfo, nsresult, 0>::ThenValueBase::ResolveOrRejectRunnable::Run  xpcom/threads/MozPromise.h:490
5  xul.dll  mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run  xpcom/threads/TaskDispatcher.h:230
6  xul.dll  mozilla::TaskQueue::Runner::Run  xpcom/threads/TaskQueue.cpp:259
Flags: needinfo?(padenot)

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 desktop browser crashes on nightly

For more information, please visit BugBot documentation.

Keywords: topcrash

The bug is marked as tracked for firefox115 (nightly). We have limited time to fix this, the soft freeze is in a day. However, the bug still isn't assigned.

:jimm, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(jmathies)
Assignee: nobody → padenot
Flags: needinfo?(padenot)

Comment on attachment 9337002 [details]
Bug 1835866 - Add information to crash reports when setting the trimming window for an AudioBuffer fails. r?#media-playback-reviewers data-review=chutten

Request for data collection review form

All questions are mandatory. You must receive review from a data steward peer on your responses to these questions before shipping new data collection.

  1. What questions will you answer with this data?

Why some code crashes.

  1. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:

Because we want to fix the crash.

  1. What alternative methods did you consider to answer these questions? Why were they not sufficient?

This is a rare crash, and manual code inspection hasn't proved successful and is too costly in engineering time, without any chance of success.

  1. Can current instrumentation answer these questions?

No.

  1. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.

The categories don't apply, those numbers aren't related to the user, they are from numbers produced by the media stack of Firefox.

There is a single "measurement", that is a few numbers of internal values that are inconsistent with each others. Those values come from our implementation, based on values that come from the media being played back. Those values are the time at which data packet need to be rendered. It is not possible to know the content of the media or any other identifiable characteristics (data like: "23 milliseconds at 44100Hz is larger than 16 milliseconds at 44100Hz").

  1. Please provide a link to the documentation for this data collection which describes the ultimate data set in a public, complete, and accurate way.

No documentation, this is for debugging a crash.

  1. How long will this data be collected? Choose one of the following:

Until I fix the crash (a matter of days -- depending on how many crashes I have).

  1. What populations will you measure?

Same as the population that reports crashes.

  1. If this data collection is default on, what is the opt-out mechanism for users?

Users can decide to not share crash reports.

  1. Please provide a general description of how you will analyze this data.

This is a few numbers that are highly domain specific, they are time values that come from within our implementation.

  1. Where do you intend to share the results of your analysis?

I'm not sharing the data I'm just going to use the data to fix a crash.

  1. Is there a third-party tool (i.e. not Glean or Telemetry) that you are proposing to use for this data collection? If so:

No I'm going to look at the crash reports on soccoro.

Flags: needinfo?(jmathies)
Attachment #9337002 - Flags: data-review?(chutten)
Attachment #9337002 - Flags: data-review?(chutten)

Placing as an attachment to match workflow

Attachment #9337013 - Flags: data-review?(chutten)

Comment on attachment 9337013 [details]
data collection review request

PRELIMINARY NOTES:
Even technical data should be described and categorized (as Category 1, Technical). Description might be:

frameOffset * mChannels - An opaque value of an offset within the audio data
mAudioData.Length() - The length in frames * channels of the audio data stream
A string representation of the time interval of the trim window being set
A string representation of the time interval encompassing the ... y'know, I'm not sure what mOriginalTime is. The original start? The current playback time before trim? (I don't know this code)

Documentation is required and is provided by crash-stats

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection can be controlled through Firefox's Preferences and as part of crash report reporting.

If the request is for permanent data collection, is there someone who will monitor the data over time?

:padenot is responsible for removing this collection when it has outlived its usefulness.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

Is the data collection request for default-on or default-off?

Default off (opt-in) for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does the data collection use a third-party collection tool?

No.


Result: datareview+

Attachment #9337013 - Flags: data-review?(chutten) → data-review+
Attachment #9337000 - Attachment is obsolete: true
Attachment #9337002 - Attachment description: Bug 1835866 - Add information to crash reports when setting the trimming window for an AudioBuffer fails. r?#media-playback-reviewers → Bug 1835866 - Add information to crash reports when setting the trimming window for an AudioBuffer fails. r?#media-playback-reviewers data-review=chutten
Pushed by padenot@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d2842f44e20c Add a method to get a string representation of an media::Interval. r=pehrsons https://hg.mozilla.org/integration/autoland/rev/baf519214e46 Add information to crash reports when setting the trimming window for an AudioBuffer fails. r=pehrsons data-review=chutten
Attachment #9337037 - Attachment description: WIP: Bug 1835866 - Fix compilation error. → Bug 1835866 - Fix compilation error.

I meant to mark this leave-open.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Set release status flags based on info from the regressing bug 1817997

(In reply to Paul Adenot (:padenot) from comment #13)

I meant to mark this leave-open.

:padenot, not seeing any crashes since the patches landed.
Anything left to track under this, should it still be leave-open?

Flags: needinfo?(padenot)

It's a bit annoying. I must have fixed it with another patch I landed last week, but I'm not sure which one. I'll revert the data collection and we can close this, watching closely reports.

Flags: needinfo?(padenot)
Pushed by padenot@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e175f4307abd Backout the data collection part. r=media-playback-reviewers,alwu
Status: REOPENED → RESOLVED
Closed: 1 years ago1 year ago
Resolution: --- → FIXED

The patch landed in nightly and beta is affected.
:padenot, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox115 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(padenot)

We need to reopen this.

Status: RESOLVED → REOPENED
Flags: needinfo?(padenot)
Resolution: FIXED → ---

mozilla::AudioData::SetTrimWindow is called with weird values. The printf I put in prints something like:

Data offset outside original buffer: 2354 > 2048 ([{1686645352037755,1000000}, {9223372036854775807,1000000}]({0,1000000}), [{16866453520132332,10000000}, {9223372036854775807,1000000}]({0,1000000}))

In this code we want to trim an audio buffer to start a bit later. But here we instruct it to start after its end.

The trim region is:

[{1686645352037755,1000000}, {9223372036854775807,1000000}]({0,1000000})

It's the seek target on the left bound, and the audio packet end on the right bound.

The audio packet is:

[{16866453520132332,10000000}, {9223372036854775807,1000000}]({0,1000000}))

I note that its base is in hundreds of nanoseconds, which are generally used on Windows, but never propagated down the code. It must be from an mp4 file that uses hns as a base, and then maybe there's an overflow somewhere. In any case, if we take typical packet sizes and typical sample rates, the seek target can lie in the correct packet, it's a bit more than 1024 frames later, and the buffers are 2048 or 4096 frames according to the printf (example: https://crash-stats.mozilla.org/report/index/117ad0df-8f3d-4cad-9af6-5398e0230613):

 > (1686645352037755/1000000. - 16866453520132332/10000000.) * 44100
1081.4126014709473

The right bounds of the intervals are +infinity. The previous code would trim the entire buffer in this case and do nothing else: https://hg.mozilla.org/mozilla-central/rev/78068c6cc38f8ba0535d51af0f502078d7c0b02e#l1.20 (because infinity == infinity in this case), I'm going to restore this behaviour, but it's weird. I'll be adding some code in Nightly to catch those at the site of creation so we can get to the bottom of this.

I'll get to the bottom of this in bug 1838275.

See Also: → 1838275

No need in fact. It all stems from the fact that this is using hns base:

A timeunit {16866453520132332,10000000} (base in hns, 1e7) added to its duration, for example {2048, 48000}, is greater than std::pow(2, std::numeric_limits<double>::digits) - 1. This means that there's going to be some rounding if we keep the TimeUnit in this base, but worst case we're off by a few hns.

The bug in the code is that I've made it return +infinity in this case: https://searchfox.org/mozilla-central/rev/f76d80e486028313488d05f4e7fe2509cec11777/dom/media/TimeUnits.cpp#41. This is incorrect, we should be accepting the slight loss in precision.

Pushed by padenot@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b05fabf03de5 Accept a small loss of precision when converting large number of seconds to a TimeUnit in a high base. r=alwu,media-playback-reviewers

Comment on attachment 9337854 [details]
Bug 1835866 - Backout the data collection part. r?#media-playback-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: None, this is unnecessary data collection that has served its purpose, we can remove it so it doesn't hit release.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This is just a backout of an assert to restore the previous assert.
  • String changes made/needed:
  • Is Android affected?: Yes
Attachment #9337854 - Flags: approval-mozilla-beta?
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED

(In reply to Noemi Erli[:noemi_erli] from comment #27)

https://hg.mozilla.org/mozilla-central/rev/b05fabf03de5

:padenot is this final patch also needed in beta for Fx115?

Flags: needinfo?(padenot)

Comment on attachment 9337854 [details]
Bug 1835866 - Backout the data collection part. r?#media-playback-reviewers

Approved for 115.0b6.

Attachment #9337854 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

(In reply to Donal Meehan [:dmeehan] from comment #28)

(In reply to Noemi Erli[:noemi_erli] from comment #27)

https://hg.mozilla.org/mozilla-central/rev/b05fabf03de5

:padenot is this final patch also needed in beta for Fx115?

Yes, I was waiting for a nightly or two, writing the approval request right now.

Flags: needinfo?(padenot)

Comment on attachment 9338944 [details]
Bug 1835866 - Accept a small loss of precision when converting large number of seconds to a TimeUnit in a high base. r?alwu,#media-playback-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: Crashes on an assertion, likely on live-streaming or live radio websites
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): We're reverting to a previous behavior we've had in Nightly for some time, and we've understood the root cause of this problem thanks to data collection in the field.
  • String changes made/needed:
  • Is Android affected?: Yes
Attachment #9338944 - Flags: approval-mozilla-beta?

Comment on attachment 9338944 [details]
Bug 1835866 - Accept a small loss of precision when converting large number of seconds to a TimeUnit in a high base. r?alwu,#media-playback-reviewers

Approved for 115.0b6.

Attachment #9338944 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

:padenot seems like some crashes here have popped up on nightly? could you take a look?

Flags: needinfo?(padenot)
Flags: needinfo?(padenot)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: