Closed Bug 1905287 Opened 1 year ago Closed 2 months ago

Crash in [@ mozilla::detail::InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | nsTArray_Impl<T>::operator[] | mozilla::AudioChunk::ChannelDataForWrite<T>]

Categories

(Core :: Audio/Video: Playback, defect, P3)

All
Android
defect

Tracking

()

RESOLVED FIXED
138 Branch
Tracking Status
firefox-esr128 --- wontfix
firefox134 --- wontfix
firefox135 --- wontfix
firefox136 --- wontfix
firefox137 --- wontfix
firefox138 --- fixed

People

(Reporter: gsvelto, Assigned: az)

References

Details

(Keywords: crash)

Crash Data

Attachments

(3 files, 4 obsolete files)

Crash report: https://crash-stats.mozilla.org/report/index/51d0d963-c52b-4c16-bf41-6291b0240628

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames:

0  libmozglue.so  MOZ_Crash(char const*, int, char const*)  mfbt/Assertions.h:317
0  libmozglue.so  mozilla::detail::InvalidArrayIndex_CRASH(unsigned long, unsigned long)  mfbt/Assertions.cpp:50
1  libxul.so  nsTArray_Impl<void const*, nsTArrayInfallibleAllocator>::ElementAt(unsigned l...  xpcom/ds/nsTArray.h:1196
1  libxul.so  nsTArray_Impl<void const*, nsTArrayInfallibleAllocator>::operator[](unsigned ...  xpcom/ds/nsTArray.h:1233
1  libxul.so  mozilla::AudioChunk::ChannelDataForWrite<float>(unsigned long)  dom/media/AudioSegment.h:265
1  libxul.so  mozilla::MediaDecodeTask::FinishDecode()  dom/media/webaudio/MediaBufferDecoder.cpp:0
2  libxul.so  mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResu...  xpcom/threads/MozPromise.h:0
3  libxul.so  mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResu...  xpcom/threads/MozPromise.h:621
3  libxul.so  mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResu...  xpcom/threads/MozPromise.h:488
4  libxul.so  mozilla::TaskQueue::Runner::Run()  xpcom/threads/TaskQueue.cpp:257

This appears to be an Android-only issue and it never seemed to show up on nightly which might explain why we didn't file it before. It seems like we're accessing the data for a channel that does not exist, see this.

Interestingly this also seems to happen on Windows 10, and several comments mention that this happens while streaming stuff from https://www.siriusxm.com/

Crash Signature: [@ mozilla::detail::InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | nsTArray_Impl<T>::operator[] | mozilla::AudioChunk::ChannelDataForWrite<T>] → [@ mozilla::detail::InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | nsTArray_Impl<T>::operator[] | mozilla::AudioChunk::ChannelDataForWrite<T>] [@ mozilla::detail::InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | nsTArray_Impl<T>::operator…

The severity field is not set for this bug.
:jimm, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(jmathies)

Paul, any thoughts here?

content process crash in c++ vs. cubeb when decoding audio.

Flags: needinfo?(jmathies) → needinfo?(padenot)

Karl, any idea here?

Flags: needinfo?(padenot) → needinfo?(karlt)

The WINNT stacks have more frames.
e.g. https://crash-stats.mozilla.org/report/index/95dfb0dd-2c3e-4a64-a258-9090f0240727
The stack doesn't quite line up, which I suspect is because frame 2 is the return address from OnAudioDrainCompleted(), instead of the calling address.

Is it possible that the AudioData might have a larger channel count than the AudioInfo?

Flags: needinfo?(karlt)

Plausible yes.

This appears to be getting worse over time. Android specific. Ashley, maybe you can poke at this.

Flags: needinfo?(azebrowski)
Severity: -- → S2
Assignee: nobody → azebrowski
Status: NEW → ASSIGNED

I put a patch up that would add an assert to check Karl/Paul's theory re: AudioData/AudioInfo channel mismatch - not sure this is the correct approach here, but could give us some additional information.

Status: ASSIGNED → NEW
Assignee: azebrowski → nobody
Assignee: nobody → azebrowski
Status: NEW → ASSIGNED
Attachment #9436255 - Attachment description: Bug 1905287 - Assert AudioInfo has at least as many channels as AudioData when a MediaDecodeTask finishes decoding to ensure sufficient buffer size is allocated. → Bug 1905287 - Assert AudioInfo has at least as many channels as AudioData when a MediaDecodeTask finishes decoding to ensure sufficient buffer size is allocated. r=karlt
Pushed by azebrowski@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c0666de7deba Assert AudioInfo has at least as many channels as AudioData when a MediaDecodeTask finishes decoding to ensure sufficient buffer size is allocated. r=karlt
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 134 Branch

Sorry, this shouldn't have been closed out - the patch that landed was to help diagnose the issue - reopening this.

Status: RESOLVED → REOPENED
Flags: needinfo?(azebrowski)
Resolution: FIXED → ---
Target Milestone: 134 Branch → ---

The new assert doesn't look like it's showing up in any crash reports - would you have any other ideas here :padenot ?

Flags: needinfo?(padenot)

It's plausible that the channel count changes mid-stream we should account for this maybe.

Flags: needinfo?(padenot)

The diagnostic assert added here landed for 134 early beta or earlier.
The "Crash Data" section of this bug is not showing any crashes in such versions.

Flags: needinfo?(azebrowski)
Flags: needinfo?(azebrowski)
Pushed by azebrowski@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3935f5d5ea58 Add bounds check and diagnostics for failures decoding audio data. r=padenot

Updating the sig to catch the modified crash in the patch that just landed.

Crash Signature: nsTArray_Impl<T>::operator[] | mozilla::AudioChunk::ChannelDataForWrite] → nsTArray_Impl<T>::operator[] | mozilla::AudioChunk::ChannelDataForWrite] [@ mozilla::AudioChunk::ChannelDataForWrite<T>] [@ mozilla::AudioChunk::ChannelDataForWrite]
Status: REOPENED → RESOLVED
Closed: 7 months ago4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 136 Branch
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---

:padenot mentioned that this bug was affected by bug 1945507, which prevents us from collecting the crash reason. While we wait for the long-term solution there, as a short-term solution I have analyzed the 340 crash reports we received coming from the 134.0.2 arm64 APK, and I was able to recover the missing info that was supposed to be put into the crash reason for 339 of these crash reports. aIndex (aChannel) and aLength (mChannelData.Length()) were both equal to 1 in all 339 reports. I also recovered the same values, both equal to 1 again, from an isolated Windows crash. Quoting :padenot, "this means we're somehow attempting to write in the second channel of a stereo media that another part of the system considers mono."

See Also: → 1945507

We define channelCount here, and it matches with mBuffer.mChannelData for these loops without any problem. That means that the ChannelDataForWrite call here shouldn't be causing the invalid array error. Rather, I think we have data in mAudioQueue here with an inconsistent channel count, so we're crashing at either of these two loops. I'll see if I can get the data to upmix/downmix, and worst case fail gracefully when we run into mismatched channel data.

I tried to repro this in a variety of ways, from editing AAC files to create mismatched channel counts, to attempting to inject stereo packets into mono streams, but I wasn't able to reproduce. The patch I added will skip packets containing mismatched channel counts and has a crash that'll give us some stats should we run into a mismatch on a debug build. This should in theory stop the crashes on release and still get us some diagnostic information. I started to work on a patch where would rollback and reallocate buffers entirely when we run into a mismatched channel count, copying the data out of the mono buffer and upmixing to stereo, but I didn't feel comfortable doing so without a repro.

We're going to add some telemetry here to track when this happens.

Attachment #9466323 - Attachment is obsolete: true

(In reply to Jim Mathies [:jimm] from comment #24)

We're going to add some telemetry here to track when this happens.

This happens when it crashes, that's effectively telemetry.

Severity: S2 → S3
Priority: -- → P2
Priority: P2 → P3
Attachment #9469143 - Attachment is obsolete: true
Attachment #9469146 - Attachment is obsolete: true
Attachment #9469145 - Attachment is obsolete: true
Attachment #9471262 - Attachment description: Bug 1905287 - Accomodate decoding Web Audio packets containing unexpected additional channel counts by creating/mixing additional channels as needed. → WIP: Bug 1905287 - Accomodate decoding Web Audio packets containing unexpected additional channel counts by creating/mixing additional channels as needed. r=padenot
Attachment #9471262 - Attachment description: WIP: Bug 1905287 - Accomodate decoding Web Audio packets containing unexpected additional channel counts by creating/mixing additional channels as needed. r=padenot → Bug 1905287 - Accomodate decoding Web Audio packets containing unexpected additional channel counts by creating/mixing additional channels as needed. r=padenot
Pushed by azebrowski@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/31884380a5c4 Accomodate decoding Web Audio packets containing unexpected additional channel counts by creating/mixing additional channels as needed. r=padenot

from editing AAC files to create mismatched channel counts, to attempting to inject stereo packets into mono streams, but I wasn't able to reproduce

https://www.siriusxm.com/

It's possible for HE-AACv2 files to have a kind of implicit signaling, which means the file will move from mono to stereo after decoding a few packets as the AAC decoder discovers there is parametric stereo data present.

https://codereview.chromium.org/1805013003
You can probably find some test vectors here:
https://github.com/Dash-Industry-Forum/Test-Vectors/issues/9

(In reply to Serban Stanca [:SerbanS] from comment #31)

https://hg.mozilla.org/mozilla-central/rev/31884380a5c4

This final fix landed in 138, and we haven't seen anything since. I'm calling this fixed.

Status: REOPENED → RESOLVED
Closed: 4 months ago2 months ago
Resolution: --- → FIXED
Keywords: leave-open
Target Milestone: 136 Branch → 138 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: