Closed Bug 1616073 Opened 6 years ago Closed 2 years ago

Crash in [@ mozilla::MediaRawData::~MediaRawData]

Categories

(Core :: Audio/Video, defect)

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr68 --- wontfix
firefox-esr78 --- affected
firefox73 --- wontfix
firefox74 --- wontfix
firefox75 --- wontfix
firefox80 --- wontfix
firefox81 --- wontfix
firefox82 --- fix-optional

People

(Reporter: philipp, Unassigned)

Details

(Keywords: crash, csectype-wildptr, sec-high)

Crash Data

This bug is for crash report bp-d0b277d2-0e42-43af-a66a-590270200217.

Top 10 frames of crashing thread:

0 xul.dll mozilla::MediaRawData::~MediaRawData dom/media/MediaData.cpp:483
1 xul.dll mozilla::MediaRawData::~MediaRawData dom/media/MediaData.cpp:483
2 xul.dll nsTArray_Impl<RefPtr<mozilla::MediaRawData>, nsTArrayInfallibleAllocator>::Clear xpcom/ds/nsTArray.h:1825
3 xul.dll mozilla::TrackBuffersManager::TrackData::Reset dom/media/mediasource/TrackBuffersManager.h:378
4 xul.dll mozilla::TrackBuffersManager::ProcessTasks dom/media/mediasource/TrackBuffersManager.cpp:248
5 xul.dll mozilla::TrackBuffersManager::QueueTask dom/media/mediasource/TrackBuffersManager.cpp:170
6 xul.dll mozilla::detail::RunnableMethodImpl<RefPtr<mozilla::AbstractCanonical<RefPtr<AudioDeviceInfo> > >, void  xpcom/threads/nsThreadUtils.h:1215
7 xul.dll mozilla::TaskQueue::Runner::Run xpcom/threads/TaskQueue.cpp:207
8 xul.dll nsThreadPool::Run xpcom/threads/nsThreadPool.cpp:299
9 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1220

This cross-platform content-crash signature has been present for a while, at rather low volume. Many of the reports look security sensitive.

Keywords: sec-high

The priority flag is not set for this bug.
:bryce, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(bvandyk)

Holding NI. P3 for now, pending retriage during cycle planning.

Priority: -- → P3

Temporary owner until we establish a person who can take a look at this.

Assignee: nobody → bvandyk
Crash Signature: [@ mozilla::MediaRawData::~MediaRawData] → [@ mozilla::MediaRawData::~MediaRawData] [@ RtlEnterCriticalSection | je_free | mozilla::MediaRawData::~MediaRawData] [@ nsTArray_Impl<T>::~nsTArray_Impl | mozilla::MediaRawData::~MediaRawData]

This and bug 1622936 look similar to me. Both involve MSE media data, and I'd expect them to be relatively hot paths. Bug 1622936 shows a recent uptick, which this one does not. I'm not sure what to make of that.

:dmajor has helped look at the reports for this and bug 1622936.

He observed that in many cases we have crash addresses that involve single bit flips.

r9==800007feef15d2f0 and nearby r14==000007feef15d2f0
or 0x00017ff9b115d2f0 vs 0x00007ff9b115d2f0
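The register pairs above can be checked mechanically: XOR the two values and test whether exactly one bit is set in the result. The following is a minimal sketch of that check, not code from the Firefox tree; the function names `IsSingleBitFlip`, `FlippedBitIndex`, and `CheckReportedPairs` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when a and b differ in exactly one bit position.
constexpr bool IsSingleBitFlip(uint64_t a, uint64_t b) {
  const uint64_t diff = a ^ b;  // set bits mark the positions that differ
  return diff != 0 && (diff & (diff - 1)) == 0;  // exactly one bit set
}

// Hypothetical helper: index of the flipped bit (0 = least significant),
// or -1 if the values do not differ by exactly one bit.
constexpr int FlippedBitIndex(uint64_t a, uint64_t b) {
  if (!IsSingleBitFlip(a, b)) {
    return -1;
  }
  uint64_t diff = a ^ b;
  int index = 0;
  while ((diff & 1) == 0) {
    diff >>= 1;
    ++index;
  }
  return index;
}

// Applies the check to the two register pairs quoted in this bug.
void CheckReportedPairs() {
  assert(FlippedBitIndex(0x800007feef15d2f0ULL, 0x000007feef15d2f0ULL) == 63);
  assert(FlippedBitIndex(0x00017ff9b115d2f0ULL, 0x00007ff9b115d2f0ULL) == 48);
}
```

For these two pairs the stray bit is bit 63 and bit 48 respectively, so in both cases the corrupted value sits well above the range of the neighbouring, plausible pointer.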

FWIW, if I take out the stray bit, the value was meant to be xul!sEmptyTArrayHeader. That explains why many reports from this build have a crashing address ending in ...d2f0, since that data structure is baked into libxul.
These are not all coming from the same install time or the same CPU model, so it doesn't seem like a hardware bug.

In fact, it may be fair to say that anything from that build ending in "1b90" was a mistaken read of xul!sEmptyTArrayHeader: https://crash-stats.mozilla.org/search/?address=%241b90&reason=%3DEXCEPTION_ACCESS_VIOLATION_READ&product=Firefox&version=74.0&date=%3E%3D2020-03-24T16%3A05%3A00.000Z&date=%3C2020-03-31T16%3A05%3A00.000Z&_facets=signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address#facet-signature
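The suffix heuristic described here amounts to comparing the low 16 bits of a crash address against a per-build constant. A rough sketch follows, assuming (as the observation above suggests, but does not prove) that the low 16 bits of a libxul static's address stay the same across installs of one build. `HasLow16Suffix` and `CheckSuffixExamples` are hypothetical names, and the suffix values are the ones quoted in this bug.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when the low 16 bits of `address` equal `suffix`.
constexpr bool HasLow16Suffix(uint64_t address, uint16_t suffix) {
  return (address & 0xFFFFULL) == suffix;
}

// Both values of each quoted register pair end in d2f0; a bit flip in the
// upper bits leaves the suffix intact, so the match still works.
void CheckSuffixExamples() {
  assert(HasLow16Suffix(0x800007feef15d2f0ULL, 0xd2f0));
  assert(HasLow16Suffix(0x000007feef15d2f0ULL, 0xd2f0));
  assert(!HasLow16Suffix(0x000007feef15d2f0ULL, 0x1b90));
}
```

A match on the suffix alone is only a hint. A flip inside the low 16 bits would defeat it, and unrelated addresses can share the same ending.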

:dmajor, please let me know if there's anything I've overlooked copying here that you think is relevant and/or anything else you think helps here.

:dveditz, do you have any thoughts on these issues?

Flags: needinfo?(dveditz)

What really surprised me is that we even see this bitflip on mac, like https://crash-stats.mozilla.org/report/index/8b9a59d1-0dae-4675-aba0-beb070200330#tab-rawdump where rax is a libxul address and rcx has an extra bit.

Could it be that media code is executed so often in loops that if you do have bad ram, you're most likely to notice it while running media code?

Or is this a sign of something malicious? Is there any code out there that likes to go around flipping random bits in the heap?

I'm not sure I have much to add. Suspicious that the UAF marker shows up in there (in r9 a lot). Since we're in a destructor could that be left over from loading that pattern to stomp on the memory? Not all of the crashes seem to involve bitflips, but I don't know what that means.

Flags: needinfo?(dveditz)

(In reply to Daniel Veditz [:dveditz] from comment #6)

> I'm not sure I have much to add. Suspicious that the UAF marker shows up in there (in r9 a lot). Since we're in a destructor could that be left over from loading that pattern to stomp on the memory? Not all of the crashes seem to involve bitflips, but I don't know what that means.

Just to make sure we're not looking at different things, let's look at a specific report: https://crash-stats.mozilla.org/report/index/521a65ea-300a-4f7c-97a0-7b9be0200401#tab-rawdump

This one has poison in r9 but it's fine. The MediaData destructor is destroying members in reverse order of declaration, and the destructor for mCryptoInternal is inlined, and so far we've destroyed mCryptoInternal.mInitDataType and mCryptoInternal.mInitDatas. That's where r9 picked up poison, and we haven't touched r9 since. At the point of the crash we are attempting to destroy mCryptoInternal.mIV, running its IsEmpty() check, when we have a bit flip on its mHdr pointer:

0:032> ? xul!sEmptyTArrayHeader ^ @rcx
Evaluate expression: 536870912 = 00000000`20000000
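The destruction order this analysis relies on is standard C++: non-static data members are destroyed in reverse order of declaration. The sketch below demonstrates that with a stand-in struct. `Tracer`, `CryptoLike`, and the member names are hypothetical, and the declaration order shown is an assumption inferred from the order of destruction described above, not the real layout of the media crypto types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: appends its label to a log when destroyed.
struct Tracer {
  std::string label;
  std::vector<std::string>* log;
  ~Tracer() { log->push_back(label); }
};

// Hypothetical stand-in aggregate with an assumed declaration order.
struct CryptoLike {
  Tracer iv;            // declared first, destroyed last
  Tracer initDatas;
  Tracer initDataType;  // declared last, destroyed first
};

// Builds one CryptoLike, lets it go out of scope, and returns the order in
// which its members were destroyed.
std::vector<std::string> DestructionOrder() {
  std::vector<std::string> log;
  {
    CryptoLike c{{"iv", &log}, {"initDatas", &log}, {"initDataType", &log}};
  }  // c is destroyed here, members in reverse declaration order
  return log;
}

void CheckDestructionOrder() {
  const std::vector<std::string> order = DestructionOrder();
  assert(order.size() == 3);
  assert(order[0] == "initDataType");
  assert(order[1] == "initDatas");
  assert(order[2] == "iv");
}
```

Under that assumed layout, a register loaded while tearing down the later-declared members can still hold a stale value by the time the earlier-declared member is reached, which is consistent with r9 carrying poison at the crash point without being the cause.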

https://crash-stats.mozilla.org/report/index/cdcc4a56-2d74-4443-b82f-118b40200329 and https://crash-stats.mozilla.org/report/index/18162346-1b5c-4154-9188-9de890200325 both have single bit flips on mHdr even though their crash reports claim to have ECC memory (in the latter case, multi-bit ECC).

I'm unclear how to proceed further with this. I'm all ears for suggestions, but am marking this stalled for now.

Keywords: stalled

The bug assignee didn't log in to Bugzilla in the last months and this bug has priority 'P1'/severity 'S1'.
:jimm, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: brycebugemail → nobody
Flags: needinfo?(jmathies)
Flags: needinfo?(jmathies)
Severity: S1 → S2
Priority: P1 → --
Blocks: media-triage

Looks to have gone away.

No longer blocks: media-triage
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit BugBot documentation.

Keywords: stalled
Group: media-core-security