Closed Bug 1360334 Opened 7 years ago Closed 7 years ago

Crash in mozilla::MediaStreamGraph::NotifyOutputData since Firefox 49

Categories

(Core :: Audio/Video: MediaStreamGraph, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr45 --- unaffected
firefox-esr52 56+ fixed
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox56 + fixed
firefox57 + fixed

People

(Reporter: jesup, Assigned: pehrsons)

References

Details

(5 keywords, Whiteboard: [adv-main56+][adv-esr52.4+][post-critsmash-triage])

Crash Data

Attachments

(2 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #1302231 +++

+++ This bug was initially created as a clone of Bug #1293976 +++


Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::MediaStreamGraph::NotifyOutputData(float*, unsigned int, int, unsigned int) 	dom/media/MediaStreamGraph.cpp:1256
1 	xul.dll 	mozilla::AudioCallbackDriver::DataCallback(float const*, float*, long) 	dom/media/GraphDriver.cpp:911

We're still seeing a low volume of crashes in this function - perhaps the MSG object itself is getting deleted out from under us?
Crashes on for (auto& listener : mAudioInputs) 

Volume isn't high; 82 crashes in 52esr/53/54/55 in the last 3 months.  Clear UAF
Rank: 12
Flags: needinfo?(padenot)
Keywords: csectype-uaf
Tracking since this is sec-high.
I don't have any idea for now, I'll keep digging.
Paul, jya - we need to finally nail this one.
Flags: needinfo?(jyavenard)
looks like a UAF to me... not my aread of expertise, will investigate later.
We landed code in bug 1299489 to make iterator invalidation with range-based for loops and nsTArray to crash safely (non-exploitably)...we could backport that to esr52.  Unless the entire MediaStreamGraph object is getting deleted out from under us while we're iterating?
This is occurring in 54 and 55 (and likely 56), so it's not solved by the iterator patch.

It might be worth checking if it's possible for the MSG to be deleted while we're doing this.  I don't think so... but I haven't checked.
Track 56+ as sec-high.
Flags: needinfo?(padenot)
Flags: needinfo?(jyavenard)
Paul -- Can you take a look at this?  What action can we take here?  If this is actionable, I'm thinking of asking Pehrsons to take this on when he's back from PTO -- unless you want it.  If Pehrsons does take this bug, he'll appreciate a braindump and an assessment from you to get him started. Thanks.
Flags: needinfo?(padenot)
It's not actionable. I've done an initial analysis back when this got filed, and I had not found anything suspicious.

I know it's really important, but short of having an idea, we can probably bang my head on it to try to find a solution, but I can't promise anything. I can do a braindump, though.
Flags: needinfo?(padenot)
Thanks, Paul.  I'm going to do a needinfo to myself now so I don't forget to talk with him about this when he returns.  A fresh set of eyes (especially Andreas' eyes) may help a lot with this.
Flags: needinfo?(mreavy)
Quick notes for people attempting to understand this:
- This happens on Windows XP and Windows Vista+. This means this crash happens when WINMM and WASAPI are being used as audio backends (as seen in the stack traces). Considering that, this is probably not caused by cubeb. We had an instance of a problem like that in the past, WASAPI (vista+) only, and we fixed it with very weird code. The whole thing was caused (we think) by weirdness at the system level.
- This happens both in duplex and no duplex: WINMM has no duplex support, and WASAPI is using duplex by default.
- This crash happens _after_ having made an iteration in the graph (the call to OneIteration happens before we crash here). This means that the `listener` on which we're crashing has gone away, but has not been removed from the `mAudioInputs` array. The graph holds RefPtr to this object, I'm not sure how it can happen.
I wonder if this happens when we're in the process (or have already?) switched to or from a SystemClock driver...  Perhaps a 'late' callback that drives the graph breaking assumptions about "this is only modified on the 'graph' thread", which is really more than one thread - it's the the thread we're called back on by the driver (which we presume is one thread), OR a system clock callback.
Hey, Andreas.  Welcome back!  This is one of the bugs I'd love for you to look at and discuss with Padenot and Jesup.  A fresh set of eyes on this may help.  Moving the needinfo to you to ask if you'd feel comfortable taking a look and if so, (after you've had a chance to look) do you have any insights?  I want to find an owner for this if we think it's actionable.
Flags: needinfo?(mreavy) → needinfo?(apehrson)
Thanks! I had a look already while reading through the bugmail for this, but no leads yet. I'll dig a bit more. (and keeping the ni? as a reminder)
So I suspect we are looking at cubeb calling unsafely into DataCallback() [1], and the GraphDriver and MediaStreamGraph are being pulled away mid-call for other reasons.

Is wrapping the rawptr in a RefPtr an option in this hot path?


[1] https://hg.mozilla.org/releases/mozilla-beta/annotate/91e10e241176/dom/media/GraphDriver.cpp#l878
Flags: needinfo?(rjesup)
Flags: needinfo?(padenot)
Flags: needinfo?(apehrson)
A lot of the comments in crash-stats are talking about navigating away or closing a tab so it could be a shutdown issue at the root.
(In reply to Andreas Pehrson [:pehrsons] from comment #17)
> So I suspect we are looking at cubeb calling unsafely into DataCallback()
> [1], and the GraphDriver and MediaStreamGraph are being pulled away mid-call
> for other reasons.
> 
> Is wrapping the rawptr in a RefPtr an option in this hot path?

What would you wrap in a RefPtr? I don't understand.
Flags: needinfo?(padenot)
This guy:
> AudioCallbackDriver* driver = reinterpret_cast<AudioCallbackDriver*>(aUser);
Well maybe, but the fundamental problem is that the MSG should never go away in the middle of an iteration. It is a fundamental property of the whole system that the MSG decides of the lifetime of the various GraphDrivers, and the MSG is kept alive via a refptr in the global msg table (gGraphs).

Adding another ref is only going to add problem I think.
Yeah, I foresee it as likely to shift the problem to a new place - perhaps the next callback. If we're lucky it might give us some new clues.

But, up to you.

I'll dig into the code more, trying to do some backtracking.
See discussion on IRC.  I still suspect a mistake (or late callback, or something) during callback/clock driver switches
Flags: needinfo?(rjesup)
Re late callback I went through the code for stopping the wasapi callback driver. A late callback is as likely as stop() not happening before the 5s timeout we've set.

I recall seeing some funkiness a couple of months ago while debugging related to switching a driver before the previous switch had even occurred, so I'll look at cases like that today.
(In reply to Andreas Pehrson [:pehrsons] from comment #24)
> Re late callback I went through the code for stopping the wasapi callback
> driver. A late callback is as likely as stop() not happening before the 5s
> timeout we've set.

There is a protection mechanism in place to prevent this [0].

[0]: https://github.com/kinetiknz/cubeb/blob/master/src/cubeb_wasapi.cpp#L2041
A lot of staring and a couple of minidumps later, I have one tiny and one fairly large hole. The latter could be at play here.

---

First, the tiny:

From the crash at [1], and its minidump. See the crashing thread (42) at frame 4 (and a bit higher in that method), [2].

`previousDriver` is a raw pointer, and it remains a raw pointer while we unlock the monitor. If another thread changes `mDriver->mPreviousDriver` here, we don't have a reference to keep it alive. Worst case we'll send off an AsyncCubebTask that will shut down a freed driver.

The minidump here showed that `previousDriver` and `releaseEvent.mRawPtr` had the same address, as if the driver was freed and the task was allocated in its place.


[1] https://crash-stats.mozilla.com/report/index/71c86e44-5c34-4821-b3d8-4038a0170830
[2] https://hg.mozilla.org/releases/mozilla-release/annotate/10a244c0f835/dom/media/GraphDriver.cpp#l180

---

And the large:

From the crash at [3], and its minidump. See thread 74, "CubebOperation #2". It is running an AsyncCubebTask to shut down an AudioCallbackDriver (thread 72 as it looks like, which is blocked on a sys call). Thread 74 times out in wasapi_stream_stop while waiting to join thread 72 and, instead of exiting cleanly, calls the StateCallback with an error, which results in [4]: spawning a fallback SystemClockDriver, no matter what.

Now, if we ever decide to switch from one AudioCallbackDriver to another AudioCallbackDriver (without looking more into why..., padenot tells me it's valid), we end up at [5] with an AudioCallbackDriver as `mPreviousDriver`. This schedules one AsyncCubebTask for shutting down mPreviousDriver and one AsyncCubebTask for initiating `this`. If the former task results in a failed AudioCallbackDriver::Stop() (see previous paragraph), we have a live fallback SystemClockDriver and a live AudioCallbackDriver racing to run the same MediaStreamGraph.

No further analysis done on what happens in such a race, but I'll suggest weird stuff like prematurely killing the graph and causing UAFs.

[3] https://crash-stats.mozilla.com/report/index/86c88e6a-8c43-4a6e-958e-a51450170628
[4] https://hg.mozilla.org/releases/mozilla-beta/annotate/0a6a0148a61c/dom/media/GraphDriver.cpp#l1062
[5] http://searchfox.org/mozilla-central/rev/44c693914255638d74bcf1ec3b0bcd448dfab2fd/dom/media/GraphDriver.cpp#778

---

Patch coming to cover these two cases.
Amazing analysis, thanks !
Assignee: nobody → apehrson
Status: NEW → ASSIGNED
Attached patch Patch/57 (obsolete) — Splinter Review
Attachment #8905965 - Flags: review?(padenot)
Attachment #8905965 - Flags: review?(padenot) → review+
Comment on attachment 8905965 [details] [diff] [review]
Patch/57

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Not easily. You can infer that something happens when we receive an error from the audio stack, but not in which corner case it must happen to cause harm (switching from output-only to input+output or vice versa).

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
No.

Which older supported branches are affected by this flaw?
ESR 52, beta 56.
This goes cleanly on 56. I have a rebased patch for 52.

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
Yes, 52 needs a couple of lines less code than 56/57, is all. No added risk.

How likely is this patch to cause regressions; how much testing does it need?
Unlikely. It adds code for when there's an error in the audio stack in webrtc only. Basic testing of some popular sites using webrtc should be enough, with extra care on Windows. To stretch it we should test with as many sound cards as possible, and videos playing in other tabs.
Attachment #8905965 - Flags: sec-approval?
Attached patch Patch/52Splinter Review
This is the 52 backport
sec-approval for trunk. I'll nominate the ESR52 patch (next time, you get to do it!) so release management can weigh in there.
Attachment #8905965 - Flags: sec-approval? → sec-approval+
Comment on attachment 8905965 [details] [diff] [review]
Patch/57

Putting beta approval on this too. It needs to be there as well as ESR52.
Attachment #8905965 - Flags: approval-mozilla-beta+
Attachment #8906037 - Flags: approval-mozilla-esr52?
Attached patch Patch/56-57Splinter Review
This adds the missing `.get()` for logging that was causing the bustage. Carries forward r+, approval-mozilla-beta+, sec-approval+, though the latter two I cannot set.

The 52 patch looks ok in this regard still.
Attachment #8905965 - Attachment is obsolete: true
Flags: needinfo?(apehrson)
Attachment #8906344 - Flags: review+
Comment on attachment 8906344 [details] [diff] [review]
Patch/56-57

sec-approval? just to get the flags backfilled.
Attachment #8906344 - Flags: sec-approval?
https://hg.mozilla.org/mozilla-central/rev/0821986e4f3b
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Group: media-core-security → core-security-release
Comment on attachment 8906344 [details] [diff] [review]
Patch/56-57

carrying forward flags from the previous version
Attachment #8906344 - Flags: sec-approval?
Attachment #8906344 - Flags: sec-approval+
Attachment #8906344 - Flags: approval-mozilla-beta+
Comment on attachment 8906037 [details] [diff] [review]
Patch/52

uaf crash fix, esr52.4+
Attachment #8906037 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Whiteboard: [adv-main56+][adv-esr52.4+]
Flags: qe-verify-
Whiteboard: [adv-main56+][adv-esr52.4+] → [adv-main56+][adv-esr52.4+][post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.