Closed Bug 1396879 Opened 3 years ago Closed 3 years ago

UAF and/or nullptr + offset crash using repeated applyConstraints to turn off all processing

Categories

(Core :: WebRTC: Audio/Video, defect, P2, critical)

56 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 + fixed
firefox57 + fixed

People

(Reporter: jib, Assigned: jesup)

References

Details

(5 keywords, Whiteboard: [post-critsmash-triage])

Attachments

(1 file, 5 obsolete files)

+++ This bug was initially created as a clone of Bug #1381638 +++

STRs:
 1. Open http://jsfiddle.net/jib1/n49r4v6p/ + share mic
 2. Say "Luke, I am your father"

Expected result: No hilarity

Actual result: Vader-speak on multi-second delay.

Reproduced on OSX, Windows 10 and linux.

This does not appear to be a regression, even though with the blog making the problem so obvious it is hard to not imagine it worked at some point. I even tried setting media.navigator.audio.full_duplex=false, but it didn't help (maybe I should have restarted).

Alternative STRs:
 1. Open https://blog.mozilla.org/webrtc/fiddle-of-the-week-audio-constraints/
 2. Click "Result" tab.
 3. Uncheck noiseSuppression
 4. Uncheck echoCancellation

Once when I tried these alternate STRs, it crashed on step 4, so I'm making this hidden for now:

https://crash-stats.mozilla.com/report/index/37770fea-685c-41c5-876f-743490170905
Severity: major → critical
Rank: 13
I was able to repro it now by first turning off noiseSuppression and then hammering on the echoCancellation checkbox. Tab-crashed after a good 30 seconds of hammering on it.

https://crash-stats.mozilla.com/report/index/65fcdbd8-8bca-4d2d-baeb-719090170905
Update from Paul who noticed this is a virtual call at offset 0x18 in the vtable, from a nullptr, so not a UAF.
Summary: Severe mic distortion/slowdown/delay using applyConstraints to turn off processing (+intermittent UAF!) → Severe mic distortion/slowdown/delay using applyConstraints to turn off processing (+intermittent nullptr+offset crash!)
Group: core-security → media-core-security
Group: media-core-security
Whiteboard: [sg:dos]
Assignee: nobody → achronop
Comment on attachment 8905891 [details]
Bug 1396879 - Synchronize switching to fast path when turn off processing.

https://reviewboard.mozilla.org/r/177700/#review182826

::: dom/media/webrtc/MediaEngineWebRTC.h:644
(Diff revision 1)
>  
>    nsTArray<int16_t> mInputBuffer;
>    // mSkipProcessing is true if none of the processing passes are enabled,
>    // because of prefs or constraints. This allows simply copying the audio into
>    // the MSG, skipping resampling and the whole webrtc.org code.
> -  bool mSkipProcessing;
> +  std::atomic<int> mSkipProcessing;

Why int? This should be a bool. I think you need std::atomic_bool, but I don't know why it's that way.
Attachment #8905891 - Flags: review?(padenot)
Comment on attachment 8905891 [details]
Bug 1396879 - Synchronize switching to fast path when turn off processing.

https://reviewboard.mozilla.org/r/177700/#review182880

Also, just making this Atomic doesn't necessarily solve the problem.  You need a consistent state in this and sChannelsOpen and AudioOutputObserver.  Basically if it gets changed while the stream is running, extreme care about ordering must be taken (including caring about the consistency type of atomic!  see Atomics.h).  Also, the threading usage here needs to be documented in comments, and PassThrough() is bad, since it hides the atomic access when reading the code (unless it internally takes and releases a lock).   

Probably the safe-but-fast way is to queue existance of a change request using atomics (mNextSkipState), and make the change in the callback (if need be taking a lock to do the change).

::: dom/media/webrtc/MediaEngineWebRTC.h:644
(Diff revision 1)
>  
>    nsTArray<int16_t> mInputBuffer;
>    // mSkipProcessing is true if none of the processing passes are enabled,
>    // because of prefs or constraints. This allows simply copying the audio into
>    // the MSG, skipping resampling and the whole webrtc.org code.
> -  bool mSkipProcessing;
> +  std::atomic<int> mSkipProcessing;

you should use Atomic<bool>, and #include mfbt/Atomics.h

::: dom/media/webrtc/MediaEngineWebRTCAudio.cpp:208
(Diff revision 1)
>    , mSampleFrequency(MediaEngine::DEFAULT_SAMPLE_RATE)
>    , mTotalFrames(0)
>    , mLastLogFrames(0)
>    , mPlayoutDelay(0)
>    , mNullTransport(nullptr)
> -  , mSkipProcessing(false)
> +  , mSkipProcessing(0)

false
Attachment #8905891 - Flags: review-
I'm going to mark this bug as hidden again, based on our code observations that mAudioOutputObserver, a RefPtr introduced in 56 (bug 1372247) [3], is read on on the GraphDriver thread here [1], then used inside InsertFarEnd() here [2], while it may get nulled on the media thread here [3].

If mAudioOutputObserver is nulled before calling ->InsertFarEnd() [1] it would be a nullptr with offset crash.
If mAudioOutputObserver is nulled later inside ->InsertFarEnd() [3] it would appear to be a UAF accessing its previous value.

There is no locking around this, so both seem plausible, even though we've only observed the nullptr crash. The solution seems simple, don't even null this pointer.

I'm also renaming this to focus on the crash regressions, now that we know it regressed in bug 1372247. I've opened bug 1399193 instead to cover the Vader-voice problem, which predates 56. 

[1] http://searchfox.org/mozilla-central/rev/51eae084534f522a502e4b808484cde8591cb502/dom/media/webrtc/MediaEngineWebRTCAudio.cpp#595
[2] http://searchfox.org/mozilla-central/rev/51eae084534f522a502e4b808484cde8591cb502/dom/media/webrtc/MediaEngineWebRTCAudio.cpp#103
[3] https://hg.mozilla.org/mozilla-central/rev/3da6bb01f447#l7.67
Group: core-security
Summary: Severe mic distortion/slowdown/delay using applyConstraints to turn off processing (+intermittent nullptr+offset crash!) → nullptr + offset crash using repeated applyConstraints to turn off all processing
Summary: nullptr + offset crash using repeated applyConstraints to turn off all processing → UAF and/or nullptr + offset crash using repeated applyConstraints to turn off all processing
Group: core-security → media-core-security
Whiteboard: [sg:dos]
Mass change P1->P2 to align with new Mozilla triage process.
Priority: P1 → P2
Attachment #8905891 - Attachment is obsolete: true
Attachment #8905891 - Flags: review?(padenot)
Attachment #8905891 - Flags: review?(jib)
Tracking for 57.  I guess it's likely too late to avoid shipping this in 56.
I try to make the minimum changes in order to be an easy uplift.
Attachment #8908657 - Flags: review?(padenot)
Attachment #8908657 - Flags: review?(jib)
Comment on attachment 8908657 [details] [diff] [review]
crash-switching-to-fast-path.patch

changing reviewer to jesup who reviewed it before.
Attachment #8908657 - Flags: review?(padenot) → review?(rjesup)
Assignee: achronop → rjesup
Status: NEW → ASSIGNED
Comment on attachment 8908657 [details] [diff] [review]
crash-switching-to-fast-path.patch

Review of attachment 8908657 [details] [diff] [review]:
-----------------------------------------------------------------

I'm also concerned about synchronization around other variables here, like the sampling frequency.  Overall, this needs more work to support dynamic switching safely - and doubly-so if we need to uplift.  Simpler to not do dynamic changes, since that's not a common usecase (in fact it would be quite rare I'd think)

::: dom/media/webrtc/MediaEngineWebRTCAudio.cpp
@@ +375,5 @@
>          prefs.mChannels = channelCount;
>        }
>  
> +      {
> +        bool lastSkipProcessing = !(mLastPrefs.mAecOn || mLastPrefs.mAgcOn || mLastPrefs.mNoiseOn);

Why is this not lastSkipProcessing = mSkipProcessing?  Or get rid of lastSkipProcessing (though reading mSkipProcessing multiple times is more expensive since it's Atomic).  Or are we trying to avoid reading it at all if there's no change? (perhaps).

@@ +386,5 @@
> +            mSampleFrequency = MediaEngine::USE_GRAPH_RATE;
> +          } else {
> +            // Clear old entries before start using observer.
> +            // At this point it is certain that no one is using it.
> +            // Call Clear() from non audio thread because it calls free.

hardly important since InsertFarEnd calls moz_xmalloc() for each chunk...  
And you haven't clearly indicated what threads interact with mSkipProcessing
Attachment #8908657 - Flags: review?(rjesup) → review-
Comment on attachment 8908780 [details] [diff] [review]
don't dynamically switch to/from fast-path gUM audio

Review of attachment 8908780 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/webrtc/MediaEngineWebRTCAudio.cpp
@@ +422,5 @@
> +    // we don't allow switching from non-fast-path to fast-path on the fly yet
> +    if (!mAudioOutputObserver) {
> +      mAudioOutputObserver = new AudioOutputObserver();
> +    } else {
> +      LOG(("%s No dynamic switch off fast-path yet",__FUNCTION));

This log statement is wrong, since if (!mAudioOutputObserver) is just a late-init pattern, and a non-zero mAudioOutputObserver is the common condition.

We don't want to see this log message every time someone toggles one of the three constraints.

@@ +651,5 @@
>  MediaEngineWebRTCMicrophoneSource::InsertInGraph(const T* aBuffer,
>                                                   size_t aFrames,
>                                                   uint32_t aChannels)
>  {
> +  MonitorAutoLock lock(mMonitor);

While I'm nervous to add a lock to this hot path without checking with Paul, jesup points out mSources can cause a UAF below without a lock, e.g. when a source is removed in Stop(), so it is no doubt needed.
Attachment #8908780 - Flags: review?(jib) → review+
Jesup suggested we take a different tack here, restricting fast-path to gUM only basically, not applyConstraints. Safer to upload a minimal fix to the UAFs that way.
Comment on attachment 8908657 [details] [diff] [review]
crash-switching-to-fast-path.patch

We went a different direction, as noted in comment 16.
Attachment #8908657 - Attachment is obsolete: true
Attachment #8908657 - Flags: review?(jib)
previous patch let mSkipProcessing change anyways, so it would switch but use the wrong rate (hidden by PassThrough()).  Now doesn't allow switching for real.
Attachment #8908821 - Flags: review?(jib)
Attachment #8908780 - Attachment is obsolete: true
simpler patch that correctly deals with ensuring we create the Observer when needed, and more clearly blocks changes when in use.
Attachment #8908829 - Flags: review?(jib)
Attachment #8908821 - Attachment is obsolete: true
Attachment #8908821 - Flags: review?(jib)
Allows switching into non-fastpath
Attachment #8908842 - Flags: review?(jib)
Attachment #8908829 - Attachment is obsolete: true
Attachment #8908829 - Flags: review?(jib)
Comment on attachment 8908842 [details] [diff] [review]
don't dynamically switch to/from fast-path gUM audio

Review of attachment 8908842 [details] [diff] [review]:
-----------------------------------------------------------------

Lgtm!
Attachment #8908842 - Flags: review?(jib) → review+
Comment on attachment 8908842 [details] [diff] [review]
don't dynamically switch to/from fast-path gUM audio

Review of attachment 8908842 [details] [diff] [review]:
-----------------------------------------------------------------

Just fix commit message (we allow switching FROM fast-path, just not TO fat-path)
Comment on attachment 8908842 [details] [diff] [review]
don't dynamically switch to/from fast-path gUM audio

Review of attachment 8908842 [details] [diff] [review]:
-----------------------------------------------------------------

I'm tired, I should test before r+ing. This is what I thought I wanted, unfortunately, when I test it with https://jsfiddle.net/jib1/9gctmawp/ (which is modified to start in fast-path), I get smurf-voice as soon as I turn on any filtering).

This sounds terrible, so I think your last patch, which instead makes echoCancellation, noiseSuppression and autoGainControl  non-functional when started in fast-path (all filters off) is preferable for uplift. :-(
Attachment #8908842 - Flags: review+ → review-
Comment on attachment 8908821 [details] [diff] [review]
don't dynamically switch to/from fast-path gUM audio

This one.

Just change the comment to: "we don't allow switching to or from fast-path on the fly yet"
Attachment #8908821 - Attachment is obsolete: false
Attachment #8908821 - Flags: review+
Comment on attachment 8908829 [details] [diff] [review]
don't dynamically switch to/from fast-path gUM audio

This is the one we want...
Attachment #8908829 - Attachment is obsolete: false
Attachment #8908829 - Flags: review?(jib)
Attachment #8908829 - Flags: review?(jib) → review+
Attachment #8908821 - Attachment is obsolete: true
Attachment #8908842 - Attachment is obsolete: true
Comment on attachment 8908829 [details] [diff] [review]
don't dynamically switch to/from fast-path gUM audio

We believe it might be possible to cause a UAF in theory; we've been unable to cause anything except rare nullptr crashes.  There are none in the field in crash-stats.

[Security approval request comment]
How easily could an exploit be constructed based on the patch?  Very hard - To even try to exploit it, you have to get the user to ok mic permission.  

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? Not really, though someone might infer it has to do with freeing an object.  We also move a lock, since fast-path audio was not locking.

Which older supported branches are affected by this flaw? 56 and 57

If not all supported branches, which bug introduced the flaw? Bug 1372247 

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Trivial

How likely is this patch to cause regressions; how much testing does it need? Unlikely to cause regressions, since it simply locks state once a capture starts.



This is for 56, not 55.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1372247 

[User impact if declined]: Audio distortion and possible crashes if you switch audio processing on the fly

[Is this code covered by automated tests?]: No

[Has the fix been verified in Nightly?]: Locally verified; will be verified in next Nightly that contains it

[Needs manual test from QE? If yes, steps to reproduce]: No manual test needed.  Two engineers have tested it.

[List of other uplifts needed for the feature/fix]: None

[Is the change risky?]: No - it simply avoids the risky code once a capture is started.

[Why is the change risky/not risky?]: Removes any state changes that can cause the problem

[String changes made/needed]: none
Attachment #8908829 - Flags: sec-approval?
Attachment #8908829 - Flags: approval-mozilla-release?
Note: while a sec-high, I think we could wait and let it go unpatched in 56 in release even with a patch in 57.  The risk of exploits on this one is pretty low (and real-world users are unlikely to trigger the failure mode either). Needing user permission to even try to exploit a narrow timing how helps a ton here.  I'd be ok with dropping it to moderate, given that.

The main risk would come from someone giving permanent permission to evilsite.com, letting them poke at the hole to try to trigger a UAF
https://hg.mozilla.org/mozilla-central/rev/32bb2c8925bf
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Dan, do you have an opinion here? 
The RC build is Monday so we can still take this patch for 56.
Flags: needinfo?(dveditz)
Needing permission is certainly a mitigating factor. Still, I think the UAF risk here is higher than can be deduced from crashreports. I'm able to trigger this, not just by flipping between unfiltered (fast-path) and some filtered audio, but also flipping a single audio filter outside of fast-path, which runs through this code [1]:

    mAudioOutputObserver = new AudioOutputObserver();

This allocates new memory for a new AudioOutputObserver every time, and - mAudioOutputObserver being a RefPtr - deletes the previous one (frees the previous memory), and updates the pointer to the new memory, without any locking, all while the GraphDriver may be inside AudioOutputObserver::InsertFarEnd() using the old `this` pointer now pointing to free memory.

Click the noiseSuppression checkbox in this fiddle https://jsfiddle.net/jib1/0109ggpv/3/ to start a loop flipping this filter, and it reliably crashes within ~20-40 seconds for me.

Now, they're nullptr crashes, but I think this is misleading, because this isn't an asan build, so chances are high that the freed memory still contains good-looking data, and nothing observably bad happens.

[1] http://searchfox.org/mozilla-central/rev/e55c008768e788eff01694c08258e80eb71daa18/dom/media/webrtc/MediaEngineWebRTCAudio.cpp#358
Tbc unlike the STRs in comment 0, the nullptr crashes below from comment 30 CANNOT be caused by this line [1] (which isn't run, because I left echoCancellation on all the time):

    mAudioOutputObserver = nullptr;

rather they come from accessing freed memory inside AudioOutputObserver::InsertFarEnd(), such as mPlayoutFifo in bp-b192472e-3a6a-4500-a967-345c60170916

Here's a different one bp-1b8aea68-8ef1-4ff1-ba75-6ef250170916

I also managed to provoke a layout crash? bp-4a64ba4d-a29e-4124-8303-5a8930170916

Lots of bad news here imho.

[1] http://searchfox.org/mozilla-central/rev/e55c008768e788eff01694c08258e80eb71daa18/dom/media/webrtc/MediaEngineWebRTCAudio.cpp#354
[2] https://hg.mozilla.org/mozilla-central/annotate/dbb05142c666/content/media/webrtc/MediaEngineWebRTCAudio.cpp#l159
With the comment 30 and comment 31 notes, I think we should get this into 56.
(In reply to Al Billings [:abillings] from comment #32)
> With the comment 30 and comment 31 notes, I think we should get this into 56.

Agreed
Comment on attachment 8908829 [details] [diff] [review]
don't dynamically switch to/from fast-path gUM audio

Per comment #32, Take this in 56.
Attachment #8908829 - Flags: approval-mozilla-release? → approval-mozilla-release+
Suddenly we took a different direction than I was heading and I am not sure how to proceed. I have fixes for the crash and the rate issue (which is bug 1399193). The crash issue has been tested well and I would continue testing the rate fix after finishing this one.
jib and I decided after reviewing the original patch and possible race issues (plus the fact that the MediaStreamTrack frequency was set at creation time and can't be changed) that the safest thing to do was to not allow switching from/to fast-path after the capture has started.  We can look at larger changes (perhaps as part of the VoE removal that I started and padenot will be finishing) for 58, since that will as a side-effect probably move all the capture MediaStreamTracks to run at GraphRate()
Group: media-core-security → core-security-release
Looks like comment 29 was answered by abillings
Blocks: 1372247
Flags: needinfo?(dveditz)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #37)
> Looks like this landed in 56:
> https://hg.mozilla.org/releases/mozilla-release/rev/
> 4a1e218e8bd2e082f6f37f42c0742abb75004762

And also on FIREFOX_56b13_RELBRANCH.
https://hg.mozilla.org/releases/mozilla-beta/rev/67bf2e447a8f
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Attachment #8908829 - Flags: sec-approval? → sec-approval+
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.