Closed Bug 1382366 Opened 2 years ago Closed 2 years ago

Crash in mozilla::SystemClockDriver::WaitForNextIteration | mozilla::MediaStreamGraphImpl::UpdateMainThreadState

Categories

(Core :: Audio/Video: MediaStreamGraph, defect, P2, critical)

x86
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox-esr52 58+ fixed
firefox54 --- wontfix
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 + fixed
firefox59 --- fixed

People

(Reporter: jesup, Assigned: karlt)

References

Details

(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [adv-main58+][adv-esr52.6+][post-critsmash-triage])

Crash Data

Attachments

(3 files)

This bug was filed from the Socorro interface and is 
report bp-90598c3d-fa6e-4d42-8e1f-bbf6f0170719.
=============================================================

Goes back to at least 54.  Clear UAF
Ugh, apparently setting "Security-Sensitive" and "Media security bug" means you have to be in both to see it (or be on the CC)
Flags: needinfo?(padenot)
Rank: 15
Priority: -- → P1
Group: core-security
I think some of these crashes are a variation on 1384805, but not all of them.
Flags: needinfo?(padenot)
So in WaitForNextIteration, we assert the mGraphImpl owns the monitor, so mGraphImpl can't be null, and it seems unlikely it would be freed while the monitor is still held (perhaps we can add traps for that case, if it isn't already indirectly verified in the process of freeing a graph - and likely it is).  

That would mean that the only possible source of a UAF here is that the SystemClockDriver itself was freed. (i.e. this->mGraphImpl == ...e5e5). This could be verified if in 56b3 mNeedAnotherIteration is at offset 0x68d - 0x5e5 (see https://crash-stats.mozilla.com/report/index/b6e17226-4148-4cfb-94fb-459260170818).

And... wow.  There are 17(!) MediaStreamGrph threads in that report!  And a lot of them are in WaitForNextIteration()

Thoughts Paul?
Flags: needinfo?(padenot)
Yes, see my analysis in bug 1384805. There are multiple different crashes here, but what you describe is fixed by my patch.
Flags: needinfo?(padenot)
Hrm I got the bug number wrong _twice_. I meant bug 1388243.
Paul - any idea why we'd have 17 MSG threads??  17 active docs with webaudio??  

Perhaps this is was caused by the apparent issue I saw with an MSG not shutting down?
Flags: needinfo?(padenot)
And a thread named as an MSG thread has to be a ClockDriver thread, not a callback
(In reply to Randell Jesup [:jesup] from comment #7)
> Paul - any idea why we'd have 17 MSG threads??  17 active docs with
> webaudio??  
> 
> Perhaps this is was caused by the apparent issue I saw with an MSG not
> shutting down?

See comment 6, this explains why there is a lot of threads, and why they are leaking.
Flags: needinfo?(padenot)
Yes, this signature is a catch all really, it's where most of the time is spent when running a SystemClockDriver. There are multiple issues there, my patch fixed only one of them, hence the "some" in comment 3.
Flags: needinfo?(padenot)
Mass change P1->P2 to align with new Mozilla triage process.
Priority: P1 → P2
Duplicate of this bug: 1400558
Still causing UAF crashes in 57b8: https://crash-stats.mozilla.com/report/index/a73e1b70-6fc8-4936-88aa-97c800171017

We need someone to own investigating this bug
I'm not seeing any crashes with this signature in 58.0a1 though I did see them in 57.0a1 (and lots still in 57 betas). Is there something that might have fixed this in 58 that we could backport to 57 and maybe ESR 52.x? Not any of the bugs already mentioned here because those look like they were fixed in 56/57.

Assigning to :jib (triage owner) to find a real owner since this is a sec-high/crit bug and shouldn't be unassigned.
Assignee: nobody → jib
Flags: needinfo?(jib)
karl might have thoughts, or it might be related to some items he's working on
Karl, any thoughts on what might have fixed this?
Flags: needinfo?(karlt)
An hypothesis similar to "the large" from
https://bugzilla.mozilla.org/show_bug.cgi?id=1360334#c26:

1. AudioCallbackDriver::Shutdown() schedules AudioCallbackDriver::Stop() and
   spins an event loop on the main thread waiting for Stop() to return.

2. cubeb_stream_stop() is called on the AsyncCubebTask thread.

3. wasapi_stream_stop() fails to join the render thread in 5 seconds and so
   calls AudioCallbackDriver::StateCallback(CUBEB_STATE_ERROR).

4. mAudioStream is still set, and even if NextDriver() is set then it is not
   scheduled.  StateCallback() starts a SystemClockDriver.

5. The SystemClockDriver waits in WaitForNextIteration().
   mNeedAnotherIteration is only cleared in the SystemClockDriver, and so this
   wait will have a timeout.

6. AudioCallbackDriver::Shutdown() returns.

7. The graph releases the SystemClockDriver and destroys itself.

8. SystemClockDriver wakes, but no longer exists. UaF.
Assignee: jib → karlt
Status: NEW → ASSIGNED
Flags: needinfo?(karlt)
I suspect this is not yet fixed, but less frequent since the fix for bug 1360334.

I looked through changes since FIREFOX_BETA_57_BASE in MediaStreamGraph.cpp, GraphDriver.cpp, and media/libcubeb, but didn't see anything that I expect to have fixed this.

Bug 1388243 was also similar, and so that would have also reduced the frequency.
I considered borrowing mWaitState to fix this bug, but I think it is clearer to use a different and specific variable.
Attachment #8925423 - Flags: review?(padenot)
This approach uses a single variable for the test in StateCallback() to cover
situations from comment 19 and
https://bugzilla.mozilla.org/show_bug.cgi?id=1408276#c15
as well as replacing the existing tests put in place for
https://bugzilla.mozilla.org/show_bug.cgi?id=1388243#c4 and
"the large" of https://bugzilla.mozilla.org/show_bug.cgi?id=1360334#c26

The change of approach results in removal of the recovery from failed
Dispatch() that appears to have been incidentally added with the fix for bug
1408276.  Dispatch() should not fail until late in shutdown and the graph
should be shut down earlier in shutdown.  If shutdown has already gone wrong,
there is not a whole lot to gain from keeping the graph running.

Failure to create a new thread is likely any time before shutdown, but there
is no recovery from this, before or after this patch.  I'd prefer not to complicate this bug with that issue.
Attachment #8925424 - Flags: review?(padenot)
Depends on: 1408276
Comment on attachment 8925423 [details] [diff] [review]
part 1: move declaration of mWaitState to SystemClockDriver

Review of attachment 8925423 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm
Attachment #8925423 - Flags: feedback+
Comment on attachment 8925424 [details] [diff] [review]
part 2: disable AudioCallback -> SystemClockDriver fallback before disowning graph

Review of attachment 8925424 [details] [diff] [review]:
-----------------------------------------------------------------

I traced through comment 19 and it looks very real. The fix looks good too.

What strikes me is that if GraphDriver operations were based on something like MozPromise (so resolve/reject handling at the callsite) we'd get rid of a whole lot of indirection and that would make reasoning about them simpler.

Thanks karl!
Attachment #8925424 - Flags: feedback+
Attachment #8925423 - Flags: review?(padenot) → review+
Attachment #8925424 - Flags: review?(padenot) → review+
The most likely path suspected for this in comment 19 did not exist until 55
https://hg.mozilla.org/mozilla-central/rev/c98ff2135b46d70fc3ce3635703a019d42d24539#l6.198
but there are other perhaps less expected modes of error from
wasapi_stream_render_loop(). e.g.
https://hg.mozilla.org/releases/mozilla-esr52/file/e4fec62e5347/media/libcubeb/src/cubeb_wasapi.cpp#l918

We still have a crash report for version 52.4, which has the fix for bug 1360334
https://crash-stats.mozilla.com/report/index/52283d50-fe8f-42fa-ab85-28eca0171023
Flags: needinfo?(jib)
Comment on attachment 8925424 [details] [diff] [review]
part 2: disable AudioCallback -> SystemClockDriver fallback before disowning graph

[Security approval request comment]
> How easily could an exploit be constructed based on the patch?

Triggering the UaF requires either an error from system libraries or a 5
second timeout.

I guess an attacker could consume available RAM in order to make five second
timeouts more likely.  I don't know any way that content could trigger this
kind of error.

> Do comments in the patch, the check-in comment, or tests included in the
> patch paint a bulls-eye on the security problem?

Comments in the patch describe how the code works and what is changed.
It is not too difficult to deduce the possibility of a UaF.

> Which older supported branches are affected by this flaw?

Part of the bug existed in 52.

> If not all supported branches, which bug introduced the flaw?

All supported.

> Do you have backports for the affected branches?
> If not, how different, hard to create, and risky will they be?

Backport for 52 is ready.  Others would not be difficult.
Code has not changed much.

> How likely is this patch to cause regressions; how much testing does it need?

There is a moderate risk here, limited to Web Audio and WebRTC.

There are many tests, but the interaction of threads means that it is not
possible to explicitly test all possible paths.  Some paths occur sometimes,
according to timing, and so it would be reassuring to have these tests running
on m-c for a few days before uplift.
Attachment #8925424 - Flags: sec-approval?
sec-approval for checkin on 11/28 or immediately after, two weeks into the new cycle. Do not check this into trunk before then.
Attachment #8925424 - Flags: sec-approval? → sec-approval+
Blocks: 1418820
https://hg.mozilla.org/mozilla-central/rev/53f055298fa9
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Please request Beta/ESR52 approval on the patches where applicable.
Flags: needinfo?(karlt)
Whiteboard: [checkin on 11/28]
Group: media-core-security → core-security-release
Comment on attachment 8925424 [details] [diff] [review]
part 2: disable AudioCallback -> SystemClockDriver fallback before disowning graph

Approval Request Comment

Only this patch need land on beta 58.

[Feature/Bug causing the regression]:
Don't know, but there are crashes on 52.4.  See comment 25.
[User impact if declined]:
sec-high
[Is this code covered by automated tests?]:
There are many tests, but the interaction of threads means that it is not
possible to explicitly test all possible paths.  Some paths occur sometimes,
according to timing.
[Has the fix been verified in Nightly?]:
No.  We don't have a reproducing testcase, and crashes are not frequent enough
to infer anything from lack of crash reports.
[Needs manual test from QE? If yes, steps to reproduce]:
No.
[List of other uplifts needed for the feature/fix]:
Bug 1408276 part 4.
[Is the change risky?]:
There is a moderate risk here, limited to Web Audio and WebRTC.
[Why is the change risky/not risky?]:
Some paths occur sometimes, according to timing.
[String changes made/needed]:
None.
Flags: needinfo?(karlt)
Attachment #8925424 - Flags: approval-mozilla-beta?
Comment on attachment 8926260 [details] [diff] [review]
esr52 patch (part 2 only)

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: 
Fix Landed on Version:
Risk to taking this patch (and alternatives if risky): 
String or UUID changes made by this patch: 

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

[Approval Request Comment]

Only this patch need land on ESR52.
Bug 1408276 esr52 patch needs to land before this.

If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined:
sec-high
Fix Landed on Version:
59
Risk to taking this patch (and alternatives if risky):
Moderate
String or UUID changes made by this patch:
None.
Attachment #8926260 - Flags: approval-mozilla-esr52?
Comment on attachment 8925424 [details] [diff] [review]
part 2: disable AudioCallback -> SystemClockDriver fallback before disowning graph

Fix a sec-high. Beta58+ & ESR52+.
Attachment #8925424 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8926260 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Depends on: 1422631
See Also: → 1415755
Blocks: 1426603
No longer depends on: 1422631
Whiteboard: [adv-main58+][adv-esr52.6+]
Flags: qe-verify-
Whiteboard: [adv-main58+][adv-esr52.6+] → [adv-main58+][adv-esr52.6+][post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.