Closed Bug 1302231 Opened 8 years ago Closed 8 years ago

Crash in mozilla::MediaStreamGraph::NotifyOutputData since Firefox 49

Categories

(Core :: Audio/Video: MediaStreamGraph, defect, P1)

49 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox49 + wontfix
firefox-esr45 --- unaffected
firefox50 + wontfix
firefox51 + fixed
firefox52 + fixed
firefox53 --- fixed

People

(Reporter: jesup, Assigned: padenot)

References

Details

(Keywords: crash, regression, sec-high, Whiteboard: [fixed on trunk by bug 1314514][post-critsmash-triage][adv-main51+])

Crash Data

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #1293976 +++

This bug was filed from the Socorro interface and is 
report bp-c3d75891-a31e-49da-ab1c-a8ebe2160809.
=============================================================

Crashing Thread (98)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::MediaStreamGraph::NotifyOutputData(float*, unsigned int, int, unsigned int) 	dom/media/MediaStreamGraph.cpp:1256
1 	xul.dll 	mozilla::AudioCallbackDriver::DataCallback(float const*, float*, long) 	dom/media/GraphDriver.cpp:911
2 	xul.dll 	mozilla::AudioCallbackDriver::DataCallback_s(cubeb_stream*, void*, void const*, void*, long) 	dom/media/GraphDriver.cpp:772
3 	xul.dll 	noop_resampler::fill(void*, long*, void*, long) 	media/libcubeb/src/cubeb_resampler.cpp:54
4 	xul.dll 	`anonymous namespace'::refill 	media/libcubeb/src/cubeb_wasapi.cpp:529
5 	xul.dll 	`anonymous namespace'::refill_callback_duplex 	media/libcubeb/src/cubeb_wasapi.cpp:737
6 	xul.dll 	`anonymous namespace'::wasapi_stream_render_loop 	media/libcubeb/src/cubeb_wasapi.cpp:906
7 	ucrtbase.dll 	_crt_at_quick_exit 	
8 	kernel32.dll 	BaseThreadInitThunk 	
9 	ntdll.dll 	__RtlUserThreadStart 	
10 	ntdll.dll 	_RtlUserThreadStart

this crash signature on windows and os x seems to be regressing since firefox 49 builds - on 49.0b1 it's currently making up 0.25% of all browser crashes.

------------------

After talking with mreavy, cloning a new bug to handle the remaining crashes (low frequency).
Tracking for 49. We could take a patch potentially for a 49 dot release.
Assigning for investigation.
Assignee: rjesup → padenot
Rank: 10
Paul's at TPAC this week.  When he's back next week, he and Jesup will confer on how to attack this.  Jesup has looked already and doesn't see any holes, but Paul may bring a fresh perspective.  This one has a much lower crash rate than Bug 1293976, which is good.
[Tracking Requested - why for this release]:
Tracking 52+ for this sec critical bug.
Have you and Jesup had a chance to figure out a plan for a fix here?
Flags: needinfo?(padenot)
At this point, it's mostly a stare-at-the-code-until-beads-of-blood-appear-on-our-foreheads situation...  I'll coordinate with padenot tomorrow to see if there's any safe bandaid we can apply (perhaps holding a should-be-redundant reference during the operation).
This might be fixed by bug 1308418, which is going to land on beta. We'll see in a couple days.
Flags: needinfo?(padenot)
Depends on: 1308418
From release management bot on bug 1308418:
Crash volume for signature 'mozilla::MediaStreamGraph::NotifyOutputData':
 - nightly (version 52): 1 crash from 2016-09-19.
 - aurora  (version 51): 2 crashes from 2016-09-19.
 - beta    (version 50): 46 crashes from 2016-09-20.
 - release (version 49): 366 crashes from 2016-09-05.
 - esr     (version 45): 0 crashes from 2016-07-25.

Crash volume on the last weeks (Week N is from 10-17 to 10-23):
            W. N-1  W. N-2  W. N-3  W. N-4
 - nightly       0       1       0       0
 - aurora        0       1       1       0
 - beta         12      15      10       3
 - release      91      63     128      37
 - esr           0       0       0       0

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly
 - aurora
 - beta    #1315     #581
 - release #807      #236
 - esr
No crashes so far in beta 9 (which has bug 1308418), or in any aurora/nightly build with that.  Note that it's a very rare crash, so aurora/nightly doesn't tell us much, and more time is needed with beta 9 and later to be confident that it's fixed (or not).  We will continue to monitor.

If bug 1308418 doesn't fix it, we'll WONTFIX for 50.  Note that the original problem with NotifyOutputData *has* been fixed (that had a much higer crash rate).
There is a crash in beta 9 now. It's still unclear what happens here.
Marking as wontfix for Fx50 per comments 10 and 11, and we're out of time for this release.  We'll keep this on our radar for Fx51.
Dropping to sec-high given signature/frequency and discussion with dveditz recently
Keywords: sec-criticalsec-high
We believe this may be fixed by the checkin for bug 1314514 (which includes the fix for bug 1215194); we plan to uplift this next week.
Paul - can you nominate for uplift (perhaps here, since the other bug is public?)  Also, I meant to say bug 1315194 in the previous comment.
Attachment #8813116 - Flags: review?(kinetik)
This needed to be rebased a bit.
Attachment #8813118 - Flags: review?(kinetik)
Attachment #8813116 - Flags: review?(kinetik) → review+
Attachment #8813118 - Flags: review?(kinetik) → review+
Comment on attachment 8813116 [details] [diff] [review]
Cherry-pick a cubeb revision, aurora patch

Approval Request Comment
[Feature/regressing bug #]: 1221587
[User impact if declined]: low traffic crash, bug 1315194, and this bug. This crash can manifest with a variety of signatures, depending on the scenario. Explanations have been posted on bug 1315194. We believe that Chrome also hits this.
[Describe test coverage new/current, TreeHerder]: has landed on inbound for some time
[Risks and why]: this implements an emergency bailout for when the audio thread is stuck, so we avoid a potentially exploitable crash.
[String/UUID change made/needed]: none
Attachment #8813116 - Flags: approval-mozilla-aurora?
Comment on attachment 8813118 [details] [diff] [review]
Cherry-pick a cubeb revision, beta patch

Approval Request Comment
[Feature/regressing bug #]: 1221587
[User impact if declined]: low traffic crash, bug 1315194, and this bug. This crash can manifest with a variety of signatures, depending on the scenario. Explanations have been posted on bug 1315194. We believe that Chrome also hits this.
[Describe test coverage new/current, TreeHerder]: has landed on inbound for some time
[Risks and why]: this implements an emergency bailout for when the audio thread is stuck, so we avoid a potentially exploitable crash.
[String/UUID change made/needed]: none
Attachment #8813118 - Flags: approval-mozilla-beta?
Comment on attachment 8813116 [details] [diff] [review]
Cherry-pick a cubeb revision, aurora patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
We don't know why this happens, but there are some evidence that this could be a windows kernel bug (according to what Microsoft people told Chrome people, that have the same issue).

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
The code makes it clear what happens, but to date, nobody (apart maybe the people from Microsoft) understands why that happens. Nobody can reproduce, it's just very low traffic crashes.

Which older supported branches are affected by this flaw?
Current release, beta, aurora.

If not all supported branches, which bug introduced the flaw?
I might always have been present.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
This is for the backports.

How likely is this patch to cause regressions; how much testing does it need?
It's been on nightly for a while now. This is a sec-bug because we've only found the probable cause and workaround recently, we thought it was a regular UAF from the Gecko code.
Attachment #8813116 - Flags: sec-approval?
Comment on attachment 8813118 [details] [diff] [review]
Cherry-pick a cubeb revision, beta patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
We don't know why this happens, but there are some evidence that this could be a windows kernel bug (according to what Microsoft people told Chrome people, that have the same issue).

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
The code makes it clear what happens, but to date, nobody (apart maybe the people from Microsoft) understands why that happens. Nobody can reproduce, it's just very low traffic crashes.

Which older supported branches are affected by this flaw?
Current release, beta, aurora.

If not all supported branches, which bug introduced the flaw?
I might always have been present.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
This is for the backports.

How likely is this patch to cause regressions; how much testing does it need?
It's been on nightly for a while now. This is a sec-bug because we've only found the probable cause and workaround recently, we thought it was a regular UAF from the Gecko code.
Attachment #8813118 - Flags: sec-approval?
Blocks: 1221587
Comment on attachment 8813116 [details] [diff] [review]
Cherry-pick a cubeb revision, aurora patch

sec-approval, a=dveditz
Attachment #8813116 - Flags: sec-approval?
Attachment #8813116 - Flags: sec-approval+
Attachment #8813116 - Flags: approval-mozilla-aurora?
Attachment #8813116 - Flags: approval-mozilla-aurora+
Attachment #8813118 - Flags: sec-approval?
Attachment #8813118 - Flags: sec-approval+
Attachment #8813118 - Flags: approval-mozilla-beta?
Attachment #8813118 - Flags: approval-mozilla-beta+
https://hg.mozilla.org/releases/mozilla-aurora/rev/cecb6fcd3ea1
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Whiteboard: [fixed on trunk by bug 1314514]
Target Milestone: --- → mozilla53
Marking qe-verify- due to lack of test case or steps to verify. Please advise if there is a way to easily verify this fix.
Flags: qe-verify-
Whiteboard: [fixed on trunk by bug 1314514] → [fixed on trunk by bug 1314514][post-critsmash-triage]
I verified, using various queries on crash-stats, that the patch in 1302231 fixes this. Not a single crash since 2016-11-23 (around the time when it landed), on builds that have this patch.
Depends on: 1326176
Whiteboard: [fixed on trunk by bug 1314514][post-critsmash-triage] → [fixed on trunk by bug 1314514][post-critsmash-triage][adv-main51+]
Group: media-core-security
Blocks: 1360334
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: