Intermittent ASAN heap-buffer-overflow in test_playback_rate.html

RESOLVED FIXED in Firefox 30

Status

()

defect
--
critical
RESOLVED FIXED
6 years ago
10 months ago

People

(Reporter: RyanVM, Assigned: jesup)

Tracking

(4 keywords)

Trunk
mozilla32
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox26- wontfix, firefox27+ wontfix, firefox28+ wontfix, firefox29+ wontfix, firefox30+ fixed, firefox31+ fixed, firefox32+ fixed, firefox-esr17 unaffected, firefox-esr2430+ wontfix, b2g18 unaffected, b2g-v1.1hd unaffected, b2g-v1.2 wontfix, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed, b2g-v2.0 fixed, seamonkey2.26 fixed)

Details

(Whiteboard: [adv-main30+][adv-esr24.6+][qa-], crash signature)

Attachments

(7 attachments, 3 obsolete attachments)

20.89 KB, text/plain
Details
8.57 KB, text/plain
Details
11.11 KB, text/plain
Details
67.48 KB, patch
Details | Diff | Splinter Review
9.07 KB, patch
jesup
: review+
Details | Diff | Splinter Review
9.15 KB, patch
jesup
: review+
Details | Diff | Splinter Review
2.49 KB, patch
padenot
: review+
Details | Diff | Splinter Review
Unfortunately, this doesn't fail in a TBPL-starrable way. Decoder, anything we can do about that?

https://tbpl.mozilla.org/php/getParsedLog.php?id=28460670&tree=Mozilla-Inbound

Ubuntu ASAN VM 12.04 x64 mozilla-inbound opt test mochitest-1 on 2013-09-27 04:07:56 PDT for push 06f9ccafcf8b
slave: tst-linux64-ec2-373

04:43:57     INFO -  153499 INFO TEST-PASS | /tests/content/media/test/test_playback_rate.html | If we disable the pitch preservation, it should appear as such.
04:43:57     INFO -  153500 INFO TEST-PASS | /tests/content/media/test/test_playback_rate.html | The playback rate shoud be 0.25.vbr.mp3-5
04:43:57     INFO -  153501 INFO TEST-PASS | /tests/content/media/test/test_playback_rate.html | We are effectively slowing down playback. (18.025, 4.217999) for vbr.mp3-5
04:43:57     INFO -  153502 INFO TEST-PASS | /tests/content/media/test/test_playback_rate.html | Current time should not change when playbackRate is null (4.217999 4.217999).
04:43:57     INFO -  153503 INFO TEST-PASS | /tests/content/media/test/test_playback_rate.html | The element should not be in paused state.
04:43:57     INFO -  153504 INFO TEST-PASS | /tests/content/media/test/test_playback_rate.html | Paused event should not have been received.
04:43:57     INFO -  153505 INFO TEST-PASS | /tests/content/media/test/test_playback_rate.html | Playback rate should be clamped to 5.
04:43:58     INFO -  =================================================================
04:43:58     INFO -  ==2421==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f46b7bbe810 at pc 0x7f46f1bc4ad3 bp 0x7f46b04d1250 sp 0x7f46b04d1248
04:43:58     INFO -  WRITE of size 4 at 0x7f46b7bbe810 thread T85
04:43:58     INFO -      #0 0x7f46f1bc4ad2 (/builds/slave/test/build/application/firefox/libxul.so+0x63a7ad2)
04:43:58     INFO -      #1 0x7f46f1bc2c6e (/builds/slave/test/build/application/firefox/libxul.so+0x63a5c6e)
04:43:58     INFO -      #2 0x7f46f1bc5f95 (/builds/slave/test/build/application/firefox/libxul.so+0x63a8f95)
04:43:58     INFO -      #3 0x7f46ee298ab4 (/builds/slave/test/build/application/firefox/libxul.so+0x2a7bab4)
04:43:58     INFO -      #4 0x7f46ee298e36 (/builds/slave/test/build/application/firefox/libxul.so+0x2a7be36)
04:43:58     INFO -      #5 0x7f46f1bcc7a4 (/builds/slave/test/build/application/firefox/libxul.so+0x63af7a4)
04:43:58     INFO -      #6 0x44ccc3 (/builds/slave/test/build/application/firefox/firefox+0x44ccc3)
04:43:58     INFO -      #7 0x7f46fd0c4e99 (/lib/x86_64-linux-gnu/libpthread.so.0+0x7e99)
04:43:58     INFO -      #8 0x7f46fc1d5dbc (/lib/x86_64-linux-gnu/libc.so.6+0xf3dbc)
04:43:58     INFO -  0x7f46b7bbe810 is located 0 bytes to the right of 155664-byte region [0x7f46b7b98800,0x7f46b7bbe810)
04:43:58     INFO -  freed by thread T1362 (Media Audio) here:
04:43:58     INFO -      #0 0x446aa5 (/builds/slave/test/build/application/firefox/firefox+0x446aa5)
04:43:58     INFO -      #1 0x7f46f1bc0208 (/builds/slave/test/build/application/firefox/libxul.so+0x63a3208)
04:43:58     INFO -  previously allocated by thread T1362 (Media Audio) here:
04:43:58     INFO -      #0 0x4467f5 (/builds/slave/test/build/application/firefox/firefox+0x4467f5)
04:43:58     INFO -      #1 0x7f46f1bc0095 (/builds/slave/test/build/application/firefox/libxul.so+0x63a3095)
04:43:58     INFO -  Thread T85 created by T83 (Media Audio) here:
04:43:58     INFO -      #0 0x4375c1 (/builds/slave/test/build/application/firefox/firefox+0x4375c1)
04:43:58     INFO -      #1 0x7f46f1bcb061 (/builds/slave/test/build/application/firefox/libxul.so+0x63ae061)
04:43:58     INFO -      #2 0x7f46f1bca87f (/builds/slave/test/build/application/firefox/libxul.so+0x63ad87f)
04:43:58     INFO -      #3 0x7f46ee2e1b18 (/builds/slave/test/build/application/firefox/libxul.so+0x2ac4b18)
04:43:58     INFO -  Thread T83 (Media Audio) created by T76 (Media State) here:
04:43:58     INFO -      #0 0x4375c1 (/builds/slave/test/build/application/firefox/firefox+0x4375c1)
04:43:58     INFO -      #1 0x7f46f9ba6b35 (/builds/slave/test/build/application/firefox/libnspr4.so+0x6ab35)
04:43:58     INFO -      #2 0x7f46f9ba6687 (/builds/slave/test/build/application/firefox/libnspr4.so+0x6a687)
04:43:58     INFO -  Thread T76 (Media State) created by T0 here:
04:43:58     INFO -      #0 0x4375c1 (/builds/slave/test/build/application/firefox/firefox+0x4375c1)
04:43:58     INFO -      #1 0x7f46f9ba6b35 (/builds/slave/test/build/application/firefox/libnspr4.so+0x6ab35)
04:43:58     INFO -      #2 0x7f46f9ba6687 (/builds/slave/test/build/application/firefox/libnspr4.so+0x6a687)
04:43:58     INFO -  Thread T1362 (Media Audio) created by T1335 (Media State) here:
04:43:58     INFO -      #0 0x4375c1 (/builds/slave/test/build/application/firefox/firefox+0x4375c1)
04:43:58     INFO -      #1 0x7f46f9ba6b35 (/builds/slave/test/build/application/firefox/libnspr4.so+0x6ab35)
04:43:58     INFO -      #2 0x7f46f9ba6687 (/builds/slave/test/build/application/firefox/libnspr4.so+0x6a687)
04:43:58     INFO -  Thread T1335 (Media State) created by T0 here:
04:43:58     INFO -      #0 0x4375c1 (/builds/slave/test/build/application/firefox/firefox+0x4375c1)
04:43:58     INFO -      #1 0x7f46f9ba6b35 (/builds/slave/test/build/application/firefox/libnspr4.so+0x6ab35)
04:43:58     INFO -      #2 0x7f46f9ba6687 (/builds/slave/test/build/application/firefox/libnspr4.so+0x6a687)
04:43:58     INFO -  Shadow bytes around the buggy address:
04:43:58     INFO -    0x0fe956f6fcb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
04:43:58     INFO -    0x0fe956f6fcc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
04:43:58     INFO -    0x0fe956f6fcd0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
04:43:58     INFO -    0x0fe956f6fce0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
04:43:58     INFO -    0x0fe956f6fcf0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
04:43:58     INFO -  =>0x0fe956f6fd00: fd fd[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa
04:43:58     INFO -    0x0fe956f6fd10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
04:43:58     INFO -    0x0fe956f6fd20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
04:43:58     INFO -    0x0fe956f6fd30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
04:43:58     INFO -    0x0fe956f6fd40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
04:43:58     INFO -    0x0fe956f6fd50: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
04:43:58     INFO -  Shadow byte legend (one shadow byte represents 8 application bytes):
04:43:58     INFO -    Addressable:           00
04:43:58     INFO -    Partially addressable: 01 02 03 04 05 06 07
04:43:58     INFO -    Heap left redzone:     fa
04:43:58     INFO -    Heap right redzone:    fb
04:43:58     INFO -    Freed heap region:     fd
04:43:58     INFO -    Stack left redzone:    f1
04:43:58     INFO -    Stack mid redzone:     f2
04:43:58     INFO -    Stack right redzone:   f3
04:43:58     INFO -    Stack partial redzone: f4
04:43:58     INFO -    Stack after return:    f5
04:43:58     INFO -    Stack use after scope: f8
04:43:58     INFO -    Global redzone:        f9
04:43:58     INFO -    Global init order:     f6
04:43:58     INFO -    Poisoned by user:      f7
04:43:58     INFO -    ASan internal:         fe
04:43:58     INFO -  ==2421==ABORTING
Okay, looking at the log there are two strange things:

1) The harness doesn't show it as an error (it doesn't even show that the process exited).
2) The log is unsymbolized although it should be symbolized.

A wild guess would be that this is not the main process but some sub-process that dies here. But I don't know the code/test well enough. Needinfo from Paul to answer this question :)
Flags: needinfo?(paul)
This is the main process.
Flags: needinfo?(paul)
Here's the interesting part of the symbolized trace:

==2421==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f46b7bbe810 at pc 0x7f46f1bc4ad3 bp 0x7f46b04d1250 sp 0x7f46b04d1248
WRITE of size 4 at 0x7f46b7bbe810 thread T85
    #0 0x7f46f1bc4ad2 in RateTransposerFloat::transposeStereo(float*, float const*, unsigned int) build/media/libsoundtouch/src/RateTransposer.cpp:611
    #1 0x7f46f1bc2c6e in soundtouch::RateTransposer::transpose(float*, float const*, unsigned int) build/media/libsoundtouch/src/RateTransposer.cpp:319
    #2 0x7f46f1bc5f95 in soundtouch::FIFOSamplePipe::moveSamples(soundtouch::FIFOSamplePipe&) build/media/libsoundtouch/src/FIFOSamplePipe.h:88
    #3 0x7f46ee298ab4 in mozilla::BufferedAudioStream::GetTimeStretched(void*, long) build/content/media/AudioStream.cpp:850
    #4 0x7f46ee298e36 in mozilla::BufferedAudioStream::DataCallback(void*, long) build/content/media/AudioStream.cpp:875
    #5 0x7f46f1bcc7a4 in alsa_refill_stream build/media/libcubeb/src/cubeb_alsa.c:306
    #6 0x44ccc3 in ThreadStart _asan_rtl_
    #7 0x7f46fd0c4e99 in start_thread /build/buildd/eglibc-2.15/nptl/pthread_create.c:308
    #8 0x7f46fc1d5dbc in __opensock /build/buildd/eglibc-2.15/socket/../sysdeps/unix/sysv/linux/opensock.c:82
0x7f46b7bbe810 is located 0 bytes to the right of 155664-byte region [0x7f46b7b98800,0x7f46b7bbe810)
freed by thread T1362 (Media Audio) here:
    #0 0x446aa5 in operator delete[] _asan_rtl_
    #1 0x7f46f1bc0208 in soundtouch::FIFOSampleBuffer::ensureCapacity(unsigned int) build/media/libsoundtouch/src/FIFOSampleBuffer.cpp:185
previously allocated by thread T1362 (Media Audio) here:
    #0 0x4467f5 in operator new[] _asan_rtl_
    #1 0x7f46f1bc0095 in soundtouch::FIFOSampleBuffer::ensureCapacity(unsigned int) build/media/libsoundtouch/src/FIFOSampleBuffer.cpp:174
Thread T85 created by T83 (Media Audio) here:
    #0 0x4375c1 in __interceptor_pthread_create _asan_rtl_
    #1 0x7f46f1bcb061 in alsa_init build/media/libcubeb/src/cubeb_alsa.c:692
    #2 0x7f46f1bca87f in cubeb_init build/media/libcubeb/src/cubeb.c:125
    #3 0x7f46ee2e1b18 in mozilla::MediaDecoderStateMachine::AudioLoop() build/content/media/MediaDecoderStateMachine.cpp:1062
Looks potentially exploitable to me.
Group: core-security
Severity: normal → critical
Keywords: crash, sec-critical
Assignee: nobody → paul
https://tbpl.mozilla.org/php/getParsedLog.php?id=30026163&tree=Fx-Team because I'm tired of filing new bugs for all the places you can crash after corrupting memory.
Crash Signature: RateTransposerFloat::transposeStereo(float*, float const*, unsigned int)
Disabled in https://hg.mozilla.org/mozilla-central/rev/3dd1dc64123a, since as bug 931344 comment 10 notes, if you're corrupting memory in your test, the best thing that can be hoped for is random crashes in your test (like, say, bug 916734, bug 918417, bug 920827, bug 923996, bug 928225, bug 929521, bug 930982, bug 932193, bug 840742), while the worst is that a completely unknowable number of failures in other tests in any other test in mochitest-1 are the result of this.
Is Sept 27 when this started happening? Can we take that as a regression range or is the crash too intermittent to trust?
It was failing every 3-10 days, and we started running ASan tests on on-push builds around 10 days before this was filed, so the regression window right now is "between when the test first landed a year ago and early September 27th" though the explosion in the apparently-random crash bugs noted in comment 14 makes early-to-mid-September look rather likely.
(In reply to Phil Ringnalda (:philor) from comment #16)
> It was failing every 3-10 days, and we started running ASan tests on on-push
> builds around 10 days before this was filed, so the regression window right
> now is "between when the test first landed a year ago and early September
> 27th" though the explosion in the apparently-random crash bugs noted in
> comment 14 makes early-to-mid-September look rather likely.

Marking this as affected on 26 and tracking in that case since 26 was on nightly then.
Paul - do you have any ideas here on what our options are for FF26? We're two betas out from release and would consider a low risk backout to known good state if you can find the regressing bug.
Flags: needinfo?(paul)
I've started trying to find the reason for this, without success (yet). I'm hoping to have more luck with more machine (OS/arch) when I'm back at my desk in Paris (I'm coming back from NZ at the moment), because this is not obvious to me, and seems quite hard to repro.
Flags: needinfo?(paul)
Well here's hoping the difficulty to reproduce works to our advantage since we're now missing the ship date for FF26, continuing to track for FF27.
Paul, have you found the reason yet?
Flags: needinfo?(paul)
So, I've been trying to repro this for a couple months, now, no luck. I've tried windows 7, windows 8, various 32bits and 64bits linux, and osx.

I've been carefully re-reading a bunch of code, valgrinded it, ran asan builds under various load scenario, but no luck yet.

I'm pretty sure this is bad rounding from float to integer when calculating in a buffer size somewhere, but I can't find exactly where.

I'm definitely willing to fix this, but it's hard to repro, and I've been lacking the necessary time, lately.
Flags: needinfo?(padenot)
When I run this in my Linux64 ASAN build it crashes quite quickly:
./mach mochitest-plain --debugger=gdb --repeat=1000 content/media/test/test_playback_rate.html

Not sure if this is related to the original problem though.
I can debug this further if someone tells me what to look for.
Mats, this is an unrelated bug. In fact, we should fix it.
(I filed bug 960160 on the unrelated malloc_usable_size() bug.)
We're past the time for speculative fixes on beta, so this is going to be a wontfix there but still tracking for potential forward fix if low-risk enough to take to FF28
Paul - are you working on something for this issue?  We missed shipping in FF27 but there's still some time on 28 beta to try for a low risk fix.
Flags: needinfo?(paul)
Give me time, and I'll try to fix it, but apparently that's not going to happen.
Flags: needinfo?(paul)
I just talked to Paul, and he simply can't repro this. And it's not a good use of his time to try at this point.  

If someone can repro this for us, narrow the range that this was introduced, or get us some useful, actionable info, we can take a look at this again.
Assignee: paul → nobody
Matt - can you do any investigation here into reproducing and finding a regression range before we give up on this for the short term?
Flags: needinfo?(mwobensmith)
I've tried to reproduce this on both a build from the reported date and with two recent builds. The bug itself is light on details except that it happens during the mochitest run on this particular case. So, that's all I've had to work with.

I can't reproduce, and without a reproduction, a regression range is not possible. 

I'd say we give up for now, but I'm curious to know if this still happens at all on tbpl.
Flags: needinfo?(mwobensmith)
We'd have to re-enable it to find out. Someone could push it to Try if they wanted to.
Not sure which suite it'll run in due to chunking, but I'll retrigger the relevant runs once they finish.
https://tbpl.mozilla.org/?tree=Try&rev=80fe3b4cf648
(In reply to Matt Wobensmith from comment #32)
> I'd say we give up for now, but I'm curious to know if this still happens at
> all on tbpl.

Yup, but not without a lot of effort:
https://tbpl.mozilla.org/php/getParsedLog.php?id=34931318&tree=Try

What I also learned in this experiment - this test is horribly assertion-prone on debug builds.
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #35)
> Yup, but not without a lot of effort:

As in, 1/200 ASAN runs hit it.
Hm, indeed, I can probably fix this assertion, I seem to recall I even filed the bug and had half a patch written for it.

Anyways, it seems to crash in the linear stereo linear interpolation routine. The error is not obvious, but the stop condition seems pretty error-prone. Considering i is probably equal to the size of the buffer + 1, we could simply do something like an emergency bailout when we reach the end of the buffer. It might not sound nice (tbpl probably does not care), but surely it is better than a crash.

Then maybe we can read the code to find why that happens, and write a real fix.
Regarding the ASan complaint about malloc_usable_size: See bug 960160 comment 1. ASan bug, fixed in Clang 3.4, known and easy work-around.
Fwiw, I ran test_playback_rate.html 310 times in loop without crashing in a local
Linux64 ASAN Opt build (built with Clang 3.4).
Paul, any chance you could fix that for 29? FYI, beta 3 is going to be built today.
Flags: needinfo?(paul)
Flags: needinfo?(paul)
Discussed IRL with Paul. Tagging as wontfix for 29. Too hard to reproduce and only occurs in ASAN.
If people agree on that, we could untrack for the other releases...
Why is "only occurs in ASan" as reason for not tracking? Some critical security bugs only show up in ASan. An OOB write does not necessarily show up as a regular crash, but can still be exploited in many cases.

If the test is disabled right now for some reason, then we should re-enable it and see if it's still happening.
Like the Try run I did about a month ago in comment 35? Do we have reason things have changed since?
This would be good to retry as soon as bug 919215 lands.  That and other bugs have refactored (and added locks) to this code.
Depends on: 919215
Also because we have landed an update to soundtouch.
Given my and Paul's comments, re-trying
https://tbpl.mozilla.org/?tree=Try&rev=2f359913b65c
annoying parser
https://tbpl.mozilla.org/?tree=Try&rev=5ee6e8a11a7e

The test seems to be perma-orange now, which may need resolving before we can verify things:
 INFO TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback_rate.html | We are effectively slowing down playback. (0.505, 0.300104) for bug495794.ogg-6
Flags: needinfo?(paul)
If I had to guess from the code (pre-update) and the ASAN stack, I'd guess that something grabbed ptrBegin() (raw ptr to the output buffer) before a call to ensureCapacity() cause it to be reallocated.

ensureCapacity() (old rev) gets called from putSamples(int) or ptrEnd().  ptrEnd returns a raw ptr to the buffer, but also uses ensureCapacity to guarantee there are 'slackCapacity' bytes available.  Unless that value was 0, ptrEnd() on it's own couldn't be the problem.
Also, this was a write, so most likely it was an output pointer, which would point towards ptrEnd() (which is used for writing generally); perhaps it wrote "just enough" to cause ensureCapacity() to reallocate.
With some fixes to get test_playback_rate to run:
https://tbpl.mozilla.org/?tree=Try&rev=fb8e57d0cf08

And we still have an ASAN hit there:
https://tbpl.mozilla.org/php/getParsedLog.php?id=38224491&tree=Try&full=1#error0

I'll attach the ASAN hit.  It does look like ptrEnd() is involved
Posted file asan_hit
Attachment #8410298 - Flags: review?(paul)
Attachment #8410298 - Flags: review?(paul) → review+
audited the use of the Clock and TimeStretcher; found another case and cleaned up the code
Attachment #8410298 - Attachment is obsolete: true
Attachment #8410314 - Flags: review?(paul)
Attachment #8410314 - Flags: review?(paul) → review+
Flags: needinfo?(paul)
Comment on attachment 8410314 [details] [diff] [review]
Lock before changing RateTransposer

Requesting approval for all affected active branches.

[Security approval request comment]
How easily could an exploit be constructed based on the patch?  Not trvially, but it is a UAF.  If you constantly change the rate it would happen far faster.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?  The comments definitely paste a bullseye on it.  The code itself will be rather more obtuse, but someone could infer the general problem if they're watching for apparent sec-patches and especially for lock changes.

Which older supported branches are affected by this flaw? 22 and on.

If not all supported branches, which bug introduced the flaw? bug 495040 and probably bug 852401 (FF 22) which turned off sydney audio.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?  Backports should be simple since this code hasn't really changed, and we just need to grab the lock.

How likely is this patch to cause regressions; how much testing does it need?  Little chance of regressions (only type of regression from this would be a deadlock, but that appears impossible).

We're may want to crash-land this across as many revs as we can (with comments removed or minimized), and perhaps do a ESR 24 release.  The open question is how exploitable is it, and how much data can leak.  It is a UAF, but the data probably doesn't affect the code flow if it grabs the wrong data when reading - but it writes as well, and we have ASAN stacks showing it writing to memory outside the buffer (or likely freed memory).  It also may allow some private data to escape, but the data would be subject to possibly-reversible transforms.

We may want to at least consider a 28 chemspill, though with 29 going out in a few days it may not make sense.
Attachment #8410314 - Flags: sec-approval?
Attachment #8410314 - Flags: approval-mozilla-esr24?
Attachment #8410314 - Flags: approval-mozilla-beta?
Attachment #8410314 - Flags: approval-mozilla-b2g28?
Attachment #8410314 - Flags: approval-mozilla-b2g26?
Attachment #8410314 - Flags: approval-mozilla-aurora?
This does not look like a chemspill worthy fix.  It's been in the code since FF22 without known exploit, and it's not trivial to construct one.  

We've already gone to build on the FF29 RC which will get testing on the beta audience.  If we did a build 2 for this, we lose that testing and given how big a deal the FF29 release is - we need that testing.  I'm not leaning towards losing it for a bug that has been wontfixed this many times.

If we can hold out, land this in the last weeks of FF30 across all branches - unless there was an exploit that demanded a chemspill based on actual evidence this had been found - we can make sure FF29 gets the chance to be a stable release.
This was discussed with release management. While it is old, it isn't a chemspill because we don't have a zero day. We just built the Firefox 29 release candidate so this is very late to try to take it and would cause a rebuild. 

We want to land this in the middle of the next release cycle, on May 20. This will give it enough bake time without a large window of exposure. I'm marking it for sec-approval but please don't land before then.
Whiteboard: [checkin 5/20/14]
Comment on attachment 8410314 [details] [diff] [review]
Lock before changing RateTransposer

I'm minusing this for ESR24 based on the "disabled" flag there. If this isn't disabled on ESR24, we can revisit.
Attachment #8410314 - Flags: sec-approval?
Attachment #8410314 - Flags: sec-approval+
Attachment #8410314 - Flags: approval-mozilla-esr24?
Attachment #8410314 - Flags: approval-mozilla-esr24-
Ok. I'm trying to see if there's a way to make it happen a lot, and how dangerous that looks.
See also bug 902328 (related); I have an expanded patch cleaning this up more.  I've been unable to find a way to get it to fail easily or locally, which is good. :-)
Comment on attachment 8412182 [details] [diff] [review]
Force it to run test_playback_rate a lot NOT FOR CHECKIN

This test will cycle enough on the testcase to get a hit in around 1/10 runs (and a higher chance of timeout, since we're running near the edge of total time allowed).  A lot better than 1/100 or 1/200 :-)

https://tbpl.mozilla.org/?tree=Try&rev=6f916a996524 has two hits in 21
https://tbpl.mozilla.org/?tree=Try&rev=724d8f12331c with the latest fix patch has no hits in about 40 runs.
Posted patch all.patch (obsolete) — Splinter Review
Updated patch used for verification (cleaned up Internal/Unlocked usage).  Working on Aurora 30 version.
Comment on attachment 8412232 [details] [diff] [review]
all.patch

Re-review of now-tested patch.

Cleans up the locking on the AudioClock functions as well.
Attachment #8412232 - Flags: review?(paul)
Comment on attachment 8412232 [details] [diff] [review]
all.patch

Review of attachment 8412232 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: content/media/AudioStream.cpp
@@ +227,5 @@
>  {
>    NS_ASSERTION(aPlaybackRate > 0.0,
>                 "Can't handle negative or null playbackrate in the AudioStream.");
>    // Avoid instantiating the resampler if we are not changing the playback rate.
> +  // GetPreservesPitch/SetPreservesPitch don't need locking

This comments seems off, since you are explicitly locking in SetPreservesPitch.

@@ +1070,1 @@
>  {

mAudioStream->mMonitor.AssertCurrentThreadOwns();

@@ +1107,1 @@
>  {

mAudioStream->mMonitor.AssertCurrentThreadOwns();
Attachment #8412232 - Flags: review?(paul) → review+
Group: media-core-security
Attachment #8410314 - Flags: approval-mozilla-b2g28?
Attachment #8410314 - Flags: approval-mozilla-b2g26?
Assignee: nobody → rjesup
Comment on attachment 8410314 [details] [diff] [review]
Lock before changing RateTransposer

Approving for landing this week.
Attachment #8410314 - Flags: approval-mozilla-beta?
Attachment #8410314 - Flags: approval-mozilla-beta+
Attachment #8410314 - Flags: approval-mozilla-aurora?
Attachment #8410314 - Flags: approval-mozilla-aurora+
Randell, this is okay to land on inbound now.
Flags: needinfo?(rjesup)
(In reply to Paul Adenot (:padenot) from comment #64)

> @@ +1070,1 @@
> >  {
> 
> mAudioStream->mMonitor.AssertCurrentThreadOwns();
> 
> @@ +1107,1 @@
> >  {
> 
> mAudioStream->mMonitor.AssertCurrentThreadOwns();

a) monitor is not accessible there; I'd need to make it protected
b) these are both followed by a ...Unlocked() call that has a monitor assert as the first statement.  Added a comment instead
Attachment #8410314 - Attachment is obsolete: true
For inbound and aurora
Attachment #8426069 - Attachment is obsolete: true
Attachment #8426069 - Attachment is obsolete: false
Attachment #8426069 - Flags: review+
Flags: needinfo?(rjesup)
Attachment #8426070 - Flags: review+
Attachment #8412232 - Attachment is obsolete: true
Attachment #8426069 - Attachment description: AudioStream rework → AudioStream rework for Beta
Note: needs aurora and beta checkins as well
There's aurora and beta approvals in comment 65, for a version of the patch that was obsoleted.  I'm needinfo'ing Ryan in case that causes it to not show up in his queries.
Flags: needinfo?(ryanvm)
Shouldn't be an issue. I go off the status flags and bug resolution only.
Flags: needinfo?(ryanvm)
https://hg.mozilla.org/mozilla-central/rev/21d19c772468
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
On beta, m1 is toasty, on a test that's been removed from nightly (and probably Aurora): test_audiowrite.html

Assertion failure: lock->locked && pthread_equal(lock->owner, pthread_self()), at /builds/slave/m-beta-l64-d-00000000000000000/build/nsprpub/pr/src/pthreads/ptsynch.c:226
15:10:52     INFO -  Assertion failure: lock->locked && pthread_equal(lock->owner, pthread_self()), at /builds/slave/m-beta-l64-d-00000000000000000/build/nsprpub/pr/src/pthreads/ptsynch.c:226
Flags: needinfo?(rjesup)
On b2g28, there are also crashes in test_autoplay.html
Comment on attachment 8426726 [details] [diff] [review]
Extend fix to AudioData-only function

This is a one-line fix (plus a function rename to match the others -- XxxUnlocked() means you already hold the monitor (ok, it's weird).

This is used by the AudioData API which is gone in later builds.  This just extends the existing fix in this bug to cover this function as well.
Attachment #8426726 - Flags: review?(n.nethercote)
I'd previously noted that it appears this is broken since cubeb replaced sydneyaudio in 22; so esr24 should be affected.  However, it's hard to be 100% sure without going back and testing/examining the code.
Comment on attachment 8426726 [details] [diff] [review]
Extend fix to AudioData-only function

-> padenot; we'll back out of beta for tonight
Attachment #8426726 - Flags: review?(n.nethercote) → review?(paul)
Note: the function patched on beta is dead code on nightly (and almost certainly aurora); I'll followup with a patch to remove it, so there's no need to fix on nightly.
Attachment #8426726 - Flags: review?(paul) → review+
If this can land cleanly to ESR24 we might as well take it.
needinfo? on the remaining branch patch(es)
Flags: needinfo?(rjesup)
I looked at ESR24.  The patch has no chance of applying there; even the monitors we're using don't exist.  Given the large difference and lifetime of ESR24, padenot and I both suggest WONTFIX
Flags: needinfo?(rjesup)
Group: media-core-security
Whiteboard: [adv-main30+][adv-esr24.6+]
I ran the tests from comment 61 many times, but still no crash for me. I've never been able to reproduce this locally, so I'm marking [qa-] for posterity.
Whiteboard: [adv-main30+][adv-esr24.6+] → [adv-main30+][adv-esr24.6+][qa-]
Group: core-security
Duplicate of this bug: 932193
You need to log in before you can comment on or make changes to this bug.