Closed
Bug 816994
Opened 12 years ago
Closed 12 years ago
Opus crash due to race [@opus_multistream_decode_native]
Categories
(Core :: Audio/Video, defect)
Tracking
()
VERIFIED
FIXED
mozilla20
People
(Reporter: posidron, Assigned: rillian)
References
Details
(4 keywords, Whiteboard: [adv-main18+][adv-esr17+])
Attachments
(6 files, 3 obsolete files)
4.91 KB,
text/plain
|
Details | |
81.25 KB,
text/plain
|
Details | |
6.16 KB,
audio/ogg
|
Details | |
8.26 KB,
text/plain
|
Details | |
1.84 KB,
patch
|
rillian
:
review+
bajaj
:
approval-mozilla-aurora+
abillings
:
sec-approval+
|
Details | Diff | Splinter Review |
1.85 KB,
patch
|
bajaj
:
approval-mozilla-beta+
akeybl
:
approval-mozilla-esr17+
|
Details | Diff | Splinter Review |
I am guessing that this crash is triggered by a race condition inside the media decoder state machine. To reproduce reload the testcase with cmd+r or f5 a few times. The crashing location will vary on each execution.
Tested with changeset: 114362:c63d5cff18ba
Reporter | ||
Comment 1•12 years ago
|
||
Reporter | ||
Comment 2•12 years ago
|
||
Reporter | ||
Comment 3•12 years ago
|
||
Marked as s-s because I have seen ASan reporting heap-buffer-overflow signatures as well.
Assignee | ||
Comment 4•12 years ago
|
||
I ran the testcase through rogg_crcfix so it fails without needing to patch out the libogg crc verification.
Attachment #687099 -
Attachment is obsolete: true
Comment 6•12 years ago
|
||
I tried to reproduce this last week and failed, so assigning to rillian, who says he has managed to reproduce.
Assignee: nobody → giles
Assignee | ||
Comment 7•12 years ago
|
||
I can reproduce this easily with a normal build, no asan required, but I've not made much progress tracking this down. Here's another backtrace, from a segfault inside the garbage collector, in case that's helpful.
Any ideas how to pursue this?
Comment 8•12 years ago
|
||
Although this could be a more recent regression, for now I'm going to assume it's been there in Opus all along until we know better.
status-firefox-esr10:
--- → unaffected
status-firefox17:
--- → wontfix
status-firefox18:
--- → affected
status-firefox19:
--- → affected
status-firefox20:
--- → affected
status-firefox-esr17:
--- → affected
tracking-firefox20:
--- → +
Assignee | ||
Comment 9•12 years ago
|
||
That will speed up my bisection. Thanks!
Assignee | ||
Comment 10•12 years ago
|
||
Oh, wait. Did you actually verify the problem, or just guess?
Assignee | ||
Comment 11•12 years ago
|
||
I did a besection, just testing a normal build with the crc-fixed testcase, and hitting reload 10-20 times to see if I could get it to segfault. The first bad commit was:
commit d682b69bcedcbbf041e9a2ba67bed073736217da (git)
Author: Alexandros Chronopoulos <achronop@gmail.com>
Date: Thu Aug 9 15:53:23 2012 -0700
Bug 748144 - Support multichannel Opus files. r=rillian
Which looks more helpful than I was expecting.
Assignee | ||
Comment 12•12 years ago
|
||
Another variation of the crash:
-919025920[7fff9c009dc0]: 2e300d0 Decoder playing 960 frames of data to stream for AudioData at 396000
Assertion failure: addr % CellSize == 0, at /home/giles/mozilla/firefox/js/src/gc/Heap.h:819
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff4922d45 in js::gc::Cell::address (this=0x7fff7800ada2)
at /home/giles/mozilla/firefox/js/src/gc/Heap.h:819
819 JS_ASSERT(addr % CellSize == 0);
Missing separate debuginfos, use: debuginfo-install alsa-plugins-pulseaudio-1.0.26-1.fc17.x86_64 gsm-1.0.13-6.fc17.x86_64 json-c-0.9-4.fc17.x86_64
(gdb) bt
#0 0x00007ffff4922d45 in js::gc::Cell::address (this=0x7fff7800ada2)
at /home/giles/mozilla/firefox/js/src/gc/Heap.h:819
#1 0x00007ffff4922daa in js::gc::Cell::arenaHeader (this=0x7fff7800ada2)
at /home/giles/mozilla/firefox/js/src/gc/Heap.h:935
#2 0x00007ffff4924c1c in js::gc::Cell::compartment (this=0x7fff7800ada2)
at /home/giles/mozilla/firefox/js/src/gc/Heap.h:980
#3 0x00007ffff4c6a8dd in CheckMarkedThing<JSString> (trc=0x7fffffff5050,
thing=0x7fff7800ada2)
at /home/giles/mozilla/firefox/js/src/gc/Marking.cpp:106
#4 0x00007ffff4c68da8 in MarkInternal<JSString> (trc=0x7fffffff5050, thingp=
0x7fffcaa17268) at /home/giles/mozilla/firefox/js/src/gc/Marking.cpp:145
#5 0x00007ffff4c68365 in MarkUnbarriered<JSString> (trc=0x7fffffff5050,
thingp=0x7fffcaa17268, name=0x7ffff56e0ce7 "left child")
at /home/giles/mozilla/firefox/js/src/gc/Marking.cpp:175
#6 0x00007ffff4c62b19 in js::gc::MarkStringUnbarriered (trc=0x7fffffff5050,
thingp=0x7fffcaa17268, name=0x7ffff56e0ce7 "left child")
at /home/giles/mozilla/firefox/js/src/gc/Marking.cpp:317
...
Assignee | ||
Comment 13•12 years ago
|
||
Stack pointer seems to be getting corrupted somehow. When it crashes in the opus code, it's pretty commonly here:
614 ptr = (char*)st + align(sizeof(OpusMSDecoder));
0x00007ffff43c68cd <+111>: mov $0x10c,%edi
=> 0x00007ffff43c68d2 <+116>: callq 0x7ffff43c53e8 <align>
and in some instances rsp points to unmapped memory, so the callq segfaults trying to push rip.
Comment 14•12 years ago
|
||
(In reply to Ralph Giles (:rillian) from comment #13)
> Stack pointer seems to be getting corrupted somehow. When it crashes in the
> opus code, it's pretty commonly here:
Okay, that was enough to start making some guesses. media/libopus/src/opus_multistream.c:613 reads
ALLOC(buf, 2*frame_size, opus_val16)
frame_size is documented to have a maximum of 63*2880 in nsOggReader::DecodeOpus() (and this documentation is correct). 2*63*2880*sizeof(opus_val16) (which in this case is float, since we're not using a fixed-point build) is 1,451,520 bytes, or 1.38 MB. Our media thread stacks are 128 kB. So that's a problem.
This is obviously ridiculous: 63*2880 is 3.78 seconds of audio, and the maximum valid Opus packet size is 120 ms (giving a buffer size of 2*5760*4==45 kB here). It will get rejected by libopus later, but the damage to the stack has already been done.
The simple fix is to reject the packet earlier. The best place is before we allocate the buffer for the frame.
Attachment #690665 -
Flags: review?(giles)
Comment 15•12 years ago
|
||
Greenish on try: https://tbpl.mozilla.org/?tree=Try&rev=9840e1d7fab9
Assignee | ||
Comment 16•12 years ago
|
||
Comment on attachment 690665 [details] [diff] [review]
Don't try to decode Opus packets larger than 120 ms
Review of attachment 690665 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks for your help with this, Tim. I'd seen the allocation was large, but didn't think about the thread's stack size being the problem.
Looks good to me, and I can't make it crash with this patch. Some comments below.
::: content/media/ogg/OggReader.cpp
@@ +424,5 @@
> int32_t samples = opus_packet_get_samples_per_frame(aPacket->packet,
> (opus_int32) mOpusState->mRate);
> int32_t frames = frames_number*samples;
>
> + // The maximum valid value is 5760 (120 ms).
I would be be more verbose here, e.g.
// Valid Opus packets must be no more than 120 ms,
// so reject any reporting more than 5760 samples.
@@ +425,5 @@
> (opus_int32) mOpusState->mRate);
> int32_t frames = frames_number*samples;
>
> + // The maximum valid value is 5760 (120 ms).
> + if (frames <= 0 || frames > 5760)
The <= zero check is an important guard for the subsequent allocation, and detects error returns from opus_packet_get_nb_frames, but I suppose we could tighten this to the minimum as well?
The minimum opus frame size is 2.5 ms, or 120 samples, so we could check for (frames < 120) instead.
Currently opus_packet_get_samples_per_frame() will never return a negative, so checking for frames < 0 is sufficient to detect a failure. But, get_nb_frames only returns an error if the buffer is short, in which case the subsequent get_samples_per_frame call could segfault reading the toc byte for the packetS o really we should check right immediately afterward.
Attachment #690665 -
Flags: review?(giles) → review+
Assignee | ||
Comment 17•12 years ago
|
||
Christoph, can you please confirm this fixes your original testcase?
Assignee | ||
Comment 18•12 years ago
|
||
I believe I understand the crash in the opus code now, but I don't understand how this was causing the similar segfaults in js::gc. The jit does fancy things with the stack frames, but I would have thought the context switch in and out of the decode thread would just save and restore the invalid stack pointer without affecting other code.
Is it just that we managed to write into someone else's stack frame with our bogus pointer?
Assignee | ||
Comment 19•12 years ago
|
||
Validated test case file on Fedora Linux 17 x86_64 with official builds:
firefox-19.0a2: crashed
firefox-18.0b3: crashed
firefox-17.0.1: crashed
firefox-16.0.2: can't reproduce
Which is as expected. The call into the code with the stack overflow was added in ff17.
dveditz, why is firefox-17 wontfix? If we're indeed writing to the ionmonkey stack, couldn't this be exploitable?
OS: Mac OS X → All
Reporter | ||
Comment 20•12 years ago
|
||
Have tested again with the applied patch but still crashes.
Assignee | ||
Comment 21•12 years ago
|
||
(In reply to Christoph Diehl [:cdiehl] from comment #20)
> Have tested again with the applied patch but still crashes.
Well, darn. We should still land Tim's fix though; that's a real bug.
Reporter | ||
Comment 22•12 years ago
|
||
I am terrible sorry, I thought doing 'make obj/content/media' was enough.
The patch does fix the problem.
Assignee | ||
Comment 23•12 years ago
|
||
whew! thanks for retesting.
Assignee | ||
Comment 24•12 years ago
|
||
Tim, here's a patch implementing my comments from #16. Let me know if you like it better.
Attachment #691018 -
Flags: review?(tterribe)
Comment 25•12 years ago
|
||
Comment on attachment 691018 [details] [diff] [review]
Reject packets with invalid lengths
Review of attachment 691018 [details] [diff] [review]:
-----------------------------------------------------------------
(In reply to Ralph Giles (:rillian) from comment #18)
> Is it just that we managed to write into someone else's stack frame with our
> bogus pointer?
That seems the most likely to me.
(In reply to Ralph Giles (:rillian) from comment #19)
> dveditz, why is firefox-17 wontfix? If we're indeed writing to the ionmonkey
> stack, couldn't this be exploitable?
I think he just means that this should land on the ESR branch (once we request and get approval).
Anyway, patch LGTM.
::: content/media/ogg/OggReader.cpp
@@ +427,5 @@
> (opus_int32) mOpusState->mRate);
> int32_t frames = frames_number*samples;
>
> + // Valid Opus packets must be no shorter than 2.5 ms
> + // and no longer than 120 ms. Reject any reporting
It seems kind of odd to call out one case when we reject in two different cases.
Attachment #691018 -
Flags: review?(tterribe) → review+
Comment 26•12 years ago
|
||
(In reply to Ralph Giles (:rillian) from comment #18)
> Currently opus_packet_get_samples_per_frame() will never return a negative,
> so checking for frames < 0 is sufficient to detect a failure. But,
> get_nb_frames only returns an error if the buffer is short, in which case
> the subsequent get_samples_per_frame call could segfault reading the toc
> byte for the packetS o really we should check right immediately afterward.
Good catch, BTW. I've known this in the half-dozen times I implemented this function (including GetOpusDeltaGP() in OggCodecState.cpp!), but managed to miss that here.
Assignee | ||
Comment 27•12 years ago
|
||
Comment on attachment 690665 [details] [diff] [review]
Don't try to decode Opus packets larger than 120 ms
Marking obsolete in favour of the newer patch.
Attachment #690665 -
Attachment is obsolete: true
Assignee | ||
Comment 28•12 years ago
|
||
(In reply to Timothy B. Terriberry (:derf) from comment #25)
> > + // Valid Opus packets must be no shorter than 2.5 ms
> > + // and no longer than 120 ms. Reject any reporting
>
> It seems kind of odd to call out one case when we reject in two different
> cases.
I don't understand what you mean here. I tried to describe both bounds checks.
Assignee | ||
Comment 29•12 years ago
|
||
(In reply to Ralph Giles (:rillian) from comment #28)
> I don't understand what you mean here. I tried to describe both bounds
> checks.
Oh, you meant the next line. Sorry. How about:
// Valid Opus packets must be no shorter than 2.5 ms
// and no longer than 120 ms. Reject any reporting
// a duration outside that range in samples.
Or is it clear enough without the second sentence?
Comment 30•12 years ago
|
||
(In reply to Ralph Giles (:rillian) from comment #29)
> Or is it clear enough without the second sentence?
Either way is fine.
Assignee | ||
Comment 31•12 years ago
|
||
Assignee | ||
Comment 32•12 years ago
|
||
Updated patch with improved comments. Carrying forward previous r+.
Attachment #691018 -
Attachment is obsolete: true
Attachment #691399 -
Flags: review+
Assignee | ||
Updated•12 years ago
|
Keywords: checkin-needed
Comment 33•12 years ago
|
||
Before this lands, be sure to ask for sec-approval on the patch and, ideally, I think we'd want branch patches for affected versions depending on the risk involved in taking it.
Assignee | ||
Comment 35•12 years ago
|
||
Comment on attachment 691399 [details] [diff] [review]
Reject packets with invalid lengths
[Security approval request comment]
> How easily can the security issue be deduced from the patch?
The patch doesn't refer to the stack overflow, it just describes rejecting packets with invalid header data. There's also a reference to a potential--but non-exploitable--heap segfault fix which could mislead an attacker.
> Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
We are limiting the return values we use from a fairly simple function, so someone digging deeper could see what part of the data to play with mutating. I don't think that qualifies as a bulls-eye though; the stack overflow happens away from the site of this change.
> Which older supported branches are affected by this flaw?
Supported branches back to Firefox 17 are affected, when this code initially landed.
> If not all supported branches, which bug introduced the flaw?
Bug 748144.
> Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
I can post backports. They're straightforward; the code itself hasn't changed, but the enclosing class and file names have, so the patch must be adapted to that.
> How likely is this patch to cause regressions; how much testing does it need?
Risk of causing other problems is low. The patch simply rejects invalid data earlier in the parsing sequence.
To verify, check that the testcase from this bug does not crash the browser, and that the content/media/test/test_playback.html mochitest still passes.
Attachment #691399 -
Flags: sec-approval?
Updated•12 years ago
|
Attachment #691399 -
Flags: sec-approval? → sec-approval+
Comment 36•12 years ago
|
||
I'm giving this sec-approval+ for central. We should prepare branch patches and nominate them. I'm cc'ing release management so they can decide when they want to take this on what branches.
Assignee | ||
Comment 37•12 years ago
|
||
Comment on attachment 691399 [details] [diff] [review]
Reject packets with invalid lengths
Patch applies unmodified to the Aurora tree.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 748144.
User impact if declined:
Users will be exposed to a security issue: crashes possibly exploitable by malicious web content.
Testing completed (on m-c, etc.):
Passes try push on m-c. Verified patch fixes the crash on aurora.
Risk to taking this patch (and alternatives if risky):
Risk is low. The patch merely rejects invalid data earlier in the parsing sequence.
String or UUID changes made by this patch:
No changes, only interior code is affected.
Attachment #691399 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 38•12 years ago
|
||
Backported to Firefox 18 (current beta tree). This is the same code change, just updated for enclosing class and filename changes.
[Approval Request Comment]
> Bug caused by (feature/regressing bug #):
Bug 748144.
> User impact if declined:
Users will be exposed to a security issue: crashes possibly exploitable by malicious web content.
> Testing completed (on m-c, etc.):
Passes try push on m-c. Verified patch fixes the crash on beta.
> Risk to taking this patch (and alternatives if risky):
Risk is low. The patch merely rejects invalid data earlier in the parsing sequence.
> String or UUID changes made by this patch:
No changes, only interior code is affected.
Attachment #692455 -
Flags: approval-mozilla-beta?
Comment 39•12 years ago
|
||
Approving patch for aurora/beta as it is low risk.Juan, can you please help with some testing here around opus/ogg to verify this patch ?
Updated•12 years ago
|
QA Contact: jbecerra
Updated•12 years ago
|
Attachment #691399 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•12 years ago
|
Attachment #692455 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Updated•12 years ago
|
Updated•12 years ago
|
Assignee | ||
Comment 40•12 years ago
|
||
Thanks bajaj, Juan.
Assignee | ||
Comment 41•12 years ago
|
||
Comment on attachment 692455 [details] [diff] [review]
backport to earlier releases
This bug also affects Firefox 17. The ff18 patch applies to the esr17 branch and I've verified it resolves the issue and that the content/media mochitests are unaffected in my local testing.
[Approval Request Comment]
> If this is not a sec:{high,crit} bug, please state case for ESR consideration:
This is a sec-critical bug.
> User impact if declined:
Users will be exposed to crashes possibly exploitable by malicious web content.
> Fix Landed on Version:
We have approval for m-c, aurora, and beta. Waiting for coordination with release management before landing anywhere.
> Risk to taking this patch (and alternatives if risky):
Risk is low. The patch merely rejects invalid data earlier in the parsing sequence.
> String or UUID changes made by this patch:
No changes, only interior code is affected.
See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #692455 -
Flags: approval-mozilla-esr17?
Comment 42•12 years ago
|
||
Comment on attachment 692455 [details] [diff] [review]
backport to earlier releases
Feel free to land to all branches given your testing. QA will continue to test opus/ogg in parallel to beta 5, but we don't need to block the landing on this testing.
Attachment #692455 -
Flags: approval-mozilla-esr17? → approval-mozilla-esr17+
Comment 43•12 years ago
|
||
I've spoken with Alex and Ralph about this bug and the feeling is that the risk to uplifting this to ESR is low. My suggestion is that QA coordinates exploratory testing around Opus in Beta5. If we find any issues, we can always backout of ESR and Beta at that time. However that remains a very low risk.
Comment 44•12 years ago
|
||
Taking QA Contact so I can coordinate some Opus testing.
QA Contact: jbecerra → anthony.s.hughes
Reporter | ||
Comment 45•12 years ago
|
||
Btw.
It runs and will run also as a fuzzing job on my private and our future cloud servers.
Comment 47•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/746f31757355
Once this hits m-c, it can be uplifted to the release branches.
Flags: in-testsuite?
Keywords: checkin-needed
Assignee | ||
Comment 48•12 years ago
|
||
Since the crash isn't reliable I don't think I can make a useful testsuite entry for it. The steps to reproduce in the description are the best way to verify the bug.
Flags: in-testsuite? → in-testsuite-
Comment 49•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Comment 50•12 years ago
|
||
Looks like this got approval for Beta but didn't land in time. Can we make sure it gets landed for Beta 6?
Assignee | ||
Comment 51•12 years ago
|
||
I think we were waiting for inbound to cycle into nightly. What's the cut-off date for Beta 6?
Assignee | ||
Comment 52•12 years ago
|
||
After consultation with bajaj, the plan is to review this for beta6 in tomorrow's triage, and then land or not based on the decision there.
I'll go ahead and push to aurora.
Assignee | ||
Comment 53•12 years ago
|
||
Updated•12 years ago
|
Comment 54•12 years ago
|
||
Dropping qawanted since there's nothing for us to investigate. Will continue to track for verification.
Keywords: qawanted
Assignee | ||
Comment 55•12 years ago
|
||
Assignee | ||
Comment 56•12 years ago
|
||
tracking-firefox18:
+ → ---
tracking-firefox19:
+ → ---
tracking-firefox-esr17:
? → ---
Target Milestone: mozilla20 → ---
Assignee | ||
Comment 57•12 years ago
|
||
I...have no idea how I messed up the flags on this bug. I thought I just set status-firefox-esr17 to fixed.
I've put back what I can, but I don't seem to have an option to set the tracking flags for ff18 and ff19 to '+'.
tracking-firefox18:
--- → ?
tracking-firefox19:
--- → ?
tracking-firefox-esr17:
--- → ?
Target Milestone: --- → mozilla20
Updated•12 years ago
|
Comment 58•12 years ago
|
||
(In reply to Christoph Diehl [:cdiehl] from comment #0)
> Created attachment 687099 [details]
> testcase
Testcase failed to crash the following builds:
* Firefox Nightly 20.0a1 2012-12-21
* Firefox Aurora 19.0a2 2012-12-21
* Firefox 17.0.1esrpre 2012-12-21
Will need to wait for 18.0b6 builds next week to verify for Fx18.
Status: RESOLVED → VERIFIED
status-b2g18:
--- → fixed
Updated•12 years ago
|
Whiteboard: [adv-main18+][adv-esr17+]
Updated•12 years ago
|
Group: core-security
You need to log in
before you can comment on or make changes to this bug.
Description
•