Closed Bug 927579 Opened 7 years ago Closed 7 years ago

crash [@ mlp_process] [@ tansig_approx]

Categories

(Core :: Audio/Video, defect)

x86_64
macOS
defect
Not set
critical

Tracking


RESOLVED FIXED
mozilla28
Tracking Status
firefox27 --- wontfix
firefox28 --- fixed
firefox-esr24 --- unaffected
b2g-v1.1hd --- unaffected
b2g-v1.2 --- wontfix
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- unaffected

People

(Reporter: jruderman, Assigned: padenot)

References

Details

(Keywords: crash, sec-low, testcase, Whiteboard: [adv-main28+])

Attachments

(3 files)

No description provided.
Group: core-security
Attached file stack (gdb)
Attached file stack (ASan)
Passing this to MediaRecorder, libopus.
Please return to Web Audio if there is a problem with the output from the MediaStreamDestinationNode.
Component: Web Audio → Video/Audio
Rob -- As Karl says, this looks to be a bug in MediaRecorder (not Web Audio).  Since this is a sec bug, can you have a look and either take this bug yourself or find the right owner?  (Feel free to pass it back if the root cause problem is with Web Audio.)  Thanks.
Assignee: nobody → roc
Summary: WebAudio crash [@ mlp_process] [@ tansig_approx] → crash [@ mlp_process] [@ tansig_approx]
This also reproduces on Linux, at the same crash point.
Assignee: roc → rlin
I checked the AudioChunk and found that if ac.createDelay() is called with a non-zero parameter,
NotifyQueuedTrackChanges in the encoder gets a chunk with duration = 2.
createDelay(0) doesn't show this.
Is this behavior normal?
Hi Karlt, 
I checked the AudioChunks from the media stream and found that all chunks have duration = 2 and mChannelData size = 2 at 44100 Hz. Is that normal? If so, we may need somebody to fix the crash in libopus.
Flags: needinfo?(karlt)
Hi Timothy, 
Could you help check this libopus crash?
#0  0x00007ffff1f36aab in tansig_approx (x=-nan(0x400000))
    at /media/randy-lin/aa890c59-befb-4bc6-9138-def8ebeea61c/gecko_c2/media/libopus/src/mlp.c:81
#1  0x00007ffff1f36c0f in mlp_process (m=0x7ffff5244610 <net>, in=0x7fffbb39dd60, out=0x7fffbb39dc50)
    at /media/randy-lin/aa890c59-befb-4bc6-9138-def8ebeea61c/gecko_c2/media/libopus/src/mlp.c:100
#2  0x00007ffff1f35bd2 in tonality_analysis (tonal=0x7fffb74d6ff8, info_out=0x0, celt_mode=0x7ffff5244460 <mode48000_960_120>, 
    x=0x7fffbb39dfc0, len=480, offset=480, C=2, lsb_depth=24, downmix=0x7ffff1f1ce81 <downmix_float>)
Flags: needinfo?(tterribe)
Backing out bug 922247 (When encoding to Opus, resample the input to 48kHz if its sample rate is not suitable) makes it work well...
Flags: needinfo?(karlt)
Paul, can you look at this given the patch for bug 922247 is implicated?  Thanks
Flags: needinfo?(paul)
I am looking at it as we speak.
Assignee: rlin → paul
Flags: needinfo?(paul)
The problem seems to be that NaNs were being fed to the opus encoder, eventually causing a crash in tansig_approx(). The bug is now fixed in git. See: http://git.xiph.org/?p=opus.git;a=commitdiff;h=d6b5679

The crash was due to an out-of-bound read, so it's unlikely to be exploitable for anything worse than a crash. Also, it's probably a good idea to understand (and ideally fix) why the encoder was being fed NaNs in the first place.
(In reply to Jean-Marc Valin (:jmspeex) from comment #13)
> The problem seems to be that NaNs were being fed to the opus encoder,
> eventually causing a crash in tansig_approx(). The bug is now fixed in git.
> See: http://git.xiph.org/?p=opus.git;a=commitdiff;h=d6b5679

Paul, do you want to make a patch back-porting this fix and assign me as the reviewer?

> The crash was due to an out-of-bound read, so it's unlikely to be
> exploitable for anything worse than a crash. Also, it's probably a good idea
> to understand (and ideally fix) why the encoder was being fed NaNs in the
> first place.

I assume Paul's still looking at that, but yeah, libopus shouldn't crash if it's fed NaNs, either.
Flags: needinfo?(tterribe)
(In reply to Randy Lin [:rlin] from comment #7)
> I checked the AudioChunks from the media stream and found that all chunks
> have duration = 2 and mChannelData size = 2 at 44100 Hz. Is that normal? If
> so, we may need somebody to fix the crash in libopus.

If an AudioChunk directly from the DelayNodeEngine has duration = 2, then there is a bug there, but I assume this duration is from the resampler in the encoder.  Other streams may have duration = 2, I assume.
(In reply to Timothy B. Terriberry (:derf) from comment #14)
> (In reply to Jean-Marc Valin (:jmspeex) from comment #13)
> > The problem seems to be that NaNs were being fed to the opus encoder,
> > eventually causing a crash in tansig_approx(). The bug is now fixed in git.
> > See: http://git.xiph.org/?p=opus.git;a=commitdiff;h=d6b5679
> 
> Paul, do you want to make a patch back-porting this fix and assign me as the
> reviewer?

Yes, I'll do that.
 
> > The crash was due to an out-of-bound read, so it's unlikely to be
> > exploitable for anything worse than a crash. Also, it's probably a good idea
> > to understand (and ideally fix) why the encoder was being fed NaNs in the
> > first place.
> 
> I assume Paul's still looking at that, but yeah, libopus shouldn't crash if
> it's fed NaNs, either.

Of course, there is clearly a bug on the Gecko side here.
(In reply to Karl Tomlinson (:karlt) from comment #15)
> (In reply to Randy Lin [:rlin] from comment #7)
> > I checked the AudioChunks from the media stream and found that all chunks
> > have duration = 2 and mChannelData size = 2 at 44100 Hz. Is that normal? If
> > so, we may need somebody to fix the crash in libopus.
> 
> If an AudioChunk directly from the DelayNodeEngine has duration = 2, then
> there is a bug there, but I assume this duration is from the resampler in
> the encoder.  Other streams may have duration = 2, I assume.

An AudioChunk with duration = 2 is observed at the callback called directly from MediaStreamGraph. The Opus encoder queues up chunks and feeds them to libopus at once, so a duration of 2 shouldn't be a problem in theory; I'm guessing libopus doesn't like the data being fed to it.
(gdb) p mSourceSegment->mChunks.Length()
$37 = 22

(gdb) p iter
$31 = {
  mSegment = @0x7fffbf486bb0, 
  mIndex = 5
}

(gdb) p *(float*)((SharedBuffer*)mSourceSegment->mChunks[5].mBuffer)->Data()@128
$33 = {0 <repeats 127 times>, -nan(0x400000)}

(gdb) p *(float*)((SharedBuffer*)mSourceSegment->mChunks[6].mBuffer)->Data()@128
$38 = {-nan(0x400000) <repeats 128 times>}

(gdb) p *(float*)((SharedBuffer*)mSourceSegment->mChunks[7].mBuffer)->Data()@128
$39 = {-nan(0x400000) <repeats 128 times>}

and so on until mChunks[21].

It seems like we are getting NaNs from the graph somehow.
Setting sec-low as it sounds like this may just be a harmless crash.
Keywords: sec-low
Some news: this seems completely unrelated to MediaRecorder, and is pretty silent, in fact. It seems related to cycles and DelayNode, but I can't repro with the revision of the initial cycle handling. I've started bisecting; I'll have more info tomorrow.

Basically, we read some trash memory somewhere (I suspect the DelayNode, but I'm not sure), put it into an AudioChunk, and it walks its way through the graph and eventually reaches the MediaRecorder, where it crashes.

When the Opus patch lands, this will be non-security sensitive, I think. Worst case, we output some trash audio.
Paul told me he found the underlying bug here. But I guess he didn't get around to attaching a patch.
Found a similar crash when running this bug's testcase:
Bug 935774 - "Assertion failure: meta" in mozilla::MediaEncoder::GetEncodedData
try: https://tbpl.mozilla.org/?tree=Try&rev=13cbbbc439a4
==>
https://tbpl.mozilla.org/php/getParsedLog.php?id=30500102&tree=Try

16:10:29     INFO -   0  XUL!mlp_process [mlp.c:13cbbbc439a4 : 81 + 0xa]
16:10:29     INFO -      rbx = 0x00000001042b7978   r12 = 0x0000000115d4fdb0
16:10:29     INFO -      r13 = 0x00000001042b7900   r14 = 0x0000000000000019
16:10:29     INFO -      r15 = 0x0000000000000000   rip = 0x000000010336b12c
16:10:29     INFO -      rsp = 0x0000000115d4d5a0   rbp = 0x0000000115d4d780
16:10:29     INFO -      Found by: given as instruction pointer in context
16:10:29     INFO -   1  XUL!tonality_analysis [analysis.c:13cbbbc439a4 : 480 + 0x4]
16:10:29     INFO -      rbx = 0x0000000106fdb3f8   r12 = 0x0000000000000012
16:10:29     INFO -      r13 = 0x0000000115d4df10   r14 = 0x0000000000000014
16:10:29     INFO -      r15 = 0x0000000000000000   rip = 0x000000010336a8be
16:10:29     INFO -      rsp = 0x0000000115d4d790   rbp = 0x0000000115d4ff10
16:10:29     INFO -      Found by: call frame info
16:10:29     INFO -   2  XUL!run_analysis [analysis.c:13cbbbc439a4 : 627 + 0x3c]
This was fixed in Opus, but it seems like the fix hasn't gone into FF yet:
http://git.xiph.org/?p=opus.git;a=commitdiff;h=d6b5679
When can we merge this patch into FF?
iirc, this was a problem related to cycles and DelayNode, and we were grabbing the wrong input chunks at the stream level. I'll look into it again, but when I told roc I found the cause, I was wrong; the actual cause was deeper.
Also, this seems to happen because the graph in the testcase does not contain a node capable of producing audio.
(In reply to Paul Adenot (:padenot) from comment #26)
> Also, this seems to happen because the graph in the testcase does not
> contain a node capable of producing audio.

Can you still post the backport patch from comment 14?
I tried including the fix from comment 14 for
Bug 935774 - Assertion failure: meta in mozilla::MediaEncoder::GetEncodedData.

It avoids the failure in the test_mediarecorder_record_crash_audiocontext.html test case; result: https://tbpl.mozilla.org/?tree=Try&rev=3d6a880c5b47
See Also: → 945618
Is this fixed by bug 944538 which also fixed bug 945618?
Yep.
So can we close this? If not, what needs to be done to resolve it?

Things unclear to me: Tim asked about backporting the tansig fix to ff27, but bug 945618 etc. mark that version as wontfix.

Is there a bug for not feeding NaNs to the encoder in the first place?
Now there is: bug 962548.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Depends on: 944538
Target Milestone: --- → mozilla28
Flags: in-testsuite?
What versions of Firefox were affected by this?
Flags: needinfo?(paul)
This started to happen on 26, I think, since bug 842243 landed.
Flags: needinfo?(paul)
B2G RelMan isn't approving sec-lows for v1.1/v1.2 uplift.
Whiteboard: [adv-main28+]
Group: core-security