Closed Bug 1394602 Opened 2 years ago Closed 2 years ago

Receiving media before signaling can cause crashes

Categories

(Core :: WebRTC, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- fixed

People

(Reporter: drno, Assigned: drno)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

Attachments

(2 files)

Attached file moz.log
Thanks to a reproducible test case from appear.in I learned that Firefox suffers from the following problem:

- Firefox is in an established call with at least one video track
- The far end wants to start sending a second video track to Firefox
- Accidentally the far end starts sending RTP from that second video track before finishing the signaling with Firefox
- This results in Firefox switching it's first video receiver to switch SSRC's thanks to this code http://searchfox.org/mozilla-central/rev/18c16ebf818abb86805ce08a6e537e4cd826f044/media/webrtc/signaling/src/media-conduit/VideoConduit.cpp#2110 which is suppose to enable switching RTP streams w/o signaling
- After offer/answer concludes when we try to setup the second video receiver with webrtc.org it asserts because we are trying to add a receiver SSRC (the intention here is to make this another second receiver) call.cc knows about this SSRC already and complains that it is not legit to add it a second time http://searchfox.org/mozilla-central/rev/18c16ebf818abb86805ce08a6e537e4cd826f044/media/webrtc/trunk/webrtc/call/call.cc#653

A couple of questions come to my mind here:

- How could the RTP packet with SSRC 3338378591 get through the packet filter in the media pipeline (line 35753 in the attached log), if the signaling announcing the packet only shows up 4000 lines later (line 39612)?
- Besides fixing the packet filter above, do we also need to look into asking the media layer about the currently utilized SSRC's, as the code in VideoConduit.cpp can make the signaling layers assumptions invalid at any given time?
- Or would it be better to face it and not allow switching SSRC's w/o signaling?
Byron, Randell: thoughts on my questions above?
Flags: needinfo?(rjesup)
Flags: needinfo?(docfaraday)
The packet probably made it through the filter due to its PT being unique (if bundle was even being used).

What a mess.
Flags: needinfo?(docfaraday)
If we're going to try to forgive unsignaled SSRC changes, we probably need to re-create all recv streams when the signaling happens.
(In reply to Byron Campen [:bwc] from comment #3)
> If we're going to try to forgive unsignaled SSRC changes, we probably need
> to re-create all recv streams when the signaling happens.

Fippo actually made a good point: this probably a generic "plan b" <-> "unified" interop issue.

The far end in this case is Chrome, which can start sending media right after SLD, because it runs on Plan B. And this triggers then the issue on our side.

The PT's are in fact unique after the initial O/A, because the audio and the video m-section don't share any PTs. And yes Bundle is active.

Yes either tearing down all receivers and restart them. Or try to query the currently active SSRC's from the media and then try to do some magic of moving them around, if needed. But the later one sounds complicated and scary.
Thanks Nils! I figured out a workaround which calls SLD later in Chrome. This avoids sending the media until I have an answer from Firefox. Will hold off deploying for a bit until I have a fiddle or test that reproduces the behaviour.
Flags: needinfo?(rjesup)
Currently we blindly restart the RTP receiver on receiving a new SSRC.

Here is a long term plan how we should change that:

- if the connection doesn't use bundle we will continue to restart receivers on new SSRC
- if the connection does use bundle:
  - if we receive MID's in the RTP from the far end it's save to restart the receiver on new SSRC, as the routing of the RTP packet hopefully happened according to MID
  - if we don't receive MID's in the RTP packets from the far end (Note: that is the case today for 100% of cases as I'm not aware of anyone sending MIDs yet) we will turn of the feature to switch on new SSRC and instead drop the packets to the floor

The proposed change for the last scenario should be save for this particular use case, as the new SSRC actually comes some time later via signaling at which point we will create a receiver for it. But until that signaling has happened we should ignore unknown RTP streams as they are "dangerous". Plus these are coming from clients which have implemented Bundle only half as they are missing the required ("MUST") support for MID.

Now the problem here is that nobody, including us, has full support for MID in RTP yet. So until we have that I guess the only viable option right now, as Byron suggested, is to restart all receivers on every signaling to reset the SSRC state in the webrtc.org call object and it's receivers.
Oh and I forgot to mention that we added the support for switching SSRC's on the fly to support Cisco Spark's way of starting screen share. But as Spark still does not use Bundle, but separate transports per stream they fall into the first category and should therefor not be affected by the proposed solution.
Is this a regression? The subject mentions Firefox crashing, which means we should add tracking flags and look at uplifting a band-aid fix.
Flags: needinfo?(drno)
(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #8)
> Is this a regression? The subject mentions Firefox crashing, which means we
> should add tracking flags and look at uplifting a band-aid fix.

The crash is caused by basically webrtc.org equivalent to our MOZ_CRASH. So it's effectively a warning that we are doing something which we are not suppose to do, but it is not a security issue or anything like that.

So after quite a bit of source code archeology I'm pretty the root cause is bug 1337777 which landed in Fx 53.
And then later got accelerated through bug 1349233, which got uplifted to Fx 54.

Regression, well we did not crash before, because we didn't have this code before. But it is not a feature which once worked, and not stopped working.

I think the plan is that either Byron or me will have to implement the ugly "reset all receivers" on each round of signaling as band-aid, until we have full support for MID as described in comment #6.
Depends on: 1337777, 1349233
Flags: needinfo?(drno)
Ok, so not a sec issue or feature regression, but a tab crash regression in 53 for a rare web site edge-case.
Mass change P1->P2 to align with new Mozilla triage process
Priority: P1 → P2
I thought about this a little bit more and I think maybe we don't even need the band-aid. Instead the plan would be to turn off the "unknown SSRC restarts the receiver" if bundle is in use.

The reasoning is: from what I hear several implementations get hit by this because they actually use bundle. But the reason they hit the assertion is that a RTP packet from a new stream arrives before the signaling sets up the new receiver. Now my assumption is that anything out there which is recent enough to actually implement bundle is not going to use the "lets change SSRCs without signaling" trick. And if one does it - tough luck.

So we would leave the "unknown SSRC restarts the receiver" feature turned on for un-bundled transports. This allows Spark to continue to abuse this feature.
For bundled transport we turn that feature off. Question is what we do with the too early RTP packets? Queue them for some time? I lean more towards throwing them away.

Once we have MID support in the demuxer we can then turn the "unknown SSRC restarts the receiver" back on if MIDs are transmitted to us in the RTP.

Byron: thoughts?
Flags: needinfo?(docfaraday)
I've never particularly liked doing contortions to attempt to support stuff that doesn't actually work, and given that this hack is causing worse problems than it was intended to solve, I am in favor of getting rid of the hack.
Flags: needinfo?(docfaraday)
Comment on attachment 8933869 [details]
Bug 1394602: don't allow SSRC changes with Bundle.

https://reviewboard.mozilla.org/r/204806/#review210598

::: media/webrtc/signaling/src/peerconnection/TransceiverImpl.cpp:866
(Diff revision 1)
>                " Setting remote SSRC " <<
>                mJsepTransceiver->mRecvTrack.GetSsrcs().front());
>      conduit->SetRemoteSSRC(mJsepTransceiver->mRecvTrack.GetSsrcs().front());
>    }
>  
> +  // TODO (bug XXX) once we pay attention to receiving MID's in RTP packets the

Bug #

::: media/webrtc/signaling/src/peerconnection/TransceiverImpl.cpp:866
(Diff revision 1)
> +  // TODO (bug XXX) once we pay attention to receiving MID's in RTP packets the
> +  // media pipeline code should enable allowing SSRC changes again.
> +  // We also need to do that for SDP with MID and w/o SSRC (which is legit
> +  // according to latest spec) *sigh*
> +  conduit->SetAllowSsrcChanges(!mJsepTransceiver->HasBundleLevel());

Hmm. Wonder what interesting race conditions can happen if bundle is renegotiated... maybe avoid enabling this if it was already disabled?
Attachment #8933869 - Flags: review?(docfaraday) → review+
Blocks: 1423041
I'm not aware of a way to remove an existing r+ in mozreview. Since I added some more code to handle the no-SSRC case, which we actually test in a mochitest, can you please re-review Byron?
Flags: needinfo?(docfaraday)
Comment on attachment 8933869 [details]
Bug 1394602: don't allow SSRC changes with Bundle.

https://reviewboard.mozilla.org/r/204806/#review236576

r+ with fixes.

::: media/webrtc/signaling/src/media-conduit/VideoConduit.h:317
(Diff revision 4)
> +  void WaitForInitialSsrc() override {
> +    mWaitingForInitialSsrc = true;
> +  }
> +

Do we really need this function? Can't we just set mWaitingForInitialSsrc to true by default, and unset it if SetRemoteSSRC is called?

::: media/webrtc/signaling/src/media-conduit/VideoConduit.cpp:1974
(Diff revision 4)
> +  if (mAllowSsrcChange || mWaitingForInitialSsrc) {
> +    if (mWaitingForInitialSsrc) {
> +      mWaitingForInitialSsrc = false;
> +    }

Pretty sure we don't return anything if neither mAllowSsrcChange nor mWaitingForInitialSsrc. Let's make this an early return condition instead of wrapping all the following code in an if check.

::: media/webrtc/signaling/src/peerconnection/TransceiverImpl.cpp:865
(Diff revision 4)
>      MOZ_MTLOG(ML_DEBUG, mPCHandle << "[" << mMid << "]: " << __FUNCTION__ <<
>                " Setting remote SSRC " <<
>                mJsepTransceiver->mRecvTrack.GetSsrcs().front());
>      conduit->SetRemoteSSRC(mJsepTransceiver->mRecvTrack.GetSsrcs().front());
> +  } else {
> +    // This is needed for MID w/o SSRC, which is legit *sigh*

To be fair, this is how everyone is supposed to be doing this now, but the required RTP MID support just isn't there.

::: media/webrtc/signaling/src/peerconnection/TransceiverImpl.cpp:874
(Diff revision 4)
> +  // TODO (bug 1423041) once we pay attention to receiving MID's in RTP packets
> +  // (see bug 1405495) the media pipeline code should enable allowing SSRC
> +  // changes again.
> +  if (mJsepTransceiver->HasBundleLevel()) {
> +    conduit->DisableSsrcChanges();
>    }

So if we're using bundle, but the RTP MID extension has been successfully negotiated, we don't need to disable SSRC changes.
Comment on attachment 8933869 [details]
Bug 1394602: don't allow SSRC changes with Bundle.

https://reviewboard.mozilla.org/r/204806/#review236576

> Pretty sure we don't return anything if neither mAllowSsrcChange nor mWaitingForInitialSsrc. Let's make this an early return condition instead of wrapping all the following code in an if check.

Well the real work of the function happens outside of this if condition here:
 if (DeliverPacket(data, len) != kMediaConduitNoError) {

This is getting pretty ugly though. Move the code inside the if condition into a new function?

> So if we're using bundle, but the RTP MID extension has been successfully negotiated, we don't need to disable SSRC changes.

I was planing on relying on the RTP packets to tell us if we see the RTP MID extension, instead of relying on just the signaling. Since we currently don't use the MID for routing the packets our self (yet) I guess this feels a little risky. On the other hand Fx doesn't change SSRCs without signaling them, so that should not cause problems.
See Also: → 1409167
Comment on attachment 8933869 [details]
Bug 1394602: don't allow SSRC changes with Bundle.

https://reviewboard.mozilla.org/r/204806/#review236576

> Well the real work of the function happens outside of this if condition here:
>  if (DeliverPacket(data, len) != kMediaConduitNoError) {
> 
> This is getting pretty ugly though. Move the code inside the if condition into a new function?

Ah, my eyeballs tricked me here.
Comment on attachment 8933869 [details]
Bug 1394602: don't allow SSRC changes with Bundle.

https://reviewboard.mozilla.org/r/204806/#review236576

> I was planing on relying on the RTP packets to tell us if we see the RTP MID extension, instead of relying on just the signaling. Since we currently don't use the MID for routing the packets our self (yet) I guess this feels a little risky. On the other hand Fx doesn't change SSRCs without signaling them, so that should not cause problems.

Hmm. So if we ever actually see an RTP MID, we start allowing the SSRC to change?
Lets wait for the investigation in bug 1449042 before landing this (to avoid adding more trouble to the problems).
Flags: needinfo?(drno)
See Also: → 1427577
Comment on attachment 8933869 [details]
Bug 1394602: don't allow SSRC changes with Bundle.

https://reviewboard.mozilla.org/r/204806/#review237174

::: media/webrtc/signaling/src/media-conduit/VideoConduit.cpp:897
(Diff revisions 4 - 6)
> +  CSFLogDebug(LOGTAG, "%s: SSRC %u (0x%x)", __FUNCTION__, ssrc, ssrc);
> +  mRecvStreamConfig.rtp.remote_ssrc = ssrc;

Any particular reason this was moved?

::: media/webrtc/signaling/src/peerconnection/TransceiverImpl.cpp:870
(Diff revisions 4 - 6)
> +    // TODO (bug 1423041) once we pay attention to receiving MID's in RTP packets
> +    // (see bug 1405495) we could make this depending on the presence of MID in
> +    // the RTP packets instead of relying on the signaling.
> +    if (mJsepTransceiver->HasBundleLevel() &&
> +        !details.GetExt(webrtc::RtpExtension::kMIdUri)) {
> +      conduit->DisableSsrcChanges();
> +    }

We should probably do this even when the receive track is not active.
Ok, I've determined what was going on in bug 1449042, I think it is safe to proceed here.
Flags: needinfo?(docfaraday)
Comment on attachment 8933869 [details]
Bug 1394602: don't allow SSRC changes with Bundle.

https://reviewboard.mozilla.org/r/204806/#review237174

> Any particular reason this was moved?

When searching for the best place to modify |mWaitingForInitialSsrc| in here I noticed that this function actually modifies the |mRecvStreamConfig| even before all the safety checks (make me wonder how many times mRecvStreamConfig actually contains a SSRC which never got applied to the recv stream). That looked wrong to me. So I moved the two things into one common place.
Comment on attachment 8933869 [details]
Bug 1394602: don't allow SSRC changes with Bundle.

https://reviewboard.mozilla.org/r/204806/#review237174

> When searching for the best place to modify |mWaitingForInitialSsrc| in here I noticed that this function actually modifies the |mRecvStreamConfig| even before all the safety checks (make me wonder how many times mRecvStreamConfig actually contains a SSRC which never got applied to the recv stream). That looked wrong to me. So I moved the two things into one common place.

Ok, fair enough.
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/9d5db4dacc58
don't allow SSRC changes with Bundle. r=bwc
https://hg.mozilla.org/mozilla-central/rev/9d5db4dacc58
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Flags: needinfo?(drno)
Is this something we should consider for backport to 60 or can it ride the 61 train to release?
Flags: needinfo?(drno)
Thanks for checking Ryan: as much as I would love Firefox users to benefit from this earlier I think the change is too risky to consider it uplifting.
Flags: needinfo?(drno)
You need to log in before you can comment on or make changes to this bug.