Closed Bug 1874471 Opened 8 months ago Closed 7 months ago

Possible race condition when demuxing video packets of new transceiver

Categories

(Core :: WebRTC: Signaling, defect)

Firefox 120
defect

Tracking

()

RESOLVED FIXED
125 Branch
Tracking Status
firefox125 --- fixed

People

(Reporter: deadbeef, Assigned: bwc)

Details

Attachments

(3 files)

Attached file mozlog_demux_error.txt

Steps to reproduce:

  1. Set up a RTCPeerConnection with a sendonly video transceiver and perform an offer/answer exchange.
  2. Apply a remote description adding a new recvonly video m= section containing an a=ssrc line (MID and RID header extensions not in use, demuxing by SSRC only). In the attached log this is SSRC 2804095052.
  3. At the same time, or sooner, packets from this SSRC are arriving from the remote endpoint.

Actual results:

There appears to be a race condition when setting up the receive stream. From the attached log:

  1. VideoConduit 7f1d6cd24900 (presumably representing the recvonly transceiver) sets up the receive stream as expected for SSRC 2804095052
  2. VideoConduit 7f1d7b51fc00 (presumably the sendonly transceiver) processes a packet from SSRC 2804095052, and due to code that handles the "unknown ssrc (and ssrc-not-signaled case)", adopts this as its new remote SSRC
  3. This unsets 2804095052 from VideoConduit 7f1d6cd24900, which generates a new remote SSRC.
  4. VideoConduit 7f1d6cd24900 now fails to demux packets from 2804095052.

My guess is that, while there's only one video m= section, all packets are routed to it in order to handle unknown/unsignaled SSRCs. This presumably changes when the new remote description is applied containing a new video m= section, but packets may still be queued up in the first VideoConduit, which when processed unset the SSRC of the VideoConduit which is actually meant to be handling that SSRC.

We had the same exact problem in Chrome, which was quite a pain to work around: https://source.chromium.org/chromium/_/webrtc/src.git/+/15e078c574981597c5d6ecc13476f54e667dc568

It looks like it can be avoided by making sure there's never exactly one video m= section, but adding a superfluous m= section seems to be causing other issues which I'm still looking into.

Expected results:

Demuxing by SSRC works as expected.

Just wanted to clarify, in #2, the m= section is sendonly in the remote description, recvonly in the local description.

Byron, any thoughts here?

Flags: needinfo?(docfaraday)

... so this is a scenario that uses bundle, which requires the use of SDP mid, but does not use RTP mid?

At the risk of being labelled a smart-aleck:

Patient: Doctor, it hurts when I do this!
Doctor: Then don't do that.

More seriously, I think that we must be seeing early arrival of that new RTP stream. That causes the filter for the first conduit to see a unique payload type, and learn the SSRC. Then, when we negotiate, the filter for the second conduit also will let that SSRC through. Lastly, because we're doing bundle without the MID RTP extension, we disable the ssrc switchover logic, because if we don't we'll just get random assignment with unsignaled SSRCs (this is what prevents us getting into an infinite back-and-forth ssrc switching).

I suppose we could add a "filter out everything because you're not negotiated to receive" bit on the filters.

Flags: needinfo?(docfaraday)

No, I agree, we should definitely be using the RTP MID extension. Unfortunately, given how things currently work, that's easier said than done...

Ok, I think I'm going to do a couple of things:

  1. Disregard the negotiated payload types of non-receiving m-sections for the purposes of unique payload type filtering.
  2. If a filter knows an ssrc, either from negotiation or learning it from RTP, we disable payload type matching for that filter. This will avoid incorrectly learning a new m-section's ssrc if the RTP arrives early.
Assignee: nobody → docfaraday

Try looks good.

Could you grab a binary from one of those try pushes and see if it addresses the issue?

Flags: needinfo?(deadbeef)

I'm not sure what I'm missing, but I'm having trouble reproducing the issue in an isolated way. I have a certain integration test which reproduces it (at least 10% of the time), but am not able to run that on a custom browser... Would it be ok for me to verify when the next release comes around?

Flags: needinfo?(deadbeef)

I'm a little bit leary of landing a non-verified behavioral change the day before a soft freeze, so maybe this will go into the next nightly. If that helps you substantially, I can try an early uplift.

Pushed by bcampen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d26e5bcff843 Only use receive payload types in MediaPipelineFilter. r=mjf https://hg.mozilla.org/integration/autoland/rev/a7847a74255d Disable payload-type matching for filters that know an ssrc. r=mjf
Status: UNCONFIRMED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 125 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: