Possible race condition when demuxing video packets of new transceiver
Categories
(Core :: WebRTC: Signaling, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox125 | --- | fixed |
People
(Reporter: deadbeef, Assigned: bwc)
Details
Attachments
(3 files)
Steps to reproduce:
- Set up a RTCPeerConnection with a sendonly video transceiver and perform an offer/answer exchange.
- Apply a remote description adding a new recvonly video m= section containing an a=ssrc line (MID and RID header extensions not in use, demuxing by SSRC only). In the attached log this is SSRC 2804095052.
- At the same time, or sooner, packets from this SSRC are arriving from the remote endpoint.
Actual results:
There appears to be a race condition when setting up the receive stream. From the attached log:
- VideoConduit 7f1d6cd24900 (presumably representing the recvonly transceiver) sets up the receive stream as expected for SSRC 2804095052
- VideoConduit 7f1d7b51fc00 (presumably the sendonly transceiver) processes a packet from SSRC 2804095052, and due to code that handles the "unknown ssrc (and ssrc-not-signaled case)", adopts this as its new remote SSRC
- This unsets 2804095052 from VideoConduit 7f1d6cd24900, which generates a new remote SSRC.
- VideoConduit 7f1d6cd24900 now fails to demux packets from 2804095052.
My guess is that, while there's only one video m= section, all packets are routed to it in order to handle unknown/unsignaled SSRCs. This presumably changes when the new remote description is applied containing a new video m= section, but packets may still be queued up in the first VideoConduit, which when processed unset the SSRC of the VideoConduit which is actually meant to be handling that SSRC.
We had the same exact problem in Chrome, which was quite a pain to work around: https://source.chromium.org/chromium/_/webrtc/src.git/+/15e078c574981597c5d6ecc13476f54e667dc568
It looks like it can be avoided by making sure there's never exactly one video m= section, but adding a superfluous m= section seems to be causing other issues which I'm still looking into.
Expected results:
Demuxing by SSRC works as expected.
Reporter | ||
Comment 1•8 months ago
|
||
Just wanted to clarify, in #2, the m= section is sendonly in the remote description, recvonly in the local description.
Assignee | ||
Comment 3•7 months ago
|
||
... so this is a scenario that uses bundle, which requires the use of SDP mid, but does not use RTP mid?
At the risk of being labelled a smart-aleck:
Patient: Doctor, it hurts when I do this!
Doctor: Then don't do that.
More seriously, I think that we must be seeing early arrival of that new RTP stream. That causes the filter for the first conduit to see a unique payload type, and learn the SSRC. Then, when we negotiate, the filter for the second conduit also will let that SSRC through. Lastly, because we're doing bundle without the MID RTP extension, we disable the ssrc switchover logic, because if we don't we'll just get random assignment with unsignaled SSRCs (this is what prevents us getting into an infinite back-and-forth ssrc switching).
I suppose we could add a "filter out everything because you're not negotiated to receive" bit on the filters.
Reporter | ||
Comment 4•7 months ago
|
||
No, I agree, we should definitely be using the RTP MID extension. Unfortunately, given how things currently work, that's easier said than done...
Assignee | ||
Comment 5•7 months ago
|
||
Ok, I think I'm going to do a couple of things:
- Disregard the negotiated payload types of non-receiving m-sections for the purposes of unique payload type filtering.
- If a filter knows an ssrc, either from negotiation or learning it from RTP, we disable payload type matching for that filter. This will avoid incorrectly learning a new m-section's ssrc if the RTP arrives early.
Assignee | ||
Updated•7 months ago
|
Assignee | ||
Comment 6•7 months ago
|
||
Assignee | ||
Comment 7•7 months ago
|
||
Assignee | ||
Comment 8•7 months ago
|
||
Depends on D201766
Assignee | ||
Comment 9•7 months ago
|
||
Try looks good.
Could you grab a binary from one of those try pushes and see if it addresses the issue?
Reporter | ||
Comment 10•7 months ago
|
||
I'm not sure what I'm missing, but I'm having trouble reproducing the issue in an isolated way. I have a certain integration test which reproduces it (at least 10% of the time), but am not able to run that on a custom browser... Would it be ok for me to verify when the next release comes around?
Assignee | ||
Comment 11•7 months ago
|
||
I'm a little bit leary of landing a non-verified behavioral change the day before a soft freeze, so maybe this will go into the next nightly. If that helps you substantially, I can try an early uplift.
Comment 12•7 months ago
|
||
Comment 13•7 months ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/d26e5bcff843
https://hg.mozilla.org/mozilla-central/rev/a7847a74255d
Description
•