Closed
Bug 1273652
Opened 8 years ago
Closed 8 years ago
No video rendering after renegotiating recvonly video (e.g. on meet.jit.si)
Categories
(Core :: WebRTC, defect, P1)
Core
WebRTC
Tracking
()
RESOLVED
FIXED
mozilla50
People
(Reporter: drno, Assigned: jesup)
References
Details
Attachments
(8 files)
3.87 MB,
application/x-gzip
|
Details | |
813.22 KB,
text/html
|
Details | |
1.91 MB,
application/x-bzip2
|
Details | |
474.09 KB,
application/x-bzip2
|
Details | |
191.38 KB,
application/x-bzip2
|
Details | |
3.02 KB,
text/html
|
Details | |
1.32 KB,
patch
|
pkerr
:
review+
gchang
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
58 bytes,
text/x-review-board-request
|
jesup
:
review+
gchang
:
approval-mozilla-aurora+
|
Details |
Occasionally when being in >= 3 multi-party call on meet.jit.si one of the video streams gets rendered as just as black. - this affects all version of Firefox (up to release) - I think I have not seen this for a two party call - it can affect the main video rendering as well as one of the preview renderer - it can happen right when joining a running conference - sometimes I think it happens also for video which has been rendered successfully I'm attaching a NSPR & webrtc log from a time where I had left a 3 way call and rejoined the call and got black video for one of the two other participants. Because of the SSRC re-writting done by the SFU I'm wondering if this might be related to Simulcast support?! Marking this P1 until we have an idea what is causing this.
Reporter | ||
Comment 1•8 years ago
|
||
I doubled checked already that the SDP we see in Firefox on about:webrtc for these calls is actually the SDP we got over the wire (oppose to Firefox re-creating a strange looking SDP from a confused internal state). What I'm referring to is this: ... m=video 1 RTP/SAVPF 100 a=ssrc:1019964314 cname:6f53568c-1802-4029-b63d-b71790779aec a=ssrc:1083845631 cname:1c04ce18-1ef5-43d1-8a1d-b977949ed95c a=ssrc:4016385049 cname:1c04ce18-1ef5-43d1-8a1d-b977949ed95c … m=video 1 RTP/SAVPF 100 a=ssrc:3232102231 cname:1c04ce18-1ef5-43d1-8a1d-b977949ed95c … m=video 1 RTP/SAVPF 100 a=ssrc:3969758265 cname:mixed a=ssrc:3969758265 label:mixedlabelvideo0 a=ssrc:3969758265 mslabel:mixedmslabel …
Reporter | ||
Comment 2•8 years ago
|
||
The patch from bug 1213773 might potentially fix this issue. Need to verify.
See Also: → 1213773
Comment 3•8 years ago
|
||
We investigated the situation and these additional SSRCs are to be expected. They are recvonly SSRCs from the remote endpoints that we decide to allocate to existing m-lines in order to keep the m-line number low. On a side note, we’re able to reproduce the non-rendering issue even in a 1-1 call, albeit very rarely. Hopefully bug 1213773 fixes this.
Assignee | ||
Comment 4•8 years ago
|
||
George - can you test if this is fixed in Today's (5/25) Nightly? the patch first was in Nightly today. Thanks! (Note: today's nightly has a bug where the AEC is only on for the first use of getUserMedia, so use headsets. Hopefully the fix for that will be in tomorrow's nightly)
Flags: needinfo?(gp)
Comment 5•8 years ago
|
||
It seems I'm no longer able to "reproduce" the issue. I put reproduce in quotes because we never a reliable way to reproduce it, it's random. I'll give it a few more shots to make sure I wasn't just being lucky.
Reporter | ||
Comment 6•8 years ago
|
||
I "reproduced" it just now with two Nightly's on Linux and Mac. So it is not fixed :-( I took me a couple of attempts. Now I have 1:1 call, but that call has two m=audio and two m=video lines in the SDP. Maybe the black video gets triggered only if multiple m-lines are present? That would at least explains why it is harder to repro on 1:1, but easier with on a 3-way call. Another observation from this 1:1 call: in the "RTP Stats" section of about:webrtc I see: - outbound_rt[c]p_audio_0 - outbound_rt[c]p_video_1 this makes sense as the SDP says sendrecv on one audio and one video m-line. But then on the receiving side I see this: - inbound_rt[c]p_audio_0 <- which appears dead; SSRC 0; no packets - inbound_rt[c]p_video_1 <- which also appears to be dead; SSRC 0; no packets - inbound_rt[c]p_audio_3 - inbound_rt[c]p_video_4 Why is number 2 missing from that list? Interestingly the amount of SSRC's in the remote SDP is 5 (which matches the 0-4 range).
Reporter | ||
Comment 7•8 years ago
|
||
Apparently even a fresh 1:1 call results in two audio plus two video m lines. A new theory from my side: does the black video get triggered when you re-use a room (at least in the 1:1 scenario)? I think I have not seen the black video when I used a room/URL for the first time.
Comment 8•8 years ago
|
||
Now I'm able to repro it too :( > But then on the receiving side I see this: The dead channels are expected to be dead. We pre-allocate 2 SSRCs for our server (1 for audio and 1 for video), but we typically don't send anything there. > Why is number 2 missing from that list? I think number 2 is missing because it's for SCTP, not RTP.
Comment 9•8 years ago
|
||
> I think I have not seen the black video when I used a room/URL for the first time.
I'm able to repro it even when I use a room for the first time :(
Comment 10•8 years ago
|
||
Comment 11•8 years ago
|
||
^ comes from a 1-1 call between FF46<->current Nightly. Both participants joined just once.
Flags: needinfo?(gp)
Comment 12•8 years ago
|
||
A little more information about the captured about:webrtc page. It's coming from the endpoint that renders black video. The SSRC that fails to be rendered is 1520089304. You can check that FF is getting data by looking at inbound_rtp_video_4: Decoder: Avg. bitrate: 1.77 Mbps (0.38 SD) Avg. framerate: 29.21 fps (1.15 SD) Local: 13:04:32 GMT-0500 (CDT) inboundrtp SSRC: 1520089304 Received: 243081 packets (260614.48 Kb) Lost: 331 Jitter: 1.35 Remote: 13:04:33 GMT-0500 (CDT) outboundrtp SSRC: 1520089304 Sent: 243041 packets (255825.53 Kb)
Reporter | ||
Comment 13•8 years ago
|
||
Yeah the missing number 2 in the RTP stats happens when the m=application is the middle of the m-sections in the SDP, rather then the end. I also verified via PCAP that packets are in fact still arrive with the expected SSRCs in case of the black rendering.
Comment 14•8 years ago
|
||
One thing that I notice between the first two calls in that attachment is that there's an extra a=ssrc line in a remote video m-section in the first one (groom2, which I assume failed?), but not the previous one (groom1): groom2 (failed?) a=ssrc:619954680 cname:mixed a=ssrc:619954680 label:mixedlabelvideo0 a=ssrc:619954680 mslabel:mixedmslabel a=ssrc:2033112203 cname:a0be499b-8416-44a4-b7d4-1144fbfaa4bd groom1 (succeeded?) a=ssrc:1513371698 cname:mixed a=ssrc:1513371698 label:mixedlabelvideo0 a=ssrc:1513371698 mslabel:mixedmslabel Also, in groom1 (the succeeded case?) not all of the m-sections are bundled, whereas in groom2 (the failed case?) they are. This smells like a bundle demux problem to me.
Comment 15•8 years ago
|
||
(In reply to George Politis [:gp] from comment #12) > A little more information about the captured about:webrtc page. It's coming > from the endpoint that renders black video. The SSRC that fails to be > rendered is 1520089304. You can check that FF is getting data by looking at > inbound_rtp_video_4: > > Decoder: Avg. bitrate: 1.77 Mbps (0.38 SD) Avg. framerate: 29.21 fps (1.15 > SD) > > Local: 13:04:32 GMT-0500 (CDT) inboundrtp SSRC: 1520089304 Received: 243081 > packets (260614.48 Kb) Lost: 331 Jitter: 1.35 > > Remote: 13:04:33 GMT-0500 (CDT) outboundrtp SSRC: 1520089304 Sent: 243041 > packets (255825.53 Kb) That's kinda weird. Because neither of the two ssrcs that are in the previous video m-section show up in the stats. What is that m-section for?
Comment 16•8 years ago
|
||
> That's kinda weird. Because neither of the two ssrcs that are in the previous video m-section show up in the stats. What is that m-section for?
Those are the pre-allocate video SSRCs for our server. We typically don't send anything there.
Comment 17•8 years ago
|
||
(In reply to Byron Campen [:bwc] (PTO May 26, PTO Jun 6-10) from comment #14) > One thing that I notice between the first two calls in that attachment is > that there's an extra a=ssrc line in a remote video m-section in the first > one (groom2, which I assume failed?), but not the previous one (groom1): > > groom2 (failed?) > a=ssrc:619954680 cname:mixed > a=ssrc:619954680 label:mixedlabelvideo0 > a=ssrc:619954680 mslabel:mixedmslabel > a=ssrc:2033112203 cname:a0be499b-8416-44a4-b7d4-1144fbfaa4bd > > groom1 (succeeded?) > a=ssrc:1513371698 cname:mixed > a=ssrc:1513371698 label:mixedlabelvideo0 > a=ssrc:1513371698 mslabel:mixedmslabel > > Also, in groom1 (the succeeded case?) not all of the m-sections are bundled, > whereas in groom2 (the failed case?) they are. This smells like a bundle > demux problem to me. This additional SSRC is to be expected. It's a recvonly SSRCs from the remote endpoint that we decide to allocate to existing m-lines in order to keep the m-line number low. I don't remember what the scenario was for groom1 or if it succeeded, I should probably have taken a note about that and I should also have clarified that the failed captured session was groom2, the very top one.
Reporter | ||
Comment 18•8 years ago
|
||
(In reply to Byron Campen [:bwc] (PTO May 26, PTO Jun 6-10) from comment #14) > Also, in groom1 (the succeeded case?) not all of the m-sections are bundled, > whereas in groom2 (the failed case?) they are. This smells like a bundle > demux problem to me. Interesting observation. But in all the logs I captured everything appears to be bundled correctly.
Reporter | ||
Updated•8 years ago
|
Assignee: nobody → drno
Rank: 10
Summary: Black video rendering in multiparty calls on meet.jit.si → Black video rendering in calls on meet.jit.si
Reporter | ||
Comment 19•8 years ago
|
||
Sample NSPR log file from a call with black video rendering
Reporter | ||
Comment 20•8 years ago
|
||
Sample NSPR log file from a working call
Reporter | ||
Comment 21•8 years ago
|
||
So after the two log files in attachment 8761070 [details] and attachment 8761071 [details] my current educated guess is that the Jitsi bridge fails in certain call scenarios to send/forward RTCP between the sender and the receiver. Note: all my attempts to debug this have shown that for unknown reasons this problem appears to be timing sensitive. E.g. with a non-optimized, debug build of Firefox I was not able to reproduce the problem today at all. But after compiling the exact same code base with optimization and no debug it became easy to reproduce. My guess is that debug vs non-debug somehow impacts the speed with which re-negotiations are being taken care of and that might change something on the bridge side. If you execute the following command: grep "received RTCP" nspr.log.good-with-rtcp | cut -d '|' -f 2 | sort | uniq -c you get something like this: 3 Receive audio[467ea7df-437c-44f2-adf5-f856c7fd84bf] received RTCP packet. 1 Receive audio[5c45c565-e7be-441e-842c-747337ee9788] received RTCP packet. 5 Receive video[833496da-85f1-4a88-a1ba-d9bad271d27b] received RTCP packet. 36 Receive video[f8939cf5-b735-4873-814b-643aef2e1ae9] received RTCP packet. 142 Transmit audio[{77f7d662-db59-334e-9a9f-a2eb86fa327d}] received RTCP packet. 142 Transmit video[{cd24f8a8-31d4-4f4a-a847-6ef57ebdc00e}] received RTCP packet. But if you execute this: grep "received RTCP" nspr.log.bad-with-rtcp | cut -d '|' -f 2 | sort | uniq -c you something like this: 19 Receive audio[5c45c565-e7be-441e-842c-747337ee9788] received RTCP packet. 559 Transmit audio[{80505a4a-d359-2341-945d-d270f3f708cf}] received RTCP packet. 559 Transmit video[{911b4eb7-f24c-ab41-a7e0-b86b7c36384b}] received RTCP packet. The important part here is that "Receive video[...]" is missing, which indicates that FF MediaPipelines never forwarded any RTCP on its video RTP receiver to the webrtc.org code. Now this could also indicate a bug in FF MediaPipelines, especially the filter which lets RTP and RTCP pass for each pipeline based on the SSRC's from the SDP. But the log file clearly shows that the "Receive video[..]" pipeline does actually receive video RTP packets, which means the SSRC filter seems to be set properly for that pipeline (plus I manually tracked in the log file the SSRC's getting updated through renegotiation). As the RTCP part of the MediaPipeline shares the same filter as the RTP part I'm pretty sure the filter is setup properly for RTCP part. And the fact that "Receive audio[...]" still shows up in the table above means that we are getting at least RTCP for the audio receiver. Now this only indicates that FF is not receiving RTCP sender reports from the sender side, which if I understand it correctly is less important for working video, then RTCP receiver reports. The log file shows even in the bad call scenario FF sending out receiver reports regularly. But if this is really a problem in the bridge I assume it is very likely that it affects RTCP sender and receiver reports for a given video stream. I guess the remaining step which we could do is check in a PCAP of a bad call for the RTCP traffic and the SSRC's on these reports.
Reporter | ||
Comment 22•8 years ago
|
||
Another try with NSPR logs + pcap show that FF actually sends sender reports on the recvonly video m-section... hmmm I guess I keep digging.
Comment 23•8 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #21) > I guess the remaining step which we could do is check in a PCAP of a bad > call for the RTCP traffic and the SSRC's on these reports. Hi Nils, the bridge is sending sender reports for the SSRCs that it forwards, which include report blocks for the SSRCs that it receives. I run a quick test and the RTCP seemed fine while video was not rendering. If you want to run additional tests, there's a way to have the bridge dump the decrypted traffic.
Reporter | ||
Comment 24•8 years ago
|
||
Looking at some log files and pcap even more I found that in a 1:1 call the Firefox which renders black initially sends RTCP receiver and sender reports. But then it stops sending receiver reports and only send sender reports, for audio and video. My current guess is that one of the renegotiations triggers this behavior. Further the audio side only sending sender reports probably does not affect the audio, because the RTCP feedback is not essential. But in case of video looking at the frequency of the sender reports it actually looks more like these are requests to submit a missing I-frame. I guess these I-Frame requests inside of sender reports get ignore on the sender side and therefore the sender never submits a new I-frame, and the receiver stays black because it can't rendering only with p-frames. I'll continue to add more logging to figure out why the receiver switches to sending sender reports only on all channels.
Reporter | ||
Comment 25•8 years ago
|
||
I located the code which is probably causing the sending instead of just receiving behavior. But that code got added pretty recently in 49. Additionally I see also more confusion in my logs regarding which webrtc.org engines are suppose to be used for video decoding.
Reporter | ||
Comment 26•8 years ago
|
||
I'm finally able to repro the problem locally with a mochitest. It appears we have a problem when reconfiguring a recvonly video codec (which gets triggered through renegotiation - and meet.jit.si sometimes does two rounds and sometimes three rounds of offer-answer (where the third offer is logically identical to the second offer from before)).
Reporter | ||
Comment 27•8 years ago
|
||
NSPR and Webrtc log from a failed mochitest run.
Reporter | ||
Comment 28•8 years ago
|
||
Modified version of a test case which for me reproduces the issue after renegotiating.
Assignee | ||
Comment 29•8 years ago
|
||
Attachment #8770176 -
Flags: review?(pkerr)
Assignee | ||
Updated•8 years ago
|
Assignee: drno → rjesup
Status: NEW → ASSIGNED
Updated•8 years ago
|
Attachment #8770176 -
Flags: review?(pkerr) → review+
Reporter | ||
Comment 30•8 years ago
|
||
I manually did over 20 calls between my Mac and Linux machine with the patch applied on both sides and the calls were rock solid (where before I would have gotten black video with 5 calls max). I'm still working on writing a mochitest to prevent regressions of this in the future.
Reporter | ||
Updated•8 years ago
|
Summary: Black video rendering in calls on meet.jit.si → No video rendering after renegotiating recvonly video (e.g. on meet.jit.si)
Comment 32•8 years ago
|
||
Pushed by rjesup@wgate.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/381f21c8b4c0 Always reinitialize the receiver/jitterbuffer when reseting video decoding r=pkerr
Reporter | ||
Comment 33•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/63800/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/63800/
Attachment #8770335 -
Flags: review?(rjesup)
Reporter | ||
Comment 34•8 years ago
|
||
I guess we need to wait until the actual fix has made its way into MC, before we can land the test (I'm assuming your patch is not going to show up on the mozreview autoland repo).
Comment 35•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/381f21c8b4c0
Assignee | ||
Updated•8 years ago
|
Attachment #8770335 -
Flags: review?(rjesup) → review+
Assignee | ||
Comment 36•8 years ago
|
||
Comment on attachment 8770335 [details] Bug 1273652: mochi test to verify rendering happens after double negotiation. https://reviewboard.mozilla.org/r/63800/#review61096
Reporter | ||
Comment 37•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/4957d2668f55f0a7bb4dafb4b3be4e8fadee4197 Bug 1273652: mochi test to verify rendering happens after double negotiation. r=jesup
Reporter | ||
Updated•8 years ago
|
Keywords: leave-open
Comment 38•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/4957d2668f55
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox50:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Assignee | ||
Comment 39•8 years ago
|
||
Comment on attachment 8770176 [details] [diff] [review] Always reinitialize the receiver/jitterbuffer when reseting video decoding Approval Request Comment [Feature/regressing bug #]: N/A (upstream bug) [User impact if declined]: random video failure in some renegotiation situations depending on timing [Describe test coverage new/current, TreeHerder]: Includes test (request is for both patches) [Risks and why]: Very low risk [String/UUID change made/needed]: none
Attachment #8770176 -
Flags: approval-mozilla-aurora?
Assignee | ||
Updated•8 years ago
|
Attachment #8770335 -
Flags: approval-mozilla-aurora?
Comment 40•8 years ago
|
||
Comment on attachment 8770176 [details] [diff] [review] Always reinitialize the receiver/jitterbuffer when reseting video decoding Review of attachment 8770176 [details] [diff] [review]: ----------------------------------------------------------------- This patch fixes some video failure in some renegotiation. Let's take it in aurora.
Attachment #8770176 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 41•8 years ago
|
||
Comment on attachment 8770335 [details] Bug 1273652: mochi test to verify rendering happens after double negotiation. Add a test. Take it in aurora.
Attachment #8770335 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 42•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/876634f4b299 https://hg.mozilla.org/releases/mozilla-aurora/rev/c8ddb24aabed
You need to log in
before you can comment on or make changes to this bug.
Description
•