Closed Bug 1273652 Opened 8 years ago Closed 8 years ago

No video rendering after renegotiating recvonly video (e.g. on meet.jit.si)

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla50

Tracking Flags:

Tracking

Status

firefox49

---

fixed

firefox50

---

fixed

People

(Reporter: drno, Assigned: jesup)

References

Details

Attachments

(8 files)

nspr.log.jitsi-black-after-rejoining.gz 8 years ago Nils Ohlmeier [:drno] 3.87 MB, application/x-gzip		Details
about:webrtc page of a session that failed 8 years ago George Politis [:gp] 813.22 KB, text/html		Details
nspr.log.bad-with-rtcp.bz2 8 years ago Nils Ohlmeier [:drno] 1.91 MB, application/x-bzip2		Details
nspr.log.good-with-rtcp.bz2 8 years ago Nils Ohlmeier [:drno] 474.09 KB, application/x-bzip2		Details
nspr.log-failed-mochitest-with-webrtc-logs.bz2 8 years ago Nils Ohlmeier [:drno] 191.38 KB, application/x-bzip2		Details
test_peerConnection_promiseSendOnly.html 8 years ago Nils Ohlmeier [:drno] 3.02 KB, text/html		Details
Always reinitialize the receiver/jitterbuffer when reseting video decoding 8 years ago Randell Jesup [:jesup] (needinfo me) 1.32 KB, patch	pkerr : review+ gchang : approval-mozilla-aurora+	Details \| Diff \| Splinter Review
Bug 1273652: mochi test to verify rendering happens after double negotiation. 8 years ago Nils Ohlmeier [:drno] 58 bytes, text/x-review-board-request	jesup : review+ gchang : approval-mozilla-aurora+	Details

Nils Ohlmeier [:drno]

Reporter

Description

•

8 years ago

Attached file nspr.log.jitsi-black-after-rejoining.gz — Details

Occasionally when being in >= 3 multi-party call on meet.jit.si one of the video streams gets rendered as just as black.

- this affects all version of Firefox (up to release)
- I think I have not seen this for a two party call
- it can affect the main video rendering as well as one of the preview renderer
- it can happen right when joining a running conference
- sometimes I think it happens also for video which has been rendered successfully

I'm attaching a NSPR & webrtc log from a time where I had left a 3 way call and rejoined the call and got black video for one of the two other participants.

Because of the SSRC re-writting done by the SFU I'm wondering if this might be related to Simulcast support?! Marking this P1 until we have an idea what is causing this.

Nils Ohlmeier [:drno]

Reporter

Comment 1

•

8 years ago

I doubled checked already that the SDP we see in Firefox on about:webrtc for these calls is actually the SDP we got over the wire (oppose to Firefox re-creating a strange looking SDP from a confused internal state).

What I'm referring to is this:

...
m=video 1 RTP/SAVPF 100
a=ssrc:1019964314 cname:6f53568c-1802-4029-b63d-b71790779aec
a=ssrc:1083845631 cname:1c04ce18-1ef5-43d1-8a1d-b977949ed95c
a=ssrc:4016385049 cname:1c04ce18-1ef5-43d1-8a1d-b977949ed95c
…
m=video 1 RTP/SAVPF 100
a=ssrc:3232102231 cname:1c04ce18-1ef5-43d1-8a1d-b977949ed95c
…
m=video 1 RTP/SAVPF 100
a=ssrc:3969758265 cname:mixed
a=ssrc:3969758265 label:mixedlabelvideo0
a=ssrc:3969758265 mslabel:mixedmslabel
…

Nils Ohlmeier [:drno]

Reporter

Comment 2

•

8 years ago

The patch from bug 1213773 might potentially fix this issue. Need to verify.

Comment 3

•

8 years ago

We investigated the situation and these additional SSRCs are to be expected. They are recvonly SSRCs from the remote endpoints that we decide to allocate to existing m-lines in order to keep the m-line number low.

On a side note, we’re able to reproduce the non-rendering issue even in a 1-1 call, albeit very rarely.

Hopefully bug 1213773 fixes this.

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 4

•

8 years ago

George - can you test if this is fixed in Today's (5/25) Nightly?  the patch first was in Nightly today.
Thanks!
(Note: today's nightly has a bug where the AEC is only on for the first use of getUserMedia, so use headsets.  Hopefully the fix for that will be in tomorrow's nightly)

Flags: needinfo?(gp)

George Politis [:gp]

Comment 5

•

8 years ago

It seems I'm no longer able to "reproduce" the issue. I put reproduce in quotes because we never a reliable way to reproduce it, it's random. I'll give it a few more shots to make sure I wasn't just being lucky.

Nils Ohlmeier [:drno]

Reporter

Comment 6

•

8 years ago

I "reproduced" it just now with two Nightly's on Linux and Mac. So it is not fixed :-(

I took me a couple of attempts. Now I have 1:1 call, but that call has two m=audio and two m=video lines in the SDP. Maybe the black video gets triggered only if multiple m-lines are present? That would at least explains why it is harder to repro on 1:1, but easier with on a 3-way call.

Another observation from this 1:1 call:
in the "RTP Stats" section of about:webrtc I see:
 - outbound_rt[c]p_audio_0
 - outbound_rt[c]p_video_1
this makes sense as the SDP says sendrecv on one audio and one video m-line.
But then on the receiving side I see this:
 - inbound_rt[c]p_audio_0 <- which appears dead; SSRC 0; no packets
 - inbound_rt[c]p_video_1 <- which also appears to be dead; SSRC 0; no packets
 - inbound_rt[c]p_audio_3
 - inbound_rt[c]p_video_4
Why is number 2 missing from that list? Interestingly the amount of SSRC's in the remote SDP is 5 (which matches the 0-4 range).

Nils Ohlmeier [:drno]

Reporter

Comment 7

•

8 years ago

Apparently even a fresh 1:1 call results in two audio plus two video m lines.

A new theory from my side: does the black video get triggered when you re-use a room (at least in the 1:1 scenario)?
I think I have not seen the black video when I used a room/URL for the first time.

George Politis [:gp]

Comment 8

•

8 years ago

Now I'm able to repro it too :(

> But then on the receiving side I see this:

The dead channels are expected to be dead. We pre-allocate 2 SSRCs for our server (1 for audio and 1 for video), but we typically don't send anything there. 

> Why is number 2 missing from that list?

I think number 2 is missing because it's for SCTP, not RTP.

George Politis [:gp]

Comment 9

•

8 years ago

> I think I have not seen the black video when I used a room/URL for the first time.

I'm able to repro it even when I use a room for the first time :(

George Politis [:gp]

Comment 10

•

8 years ago

Attached file about:webrtc page of a session that failed — Details

George Politis [:gp]

Comment 11

•

8 years ago

^ comes from a 1-1 call between FF46<->current Nightly. Both participants joined just once.

Flags: needinfo?(gp)

George Politis [:gp]

Comment 12

•

8 years ago

A little more information about the captured about:webrtc page. It's coming from the endpoint that renders black video. The SSRC that fails to be rendered is 1520089304. You can check that FF is getting data by looking at inbound_rtp_video_4:

Decoder: Avg. bitrate: 1.77 Mbps (0.38 SD) Avg. framerate: 29.21 fps (1.15 SD)

Local: 13:04:32 GMT-0500 (CDT) inboundrtp SSRC: 1520089304 Received: 243081 packets (260614.48 Kb) Lost: 331 Jitter: 1.35

Remote: 13:04:33 GMT-0500 (CDT) outboundrtp SSRC: 1520089304 Sent: 243041 packets (255825.53 Kb)

Nils Ohlmeier [:drno]

Reporter

Comment 13

•

8 years ago

Yeah the missing number 2 in the RTP stats happens when the m=application is the middle of the m-sections in the SDP, rather then the end.

I also verified via PCAP that packets are in fact still arrive with the expected SSRCs in case of the black rendering.

Byron Campen [:bwc]

Comment 14

•

8 years ago

One thing that I notice between the first two calls in that attachment is that there's an extra a=ssrc line in a remote video m-section in the first one (groom2, which I assume failed?), but not the previous one (groom1):

groom2 (failed?)
a=ssrc:619954680 cname:mixed
a=ssrc:619954680 label:mixedlabelvideo0
a=ssrc:619954680 mslabel:mixedmslabel
a=ssrc:2033112203 cname:a0be499b-8416-44a4-b7d4-1144fbfaa4bd

groom1 (succeeded?)
a=ssrc:1513371698 cname:mixed
a=ssrc:1513371698 label:mixedlabelvideo0
a=ssrc:1513371698 mslabel:mixedmslabel

Also, in groom1 (the succeeded case?) not all of the m-sections are bundled, whereas in groom2 (the failed case?) they are. This smells like a bundle demux problem to me.

Byron Campen [:bwc]

Comment 15

•

8 years ago

(In reply to George Politis [:gp] from comment #12)
> A little more information about the captured about:webrtc page. It's coming
> from the endpoint that renders black video. The SSRC that fails to be
> rendered is 1520089304. You can check that FF is getting data by looking at
> inbound_rtp_video_4:
> 
> Decoder: Avg. bitrate: 1.77 Mbps (0.38 SD) Avg. framerate: 29.21 fps (1.15
> SD)
> 
> Local: 13:04:32 GMT-0500 (CDT) inboundrtp SSRC: 1520089304 Received: 243081
> packets (260614.48 Kb) Lost: 331 Jitter: 1.35
> 
> Remote: 13:04:33 GMT-0500 (CDT) outboundrtp SSRC: 1520089304 Sent: 243041
> packets (255825.53 Kb)

   That's kinda weird. Because neither of the two ssrcs that are in the previous video m-section show up in the stats. What is that m-section for?

George Politis [:gp]

Comment 16

•

8 years ago

> That's kinda weird. Because neither of the two ssrcs that are in the previous video m-section show up in the stats. What is that m-section for?

Those are the pre-allocate video SSRCs for our server. We typically don't send anything there.

George Politis [:gp]

Comment 17

•

8 years ago

(In reply to Byron Campen [:bwc] (PTO May 26, PTO Jun 6-10) from comment #14)
> One thing that I notice between the first two calls in that attachment is
> that there's an extra a=ssrc line in a remote video m-section in the first
> one (groom2, which I assume failed?), but not the previous one (groom1):
> 
> groom2 (failed?)
> a=ssrc:619954680 cname:mixed
> a=ssrc:619954680 label:mixedlabelvideo0
> a=ssrc:619954680 mslabel:mixedmslabel
> a=ssrc:2033112203 cname:a0be499b-8416-44a4-b7d4-1144fbfaa4bd
> 
> groom1 (succeeded?)
> a=ssrc:1513371698 cname:mixed
> a=ssrc:1513371698 label:mixedlabelvideo0
> a=ssrc:1513371698 mslabel:mixedmslabel
> 
> Also, in groom1 (the succeeded case?) not all of the m-sections are bundled,
> whereas in groom2 (the failed case?) they are. This smells like a bundle
> demux problem to me.

This additional SSRC is to be expected. It's a recvonly SSRCs from the remote endpoint that we decide to allocate to existing m-lines in order to keep the m-line number low.

I don't remember what the scenario was for groom1 or if it succeeded, I should probably have taken a note about that and I should also have clarified that the failed captured session was groom2, the very top one.

Nils Ohlmeier [:drno]

Reporter

Comment 18

•

8 years ago

(In reply to Byron Campen [:bwc] (PTO May 26, PTO Jun 6-10) from comment #14)
> Also, in groom1 (the succeeded case?) not all of the m-sections are bundled,
> whereas in groom2 (the failed case?) they are. This smells like a bundle
> demux problem to me.

Interesting observation. But in all the logs I captured everything appears to be bundled correctly.

Nils Ohlmeier [:drno]

Reporter

Updated

•

8 years ago

Assignee: nobody → drno

Rank: 10

Summary: Black video rendering in multiparty calls on meet.jit.si → Black video rendering in calls on meet.jit.si

Nils Ohlmeier [:drno]

Reporter

Updated

•

8 years ago

Comment 19

•

8 years ago

Attached file nspr.log.bad-with-rtcp.bz2 — Details

Sample NSPR log file from a call with black video rendering

Nils Ohlmeier [:drno]

Reporter

Comment 20

•

8 years ago

Attached file nspr.log.good-with-rtcp.bz2 — Details

Sample NSPR log file from a working call

Nils Ohlmeier [:drno]

Reporter

Comment 21

•

8 years ago

So after the two log files in attachment 8761070 [details] and attachment 8761071 [details] my current educated guess is that the Jitsi bridge fails in certain call scenarios to send/forward RTCP between the sender and the receiver.

Note: all my attempts to debug this have shown that for unknown reasons this problem appears to be timing sensitive. E.g. with a non-optimized, debug build of Firefox I was not able to reproduce the problem today at all. But after compiling the exact same code base with optimization and no debug it became easy to reproduce. My guess is that debug vs non-debug somehow impacts the speed with which re-negotiations are being taken care of and that might change something on the bridge side.

If you execute the following command:
  grep "received RTCP" nspr.log.good-with-rtcp | cut -d '|' -f 2 | sort | uniq -c
you get something like this:
   3  Receive audio[467ea7df-437c-44f2-adf5-f856c7fd84bf] received RTCP packet.
   1  Receive audio[5c45c565-e7be-441e-842c-747337ee9788] received RTCP packet.
   5  Receive video[833496da-85f1-4a88-a1ba-d9bad271d27b] received RTCP packet.
  36  Receive video[f8939cf5-b735-4873-814b-643aef2e1ae9] received RTCP packet.
 142  Transmit audio[{77f7d662-db59-334e-9a9f-a2eb86fa327d}] received RTCP packet.
 142  Transmit video[{cd24f8a8-31d4-4f4a-a847-6ef57ebdc00e}] received RTCP packet.

But if you execute this:
  grep "received RTCP" nspr.log.bad-with-rtcp | cut -d '|' -f 2 | sort | uniq -c
you something like this:
    19  Receive audio[5c45c565-e7be-441e-842c-747337ee9788] received RTCP packet.
 559  Transmit audio[{80505a4a-d359-2341-945d-d270f3f708cf}] received RTCP packet.
 559  Transmit video[{911b4eb7-f24c-ab41-a7e0-b86b7c36384b}] received RTCP packet.
The important part here is that "Receive video[...]" is missing, which indicates that FF MediaPipelines never forwarded any RTCP on its video RTP receiver to the webrtc.org code.

Now this could also indicate a bug in FF MediaPipelines, especially the filter which lets RTP and RTCP pass for each pipeline based on the SSRC's from the SDP. But the log file clearly shows that the "Receive video[..]" pipeline does actually receive video RTP packets, which means the SSRC filter seems to be set properly for that pipeline (plus I manually tracked in the log file the SSRC's getting updated through renegotiation). As the RTCP part of the MediaPipeline shares the same filter as the RTP part I'm pretty sure the filter is setup properly for RTCP part.
And the fact that "Receive audio[...]" still shows up in the table above means that we are getting at least RTCP for the audio receiver.

Now this only indicates that FF is not receiving RTCP sender reports from the sender side, which if I understand it correctly is less important for working video, then RTCP receiver reports. The log file shows even in the bad call scenario FF sending out receiver reports regularly. But if this is really a problem in the bridge I assume it is very likely that it affects RTCP sender and receiver reports for a given video stream.

I guess the remaining step which we could do is check in a PCAP of a bad call for the RTCP traffic and the SSRC's on these reports.

Nils Ohlmeier [:drno]

Reporter

Comment 22

•

8 years ago

Another try with NSPR logs + pcap show that FF actually sends sender reports on the recvonly video m-section... hmmm I guess I keep digging.

George Politis [:gp]

Comment 23

•

8 years ago

(In reply to Nils Ohlmeier [:drno] from comment #21)
> I guess the remaining step which we could do is check in a PCAP of a bad
> call for the RTCP traffic and the SSRC's on these reports.

Hi Nils, the bridge is sending sender reports for the SSRCs that it forwards, which include report blocks for the SSRCs that it receives. I run a quick test and the RTCP seemed fine while video was not rendering. If you want to run additional tests, there's a way to have the bridge dump the decrypted traffic.

Nils Ohlmeier [:drno]

Reporter

Comment 24

•

8 years ago

Looking at some log files and pcap even more I found that in a 1:1 call the Firefox which renders black initially sends RTCP receiver and sender reports. But then it stops sending receiver reports and only send sender reports, for audio and video. My current guess is that one of the renegotiations triggers this behavior.
Further the audio side only sending sender reports probably does not affect the audio, because the RTCP feedback is not essential. But in case of video looking at the frequency of the sender reports it actually looks more like these are requests to submit a missing I-frame. I guess these I-Frame requests inside of sender reports get ignore on the sender side and therefore the sender never submits a new I-frame, and the receiver stays black because it can't rendering only with p-frames.

I'll continue to add more logging to figure out why the receiver switches to sending sender reports only on all channels.

Nils Ohlmeier [:drno]

Reporter

Comment 25

•

8 years ago

I located the code which is probably causing the sending instead of just receiving behavior. But that code got added pretty recently in 49.
Additionally I see also more confusion in my logs regarding which webrtc.org engines are suppose to be used for video decoding.

Nils Ohlmeier [:drno]

Reporter

Updated

•

8 years ago

Depends on: 1282792

Nils Ohlmeier [:drno]

Reporter

Comment 26

•

8 years ago

I'm finally able to repro the problem locally with a mochitest. It appears we have a problem when reconfiguring a recvonly video codec (which gets triggered through renegotiation - and meet.jit.si sometimes does two rounds and sometimes three rounds of offer-answer (where the third offer is logically identical to the second offer from before)).

Nils Ohlmeier [:drno]

Reporter

Comment 27

•

8 years ago

Attached file nspr.log-failed-mochitest-with-webrtc-logs.bz2 — Details

NSPR and Webrtc log from a failed mochitest run.

Nils Ohlmeier [:drno]

Reporter

Comment 28

•

8 years ago

Attached file test_peerConnection_promiseSendOnly.html — Details

Modified version of a test case which for me reproduces the issue after renegotiating.

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 29

•

8 years ago

Attached patch Always reinitialize the receiver/jitterbuffer when reseting video decoding — Details — Splinter Review

Attachment #8770176 - Flags: review?(pkerr)

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

8 years ago

Assignee: drno → rjesup

Status: NEW → ASSIGNED

Paul Kerr [:pkerr]

Updated

•

8 years ago

Attachment #8770176 - Flags: review?(pkerr) → review+

Nils Ohlmeier [:drno]

Reporter

Comment 30

•

8 years ago

I manually did over 20 calls between my Mac and Linux machine with the patch applied on both sides and the calls were rock solid (where before I would have gotten black video with 5 calls max).

I'm still working on writing a mochitest to prevent regressions of this in the future.

Nils Ohlmeier [:drno]

Reporter

Updated

•

8 years ago

Summary: Black video rendering in calls on meet.jit.si → No video rendering after renegotiating recvonly video (e.g. on meet.jit.si)

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 31

•

8 years ago

leave-open for the test

Keywords: leave-open

Pulsebot

Comment 32

•

8 years ago

Pushed by rjesup@wgate.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/381f21c8b4c0
Always reinitialize the receiver/jitterbuffer when reseting video decoding r=pkerr

Nils Ohlmeier [:drno]

Reporter

Comment 33

•

8 years ago

Attached file Bug 1273652: mochi test to verify rendering happens after double negotiation. — Details

Review commit: https://reviewboard.mozilla.org/r/63800/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/63800/

Attachment #8770335 - Flags: review?(rjesup)

Nils Ohlmeier [:drno]

Reporter

Comment 34

•

8 years ago

I guess we need to wait until the actual fix has made its way into MC, before we can land the test (I'm assuming your patch is not going to show up on the mozreview autoland repo).

Carsten Book [:Tomcat]

Comment 35

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/381f21c8b4c0

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

8 years ago

Attachment #8770335 - Flags: review?(rjesup) → review+

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 36

•

8 years ago

Comment on attachment 8770335 [details]
Bug 1273652: mochi test to verify rendering happens after double negotiation.

https://reviewboard.mozilla.org/r/63800/#review61096

Nils Ohlmeier [:drno]

Reporter

Comment 37

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/4957d2668f55f0a7bb4dafb4b3be4e8fadee4197
Bug 1273652: mochi test to verify rendering happens after double negotiation. r=jesup

Nils Ohlmeier [:drno]

Reporter

Updated

•

8 years ago

Keywords: leave-open

Carsten Book [:Tomcat]

Comment 38

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/4957d2668f55

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox50: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla50

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 39

•

8 years ago

Comment on attachment 8770176 [details] [diff] [review]
Always reinitialize the receiver/jitterbuffer when reseting video decoding

Approval Request Comment
[Feature/regressing bug #]: N/A (upstream bug)

[User impact if declined]: random video failure in some renegotiation situations depending on timing

[Describe test coverage new/current, TreeHerder]: Includes test (request is for both patches)

[Risks and why]: Very low risk
 
[String/UUID change made/needed]: none

Attachment #8770176 - Flags: approval-mozilla-aurora?

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

8 years ago

Attachment #8770335 - Flags: approval-mozilla-aurora?

Nils Ohlmeier [:drno]

Reporter

Updated

•

8 years ago

Comment 40

•

8 years ago

Comment on attachment 8770176 [details] [diff] [review]
Always reinitialize the receiver/jitterbuffer when reseting video decoding

Review of attachment 8770176 [details] [diff] [review]:
-----------------------------------------------------------------

This patch fixes some video failure in some renegotiation. Let's take it in aurora.

Attachment #8770176 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Gerry Chang [:gchang]

Comment 41

•

8 years ago

Comment on attachment 8770335 [details]
Bug 1273652: mochi test to verify rendering happens after double negotiation.

Add a test. Take it in aurora.

Attachment #8770335 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Comment 42

•

8 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-aurora/rev/876634f4b299
https://hg.mozilla.org/releases/mozilla-aurora/rev/c8ddb24aabed

status-firefox49: affected → fixed

You need to log in before you can comment on or make changes to this bug.