Closed
Bug 1337810
Opened 8 years ago
Closed 8 years ago
Jitsi video freezes after a few minutes of conversation
Categories
(Core :: WebRTC: Audio/Video, defect, P1)
Core
WebRTC: Audio/Video
Tracking
()
VERIFIED
FIXED
mozilla55
People
(Reporter: JuliaC, Assigned: jesup)
References
()
Details
(Keywords: regression, regressionwindow-wanted)
Attachments
(3 files, 2 obsolete files)
2.27 MB,
application/zip
|
Details | |
3.70 KB,
patch
|
drno
:
review+
gchang
:
approval-mozilla-aurora+
gchang
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
1.73 KB,
patch
|
drno
:
review+
ritu
:
approval-mozilla-esr52+
|
Details | Diff | Splinter Review |
[Note]:
- This issue affects both Firefox-Firefox calls and Chrome-Firefox calls
[Affected versions]:
- latest Nightly 54.0a1 (2017-02-08)
- latest Aurora 53.0a2 (2017-02-08)
- 52.0b4 build1 (20170206101855)
- 51.0.1 build3 (20170125094131)
[Affected platforms]:
- Windows 10 x64
- Windows 7 x64
- Ubuntu 14.04 x86
- Mac OS X 10.11
[Steps to reproduce]:
1. Launch Firefox
2. Go to https://meet.jit.si/, create a room, provide the camera and microphone permissions and join it
3. Join the same room from another station
4. Pay attention at the video/audio playback
[Expected result]:
- The call is successfully initiated for both participants
- The audio/video streams are properly rendered
[Actual result]:
- After a few minutes from the call initiation, video freezes on the host side for ~2-3 minutes
- Video can be often resumed by switching between self view and guest view
[Regression range]:
- It is difficult to find a precise regression range, considering that a manual search is required
- On 49.0a2 (2016-07-04) the issue is still reproducible, but the freeze lasts just several seconds
Reporter | ||
Updated•8 years ago
|
URL: https://meet.jit.si/
Comment 1•8 years ago
|
||
Nils -- I'd like you to investigate and reach out Emil when it makes sense. Since this reproduces with Fx 49, it's likely due to a recent change on Jitsi's end. Thanks!
Assignee: nobody → drno
Rank: 15
Priority: -- → P1
Comment 2•8 years ago
|
||
So Jitsi is not aware of any such reports or anything which they have deployed which could have caused this. I'll have a look.
Comment 3•8 years ago
|
||
I was able to repro the problem. Call initiator receiving video froze for at least 30s. Video on the other end froze also but only for 5s or so.
RTP stats on about:webrtc kept increasing through the freeze. So it's probably not a actual RTP transport issue itself. Maybe some RTCP problem?
Comment 4•8 years ago
|
||
Looking at some NSPR and webrtc.org log files I see that the first couple of seconds (in this case) everything looks normal. Then suddenly Firefox sends three RTCP on the receiving video stream. And from there on at least every second received video RTP packet gets "answered" by Firefox with an RTCP packet.
So that looks like Firefox is requesting some form of retransmission, but is not happy with what it gets from the SFU.
Comment 5•8 years ago
|
||
Looking at Wireshark trace with logs combined: a single missing RTP video packet appear to generate the picture freeze and RTCP storm.
Assignee | ||
Comment 6•8 years ago
|
||
I wonder if this is https://bugs.chromium.org/p/webrtc/issues/detail?id=7143
You could try commenting out bytes_sent += SendPadData(bytes - bytes_sent, false, 0, 0);
in rtp_sender.cc in TimeToSendPadding(), and replace it with "bytes_sent = bytes". Note: this is on the sending side...
Also: seeing the logs might be useful.
Comment 7•8 years ago
|
||
Yes I also wondered if bug 1339270 is the reason. Especially as I was only able to repro this with Nightly so far, but not with release (although missing a single packet is obviously not very deterministic way to repro something).
See Also: → 1339270
Comment 8•8 years ago
|
||
Verified that bug 1339270 is not causing this. Which would not have made too much sense as I test with a local build and release on the other end, but the video freezes on Nightly (which would be the one sending the padding in this case).
See Also: 1339270 →
Comment 9•8 years ago
|
||
Here are logs from such a frozen call; local buid; I shut down Firefox while the video was frozen.
If you grep for ac2f53d5-b310-49df-bd72-1f339bedec02 in the child log file you will see that up to '2017-02-14 18:30:05.808549 UTC' we receive lots of video RTP packets and only send out occasionally an RTCP packet. But after that timestamp Firefox starts to send a lot of RTCP. Unfortunately error log messages indicate why.
Updated•8 years ago
|
Comment 10•8 years ago
|
||
So I did verify today that Fx is sending out NACKs. But what the logging showed me is that it sends out the NACKs for some time and then stops sending NACKs even though the video is still frozen. Once the video recovers I saw NACK being send again.
Randel does that sound like a behavior to be expected?
Flags: needinfo?(rjesup)
Assignee | ||
Comment 11•8 years ago
|
||
At first though yes, I would expect it to stop trying to use NACK to recover after a very short time (<1sec), since after that it's cheaper/faster/better to just get an iframe (send PLI or FIR). I'll look into exactly what the trigger to stop sending NACKs is (I know when the queue of packets waiting for a NACK to unblock them overflows it will ask for an iframe)
Flags: needinfo?(rjesup)
Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(rjesup)
Comment 12•8 years ago
|
||
So even more logging statements show that at the time of the video freeze Fx starts to switch from sending NACK's to sending out PLI's. But apparently as the video stays frozen it never receives an I frame.
Comment 13•8 years ago
|
||
So the new RTP clear text logger from Bug 1343640 tells me that the Fx on both sides apparently send PLI's, but never receive any PLI from the SFU.
One things which sticks out about the PLIs is that the sender SSRC in the receiver report and the PLI is not used in any of the streams. Although the source SSRC correctly points to the stream with the missing packets.
Interestingly I can reproduce similar problems with in our SFU room on appear.in. But I haven't looked at the clear text RTCP for that case.
Assignee | ||
Comment 14•8 years ago
|
||
This appears to fix the wrong-SSRC for receviers - give this a try. https://treeherder.mozilla.org/#/jobs?repo=try&revision=ea77d78b045337653d94a22b3268bd725eccee87
Attachment #8843546 -
Flags: review?(drno)
Assignee | ||
Updated•8 years ago
|
Assignee: drno → rjesup
Status: NEW → ASSIGNED
Comment 15•8 years ago
|
||
Thanks Randell. I'll give it a try on Monday to check if this makes a difference.
Assignee | ||
Comment 16•8 years ago
|
||
had some oranges on linux only; suspect it's uninitialized RTT values if we don't exchange RCTP before it's measured. Also added a debug
Attachment #8843602 -
Flags: review?(drno)
Assignee | ||
Updated•8 years ago
|
Attachment #8843546 -
Attachment is obsolete: true
Attachment #8843546 -
Flags: review?(drno)
Assignee | ||
Comment 17•8 years ago
|
||
Much happier with the last patch: https://treeherder.mozilla.org/#/jobs?repo=try&author=rjesup@wgate.com&selectedJob=81673271
Flags: needinfo?(rjesup)
Comment 18•8 years ago
|
||
Today I mostly did not get the video working reliably at all. As it turned out the problem was that when you use iptables to randomly drop UDP packets sometime NSPR will report that as a permission error, which then terminates the connection at the transportlayer.
With attachment #8843602 [details] [diff] [review] applied plus a hack to ignore the NSPR permission errors I no longer saw any video freezes. Just a few short hangs which quickly (within 1-2s) recovered.
I did not have the time to test again without attachment #8843602 [details] [diff] [review] to check if the combination fixed things ultimately. I'll try to do that soon, because last week the video freezes recovered after a long time, where todays video freezes never recovered (I gave up waiting after >= 30s).
Comment 19•8 years ago
|
||
Just verified that in the RTP logs from the run with all patches and hacks PLI's appear and stop after big video packets are send/received.
Updated•8 years ago
|
Attachment #8843602 -
Flags: review?(drno) → review+
Comment 20•8 years ago
|
||
Pushed by rjesup@wgate.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cfe4b63ed4f7
use sender SSRC for receivers r=drno
Comment 22•8 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Updated•8 years ago
|
Assignee | ||
Comment 24•8 years ago
|
||
Comment on attachment 8843602 [details] [diff] [review]
use sender SSRC for receivers
Approval Request Comment
[Feature/Bug causing the regression]: webrtc 49 update (FF53)
[User impact if declined]: Some webrtc conference servers may fail (i.e.. jitsi, a large provider)
[Is this code covered by automated tests?]: No; we don't care about this value in our code or check it (nor does chrome).
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: We verified the fix using jitsi (we were able to repro it easily, and could see the server not forwarding packets). QE could do the same tests we did, but I don't think that adds anything.
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No
[Why is the change risky/not risky?]: very simple change (which also fixes a timing hole that could make tests orange).
[String changes made/needed]: none
Flags: needinfo?(rjesup)
Attachment #8843602 -
Flags: approval-mozilla-beta?
Attachment #8843602 -
Flags: approval-mozilla-aurora?
Comment 25•8 years ago
|
||
Comment on attachment 8843602 [details] [diff] [review]
use sender SSRC for receivers
Fix a webrtc regression issue and was verified locally. Aurora54+ & Beta53+.
Attachment #8843602 -
Flags: approval-mozilla-beta?
Attachment #8843602 -
Flags: approval-mozilla-beta+
Attachment #8843602 -
Flags: approval-mozilla-aurora?
Attachment #8843602 -
Flags: approval-mozilla-aurora+
Comment 26•8 years ago
|
||
bugherder uplift |
Comment 27•8 years ago
|
||
bugherder uplift |
Comment 28•8 years ago
|
||
Iulia, could you please spot-check this on 53?
Flags: needinfo?(iulia.cristescu)
Reporter | ||
Comment 29•8 years ago
|
||
I still manage to reproduce the issue on Windows 10 x64, Ubuntu 16.04 x64 and Mac OS X 10.11.6, using 53.0b5 build1 (20170320143328). The bug is also still encountered on 54.0a2 (2017-03-21) and 55.0a1 (2017-03-21), but the freeze lasts only for about 1 minute.
Flags: needinfo?(iulia.cristescu)
Assignee | ||
Comment 30•8 years ago
|
||
Nils - please re-check ASAP on Nightly (and beta if nightly is working); check if the NACKs are getting through as well.
Status: RESOLVED → REOPENED
Flags: needinfo?(rjesup)
Resolution: FIXED → ---
Comment 31•8 years ago
|
||
Lulia can you get us a MOZ log file with Nightly with the following value:
MOZ_LOG=timestamp,signaling:5,jsep:5,rtplogger:5
It would be ideal if you can get us the logs from both sides of the same identical call. Note: the logging from above only works in Nightly.
Then I'll have a look at the RTP packets in Wireshark from the log. That should hopefully give us quickly an idea of what kind of freeze you actually see there.
Flags: needinfo?(iulia.cristescu)
Comment 32•8 years ago
|
||
I was able to repro short hangs in the video on Jitsi myself.
By looking at the new RTP logging I was able to find out that Firefox sends out NACK's for missing frames to the Jitsi server. These NACKs then appear to get forwarded to the sender. But the sender is using a completely different RTP sequence number range. So it is not surprising that it never sends the requested missing frame.
I contacted Jitsi and they already located and fixed the issue which caused the NACK's to be forwarded. The Jitsi server is suppose to terminate the NACKs and not forward them to the sender.
It is my understanding that they will also look into why their server is not responding to Firefox NACK's packets.
Flags: needinfo?(rjesup)
Reporter | ||
Comment 33•8 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #31)
> Lulia can you get us a MOZ log file with Nightly with the following value:
> MOZ_LOG=timestamp,signaling:5,jsep:5,rtplogger:5
>
> It would be ideal if you can get us the logs from both sides of the same
> identical call. Note: the logging from above only works in Nightly.
> Then I'll have a look at the RTP packets in Wireshark from the log. That
> should hopefully give us quickly an idea of what kind of freeze you actually
> see there.
I held calls between 2 Windows 10 x64 stations, using 55.0a1 (2017-03-24) and having
cd c:\
set MOZ_LOG=timestamp,signaling:5,jsep:5,rtplogger:5
set MOZ_LOG_FILE=%TEMP%\log.txt
cd "Program Files\Nightly"
.\firefox.exe
previously set.
Once I've reproduced the problem and exited Firefox, looked for the generated log file, but this was empty every time. Is there anything wrong in the above environment settings?
Flags: needinfo?(iulia.cristescu) → needinfo?(drno)
Comment 34•8 years ago
|
||
Sorry I forgot to mention that rtp logging only work with e10s off (because the sandboxed content process can't write to file any more).
I got confirmation from Jitsi that remaining NACK issue got fixed on their end. And I just verified that NACKs now work as expected again. The fix on Jitis side is currently on beta.meet.jit.si, so probably not yet deployed on the production environment. I can check with Jitsi when the fix will be available in production.
Closing the bug since this was not a Firefox issue, and it's working now again.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Flags: needinfo?(drno)
Resolution: --- → FIXED
Comment 35•8 years ago
|
||
Is this something we want to uplift to ESR52 as well?
Flags: needinfo?(rjesup)
Assignee | ||
Comment 36•8 years ago
|
||
I believe this is fallout from bug 1250356 - Webrtc 49 update, landed in FF 53
Blocks: 1250356
Flags: needinfo?(rjesup)
Assignee | ||
Comment 37•8 years ago
|
||
Or rather - the code changed in 53. It may be incorrect in 52 as well (and the initial report indicates this). If so, it will need a slightly different patch I suspect. Let's keep it on the radar for 52ESR, given the XP population ending up there
Flags: needinfo?(rjesup)
Comment 38•8 years ago
|
||
If we want this on ESR52, we should probably get this nominated for approval soon.
Assignee | ||
Comment 39•8 years ago
|
||
possible fix for this issue in 52 -- 52 is very different in this area, so the fix would need to be verified before uplift
Attachment #8855625 -
Flags: review?(drno)
Assignee | ||
Comment 40•8 years ago
|
||
Flags: needinfo?(rjesup)
Assignee | ||
Comment 41•8 years ago
|
||
Could you try the 52ESR Try run in comment 40? Thanks!
Flags: needinfo?(iulia.cristescu)
Comment 42•8 years ago
|
||
Comment on attachment 8855625 [details] [diff] [review]
ESR52 patch for setting receive SSRC same as send SSRC
Review of attachment 8855625 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM
::: media/webrtc/signaling/src/peerconnection/MediaPipelineFactory.cpp
@@ +826,5 @@
> + if (aTrackPair.mSending) {
> + auto ssrcs = &aTrackPair.mSending->GetSsrcs();
> + if (!ssrcs->empty()) {
> + conduit->SetLocalSSRC(ssrcs->front());
> + }
I think we should have MOZ_MTLOG() and return NS_ERROR_FAILURE for the else case here to prevent silent failure when not setting the local SSRC.
Attachment #8855625 -
Flags: review?(drno) → review+
Assignee | ||
Comment 43•8 years ago
|
||
Attachment #8855939 -
Flags: review?(drno)
Assignee | ||
Updated•8 years ago
|
Attachment #8855625 -
Attachment is obsolete: true
Comment 44•8 years ago
|
||
Comment on attachment 8855939 [details] [diff] [review]
ESR52 patch for setting receive SSRC same as send SSRC
Review of attachment 8855939 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM
Attachment #8855939 -
Flags: review?(drno) → review+
Assignee | ||
Comment 45•8 years ago
|
||
NI to nils to test this on Monday; once verified I'll put it up for esr52
Flags: needinfo?(drno)
Reporter | ||
Comment 46•8 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #41)
> Could you try the 52ESR Try run in comment 40? Thanks!
The issue is not reproducible anymore on the the 52ESR Try run in comment 40.
Flags: needinfo?(iulia.cristescu)
Reporter | ||
Comment 47•8 years ago
|
||
Also, verified 55.0a1 (2017-04-10), 54.0a2 (2017-04-10) and 53.0b9 (20170403072723) builds. Nightly and Aurora builds are verified fixed, but the issue is still reproducible on the beta build.
Reporter | ||
Updated•8 years ago
|
Comment 48•8 years ago
|
||
I also tried the ESR try build for 5 min call. I had some very small video freezes, but they recovered within 1-3 seconds. After about 4min or so on side the remote video got greyed out with a message that the video got turned of due to bandwidth limitations. But that sounds like a Jitsi issue to me. So the ESR build seems to work fine.
Flags: needinfo?(drno)
Comment 49•8 years ago
|
||
Please request ESR52 approval on this when you get a chance.
Flags: needinfo?(rjesup)
Assignee | ||
Comment 50•8 years ago
|
||
Comment on attachment 8855939 [details] [diff] [review]
ESR52 patch for setting receive SSRC same as send SSRC
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: Failure of a major WebRTC provider
User impact if declined: ESR users unable to use Jitsi (and maybe some others)
Fix Landed on Version: 55
Risk to taking this patch (and alternatives if risky): Very low risk, just changes where it gets the default SRRC value.
String or UUID changes made by this patch: none
See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Flags: needinfo?(rjesup)
Attachment #8855939 -
Flags: approval-mozilla-esr52?
Comment on attachment 8855939 [details] [diff] [review]
ESR52 patch for setting receive SSRC same as send SSRC
Video freezing isn't a good experience, the fix has been verified in a few places already, ESR52.2+
Attachment #8855939 -
Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Comment 52•8 years ago
|
||
bugherder uplift |
You need to log in
before you can comment on or make changes to this bug.
Description
•