Closed
Bug 960296
Opened 10 years ago
Closed 10 years ago
Video freeze during WebRTC call
Categories
(Core :: WebRTC: Audio/Video, defect)
Tracking
()
VERIFIED
FIXED
mozilla29
Tracking | Status | |
---|---|---|
firefox27 | --- | unaffected |
firefox28 | --- | verified |
firefox29 | --- | verified |
People
(Reporter: drno, Assigned: bwc)
Details
Attachments
(2 files, 1 obsolete file)
270.80 KB,
application/x-bzip2
|
Details | |
2.96 KB,
patch
|
abr
:
review+
bajaj
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
While on a WebRTC call with mreavy the video in both directions kept freezing and not recovering. I was on a local build from 12/11/2013 (67ae84fa6807a57be78e6a237d146297861f2da1[67ae84fa6807]) and gathered the attached log files. nspr.log contains this line: 6057984[100330b20]: Couldn't send media on '2b4c6f46c597b17a/stream2/video' Stun/Turn stdout contains this around the same time: (generic/INFO) Buffers full (turn/WARNING) TURN(relay(IP4:10.250.7.48:56697|IP4:199.223.236.92:3478)): Failed sending send indication (turn/DEBUG) TURN(relay(IP4:10.250.7.48:56697|IP4:199.223.236.92:3478)): Send indication len=1131
Reporter | ||
Comment 1•10 years ago
|
||
Reporter | ||
Comment 2•10 years ago
|
||
Link to the stdout log file (because it is too big for anything I have access to): https://dl.dropboxusercontent.com/u/17831554/bz-960296-stdout.bz2
Reporter | ||
Comment 3•10 years ago
|
||
Some observations from grepping through the log files: - In the nspr log file the "SendPacket(...) succeeded" messages seem to cluster up (in the beginning you see ~2 of them grouped, right before the first "couldn't send media..." message you see 12 in one block), as if a send buffer got filled and then flushed - The stdout also contains this line (not sure what if it is relevant): 7934:(ice/ERR) ICE(PC:2b4c6f46c597b17a): peer (PC:2b4c6f46c597b17a:default), stream(2b4c6f46c597b17a/stream2/video) tried to trickle ICE in inappropriate state 4
Comment 4•10 years ago
|
||
I believe this is due to failing to update the capacity of the buffer when the TCP buffers drain. This means that once the buffer is full it appears to stay full and we can never write again. Attached is a very quick patch that seems to fix the problem in unit test (plus a test that exercises this). That said, I haven't double-checked the logic to make sure we are decrementing in the right place, so I'd like Byron to take a look.
Updated•10 years ago
|
Assignee: nobody → docfaraday
Assignee | ||
Comment 5•10 years ago
|
||
Looking at this now.
Assignee | ||
Comment 6•10 years ago
|
||
The logic around |pending| seems to be correct now. The decrement seems to be in the right place, and we never decrease it anywhere else in the code, since the only other place where we write is when we've been given a new buffer, and |pending| is zero.
Assignee | ||
Comment 7•10 years ago
|
||
How about we add an assert(!sock->pending) after the flush loop?
Assignee | ||
Comment 8•10 years ago
|
||
Assert that |pending| is zero after we've flushed all writes.
Assignee | ||
Updated•10 years ago
|
Attachment #8361139 -
Attachment is obsolete: true
Assignee | ||
Comment 9•10 years ago
|
||
Comment on attachment 8361210 [details] [diff] [review] Update buffered stun socket size when flushes happen Review of attachment 8361210 [details] [diff] [review]: ----------------------------------------------------------------- Added an assert, just sanity-checking that.
Attachment #8361210 -
Flags: review?(ekr)
Comment 10•10 years ago
|
||
Comment on attachment 8361210 [details] [diff] [review] Update buffered stun socket size when flushes happen Review of attachment 8361210 [details] [diff] [review]: ----------------------------------------------------------------- Let's not have me review my own code.
Attachment #8361210 -
Flags: review?(ekr) → review?(adam)
Comment 11•10 years ago
|
||
Comment on attachment 8361210 [details] [diff] [review] Update buffered stun socket size when flushes happen Review of attachment 8361210 [details] [diff] [review]: ----------------------------------------------------------------- LGTM
Attachment #8361210 -
Flags: review?(adam) → review+
Assignee | ||
Updated•10 years ago
|
Keywords: checkin-needed
Comment 12•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/09dc812fa7db
Keywords: checkin-needed
Comment 13•10 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #3) > Some observations from grepping through the log files: > > - In the nspr log file the "SendPacket(...) succeeded" messages seem to > cluster up (in the beginning you see ~2 of them grouped, right before the > first "couldn't send media..." message you see 12 in one block), as if a > send buffer got filled and then flushed Note that video frames are often more than one packet (and thus clustered). And if there's a packet loss (or jitter buffer underflow) you can see iframes which are typically a bunch more packets, or retransmits of blocks of packets (you'll see those as using "mOther...." because of the split nature of the VideoConduits).
https://hg.mozilla.org/mozilla-central/rev/09dc812fa7db
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Updated•10 years ago
|
Comment 15•10 years ago
|
||
Comment on attachment 8361210 [details] [diff] [review] Update buffered stun socket size when flushes happen [Approval Request Comment] Bug caused by (feature/regressing bug #): Initial TURN TCP landing in 28 User impact if declined: Video will freeze randomly when using TURN TCP and not recover Testing completed (on m-c, etc.): on m-c since last week Risk to taking this patch (and alternatives if risky): virtually no risk. Includes a test String or IDL/UUID changes made by this patch: none
Attachment #8361210 -
Flags: approval-mozilla-aurora?
Updated•10 years ago
|
Attachment #8361210 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 17•10 years ago
|
||
Nils, can you please verify this is fixed in Firefox 28 and 29?
Reporter | ||
Comment 18•10 years ago
|
||
By blocking UDP completely locally on my machine I forced a TURN TCP call. Result: The STUN log file from FF 29 shows that it now is able to recover from occasional "Buffers full" situation, meaning the log messages show up, but then later data gets send successfully to/over the same TCP TURN connection. BUT with FF 28 and 29 calls between two office machines still freeze after 10-15min. I'll try to take logs and packet dumps from both side to investigate if this is the same or another problem.
Flags: needinfo?(drno)
Reporter | ||
Comment 19•10 years ago
|
||
I traced another call in which the video froze after about 25-30mins. After analyzing the traces it looks like the call froze around that time, because the TCP TURN connection gets really confused about missing packets and retransmission. From that point on both Firefox clients are still sending data to the TURN server, but non of data gets relayed anymore. So it is not an obvious problem in Firefox. Throughout this call the video froze a couple of times for short periods of times (e.g. seconds), but with this fix in place it always recovered from this.
Status: RESOLVED → VERIFIED
Reporter | ||
Updated•10 years ago
|
Comment 20•10 years ago
|
||
Is this with apprtc? There is some other evidence that apprtc may only allocate TURN for around 30 minutes...
Reporter | ||
Comment 21•10 years ago
|
||
Yes this was with apprtc. No other evidence. I can repeat or if let me know which other platform I should test with.
You need to log in
before you can comment on or make changes to this bug.
Description
•