Closed Bug 922132 Opened 7 years ago Closed 5 years ago

Image freezes on callee side after a while

Categories

(Core :: WebRTC: Audio/Video, defect)

x86_64
Windows 7
defect
Not set

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox25 --- unaffected
firefox26 --- affected

People

(Reporter: mihaelav, Unassigned)

References

()

Details

(Keywords: regression)

Attachments

(3 files)

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0

The image freezes on callee side in a 1:1 call with another user on Win XP machine.

Environment:
Callee (1st person in the conversation): Firefox 26.0a2/Win 7
Caller (2nd person in the conversation): Firefox 22.0/Win XP

Steps:
1. Create a conversation on https://apprtc.webrtc.org/ from a Win 7 machine using Firefox 26
2. Join the conversation created on step 1 from another Win XP machine using Firefox 22.0
3. Wait for a while (~15-20 seconds)

Expected result: Image and sound work fine on both ends of the conversation

Actual result: On the caller side, the callee's image freezes.

Regression window (for the Firefox 26.0a2/Win 7 - Firefox 22.0/Win XP case) :
Last good nightly: Sept 12, 2013
First bad nightly: Sept 13, 2013
Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a4e9c9c9dbf9&tochange=b9029b1de410

Note: The issue is also reproducible with the following environments:
1. Callee (1st person in the conversation): Firefox 26.0a2/Win XP
   Caller (2nd person in the conversation): Firefox 22.0/Win 7
2. Callee (1st person in the conversation): Firefox 26.0a2/Ubuntu 13.04
   Caller (2nd person in the conversation): Firefox 22.0/Win XP
(In reply to Eric Rescorla (:ekr) from comment #1)
> I notice this patch in the list:
> https://bugzilla.mozilla.org/show_bug.cgi?id=880067

I have suspicions about this patch, which I've discussed a bit with abr and ehugg - I don't think the testing done for this included verification that RTCPs (in particular PLI/etc) are (always) getting to the "right" channel and getting processed.  My looks at this as part of bug 864654 and also the bug on one-way video support (sendonly/recvonly) made me suspicious.  It's hard to follow the impact of all this in the abstract without digging in, or simply taking a webrtc_trace which will tell you quickly if things are working or not, at least in the normal case.  At the very least this seems to be a first-level thing to check, and in any case the logs will help debug what's happening here.  (A wireshark wouldn't hurt, but probably isn't necessary.)

NSPR_LOG_MODULES=timestamp,signaling:5,webrtc_trace=65535 NSPR_LOG_FILE=whatever WEBRTC_TRACE_FILE=whatever
Hi Mihaela, Can you we get retest and get logs for us per Randell's directions in Comment 2 (above)?  Thanks.
Flags: needinfo?(mihaela.velimiroviciu)
(In reply to Maire Reavy [:mreavy] from comment #3)
> Hi Mihaela, Can you we get retest and get logs for us per Randell's
> directions in Comment 2 (above)?  Thanks.

I set the system varibles mentioned in comment 2 (NSPR_LOG_MODULES=signaling:5,mtransport:5,webrtc_trace=65535,timestamp:1; NSPR_LOG_FILE=D:\log.log; WEBRTC_TRACE_FILE=D:\trace.log), but I get no logs when this bug occurs. 

(In reply to Mihaela Velimiroviciu [QA] (:mihaelav) from comment #0)
> Actual result: On the caller side, the callee's image freezes.

Correction: On the *callee* side, the *caller's* image freezes.
Flags: needinfo?(mihaela.velimiroviciu)
Mihaela - as with almost all requests for logs, you will need a debug build - though the webrtc_trace logs (I believe) and a few others work in non-debug builds.

NSPR_LOG_MODULES=signaling:5,mtransport:5,webrtc_trace=65535,timestamp  (no :number for the timestamp option in NSPR).

I'm surprised though that you would get *no* logs
Flags: needinfo?(mihaela.velimiroviciu)
(In reply to Randell Jesup [:jesup] from comment #5)
> Mihaela - as with almost all requests for logs, you will need a debug build
> - though the webrtc_trace logs (I believe) and a few others work in
> non-debug builds.
> 
> NSPR_LOG_MODULES=signaling:5,mtransport:5,webrtc_trace=65535,timestamp  (no
> :number for the timestamp option in NSPR).
> 
> I'm surprised though that you would get *no* logs

I updated NSPR_LOG_MODULES and used an Aurora debug build, but the logs&trace files are still not created.
Flags: needinfo?(mihaela.velimiroviciu)
Please reach out to me on IRC today (or anyone else at your location who can help).

signaling:5 logs should throw a ton of log messages if you make any sort of webrtc call, so something isn't set up right.

You need to get the environment variables set up right before running firefox; I suspect that's where the problem is.  

Nightly builds are available in debug form off of http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/

Follow the MDN/Wiki docs for setting up env vars on windows.  Example - From a command shell:
c:\>  set NSPR_LOG_MODULES=signaling:5
c:\>  set NSPR_LOG_FILE=d:\log.txt
c:\>  c:\firefox\firefox.exe
(or wherever you put it)
Flags: needinfo?(mihaela.velimiroviciu)
Attached file logs and trace
Flags: needinfo?(mihaela.velimiroviciu)
Time details of the freezes:
* The image froze after ~4 seconds after the connection between the 2 parts was made.
* The image unfroze after 1 min 42 seconds (since the connection was made)
* The image froze again after 1 min 44 seconds (since the connection was made) - ~2 seconds after it unfroze

Note: After ~2 minutes and 50 seconds (since it connected), the caller sees that the callee left the conversation (which didn't actually happen), while on the callee side the image remains frozen, the sound connection is closed, but the "Hang up" button still appears on the screen, as if the call is still ongoing. This usually doesn't happen on non-debug builds.
Which side was this log captured from?  also, my understanding was that you were using an aurora build - that was for the 26 ("caller") side, correct?  

Also, the info captured is a LOT less that I expected - what exactly were the NSPR_LOG_MODULES setting?  In particular, was webrtc_trace:65535 there (and was that the exact number)?  I just did a short call using a nightly debug, and got a 12MB log file; the one you sent was only 19KB
Also, does this only happen between 26 (or later) and 22?  Does it happen in 26->26 or 27->27 or even 26->24 (release)?  22 was the first rev with full webrtc support, and is now two revs out of date (in the release channel), though we should still be compatible with it.
(In reply to Randell Jesup [:jesup] from comment #10)
> Which side was this log captured from?  also, my understanding was that you
> were using an aurora build - that was for the 26 ("caller") side, correct?  
The logs were captured on the callee side (Aurora/Win 7, 64 bit)
> 
> Also, the info captured is a LOT less that I expected - what exactly were
> the NSPR_LOG_MODULES setting?  In particular, was webrtc_trace:65535 there
> (and was that the exact number)?  I just did a short call using a nightly
> debug, and got a 12MB log file; the one you sent was only 19KB
This is the exact value I used: NSPR_LOG_MODULES = signaling:5,mtransport:5,webrtc_trace=65535,timestamp

(In reply to Randell Jesup [:jesup] from comment #11)
> Also, does this only happen between 26 (or later) and 22?  Does it happen in
> 26->26 or 27->27 or even 26->24 (release)?  22 was the first rev with full
> webrtc support, and is now two revs out of date (in the release channel),
> though we should still be compatible with it.

I'll check that tomorrow, as I currently don't have access to the necessary machines .
(In reply to Mihaela Velimiroviciu [QA] (:mihaelav) from comment #12)
> (In reply to Randell Jesup [:jesup] from comment #10)
> > Which side was this log captured from?  also, my understanding was that you
> > were using an aurora build - that was for the 26 ("caller") side, correct?  
> The logs were captured on the callee side (Aurora/Win 7, 64 bit)

So, this is the side that *saw* the freeze, correct?  Best would be logs from both sides.  (debug builds if possible, though I think webrtc_trace works in opt builds)

> > 
> > Also, the info captured is a LOT less that I expected - what exactly were
> > the NSPR_LOG_MODULES setting?  In particular, was webrtc_trace:65535 there
> > (and was that the exact number)?  I just did a short call using a nightly
> > debug, and got a 12MB log file; the one you sent was only 19KB
> This is the exact value I used: NSPR_LOG_MODULES =
> signaling:5,mtransport:5,webrtc_trace=65535,timestamp

Something is clearly wrong: there are no timestamps in the nspr logs, for example (each line should start with a long timestamp).  I do see signaling and mtransport logs

I need to see if we're sending RTCP PLI/FIRs to ask for keyframes, and if we're receiving them.

> 
> (In reply to Randell Jesup [:jesup] from comment #11)
> > Also, does this only happen between 26 (or later) and 22?  Does it happen in
> > 26->26 or 27->27 or even 26->24 (release)?  22 was the first rev with full
> > webrtc support, and is now two revs out of date (in the release channel),
> > though we should still be compatible with it.
> 
> I'll check that tomorrow, as I currently don't have access to the necessary
> machines .

Thanks.
(In reply to Randell Jesup [:jesup] from comment #13)
> (In reply to Mihaela Velimiroviciu [QA] (:mihaelav) from comment #12)
> > (In reply to Randell Jesup [:jesup] from comment #10)
> > > Which side was this log captured from?  also, my understanding was that you
> > > were using an aurora build - that was for the 26 ("caller") side, correct?  
> > The logs were captured on the callee side (Aurora/Win 7, 64 bit)
> 
> So, this is the side that *saw* the freeze, correct?  Best would be logs
> from both sides.  (debug builds if possible, though I think webrtc_trace
> works in opt builds)

Yes, the callee saw the freeze and it's where the logs are from. I'm currently working on getting logs from both sides (Aurora 26 debug, 22.0 RC).

> > (In reply to Randell Jesup [:jesup] from comment #11)
> > > Also, does this only happen between 26 (or later) and 22?  Does it happen in
> > > 26->26 or 27->27 or even 26->24 (release)?  22 was the first rev with full
> > > webrtc support, and is now two revs out of date (in the release channel),
> > > though we should still be compatible with it.
> > 
> > I'll check that tomorrow, as I currently don't have access to the necessary
> > machines .
> 
> Thanks.

The issue is not reproducible on 26->24+ calls.
Attached file logs from both sides
Callee (1st person who joins) - Aurora 26 debug/Win 7, sees the freeze
Caller (2nd person who joins) - FF 22RC/Win XP

(T0) The callee creates the call 
(T1) After ~ 1 minute and 11 seconds (since T0), the caller joins.  
(T2) After ~4 seconds after the caller joins - T1 - (1 minute 15 seconds since the call was ctreated - T0), the callee sees that the caller's image froze. 
(T3) After ~ 2 minutes and 52 seconds since the call is created (T0), the image unfoze.
(T4) After ~ 3 minutes since T0, the image freezes again (on the same side - callee sees frozen caller's image)
(T5) After ~ 3 minutes and 5 seconds since T0, the image unfreezes and the call is interrupted
Reproduced the issue during our testing on Firefox 27 beta 7 using https://apprtc.webrtc.org/ on a call between:

Caller: Firefox 27.0b7 on Mac OSX 10.9
Callee: Firefox 22.0 on Windows 7
Randell, is there any further information Bogdan can provide?
Flags: needinfo?(rjesup)
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #17)
> Randell, is there any further information Bogdan can provide?

Closing this bug as incomplete due to lack of response.
Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(rjesup)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.