Closed Bug 1087068 Opened 10 years ago Closed 10 years ago

[Loop] Long latency from the call is established to the 1st remote video is rendered

Categories

(Firefox OS Graveyard :: Gaia::Loop, defect)

ARM
Gonk (Firefox OS)
defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jaywang, Assigned: jaoo)

References

Details

(Whiteboard: [caf priority: p2][CR 733012], NPOTB, [mobile app][blocking][tef-triage][loop in v1.1])

Attachments

(6 files, 1 obsolete file)

Attached file MT_call.txt
steps to reproduce :
1. Make a VT call from Firefoxone app from UE1
2. Receive the call at UE2
3. UE2 screen shows "connecting"
4. UE2 starts the duration timer after call is established
5. UE2 doesn't shows the remote video after 5+secs.

It looks like the device that receives call has the longer delay (5+secs). The device that originated the call has less delay (3+secs).

Maria,
When does loop mobile client app move from "connecting" screen to in-call screen? Is is triggered by "onAddStream" event?

Randell,
Is normal to see 5+secs latency from "onAddStream" to the 1st frame is received?
I see 5+secs delay from the CSF log (MT_call.txt)
10-21 22:43:54.939 15728 15728 I PRLog   : 2014-10-21 22:43:54.939851 UTC - -1225002668[b6b48080]: [main|PeerConnectionImpl] PeerConnectionImpl.cpp:286: Returning success for OnAddStream()
10-21 22:44:00.959 15728 16758 I PRLog   : 2014-10-21 22:44:00.962782 UTC - 25716720[abf82b00]: [DecodingThread|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:403: Decoder input: 160 bytes (NAL 0x41), time 363478933 (32713104), flags 0x0
Flags: needinfo?(rjesup)
Flags: needinfo?(oteo)
Attached file MO_call.txt
Also, attach the MO call CSF log for the comparison.

10-21 22:42:18.809 14100 14100 I PRLog   : 2014-10-21 22:42:18.812292 UTC - -1225002668[b6b48080]: [main|PeerConnectionImpl] PeerConnectionImpl.cpp:286: Returning success for OnAddStream()

10-21 22:42:21.339 14100 16040 I PRLog   : 2014-10-21 22:42:21.346729 UTC - 25723056[adcd2d80]: [DecodingThread|WebrtcOMXH264VideoCodec] WebrtcOMXH264VideoCodec.cpp:403: Decoder input: 851 bytes (NAL 0x41), time 165649100 (14908419), flags 0x0
Whiteboard: [CR 733012]
Whiteboard: [CR 733012] → [caf priority: p2][CR 733012]
[Blocking Requested - why for this release]:
blocking-b2g: --- → 2.1?
(In reply to Jay from comment #0)
> Created attachment 8509119 [details]
> MT_call.txt
> 
> steps to reproduce :
> 1. Make a VT call from Firefoxone app from UE1
> 2. Receive the call at UE2
> 3. UE2 screen shows "connecting"
> 4. UE2 starts the duration timer after call is established
> 5. UE2 doesn't shows the remote video after 5+secs.
> 
> It looks like the device that receives call has the longer delay (5+secs).
> The device that originated the call has less delay (3+secs).
> 
> Maria,
> When does loop mobile client app move from "connecting" screen to in-call
> screen? Is is triggered by "onAddStream" event?
> 
> Randell,
> Is normal to see 5+secs latency from "onAddStream" to the 1st frame is
> received?
> I see 5+secs delay from the CSF log (MT_call.txt)
> 10-21 22:43:54.939 15728 15728 I PRLog   : 2014-10-21 22:43:54.939851 UTC -
> -1225002668[b6b48080]: [main|PeerConnectionImpl] PeerConnectionImpl.cpp:286:
> Returning success for OnAddStream()
> 10-21 22:44:00.959 15728 16758 I PRLog   : 2014-10-21 22:44:00.962782 UTC -
> 25716720[abf82b00]: [DecodingThread|WebrtcOMXH264VideoCodec]
> WebrtcOMXH264VideoCodec.cpp:403: Decoder input: 160 bytes (NAL 0x41), time
> 363478933 (32713104), flags 0x0

Well, in MT_call.txt I see 2 seconds from VideoConduit create to the first video packet being sent.

A possible cause is the extreme number of NotifyQueuedTrackChanges calls to MediaPipeline(!) (on the order of 700/1000 per sec).  I tried a local (opt) build with MediaPipeline:6 in a simpler example; I see 3 bursts of 4 callbacks each every ~66ms.  (At a quick look.)  Total callbacks is ~150/s, way less than you're seeing.  This is a simple loopback test; I'll have to try again in the app.

Are these debug (+opt) builds? 

Another possibility is related to bug 1087605 (priority of the main thread and all the webrtc threads).
Flags: needinfo?(rjesup)
Might this issue be caused by the high CPU consumption? We are working actively with TB to reduce the Loop consumption by tweaking the TB library, maybe that could also improve this latency. We will let you know as soon as we make any progress with that. 

Additionally, you can add some extra logs to the app to check the times spent in any step of the set-up call. You can do it by changing the following parameters in the config.js file to true:
 
  performanceLog: {
    enabled: false,
    persistent: false
  }

If you do that, for every call made a file will be created in the device SDCard storing the time spent in every step, you have an example in https://bug1064429.bugzilla.mozilla.org/attachment.cgi?id=8485882. I hope this helps.
Flags: needinfo?(oteo)
Hi Jay, is it possible to get the issue with the additional logs Maria mentioned to see if what they are working on would address this issue?
Flags: needinfo?(jaywang)
(In reply to Maria Angeles Oteo (:oteo) from comment #4)
> Might this issue be caused by the high CPU consumption? We are working
> actively with TB to reduce the Loop consumption by tweaking the TB library,
> maybe that could also improve this latency. We will let you know as soon as
> we make any progress with that. 
> 
> Additionally, you can add some extra logs to the app to check the times
> spent in any step of the set-up call. You can do it by changing the
> following parameters in the config.js file to true:
>  
>   performanceLog: {
>     enabled: false,
>     persistent: false
>   }
> 
> If you do that, for every call made a file will be created in the device
> SDCard storing the time spent in every step, you have an example in
> https://bug1064429.bugzilla.mozilla.org/attachment.cgi?id=8485882. I hope
> this helps.

I don't think it's caused by CPU consumption since I am using 8x26 device which has more cores than 8x10. Let me try to collect the additional logs that you suggested.
Flags: needinfo?(jaywang)
Attached file has the log during MT call. I visually saw around 5-sec delay from the point that counter starts to the point that the 1st video is rendered.

Maria,
Can you help to check the log if you see that latency from the log? Also, what's the typical delay that you are seen with your setup?
Flags: needinfo?(oteo)
Can you retest with the thread priority patch from bug 1087605?  It may not make a difference, but it's certainly worth a try.
Flags: needinfo?(jaywang)
Required for commercialization per whiteboard -> blocking+.
blocking-b2g: 2.1? → 2.1+
(In reply to Randell Jesup [:jesup] from comment #8)
> Can you retest with the thread priority patch from bug 1087605?  It may not
> make a difference, but it's certainly worth a try.

Tried it but doesn't make much differences.
Flags: needinfo?(jaywang)
Attached patch 1087068.patch (obsolete) — Splinter Review
I've been poking around and have ended up with a patch that let's know there's something fishy here. Let me explain myself. The thing is the Loop app (loop from now on) tells the Tokbox lib (tokbox from now on) where to add the media video element in which tokbox will play the remote video. Somehow tokbox fires an event 'loaded' when it has added the media video element to the DOM. Loop listens to that event and changes the UI from 'connecting' to 'connected' state. Once the UI changes to 'connected' state the video is not played yet and it takes a few seconds more to see the video (this the delay this bug is about AFAIK, correct me if I'm wrong). Well, in this patch I got a reference to the media video element once tokbox loads it on the DOM and set a listener to the 'playing' event on the media video element and I see this event is not fired even when the video is playing. I gave another try with the 'canplay' event and this is fired but by the time it is fired the video is not playing either.

Randell, could you shed some light here please? Why is not the 'playing' event fired? That would help us to postpone the change in the UI from 'connecting' to 'connected' state at least.
Flags: needinfo?(rjesup)
Any comments on 'playing' not firing on <video>'s playing streams from webrtc?  (Not sure if the tokbox lib is using mozSrcObject (I doubt it) or URL.createObjectURL(stream) (likely)).
Flags: needinfo?(roc)
Flags: needinfo?(rjesup)
Flags: needinfo?(jwwang)
(In reply to Randell Jesup [:jesup] from comment #12)
> Any comments on 'playing' not firing on <video>'s playing streams from
> webrtc?  (Not sure if the tokbox lib is using mozSrcObject (I doubt it) or
> URL.createObjectURL(stream) (likely)).

Tokbox lib is using mozSrcObject as per [1]. Thanks!

[1] https://mxr.mozilla.org/mozilla-central/source/browser/components/loop/content/shared/libs/sdk.js#6882
'playing' is fired when readyState >= HAVE_FUTURE_DATA and 'paused' is false. For the stream case, readyState will become HAVE_ENOUGH_DATA once the stream is not blocked. Per comment 11, since the video is playing, the stream shouldn't be blocked. Btw, 'canplay' is fired when readyState reaches HAVE_FUTURE_DATA no matter 'paused' is true or false.

Therefore, once you see readyState >= HAVE_FUTURE_DATA and 'paused' is false, 'playing' should've been fired.
Flags: needinfo?(jwwang)
This is possibly related to https://bugzilla.mozilla.org/show_bug.cgi?id=926753

Right now we're listening for 'loadedmetadata' on the Video element before triggering the completion handler for the session.subscribe method. It happens that this fires earlier on Firefox than it does on Chrome. On Chrome we have the videoWidth and videoHeight defined but in Firefox we don't. So it actually fires a lot earlier in Firefox.

That being said I would also be happy if 'playing' fired on the Video element so we could use that.
(In reply to Adam Ullman from comment #15)

> That being said I would also be happy if 'playing' fired on the Video
> element so we could use that.

Sadly I have not seen the 'playing' event being fired when it is supposed to be. I'm wondering what value the property 'paused' has by the time the 'canplay' event is fired. If it is false, and it keeps false all the time that would explain why the 'playing' event is not fired. Am I right? I'll poke around and keep you updated.
I tried some experiments with webrtc-landing/pc_test.html, adding an onplay that simply tells me it was called.

In every case I tried, it was called for <video>s connected to remote sources, either with mozSrcObject = stream or src = URL.createObjectURL(stream), calling play() after attaching the source, or using autoplay.

What's the sequence being used to set up the <video>?  and can you add an onplay at <video> creation time to see when/if it does fire?
Flags: needinfo?(oteo)
Attached patch 1087068.patchSplinter Review
Well, I tried almost all the suggestions and I ended up with a dirty trick that solves the problem from the user perception point of view (basically to set the connected UI when the user see the video). User doesn't notice the delay.

I tried Randell's suggestion, which was to adding a 'onplay' event handler instead of the 'onloadedmetadata' TB is currently using. Sadly there is no different, the event is fired and video needs a few seconds more to show up.

The dirty trick which solves the issue is to check whether the mozPaintedFrames property. That works and this is a solution so the user doesn't notice the delay but the delay is still here.

Have a look at the patch guys!
Attachment #8514066 - Attachment is obsolete: true
can either of you see if the patch Jose Antonio put up in comment 18 would resolved the user perception of delay?  it sounds like there is still investigation to do - but does this improve the experience enough for 2.1?
Flags: needinfo?(rjesup)
Flags: needinfo?(jwwang)
Flags: needinfo?(jwwang) → needinfo?(jaywang)
(In reply to sescalante from comment #19)
> can either of you see if the patch Jose Antonio put up in comment 18 would
> resolved the user perception of delay?  it sounds like there is still
> investigation to do - but does this improve the experience enough for 2.1?

I just check the Jose's patch and it did help to improve the perception delay. We should get this mainlined.
Flags: needinfo?(jaywang)
Comment on attachment 8514990 [details] [diff] [review]
1087068.patch

Review of attachment 8514990 [details] [diff] [review]:
-----------------------------------------------------------------

::: app/call_screen/js/call_manager.js
@@ +421,5 @@
>                _perfDebug && PerfLog.log(_perfBranch,
>                  'Received "loaded" event from remote stream');
> +              // CallScreenUI.setCallStatus('connected');
> +
> +              var remoteVideoContainer = document.getElementById('fullscreen-video');

comment why this has been commented out/changed

@@ +431,5 @@
> +              var internal = window.setInterval(function() {
> +                if (remoteVideoElement.mozPaintedFrames == 0) {
> +                  return;
> +                }
> +                CallScreenUI.setCallStatus('connected');

Only use this hack if it's an audio + video call!
And: you should have a timeout to set connected even if you can't get a frame, in case there's something blocking video.

@@ +433,5 @@
> +                  return;
> +                }
> +                CallScreenUI.setCallStatus('connected');
> +                window.clearInterval(internal);
> +              }, TIME_INTERVAL_SECONDS * 1000);

Perhaps check more often than once per N seconds?

::: app/js/config.js
@@ +1,3 @@
>  Config = {
>    version: '1.1',
> +  debug: true,

don't check this in, I think

::: app/libs/opentok.js
@@ +26,5 @@
>      }
>    };
>  
> +  // OT.setLogLevel(Config.debug ? OT.DEBUG : OT.WARN);
> +  OT.setLogLevel(OT.NONE);

perhaps this should be redone or not done?

::: app/libs/tokbox/v2.2.9.1/js/TB.js
@@ +6826,5 @@
>  
>        cleanup = function cleanup () {
>          clearTimeout(timeout);
> +        // videoElement.removeEventListener('loadedmetadata', onLoad, false);
> +        videoElement.removeEventListener('play', onLoad, false);

comment why

@@ +6869,5 @@
>          }
>        }, this), 30000);
>  
> +      // videoElement.addEventListener('loadedmetadata', onLoad, false);
> +      videoElement.addEventListener('play', onLoad, false);

comment why
Attachment #8514990 - Flags: feedback+
Assignee: nobody → josea.olivera
NI jose, can we please get this ready for review and land this ASAP given the feedback+ ?
Flags: needinfo?(josea.olivera)
(In reply to bhavana bajaj [:bajaj] from comment #22)
> NI jose, can we please get this ready for review and land this ASAP given
> the feedback+ ?

shoot, just saw hose is on PTO :( jesup/oteo anyone else who can help here?
Flags: needinfo?(oteo)
(In reply to bhavana bajaj [:bajaj] from comment #23)
> (In reply to bhavana bajaj [:bajaj] from comment #22)
> > NI jose, can we please get this ready for review and land this ASAP given
> > the feedback+ ?
> 
> shoot, just saw hose is on PTO :( jesup/oteo anyone else who can help here?

josea*
I can look, but I don't know the answers to some of the questions I asked here (or know the code he's modifying, which is why I feedback+'d it
Flags: needinfo?(rjesup)
no worries, JA will finish it although he's on PTO :(
In any case, this is a fix in Loop Mobile Client, it does not require any change on gecko/gaia side so the  2.1+ can be removed. 
As soon as the patch is ready we will update our Loop Mobile client with this fix.

Bhavana, be free to clear the 2.1+.
Flags: needinfo?(oteo) → needinfo?(bbajaj)
Flags: needinfo?(josea.olivera)
(In reply to Maria Angeles Oteo (:oteo) from comment #26)
> no worries, JA will finish it although he's on PTO :(
> In any case, this is a fix in Loop Mobile Client, it does not require any
> change on gecko/gaia side so the  2.1+ can be removed. 
> As soon as the patch is ready we will update our Loop Mobile client with
> this fix.
> 
> Bhavana, be free to clear the 2.1+.

Thanks!
Flags: needinfo?(bbajaj)
blocking-b2g: 2.1+ → ---
Whiteboard: [caf priority: p2][CR 733012] → [caf priority: p2][CR 733012], NPOTB
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INVALID
Bhavana, I am going to reopen the bug, it requires a patch in loop mobile client (already under revision) and we will use this bug to upload the fix in the repository for Loop Mobile client in master: https://github.com/mozilla-b2g/firefoxos-loop-client
Severity: normal → major
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Whiteboard: [caf priority: p2][CR 733012], NPOTB → [caf priority: p2][CR 733012], NPOTB, [mobile app][blocking][tef-triage]
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → WORKSFORME
Given the fix landed on the client directly, marking this as resolved WFM
(In reply to bhavana bajaj [:bajaj] from comment #30)
> Given the fix landed on the client directly, marking this as resolved WFM

The fix is not landed yet. Let's review this as it is ready now.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Attachment #8521670 - Flags: review?(borja.bugzilla)
Comment on attachment 8521670 [details] [review]
Pointer to Github PR https://github.com/mozilla-b2g/firefoxos-loop-client/pull/300

I've tried the patch, and in the case of a video call this could be a good workaround, but it's not if the call is an 'audio-call'. In that case (If A calls B and B answers with audio only), the delay is about 5 seconds (actually you can listen the other peer and in the screen you are seeing the word 'connecting').

I would suggest to keep the solution for the video, and restore the previous behaviour for the audio one.

Let me know when ready and I'll review this again! Thanks!
Attachment #8521670 - Flags: review?(borja.bugzilla) → feedback+
(In reply to Borja Salguero [:borjasalguero] from comment #32)
> Comment on attachment 8521670 [details] [review]
> Pointer to Github PR
> https://github.com/mozilla-b2g/firefoxos-loop-client/pull/300
> 
> I've tried the patch, and in the case of a video call this could be a good
> workaround, but it's not if the call is an 'audio-call'. In that case (If A
> calls B and B answers with audio only), the delay is about 5 seconds
> (actually you can listen the other peer and in the screen you are seeing the
> word 'connecting').
> 
> I would suggest to keep the solution for the video, and restore the previous
> behaviour for the audio one.
> 
> Let me know when ready and I'll review this again! Thanks!

The patch also works for audio calls as mozpaintedframes is also >0 when an audio call is established. We are not sure why this is happening. An explanation could be that Tokbox always uses a video element (even for audio calls) and some UI controls might be refreshed (but hidden), which would make the painted frames to move above 0. Adam, do you think this could be the case?

With respect to the delay Borja mentions, I've tested the patch this morning together with him, and the delay seemed to be related to the long polling time (1 sec) that was used for checking if both devices were really connected, by reducing this polling time to 100 msec, in some cases the audio could be heard a bit before the 'connecting' message disappears, but I think this change is worth it.

Unfortunately, using the events in the video element is not working as all the events I have tried (loadedData, canPlay, playing, canPlayThrough are received in a timeframe of ~10msecs and it takes around ~3 secs more to start seeing anything in the screen.

I've created a branch [1] in which I have enabled performance logging and including some new logs for the video element events and reduce the polling time to 100msecs. If you are going to use it, install with the performanceLog enabled: $ grunt build --performanceLog=persistent

[1] https://github.com/dcoloma/firefoxos-loop-client/tree/Bug1087068-followup
Flags: needinfo?(adam)
Performance Logs
(In reply to Borja Salguero [:borjasalguero] from comment #32)
> Comment on attachment 8521670 [details] [review]
> Pointer to Github PR
> https://github.com/mozilla-b2g/firefoxos-loop-client/pull/300

> Let me know when ready and I'll review this again! Thanks!

Borja, nits addressed but let's pause a bit until we have feedback from our friends at Tokbox.

(In reply to Daniel Coloma:dcoloma from comment #33)
> (In reply to Borja Salguero [:borjasalguero] from comment #32)

> The patch also works for audio calls as mozpaintedframes is also >0 when an
> audio call is established.

Yeah, that's the reason I'm keeping this for both audio and video calls.

> We are not sure why this is happening. An
> explanation could be that Tokbox always uses a video element (even for audio
> calls) and some UI controls might be refreshed (but hidden), which would
> make the painted frames to move above 0. Adam, do you think this could be
> the case?

Lets wait for Adam's feedback here before landing this. Requesting ni? at Song as he might be able to help here.
Flags: needinfo?(mozilla-support)
Sorry for the late response guys. I'm not sure why mozPaintedFrames would be working. My guess is it's because there is still a video track it's just been disabled and possibly some empty frames are still being sent? I've seen that in some instances with disabled video tracks vs. no video track. I'm not sure though, I'll ask around at TokBox and see if anyone has better ideas and get back to you.

Do you guys want us to change any code on our side? It looks like you were trying to replace the 'loadedmetadata' event with 'play'? Would it be helpful if we had some other event in addition to 'loaded'? eg. 'playing' that's dispatched from the Subscriber when the video is actually displaying frames? We could use your same mozPaintedFrames hack. Alternatively 'loadedmetadata' could just fire later ;)
Flags: needinfo?(adam)
(In reply to Adam Ullman from comment #36)
> Do you guys want us to change any code on our side? It looks like you were
> trying to replace the 'loadedmetadata' event with 'play'?

We tried with the 'play' event as well. Sadly there wasn't any improvement.

> Would it be
> helpful if we had some other event in addition to 'loaded'? eg. 'playing'
> that's dispatched from the Subscriber when the video is actually displaying
> frames?

That would be great for sure. Could we land the hack we have in the meantime? Once you added this to the lib we could update the lib and delete the hack.

> We could use your same mozPaintedFrames hack.

Feel free!
Flags: needinfo?(adam)
Comment on attachment 8521670 [details] [review]
Pointer to Github PR https://github.com/mozilla-b2g/firefoxos-loop-client/pull/300

Let's review this so we could land it. Do you mind Borja? Thanks!
Attachment #8521670 - Flags: review?(borja.bugzilla)
Comment on attachment 8521670 [details] [review]
Pointer to Github PR https://github.com/mozilla-b2g/firefoxos-loop-client/pull/300

Small comments to address before merging. Overall we need to open a bug in order to investigate the issue and remove the workaround when it will be fixed. Thanks!
Attachment #8521670 - Flags: review?(borja.bugzilla) → review+
Blocks: 1036490
See Also: → 1105707
Landed on master branch at:

https://github.com/mozilla-b2g/firefoxos-loop-client/commit/b129be95682d7c2f751cd93a2bd8c53c525cc075
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Whiteboard: [caf priority: p2][CR 733012], NPOTB, [mobile app][blocking][tef-triage] → [caf priority: p2][CR 733012], NPOTB, [mobile app][blocking][tef-triage][loop approved for 1.1]
It looks like we have a bug where if you subscribeToVideo(false) in P2P mode the other party still continues to send video frames. Under the covers subscribeToVideo(false) just mutes the video track on the receiving end. It doesn't message the other end and tell them to stop sending video. So this could explain why you are still seeing frames even though there is no video being displayed. 

How are you guys doing audio calls? Are you doing it only on the Subscriber side using subscribeToVideo(false) or also on the publishing side using publishVideo(false)?
Flags: needinfo?(adam)
(In reply to Adam Ullman from comment #41)
> It looks like we have a bug where if you subscribeToVideo(false) in P2P mode
> the other party still continues to send video frames. Under the covers
> subscribeToVideo(false) just mutes the video track on the receiving end. It
> doesn't message the other end and tell them to stop sending video. So this
> could explain why you are still seeing frames even though there is no video
> being displayed. 

Thanks for digging into this.
 
> How are you guys doing audio calls? Are you doing it only on the Subscriber
> side using subscribeToVideo(false) or also on the publishing side using
> publishVideo(false)?

We just disable video on the publishing side (see [1] and [2] please) so it seems there's something fishy here as we stop sending video frames the receiving side shouldn't receive any video.

[1] https://github.com/mozilla-b2g/firefoxos-loop-client/blob/master/app/call_screen/js/call_manager.js#L543
[2] https://github.com/mozilla-b2g/firefoxos-loop-client/blob/master/app/call_screen/js/call_manager.js#L280-L288
Landed on 1.1 branch at:

https://github.com/mozilla-b2g/firefoxos-loop-client/commit/e7483a7310fedb14b0cd0a645508f8ee8986053d
Whiteboard: [caf priority: p2][CR 733012], NPOTB, [mobile app][blocking][tef-triage][loop approved for 1.1] → [caf priority: p2][CR 733012], NPOTB, [mobile app][blocking][tef-triage][loop in v1.1]
Flags: needinfo?(mozilla-support)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: