Closed Bug 1122486 Opened 5 years ago Closed 5 years ago

Hello calls frequently fail to complete video connection between two users

Categories

(Hello (Loop) :: Client, defect, P1)

defect
Points:
3

Tracking

(firefox35 wontfix, firefox36+ fixed, firefox37+ fixed, firefox38+ fixed)

RESOLVED FIXED
mozilla38
Iteration:
38.2 - 9 Feb
Tracking Status
firefox35 --- wontfix
firefox36 + fixed
firefox37 + fixed
firefox38 + fixed
Blocking Flags:
backlog Fx38+

People

(Reporter: standard8, Assigned: standard8)

References

Details

Attachments

(4 files, 3 obsolete files)

I've seen this in the past, but I've been seeing it again whilst testing locally with local running loop servers. What happens is:

1) Generate a link clicker url
2) Open it in a second browser
3) Start the call
4) Accept the media prompt
5) Accept the incoming call on the first browser

Expected Results

=> Call sets up with two-way video

Actual Results

=> Call sets up with video going one-way only, even though both local videos are shown.
=> After a few seconds, the call aborts and says "Something went wrong" on the callee end, and "Your conversation has ended" on the end being called.

I think if we can work out what this is, it might provide some insight into the high levels of timeouts in bug 1119574.
Ok, I've done some investigation into this. In the past I've seen it with calls, but now we're improving the stability of our functional tests with rooms - I'm also seeing it there as well.

The characteristics:

- Seems to be random, possibly up to 30% of the time on average (I get it frequently when running tests on my machine with debug build, may vary on different setups etc).
- No remote video element gets created/displayed (we just get a white background).
- I'm running a local setup, i.e. loop-server and standalone servers are local, TokBox using dev keys

Looking at the logs:

- The SDK is producing the following error (version 2.2.9.1):

TypeError: stream.connection is undefined sdk.js:20382:0

- I'm not seeing any notifications from the sdk that the remote stream has been received - this seems to tie in with the TypeError, as the streamCreatedHandler is throwing before it would send the notification to the Loop code.



Next I'm going to test this with the latest sdk, and check the client flow.
Assignee: nobody → standard8
Iteration: --- → 38.1 - 26 Jan
Points: --- → 3
Further investigation has shown that using version 2.4.0 of the sdk seems to fix the error - we want to update to that in bug 1093780.

However, that's quite a big update, so I took a look at 2.2.9.3 (also originally provided on bug 1093780). This seems to have had an explicit fix for something like the issue we're seeing - and tests reveal it to be fine as well.

Unfortunately 2.2.9.3 has a bug with the audio level display meter (it is shown, when it shouldn't be) - but I've worked up a patch for that.

So I think what we should do is:

- Get 2.2.9.3 on Aurora & Beta for testing to go out in the next releases.
- Get 2.4.0 on Nightly alongside the screensharing code and have it under testing as we go.

We may/may not land 2.2.9.3 on nightly as an intermediate.
This upgrades us to 2.2.9.3 - I've tested it locally, and I've been running it through the functional tests lots - these were intermittently failing for me with this bug with the existing 2.2.9.3 code.

There's a separate patch coming for the audio display level meter fix, that I want to record separately.
Attachment #8551300 - Flags: review?(mdeboer)
This fixes the 2.2.9.3 of the sdk to not show the audioLevelDisplayMeter all the time. It is based on code I found in 2.4.0.

Is TokBox happy for us to ship this patched version for two cycles, whilst we deploy and test 2.4.0? (its a smaller patch, which means we're more likely to get approval on our beta train for the next release).
Attachment #8551301 - Flags: review?(msander)
Attachment #8551301 - Flags: review?(aoberoi)
[Tracking Requested - why for this release]: This is one possible cause of a lot of failures that we've been seeing for users. Its intermittent, so doesn't always show up on all environments, but given we know we have a higher than we'd like failure count, then we should fix be fixing this.

I'll nominate the fix once its got reviews & is ready for landing.
Attached file webrtc-js-2.2.9.6.tgz (obsolete) —
(In reply to Mark Banner (:standard8) from comment #4)
attachment 8551994 [details] has a slightly newer version of our JS SDK with a fix for the audio levels display bug. Could you give that one a try?
(In reply to msander from comment #7)
> (In reply to Mark Banner (:standard8) from comment #4)
> attachment 8551994 [details] has a slightly newer version of our JS SDK with
> a fix for the audio levels display bug. Could you give that one a try?

I just tested this, and it seems to still display the audio versions. The actions I'm trying here are: Join a room, mute video. At which stage the audio level is still displayed.
Mike: I've just checked the diffs between 2.2.9.3 and 2.2.9.6 and I can't see anything in there that would change the audio level display issue. Am I missing something or did I get the wrong version?
Flags: needinfo?(msander)
Mark: I see the bug in 2.2.9.1 but not 2.2.9.3 or 2.2.9.6. Ankur is going to take a look.
Flags: needinfo?(msander) → needinfo?(aoberoi)
backlog: --- → Fx38+
Priority: -- → P1
audioLevelDisplayMode works as expected.
Flags: needinfo?(aoberoi)
actually, i was wrong. it works in chrome but not in firefox.
(In reply to aoberoi from comment #11)
> Created attachment 8554042 [details]
> loop-opentok-testbed.zip
> 
> audioLevelDisplayMode works as expected.

Using FirefoxNightly (38), and Chrome release (40), I see the audio level indicator in both of them.
@standard8, you see in in chrome when testing with the attached sample code?
(In reply to aoberoi from comment #14)
> @standard8, you see in in chrome when testing with the attached sample code?

Yes, using the 2.2.9.6 version (I unpacked and ran it locally on Mac) - you have to talk to see the green indicator come into the corner.
Attached file webrtc-js-2.2.9.7.tgz
Here is a patch for the 2.2.9 SDK with a fix for the audio levels ui.
The new 2.2.9.7 version does fix the audio level indicator regression that 2.2.9.3/6 both had.

I've also run this through the functional tests a few times (with a couple of additional patches) and it has run successfully with none of the intermittent failing to connect that I was seeing before.
Attachment #8551300 - Attachment is obsolete: true
Attachment #8551301 - Attachment is obsolete: true
Attachment #8551300 - Flags: review?(mdeboer)
Attachment #8551301 - Flags: review?(msander)
Attachment #8551301 - Flags: review?(aoberoi)
Attachment #8555108 - Flags: review?(nperriault)
Attachment #8551994 - Attachment is obsolete: true
Iteration: 38.1 - 26 Jan → 38.2 - 9 Feb
Comment on attachment 8555108 [details] [diff] [review]
Upgrade Loop's use of Tokbox SDK 2.2.9.7 to fix issues with calls and rooms intermitently failing to connect.

Review of attachment 8555108 [details] [diff] [review]:
-----------------------------------------------------------------

Built latest fx-team with this patch and the one for bug 1126036 applied, ran a full call, worked smoothly. Looks good to me.
Attachment #8555108 - Flags: review?(nperriault) → review+
Blocks: 1126199
Attached patch Branch PatchSplinter Review
Adjusted patch for branches.

Approval Request Comment
[Feature/regressing bug #]: Firefox Hello, possible the previous upgrade regressed it, but that was before we shipped in 34/35.
[User impact if declined]: Occasionally a room or call may start up but fail to connect properly with no obvious reason as to why. This is timing related, it could be that some users see this more than others, but we haven't done an analysis.
[Describe test coverage new/current, TreeHerder]: Partner supplied sdk. Landed on central, nightly builds now have functional test coverage run outside of treeherder.
[Risks and why]: Low - partner supplied sdk, manual testing. Repeated local functional testing against central showed no errors.
[String/UUID change made/needed]: None
Attachment #8555749 - Flags: approval-mozilla-beta?
Attachment #8555749 - Flags: approval-mozilla-aurora?
Attachment #8555749 - Flags: approval-mozilla-beta?
Attachment #8555749 - Flags: approval-mozilla-beta+
Attachment #8555749 - Flags: approval-mozilla-aurora?
Attachment #8555749 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.