Closed Bug 1041832 Opened 6 years ago Closed 6 years ago

Webrtc mochitests should support trickle candidates

Categories

(Core :: WebRTC: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla34

People

(Reporter: bwc, Assigned: drno)

References

(Depends on 1 open bug)

Details

Attachments

(1 file, 14 obsolete files)

43.47 KB, patch
drno
: review+
Details | Diff | Splinter Review
Since bug 991037 entails a switch to full trickle logic, we'll need to teach the mochitests to pass along trickle ICE candidates, since they will no longer be in the SDP. This may also be useful in testing stuff other than host candidates.
Assignee: nobody → drno
First version which works for local mochitest (note: WIP).
This properly handles now end of trickle ICE. Still WIP
Attachment #8465768 - Attachment is obsolete: true
WIP. Still missing receiving candidates in steeplechase setup.
Attachment #8465817 - Attachment is obsolete: true
WIP: only flush candidates on setRemoteDesc. Turn on data channel tests.
Attachment #8465861 - Attachment is obsolete: true
Byron: With your patch from bug 991037 I have seen:
- data channel test just work
- end of trickle ICE coming in before all other candidates
- data channel test hang

Is it expected to not see any candidates from the answerer side?
Flags: needinfo?(docfaraday)
Not really. I'll probably be looking into 991037 today, so hopefully I'll get to the bottom of it.
Flags: needinfo?(docfaraday)
(In reply to Nils Ohlmeier [:drno] from comment #5)
> Byron: With your patch from bug 991037 I have seen:
> - data channel test just work
> - end of trickle ICE coming in before all other candidates
> - data channel test hang
> 
> Is it expected to not see any candidates from the answerer side?

So, are you seeing this happen with ICE succeeding? Because if ICE is sufficiently fast, it can beat trickle. Is this roughly what you're seeing?

0. Answerer has already gathered candidates.
1. Offerer starts gathering candidates.
2. Offerer creates the offer and calls setLocal, then sends the offer along.
3. Offerer starts trickling ICE candidates.
4. Answerer gets the offer, calls setLocal, and creates an answer.
5. Answerer starts sending ICE checks.
6. Answerer sends answer.
7. Answerer starts sending trickle candidates.
8. ICE checks beat the answerer's trickle candidates across, offerer creates peer-reflexive candidates, and ICE concludes.
Flags: needinfo?(drno)
(In reply to Byron Campen [:bwc] from comment #7)
> So, are you seeing this happen with ICE succeeding? Because if ICE is
> sufficiently fast, it can beat trickle. Is this roughly what you're seeing?
> 
> 0. Answerer has already gathered candidates.
> 1. Offerer starts gathering candidates.
> 2. Offerer creates the offer and calls setLocal, then sends the offer along.
> 3. Offerer starts trickling ICE candidates.
> 4. Answerer gets the offer, calls setLocal, and creates an answer.
> 5. Answerer starts sending ICE checks.
> 6. Answerer sends answer.
> 7. Answerer starts sending trickle candidates.
> 8. ICE checks beat the answerer's trickle candidates across, offerer creates
> peer-reflexive candidates, and ICE concludes.

I did not see 7. happening at all. The answer in 6. is empty. And ICE succeeds on both ends without any candidate from the answerer side ever being offered. I assume ICE connectivity beat the candidate gathering.
Flags: needinfo?(drno)
Strange. At any rate, I know what is causing the premature end-of-candidates signal, and I think I know how to fix it by deleting code. More tomorrow, hopefully, and maybe that will shed some light on this.
Ok, figured out the DataChannel and premature end-of-candidates problem. Bug 1024028 is biting us now.
Depends on: 1024028
Updated patch which watches for trickle end, verifies if candidates have been received, and uses a copy of the initial offer and answer.
Attachment #8466563 - Attachment is obsolete: true
This actually works now with data channel test as well.
Attachment #8469604 - Attachment is obsolete: true
Works with non-trickle ICE now.
Attachment #8470115 - Attachment is obsolete: true
Be aware; this test case now detects bug 991037 (early end-of-candidates signal), so it will fail intermittently.
Added timeouts for ICE gathering.
Attachment #8470245 - Attachment is obsolete: true
Heads-up; there's a couple of tests in media/tests/identity that still use variables that this patch renames.
Flags: needinfo?(drno)
Comment on attachment 8474853 [details] [diff] [review]
bug_1041832_add_trickle_ice_support.patch

Review of attachment 8474853 [details] [diff] [review]:
-----------------------------------------------------------------

Missed a couple:

28640 INFO TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/identity/test_setIdentityProvider.html | uncaught exception - TypeError: test.pcLocal._last_offer is undefined at http://mochi.test:8888/tests/dom/media/tests/identity/test_setIdentityProvider.html:86
TEST-INFO | expected PASS
28641 INFO TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/identity/test_setIdentityProviderWithErrors.html | uncaught exception - TypeError: test.pcLocal._last_offer is undefined at http://mochi.test:8888/tests/dom/media/tests/identity/test_setIdentityProviderWithErrors.html:73
TEST-INFO | expected PASS

::: dom/media/tests/mochitest/pc.js
@@ +1975,5 @@
> +   *        A PeerConnectionTest object to which the ice candidates gets
> +   *        forwarded.
> +   */
> +  setupIceCandidateHandler : function
> +  PCW_setupIceCandidateHandler(test) {

Line wrap not needed, especially if you don't follow house style and you remove the second function name (which isn't needed).

@@ +1986,5 @@
> +      info(self.label + ": got iceCandidateEvent: " + anEvent);
> +      if (!anEvent.candidate) {
> +        info(self.label + ": received end of trickle ICE event");
> +        self.endOfTrickleIce = true;
> +        if (typeof self.endOfTrickleIceCallback == 'function') {

=== ; check quote style
Attached patch identity.patch (obsolete) — Splinter Review
Should fix the only real problem related to this that I saw running bug 991037
Depends on: 1057621
Depends on: 1057558
Flags: needinfo?(drno)
Try run: https://tbpl.mozilla.org/?tree=Try&rev=2c6e82ec8aa6
Attachment #8474853 - Attachment is obsolete: true
Attachment #8477627 - Attachment is obsolete: true
Removed checks for gathering state.
Attachment #8477973 - Attachment is obsolete: true
Try run for the latest patch with trickle ICE: https://tbpl.mozilla.org/?tree=Try&rev=ed4e54261d6f

Try run against current m-c (w/o trickle ICE): https://tbpl.mozilla.org/?tree=Try&rev=cc547383078d
Ok, the main problem here is that there is a totally unreasonable delay between createAnswer/setLocal on the answerer side, while we send the answer to the offerer immediately. The offerer receives and sets the answer seconds before the answerer sets its own description. Pre-full-trickle, the offerer got the candidates in the answer, and was able to start ICE regardless of whether the answerer had set its own SDP yet, but in the full trickle case the answerer can't send any candidates until its own SDP is set, so the start of ICE is significantly out of sync.
Wait, scratch that, the reason for the delay between createAnswer/setLocal is because this is a datachannel test, and we're waiting for the datachannels in between these two operations. We probably shouldn't be doing that.
I'm kicking off a try run with full logging for b2g emulator M7. That should shed some more light on where these delays are coming from.
Keep in mind that test_dataChannel_bug1013809.html just tests the old test behavior, where we set the answer on the offerer side first, before setting on the answerer side. I think that is a pretty unrealistic scenario which we will hardly see in the field.
If this is the only remaining problem I suggest we disabled that test, as it is know to cause strange timing effect.
Kill it with fire.
(Ok, maybe that wasn't totally explicit. We should remove the test entirely.)
So, I see that practically all of these tests are disabled on b2g debug, due to horrible lag. Maybe the fact that we're doing a little more work handling trickle candidates has pushed opt over the edge too?
Comment on attachment 8478684 [details] [diff] [review]
bug_1041832_add_trickle_ice_support.patch

Review of attachment 8478684 [details] [diff] [review]:
-----------------------------------------------------------------

Drive-by review comments. Not giving the thumbs up/down yet, just trying to parallelize.

::: dom/media/tests/identity/test_setIdentityProvider.html
@@ +83,5 @@
>    [
>      "OFFERS_AND_ANSWERS_INCLUDE_IDENTITY",
>      function(test) {
> +      ok(test.originalOffer.sdp.contains("a=identity"), "a=identity is in the offer SDP");
> +      ok(test.originalAnswer.sdp.contains("a=identity"), "a=identity is in the answer SDP");

The other tests seem to replace with _latest_offer/_latest_answer here.

::: dom/media/tests/identity/test_setIdentityProviderWithErrors.html
@@ +70,5 @@
>      [
>        'ONLY_REMOTE_SDP_INCLUDES_IDENTITY_ASSERTION',
>        function(test) {
> +        ok(!test.originalOffer.sdp.contains('a=identity'), 'a=identity not contained in the offer SDP');
> +        ok(test.originalAnswer.sdp.contains('a=identity'), 'a=identity is contained in the answer SDP');

Same issue here.

::: dom/media/tests/mochitest/mochitest.ini
@@ +67,5 @@
>  # no screenshare on b2g/android
>  # frequent timeouts/crashes on e10s (bug 1048455)
>  skip-if = buildapp == 'b2g' || toolkit == 'android' || e10s
> +#[test_peerConnection_basicH264Video.html]
> +#skip-if = buildapp == 'b2g' || os == 'android' # bug 1043403

Is this related?

::: dom/media/tests/mochitest/pc.js
@@ +677,3 @@
>    peer.createAnswer(function (answer) {
> +    // make a copy so this does not get updated with ICE candidates
> +    self.originalAnswer = new mozRTCSessionDescription(JSON.parse(JSON.stringify(answer)));

I love javascript...

@@ +1830,5 @@
> +    var self = this;
> +
> +    self._remote_ice_candidates.push(candidate);
> +    if (self.remoteDescriptionSet) {
> +      //info("adding candidate " + JSON.stringify(candidate));

Either remove or uncomment.

@@ +2002,5 @@
> +        self._local_ice_candidates.push(anEvent.candidate);
> +        test.iceCandidateHandler(self.label, anEvent.candidate);
> +        if (typeof self.onIceCandidateCallback === 'function') {
> +          self.onIceCandidateCallback(anEvent.candidate);
> +        }

Don't see onIceCandidateCallback being set anywhere.

::: dom/media/tests/mochitest/templates.js
@@ +693,5 @@
>    ],
>    [
>      'PC_REMOTE_SANE_LOCAL_SDP',
>      function (test) {
> +      test.pcRemote.verifySdp(test.pcRemote.localDescription, "answer", test.pcRemote.constraints, test.pcRemote.offerOptions, function (trickle) {

Line's getting really long here.

@@ +702,5 @@
>    ],
>    [
>      'PC_LOCAL_SANE_REMOTE_SDP',
>      function (test) {
> +      test.pcLocal.verifySdp(test.pcLocal.remoteDescription, "answer", test.pcLocal.constraints, test.pcLocal.offerOptions, function (trickle)

Line's getting really long here.

@@ +722,5 @@
>          myTest.next();
>        };
>        function onIceConnectedFailed () {
>          dumpSdp(myTest);
> +        is(test.pcLocal._pc.iceGatheringState, GATH_COMPLETE, "ICE gathering state has reached 'complete'");

Hmm. I'm not totally sure this something we'll do right now, and not sure what putting it right above ok(false, accomplishes.

@@ +765,5 @@
>          myTest.next();
>        };
>        function onIceConnectedFailed () {
>          dumpSdp(myTest);
> +        is(test.pcRemote._pc.iceGatheringState, GATH_COMPLETE, "ICE gathering state has reached 'complete'");

Similar thing here.
Updated according to Byron's feedback.
Disabled test_dataChannel_bug1013809.html on B2G.

Try run with trickle ICE patches: https://tbpl.mozilla.org/?tree=Try&rev=401be974550f
Attachment #8478684 - Attachment is obsolete: true
As the previous try run was showing lots of oranges from the IPC fall out, here is another run with IPC fixes included: https://tbpl.mozilla.org/?tree=Try&rev=0316bdc7627f
The previous patch had still issues with the SDP in the data channel tests.

Try run: https://tbpl.mozilla.org/?tree=Try&rev=c486298eb413
Attachment #8479450 - Attachment is obsolete: true
So I ran a bunch of retriggers of b2g M7, and it looks like our orange rate is in the 20-30% range.

https://tbpl.mozilla.org/?tree=Try&rev=41d9bca2547b
I am trying to pin down the source of these horrible delays, and the TimerThread is definitely firing timers very late (I added some checking and a MOZ_CRASH to TimerThread::Run in a try push here: https://tbpl.mozilla.org/?tree=Try&rev=8e989f171fc5). I tried checking for clock drift, but found no evidence of it. I now have a try run that should log when the monitor wait in TimerThread takes more than a second too long, we'll see what happens.
So something interesting here. dom/tests/mochitest/bugs/test_bug369306.html seems to be perma-orange with these patches, but it is a completely unrelated test...
That test is failing because the test window has chrome privs for some reason, which allows it to bypass the dom.disable_window_flip pref. Interestingly enough, the test doesn't always fail, probably because the "focus" and "blur" callbacks aren't arriving in time.
I am going to debug this test without the patch set from bug 991037. More in a few hours.
Depends on: 1059867
Ok, that test is completely broken, see the comments on bug 963103. I think we should disable it.
Moved B2G test disabling out of the patch set.
Attachment #8479628 - Attachment is obsolete: true
Attachment #8480812 - Flags: review?(docfaraday)
Comment on attachment 8480812 [details] [diff] [review]
bug_1041832_add_trickle_ice_support.patch

Review of attachment 8480812 [details] [diff] [review]:
-----------------------------------------------------------------

r+ if you just make sure that the PC_REMOTE_CHECK_MEDIA_TRACKS and PC_LOCAL_CHECK_MEDIA_TRACKS steps are consistent between the non-datachannel and datachannel cases, since that will make it easier to start cutting down on the duplicate code later.
Attachment #8480812 - Flags: review?(docfaraday) → review+
Addressed Byron's concern regarding the TRACKS verification.

Carrying forward r+=bwc
Attachment #8480812 - Attachment is obsolete: true
Attachment #8480831 - Flags: review+
Blocks: 1060103
Blocks: 1060102
https://hg.mozilla.org/mozilla-central/rev/029654998ef3
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.