Closed Bug 905150 Opened 11 years ago Closed 11 years ago

ICE shouldn't cancel running checks on first nomination

Categories

(Core :: WebRTC: Networking, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla26
Tracking Status
firefox24 --- unaffected
firefox25 + verified
firefox26 + verified

People

(Reporter: ekr, Assigned: ekr)

References

Details

(Keywords: verifyme)

Attachments

(1 file, 6 obsolete files)

http://tools.ietf.org/html/rfc5245#section-8.1.2 requires that on nomination
state you cancel all waiting and frozen checks.

However, nICEr appears to cancel *all* checks.

    p2=TAILQ_FIRST(&comp->stream->check_list);
    while(p2){
      if((p2 != pair) && (p2->remote->component->component_id == comp->component_id)){
        r_log(LOG_ICE,LOG_DEBUG,"ICE-PEER(%s)/STREAM(%s)/comp(%d): cancelling pair %s (0x%p)",comp->stream->pctx->label,comp->stream->label,comp->component_id,p2->as_string,p2);

        if(r=nr_ice_candidate_pair_cancel(pair->pctx,p2))
          ABORT(r);
      }

      p2=TAILQ_NEXT(p2,entry);
    }

This still allows you to connect but leads to suboptimal paths.
Attached patch Remove early cancel for ICE (obsolete) — Splinter Review
Assignee: nobody → ekr
Status: NEW → ASSIGNED
Comment on attachment 791735 [details] [diff] [review]
Remove early cancel for ICE

Adam,

This has two substantive changes:

1. Don't cancel running checks when one check on a component succeeds. This gives higher-priority candidate pairs a chance to succeed if they are just running slower for some reason (e.g., suicide packets)

2. Re-set the priorities to match the RFC5245 recommendations and in particular make relayed less preferred than srflx--not sure why nICEr's test programs had different settings.

Combined these should not leave us with relayed candidates if something else is available.

The remaining changes are accessors to let you figure out which candidate pairs are in use and then tests to verify that the right candidate pairs ahve been chosen.

Does this all seem sane? If so, I will clean it up and ask for review.
Attachment #791735 - Flags: feedback?(adam)
Comment on attachment 791735 [details] [diff] [review]
Remove early cancel for ICE

Review of attachment 791735 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. I have some suggestions for improvements inline.

Also, I presume the plan is to add new unittests to check that you end up with non-relayed candidates in the situations this patch fixes, right?

::: media/mtransport/nricemediastream.cpp
@@ +71,5 @@
>  namespace mozilla {
>  
>  MOZ_MTLOG_MODULE("mtransport")
>  
> +static nsresult MakeNrIceCandidate(NrIceCandidate* out, nr_ice_candidate* cand) {

The ownership model here is pretty confusing. I think you really want to make ownership more clear by changing the signature to:

static nsresult MakeNrIceCandidate(NrIceCandidate &out, nr_ice_candidate &cand)

@@ +74,5 @@
>  
> +static nsresult MakeNrIceCandidate(NrIceCandidate* out, nr_ice_candidate* cand) {
> +  int r;
> +
> +  char addr[64];  // Enough for IPv6 with colons.

s/64/INET6_ADDRSTRLEN + 1/

@@ +190,5 @@
>  }
>  
> +nsresult NrIceMediaStream::GetActivePair(int component,
> +                                         NrIceCandidate *local,
> +                                         NrIceCandidate *remote) {

Same comment as above regarding ownership

::: media/mtransport/test/ice_unittest.cpp
@@ +275,5 @@
> +        type = "prflx";
> +        break;
> +      case NrIceCandidate::ICE_RELAYED:
> +        type = "relay";
> +        break;

default:
  type = "invalid";

@@ +288,5 @@
> +              << cand.port
> +              << std::endl;
> +  }
> +
> +  void DumpActive() {

Perhaps this method should be named in a way that indicates that it is also verifying candidate types?

::: media/mtransport/third_party/nICEr/src/ice/ice_component.c
@@ +805,5 @@
>      comp->active=pair;
>  
>      r_log(LOG_ICE,LOG_DEBUG,"ICE-PEER(%s)/STREAM(%s)/comp(%d): cancelling all pairs but %s (0x%p)",comp->stream->pctx->label,comp->stream->label,comp->component_id,pair->as_string,pair);
>  
> +    /* OK, we need to cancel off everything on this component that isn't running */

I just want to double check the logic: I guess the rationale behind checking for IN_PROGRESS (rather than for the "FROZEN" and "WAITING" called out in the spec) is that we cannot get here with a pair marked as "SUCCEEDED", right?
Attachment #791735 - Flags: feedback?(adam) → feedback+
Comment on attachment 791735 [details] [diff] [review]
Remove early cancel for ICE

Review of attachment 791735 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/nricemediastream.cpp
@@ +71,5 @@
>  namespace mozilla {
>  
>  MOZ_MTLOG_MODULE("mtransport")
>  
> +static nsresult MakeNrIceCandidate(NrIceCandidate* out, nr_ice_candidate* cand) {

Convention in this code is that out arguments are ptrs, so only one of these should be a reference.

@@ +190,5 @@
>  }
>  
> +nsresult NrIceMediaStream::GetActivePair(int component,
> +                                         NrIceCandidate *local,
> +                                         NrIceCandidate *remote) {

See above.

::: media/mtransport/third_party/nICEr/src/ice/ice_component.c
@@ +805,5 @@
>      comp->active=pair;
>  
>      r_log(LOG_ICE,LOG_DEBUG,"ICE-PEER(%s)/STREAM(%s)/comp(%d): cancelling all pairs but %s (0x%p)",comp->stream->pctx->label,comp->stream->label,comp->component_id,pair->as_string,pair);
>  
> +    /* OK, we need to cancel off everything on this component that isn't running */

No, we could, actually.

Let's do what the spec says.
Comment on attachment 791735 [details] [diff] [review]
Remove early cancel for ICE

Review of attachment 791735 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/test/ice_unittest.cpp
@@ -252,5 @@
>  
>        ASSERT_TRUE(NS_SUCCEEDED(res));
>      }
>    }
>  

Do you think we want a default here? Note that this is an enum, so I would prefer that the compiler enforced that we got all of these. Perhaps make this a test failure isntead.

@@ +288,5 @@
> +              << cand.port
> +              << std::endl;
> +  }
> +
> +  void DumpActive() {

Yeah....
Oh, one more thing:
While we can get there with a pair marked succeeded, we are about to prioritize ourselves over that candidate, so cancelling it is (I think) harmless. Nevertheless, I think it's better to follow the spec...
Comment on attachment 791735 [details] [diff] [review]
Remove early cancel for ICE

Review of attachment 791735 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/nricemediastream.cpp
@@ +71,5 @@
>  namespace mozilla {
>  
>  MOZ_MTLOG_MODULE("mtransport")
>  
> +static nsresult MakeNrIceCandidate(NrIceCandidate* out, nr_ice_candidate* cand) {

Even in C++ code? That's pretty ugly. If that's what you want, sure. The contract is unclear, though. If you want to keep it a pointer, I would recommend very explicit documentation.

However, my recommendation remains the use of references, possibly with the addition of "const" to mark in-parameters, if you want to make that distinction.

::: media/mtransport/test/ice_unittest.cpp
@@ -252,5 @@
>  
>        ASSERT_TRUE(NS_SUCCEEDED(res));
>      }
>    }
>  

Sure, make it a test failure. That works also.
Attached patch Remove early cancel for ICE (obsolete) — Splinter Review
Attachment #791735 - Attachment is obsolete: true
Target Milestone: --- → mozilla26
Attached patch Remove early cancel for ICE (obsolete) — Splinter Review
Attachment #794017 - Attachment is obsolete: true
Attached patch Remove early cancel for ICE (obsolete) — Splinter Review
Attachment #795152 - Attachment is obsolete: true
Attached patch Remove early cancel for ICE (obsolete) — Splinter Review
Attachment #795154 - Attachment is obsolete: true
Attachment #795155 - Flags: review?(adam)
Comment on attachment 795155 [details] [diff] [review]
Remove early cancel for ICE

Review of attachment 795155 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. One throwaway suggestion below -- feel free to ignore it.

::: media/mtransport/nricemediastream.h
@@ +106,5 @@
>  
> +  // Get the candidate pair currently active. It's the
> +  // caller's responsibility to free these.
> +  nsresult GetActivePair(int component,
> +                         NrIceCandidate** local, NrIceCandidate** remote);

This is clearer than before but still a little awkward.

Consider:
std::pair<NrIceCandidate*,NrIceCandidate*> GetActivePair(int component);
Attachment #795155 - Flags: review?(adam) → review+
Attachment #795155 - Attachment is obsolete: true
Comment on attachment 798085 [details] [diff] [review]
Update ICE cancellation algorithm to keep checking higher priority pairs once one pair completes.

Review of attachment 798085 [details] [diff] [review]:
-----------------------------------------------------------------

A few minor formatting nits for the checkin, thanks!

::: media/mtransport/nricemediastream.cpp
@@ +83,5 @@
> +  // Const-cast because the internal nICEr code isn't const-correct.
> +  nr_ice_candidate *cand = const_cast<nr_ice_candidate *>(&candc);
> +  char addr[INET6_ADDRSTRLEN + 1];
> +
> +  r = nr_transport_addr_get_addrstring(&cand->addr,addr,sizeof(addr));

The other code in this file uses spaces after commas; please match here and in other new calls here

@@ +209,5 @@
> +                                     component,
> +                                     &local_int, &remote_int);
> +  if (r) {
> +    return NS_ERROR_FAILURE;
> +  }

Be consistent about if (x) foo; versus if (x) { foo; } - I don't care which, but please use one style in a file/dir/etc

::: media/mtransport/nricemediastream.h
@@ +148,5 @@
>        state_(ICE_CONNECTING),
>        ctx_(ctx),
>        name_(name),
>        components_(components),
> +    stream_(nullptr) {}

whitespace incorrect (tab?)

::: media/mtransport/third_party/nICEr/src/ice/ice_component.c
@@ +805,5 @@
>      comp->active=pair;
>  
>      r_log(LOG_ICE,LOG_DEBUG,"ICE-PEER(%s)/STREAM(%s)/comp(%d): cancelling all pairs but %s (0x%p)",comp->stream->pctx->label,comp->stream->label,comp->component_id,pair->as_string,pair);
>  
> +    /* Cancel checks in WAITING and FROZEN per S 8.1.2 */

RFC reference as well as paragraph
Attachment #798085 - Attachment is obsolete: true
Comment on attachment 798180 [details] [diff] [review]
Update ICE cancellation algorithm to keep checking higher priority pairs once one pair completes.

Review of attachment 798180 [details] [diff] [review]:
-----------------------------------------------------------------

carrying forward r+ from abr
Attachment #798180 - Flags: review+
Comment on attachment 798180 [details] [diff] [review]
Update ICE cancellation algorithm to keep checking higher priority pairs once one pair completes.

Review of attachment 798180 [details] [diff] [review]:
-----------------------------------------------------------------

r? to derf to verify style nits.
Attachment #798180 - Flags: review?(tterribe)
(In reply to Randell Jesup [:jesup] from comment #15)
> Comment on attachment 798085 [details] [diff] [review]
> Update ICE cancellation algorithm to keep checking higher priority pairs
> once one pair completes.
> 
> Review of attachment 798085 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> A few minor formatting nits for the checkin, thanks!
> 
> ::: media/mtransport/nricemediastream.cpp
> @@ +83,5 @@
> > +  // Const-cast because the internal nICEr code isn't const-correct.
> > +  nr_ice_candidate *cand = const_cast<nr_ice_candidate *>(&candc);
> > +  char addr[INET6_ADDRSTRLEN + 1];
> > +
> > +  r = nr_transport_addr_get_addrstring(&cand->addr,addr,sizeof(addr));
> 
> The other code in this file uses spaces after commas; please match here and
> in other new calls here

Done.


> @@ +209,5 @@
> > +                                     component,
> > +                                     &local_int, &remote_int);
> > +  if (r) {
> > +    return NS_ERROR_FAILURE;
> > +  }
> 
> Be consistent about if (x) foo; versus if (x) { foo; } - I don't care which,
> but please use one style in a file/dir/etc

Changed.


> ::: media/mtransport/nricemediastream.h
> @@ +148,5 @@
> >        state_(ICE_CONNECTING),
> >        ctx_(ctx),
> >        name_(name),
> >        components_(components),
> > +    stream_(nullptr) {}
> 
> whitespace incorrect (tab?)

Fixed.


> ::: media/mtransport/third_party/nICEr/src/ice/ice_component.c
> @@ +805,5 @@
> >      comp->active=pair;
> >  
> >      r_log(LOG_ICE,LOG_DEBUG,"ICE-PEER(%s)/STREAM(%s)/comp(%d): cancelling all pairs but %s (0x%p)",comp->stream->pctx->label,comp->stream->label,comp->component_id,pair->as_string,pair);
> >  
> > +    /* Cancel checks in WAITING and FROZEN per S 8.1.2 */
> 
> RFC reference as well as paragraph

Convention in this code is to say ICE S XXX. Changed to that.
Comment on attachment 798180 [details] [diff] [review]
Update ICE cancellation algorithm to keep checking higher priority pairs once one pair completes.

Review of attachment 798180 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/nricectx.cpp
@@ +318,5 @@
>      nr_crypto_vtbl = &nr_ice_crypto_nss_vtbl;
>      initialized = true;
>  
> +    // Set the priorites for candidate type preferences.
> +    // These numbers come from RFC 5245 S. 4.1.2.2

This is okay because the C++ code, unlike the C code, was written post-RFC, so we can use the more normal convention here.

@@ +323,4 @@
>      NR_reg_set_uchar((char *)"ice.pref.type.srv_rflx",100);
> +    NR_reg_set_uchar((char *)"ice.pref.type.peer_rflx",110);
> +    NR_reg_set_uchar((char *)"ice.pref.type.host",126);
> +    NR_reg_set_uchar((char *)"ice.pref.type.relayed",0);

Spaces after commas? The existing code didn't have them, and you're not changing all of it, so if you want to leave these ones alone, that's fine with me, too, but it's inconsistent with the code right below it.

::: media/mtransport/third_party/nICEr/src/ice/ice_media_stream.h
@@ +88,5 @@
>  int nr_ice_media_stream_component_failed(nr_ice_media_stream *stream,nr_ice_component *component);
>  int nr_ice_media_stream_set_state(nr_ice_media_stream *str, int state);
>  int nr_ice_media_stream_get_best_candidate(nr_ice_media_stream *str, int component, nr_ice_candidate **candp);
>  int nr_ice_media_stream_send(nr_ice_peer_ctx *pctx,nr_ice_media_stream *str, int component, UCHAR *data, int len);
> +int nr_ice_media_stream_get_active(nr_ice_peer_ctx *pctx,nr_ice_media_stream *str, int component, nr_ice_candidate **local, nr_ice_candidate **remote);

Comma after space (yes, the line above makes the same mistake, but let's not compound the error).
Attachment #798180 - Flags: review?(tterribe) → review+
If you're going to backout and re-land for using the wrong bug number, please make use of DONTBUILD and consolidate it to one push. There's no reason this needed 3 separate pushes + builds + tests.
Thanks for the advice. I will do so in the future.
https://hg.mozilla.org/mozilla-central/rev/6eb4d1008076
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Does this need testing by QA and if so how can we test this?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #25)
> Does this need testing by QA and if so how can we test this?

I just talked to Ekr: we need QA to verify that an ordinary apprtc call works and then that a call works when you only can do TURN.  One trick to do the second part is to block at the firewall every UDP port except port 3478.  

If you run the first test and verify that you're not sending data to the TURN server (with wireshark) it tests both this bug and bug 904598.  If you can make TURN calls (the second part of the test), then TURN works when all other options are not available -- which is what we want.
Thanks Maire.

Mihaela, please make this a priority in your testing tonight.
QA Contact: mihaela.velimiroviciu
I cannot make a call between 2 Nightlies from Mac OS X 10.8.4 - Mac OS X 10.7.5 and Mac OS X 10.8.4 - Ubuntu 13.04: both the callee and the caller remain stuck in "Connecting..." mode. I attached the logs in bug #904598 comment #50.
(In reply to Maire Reavy [:mreavy] from comment #26)
> I just talked to Ekr: we need QA to verify that an ordinary apprtc call
> works and then that a call works when you only can do TURN.  One trick to do
> the second part is to block at the firewall every UDP port except port 3478.

If you're running this under Linux, I _think_ you'd block the UDP ports like this:

> sudo iptables -A INPUT -p udp \! --sport 3478 \! --sport 53 -j DROP

To return things to normal, issue the same command, but with the -A changed to a -D.
Make that:

> sudo iptables -A INPUT -p udp -m multiport \! --sports 3478,53 -j DROP
Okay, EKR points out that you probably lost bootp, which will look like DNS failures. Let's try allowing all the ports in the sub-1024 range:

> sudo iptables -A INPUT -p udp -m multiport \! --sports 1:1023,3478 -j DROP
Under some conditions this doesn't seem to be working. We are still testing this.
(In reply to juan becerra [:juanb] from comment #32)
> Under some conditions this doesn't seem to be working. We are still testing
> this.

We managed to track the issue down to an interaction between some new interface prioritization code for Linux and the use of TURN servers. See bug 915420.
Depends on: 915420
Comment on attachment 798180 [details] [diff] [review]
Update ICE cancellation algorithm to keep checking higher priority pairs once one pair completes.

Review of attachment 798180 [details] [diff] [review]:
-----------------------------------------------------------------

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 904598, 905150, 915420

User impact if declined:
Users will not be able to connect with WebRTC whenever a site
offers TURN service. With 904598 and 905150 but not 915420,
this will only impact Linux users.

Testing completed (on m-c, etc.):
Bugs 904598, 905150 tested by EKR and then EKR and ABR.
Bug 915420 fixed today and tested by EKR and ABR. Recommend
re-testing by QA prior to uplift.

Risk to taking this patch (and alternatives if risky):
This shouldn't make things any worse. The alternative is to
turn TURN off entirely.


String or IDL/UUID changes made by this patch: None.
Attachment #798180 - Flags: approval-mozilla-aurora?
Comment on attachment 798180 [details] [diff] [review]
Update ICE cancellation algorithm to keep checking higher priority pairs once one pair completes.

https://bugzilla.mozilla.org/show_bug.cgi?id=904598#c57
Attachment #798180 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Could we get a roll up of the testing status so far? I know there were some issues encountered via email. It would be good to have that recorded here.
Keywords: verifyme
QA Contact: mihaela.velimiroviciu → jbecerra
Verified this on nightly by establishing ordinary calls between Mac and Linux clients. I also tested the scenario where only TURN was used, per instructions by Adam and Eric. Both of these scenarios now work. I did not check on latest Aurora.
I've managed to reproduce this issue by following the next steps:

1. On an Ubuntu 12.10 32-bit machine, I've blocked the UDP ports like this: > sudo iptables -A INPUT -p udp \! --sport 3478 \! --sport 53 -j DROP  and then opened https://apprtc.webrtc.org with the Nightly from 2013-09-10
2. On a Mac OS X 10.8 machine from a different network, I've opened the same link with the same Nightly build.

Results: on both machine, users could see themselves in both images (and not each other, as it would be expected)


I confirm that this now works with Firefox 25 beta 4 (build ID: 20131001024718), after testing same steps as above (without giving for the 2nd time the sudo command in the terminal, since it's not necessary), with the same machines, both in different and same networks.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: