Status

()

Core
WebRTC: Networking
P2
normal
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: ekr, Assigned: ekr)

Tracking

unspecified
mozilla23
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [WebRTC], [blocking-webrtc-], [qa-], [fuzzing:cdiehl])

Attachments

(1 attachment, 9 obsolete attachments)

(Assignee)

Description

6 years ago
nICEr has TURN support but it's largely untested. We need to test it with a recent TURN server.

Updated

6 years ago
Whiteboard: [WebRTC]

Updated

6 years ago
Whiteboard: [WebRTC] → [WebRTC], [blocking-webrtc-]

Updated

6 years ago
Whiteboard: [WebRTC], [blocking-webrtc-] → [WebRTC], [blocking-webrtc+]
I don't think we'll block preffing on for TURN support.

Adding Jan-Ivar as someone who might offload this from ekr later (after blockers)
Assignee: nobody → ekr
Priority: -- → P2
Whiteboard: [WebRTC], [blocking-webrtc+] → [WebRTC], [blocking-webrtc-]
(Assignee)

Comment 2

5 years ago
Created attachment 731172 [details] [diff] [review]
Rewrite TURN. Patch 1 (nr_turn_client_ctx and below); WIP
(Assignee)

Comment 3

5 years ago
Comment on attachment 731172 [details] [diff] [review]
Rewrite TURN. Patch 1 (nr_turn_client_ctx and below); WIP

Review of attachment 731172 [details] [diff] [review]:
-----------------------------------------------------------------

This is unfinished but it does work with the trivial tests and restund.
Attachment #731172 - Flags: feedback?(cdiehl)
Attachment #731172 - Flags: feedback?(adam)
(Assignee)

Comment 4

5 years ago
Note: do not review nr_socket_turn.c or anything in ice/. Those were just hacked up to get compilation to work.
(Assignee)

Comment 5

5 years ago
Note also: this does not do password prep. You need to provide the binary MD5 password for now.
Depends on: 856382
(Assignee)

Comment 6

5 years ago
Created attachment 731719 [details] [diff] [review]
Rewrite TURN. Patch 2 (ICE); WIP
(Assignee)

Comment 7

5 years ago
Created attachment 731724 [details] [diff] [review]
Rewrite TURN. Patch 2 (ICE); WIP
(Assignee)

Updated

5 years ago
Attachment #731719 - Attachment is obsolete: true
(Assignee)

Comment 8

5 years ago
Created attachment 731725 [details] [diff] [review]
Rewrite TURN. Patch 3 (Long-term password hashing); WIP
(Assignee)

Comment 9

5 years ago
Created attachment 731727 [details] [diff] [review]
Rewrite TURN. Patch 3 (Long-term password hashing); WIP
(Assignee)

Updated

5 years ago
Attachment #731725 - Attachment is obsolete: true
(Assignee)

Comment 10

5 years ago
Created attachment 731863 [details] [diff] [review]
Rewrite TURN
(Assignee)

Updated

5 years ago
Attachment #731172 - Attachment is obsolete: true
Attachment #731172 - Flags: feedback?(cdiehl)
Attachment #731172 - Flags: feedback?(adam)
(Assignee)

Updated

5 years ago
Attachment #731727 - Attachment is obsolete: true
(Assignee)

Updated

5 years ago
Attachment #731724 - Attachment is obsolete: true
(Assignee)

Comment 11

5 years ago
Comment on attachment 731863 [details] [diff] [review]
Rewrite TURN

Review of attachment 731863 [details] [diff] [review]:
-----------------------------------------------------------------

Adam,

This has three known defects.

1. There is a memory leak in nricectx of the usenrame and password.
2. Before we uplift the code to resiprocate, we need to supply an
   MD5 implementation for nr_crypto
3. I didn't clean up long-term auth for regular STUN yet.

I intend to fix 1 before landing...
Attachment #731863 - Flags: review?(adam)

Comment 12

5 years ago
Created attachment 731948 [details] [diff] [review]
Interdiff between first and most recent (731863) patch

The interdiff tool is out of service at the moment, and uploading this was the easiest way to get a format I could work with easily.

Comment 13

5 years ago
Comment on attachment 731863 [details] [diff] [review]
Rewrite TURN

Review of attachment 731863 [details] [diff] [review]:
-----------------------------------------------------------------

This generally looks to be in really good shape. I have a bunch of nits and a small handful of minor to middle-importance bugs. Given the number of comments, I'm marking r- for the opportunity to look at this one more time after the bugs are fixed and the memory leak plugged.

Whitespace nits:

nricectx.cpp:166: trailing whitespace
nricectx.cpp:218: trailing whitespace
nricectx.cpp:233: trailing whitespace
nricectx.h:139: trailing whitespace
nricectx.h:143: trailing whitespace
nricectx.h:147: trailing whitespace
ice_unittest.cpp:56: trailing whitespace
ice_unittest.cpp:162: trailing whitespace
ice_candidate.c:617: tab character
ice_candidate.c:618: tab character
ice_candidate.c:619: tab character
ice_candidate.c:620: tab character
stun.h:169: tab character
stun.h:183: tab character
stun.h:185: tab character
stun.h:195: tab character
stun.h:197: tab character
stun_build.h:150: tab character
stun_client_ctx.c:307: tab character
stun_client_ctx.c:308: tab character
stun_client_ctx.c:309: tab character
stun_client_ctx.c:310: tab character
stun_client_ctx.c:448: tab character
stun_client_ctx.c:449: tab character
stun_client_ctx.c:473: tab character
stun_client_ctx.c:474: tab character
stun_client_ctx.c:476: tab character
stun_client_ctx.c:480: tab character
stun_client_ctx.c:481: tab character
stun_client_ctx.c:483: tab character
stun_client_ctx.c:487: tab character
stun_client_ctx.c:488: tab character
stun_client_ctx.c:490: tab character
stun_client_ctx.c:561: tab character
stun_client_ctx.c:562: tab character
stun_client_ctx.c:563: tab character
stun_client_ctx.c:568: tab character
stun_client_ctx.c:629: tab character
stun_client_ctx.c:631: tab character
stun_client_ctx.c:632: tab character
stun_client_ctx.c:633: tab character
stun_client_ctx.c:634: tab character
stun_client_ctx.c:636: tab character
stun_client_ctx.c:637: tab character
stun_client_ctx.c:638: tab character
stun_client_ctx.c:648: tab character
stun_client_ctx.c:649: tab character
stun_client_ctx.c:650: tab character
stun_client_ctx.c:657: tab character
stun_client_ctx.c:658: tab character
stun_client_ctx.h:157: tab character
stun_msg.c:340: tab character
stun_msg.c:341: tab character
stun_msg.c:343: tab character
stun_msg.c:344: tab character
stun_msg.h:128: tab character
stun_msg.h:129: tab character
turn_client_ctx.c:258: tab character
turn_client_ctx.h:106: tab character
turn_client_ctx.h:115: tab character

::: media/mtransport/nricectx.cpp
@@ +221,5 @@
> +  if (!(server->username=r_strdup(const_cast<char *>(
> +          username_.c_str()))))
> +    return NS_ERROR_OUT_OF_MEMORY;
> +
> +  // TODO(ekr@rtfm.com): handle non-ASCII passwords somehow?

I don't see how this relies on the passwords being ASCII.

@@ +223,5 @@
> +    return NS_ERROR_OUT_OF_MEMORY;
> +
> +  // TODO(ekr@rtfm.com): handle non-ASCII passwords somehow?
> +  int r = r_data_create(&server->password,
> +                        const_cast<UCHAR *>(&password_[0]),

Even ignoring the const_cast<> (see below), this gave me this willies just based on the abstraction violation it represents.

Please add the following comment to save the next person to find this code some research:

/* C++03 23.2.4, Paragraph 1 stipulates that the elements
   in std::vector must be contiguous, and can therefore be
   used as input to functions expecting C arrays. */

@@ +451,5 @@
>    return NS_OK;
>  }
>  
> +// TODO(ekr@rtfm.com): This is just SetStunServers with s/Stun/Turn
> +// Could we do a template or something?

I haven't tried making it work, but you should be able to get away with something like:

> template<class NrIceServer,
>          struct nr_ice_server,
>          typename ToNicerStruct,
>          const char * protocol,
>          typename nr_ice_ctx_set_server>
> nsresult NrIceCtx::SetServers(const std::vector<NrIceServer>&
>                                   servers) {
>   if (servers.empty())
>     return NS_OK;
>
>   ScopedDeleteArray<nr_ice_server> servers(
>       new nr_ice_server[servers.size()]);
>
>   for (size_t i=0; i < servers.size(); ++i) {
>     nsresult rv = servers[i].ToNicerStruct(&servers[i]);
>     if (NS_FAILED(rv)) {
>       MOZ_MTLOG(PR_LOG_ERROR, "Couldn't set " << protocol << " server for '" << name_ << "'");
>       return NS_ERROR_FAILURE;
>     }
>   }
>
>   int r = nr_ice_ctx_set_servers(ctx_, servers, servers.size());
>   if (r) {
>     MOZ_MTLOG(PR_LOG_ERROR, "Couldn't set " << protocol << " server for '" << name_ << "'");
>     return NS_ERROR_FAILURE;
>   }
>
>   return NS_OK;
> }
>
> nsresult NrIceCtx::SetTurnServers(const std::vector<NrIceTurnServer>&
>                                   turn_servers) {
>     return SetServers<NrIceTurnServer,
>                       nr_ice_turn_server,
>                       ToNicerTurnServer,
>                       "TURN",
>                       nr_ice_ctx_set_turn_server>(turn_servers);
> }
>
> nsresult NrIceCtx::SetStunServers(const std::vector<NrIceStunServer>&
>                                   stun_servers) {
>     return SetServers<NrIceStunServer,
>                       nr_ice_stun_server,
>                       ToNicerStunServer,
>                       "STUN",
>                       nr_ice_ctx_set_stun_server>(stun_servers);
> }

::: media/mtransport/test/ice_unittest.cpp
@@ +40,5 @@
>  
>  const std::string kDefaultStunServerAddress((char *)"23.21.150.121");
>  const std::string kDefaultStunServerHostname((char *)"ec2-23-21-150-121.compute-1.amazonaws.com");
>  const std::string kBogusStunServerHostname((char *)"stun-server-nonexistent.invalid");
>  const uint16_t kDefaultStunServerPort=3478;

It's possible that we'll want to use different ports in the future -- why don't you go ahead and define a kDefaultTurnServerPort as well?

@@ +41,5 @@
>  const std::string kDefaultStunServerAddress((char *)"23.21.150.121");
>  const std::string kDefaultStunServerHostname((char *)"ec2-23-21-150-121.compute-1.amazonaws.com");
>  const std::string kBogusStunServerHostname((char *)"stun-server-nonexistent.invalid");
>  const uint16_t kDefaultStunServerPort=3478;
> +const std::string kDefaultTurnServerAddress((char *)"192.168.134.140");

We're not going to land this with an RFC 1918 address in here, are we?

@@ +46,5 @@
> +const std::string kDefaultTurnUsername((char *)"test");
> +/*const unsigned char kDefaultTurnPasswordData[] = {
> +  0xe1, 0x6a, 0x36, 0xf7, 0x03, 0x53, 0xfd, 0x0c,
> +  0x65, 0xea, 0x50, 0x36, 0x2d, 0x71, 0xa1, 0xe5
> +  };*/

Should we be removing this rather than commenting it out? If we leave it in (but commented out), please add an explanation.

::: media/mtransport/test/turn_unittest.cpp
@@ +81,5 @@
> +MtransportTestUtils *test_utils;
> +
> +using namespace mozilla;
> +
> +std::string g_turn_server("172.16.131.149");

Whatever ends up happening here, you probably want to include a cite to the turn server testing infrastructure bug.

@@ +105,5 @@
> +    r = nr_socket_local_create(&addr, &net_socket_);
> +    ASSERT_EQ(0, r);
> +
> +    r = nr_ip4_str_port_to_transport_addr(
> +      const_cast<char *>(g_turn_server.c_str()), 3478,

const_cast<> always gives me the willies, since it usually means that a contract is either incorrect or about to be broken. It looks like it would be approximately as easy to update nr_ip4_str_port_to_transport_addr to take a const char * (since inet_addr takes a const char * as its argument).

@@ +117,5 @@
> +    };
> +
> +    INIT_DATA(password, password_data, sizeof(password_data));
> +    r = nr_turn_client_ctx_create(const_cast<char *>("test"), net_socket_,
> +                                  const_cast<char *>("test"), &password,

As above re: const_cast<>. I recognize that this can turn into a lot of work, so you might want to open a separate bug for dealing with const-correctness in the nICEr library in general.

@@ +130,5 @@
> +
> +  void SetUp() {
> +    RUN_ON_THREAD(test_utils->sts_target(),
> +                  WrapRunnable(this, &TurnClient::SetUp_s),
> +                  NS_DISPATCH_SYNC);

I know these aren't critical since we don't check leaks on the unit tests, but is there some reason you're not using the new SyncRunnable objects for these dispatches?

@@ +238,5 @@
> +    ASSERT_NE(offset, std::string::npos);
> +
> +    std::string host = target.substr(4, offset - 4);
> +    std::string port = target.substr(offset + 1);
> +

Perhaps we should also extract the protocol and assert that it is "IP4"?

@@ +304,5 @@
> +
> +TEST_F(TurnClient, SendToSelf) {
> +  Allocate();
> +  SendTo(relay_addr_);
> +  ASSERT_TRUE_WAIT(received() == 100, 1000);

I think you want to verify that the contents of the received buffer are the same as the contents of the sent buffer. There are a few subtle ways this could go wrong.

@@ +311,5 @@
> +int main(int argc, char **argv)
> +{
> +#ifdef LINUX
> +  // This test can cause intermittent oranges on the builders on Linux
> +  CHECK_ENVIRONMENT_FLAG("MOZ_WEBRTC_TESTS")

I see this same conditional in ice_unittest -- is it also true for these, or did it get copied in here by accident?

::: media/mtransport/third_party/nICEr/src/crypto/nr_crypto.c
@@ +53,5 @@
>  
> +static int nr_ice_crypto_dummy_md5(UCHAR *buf, int buf_l, UCHAR digest[16])
> +  {
> +    fprintf(stderr,"Need to define crypto API implementation\n");
> +

We probably ought to add a preprocessor #warning here so there's at least some indication that something's going wrong at compile time.

::: media/mtransport/third_party/nICEr/src/ice/ice_candidate.c
@@ +689,1 @@
>          /* switch candidate from TURN mode to STUN mode */

"STUN mode"? I can't quite follow. It looks like we're moving over to what could more clearly be described as "relay mode."

@@ +690,5 @@
>  
> +        if (r=nr_turn_client_get_relayed_address(turn, &relay_addr))
> +          ABORT(r);
> +
> +        if(r=nr_concat_strings(&label,"turn-relay(",cand->base.as_string,"|",relay_addr,")",NULL))

relay_addr.as_string

@@ +695,5 @@
>            ABORT(r);
>  
>          r_log(LOG_ICE,LOG_DEBUG,"ICE(%s): Switching from TURN (%s) to RELAY (%s)",cand->u.relayed.turn->label,cand->label,label);
>  
>          /* Copy out mapped address and relay address */

I don't think this comment is accurate any more.

@@ +737,3 @@
>      _status=0;
>    abort:
>      if(_status){

Please add logging in this block (if not above at each ABORT) indicating that something has gone wrong.

@@ +746,5 @@
>          cand2->state=NR_ICE_CAND_STATE_FAILED;
>          cand2->done_cb(0,0,cand2->cb_arg);
>        }
>      }
> +    /* return(_status); */

Remove this.

::: media/mtransport/third_party/nICEr/src/ice/ice_socket.c
@@ +136,1 @@
>                    goto re_process;

I don't think this is quite right. If the thing that we unwrap happens to be (or look enough like) another STUN message, we still want to treat it like data. Rather than "goto re_process", I think we really want to call directly into  nr_ice_ctx_deliver_packet().

::: media/mtransport/third_party/nICEr/src/ice/ice_socket.h
@@ +50,5 @@
>      nr_stun_client_ctx *client;
>      nr_stun_server_ctx *server;
> +    struct {
> +      nr_turn_client_ctx *turn_client;
> +      nr_socket *turn_sock;

The relationship between u.turn_client.turn_sock and u.turn_client.turn_client->sock isn't immediately apparent. Perhaps a comment explaining what's going on here would be in order.

::: media/mtransport/third_party/nICEr/src/stun/nr_socket_turn.c
@@ +126,4 @@
>  
> +    if (!sturn->turn)
> +      ABORT(R_BAD_ARGS);
> +

Should we also be checking for the magic cookie and returning if that's bad?

@@ +135,5 @@
>    abort:
>      return(_status);
>    }
>  
>  static int nr_socket_turn_recvfrom(void *obj,void * restrict buf,

Add a comment explaining why this operation is not permitted. We probably want something in the header file as well, warning that you cannot get the descriptor for, or use recvfrom on, turn sockets.

@@ +145,3 @@
>    }
>  
>  static int nr_socket_turn_getfd(void *obj, NR_SOCKET *fd)

Add a comment explaining why this operation is not permitted.

::: media/mtransport/third_party/nICEr/src/stun/stun.h
@@ +107,5 @@
>  
>  #ifdef USE_TURN
>  /* TURN attributes */
>  #define NR_STUN_ATTR_LIFETIME                0x000d
>  #define NR_STUN_ATTR_BANDWIDTH               0x0010

I think BANDWIDTH has been deprecated, and should probably be removed.

@@ +113,2 @@
>  #define NR_STUN_ATTR_DATA                    0x0013
>  #define NR_STUN_ATTR_RELAY_ADDRESS           0x0016

=> NR_STUN_ATTR_XOR_RELAY_ADDRESS

@@ +119,5 @@
>  //#define NR_STUN_ATTR_REQUESTED_PORT_PROPS    0x0018    /* UINT4 */
>  //#define NR_STUN_ATTR_REQUESTED_TRANSPORT     0x0019    /* UINT4 */
>  //#define NR_STUN_ATTR_TIMER_VAL               0x0021    /* ? */
>  //#define NR_STUN_ATTR_REQUIRED_IP             0x0022    /* address */
>  //#define NR_STUN_ATTR_CONNECT_STAT            0x0023    /* UINT4 */

These have all changed or gone away (except for 0x0019, which you've replicated above). Please remove them.

::: media/mtransport/third_party/nICEr/src/stun/stun_build.c
@@ +323,5 @@
> +
> +
> +  if (r=nr_crypto_md5((UCHAR *)digest_input, len + password->len, hmac_key->data))
> +    ABORT(r);
> +

hmac_key->len = 16;

@@ +338,5 @@
> +  Data hmac_key;
> +
> +  ATTACH_DATA(hmac_key, hmac_key_d);
> +
> +  if (auth->authenticate) {

It's easier to read this if you reverse the sense of the test, and return at the top of the function if !auth->authenticate.

@@ +394,1 @@
>  {

RFC 5766, section 4: "The client SHOULD include the SOFTWARE attribute in all Allocate and Refresh requests"

@@ +420,1 @@
>  {

RFC 5766, section 4: "The client SHOULD include the SOFTWARE attribute in all Allocate and Refresh requests"

@@ +488,2 @@
>  nr_stun_build_data_indication(nr_stun_client_data_indication_params *params, nr_stun_message **msg)
>  {

Not objecting, but isn't this message sent only by TURN servers? Is there a plan to use this library to build a TURN server at some point?

::: media/mtransport/third_party/nICEr/src/stun/stun_client_ctx.c
@@ +95,5 @@
>      }
>      return(_status);
>    }
>  
> +int nr_stun_client_set_auth_params(nr_stun_client_ctx *ctx, char *username, char *realm, char *nonce, Data *password)

Please document the ownership contract for the buffers passed to this function.

Alternately, if this function is unused, remove it.

@@ +435,5 @@
>        ABORT(R_REJECTED);
>  
>      if (ctx->state != NR_STUN_CLIENT_STATE_RUNNING)
>        ABORT(R_REJECTED);
>  

We should add a check here for transaction ID. The way this is called from the TURN client is by throwing any incoming response at all of the possible STUN transaction contexts. If you have two pending transactions (say, refresh and permissions), you could end up matching them to the wrong request. (Similarly, if you get a martian response, you want to ignore it).

@@ +444,5 @@
>  
>      /* determine password */
>      switch (ctx->mode) {
>      case NR_STUN_CLIENT_MODE_BINDING_REQUEST_LONG_TERM_AUTH:
> +	compute_lt_key=1;

Spaces around equal sign

@@ +470,5 @@
>  
>  #ifdef USE_TURN
> +    case NR_TURN_CLIENT_MODE_ALLOCATE_REQUEST:
> +	fail_on_error = 1;
> +	compute_lt_key=1;

Spaces around equal sign -- repeated twice below.

@@ +519,5 @@
>      if (ctx->mode == NR_TURN_CLIENT_MODE_SEND_INDICATION) {
>          /* SEND-INDICATION gets a DATA-INDICATION back, which will always
>           * be a different transaction, so don't perform the check in that
>           * case */
>      }

This conditional doesn't appear to do anything -- can we remove it?

@@ +560,5 @@
>      else {
> +	if (fail_on_error) {
> +	    ctx->state = NR_STUN_CLIENT_STATE_FAILED;
> +	}
> +        if ((r=nr_stun_process_error_response(ctx->response, &ctx->error_code))) {

The logic here is kind of hard to read. For most functions, r=0 means "the normal thing happened," and r!=0 means "something went wrong in the function." It took me a while to get unwrapped from the axle since the code reads like we demolish the return code under "normal" failure circumstances. In practice, the "else" branch, below, represents an exceptional codepath.

Comments to this effect would be helpful.

@@ +633,5 @@
> +		 &attr->u.relay_address.unmasked)))
> +	    ABORT(r);
> +
> +	if (!nr_stun_message_has_attribute(ctx->response, NR_STUN_ATTR_LIFETIME, &attr))
> +	    ABORT(R_BAD_DATA);

There are a lot of ways this can abort. It might be helpful to add debug logging so we have some idea what went wrong.

@@ +653,5 @@
>          if (! nr_stun_message_has_attribute(ctx->response, NR_STUN_ATTR_MESSAGE_INTEGRITY, 0))
>              ABORT(R_BAD_DATA);
>          break;
>      case NR_TURN_CLIENT_MODE_SEND_INDICATION:
> +	/* TODO(ekr@rtfm.com): This isn't needed? */

Yeah, I think that's right. Unless I'm confused, indications don't have associated responses.

::: media/mtransport/third_party/nICEr/src/stun/stun_hint.c
@@ +95,1 @@
>      case NR_STUN_MSG_SEND_INDICATION:

We probably ought to add the channel constants here (and define them) so they don't fall between the cracks.

::: media/mtransport/third_party/nICEr/src/stun/stun_util.c
@@ +235,4 @@
>  #endif /* USE_TURN */
>  
>      default:
>           assert(0);

I know it's not part of the changes made by this patch, but this assert(0) can trigger based on off-the-wire data, IIUC (this function is called by nr_stun_decode_message, which is used to process stuff from the wire). I think we want to save asserts for programming errors, not protocol errors.

::: media/mtransport/third_party/nICEr/src/stun/turn_client_ctx.c
@@ +52,5 @@
>  int NR_LOG_TURN = 0;
>  
> +#define TURN_RTO 100  /* Hardcoded RTO estimate */
> +#define TURN_LIFETIME_REQUEST_SECONDS    3600  /* One hour */
> +#define TURN_REFRESH_SLACK_SECONDS       10    /* How long before expiry to refresh

10 seconds seems to be cutting things close. Is there a reason we're not using the ~60 second timer suggested in the RFC?

@@ +263,5 @@
> +        }
> +
> +        ctx->retry_ct++;
> +      }
> +      else {

I think we need to add handling here for error_code == 438 (Stale Nonce), at a minimum. I think the handling can be identical or nearly identical to handling for 401.

We should (actually, RFC 2119 "SHOULD", according to RFC 5766, section 6.4) also add handling for 300/ALTERNATE-SERVER here, as it is actually quite important for TURN servers' ability to scale up. If this isn't something we can easily add right now, please open a bug to hold the place of adding it in the future.

@@ +273,5 @@
> +      ABORT(R_FAILED);
> +      break;
> +
> +    case NR_STUN_CLIENT_STATE_CANCELLED:
> +      return;  /* No-op. TODO(ekr@rtfm.com): Shouldn't happen? */

"return;" or "ABORT(R_INTERNAL);"? Is there a risk that some crank-turning will fall on the floor if we call neither the error_cb nor the success_cb?

@@ +344,2 @@
>  
> +  nr_turn_client_cancel(ctx);

Please add a comment indicating that nr_turn_client_cancel is responsible for deallocating all of our buffers.

@@ +361,5 @@
> +  RFREE(ctx->label);
> +  RFREE(ctx->username);
> +  r_data_destroy(&ctx->password);
> +  RFREE(ctx->nonce);
> +  RFREE(ctx->realm);

I think we need to zero-out the preceding pointers to ensure they're not double freed when we call nr_turn_client_failed() followed by nr_turn_client_ctx_destroy().

@@ +593,5 @@
> +  int r, _status;
> +
> +  assert(!tctx->refresh_timer_handle);
> +
> +  if (lifetime < TURN_REFRESH_SLACK_SECONDS) {

There's a degenerate case lurking here. If lifetime == TURN_REFRESH_SLACK_SECONDS, you will end up blasting out refreshes at a cadence of 1 per RTT, since each refresh response will come back and make it exactly time to send out a new refresh request. You probably want to make this, at the very least "if (lifetime <= TURN_REFRESH_SLACK_SECONDS)".

@@ +651,2 @@
>  #if 0
> +  // TODO(ekr@rtfm.com. Remove?

Please either explain the rationale for possibly doing a reset, or remove this block.

@@ +669,5 @@
> +}
> +
> +int nr_turn_client_send_indication(nr_turn_client_ctx *ctx,
> +                                   const UCHAR *msg, size_t len,
> +                                   int flags, nr_transport_addr *remote_addr)

I don't see any support for channels in here. That's probably okay, but I'd like to see documentation indicating that (a) we don't currently support channels, and (b) whether we intend to support them in the future.

@@ +719,5 @@
> +  nr_stun_message *ind=0;
> +  nr_stun_message_attribute *attr;
> +
> +  /* TODO(ekr@rtfm.com): Check that incoming message source is the
> +     relay */

This is an important part of the security story, right? I think we really want to implement this check before landing. It should be a straightforward check, right? Is there something tricky here that I'm not seeing?

@@ +738,5 @@
> +  if (!nr_stun_message_has_attribute(ind, NR_STUN_ATTR_DATA, &attr)) {
> +    ABORT(R_BAD_DATA);
> +  }
> +
> +  assert(newsize > attr->u.data.length);

assert(newsize >= attr->u.data.length);

@@ +770,5 @@
> +  int r, _status;
> +  nr_turn_permission *perm = 0;
> +  UINT8 now;
> +  UINT8 turn_permission_refresh = (TURN_PERMISSION_LIFETIME_SECONDS -
> +                                   TURN_REFRESH_SLACK_SECONDS) * 1000000;

You might consider a MICROSECONDS_PER_SECOND define somewhere to make it more clearer what's going on here.

::: media/mtransport/third_party/nICEr/src/stun/turn_client_ctx.h
@@ +44,5 @@
> +   permissions and restarts.
> + */
> +typedef struct nr_turn_stun_ctx_ {
> +  struct nr_turn_client_ctx_ *tctx;
> +  int mode;

Please add a comment pointing to where the values for "mode" are defined.
Attachment #731863 - Flags: review?(adam) → review-

Updated

5 years ago
Attachment #731948 - Attachment is obsolete: true
(Assignee)

Comment 14

5 years ago
Created attachment 733466 [details] [diff] [review]
Rewrite TURN
(Assignee)

Updated

5 years ago
Attachment #731863 - Attachment is obsolete: true
(Assignee)

Updated

5 years ago
Attachment #733466 - Flags: review?(adam)
(Assignee)

Comment 15

5 years ago
Created attachment 733475 [details] [diff] [review]
Rewrite TURN
(Assignee)

Updated

5 years ago
Attachment #733466 - Attachment is obsolete: true
Attachment #733466 - Flags: review?(adam)

Updated

5 years ago
Attachment #733475 - Flags: review?(adam)

Comment 16

5 years ago
Comment on attachment 733475 [details] [diff] [review]
Rewrite TURN

Review of attachment 733475 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. Minor nits below.

::: media/mtransport/nricectx.cpp
@@ +217,5 @@
> +    return NS_ERROR_INVALID_ARG;
> +  if (password_.empty())
> +    return NS_ERROR_INVALID_ARG;
> +
> +  if (!(server->username=r_strdup(const_cast<char *>(

Now that r_strdup takes a const, I think you can simply get rid of this const_cast<>().

@@ +229,5 @@
> +  // C++03 23.2.4, Paragraph 1 stipulates that the elements
> +  // in std::vector must be contiguous, and can therefore be
> +  // used as input to functions expecting C arrays.
> +  int r = r_data_create(&server->password,
> +                        const_cast<UCHAR *>(&password_[0]),

I see we fixed the other const_cast<>() instances. Thanks! Can we fix r_data_create also?

::: media/mtransport/test/ice_unittest.cpp
@@ +623,5 @@
> +    g_turn_server = turn_server;
> +  } else {
> +    printf(
> +        "Set the TURN_SERVER_ADDRESS environment variable to run TURN tests\n");
> +    return 0;

The username and password will probably vary server-to-server. I think we want to read them from environment variables also.

::: media/mtransport/test/turn_unittest.cpp
@@ +323,5 @@
> +  if (turn_server) {
> +    g_turn_server = turn_server;
> +  } else {
> +    printf(
> +        "Set the TURN_SERVER_ADDRESS environment variable to run this test\n");

The username and password will probably vary server-to-server. I think we want to read them from environment variables also.

::: media/mtransport/third_party/nICEr/src/stun/stun_util.c
@@ +235,4 @@
>  #endif /* USE_TURN */
>  
>      default:
> +         ret = "UNKNOWN MESSAGE";

When I did a check for what callers of this function expect, they appear to all check for 0 and do the right thing (e.g., print out the code as a hex string rather than the message type). I think we want to go ahead and leave ret=0. Doing so will allow the callers to fall through to their (more informative) exception handling rather than generating the string "UNKNOWN MESSAGE").
Attachment #733475 - Flags: review?(adam) → review+
(Assignee)

Comment 17

5 years ago
Created attachment 734303 [details] [diff] [review]
Rewrite TURN stack to match RFC 5766.
(Assignee)

Updated

5 years ago
Attachment #733475 - Attachment is obsolete: true
(Assignee)

Comment 18

5 years ago
Comment on attachment 734303 [details] [diff] [review]
Rewrite TURN stack to match RFC 5766.

Review of attachment 734303 [details] [diff] [review]:
-----------------------------------------------------------------

Adam,

I went over the TURN spec and made two slight modifications
for better conformance.

1. Don't create two permissions with the same port since
only the address is looked at.

2. Check incoming data indications match an extant permission.
Attachment #734303 - Flags: review?(adam)
Target is in and turned on in 23, and uplift it to 22 preffed off so it's available to try, and to greatly ease uplift of other fixes in related code (minimize bitrot that would block STUN updates, for example).
Target Milestone: --- → mozilla22

Comment 20

5 years ago
Comment on attachment 734303 [details] [diff] [review]
Rewrite TURN stack to match RFC 5766.

Review of attachment 734303 [details] [diff] [review]:
-----------------------------------------------------------------

The changes all look good, presuming the intent was "Don't create two permissions with the same address and different ports..."
Attachment #734303 - Flags: review?(adam) → review+
(Assignee)

Comment 21

5 years ago
Comment on attachment 734303 [details] [diff] [review]
Rewrite TURN stack to match RFC 5766.

Review of attachment 734303 [details] [diff] [review]:
-----------------------------------------------------------------

Adam, could you check this in? It's hard for me to do from here.
Attachment #734303 - Flags: checkin?(adam)
Attachment #734303 - Flags: checkin?(adam) → checkin+
https://hg.mozilla.org/mozilla-central/rev/e69693180dff
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: mozilla22 → mozilla23

Updated

5 years ago
Whiteboard: [WebRTC], [blocking-webrtc-] → [WebRTC], [blocking-webrtc-], [qa-]
I get this linking error on Linux64 when building with clang 3.2 and gold 1.11.

17:18.01 /usr/bin/ld.gold.real: error: /home/njn/moz/mi8/d64/toolkit/library/../../media/mtransport/third_party/nICEr/nicer_nicer/src/stun/stun_build.o: requires dynamic R_X86_64_PC32 reloc against 'isascii' which may overflow at runtime; recompile with -fPIC
17:18.01 /usr/bin/ld.gold.real: error: hidden symbol 'isascii' is not defined locally
17:18.01 /usr/bin/ld.gold.real: error: hidden symbol 'isascii' is not defined locally
17:18.01 clang: error: linker command failed with exit code 1 (use -v to see invocation)
> I get this linking error on Linux64 when building with clang 3.2 and gold
> 1.11.

I don't get it with GCC 4.7.2 and gold 1.11.  According to the man page, isascii() requires one of the following macros to be defined:

           _BSD_SOURCE || _SVID_SOURCE || _XOPEN_SOURCE

so I think you can't just assume it's present.

Updated

5 years ago
Depends on: 860222
Whiteboard: [WebRTC], [blocking-webrtc-], [qa-] → [WebRTC], [blocking-webrtc-], [qa-][webrtc-uplift]
Comment on attachment 734303 [details] [diff] [review]
Rewrite TURN stack to match RFC 5766.

NOTE: We're planning to land TURN with a pref that turns it off by default.  Since the changes cover a number of other parts we may need to modify, landing this in Aurora will greatly minimize the risk and effort to uplift any bugfixes from 23, now or when 22 gets to beta.  It also means that people using 22 can try it out (and if it survives largely bug-free, we might end up recommending it to people).

This will need to be taken with the front-end patches for TURN support and adding the pref from bug 855769

[Approval Request Comment]
Bug caused by (feature/regressing bug #): n/a

User impact if declined: Turn will not be accessible via a pref in FF22.  Extra effort and risk doing other uplifts in Aurora and Beta 22.

Testing completed (on m-c, etc.): on m-c; being fuzzed by cdiehl since before it landed.  So far pretty clean.  We're working with releng on getting them to add a TURN server to our test setup; we've stood up our own TURN server on EC2 and are setting up some continuous integration builds since the mochitests/etc can't test it.

Risk to taking this patch (and alternatives if risky): Low since the majority of the changes (those not shared by ICE) will be disabled by by default.  ICE changes are generally being covered by existing tests and our continuous integration, though coverage will be better once we get STUN & TURN support in test.

String or IDL/UUID changes made by this patch: none
Attachment #734303 - Flags: approval-mozilla-aurora?
(In reply to Randell Jesup [:jesup] from comment #26)
> Comment on attachment 734303 [details] [diff] [review]
> Rewrite TURN stack to match RFC 5766.
> 
> NOTE: We're planning to land TURN with a pref that turns it off by default. 
> Since the changes cover a number of other parts we may need to modify,
> landing this in Aurora will greatly minimize the risk and effort to uplift
> any bugfixes from 23, now or when 22 gets to beta.  It also means that
> people using 22 can try it out (and if it survives largely bug-free, we
> might end up recommending it to people).
> 
> This will need to be taken with the front-end patches for TURN support and
> adding the pref from bug 855769
> 
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): n/a
> 
> User impact if declined: Turn will not be accessible via a pref in FF22. 
> Extra effort and risk doing other uplifts in Aurora and Beta 22.
> 
> Testing completed (on m-c, etc.): on m-c; being fuzzed by cdiehl since
> before it landed.  So far pretty clean.  We're working with releng on
> getting them to add a TURN server to our test setup; we've stood up our own
> TURN server on EC2 and are setting up some continuous integration builds
> since the mochitests/etc can't test it.
> 
> Risk to taking this patch (and alternatives if risky): Low since the
> majority of the changes (those not shared by ICE) will be disabled by by
> default.  ICE changes are generally being covered by existing tests and our
> continuous integration, though coverage will be better once we get STUN &
> TURN support in test.
> 
> String or IDL/UUID changes made by this patch: none

Spoke, offline with :jesup on this as we had risk concerns around uplifting such a huge patch on aurora as its quite uncommon that we do take such a change . Wanted to check what our alternative less riskier options could be here and be sure we understand risk of uplift divergence vs the risk of regression.We may consider uplift only if it is less risk and want to be sure its the case here.

:jesup is going to follow-up with :ekr,:abr for alternatives or address our concern's here.
(Assignee)

Comment 28

5 years ago
Small clarification that probably doesn't change the situation: the WebRTC Ad Hoc CI system already tests TURN.
Whiteboard: [WebRTC], [blocking-webrtc-], [qa-][webrtc-uplift] → [WebRTC], [blocking-webrtc-], [qa-][webrtc-uplift][fuzzing:cdiehl]
Whiteboard: [WebRTC], [blocking-webrtc-], [qa-][webrtc-uplift][fuzzing:cdiehl] → [WebRTC], [blocking-webrtc-], [qa-][webrtc-uplift], [fuzzing:cdiehl]
Comment on attachment 734303 [details] [diff] [review]
Rewrite TURN stack to match RFC 5766.

Spoke with jesup - at this point, risk caused by divergence is not greater than risk of critical regression.
Attachment #734303 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Whiteboard: [WebRTC], [blocking-webrtc-], [qa-][webrtc-uplift], [fuzzing:cdiehl] → [WebRTC], [blocking-webrtc-], [qa-], [fuzzing:cdiehl]
You need to log in before you can comment on or make changes to this bug.