Closed Bug 729512 Opened 8 years ago Closed 7 years ago

Implement DataChannel protocol on top of libsctp

Categories

(Core :: WebRTC: Networking, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla18

People

(Reporter: jesup, Assigned: jesup)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [WebRTC], [blocking-webrtc+] [qa-])

Attachments

(3 files, 31 obsolete files)

84.88 KB, patch
ted
: review+
mcmanus
: review+
Details | Diff | Splinter Review
12.47 KB, patch
mcmanus
: review+
Details | Diff | Splinter Review
80.56 KB, patch
mcmanus
: review+
Details | Diff | Splinter Review
Works over SCTP (over DTLS over ICE/UDP)
Specification per the W3C WebRTC group, in-progress.  Multiple data channels, reliable and unreliable.  Probably use PPID fields in SCTP streams to signal metadata and open/closes.
Note: this is for the bidirectional wire-line protocol on top of SCTP to be submitted to the IETF tracker today.
Blocks: 733002
Summary: Implement DataChannel support for PeerConnections → Implement DataChannel protocol on top of libsctp
Component: Networking → WebRTC: Networking
QA Contact: networking → webrtc-networking
Note: current patch still fails on Fedora 15 with 
../../../../netwerk/sctp/datachannel/../src/user_ip6_var.h:38:8: error: redefinition of ‘struct in6_pktinfo’
/usr/include/netinet/in.h:471:8: error: previous definition of ‘struct in6_pktinfo’

There are a number of dangling ends still in this patch, and it badly needs cleanup - I just worried about getting it to compile so far.
Attachment #608761 - Attachment is obsolete: true
Attachment #608820 - Attachment is obsolete: true
Attachment #608878 - Attachment is obsolete: true
Now should call the listeners when data comes in and channels are created.  Very little API change from last
Attachment #608883 - Attachment is obsolete: true
Attachment #609074 - Attachment is obsolete: true
Attachment #609132 - Attachment is obsolete: true
This applies on top of the two "import libsctp" patches and compiles clean on paris_demo branch.  No testing yet
Builds clean.  No testing
Attachment #609887 - Attachment is obsolete: true
Attached patch WIP on data channel (obsolete) — Splinter Review
Attachment #609142 - Attachment is obsolete: true
Attachment #609717 - Attachment is obsolete: true
Attachment #609909 - Attachment is obsolete: true
Attachment #610057 - Attachment is obsolete: true
This patch (and bug 729511 and bug 733002) are built on top of checkin 89130 in the paris_demo branch of alder, FYI
Attachment #610675 - Attachment is obsolete: true
Comment on attachment 617791 [details] [diff] [review]
DataChannel protocol (netwerk), with 3-way handshake

Note: this is against alder/default, though it really doesn't matter for this patch
Attachment #617791 - Attachment is obsolete: true
Attachment #617797 - Attachment is obsolete: true
Attachment #634388 - Attachment is obsolete: true
Attachment #634389 - Attachment is obsolete: true
Comment on attachment 634392 [details] [diff] [review]
DataChannel protocol (netwerk), with 3-way handshake - using updated API

This has Irene's and my updates merged in, but not the WinXP changes
Blocks: dc-tests
Attachment #634392 - Attachment is obsolete: true
rebased on top of latest SCTP and alder update
Attachment #640923 - Attachment is obsolete: true
Attachment #634394 - Attachment is obsolete: true
Attachment #641694 - Attachment is obsolete: true
Attached patch Update MPL headers (obsolete) — Splinter Review
also updates the xpcshell test for new PeerConnectionImpl interface
QA Contact: jsmith
more cleanup and error handling on hard errors; remove debugs not needed anymore.  NOTEhg qser! the transport flow creation has gotten iffy; it succeeds about one in 5 times.  If it fails you won't see the datachannels connecting.
Attachment #654141 - Attachment is obsolete: true
Should resolve clang issues with default destructors
Attachment #645464 - Attachment is obsolete: true
Attachment #645479 - Attachment is obsolete: true
Attachment #650622 - Attachment is obsolete: true
Attachment #653180 - Attachment is obsolete: true
Attachment #654191 - Attachment is obsolete: true
Attachment #654242 - Attachment is obsolete: true
Whiteboard: [WebRTC], [blocking-webrtc+]
Attachment #655518 - Attachment is obsolete: true
Comment on attachment 659644 [details] [diff] [review]
DataChannel protocol rollup

r? to biesi/mcmanus (either or both if they think it's needed)
r? to ted for the small amount of build changes

Code is running on alder (dom/media/tests/local_data_chat.html for an example, including transferring multi-MB blobs.

API implementation patterned significantly on WebSockets (though more so at the DOM level).  Protocol was originally tested in stand-alone test programs.

Tries to keep all IO from blocking the main thread; does not (yet) use the entire OutboundEnque setup, though I'm looking at switching to that (it does add considerable complexity, and I don't know/think usrsctp UDP (or DTLS/ICE/UDP) sends need to occur on the gSocketThread).

This implementation uses the usrsctp socket in non-blocking mode, and if a call would block, internally queues it for deferred sending later.  Calls to usrsctp_connect() and listen() are proxied to temporary threads.

Even with this setup, I probably can change the timer to a notification from the stack via the callback - looking into that; need to be very sure there aren't any cases where we fail to resend.

Currently fragments large data transfers into 16K chunks to avoid blocking in SCTP - SCTP doesn't allow internally-fragmented datagrams from different streams to be interleaved, for >128/256K transfers would monopolize all traffic until the entire blob was transferred.  Currently, it does fill the network buffers, so other channels might have to wait to get a slot even with this user-level fragmentation.  If the SCTP authors can come up with a draft for datagram interleaving in SCTP, that may solve this problem and allow us to just call usrsctp_sendv(). They do intend to write that draft and implementation.  If not, then this fragmentation mechanism will need to be refined, probably by monitoring network buffer fullness and avoiding pre-queuing much data.  (This is tricky to do fairly, though the current algorithm isn't really fair either.)
Attachment #659644 - Flags: review?(ted.mielczarek)
Attachment #659644 - Flags: review?(mcmanus)
Attachment #659644 - Flags: review?(cbiesinger)
Attachment #659644 - Attachment is obsolete: true
Attachment #659644 - Flags: review?(ted.mielczarek)
Attachment #659644 - Flags: review?(mcmanus)
Attachment #659644 - Flags: review?(cbiesinger)
Attachment #662012 - Attachment is obsolete: true
Comment on attachment 662013 [details] [diff] [review]
DataChannel protocol rollup (revised with better stream renegotiate support)

The updated patch has been tested through 100 simultaneous DataChannel opens, which means resetting the association several times to bump the number of SCTP streams (and resetting in the other direction to bump the number of incoming SCTP streams - streams are unidirectional, but DataChannels are bidirectional).

Cleaned up one thread issue I found, where StartDefer() was being called on the network thread - now proxies itself to MainThread if needed (timers must be inited on mainthread).
Attachment #662013 - Flags: review?(ted.mielczarek)
Attachment #662013 - Flags: review?(mcmanus)
Attachment #662013 - Flags: review?(cbiesinger)
Comment on attachment 662013 [details] [diff] [review]
DataChannel protocol rollup (revised with better stream renegotiate support)

Review of attachment 662013 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/sctp/datachannel/DataChannel.cpp
@@ +1,5 @@
> +/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> +/* vim: set ts=2 et sw=2 tw=80: */
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this file,
> + * You can obtain one at http://mozilla.org/MPL/2.0/. */

There's a bunch of trailing whitespace in this file, you should clean that up before landing it.

::: netwerk/sctp/datachannel/Makefile.in
@@ +2,5 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +DEPTH     = ../../..

DEPTH = @DEPTH@

@@ +19,5 @@
> +
> +EXPORTS_NAMESPACES = mozilla/net
> +
> +XPIDLSRCS = \
> +  $(NULL)

Get rid of this.

@@ +42,5 @@
> +DEFINES = \
> +   -DINET=1 \
> +   -DINET6=1 \
> +   -DSCTP_DEBUG=1 \
> +   $(NULL)

two-space indent.

@@ +65,5 @@
> +endif
> +
> +
> +DEFINES_debug = \
> +   $(NULL)

Get rid of this.
Attachment #662013 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 662013 [details] [diff] [review]
DataChannel protocol rollup (revised with better stream renegotiate support)

Review of attachment 662013 [details] [diff] [review]:
-----------------------------------------------------------------

I'll leave this review to Patrick, unless you specifically want me to take a look. Just one comment:

::: netwerk/sctp/datachannel/DataChannel.cpp
@@ +40,5 @@
> +#else
> +#define ARRAY_LEN(x) (sizeof((x))/sizeof((x)[0]))
> +#endif
> +
> +// NS_ENSURE_TRUE for void functions

You can just use NS_ENSURE_TRUE(foo, /**/); There's a few cases of that in our tree:
./layout/generic/nsObjectFrame.cpp:  NS_ENSURE_TRUE(window, /**/);
./layout/style/nsHTMLStyleSheet.cpp:  NS_ENSURE_TRUE(aMapped, /**/);
./layout/base/nsCaret.cpp:  NS_ENSURE_TRUE(presShell, /**/);
Attachment #662013 - Flags: review?(cbiesinger)
Actually bug 788242 just added NS_ENSURE_TRUE_VOID.
Comment on attachment 662013 [details] [diff] [review]
DataChannel protocol rollup (revised with better stream renegotiate support)

Review of attachment 662013 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent!

Lots of notes included - I hope they help you make it stronger. Most stuff can either be done as followups (please do file bugs if you think that it should be deferred), or just changed/discussed without me doing a re-review. There are a couple important exceptions that block this from being checked in as-is imo:

* a couple uses of malloc() assume that it is infallible which I don't think is true. (Its true for new()) and bad stuff can happen - so that needs to be sorted out. Also I think there are some big allocations where you actually don't want infallible semantics, but that can be a followon.

* new blocking main thread IO just can't be introduced. sad, but not negotiable at least at my pay grade. Other pay grades may have more discretion :)

also, some notes on what it is (http://tools.ietf.org/html/draft-jesup-rtcweb-data-protocol-03 ??) that is being implemented would help future users decide whether lack of interop is a bug or not.

There are obvious limits here to the value of my review without becoming a subject matter expert in rtc specs. But that's the situation we are in. for this in particular I have read draft-data-channel and draft-data-protocol

::: netwerk/sctp/datachannel/DataChannel.cpp
@@ +42,5 @@
> +#endif
> +
> +// NS_ENSURE_TRUE for void functions
> +#define DC_ENSURE_TRUE(x)                                     \
> +  PR_BEGIN_MACRO                                              \

I thought the trend was against macros that did a return because they obscured flow control. That's certainly my pov, but its not any kind of "I insist review comment".

@@ +54,5 @@
> +
> +namespace mozilla {
> +
> +class DataChannelShutdown;
> +DataChannelShutdown *gDataChannelShutdown;

you've kinda given me the willies here.. this is an xpcom object without any references other than that held by the observer service and this weak ptr.. so its very hard to know if the weak ptr is valid or not.

I think this should be a nsCOMPtr which is set to null when you observe the net-teardown event.

@@ +67,5 @@
> +  // sctp_finish)
> +
> +  NS_DECL_ISUPPORTS
> +
> +  DataChannelShutdown() 

I would assert the methods of this class are run on the main thread (observersevice requires it)

@@ +75,5 @@
> +      if (!observerService)
> +        return;
> +
> +      // XXX Verify this is true
> +      ++mRefCnt;    // Our refcnt must be > 0 when we call this, or we'll get deleted!

I get why you do this instead of addref/release, but I wonder how kosher it is to be doing this kind of xpcom stuff out of the ctor of an object that isn't fully initialized (particularly passing this on the next line).

can you just split it into a ctor/init pair?

@@ +95,5 @@
> +    }
> +
> +  NS_IMETHODIMP Observe(nsISupports* aSubject, const char* aTopic,
> +                        const PRUnichar* aData) {
> +    if (strcmp(aTopic, "profile-change-net-teardown") == 0) {

other code also listens for xpcom-shutdown as a backup.. not sure if there is a circumstance where net-teardown could be not generated.

@@ +109,5 @@
> +
> +NS_IMPL_ISUPPORTS1(DataChannelShutdown, nsIObserver);
> +
> +
> +BufferedMsg::BufferedMsg(struct sctp_sendv_spa &spa,const char *data,

space after comma.. again in memcpy.. and recurring throughout the file.. I'm a huge offender of that stuff myself - google for jst-review.pl if you haven't seen it and it is a script that will help find that nitty stuff in your patches. it helps me (and it finds whitespace stuff too that really rubs some folks the wrong way.)

@@ +114,5 @@
> +                         uint32_t length) : mLength(length)
> +{
> +  mSpa = new sctp_sendv_spa;
> +  *mSpa = spa;
> +  char *tmp = new char[length]; // infallible malloc!

I don't have a good feel for the size of these messages.. but are they potentially large allocations that we should be using fallible malloc for (it does exist if you go hunting for it :)) with an error path? Crashing OOM is a pretty common thing for us.

@@ +127,5 @@
> +}
> +
> +static int
> +receive_cb(struct socket* sock, union sctp_sockstore addr, 
> +           void *data, size_t datalen, 

you've got both a "socket* sock" and a "void *data".. pick one.. necko runs contrary to most of gecko and uses "type *foo".. but as long as the file is consistent its fine by me either way

@@ +166,5 @@
> +
> +NS_IMPL_THREADSAFE_QUERY_INTERFACE1(DataChannelConnection,
> +                                    nsITimerCallback)
> +NS_IMPL_THREADSAFE_ADDREF(DataChannelConnection)
> +NS_IMPL_THREADSAFE_RELEASE(DataChannelConnection)

I think you can just do NS_IMPL_THREADSAFE_ISUPPORTS1(DataChannelConnection, nsITimerCallback) instead of the triplet.. no?

@@ +204,5 @@
> +      usrsctp_sysctl_set_sctp_debug_on(0 /* SCTP_DEBUG_ALL */);
> +      usrsctp_sysctl_set_sctp_blackhole(2);
> +      sctp_initialized = true;
> +
> +      gDataChannelShutdown = new DataChannelShutdown();

DataChannelShutdown can only be called on the main thread (observer service dep), so this method should assert it too. If the sctp library is threadsafe then you can presumably get rid of the lock in the init function too. (main thread only will prevent simultaneous init)

@@ +257,5 @@
> +
> +  // Update number of streams
> +  mStreamsOut.AppendElements(aNumStreams);
> +  mStreamsIn.AppendElements(aNumStreams); // make sure both are the same length
> +  for (uint32_t i = 0; i < aNumStreams; i++) {

we're making an effort to standardize on ++i in our necko for loops.. its totally unimportant but I throw it out there as fyi

@@ +278,5 @@
> +                         (socklen_t)sizeof(initmsg)) < 0) {
> +    LOG(("*** failed setsockopt SCTP_INITMSG, errno %d",errno));
> +    usrsctp_close(mMasterSocket);
> +    mMasterSocket = NULL;
> +    return false;

this close(), mastersocket = null, return false pattern is used 5 times in the function.. I would replace it either with a goto exception or a wrapper function to check the return code and do the cleanup.

also use nullptr everywhere in the file that NULL is used

@@ +321,5 @@
> +  MOZ_ASSERT(NS_IsMainThread());
> +  LOG(("%s: %p [%p] (%dms), sending deferred messages", __FUNCTION__, this, timer, mDeferTimeout));
> +
> +  if (timer == mDeferredTimer) {
> +    if (SendDeferredMessages() != 0) {

senddeferredmessages() is a bool.. so you probably don't want to confuse the issue by comparing against 0.

@@ +341,5 @@
> +}
> +
> +#ifdef MOZ_PEERCONNECTION
> +bool 
> +DataChannelConnection::ConnectDTLS(TransportFlow *aFlow, uint16_t localport, uint16_t remoteport)

Its hard for me to understand how this is used.. no callers are in alder..

@@ +398,5 @@
> +
> +  // Notify Connection open
> +  LOG(("%s: sending ON_CONNECTION for %p",__FUNCTION__,_this));
> +  _this->mNumChannels = 0;
> +  _this->mSocket = _this->mMasterSocket;  // XXX Be careful!  

what am I being careful of?

@@ +413,5 @@
> +void 
> +DataChannelConnection::PacketReceived(TransportFlow *flow, 
> +                                      const unsigned char *data, size_t len)
> +{
> +  //LOG(("%p: SCTP/DTLS received %ld bytes",this,len));

leave the logs if you can .. spdy manages to log every byte and still run sorta well enough to get a log when you need one.

@@ +446,5 @@
> +{
> +  struct sockaddr_in addr;
> +  socklen_t addr_len;
> +
> +  NS_WARN_IF_FALSE(!NS_IsMainThread(), "Blocks, do not call from main thread!!!");

so what thread is this meant to be called from? a little documentation on the threading model would help as this patch doesn't have any callers

@@ +494,5 @@
> +{
> +  struct sockaddr_in addr4;
> +  struct sockaddr_in6 addr6;
> +
> +  NS_WARN_IF_FALSE(!NS_IsMainThread(), "Blocks, do not call from main thread!!!");

again.. what thread?

@@ +643,5 @@
> +
> +int32_t
> +DataChannelConnection::SendControlMessage(void *msg, uint32_t len, uint16_t streamOut)
> +{
> +  struct sctp_sndinfo sndinfo;

mLock.AssertCurrentThreadOwns() ?

@@ +645,5 @@
> +DataChannelConnection::SendControlMessage(void *msg, uint32_t len, uint16_t streamOut)
> +{
> +  struct sctp_sndinfo sndinfo;
> +
> +  // XXX fix!  Main-thread IO

So far this is the only thing I have read that should block this code from being merged to m-c.

We can't have mechanisms that introduce main thread jank - we're investing tons of resources into removing dangerous interfaces that aren't even being currently abused - we can't add new ones.

@@ +655,5 @@
> +                    SCTP_SENDV_SNDINFO, 0) < 0) {
> +    //LOG(("***failed: sctp_sendv")); don't log because errno is a return!
> +    return (0);
> +  } else {
> +    return (1);

no need for the else branch

@@ +663,5 @@
> +
> +int32_t
> +DataChannelConnection::SendOpenResponseMessage(uint16_t streamOut, uint16_t streamIn)
> +{
> +  /* XXX: This should be encoded in a better way */

this comment doesn't help me understand the shortcomings of what's going on..

also assert lock?

@@ +677,5 @@
> +
> +int32_t
> +DataChannelConnection::SendOpenAckMessage(uint16_t streamOut)
> +{
> +  /* XXX: This should be encoded in a better way */

assert lock?

@@ +692,5 @@
> +                                              uint16_t streamOut, bool unordered,
> +                                              uint16_t prPolicy, uint32_t prValue)
> +{
> +  /* XXX: This should be encoded in a better way */
> +  char *temp = ToNewCString(label);

I think you can/should avoid the alloc/copy/free of this with BeginReading() and Length()

@@ +695,5 @@
> +  /* XXX: This should be encoded in a better way */
> +  char *temp = ToNewCString(label);
> +  int len = strlen(temp); // not including nul
> +  struct rtcweb_datachannel_open_request *req = 
> +    (struct rtcweb_datachannel_open_request*) malloc(sizeof(*req)+len);

I think you should call moz_xmalloc if you want infallible allocations

@@ +728,5 @@
> +  return SendControlMessage(req, sizeof(*req)+len, streamOut);
> +}
> +
> +// XXX This should use a separate thread (outbound queue) which should
> +// select() to know when to *try* to send data to the socket again.

yeah.. please file a bugzilla to track and expand this comment to make it clear what the current implmented algorithm is.

@@ +731,5 @@
> +// XXX This should use a separate thread (outbound queue) which should
> +// select() to know when to *try* to send data to the socket again.
> +// Alternatively, it can use a timeout, but that's guaranteed to be wrong
> +// (just not sure in what direction).  We could re-implement NSPR's
> +// PR_POLL_WRITE/etc handling... with a lot of work.

comment on what return value means here

@@ +768,5 @@
> +                                    channel));
> +        }
> +      }
> +    }
> +    if (!still_blocked &&

normal necko style would be 

if (still_blocked)
  continue;

each time you check it.. helps with the indentation and understanding the logic.

@@ +833,5 @@
> +            LOG(("error %d re-sending string",errno));
> +            failed_send = true;
> +          }
> +        } else {
> +          //LOG(("Resent buffer of %d bytes (%d)",len,result));

leave the log in imo.. or take it out if you must.. but probably not as a comment

@@ +846,5 @@
> +    }
> +  }
> +
> +  if (!still_blocked) {
> +    // adjust time for next wait

did you mean to reset this to 10 here?

@@ +850,5 @@
> +    // adjust time for next wait
> +    return false;
> +  }
> +  // adjust time?  More time for next wait if we didn't send anything, less if did
> +  // Pretty crude, but better than nothing; just to keep CPU use down

the decision is up to you obviously, but my gut says you need to move these in > 1 ms increments to have an impact

@@ +868,5 @@
> +  DataChannel *channel;
> +  uint32_t prValue;
> +  uint16_t prPolicy;
> +  uint32_t flags;
> +  nsCString label(nsDependentCString(req->label));

assert mlock held?

@@ +903,5 @@
> +
> +  OpenResponseFinish(channel);
> +}
> +
> +// Called under MutexAutoLock(mLock)

assert it!

@@ +915,5 @@
> +  if (streamOut == INVALID_STREAM) {
> +    if (!RequestMoreStreamsOut()) {
> +      /* XXX: Signal error to the other end. */
> +      mStreamsIn[channel->mStreamIn] = NULL;
> +      MutexAutoUnlock unlock(mLock);

wow.. that's exciting!

I get that you're doing that to protect against a deadlock in datachannelconnection::close(), but it isn't clear to me that there couldn't be a problem with datachannel::dtor .. I can't really say because I don't have a firm grasp on the calling stacks for all of this code.

If the concern is different threads (i.e. none of the code is reentrant on the same thread), then maybe datachannelconnection::close can test if the current thread holds the lock and only obtain it if that isn't true, then you don't have to temporarily give it up and create a racy window.

@@ +945,5 @@
> +      } else {
> +        /* XXX: Signal error to the other end. */
> +        mStreamsIn[channel->mStreamIn] = NULL;
> +        mStreamsOut[streamOut] = NULL;
> +        MutexAutoUnlock unlock(mLock);

still makes me queasy!

@@ +961,5 @@
> +{
> +  uint16_t streamOut;
> +  DataChannel *channel;
> +
> +  streamOut = ntohs(rsp->reverse_stream);

assert lock held? (I'll stop saying this)

@@ +1016,5 @@
> +DataChannelConnection::HandleUnknownMessage(uint32_t ppid, size_t length, uint16_t streamIn)
> +{
> +  /* XXX: Send an error message? */
> +  LOG(("unknown DataChannel message received: %u, len %ld on stream %lu",ppid,length,streamIn));
> +  NS_WARNING("unknown DataChannel message received");

a data condition from an untrusted endpoint shouldn't generate NS_WARNING imo. LOG it. Even the JS error console, but its not a code failure.

@@ +1048,5 @@
> +             length, channel->mStreamOut, (int)PR_MIN(length,80), buffer));
> +        length = -1; // Flag for DOMString
> +
> +        // paranoia (WebSockets does this)
> +        if (!IsUTF8(recvData, false)) {

websockets does that because the websockets spec insists on it. I think its kinda weird - so don't copy it unless you want it - that's a linear traversal of the message.

@@ +1058,5 @@
> +          channel->mBinaryBuffer.Truncate(0);
> +        break;
> +
> +      case DATA_CHANNEL_PPID_BINARY:
> +        channel->mBinaryBuffer += recvData;

can you double check the error path here? I don't think (but am not sure) that is infallible.. which is how you are using it. and given these might be large messages (true?) infallible probably isn't what you want..

@@ +1093,5 @@
> +  return;
> +}
> +
> +void
> +DataChannelConnection::HandleMessage(char *buffer, size_t length, uint32_t ppid, uint16_t streamIn)

why is this being cast through char * when its never a char *? is void * better?

@@ +1109,5 @@
> +        case DATA_CHANNEL_OPEN_REQUEST:
> +          LOG(("length %u, sizeof(*req) = %u",length,sizeof(*req)));
> +          DC_ENSURE_TRUE(length >= sizeof(*req));
> +
> +          req = (struct rtcweb_datachannel_open_request *)buffer;

try and use c++ casts throughout please.

@@ +1119,5 @@
> +          rsp = (struct rtcweb_datachannel_open_response *)buffer;
> +          HandleOpenResponseMessage(rsp, length, streamIn);
> +          break;
> +        case DATA_CHANNEL_ACK:
> +          // DC_ENSURE_TRUE(length >= sizeof(*ack)); checked above

then assert it or just leave the comment imo

@@ +1170,5 @@
> +  LOG(("Association change: streams (in/out) = (%u/%u)",
> +       sac->sac_inbound_streams, sac->sac_outbound_streams));
> +  n = sac->sac_length - sizeof(*sac);
> +  if (((sac->sac_state == SCTP_COMM_UP) ||
> +        (sac->sac_state == SCTP_RESTART)) && (n > 0)) {

n is uint.. any chance it has wrapped around? (i.e. could sac->sac_length be 0?) maybe runtime check that?

@@ +1292,5 @@
> +void
> +DataChannelConnection::HandleShutdownEvent(const struct sctp_shutdown_event *sse)
> +{
> +  LOG(("Shutdown event."));
> +  /* XXX: notify all channels. */

isn't this important to have the first time around?
can you file a followon bug?

@@ +1353,5 @@
> +  if (mStreamsResetting.IsEmpty() == 0) {
> +    return;
> +  }
> +  len = sizeof(sctp_assoc_t) + (2 + mStreamsResetting.Length()) * sizeof(uint16_t);
> +  srs = (struct sctp_reset_streams *) malloc(len); // infallible malloc

I could be wrong about this (it doesn't come up very often), but I believe malloc() is fallible. new() is infallible. moz_xmalloc is the infallible malloc and even if some macro/linker trick has made malloc() infallible moz_xmalloc is more readable imo because you don't need to remember that it isn't doing the standard libc thing or worry that your code failed the link/include correctly to get the tricks working. (that was a recurring bug on a previous project of mine that overloaded malloc).

@@ +1366,5 @@
> +  } else {
> +    mStreamsResetting.Clear();
> +  }
> +  free(srs);
> +  return;

return as last line of a void function is frowned upon in gecko. I'm used to it myself but my gecko reviews have beaten it out of me now :)

@@ +1416,5 @@
> +        }
> +      }
> +    }
> +  }
> +  return;

don't return as last line of a void functon

@@ +1431,5 @@
> +    LOG(("*** Failed increasing number of streams from %u (%u/%u)",
> +         mStreamsOut.Length(),
> +         strchg->strchange_instrms,
> +         strchg->strchange_outstrms));
> +    // XXX FIX! notify pending opens of failure

we need to open bugs for all of this kind of thing that will be implemented in another pass

@@ +1549,5 @@
> +    break;
> +  case SCTP_ADAPTATION_INDICATION:
> +    HandleAdaptationIndication(&(notif->sn_adaptation_event));
> +    break;
> +  case SCTP_PARTIAL_DELIVERY_EVENT:

do we want to at least log the cases that aren't handled?

@@ +1574,5 @@
> +   }
> + }
> +
> +
> +// NOTE: not called on main thread

awesome. assert it. what thread is it called on?

@@ +1576,5 @@
> +
> +
> +// NOTE: not called on main thread
> +int
> +DataChannelConnection::ReceiveCallback(struct socket* sock, void *data, size_t datalen,

some documentation on what the return values mean would help

@@ +1580,5 @@
> +DataChannelConnection::ReceiveCallback(struct socket* sock, void *data, size_t datalen,
> +                                       struct sctp_rcvinfo rcv, int32_t flags)
> +{
> +  if (data == NULL) {
> +    usrsctp_close(sock);

I'm a little confused at what's going on here.. why would it be null and why is closing the socket the right thing to do?

also s/NULL/nullptr

@@ +1646,5 @@
> +        channel->mState = CLOSED;
> +        NS_ERROR("Failed to request more streams");
> +        return channel;
> +      }
> +      MutexAutoUnlock unlock(mLock);

same comment as before

@@ +1747,5 @@
> +  spa.sendv_prinfo.pr_value = channel->mPrValue;
> +
> +  spa.sendv_flags = SCTP_SEND_SNDINFO_VALID | SCTP_SEND_PRINFO_VALID;
> +
> +  // XXX fix!  Main-thread IO

I presume this can block based on media access or congestion control, yes? I'm a little confused on that point because there is a queueing and deferral path, but I think that's just for exhaustion of sctp buffers?

in any event - the main thread can't block arbitrarily. that's not something that can really wait for a followup.

@@ +1788,5 @@
> +{
> +  // Since there's a limit on network buffer size and no limits on message
> +  // size, and we don't want to use EOR mode (multiple writes for a
> +  // message, but all other streams are blocked until you finish sending
> +  // this message), we need to add application-level fragmentatio of large

typo fragmentation

@@ +1841,5 @@
> +  // socket and to avoid main-thread IO that might block.  Even on a
> +  // background thread, we may not want to block on one stream's data.
> +  // I.e. run non-blocking and service multiple channels.
> +
> +  // For now as a hack, block main thread while sending it.

a little bit of me just died :(

@@ +1902,5 @@
> +  }
> +  return NS_OK;
> +}
> +
> +}

comment that this is about the namespace

::: netwerk/sctp/datachannel/DataChannel.h
@@ +29,5 @@
> +  struct socket;
> +  struct sctp_rcvinfo;
> +}
> +
> +namespace mozilla {

we're using mozilla::net .. please move to that

@@ +98,5 @@
> +
> +  DataChannelConnection(DataConnectionListener *listener);
> +  virtual ~DataChannelConnection();
> +
> +  bool Init(unsigned short aPort, uint16_t aNumStreams, bool aUsingDtls);

what does aNumStreams mean? what are the implications of different numbers? I think it is a max # of opens?

@@ +216,5 @@
> +  nsAutoTArray<uint16_t,4> mStreamsResetting;
> +
> +  struct socket *mMasterSocket;
> +  struct socket *mSocket;
> +  uint16_t mNumChannels;

I'd suggest a differnet name.. totalChannelCount maybe.. took me a while to realize it wasn't tracking a balance

@@ +227,5 @@
> +  uint16_t mRemotePort;
> +
> +  // Timer to control when we try to resend blocked messages
> +  nsCOMPtr<nsITimer> mDeferredTimer;
> +  uint32_t mDeferTimeout;

units?

@@ +231,5 @@
> +  uint32_t mDeferTimeout;
> +  bool mTimerRunning;
> +};
> +
> +class DataChannel {

in general there is a LOT of code defined in this header. can we move most of these methods that are > 1 line to the cpp?

@@ +247,5 @@
> +              const nsACString& label,
> +              uint16_t policy, uint32_t value,
> +              uint32_t flags,
> +              DataChannelListener *aListener,
> +              nsISupports *aContext) : 

preferred necko syntax

fx(a, b, c)
 : mA(a)
 , mB(b)
 , mC(c)
{}

@@ +277,5 @@
> +    }
> +
> +  // Set the listener (especially for channels created from the other side)
> +  void SetListener(DataChannelListener *aListener, nsISupports *aContext)
> +    { mContext = aContext; mListener = aListener; } // XXX Locking?

you don't seem to hold the lock when you call mListener->foo(mContext so I dno't know why you would need it here. perhaps assert main thread.

remove the comment if the question is resolved.

@@ +395,5 @@
> +      mConnection(aConnection) {}
> +
> +  NS_IMETHOD Run()
> +  {
> +    printf("OnMessage: mChannel %p mConnection %p\n",mChannel,(void *)mConnection.get());

LOG or delete

@@ +416,5 @@
> +        break;
> +    }
> +    switch (mType) {
> +      case ON_DATA:
> +        printf("OnMessage: ON_DATA:  mListener %p context %p, mLen %d\n",(void *)mChannel->mListener,(void*) mChannel->mContext,mLen);

LOG or delete

::: netwerk/sctp/datachannel/DataChannelLog.h
@@ +10,5 @@
> +#ifdef MOZ_LOGGING
> +#define FORCE_PR_LOG
> +#endif
> +
> +#if defined(PR_LOG)

Make you sure you internalize this piece of advice. I think you're ok, but in case you don't know all the ramifications: https://bugzilla.mozilla.org/show_bug.cgi?id=795897#c2

::: netwerk/sctp/datachannel/DataChannelProtocol.h
@@ +7,5 @@
> +#ifndef NETWERK_SCTP_DATACHANNEL_DATACHANNELPROTOCOL_H_
> +#define NETWERK_SCTP_DATACHANNEL_DATACHANNELPROTOCOL_H_
> +
> +#if defined(__GNUC__)
> +#define SCTP_PACKED __attribute__((packed))

its a little weird to see compiler directives showing up in leaf h files (usually there is an abstraction for them somewhere), but I don't have a particular suggestion or objection.. I might check with bsmedberg or ehsan.
Attachment #662013 - Flags: review?(mcmanus)
Attachment #663132 - Flags: review+
Blocks: 797135
(In reply to Patrick McManus [:mcmanus] from comment #44)

If I don't respond to an item here, assume I agree and will deal with it.

> Lots of notes included - I hope they help you make it stronger. Most stuff
> can either be done as followups (please do file bugs if you think that it
> should be deferred), or just changed/discussed without me doing a re-review.
> There are a couple important exceptions that block this from being checked
> in as-is imo:
> 
> * a couple uses of malloc() assume that it is infallible which I don't think
> is true. (Its true for new()) and bad stuff can happen - so that needs to be
> sorted out. Also I think there are some big allocations where you actually
> don't want infallible semantics, but that can be a followon.

Turns out we haven't made malloc() infallible yet, and no one is pushing it.  Sigh.  Easily changed to moz_xmalloc().  (I'll look at changing to fallible for large blobs/arrays as a followup.)

> * new blocking main thread IO just can't be introduced. sad, but not
> negotiable at least at my pay grade. Other pay grades may have more
> discretion :)

Understood.  We're going in preffed-off by default, so I think a bug on it is enough, blocking preffing it on - I'll verify.

> also, some notes on what it is
> (http://tools.ietf.org/html/draft-jesup-rtcweb-data-protocol-03 ??) that is
> being implemented would help future users decide whether lack of interop is
> a bug or not.
> 
> There are obvious limits here to the value of my review without becoming a
> subject matter expert in rtc specs. But that's the situation we are in. for
> this in particular I have read draft-data-channel and draft-data-protocol
> 
> ::: netwerk/sctp/datachannel/DataChannel.cpp
> @@ +42,5 @@
> > +#endif
> > +
> > +// NS_ENSURE_TRUE for void functions
> > +#define DC_ENSURE_TRUE(x)                                     \
> > +  PR_BEGIN_MACRO                                              \
> 
> I thought the trend was against macros that did a return because they
> obscured flow control. That's certainly my pov, but its not any kind of "I
> insist review comment".

I prefer to see returns myself, but certainly that's the common idiom.  As I've gotten used to seeing it, I find it less annoying, but I understand.  I'll chat with some people.  At minimum I'll switch from DC_ENSURE_SUCCESS(x) to NS_ENSURE_SUCCESS(x, /* */) which is used a lot

> @@ +54,5 @@
> > +
> > +namespace mozilla {
> > +
> > +class DataChannelShutdown;
> > +DataChannelShutdown *gDataChannelShutdown;
> 
> you've kinda given me the willies here.. this is an xpcom object without any
> references other than that held by the observer service and this weak ptr..
> so its very hard to know if the weak ptr is valid or not.
> 
> I think this should be a nsCOMPtr which is set to null when you observe the
> net-teardown event.

I can do that or something similar; I was copying usage from elsewhere.


> @@ +75,5 @@
> > +      if (!observerService)
> > +        return;
> > +
> > +      // XXX Verify this is true
> > +      ++mRefCnt;    // Our refcnt must be > 0 when we call this, or we'll get deleted!
> 
> I get why you do this instead of addref/release, but I wonder how kosher it
> is to be doing this kind of xpcom stuff out of the ctor of an object that
> isn't fully initialized (particularly passing this on the next line).
> 
> can you just split it into a ctor/init pair?

I can simply clean this up some.  I was using nsPrefBranch as a model, but there are better ones.

> 
> @@ +95,5 @@
> > +    }
> > +
> > +  NS_IMETHODIMP Observe(nsISupports* aSubject, const char* aTopic,
> > +                        const PRUnichar* aData) {
> > +    if (strcmp(aTopic, "profile-change-net-teardown") == 0) {
> 
> other code also listens for xpcom-shutdown as a backup.. not sure if there
> is a circumstance where net-teardown could be not generated.

Well, if we're not generating it there's a bug, and while we could clean up memory allocs, it would be too late to cleanly close sockets and tell the other side we're gone.

> 
> @@ +109,5 @@
> > +
> > +NS_IMPL_ISUPPORTS1(DataChannelShutdown, nsIObserver);
> > +
> > +
> > +BufferedMsg::BufferedMsg(struct sctp_sendv_spa &spa,const char *data,
> 
> space after comma.. again in memcpy.. and recurring throughout the file..
> I'm a huge offender of that stuff myself - google for jst-review.pl if you
> haven't seen it and it is a script that will help find that nitty stuff in
> your patches. it helps me (and it finds whitespace stuff too that really
> rubs some folks the wrong way.)

I've used the virtual-jst tool before; got out of the habit.  I'll at least clean up the comma-space stuff.

> 
> @@ +114,5 @@
> > +                         uint32_t length) : mLength(length)
> > +{
> > +  mSpa = new sctp_sendv_spa;
> > +  *mSpa = spa;
> > +  char *tmp = new char[length]; // infallible malloc!
> 
> I don't have a good feel for the size of these messages.. but are they
> potentially large allocations that we should be using fallible malloc for
> (it does exist if you go hunting for it :)) with an error path? Crashing OOM
> is a pretty common thing for us.

Could be MB, in theory (though not a great idea in practice) could be many, many MB or even more.  I'll move them to fallible-malloc in a followup (I'll file a bug).  Was on my list, but not down in a bug.

> 
> @@ +127,5 @@
> > +}
> > +
> > +static int
> > +receive_cb(struct socket* sock, union sctp_sockstore addr, 
> > +           void *data, size_t datalen, 
> 
> you've got both a "socket* sock" and a "void *data".. pick one.. necko runs
> contrary to most of gecko and uses "type *foo".. but as long as the file is
> consistent its fine by me either way

Since we're in necko, "foo *bar" it is.  I far prefer that myself, and while smaug made me switch the DataChannel DOM, we're horribly inconsistent within the tree (and within files; a bunch of classes often are exceptions because they read better with "foo *bar").


> @@ +257,5 @@
> > +
> > +  // Update number of streams
> > +  mStreamsOut.AppendElements(aNumStreams);
> > +  mStreamsIn.AppendElements(aNumStreams); // make sure both are the same length
> > +  for (uint32_t i = 0; i < aNumStreams; i++) {
> 
> we're making an effort to standardize on ++i in our necko for loops.. its
> totally unimportant but I throw it out there as fyi

Easy change, done.

> @@ +341,5 @@
> > +}
> > +
> > +#ifdef MOZ_PEERCONNECTION
> > +bool 
> > +DataChannelConnection::ConnectDTLS(TransportFlow *aFlow, uint16_t localport, uint16_t remoteport)
> 
> Its hard for me to understand how this is used.. no callers are in alder..

media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
The DOM/PeerConnection-level interface is somewhat temporary, as we're waiting for the SCTP library people to let us use the same port on both sides of the DTLS connection (currently they must be different).

> @@ +398,5 @@
> > +
> > +  // Notify Connection open
> > +  LOG(("%s: sending ON_CONNECTION for %p",__FUNCTION__,_this));
> > +  _this->mNumChannels = 0;
> > +  _this->mSocket = _this->mMasterSocket;  // XXX Be careful!  
> 
> what am I being careful of?

Old note to myself basically to be careful with socket vs master socket.  Removed.

> @@ +413,5 @@
> > +void 
> > +DataChannelConnection::PacketReceived(TransportFlow *flow, 
> > +                                      const unsigned char *data, size_t len)
> > +{
> > +  //LOG(("%p: SCTP/DTLS received %ld bytes",this,len));
> 
> leave the logs if you can .. spdy manages to log every byte and still run
> sorta well enough to get a log when you need one.

These are available via mtransport:5, so I can leave them off here.  Turned back on some others, but I want the logs to be useful for me to read through; some stuff just obscures things or may change timing.

> 
> @@ +446,5 @@
> > +{
> > +  struct sockaddr_in addr;
> > +  socklen_t addr_len;
> > +
> > +  NS_WARN_IF_FALSE(!NS_IsMainThread(), "Blocks, do not call from main thread!!!");
> 
> so what thread is this meant to be called from? a little documentation on
> the threading model would help as this patch doesn't have any callers

Temporary thread to get blocking IO off mainthread

PeerConnection calls this:
  listenPort = port;
  PR_CreateThread(
    PR_SYSTEM_THREAD,
    PeerConnectionImpl::ListenThread, this,
    PR_PRIORITY_NORMAL,
    PR_GLOBAL_THREAD,
    PR_JOINABLE_THREAD, 0
  );

And ListenThread() calls Listen().
Ditto for Connect.

> @@ +643,5 @@
> > +
> > +int32_t
> > +DataChannelConnection::SendControlMessage(void *msg, uint32_t len, uint16_t streamOut)
> > +{
> > +  struct sctp_sndinfo sndinfo;
> 
> mLock.AssertCurrentThreadOwns() ?

We can; we do that in HandleMessage().  However, this made me realize that SendDeferredMessages() doesn't have the lock.  Added it.

> 
> @@ +645,5 @@
> > +DataChannelConnection::SendControlMessage(void *msg, uint32_t len, uint16_t streamOut)
> > +{
> > +  struct sctp_sndinfo sndinfo;
> > +
> > +  // XXX fix!  Main-thread IO
> 
> So far this is the only thing I have read that should block this code from
> being merged to m-c.
> 
> We can't have mechanisms that introduce main thread jank - we're investing
> tons of resources into removing dangerous interfaces that aren't even being
> currently abused - we can't add new ones.

I'll propose landing it as-is, and adding a to resolve this.  I have a partial patch based on WebSockets (proxy all IO), but it's a fair bit of rewrite.

The current code does main-thread IO - but it does it in non-blocking mode so it wouldn't block main-thread. The downside is it may not be 100% efficient in transferring very large blobs due to using a timer.

> @@ +663,5 @@
> > +
> > +int32_t
> > +DataChannelConnection::SendOpenResponseMessage(uint16_t streamOut, uint16_t streamIn)
> > +{
> > +  /* XXX: This should be encoded in a better way */
> 
> this comment doesn't help me understand the shortcomings of what's going on..

Irene added that; removed

> 
> also assert lock?

Nope, it doesn't touch the channel or connection data structures (the SendFooMessage functions mostly just format a packet from their args)

> @@ +692,5 @@
> > +                                              uint16_t streamOut, bool unordered,
> > +                                              uint16_t prPolicy, uint32_t prValue)
> > +{
> > +  /* XXX: This should be encoded in a better way */
> > +  char *temp = ToNewCString(label);
> 
> I think you can/should avoid the alloc/copy/free of this with BeginReading()
> and Length()

Thanks!

> @@ +695,5 @@
> > +  /* XXX: This should be encoded in a better way */
> > +  char *temp = ToNewCString(label);
> > +  int len = strlen(temp); // not including nul
> > +  struct rtcweb_datachannel_open_request *req = 
> > +    (struct rtcweb_datachannel_open_request*) malloc(sizeof(*req)+len);
> 
> I think you should call moz_xmalloc if you want infallible allocations

Yeah, somewhere I misread that "infallible malloc has landed" as "malloc() is infallible". Hard to see how I could make that error...  ;-)

> 
> @@ +728,5 @@
> > +  return SendControlMessage(req, sizeof(*req)+len, streamOut);
> > +}
> > +
> > +// XXX This should use a separate thread (outbound queue) which should
> > +// select() to know when to *try* to send data to the socket again.
> 
> yeah.. please file a bugzilla to track and expand this comment to make it
> clear what the current implmented algorithm is.

Ok

> @@ +768,5 @@
> > +                                    channel));
> > +        }
> > +      }
> > +    }
> > +    if (!still_blocked &&
> 
> normal necko style would be 
> 
> if (still_blocked)
>   continue;
> 
> each time you check it.. helps with the indentation and understanding the
> logic.

Changed to "if (still_blocked) break;" (on two lines), after each if block, and removed it from the for() loop.

> @@ +846,5 @@
> > +    }
> > +  }
> > +
> > +  if (!still_blocked) {
> > +    // adjust time for next wait
> 
> did you mean to reset this to 10 here?

No, changed comment to:
    // mDeferTimeout becomes an estimate of how long we need to wait next time we block
 
> @@ +850,5 @@
> > +    // adjust time for next wait
> > +    return false;
> > +  }
> > +  // adjust time?  More time for next wait if we didn't send anything, less if did
> > +  // Pretty crude, but better than nothing; just to keep CPU use down
> 
> the decision is up to you obviously, but my gut says you need to move these
> in > 1 ms increments to have an impact

We're likely to be sitting on or near 10ms in most cases, given typical connection speeds.

> @@ +915,5 @@
> > +  if (streamOut == INVALID_STREAM) {
> > +    if (!RequestMoreStreamsOut()) {
> > +      /* XXX: Signal error to the other end. */
> > +      mStreamsIn[channel->mStreamIn] = NULL;
> > +      MutexAutoUnlock unlock(mLock);
> 
> wow.. that's exciting!
> 
> I get that you're doing that to protect against a deadlock in
> datachannelconnection::close(), but it isn't clear to me that there couldn't
> be a problem with datachannel::dtor .. I can't really say because I don't
> have a firm grasp on the calling stacks for all of this code.
> 
> If the concern is different threads (i.e. none of the code is reentrant on
> the same thread), then maybe datachannelconnection::close can test if the
> current thread holds the lock and only obtain it if that isn't true, then
> you don't have to temporarily give it up and create a racy window.

Yeah, there's no obvious way to test if we own the lock in Mutex.h.  We'd have to add our own "I own the lock" thread pointer.

However, this was safe because ~DataChannel() does "if (.... || mStreamOut == INVALID_STREAM) return;".  In fact, it's so safe that we can remove the unlock (and I've done so and commented it).

> 
> @@ +945,5 @@
> > +      } else {
> > +        /* XXX: Signal error to the other end. */
> > +        mStreamsIn[channel->mStreamIn] = NULL;
> > +        mStreamsOut[streamOut] = NULL;
> > +        MutexAutoUnlock unlock(mLock);
> 
> still makes me queasy!

Ditto, with channel->mStreamOut = INVALID_STREAM after removing it from the array and a comment.

> @@ +1016,5 @@
> > +DataChannelConnection::HandleUnknownMessage(uint32_t ppid, size_t length, uint16_t streamIn)
> > +{
> > +  /* XXX: Send an error message? */
> > +  LOG(("unknown DataChannel message received: %u, len %ld on stream %lu",ppid,length,streamIn));
> > +  NS_WARNING("unknown DataChannel message received");
> 
> a data condition from an untrusted endpoint shouldn't generate NS_WARNING
> imo. LOG it. Even the JS error console, but its not a code failure.

Know how to do that (JS error console)?

> @@ +1048,5 @@
> > +             length, channel->mStreamOut, (int)PR_MIN(length,80), buffer));
> > +        length = -1; // Flag for DOMString
> > +
> > +        // paranoia (WebSockets does this)
> > +        if (!IsUTF8(recvData, false)) {
> 
> websockets does that because the websockets spec insists on it. I think its
> kinda weird - so don't copy it unless you want it - that's a linear
> traversal of the message.

Ok.  We're going to end up converting it to a string; I imagine that conversion will stop or skip over bad data and then deliver it.  Changed to a comment.

> 
> @@ +1058,5 @@
> > +          channel->mBinaryBuffer.Truncate(0);
> > +        break;
> > +
> > +      case DATA_CHANNEL_PPID_BINARY:
> > +        channel->mBinaryBuffer += recvData;
> 
> can you double check the error path here? I don't think (but am not sure)
> that is infallible.. which is how you are using it. and given these might be
> large messages (true?) infallible probably isn't what you want..

Correct, those are infallible.  Followup bug.

> 
> @@ +1093,5 @@
> > +  return;
> > +}
> > +
> > +void
> > +DataChannelConnection::HandleMessage(char *buffer, size_t length, uint32_t ppid, uint16_t streamIn)
> 
> why is this being cast through char * when its never a char *? is void *
> better?

Done, and made more of them const.

> 
> @@ +1170,5 @@
> > +  LOG(("Association change: streams (in/out) = (%u/%u)",
> > +       sac->sac_inbound_streams, sac->sac_outbound_streams));
> > +  n = sac->sac_length - sizeof(*sac);
> > +  if (((sac->sac_state == SCTP_COMM_UP) ||
> > +        (sac->sac_state == SCTP_RESTART)) && (n > 0)) {
> 
> n is uint.. any chance it has wrapped around? (i.e. could sac->sac_length be
> 0?) maybe runtime check that?

Added a check.

> @@ +1292,5 @@
> > +void
> > +DataChannelConnection::HandleShutdownEvent(const struct sctp_shutdown_event *sse)
> > +{
> > +  LOG(("Shutdown event."));
> > +  /* XXX: notify all channels. */
> 
> isn't this important to have the first time around?
> can you file a followon bug?

Yeah, though it in reality teardowns are at the signaling level, rendering association teardowns largely moot - attempts to use the association after this will simply error out.  But we should do so anyways...  Will comment and file a bug.

> @@ +1366,5 @@
> > +  } else {
> > +    mStreamsResetting.Clear();
> > +  }
> > +  free(srs);
> > +  return;
> 
> return as last line of a void function is frowned upon in gecko. I'm used to
> it myself but my gecko reviews have beaten it out of me now :)

Removed them all...  I don't care; they were added by Irene largely IIRC.

> @@ +1574,5 @@
> > +   }
> > + }
> > +
> > +
> > +// NOTE: not called on main thread
> 
> awesome. assert it. what thread is it called on?

SCTP's callback thread

> 
> @@ +1576,5 @@
> > +
> > +
> > +// NOTE: not called on main thread
> > +int
> > +DataChannelConnection::ReceiveCallback(struct socket* sock, void *data, size_t datalen,
> 
> some documentation on what the return values mean would help

Yeah.  It's defined as an int, but the sctp library ignores the result.... :-/

> 
> @@ +1580,5 @@
> > +DataChannelConnection::ReceiveCallback(struct socket* sock, void *data, size_t datalen,
> > +                                       struct sctp_rcvinfo rcv, int32_t flags)
> > +{
> > +  if (data == NULL) {
> > +    usrsctp_close(sock);
> 
> I'm a little confused at what's going on here.. why would it be null and why
> is closing the socket the right thing to do?

That's actually how it notifies you it has finished asynchronous shutdown.  Adding comment.


> @@ +1747,5 @@
> > +  spa.sendv_prinfo.pr_value = channel->mPrValue;
> > +
> > +  spa.sendv_flags = SCTP_SEND_SNDINFO_VALID | SCTP_SEND_PRINFO_VALID;
> > +
> > +  // XXX fix!  Main-thread IO
> 
> I presume this can block based on media access or congestion control, yes?
> I'm a little confused on that point because there is a queueing and deferral
> path, but I think that's just for exhaustion of sctp buffers?
> 
> in any event - the main thread can't block arbitrarily. that's not something
> that can really wait for a followup.

Per above, it doesn't block.  Changed comment to // Note: Main-thread IO, but doesn't block

> 
> @@ +1788,5 @@
> > +{
> > +  // Since there's a limit on network buffer size and no limits on message
> > +  // size, and we don't want to use EOR mode (multiple writes for a
> > +  // message, but all other streams are blocked until you finish sending
> > +  // this message), we need to add application-level fragmentatio of large
> 
> typo fragmentation
> 
> @@ +1841,5 @@
> > +  // socket and to avoid main-thread IO that might block.  Even on a
> > +  // background thread, we may not want to block on one stream's data.
> > +  // I.e. run non-blocking and service multiple channels.
> > +
> > +  // For now as a hack, block main thread while sending it.
> 
> a little bit of me just died :(

Please bring it back to life; that's an out-of-date comment.  We don't block. :-)  Fixed comment.

> 
> @@ +1902,5 @@
> > +  }
> > +  return NS_OK;
> > +}
> > +
> > +}
> 
> comment that this is about the namespace
> 
> ::: netwerk/sctp/datachannel/DataChannel.h
> @@ +29,5 @@
> > +  struct socket;
> > +  struct sctp_rcvinfo;
> > +}
> > +
> > +namespace mozilla {
> 
> we're using mozilla::net .. please move to that

Per style updates from months ago, all new things should be in plain mozilla with few exceptions.

https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style#Namespaces

> 
> @@ +98,5 @@
> > +
> > +  DataChannelConnection(DataConnectionListener *listener);
> > +  virtual ~DataChannelConnection();
> > +
> > +  bool Init(unsigned short aPort, uint16_t aNumStreams, bool aUsingDtls);
> 
> what does aNumStreams mean? what are the implications of different numbers?
> I think it is a max # of opens?

Initial number of streams to negotiate; if you need more simultaneous it will renegotiate on the fly

> 
> @@ +216,5 @@
> > +  nsAutoTArray<uint16_t,4> mStreamsResetting;
> > +
> > +  struct socket *mMasterSocket;
> > +  struct socket *mSocket;
> > +  uint16_t mNumChannels;
> 
> I'd suggest a differnet name.. totalChannelCount maybe.. took me a while to
> realize it wasn't tracking a balance

Cruft; removed it.

 
> @@ +231,5 @@
> > +  uint32_t mDeferTimeout;
> > +  bool mTimerRunning;
> > +};
> > +
> > +class DataChannel {
> 
> in general there is a LOT of code defined in this header. can we move most
> of these methods that are > 1 line to the cpp?

Moved the three largest; left the "if (valid) return mConnection->blah(); else return false" type in the header.

> @@ +277,5 @@
> > +    }
> > +
> > +  // Set the listener (especially for channels created from the other side)
> > +  void SetListener(DataChannelListener *aListener, nsISupports *aContext)
> > +    { mContext = aContext; mListener = aListener; } // XXX Locking?
> 
> you don't seem to hold the lock when you call mListener->foo(mContext so I
> dno't know why you would need it here. perhaps assert main thread.

Asserted that mListener is only set once instead.  The only reason this exists is because we can't know the Listener/context when the channel is created from the other side; we create the channel and pass up to be wrapped by a DOM object which calls SetListener.
 

> ::: netwerk/sctp/datachannel/DataChannelProtocol.h
> @@ +7,5 @@
> > +#ifndef NETWERK_SCTP_DATACHANNEL_DATACHANNELPROTOCOL_H_
> > +#define NETWERK_SCTP_DATACHANNEL_DATACHANNELPROTOCOL_H_
> > +
> > +#if defined(__GNUC__)
> > +#define SCTP_PACKED __attribute__((packed))
> 
> its a little weird to see compiler directives showing up in leaf h files
> (usually there is an abstraction for them somewhere), but I don't have a
> particular suggestion or objection.. I might check with bsmedberg or ehsan.

Since this is a binary protocol, we can't leave packing up to the compiler or we could silently not be sending the correct bit patterns.  It's pack or drop down to inserting bytes in a uint8_t array.  :-)
ehsan says this is good (and I had added the missing #pragma pack(pop) to the bottom of the file for MSVC after I put the patch up).
Attachment #667322 - Attachment is obsolete: true
Comment on attachment 667326 [details] [diff] [review]
Update DataChannel for review comments, fix missing stream setting

Marking for review by mcmanus mostly for the Shutdown handling
Attachment #667326 - Flags: review?(mcmanus)
(
> 
> The current code does main-thread IO - but it does it in non-blocking mode
> so it wouldn't block main-thread. The downside is it may not be 100%
> efficient in transferring very large blobs due to using a timer.

that's totally fine by me.

> 
> Know how to do that (JS error console)?

http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/ConnectionDiagnostics.cpp#25
Comment on attachment 667326 [details] [diff] [review]
Update DataChannel for review comments, fix missing stream setting

Review of attachment 667326 [details] [diff] [review]:
-----------------------------------------------------------------

awesome sauce.. r=mcmanus with the strcpy() bit fixed up.

::: netwerk/sctp/datachannel/DataChannel.cpp
@@ +686,5 @@
>      req->flags |= htons(DATA_CHANNEL_FLAG_OUT_OF_ORDER_ALLOWED);
>    }
>    req->reliability_params = htons((uint16_t)prValue); /* XXX Why 16-bit */
>    req->priority = htons(0); /* XXX: add support */
> +  strcpy(&req->label[0], temp);

temp is not necessarily null terminated.. so you have to do the memcpy and add a null yourself.

@@ +1370,4 @@
>    } else {
>      mStreamsResetting.Clear();
>    }
>    free(srs);

s/free/moz_free when paired with moz_xmalloc  (which admittedly is a pointless indirection atm)
Attachment #667326 - Flags: review?(mcmanus) → review+
Comment on attachment 662013 [details] [diff] [review]
DataChannel protocol rollup (revised with better stream renegotiate support)

r=mcmanus with the 2 delta patches on this bug also applied
Attachment #662013 - Flags: review+
(In reply to Patrick McManus [:mcmanus] from comment #50)
> Comment on attachment 667326 [details] [diff] [review]
> Update DataChannel for review comments, fix missing stream setting
> 
> Review of attachment 667326 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> awesome sauce.. r=mcmanus with the strcpy() bit fixed up.
> 
> ::: netwerk/sctp/datachannel/DataChannel.cpp
> @@ +686,5 @@
> >      req->flags |= htons(DATA_CHANNEL_FLAG_OUT_OF_ORDER_ALLOWED);
> >    }
> >    req->reliability_params = htons((uint16_t)prValue); /* XXX Why 16-bit */
> >    req->priority = htons(0); /* XXX: add support */
> > +  strcpy(&req->label[0], temp);
> 
> temp is not necessarily null terminated.. so you have to do the memcpy and
> add a null yourself.

Even better:
strcpy(&req->label[0], PromiseFlatCString(label).get());
and get rid of temp

> @@ +1370,4 @@
> >    } else {
> >      mStreamsResetting.Clear();
> >    }
> >    free(srs);
> 
> s/free/moz_free when paired with moz_xmalloc  (which admittedly is a
> pointless indirection atm)

And this pointed out the other moz_xmalloc in there was leaking requests.
This landed on inbound (without a changeset link posted here, apparently...), but had to be backed out due to mochitest-3 failures.
https://hg.mozilla.org/integration/mozilla-inbound/rev/94279b84aee8

https://tbpl.mozilla.org/php/getParsedLog.php?id=15791309&tree=Mozilla-Inbound

13366 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/tests/mochitest/general/test_interfaces.html | Unexpected interface name in global scope: DataChannel
(In reply to Randell Jesup [:jesup] from comment #52)
> (In reply to Patrick McManus [:mcmanus] from comment #50)
> > Comment on attachment 667326 [details] [diff] [review]
> > Update DataChannel for review comments, fix missing stream setting
> > 
> > Review of attachment 667326 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > awesome sauce.. r=mcmanus with the strcpy() bit fixed up.
> > 
> > ::: netwerk/sctp/datachannel/DataChannel.cpp
> > @@ +686,5 @@
> > >      req->flags |= htons(DATA_CHANNEL_FLAG_OUT_OF_ORDER_ALLOWED);
> > >    }
> > >    req->reliability_params = htons((uint16_t)prValue); /* XXX Why 16-bit */
> > >    req->priority = htons(0); /* XXX: add support */
> > > +  strcpy(&req->label[0], temp);
> > 
> > temp is not necessarily null terminated.. so you have to do the memcpy and
> > add a null yourself.
> 
> Even better:
> strcpy(&req->label[0], PromiseFlatCString(label).get());
> and get rid of temp
> 

imo not better. Promise* make an allocation and a copy of the string (if necessary) to make that thing null terminated. You don't need to do that just to copy it again.
(In reply to Patrick McManus [:mcmanus] from comment #54)

> > Even better:
> > strcpy(&req->label[0], PromiseFlatCString(label).get());
> > and get rid of temp
> > 
> 
> imo not better. Promise* make an allocation and a copy of the string (if
> necessary) to make that thing null terminated. You don't need to do that
> just to copy it again.

You're right; I'm used to it helping avoid (some) allocations, but since we're copying to allocated memory it's at best a wash.  I'll fix that in a followup.
https://hg.mozilla.org/mozilla-central/rev/37ac46f7dd40
Assignee: nobody → rjesup
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Windows8+VS2008SP1

I get this error message:

c:/t1/hg/comm-central/mozilla/netwerk/sctp/datachannel/DataChannel.cpp(605) : error C2065: 'EALREADY' : undeclared identifier

Adding the following to DataChannelProtocol.h allows me to continue building:
#ifndef EALREADY
#define EALREADY                WSAEALREADY
#endif
(In reply to Philip Chee from comment #57)
> Windows8+VS2008SP1
> 
> I get this error message:
> 
> c:/t1/hg/comm-central/mozilla/netwerk/sctp/datachannel/DataChannel.cpp(605)
> : error C2065: 'EALREADY' : undeclared identifier
> 
> Adding the following to DataChannelProtocol.h allows me to continue building:
> #ifndef EALREADY
> #define EALREADY                WSAEALREADY
> #endif

I got this on my VC2008 build too, although I worked around it by adding the define to DataChannel.cpp itself.
That's Bug 798352

I'll resolve that ASAP; it was working on VC8 and VC9 until the latest update of usrsctp I believe
Will use the DOM bugs to track end-to-end testing.
Whiteboard: [WebRTC], [blocking-webrtc+] → [WebRTC], [blocking-webrtc+] [qa-]
You need to log in before you can comment on or make changes to this bug.