Last Comment Bug 837035 - Hook up DataChannel SDP to implementation (in particular port and options)
: Hook up DataChannel SDP to implementation (in particular port and options)
Status: VERIFIED FIXED
[WebRTC],[blocking-webrtc+][aurora-ne...
:
Product: Core
Classification: Components
Component: WebRTC: Networking (show other bugs)
: 17 Branch
: All All
-- normal (vote)
: mozilla22
Assigned To: Ethan Hugg [:ehugg]
: Jason Smith [:jsmith]
: Maire Reavy [:mreavy] Please needinfo me
Mentors:
Depends on: 786152 848966
Blocks: 796894 851293 856319 856434
  Show dependency treegraph
 
Reported: 2013-01-31 21:15 PST by Randell Jesup [:jesup]
Modified: 2013-04-13 14:03 PDT (History)
15 users (show)
jsmith: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Support DataChannels with SDP Negotiation (15.63 KB, patch)
2013-03-01 09:18 PST, Ethan Hugg [:ehugg]
no flags Details | Diff | Splinter Review
Support DataChannels with SDP Negotiation (21.71 KB, patch)
2013-03-01 15:15 PST, Ethan Hugg [:ehugg]
no flags Details | Diff | Splinter Review
Support DataChannels with SDP Negotiation (24.01 KB, patch)
2013-03-04 14:06 PST, Ethan Hugg [:ehugg]
rjesup: review+
Details | Diff | Splinter Review
initial patch didn't create the TransportFlow. Also now passes the m-line index to retrieve the correct flow (17.92 KB, patch)
2013-03-08 01:32 PST, Randell Jesup [:jesup]
ethanhugg: review+
Details | Diff | Splinter Review
Version of local datachannel test page with no audio streams (7.52 KB, text/html)
2013-03-08 01:35 PST, Randell Jesup [:jesup]
no flags Details
Support DataChannels with SDP Negotiation (24.83 KB, patch)
2013-03-28 10:24 PDT, Ethan Hugg [:ehugg]
ethanhugg: review+
Details | Diff | Splinter Review
Support DataChannels with SDP Negotiation (25.82 KB, patch)
2013-03-30 05:04 PDT, Randell Jesup [:jesup]
ethanhugg: review+
Details | Diff | Splinter Review

Description User image Randell Jesup [:jesup] 2013-01-31 21:15:01 PST
In bug 786152 we added support for DataChannel SDP in negotiations, but we need to hook up the port numbers so we can get rid of ConnectDataChannels() and have it use the correct Transport (right now it just searches the transport list for a connected transport to use).

We also need to pass the number of streams.

Follow-on: Implement dynamic negotiation - only offer DataChannel if createDataChannel() is called before CreateOffer, and re-negotiate to add it if it's called initially after CreateOffer/CreateAnswer.  Also pass pre-opened channels in the initial SDP (see SCTP SDP draft in MMUSIC).
Comment 1 User image Ethan Hugg [:ehugg] 2013-03-01 09:18:55 PST
Created attachment 719996 [details] [diff] [review]
Support DataChannels with SDP Negotiation
Comment 2 User image Ethan Hugg [:ehugg] 2013-03-01 09:19:29 PST
Comment on attachment 719996 [details] [diff] [review]
Support DataChannels with SDP Negotiation


Work in progress, still more to do.
Comment 3 User image Ethan Hugg [:ehugg] 2013-03-01 15:15:44 PST
Created attachment 720157 [details] [diff] [review]
Support DataChannels with SDP Negotiation
Comment 4 User image Ethan Hugg [:ehugg] 2013-03-04 14:06:42 PST
Created attachment 720886 [details] [diff] [review]
Support DataChannels with SDP Negotiation
Comment 5 User image Randell Jesup [:jesup] 2013-03-06 06:13:33 PST
Comment on attachment 720886 [details] [diff] [review]
Support DataChannels with SDP Negotiation

Review of attachment 720886 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!  r+ with minor nits

::: media/webrtc/signaling/src/sipcc/core/gsm/gsm_sdp.c
@@ +3333,5 @@
> +
> +    if (offer) {
> +        /* Increment port for answer SDP */
> +        media->local_datachannel_port = media->remote_datachannel_port + 1;
> +    }

This is needed today, but when I pull updated source from the repo this requirement will be gone.  Mark for removal with a comment

::: media/webrtc/signaling/src/sipcc/core/sdp/sdp_attr_access.c
@@ +6658,5 @@
>              CSFLogError(logTag, "%s fmtp attribute, level %u instance %u "
>                        "not found.", sdp_p->debug_str, level, inst_num);
>          }
>          sdp_p->conf_p->num_invalid_param++;
> +        *val = 16;

Please move to a #define somewhere
Comment 7 User image Ryan VanderMeulen [:RyanVM] 2013-03-07 09:53:18 PST
https://hg.mozilla.org/mozilla-central/rev/789a60e48ee2
Comment 8 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2013-03-07 13:05:28 PST
Something which didn't work before was a datachannel connection without media. With this patch in place at least that is working now. So I can work on the mochitest (bug 796894) now. Thanks!
Comment 9 User image Ryan VanderMeulen [:RyanVM] 2013-03-07 16:37:54 PST
Backed out for now while we investigate bug 848966. I'll re-land whatever comes up clean.
https://hg.mozilla.org/integration/mozilla-inbound/rev/6cdb9f903c02
Comment 10 User image Randell Jesup [:jesup] 2013-03-08 01:32:11 PST
Created attachment 722687 [details] [diff] [review]
initial patch didn't create the TransportFlow.  Also now passes the m-line index to retrieve the correct flow
Comment 11 User image Randell Jesup [:jesup] 2013-03-08 01:33:57 PST
Comment on attachment 722687 [details] [diff] [review]
initial patch didn't create the TransportFlow.  Also now passes the m-line index to retrieve the correct flow

Tested with no-audio testcase (which failed with the old patch only).  This doesn't replace the old patch, this augments it.  Ignore the random MediaManager LOG() that crept in
Comment 12 User image Randell Jesup [:jesup] 2013-03-08 01:35:10 PST
Created attachment 722692 [details]
Version of local datachannel test page with no audio streams
Comment 13 User image Randell Jesup [:jesup] 2013-03-08 01:35:41 PST
leave-open when the first patch relands
Comment 14 User image Randell Jesup [:jesup] 2013-03-08 06:47:31 PST
We should hold off landing the second patch until we're ready to tell people we're breaking backwards compatibility at the binary level.  Also, we should check compatibility at the source level - even the current patch probably breaks anyone not using the same method of choosing 5000 or 5001 for ports via ConnectDataConnection - and so we may want to combine forcing them to change their code for that with the "ondatachannel should be an Event" bug, and break source compat only once.

So, if testing shows this breaks source compatibility, we should a) warn people (Hacks perhaps, or Maire's blog plus targeted warnings), and b) then land it and also break binary interop.

Alternatively, and likely better: re-hook-up ConnectDataConnection to close the current mDataConnection, and then re-open using the old flow-selection algorithm (and ignoring any m=application lines - you can easily test if there's an rtcp flow in the table to know) and the user-supplied ports.  This gives us binary compatibility until we remove ConnectDataConnection, which we can then warn people is deprecated and going away in a week or two.  We could then combine taking that compat patch out with the ondatachannel -> event change.

I like this final idea.  Comments?  Ethan, if you think this works, can you do a patch for it?  I'm traveling, but I can try to review a patch tonight, or for this part probably ekr or abr could review; see who's around.

Let's do that part as a separate patch on top of the others, so it's easy to back out.
Comment 15 User image Maire Reavy [:mreavy] Please needinfo me 2013-03-08 07:05:00 PST
(In reply to Randell Jesup [:jesup] from comment #14)
> Alternatively, and likely better: re-hook-up ConnectDataConnection to close
> the current mDataConnection, and then re-open using the old flow-selection
> algorithm (and ignoring any m=application lines - you can easily test if
> there's an rtcp flow in the table to know) and the user-supplied ports. 
> This gives us binary compatibility until we remove ConnectDataConnection,
> which we can then warn people is deprecated and going away in a week or two.
> We could then combine taking that compat patch out with the ondatachannel ->
> event change.
> 
> I like this final idea.  Comments?  

I like this idea also.  Anytime we can warn people before breaking backwards compat is a win, assuming there aren't too many downsides (and it seems like there aren't many here).  If Ethan likes this plan too, I'd love to move forward with it.  Thanks.
Comment 16 User image Ethan Hugg [:ehugg] 2013-03-08 09:20:54 PST
Comment on attachment 722687 [details] [diff] [review]
initial patch didn't create the TransportFlow.  Also now passes the m-line index to retrieve the correct flow

Review of attachment 722687 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. Built and tested on Linux64.

::: media/webrtc/signaling/src/media/VcmSIPCCBinding.cpp
@@ +1058,2 @@
>   *  @param[in]  peerconnection - the peerconnection in use
>   *  @param[in]  streams - the number of streams for this data channel

You may want to list track_id here

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +633,5 @@
>      return NS_OK;
>    }
>    mDataConnection = new mozilla::DataChannelConnection(this);
> +  if (mDataConnection->Init(aLocalport, aNumstreams, true)) {
> +    // us ethe specified TransportFlow

Spelling
Comment 17 User image Ryan VanderMeulen [:RyanVM] 2013-03-08 10:50:59 PST
(In reply to Ryan VanderMeulen [:RyanVM] from comment #9)
> Backed out for now while we investigate bug 848966. I'll re-land whatever
> comes up clean.
> https://hg.mozilla.org/integration/mozilla-inbound/rev/6cdb9f903c02

This does appear to be the cause of bug 848966.
Comment 18 User image Ethan Hugg [:ehugg] 2013-03-26 10:07:03 PDT
Adding Glandium and Michael Tuexen.  We have an interesting bug here that points to heap corruption on shutdown of datachannels, but only on Windows PGO builds.
Comment 19 User image Michael Tüxen 2013-03-26 11:33:30 PDT
Can you provide some information about the heap corruption?
Comment 20 User image Ethan Hugg [:ehugg] 2013-03-26 14:07:31 PDT

Michael,

I was talking to Jesup this morning and he suggested I put you in the loop on this.  

We could be wrong, but the idea is that we're getting some heap corruption based on how it crashes in so many different places.  You can see some of these in bug 848966, and here's a recent one I got on Try - https://tbpl.mozilla.org/?tree=Try&rev=25fdd5193171 - the orange '2' next to WinXP opt.

The first patch in this bug causes the mochitests to connect datachannel in tests where the didn't used to.  Everything worked fine and we got it into M-C, but we got these intermittent crashes, only on Windows PGO.  Only when run on the builders or try servers as well, a local PGO build did not replicate this for me.  If I comment out the call to ConnectDTLS() we do not get these crashes, so there appears to be something about starting up and tearing down a datachannel connection that either causes or triggers this problem.

I ran Firefox under the MS Application Verifier which is where I found that unrelated CritSec bug.  I also found out that we always build FF with WINVER=502 which takes the XP code parts of netwerk/sctp.

Let me know if you have any ideas for testing this code in isolation or putting new asserts in to make sure it's being used as expected.
Comment 21 User image Mike Hommey [:glandium] 2013-03-27 05:50:05 PDT
Try adding
NO_PROFILE_GUIDED_OPTIMIZE=1
in some directories. If that does help, then try file by file with:
file.$(OBJ_SUFFIX): PROFILE_GEN_CFLAGS=
file.$(OBJ_SUFFIX): PROFILE_USE_CFLAGS=
Comment 22 User image Randell Jesup [:jesup] 2013-03-27 05:53:32 PDT
I'd start with netwerk/sctp/src
Comment 23 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2013-03-27 05:56:50 PDT
(In reply to Ethan Hugg [:ehugg] from comment #20)
> M-C, but we got these intermittent crashes, only on Windows PGO.  Only when
> run on the builders or try servers as well, a local PGO build did not

Ethan, another option would be to request such a builder for testing. Then you could debug it. If that would be an option for you and you need help, please let me know.
Comment 24 User image Michael Tüxen 2013-03-27 06:08:14 PDT
(In reply to Ethan Hugg [:ehugg] from comment #20)
> 
> Michael,
> 
> I was talking to Jesup this morning and he suggested I put you in the loop
> on this.  
> 
> We could be wrong, but the idea is that we're getting some heap corruption
> based on how it crashes in so many different places.  You can see some of
> these in bug 848966, and here's a recent one I got on Try -
> https://tbpl.mozilla.org/?tree=Try&rev=25fdd5193171 - the orange '2' next to
> WinXP opt.
> 
> The first patch in this bug causes the mochitests to connect datachannel in
> tests where the didn't used to.  Everything worked fine and we got it into
> M-C, but we got these intermittent crashes, only on Windows PGO.  Only when
> run on the builders or try servers as well, a local PGO build did not
> replicate this for me.  If I comment out the call to ConnectDTLS() we do not
> get these crashes, so there appears to be something about starting up and
> tearing down a datachannel connection that either causes or triggers this
> problem.
> 
> I ran Firefox under the MS Application Verifier which is where I found that
> unrelated CritSec bug.  I also found out that we always build FF with
> WINVER=502 which takes the XP code parts of netwerk/sctp.
> 
> Let me know if you have any ideas for testing this code in isolation or
> putting new asserts in to make sure it's being used as expected.

Unfortunately, I have no idea. Is it that the problem only occurs on older Windows
boxes or also on more recent versions of Windows. I think we have only very limited
code which is specific for the old Windows versions. We don't do testing on
Windows XP, but I could review the XP specific code.
Comment 25 User image Ethan Hugg [:ehugg] 2013-03-27 21:57:15 PDT
We got some interesting results from selectively turning PGO off on Try.

Got a green run from turning PGO off for both sctp and datachannel - https://tbpl.mozilla.org/?tree=Try&rev=f184fb24aeca

There is an orange 2 in this run, but it is not a crash

Got a orange run from turning off PGO for just sctp - https://tbpl.mozilla.org/?tree=Try&rev=b475e5033fc9

Got a orange run from turning off PGO for just datachannel - https://tbpl.mozilla.org/?tree=Try&rev=44932c535408

Not particularly conclusive.  Still investigating.
Comment 26 User image Ethan Hugg [:ehugg] 2013-03-28 10:19:38 PDT
Adding a new Try run on the latest M-C with full PGO to see if any of the recent DC patches such as Bug 842126 fix this as well

https://tbpl.mozilla.org/?tree=Try&rev=1363f6a4f17c

All previous Trys were run on top of the original M-C landing, rev 789a60e48ee2
Comment 27 User image Ethan Hugg [:ehugg] 2013-03-28 10:24:39 PDT
Created attachment 730751 [details] [diff] [review]
Support DataChannels with SDP Negotiation
Comment 28 User image Ethan Hugg [:ehugg] 2013-03-28 10:31:30 PDT
Comment on attachment 730751 [details] [diff] [review]
Support DataChannels with SDP Negotiation


Updated patch to apply on latest M-C. Bringing forward r+ from jesup.
Comment 29 User image Ethan Hugg [:ehugg] 2013-03-28 19:59:30 PDT
New Try on latest M-C with PGO turned off for both sctp and datachannel
https://tbpl.mozilla.org/?tree=Try&rev=6421a0a46b59
Comment 30 User image Ethan Hugg [:ehugg] 2013-03-29 11:10:13 PDT
Pushed with PGO off for both sctp and datachannel

https://hg.mozilla.org/integration/mozilla-inbound/rev/af6a5d1a3097
Comment 31 User image Ethan Hugg [:ehugg] 2013-03-29 13:21:21 PDT
My guess is this patch calls a DISPATCH_SYNC the 'old way' which is no longer in fashion.  I'll update soon with the new DISPATCH_SYNC style.
Comment 32 User image Ryan VanderMeulen [:RyanVM] 2013-03-29 13:25:16 PDT
Wow, thanks for the blast from the past. Backed out for those #$()*@#$ leaks.
https://hg.mozilla.org/integration/mozilla-inbound/rev/1cbbb788632a
Comment 33 User image Randell Jesup [:jesup] 2013-03-30 04:53:51 PDT
With a fix to remove the 1 added instance of DISPATCH_SYNC:

https://tbpl.mozilla.org/?tree=Try&rev=dec9bcd9b66a

Looks quite nicely green
Comment 34 User image Randell Jesup [:jesup] 2013-03-30 05:04:02 PDT
Created attachment 731462 [details] [diff] [review]
Support DataChannels with SDP Negotiation
Comment 35 User image Randell Jesup [:jesup] 2013-03-30 05:05:57 PDT
Comment on attachment 731462 [details] [diff] [review]
Support DataChannels with SDP Negotiation

r? back to ehugg for the mods (just removing the one Dispatch(...DISPATCH_SYNC) for SyncRunnable)
Comment 36 User image Randell Jesup [:jesup] 2013-03-30 08:30:31 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/8d3fc2a278d3
Once more with feeling.  Try is green on many retriggers
Comment 37 User image Ryan VanderMeulen [:RyanVM] 2013-03-30 17:51:57 PDT
https://hg.mozilla.org/mozilla-central/rev/8d3fc2a278d3
Comment 38 User image Jason Smith [:jsmith] 2013-04-13 08:57:36 PDT
Verified through some sanity tests doing 1:1 peer hookups with data channels & with streams included.

Note You need to log in before you can comment on or make changes to this bug.