Closed Bug 837035 Opened 11 years ago Closed 11 years ago

Hook up DataChannel SDP to implementation (in particular port and options)

Categories

(Core :: WebRTC: Networking, defect)

17 Branch
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla22

People

(Reporter: jesup, Assigned: ehugg)

References

Details

(Whiteboard: [WebRTC],[blocking-webrtc+][aurora-needed])

Attachments

(4 files, 3 obsolete files)

In bug 786152 we added support for DataChannel SDP in negotiations, but we need to hook up the port numbers so we can get rid of ConnectDataChannels() and have it use the correct Transport (right now it just searches the transport list for a connected transport to use).

We also need to pass the number of streams.

Follow-on: Implement dynamic negotiation - only offer DataChannel if createDataChannel() is called before CreateOffer, and re-negotiate to add it if it's called initially after CreateOffer/CreateAnswer.  Also pass pre-opened channels in the initial SDP (see SCTP SDP draft in MMUSIC).
Whiteboard: [WebRTC],[blocking-webrtc+]
Comment on attachment 719996 [details] [diff] [review]
Support DataChannels with SDP Negotiation


Work in progress, still more to do.
Attachment #719996 - Attachment is obsolete: true
Assignee: rjesup → ethanhugg
Attachment #720157 - Attachment is obsolete: true
Attachment #720886 - Flags: review?(rjesup)
Comment on attachment 720886 [details] [diff] [review]
Support DataChannels with SDP Negotiation

Review of attachment 720886 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!  r+ with minor nits

::: media/webrtc/signaling/src/sipcc/core/gsm/gsm_sdp.c
@@ +3333,5 @@
> +
> +    if (offer) {
> +        /* Increment port for answer SDP */
> +        media->local_datachannel_port = media->remote_datachannel_port + 1;
> +    }

This is needed today, but when I pull updated source from the repo this requirement will be gone.  Mark for removal with a comment

::: media/webrtc/signaling/src/sipcc/core/sdp/sdp_attr_access.c
@@ +6658,5 @@
>              CSFLogError(logTag, "%s fmtp attribute, level %u instance %u "
>                        "not found.", sdp_p->debug_str, level, inst_num);
>          }
>          sdp_p->conf_p->num_invalid_param++;
> +        *val = 16;

Please move to a #define somewhere
Attachment #720886 - Flags: review?(rjesup) → review+
https://hg.mozilla.org/mozilla-central/rev/789a60e48ee2
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Something which didn't work before was a datachannel connection without media. With this patch in place at least that is working now. So I can work on the mochitest (bug 796894) now. Thanks!
Backed out for now while we investigate bug 848966. I'll re-land whatever comes up clean.
https://hg.mozilla.org/integration/mozilla-inbound/rev/6cdb9f903c02
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 722687 [details] [diff] [review]
initial patch didn't create the TransportFlow.  Also now passes the m-line index to retrieve the correct flow

Tested with no-audio testcase (which failed with the old patch only).  This doesn't replace the old patch, this augments it.  Ignore the random MediaManager LOG() that crept in
Attachment #722687 - Flags: review?(ethanhugg)
leave-open when the first patch relands
Whiteboard: [WebRTC],[blocking-webrtc+] → [WebRTC],[blocking-webrtc+][leave-open]
We should hold off landing the second patch until we're ready to tell people we're breaking backwards compatibility at the binary level.  Also, we should check compatibility at the source level - even the current patch probably breaks anyone not using the same method of choosing 5000 or 5001 for ports via ConnectDataConnection - and so we may want to combine forcing them to change their code for that with the "ondatachannel should be an Event" bug, and break source compat only once.

So, if testing shows this breaks source compatibility, we should a) warn people (Hacks perhaps, or Maire's blog plus targeted warnings), and b) then land it and also break binary interop.

Alternatively, and likely better: re-hook-up ConnectDataConnection to close the current mDataConnection, and then re-open using the old flow-selection algorithm (and ignoring any m=application lines - you can easily test if there's an rtcp flow in the table to know) and the user-supplied ports.  This gives us binary compatibility until we remove ConnectDataConnection, which we can then warn people is deprecated and going away in a week or two.  We could then combine taking that compat patch out with the ondatachannel -> event change.

I like this final idea.  Comments?  Ethan, if you think this works, can you do a patch for it?  I'm traveling, but I can try to review a patch tonight, or for this part probably ekr or abr could review; see who's around.

Let's do that part as a separate patch on top of the others, so it's easy to back out.
(In reply to Randell Jesup [:jesup] from comment #14)
> Alternatively, and likely better: re-hook-up ConnectDataConnection to close
> the current mDataConnection, and then re-open using the old flow-selection
> algorithm (and ignoring any m=application lines - you can easily test if
> there's an rtcp flow in the table to know) and the user-supplied ports. 
> This gives us binary compatibility until we remove ConnectDataConnection,
> which we can then warn people is deprecated and going away in a week or two.
> We could then combine taking that compat patch out with the ondatachannel ->
> event change.
> 
> I like this final idea.  Comments?  

I like this idea also.  Anytime we can warn people before breaking backwards compat is a win, assuming there aren't too many downsides (and it seems like there aren't many here).  If Ethan likes this plan too, I'd love to move forward with it.  Thanks.
Comment on attachment 722687 [details] [diff] [review]
initial patch didn't create the TransportFlow.  Also now passes the m-line index to retrieve the correct flow

Review of attachment 722687 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. Built and tested on Linux64.

::: media/webrtc/signaling/src/media/VcmSIPCCBinding.cpp
@@ +1058,2 @@
>   *  @param[in]  peerconnection - the peerconnection in use
>   *  @param[in]  streams - the number of streams for this data channel

You may want to list track_id here

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +633,5 @@
>      return NS_OK;
>    }
>    mDataConnection = new mozilla::DataChannelConnection(this);
> +  if (mDataConnection->Init(aLocalport, aNumstreams, true)) {
> +    // us ethe specified TransportFlow

Spelling
Attachment #722687 - Flags: review?(ethanhugg) → review+
(In reply to Ryan VanderMeulen [:RyanVM] from comment #9)
> Backed out for now while we investigate bug 848966. I'll re-land whatever
> comes up clean.
> https://hg.mozilla.org/integration/mozilla-inbound/rev/6cdb9f903c02

This does appear to be the cause of bug 848966.
Depends on: 848966
Blocks: 851293
Adding Glandium and Michael Tuexen.  We have an interesting bug here that points to heap corruption on shutdown of datachannels, but only on Windows PGO builds.
Can you provide some information about the heap corruption?

Michael,

I was talking to Jesup this morning and he suggested I put you in the loop on this.  

We could be wrong, but the idea is that we're getting some heap corruption based on how it crashes in so many different places.  You can see some of these in bug 848966, and here's a recent one I got on Try - https://tbpl.mozilla.org/?tree=Try&rev=25fdd5193171 - the orange '2' next to WinXP opt.

The first patch in this bug causes the mochitests to connect datachannel in tests where the didn't used to.  Everything worked fine and we got it into M-C, but we got these intermittent crashes, only on Windows PGO.  Only when run on the builders or try servers as well, a local PGO build did not replicate this for me.  If I comment out the call to ConnectDTLS() we do not get these crashes, so there appears to be something about starting up and tearing down a datachannel connection that either causes or triggers this problem.

I ran Firefox under the MS Application Verifier which is where I found that unrelated CritSec bug.  I also found out that we always build FF with WINVER=502 which takes the XP code parts of netwerk/sctp.

Let me know if you have any ideas for testing this code in isolation or putting new asserts in to make sure it's being used as expected.
Try adding
NO_PROFILE_GUIDED_OPTIMIZE=1
in some directories. If that does help, then try file by file with:
file.$(OBJ_SUFFIX): PROFILE_GEN_CFLAGS=
file.$(OBJ_SUFFIX): PROFILE_USE_CFLAGS=
I'd start with netwerk/sctp/src
(In reply to Ethan Hugg [:ehugg] from comment #20)
> M-C, but we got these intermittent crashes, only on Windows PGO.  Only when
> run on the builders or try servers as well, a local PGO build did not

Ethan, another option would be to request such a builder for testing. Then you could debug it. If that would be an option for you and you need help, please let me know.
(In reply to Ethan Hugg [:ehugg] from comment #20)
> 
> Michael,
> 
> I was talking to Jesup this morning and he suggested I put you in the loop
> on this.  
> 
> We could be wrong, but the idea is that we're getting some heap corruption
> based on how it crashes in so many different places.  You can see some of
> these in bug 848966, and here's a recent one I got on Try -
> https://tbpl.mozilla.org/?tree=Try&rev=25fdd5193171 - the orange '2' next to
> WinXP opt.
> 
> The first patch in this bug causes the mochitests to connect datachannel in
> tests where the didn't used to.  Everything worked fine and we got it into
> M-C, but we got these intermittent crashes, only on Windows PGO.  Only when
> run on the builders or try servers as well, a local PGO build did not
> replicate this for me.  If I comment out the call to ConnectDTLS() we do not
> get these crashes, so there appears to be something about starting up and
> tearing down a datachannel connection that either causes or triggers this
> problem.
> 
> I ran Firefox under the MS Application Verifier which is where I found that
> unrelated CritSec bug.  I also found out that we always build FF with
> WINVER=502 which takes the XP code parts of netwerk/sctp.
> 
> Let me know if you have any ideas for testing this code in isolation or
> putting new asserts in to make sure it's being used as expected.

Unfortunately, I have no idea. Is it that the problem only occurs on older Windows
boxes or also on more recent versions of Windows. I think we have only very limited
code which is specific for the old Windows versions. We don't do testing on
Windows XP, but I could review the XP specific code.
Whiteboard: [WebRTC],[blocking-webrtc+][leave-open] → [WebRTC],[blocking-webrtc+][leave-open][aurora-needed]
We got some interesting results from selectively turning PGO off on Try.

Got a green run from turning PGO off for both sctp and datachannel - https://tbpl.mozilla.org/?tree=Try&rev=f184fb24aeca

There is an orange 2 in this run, but it is not a crash

Got a orange run from turning off PGO for just sctp - https://tbpl.mozilla.org/?tree=Try&rev=b475e5033fc9

Got a orange run from turning off PGO for just datachannel - https://tbpl.mozilla.org/?tree=Try&rev=44932c535408

Not particularly conclusive.  Still investigating.
Adding a new Try run on the latest M-C with full PGO to see if any of the recent DC patches such as Bug 842126 fix this as well

https://tbpl.mozilla.org/?tree=Try&rev=1363f6a4f17c

All previous Trys were run on top of the original M-C landing, rev 789a60e48ee2
Attachment #720886 - Attachment is obsolete: true
Comment on attachment 730751 [details] [diff] [review]
Support DataChannels with SDP Negotiation


Updated patch to apply on latest M-C. Bringing forward r+ from jesup.
Attachment #730751 - Flags: review+
New Try on latest M-C with PGO turned off for both sctp and datachannel
https://tbpl.mozilla.org/?tree=Try&rev=6421a0a46b59
Pushed with PGO off for both sctp and datachannel

https://hg.mozilla.org/integration/mozilla-inbound/rev/af6a5d1a3097
My guess is this patch calls a DISPATCH_SYNC the 'old way' which is no longer in fashion.  I'll update soon with the new DISPATCH_SYNC style.
Wow, thanks for the blast from the past. Backed out for those #$()*@#$ leaks.
https://hg.mozilla.org/integration/mozilla-inbound/rev/1cbbb788632a
With a fix to remove the 1 added instance of DISPATCH_SYNC:

https://tbpl.mozilla.org/?tree=Try&rev=dec9bcd9b66a

Looks quite nicely green
Whiteboard: [WebRTC],[blocking-webrtc+][leave-open][aurora-needed] → [WebRTC],[blocking-webrtc+][aurora-needed]
Comment on attachment 731462 [details] [diff] [review]
Support DataChannels with SDP Negotiation

r? back to ehugg for the mods (just removing the one Dispatch(...DISPATCH_SYNC) for SyncRunnable)
Attachment #731462 - Flags: review?(ethanhugg)
Attachment #731462 - Flags: review?(ethanhugg) → review+
Blocks: 856319
https://hg.mozilla.org/integration/mozilla-inbound/rev/8d3fc2a278d3
Once more with feeling.  Try is green on many retriggers
https://hg.mozilla.org/mozilla-central/rev/8d3fc2a278d3
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Flags: in-testsuite-
Keywords: verifyme
Blocks: 856434
Verified through some sanity tests doing 1:1 peer hookups with data channels & with streams included.
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: