Closed Bug 925880 Opened 11 years ago Closed 11 years ago

Data Channel set-up is unreliable

Categories

(Core :: WebRTC, defect)

defect
Not set
normal

Tracking

()

RESOLVED INVALID

People

(Reporter: standard8, Assigned: ekr)

References

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

When using the latest nightlies and Talkilla, we are seeing an effect whereby the data channel doesn't initiate properly all the time. This is generally seen when connecting over the internet rather than two computers on the same network.

Talkilla's set up is along the lines of:

- Create Peer Connection
- Add a data channel with id = 0
- Set up the offer and pass it to the other side
- When the data channel open event is received, send a text message

When the channel is opened, if we're over the internet, we're generally seeing that the data channel has moved to the closed state rather than being open, despite the fact we've not closed it.

The same code works if you try it between two instance on a local machine or on the local network.
Blocks: Talkilla
Is this a regression?

Do A/V work?

Does AppRTC work between the two devices?
(In reply to Eric Rescorla (:ekr) from comment #1)
> Is this a regression?

Yes, this is working fine between the same two remote people on latest Aurora.

> Do A/V work?

Yes, we can do a connection fine, and having a/v also seems to make data channel work for that connection.
Jesup,

This looks like it might be in your area. If it turns out that ICE just isn't
working, please reassign to me.
Assignee: nobody → rjesup
I've just sent Randell some logs. Also of note in obtaining the logs I discovered that:

- a debug build of Nightly was correctly connecting to an opt build of Nightly.
- two opt builds of Nightly were not connecting for the data channel.
Ok, it appears to be an ICE/transport failure:

2013-10-15 09:04:39.528511 UTC - 1961009536[100567070]: [main|PeerConnectionImpl] PeerConnectionImpl.cpp:195: PeerConnectionObserverDispatch processing mCallState = 27 (SETREMOTEDESCSUCCESS), mFsmState = 16 (STABLE)
2013-10-15 09:04:39.563280 UTC - 150552576[100567530]: Setting transport signals, state: 1
2013-10-15 09:04:39.563312 UTC - 150552576[100567530]: Data transport state: 1
2013-10-15 09:04:42.931458 UTC - 150552576[100567530]: Data transport state: 5
2013-10-15 09:05:02.500422 UTC - 1961009536[100567070]: [main|PeerConnectionImpl] PeerConnectionImpl.cpp:1387: CloseInt: Closing PeerConnectionImpl d3623a916952341c; ending call

That's a transition from TS_INIT -> TS_ERROR (5)

The debug build went 1 -> 2 -> 3 (init -> connecting -> open)

Reassigning to ekr, but ethan - since you were working datachannel ICE setup you might want to have a look
Assignee: rjesup → ekr
As requested over irc, this is a log with ICE info using R_LOG_LEVEL=3.
(In reply to Mark Banner (:standard8) from comment #6)
> Created attachment 817370 [details]
> Data channel failure log with ICE details
> 
> As requested over irc, this is a log with ICE info using R_LOG_LEVEL=3.

I think the R_LOG_LEVEL 3 logs only are going to stderr....
Flags: needinfo?(mbanner)
(In reply to Randell Jesup [:jesup] from comment #7)
> (In reply to Mark Banner (:standard8) from comment #6)
> > Created attachment 817370 [details]
> > Data channel failure log with ICE details
> > 
> > As requested over irc, this is a log with ICE info using R_LOG_LEVEL=3.
> 
> I think the R_LOG_LEVEL 3 logs only are going to stderr....

The full command I used was:

R_LOG_DESTINATION=stderr R_LOG_LEVEL=3 R_LOG_VERBOSE=1 NSPR_LOG_MODULES=signaling:5,mtransport:5,timestamp webrtctest &>../datachannel_fail.txt

did this need doing on a debug build?
Flags: needinfo?(mbanner)
I'd like to try to replicate this problem.  Could you give me more details on what kind of firewalls the users may be behind when this happens and which side is initiating the connection?  Mozilla Office, Home Router, 3G/LTE connection, etc.  Are you doing your testing with the chat from talkilla.mozillalabs.com or some simpler test app?  Also, are you sure that Aurora works better than Nightly, or could this have been chance, and does apprtc.appspot.com work in the same configuration with the same side initiating?

I would be useful to get the same logs above only from a debug build and R_LOG_LEVEL=9, but if I can replicate this I can get them myself.  Thanks.
Flags: needinfo?(mbanner)
(In reply to Ethan Hugg [:ehugg] from comment #9)
> does
> apprtc.appspot.com work in the same configuration with the same side
> initiating?

apprtc doesn't do data channels, we only see this issue with data channel-only connections. Video + Audio + Data Channel works fine as far as we can tell.

I'll try and answer the rest of the questions in a little bit.
New log with the log level set to 9.

This was between me in the UK and Dan in the US. We're both on home routers, Wifi at my end.

Note, that I've just discovered trickle-ice has been turned on. It is quite possible that that is causing the issue here, so I'm going to investigate that now.
Attachment #817370 - Attachment is obsolete: true
Flags: needinfo?(mbanner)
Further testing has shown that setting media.peerconnection.trickle_ice to false seems to fix the issue.

We hadn't realised that trickle ice had been implemented, and hence hadn't yet implemented onicecandidate handling as a result. We'll implement that now, I'll mark this as invalid and re-open if we find an issue.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID
No longer blocks: 923365
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: