Closed Bug 1000858 Opened 10 years ago Closed 10 years ago

TURN does not work on Windows when ports are blocked

Categories

(Core :: WebRTC: Networking, defect)

31 Branch
x86
Windows 7
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla31
Tracking Status
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 --- fixed
firefox31 --- verified
firefox-esr24 --- unaffected
b2g-v1.3 --- unaffected
b2g-v1.3T --- unaffected
b2g-v1.4 --- unaffected
b2g-v2.0 --- unaffected

People

(Reporter: philipp+bugzilla, Assigned: bwc)

References

Details

(Whiteboard: [webrtc-uplift][p=3, 1.5:p1, ft:webrtc])

Attachments

(5 files)

Attached file about:webrtc log
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/33.0.1750.152 Chrome/33.0.1750.152 Safari/537.36

Steps to reproduce:

Block all inbound and outbound traffic using the Windows Firewall, then specifically allow ports 443/TCP and 3478/UDP outbound.

Then I went into a room on talky.io where another Chrome browser (from the same network) was waiting. talky runs a TURN server (restund 0.4.2) on TCP/80 and UDP/3478.


Actual results:

The connection failed (with an iceconnectionstatechange to failed).

about:webrtc and a pcap containing all STUN traffic are attached. 

I am not sure why binding requests containing a username are sent to the TURN server on port 3478 in packet #82 and later


Expected results:

Chrome is able to connect in that case. I think I've seen this work with Firefox before.

Compared to what Chrome does, I don't see any CreatePermission requests, which sounds similar to https://bugzilla.mozilla.org/show_bug.cgi?id=922928
Attached file STUN traffic
Status: UNCONFIRMED → NEW
Ever confirmed: true
talky STUN/TURN settings (with time-based credentials invalidated):
"{"iceServers":[{"url":"stun:166.78.141.183"},{"username":"1398351550489","credential":"0wL/.../81kF3KI=","url":"turn:166.78.141.183"},{"username":"1398351550489","credential":"0wL/.../81kF3KI=","url":"turn:166.78.141.183:80?transport=tcp"}]}"
beta.talky.io is now running without TURN/TCP. That did not solve the problem, but at least makes sure this is not related to TURN/TCP or multiple turn servers.
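For reference, a rough Python sketch of how the three iceServers URLs quoted above line up with the firewall rules from the steps to reproduce. It assumes the usual defaults for stun:/turn: URIs (port 3478, UDP when no transport parameter is given) and reads the allowed outbound ports as 443/TCP and 3478/UDP. The helper names are made up for illustration, and IPv6 literals are not handled.

```python
from urllib.parse import parse_qs

DEFAULT_PORT = 3478  # default port for stun: and turn: URIs
# Outbound rules from the steps to reproduce (assumed reading: 443/TCP, 3478/UDP).
ALLOWED_OUTBOUND = {("tcp", 443), ("udp", 3478)}


def parse_ice_url(url):
    """Split a stun:/turn: URL into (scheme, host, port, transport)."""
    scheme, rest = url.split(":", 1)
    hostport, _, query = rest.partition("?")
    host, _, port = hostport.partition(":")
    # No ?transport= parameter means UDP.
    transport = parse_qs(query).get("transport", ["udp"])[0].lower()
    return (scheme, host, int(port) if port else DEFAULT_PORT, transport)


def passes_firewall(url):
    """True if the server's transport/port pair is in the allowed outbound set."""
    _, _, port, transport = parse_ice_url(url)
    return (transport, port) in ALLOWED_OUTBOUND


urls = [
    "stun:166.78.141.183",
    "turn:166.78.141.183",
    "turn:166.78.141.183:80?transport=tcp",
]
for u in urls:
    print(u, "allowed" if passes_firewall(u) else "blocked")
```

Under those assumptions only the TURN/TCP entry on port 80 is blocked, so the TURN/UDP path on 3478 should still be reachable through the firewall.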
In that log I'm seeing 72 candidate pairs. That's kinda large. I'll keep looking at the pcap and logs to see if I can turn anything up.
So, there are 18 pairs that are relayed on both sides (these are the only ones we expect to work). We emit check requests for 10 of these in the first 2 seconds, but the remaining 8 come in slowly over the course of the next 6 seconds (after we've set up our channel binding; 6 of these checks are sent over the bound channel). Definitely weird.
The other side has a number of VirtualBox interfaces, which IIRC cause this (known bug in libjingle; it only ignores VMware). I'll see if I can turn those off to reduce the number of candidates and see if that changes the situation.
I'll note that the log contains several statements of the form:

(turn/INFO) TURN(relay(IP4:192.168.178.45:51506/UDP|IP4:166.78.141.183:3478/UDP)): Creating permission for IP4:192.168.122.1:47511/UDP

But, looking at the pcap file, these messages don't appear to ever make it to the wire. I believe untangling *why* will be the key to solving this issue.
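One way to double-check this against a capture is to classify each STUN/TURN payload by its message type. The sketch below is Python, not anything from the tree; it decodes the 20-byte STUN header following the RFC 5389 bit layout and the TURN method numbers from RFC 5766. The function name is hypothetical, and it expects the raw UDP payload, so pulling payloads out of the pcap is left out.

```python
import struct

MAGIC_COOKIE = 0x2112A442
METHODS = {
    0x001: "Binding",
    0x003: "Allocate",
    0x004: "Refresh",
    0x006: "Send",
    0x007: "Data",
    0x008: "CreatePermission",
    0x009: "ChannelBind",
}
CLASSES = {0: "request", 1: "indication", 2: "success response", 3: "error response"}


def classify_stun(payload):
    """Return e.g. 'CreatePermission request', or None if this is not a STUN message."""
    if len(payload) < 20:
        return None
    msg_type, _length, cookie, _txid = struct.unpack("!HHI12s", payload[:20])
    # The top two bits must be zero (this also filters out ChannelData framing).
    if msg_type & 0xC000 or cookie != MAGIC_COOKIE:
        return None
    # The class bits sit at bit 4 (C0) and bit 8 (C1); the method bits fill the rest.
    cls = ((msg_type >> 4) & 0x1) | ((msg_type >> 7) & 0x2)
    method = (msg_type & 0x000F) | ((msg_type >> 1) & 0x0070) | ((msg_type >> 2) & 0x0F80)
    name = METHODS.get(method, "method 0x%03x" % method)
    return "%s %s" % (name, CLASSES[cls])


# Hand-built header for a CreatePermission request with an all-zero transaction ID.
sample = struct.pack("!HHI12s", 0x0008, 0, MAGIC_COOKIE, b"\x00" * 12)
print(classify_stun(sample))
```

Counting how many payloads come back as "CreatePermission request" gives a quick yes/no on whether the logged permissions ever reached the wire.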
Do you have a PCAP that has _all_ of the STUN traffic in it? I suspect that you're tripping over the DoS circuit breaker with the combination of lots of candidates and almost no requests succeeding (lots of retransmissions).
Flags: needinfo?(philipp+bugzilla)
Ok, what I just said was complete crap. I should have just read the code and ignored all the traces:

http://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/stun/turn_client_ctx.c#948

Ok, that's where we decide if it is time to refresh the permission. What's the impl of r_gettimeint anyway?

http://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nrappkit/src/util/libekr/r_time.c#178

... oh. :(

Will fix.
We seem to be using gettimeofday unconditionally all over the codebase, so I think I'll just try removing the #ifdef. Patch in a minute.
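To illustrate why a bad time source can keep permission traffic off the wire, here is a minimal Python sketch. It is not the nICEr code: `refresh_due`, the refresh interval, and the "stuck clock" are all assumptions made up for the example, and the stuck clock is only a stand-in for whatever was wrong in the Windows branch of r_gettimeint, which this bug does not spell out.

```python
import time

# Assumed refresh interval in microseconds; the real value lives in the TURN client code.
REFRESH_INTERVAL_US = 240 * 1_000_000


def now_us(clock=time.time):
    """Microseconds since the epoch, taken from an injectable clock function."""
    return int(clock() * 1_000_000)


def refresh_due(now, last_sent_us, interval_us=REFRESH_INTERVAL_US):
    """Decide whether a permission needs to be (re)sent at time `now`."""
    return now - last_sent_us >= interval_us


def stuck_clock():
    """Hypothetical broken time source that never advances."""
    return 0.0


# A permission that has never been sent is recorded with last_sent_us = 0.
print("working clock:", refresh_due(now_us(), 0))
print("stuck clock:  ", refresh_due(now_us(stuck_clock), 0))
```

With a working clock the very first check says "send"; with the stuck clock the elapsed time never reaches the interval, so nothing is ever sent even though the code path that logs "Creating permission" may still run.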
Assignee: nobody → docfaraday
Status: NEW → ASSIGNED
Comment on attachment 8412247 [details] [diff] [review]
Fix r_gettimeint on win32 (borken impl prevented us from sending TURN allocations in any scenario).

Review of attachment 8412247 [details] [diff] [review]:
-----------------------------------------------------------------

Builds look good, requesting review.
Attachment #8412247 - Flags: review?(ekr)
BTW, we probably want an uplift on this. Do we think we could even get this fix into Beta, since without it TURN will be completely busted on Windows until this fix lands?
Flags: needinfo?(philipp+bugzilla)
We can get it into Aurora (will be Beta Monday).  29 (current Beta): No way.
OS: Linux → Windows 7
Hardware: x86_64 → x86
Summary: TURN does not work when ports are blocked → TURN does not work on Windows when ports are blocked
Attachment #8412247 - Flags: review?(ekr) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/83ea7136632f
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment on attachment 8412247 [details] [diff] [review]
Fix r_gettimeint on win32 (borken impl prevented us from sending TURN allocations in any scenario).

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 

   Initial WebRTC landing.

User impact if declined: 

   Windows users will not have functioning TURN (i.e., media relay) support, causing failures when both ends are behind symmetric NATs. Also, given that TURN TCP is the only way we have around NATs that block UDP, Windows users will experience failures here too.

Testing completed (on m-c, etc.): 

   The usual CI stuff has been run, not that it tests this functionality at all (or we would have noticed this sooner). We'll be verifying the fix shortly.

Risk to taking this patch (and alternatives if risky): 

   None.

String or IDL/UUID changes made by this patch:

   None.
Attachment #8412247 - Flags: approval-mozilla-aurora?
I have a Windows machine that I can pull tonight's Nightly down on, once built (the toolchain is not set up, so it would take me longer to do my own build). I should be able to get a PCAP that verifies the fix, but if anybody wants to beat me to the punch, feel free.
Attached file worksnow.pcapng
Checked with the Nightly from 2014-04-27; works like a charm in both the caller and callee roles. pcap attached.

Thanks a lot!

(I think bug 922928 can be closed as well now)
Fantastic! Thanks for the verification!
Attachment #8412247 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Whiteboard: [webrtc-uplift] → [webrtc-uplift][s=fx32]
Whiteboard: [webrtc-uplift][s=fx32] → [webrtc-uplift][p=3, 1.5:p1, ft:webrtc]
Keywords: verifyme
Marking verified based on comment 21
Status: RESOLVED → VERIFIED
Keywords: verifyme