Closed Bug 1434531 Opened 7 years ago Closed 7 years ago

[firefox 58] WebRTC problem with TURNs in tcp

Categories

(Core :: WebRTC: Networking, defect, P2)

58 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla60
Tracking Status
firefox-esr52 --- unaffected
firefox58 --- wontfix
firefox59 --- fixed
firefox60 --- verified

People

(Reporter: lwu, Assigned: mjf)

References

Details

(Keywords: regression)

Attachments

(7 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36

Steps to reproduce:



Application Basics

Name: Firefox
Version: 58.0
Build ID: 20180122225619
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0
OS: Linux 4.13.0-26-generic




I don't face the problem 
 * by negotiating WebRTC with a TURN server in tcp. it does not work and can not get a relay  candidate.


I reproduced it using online tool
https://webrtc.github.io/samples/src/content/peerconnection/trickle-ice/

and adding our TURN server then test
(note that it has TURN username and password, I can't copy them here)


see attached "WebRTC samples(Trickle ICE).png":

it has not a relay candidate;


see attached "about_webrtc(Connection Log).png":

it show that "Skipping TURN server because of link local mis-match";


Firefox source code:
firefox-58.0b16\media\mtransport\third_party\nICEr\src\ice\ice_component.c (line 539)



it works when I try with firefox (57) ;


Actual results:

by negotiating WebRTC with a TURN server in tcp. it can work and can get relay  candidates, just like firefox 57.


Expected results:


the code of frefox 57:
(firefox-57.0\media\mtransport\third_party\nICEr\src\ice\ice_component.c)

function: nr_ice_component_initialize_tcp()


#ifdef USE_TURN
      /* Create a new relayed candidate for each addr/TURN server pair */
      for(j=0;j<ctx->turn_server_ct;j++){
        nr_transport_addr addr;
        nr_socket *local_sock;
        nr_socket *buffered_sock;
        nr_socket *turn_sock;
        nr_ice_socket *turn_isock;

        /* Skip non-TCP */
        if (ctx->turn_servers[j].turn_server.transport != IPPROTO_TCP)
          continue;

        if (ctx->turn_servers[j].turn_server.type == NR_ICE_STUN_SERVER_TYPE_ADDR &&
            nr_transport_addr_cmp(&ctx->turn_servers[j].turn_server.u.addr,
                                  &addrs[i].addr,
                                  NR_TRANSPORT_ADDR_CMP_MODE_VERSION)) {
          r_log(LOG_ICE,LOG_INFO,"ICE(%s): Skipping TURN server because of IP version mis-match (%u - %u)",ctx->label,addrs[i].addr.ip_version,ctx->turn_servers[j].turn_server.u.addr.ip_version);
          continue;
        }








the code of frefox 58:
(firefox-58.0b16\media\mtransport\third_party\nICEr\src\ice\ice_component.c)

function: nr_ice_component_initialize_tcp()

#ifdef USE_TURN
      /* Create a new relayed candidate for each addr/TURN server pair */
      for(j=0;j<ctx->turn_server_ct;j++){
        nr_transport_addr addr;
        nr_socket *local_sock;
        nr_socket *buffered_sock;
        nr_socket *turn_sock;
        nr_ice_socket *turn_isock;

        /* Skip non-TCP */
        if (ctx->turn_servers[j].turn_server.transport != IPPROTO_TCP)
          continue;

        if (ctx->turn_servers[j].turn_server.type == NR_ICE_STUN_SERVER_TYPE_ADDR) {
          if (nr_transport_addr_check_compatibility(
                &addrs[i].addr,
                &ctx->turn_servers[j].turn_server.u.addr)) {
            r_log(LOG_ICE,LOG_INFO,"ICE(%s): Skipping TURN server because of link local mis-match",ctx->label);
            continue;
          }
        }



the code has changed(nr_transport_addr_cmp -> nr_transport_addr_check_compatibility)
please forget the describe:
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
(In reply to lwu@grandstream.cn from comment #0)
> it works when I try with firefox (57) ;

If possible, please use mozregression to find the regression range and post the resulting link (hg.mozilla.org…) in a comment here.
https://mozilla.github.io/mozregression/quickstart.html
Has Regression Range: --- → no
Component: Untriaged → WebRTC: Networking
Flags: needinfo?(lwu)
Product: Firefox → Core
(In reply to Gingerbread Man from comment #3)
> (In reply to lwu@grandstream.cn from comment #0)
> > it works when I try with firefox (57) ;
> 
> If possible, please use mozregression to find the regression range and post
> the resulting link (hg.mozilla.org…) in a comment here.
> https://mozilla.github.io/mozregression/quickstart.html

Hi, 

Thanks for your reply. 
The regression range is:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=15f221f491f707b1e8e46da344b6dd5a394b1242&tochange=c97190c389c4cfef20fe55b4bacade95a36ae6ef





user@user:~/src/mozregression$ mozregression --good 2017-10-1 --bad 2018-2-1

......

35:35.14 INFO: Downloading build from: https://archive.mozilla.org/pub/firefox/nightly/2017/10/2017-10-02-22-02-04-mozilla-central/firefox-58.0a1.en-US.linux-x86_64.tar.bz2
===== Downloaded 100% =====
37:25.36 INFO: Running mozilla-central build for 2017-10-02
37:38.03 INFO: Launching /tmp/tmpBxIJzM/firefox/firefox
37:38.03 INFO: Application command: /tmp/tmpBxIJzM/firefox/firefox -profile /tmp/tmpULRooR.mozrunner
37:38.04 INFO: application_buildid: 20171002220204
37:38.04 INFO: application_changeset: 15f221f491f707b1e8e46da344b6dd5a394b1242
37:38.04 INFO: application_name: Firefox
37:38.04 INFO: application_repository: https://hg.mozilla.org/mozilla-central
37:38.04 INFO: application_version: 58.0a1
Was this nightly build good, bad, or broken? (type 'good', 'bad', 'skip', 'retry', 'back' or 'exit' and press Enter): good
39:50.97 INFO: Narrowed nightly regression window from [2017-10-01, 2017-10-03] (2 days) to [2017-10-02, 2017-10-03] (1 days) (~0 steps left)
39:50.98 INFO: Got as far as we can go bisecting nightlies...
39:50.98 INFO: Last good revision: 15f221f491f707b1e8e46da344b6dd5a394b1242 (2017-10-02)
39:50.98 INFO: First bad revision: c97190c389c4cfef20fe55b4bacade95a36ae6ef (2017-10-03)
39:50.98 INFO: Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=15f221f491f707b1e8e46da344b6dd5a394b1242&tochange=c97190c389c4cfef20fe55b4bacade95a36ae6ef

39:50.98 INFO: Switching bisection method to taskcluster
39:50.98 INFO: Getting mozilla-central builds between 15f221f491f707b1e8e46da344b6dd5a394b1242 and c97190c389c4cfef20fe55b4bacade95a36ae6ef
40:04.43 INFO: Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=15f221f491f707b1e8e46da344b6dd5a394b1242&tochange=c97190c389c4cfef20fe55b4bacade95a36ae6ef
Flags: needinfo?(lwu)
Attached file mozregression_log β€”
Thank you. If I'm not mistaken, it should be possible to reduce the regression window further, using the following command line. Also, if you can test on Windows, mozregression-gui does this automatically.

mozregression --good a394d976 --bad 6669a572


Out of that regression range, bug 1380346 and bug 1401540 look like they might possibly be related. Flagging the appropriate people.
Has Regression Range: no → yes
Flags: needinfo?(dminor)
Flags: needinfo?(apehrson)
I don't think bug 1380346 is related, but bug 1361894 from the same range looks spot on. It also matches the code change mentioned in comment 0. Michael, can you take a look?
Blocks: 1361894
Flags: needinfo?(mfroman)
Flags: needinfo?(dminor)
Flags: needinfo?(apehrson)
(In reply to lwu@grandstream.cn from comment #1)
> Created attachment 8946982 [details]
> about_webrtc(Connection Log).png

I need to see all of the about:webrtc output for both your working case (57) and non-working case (58).  You can save the output by clicking the "Save Page" at the top of the about:webrtc page.  Please attach both html files on this bug.

It may also help if I could test against your turn server directly.  To avoid attaching temporary credentials here, you can send them to mfroman@mozilla.com if you wish.
Flags: needinfo?(mfroman) → needinfo?(lwu)
(In reply to Michael Froman [:mjf] from comment #8)
> (In reply to lwu@grandstream.cn from comment #1)
> > Created attachment 8946982 [details]
> > about_webrtc(Connection Log).png
> 
> I need to see all of the about:webrtc output for both your working case (57)
> and non-working case (58).  You can save the output by clicking the "Save
> Page" at the top of the about:webrtc page.  Please attach both html files on
> this bug.
> 
> It may also help if I could test against your turn server directly.  To
> avoid attaching temporary credentials here, you can send them to
> mfroman@mozilla.com if you wish.


Thanks for your reply.
I will send them to your E-mail later;
Flags: needinfo?(lwu)
Attached file aboutWebrtc(57).html β€”
Attached file aboutWebrtc(58).html β€”
I can confirm.  This happens on my local build.  Diving in further.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee: nobody → mfroman
I am experiencing the same issue with Firefox version 58.  We use a TURN server with the URI specifying TCP as the transport protocol.  This worked fine prior to Firefox version 58 which introduced the fix for bug 1361894.  The problem seems to be occurring because the TURN server is using TCP as its protocol and the relay candidates are all using UDP. The fix for bug 1361894 added a check for version mismatch which causes the potential relay candidates to get thrown out because they are UDP and the TURN server is TCP.

The TURN server specification RFC-5766 "Traversal Using Relays around NAT (TURN):  Relay Extensions to Session Traversal Utilities for NAT (STUN)" allows you to use TCP as the transport protocol between the client and the Turn server:

"TURN, as defined in this specification, always uses UDP between the server and the peer.  However, this specification allows the use of any one of UDP, TCP, or Transport Layer Security (TLS) over TCP to carry the TURN messages between the client and the server.

If TCP or TLS-over-TCP is used between the client and the server,then the server will convert between these transports and UDP transport when relaying data to/from the peer.  Since this version of TURN only supports UDP between the server and the peer, it is expected that most clients will prefer to use UDP between the client and the server as well."
(In reply to Ted Malzahn from comment #13)
> I am experiencing the same issue with Firefox version 58.  We use a TURN
> server with the URI specifying TCP as the transport protocol.  This worked
> fine prior to Firefox version 58 which introduced the fix for bug 1361894. 
> The problem seems to be occurring because the TURN server is using TCP as
> its protocol and the relay candidates are all using UDP. The fix for bug
> 1361894 added a check for version mismatch which causes the potential relay
> candidates to get thrown out because they are UDP and the TURN server is TCP.
> 
> The TURN server specification RFC-5766 "Traversal Using Relays around NAT
> (TURN):  Relay Extensions to Session Traversal Utilities for NAT (STUN)"
> allows you to use TCP as the transport protocol between the client and the
> Turn server:
> 
> "TURN, as defined in this specification, always uses UDP between the server
> and the peer.  However, this specification allows the use of any one of UDP,
> TCP, or Transport Layer Security (TLS) over TCP to carry the TURN messages
> between the client and the server.
> 
> If TCP or TLS-over-TCP is used between the client and the server,then the
> server will convert between these transports and UDP transport when relaying
> data to/from the peer.  Since this version of TURN only supports UDP between
> the server and the peer, it is expected that most clients will prefer to use
> UDP between the client and the server as well."
A fix is on the way.  The problem is that our new check tests for protocol match, but the check for address compatibility, we create a new address from a copy of the original.  The new address is then set to TCP.  I'm moving that address copy earlier.
Comment on attachment 8948025 [details]
Bug 1434531 - fix missing TCP relay candidates.

https://reviewboard.mozilla.org/r/217670/#review223464

LGTM
Attachment #8948025 - Flags: review?(drno) → review+
Michael please request uplift to 59 for the fix.
Flags: needinfo?(mfroman)
Rank: 12
Priority: -- → P2
Pushed by mfroman@nostrum.com:
https://hg.mozilla.org/integration/autoland/rev/dad90457e8fe
fix missing TCP relay candidates. r=drno
https://hg.mozilla.org/mozilla-central/rev/dad90457e8fe
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
This should be fixed in the nightly build for today.  Please give it a try and confirm it works for you.
Flags: needinfo?(lwu)
Is there anyway to get this fixed in Firefox 59? FireFox 60 is not due out until May. Our solution will not work on Firefox until we get this fixed and we will have to tell our customers to use Chrome instead of Firefox.
As soon as I have confirmation that it works in Nightly, I'll be filing to uplift to 59.
Flags: needinfo?(mfroman)
Flags: needinfo?(ted.malzahn)
(In reply to Ted Malzahn from comment #21)
> Is there anyway to get this fixed in Firefox 59? FireFox 60 is not due out
> until May. Our solution will not work on Firefox until we get this fixed and
> we will have to tell our customers to use Chrome instead of Firefox.

If you don't mind verifying it works for you in Nightly, that will speed my uplift request.
(In reply to Michael Froman [:mjf] from comment #23)
> (In reply to Ted Malzahn from comment #21)
> > Is there anyway to get this fixed in Firefox 59? FireFox 60 is not due out
> > until May. Our solution will not work on Firefox until we get this fixed and
> > we will have to tell our customers to use Chrome instead of Firefox.
> 
> If you don't mind verifying it works for you in Nightly, that will speed my
> uplift request.
 
I appologize for the late reply, i verified it works in Nightly;
it fixed the issue.

see attached "WebRTC samples(Trickle ICE, Nightly).png";
Flags: needinfo?(lwu)
Comment on attachment 8948025 [details]
Bug 1434531 - fix missing TCP relay candidates.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1361894.
[User impact if declined]: TCP TURN relay connections will fail making it more difficult to complete webrtc calls in adverse networks.
[Is this code covered by automated tests?]: No.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: Very little risk.
[Why is the change risky/not risky?]: There is no new code, just moved where the TCP address creation a bit earlier in the code.
[String changes made/needed]: None.
Attachment #8948025 - Flags: approval-mozilla-beta?
I have confirmation that it is working so removing the NI.
Flags: needinfo?(ted.malzahn)
Status: RESOLVED → VERIFIED
Comment on attachment 8948025 [details]
Bug 1434531 - fix missing TCP relay candidates.

Low-risk fix, already verified on Nightly. Let's take this for 59b8.
Attachment #8948025 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Michael Froman [:mjf] from comment #23)
> (In reply to Ted Malzahn from comment #21)
> > Is there anyway to get this fixed in Firefox 59? FireFox 60 is not due out
> > until May. Our solution will not work on Firefox until we get this fixed and
> > we will have to tell our customers to use Chrome instead of Firefox.
> 
> If you don't mind verifying it works for you in Nightly, that will speed my
> uplift request.

The nightly build solves the issue.  TURN server is now working again with WebRTC.
Still not fixed in Firefox 60
(In reply to mss.parveensachdeva from comment #31)
> Still not fixed in Firefox 60

You may be facing a slightly different issue, or maybe a regression. Could you open a new bug so we can take a look at it?
Flags: needinfo?(mss.parveensachdeva)
Flags: needinfo?(mss.parveensachdeva)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: