Closed Bug 1416220 Opened 7 years ago Closed 3 years ago

WebRTC problem with TURNS over TCP and a proxy (OK with TURN and proxy)

Categories

(Core :: WebRTC: Networking, defect, P3)

56 Branch
defect

Tracking


RESOLVED FIXED
87 Branch
Tracking Status
firefox87 --- fixed

People

(Reporter: borschneck, Assigned: bwc)

References

(Regressed 1 open bug)

Details

(Whiteboard: [needinfo to drno on 2017/11/13] )

Attachments

(10 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0
Build ID: 20171024165158

Steps to reproduce:

One partner runs Firefox with a proxy.pac.
This proxy.pac points to the proxy:
IP: 192.168.245.49
Port: 8080
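
For illustration, a minimal proxy.pac of this shape (an illustrative sketch, not the partner's actual file) would be:

// Illustrative sketch only: send everything through the proxy described above.
function FindProxyForURL(url, host) {
  return "PROXY 192.168.245.49:8080";
}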

There is no problem
 * when negotiating WebRTC with a TURN server over TCP: it correctly uses the proxy IP:port (see attached "TURN proxy ok.jpg");
 * when using Chrome with the same proxy.pac: both TURN and TURNS are OK.

But when negotiating over TURNS (TLS over TCP), the Firefox WebRTC stack does not seem to use the proxy IP:port, but instead
<TURNS server IP>:<proxy port>
I think it's a bug (see "TURNs direct + proxy port.jpg").
Here 149.202.202.213 is our TURN server's IP address ... and 8080 is the proxy port number!

I reproduced it using the online tool
https://webrtc.github.io/samples/src/content/peerconnection/trickle-ice/
by adding our TURN server and testing, then removing it, adding our TURNS server, and testing again
(note: these have a TURN username and password, which I won't copy here ;) )
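
For reference, the two configurations exercised in the trickle-ice tool look roughly like this (server hostname and credentials are placeholders, not our real values):

// TURN over TCP (works through the proxy):
const turnConfig = {
  iceServers: [{
    urls: "turn:turn.example.com:3478?transport=tcp",
    username: "user",
    credential: "secret"
  }]
};
// TURNS, i.e. TURN over TLS over TCP (fails through the proxy):
const turnsConfig = {
  iceServers: [{
    urls: "turns:turn.example.com:443?transport=tcp",
    username: "user",
    credential: "secret"
  }]
};
// e.g. new RTCPeerConnection(turnsConfig), then watch which relay candidates gather.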


Actual results:

TURNS negotiation is not possible with a proxy in our partner's network from Firefox 56.


Expected results:

Works, just as TURN does ;)
Attached image TURN proxy ok.jpg
Component: Untriaged → WebRTC: Networking
Product: Firefox → Core
Nils, is this a known issue?
Flags: needinfo?(drno)
Whiteboard: [needinfo to drno on 2017/11/13]
Hi,

Any news?

With a test at another client (a bank) that has the same problem with TURN and Firefox, we found something odd.  This bank has
 - a proxy
 - a very restrictive corporate DNS (only sites trusted by the bank are resolved by their corporate DNS).

We noticed that
 * there is no problem with Chrome...
 * but Firefox does not seem to ask the PROXY to do the TURN DNS resolution; it first tries to resolve locally (here, against that very restrictive DNS).

Looking at network traces, Chrome asks the PROXY to do the DNS resolution => OK.
But Firefox tries to resolve it with the PC's DNS ... and this doesn't work.

=> Should Firefox also ask the PROXY to do the DNS resolution?  Or?
Byron, perhaps you can check this out sooner than Nils?
Flags: needinfo?(docfaraday)
Let me look into it.
Assignee: nobody → docfaraday
Flags: needinfo?(drno)
Flags: needinfo?(docfaraday)
Yeah, looking at the proxy code in nICEr, there's no way it can handle establishing a TLS connection through an HTTP proxy, because the way we implement this is:

1. Take the socket, and override its remote address/port to point at the proxy.
2. Tell the socket to connect.
3. Do a proxy handshake over the socket once it is connected.
4. Hand the resulting socket to the TURN code.

When we're starting out with a TURN TLS socket, this doesn't quite work; step 2 ignores the IP address override and instead uses the FQDN for the TURN server. I don't see any quick fix for this, because in this case we do not want a TLS handshake at step 2, we want it between steps 3 and 4.

I think the right way to fix this is to let Necko handle all of the proxy stuff for us under the hood, so that it just looks like we're using plain old TCP/TLS sockets.
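
To make the intended ordering concrete, here is a minimal Node-style sketch of the sequence (hostnames and ports are illustrative; this is a model of the desired behavior, not the Necko implementation). Note that the proxy, not the client, resolves the TURN server's FQDN, which also addresses the DNS point raised earlier:

const http = require("http");
const tls = require("tls");

// Steps 1-2: plain TCP to the proxy, asking it to open a tunnel to the
// TURN server by name -- the proxy does the DNS resolution.
const req = http.request({
  host: "192.168.245.49",            // the proxy
  port: 8080,
  method: "CONNECT",
  path: "turn.example.com:5349",     // the TURNS endpoint, as an FQDN
});
req.end();

// Steps 3-4: only once the tunnel exists do we run the TLS handshake,
// over the tunneled socket, and hand the result to the TURN code.
req.on("connect", (res, socket) => {
  const tlsSocket = tls.connect(
    { socket, servername: "turn.example.com" },
    () => {
      // The TURN Allocate request would be sent over tlsSocket here.
    }
  );
});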
We should also not resolve the TURN URL when a proxy is in use. That allows filter criteria to be applied when deciding whether the proxy should be used. And there is another bug report which states that this should work on systems without any DNS resolver, because the proxy can do the DNS resolution for them.

The bigger issue though is that I think reworking this should wait until we have moved mtransport into a Necko process.
Status: UNCONFIRMED → NEW
Rank: 25
Ever confirmed: true
Priority: -- → P3
The CONNECT-only flag for http channels will now perform ssl setup for https
uris.  The http connection will now request the transaction to reset after the
proxy tunnel has been established.  An http transaction will only do a partial
reset since the http request head is still needed but no more http data will be
transmitted.  After ssl setup the http connection will close the transaction to
complete the http request.

The xpcshell test requires a raw tcp socket to pipe data through the proxy to
setup ssl.  A node http server has been added which accepts specific CONNECT
requests to facilitate this pipe.  CONNECT requests are accepted if the host has
been registered in the source file and the port matches a listening port.  The
server has http and https listeners.  These ports are recorded in environment
variables.
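
A minimal sketch of such a CONNECT-handling node server (the allow-list, port, and host names here are placeholders; the actual test server registers its hosts and listening ports as described above):

const http = require("http");
const net = require("net");

// Hypothetical allow-list; the real test server registers hosts in its source.
const allowedHosts = new Set(["example.test"]);

const proxy = http.createServer();
proxy.on("connect", (req, clientSocket, head) => {
  const [host, port] = req.url.split(":");
  if (!allowedHosts.has(host)) {
    clientSocket.end("HTTP/1.1 502 Bad Gateway\r\n\r\n");
    return;
  }
  // Open a raw TCP pipe to the requested host so the client can run its
  // own TLS handshake through the tunnel.
  const serverSocket = net.connect(Number(port), host, () => {
    clientSocket.write("HTTP/1.1 200 Connection Established\r\n\r\n");
    serverSocket.write(head);
    serverSocket.pipe(clientSocket);
    clientSocket.pipe(serverSocket);
  });
});
proxy.listen(8080);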
This adds a flag to WebrtcProxyChannel and related IPDL classes/helpers to
indicate a secure connection should be established to the endpoint.

When using a double tunnel to the TURN server, the streams are tls filters
wrapped around a raw socket stream.  The stream callbacks do not receive the
tls filter stream but the raw socket stream.

Depends on D13037
Depends on: 1528472

What is the status of this bug?
We are still affected by this, as we use a proxy that is only allowed to connect to the WAN on ports 80, 8080, 443, and 8443.
So only TURNS is possible, but the request never hits the proxy.

Firefox 82.0.2 is still affected, regardless of the OS used.

ni? self to look into this. Maybe this is easier to fix now, given the socket process work?

Flags: needinfo?(docfaraday)

So the patches on this bug have been overtaken by events. At this point, we'll need to do the following:

  1. Change this member to be used for FQDNs in general, and fill it in unconditionally:

https://searchfox.org/mozilla-central/rev/16d30bafd4e5276d6d3c632fb52a6c71e739cc44/dom/media/webrtc/transport/third_party/nICEr/src/net/transport_addr.h#72
https://searchfox.org/mozilla-central/rev/16d30bafd4e5276d6d3c632fb52a6c71e739cc44/dom/media/webrtc/transport/third_party/nICEr/src/ice/ice_component.c#577-582

  2. Modify this code to grab the FQDN instead of an IP address for the remote end, if that FQDN is present. This will need to happen regardless of whether this is a TLS candidate or not; all TCP sockets will end up working the same way.

https://searchfox.org/mozilla-central/rev/16d30bafd4e5276d6d3c632fb52a6c71e739cc44/dom/media/webrtc/transport/nr_socket_tcp.cpp#103-106

  3. Stop running this code for TCP candidates, and instead do something similar to what we do when an IP address is used:

https://searchfox.org/mozilla-central/rev/16d30bafd4e5276d6d3c632fb52a6c71e739cc44/dom/media/webrtc/transport/third_party/nICEr/src/ice/ice_candidate.c#623-653

  4. Scrub through nICEr and find places that assume nr_transport_addr in an initialized TURN candidate has a resolved IP address.

I think the code on the other side of this call will do the right thing if we feed it an FQDN already, but of course there could be problems in practice:

https://searchfox.org/mozilla-central/rev/16d30bafd4e5276d6d3c632fb52a6c71e739cc44/dom/media/webrtc/transport/nr_socket_tcp.cpp#124-125

Flags: needinfo?(docfaraday)
See Also: → 1680771

I think the problem I am tripping over here relates to the fact that the linux testers do not have a resolvable hostname configured, so we are forced to use "localhost" for the test TURN server on those machines. By moving the DNS resolution into the WebrtcTCPSocket class, nICEr does not know what to use for the local IP address (loopback vs. a real one), and it defaults to a real one, which cannot be used to connect to a TURN server running on a localhost address. I could fix that, but it would be kind of ugly and would only be useful for making this stuff work on the testers: in the real world it is extremely unlikely that anyone would run a TURN server on a localhost address, and if they did, it would be acceptable to tell them to attach it to the real local address instead.

Joel, is there any way we could ensure that the linux testers have a real hostname configured that is resolvable through DNS? That would get me over this hurdle. (See comment 18 for the reason this would be helpful)

Flags: needinfo?(jmaher)

thanks for asking :bwc. I think your best bet will be to put this on our hardware workers that run perf tests in our datacenter. These all have DNS already. There are 2 concerns:

  1. load (we might overload these boxes), if too much we could create a media-turn job that just runs these specific tests
  2. os version/package versions. Currently these run ubuntu 16.04 (in the next 6 weeks should all be running 18.04). These machines are not in parity with the docker worker we run unittests on, so there could be packages that need installation.

I would recommend:

  1. forcing to run on existing hardware. here is an example of running mochitest-webgpu on windows hardware: https://searchfox.org/mozilla-central/source/taskcluster/ci/test/mochitest.yml#549, you would need to do something similar for linux+mochitest-media.

  2. running on an 18.04 machine that is in staging. When you are ready for that, ask :aerickson for what to set it to. Ideally you would be able to run something like ./mach try fuzzy linux !debug mochitest-media !fis --worker-override t-talos-1604=t-talos-1804 where you put the correct values in for the worker-override.

  3. if packages are missing on 18.04, this would be a great time to get them installed :)

Flags: needinfo?(jmaher)

(In reply to Joel Maher ( :jmaher ) (UTC -0800) from comment #20)

> thanks for asking :bwc. I think your best bet will be to put this on our hardware workers that run perf tests in our datacenter. These all have DNS already. There are 2 concerns:
>
>   1. load (we might overload these boxes), if too much we could create a media-turn job that just runs these specific tests
>   2. os version/package versions. Currently these run ubuntu 16.04 (in the next 6 weeks should all be running 18.04). These machines are not in parity with the docker worker we run unittests on, so there could be packages that need installation.
>
> I would recommend:
>
>   1. forcing to run on existing hardware. here is an example of running mochitest-webgpu on windows hardware: https://searchfox.org/mozilla-central/source/taskcluster/ci/test/mochitest.yml#549, you would need to do something similar for linux+mochitest-media.
>
>   2. running on an 18.04 machine that is in staging. When you are ready for that, ask :aerickson for what to set it to. Ideally you would be able to run something like ./mach try fuzzy linux !debug mochitest-media !fis --worker-override t-talos-1604=t-talos-1804 where you put the correct values in for the worker-override.
>
>   3. if packages are missing on 18.04, this would be a great time to get them installed :)

Do we have an idea how much additional load we're talking about here, percentage-wise? Are we sure that we cannot modify the image for the testers? It seems we can already do DNS lookups on them; they just don't have their hostname configured (it is a somewhat short random hex string that doesn't resolve to anything, as opposed to something like "ec2-13-56-250-44.us-west-1.compute.amazonaws.com", which does).

I am not too familiar with AWS and Docker; it would seem that we would need to coordinate across AWS and into Docker for the DNS entry. Possibly there are some ifconfig routes to add that could make something work.

One option if you want to play with things is to consider running a command in the pre-flight script:
https://searchfox.org/mozilla-central/source/testing/mozharness/configs/unittests/linux_unittest.py#250

That would allow the docker image to be setup with whatever you do there before running any harness scripts.

So I don't see anything in the preflight for windows that looks like it is setting up hostname stuff; I wonder how that hostname is set on windows, and whether we're just missing a bit of configuration like that on linux? Who might know more about this?

I thought this was for linux specifically, not windows. Do you need this both for windows and linux? I think windows will be more difficult as we run it in a VM in a hacky way on an instance not via docker. If this is needed for both, do we need it for macosx and android also?

I'm saying that the hostname is set properly when running Windows tests in AWS, and that we may be missing some bit of config for linux that would do the same thing for linux tests in AWS.

oh, I overlooked that, much clearer now.
:grenade

  1. can you explain how we set the hostname for windows @AWS to resolve via DNS?
  2. will azure offer the same support as we switch there next quarter?
  3. any thoughts on how to get our docker image working with dns name resolution @AWS like windows is?
Flags: needinfo?(rthijssen)

> if packages are missing on 18.04, this would be a great time to get them installed :)

I audited the new 1804 puppet configuration and it should have every package that's present in the 1804 docker image.

> So I don't see anything in the preflight for windows that looks like it is setting up hostname stuff; I wonder how that hostname is set on windows, and whether we're just missing a bit of configuration like that on linux? Who might know more about this?

I don't think we have windows docker workers. These linux testers are running under docker-worker (https://docs.taskcluster.net/docs/reference/workers/docker-worker). Taskcluster-team maintains the linux docker-worker instances.

The linux talos workers Joel recommended are metal/hardware instances.

(In reply to Joel Maher ( :jmaher ) (UTC -0800) from comment #26)

> oh, I overlooked that, much clearer now.
> :grenade
>
>   1. can you explain how we set the hostname for windows @AWS to resolve via DNS?
>   2. will azure offer the same support as we switch there next quarter?
>   3. any thoughts on how to get our docker image working with dns name resolution @AWS like windows is?
  1. on windows we just modify the local hosts file (c:\windows\system32\drivers\etc\hosts) so that the hostname points to 127.0.0.1
  2. yes
  3. it's pretty hacky but the same trick of modifying the local hosts file should work.

sorry about the slow response.

Flags: needinfo?(rthijssen)
  • Modified nr_transport_addr to allow it to represent an
    fqdn/ip-version/protocol tuple, and taught nICEr to handle the
    fqdn case appropriately.
  • Since nr_transport_addr can represent an fqdn, nr_ice_stun_server
    did not need this ability anymore, and was significantly simplified.
  • Taught NrIceCtx to handle creation of a V4/V6 pair of nr_ice_stun_server
    when an fqdn is used, instead of having nICEr create pairs later.

Depends on D101657

Attachment #9027783 - Attachment is obsolete: true
Attachment #9027784 - Attachment is obsolete: true
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a2069e46d965
Fix a typo (srvflx -> srflx) in this test. r=ng
https://hg.mozilla.org/integration/autoland/rev/d6ec5692e05e
Ensure that TCP is not blocked in this test, and allow loopback to be used because that's what happens in CI sometimes. r=ng
https://hg.mozilla.org/integration/autoland/rev/d4cadef3d55d
Make the log module for WebrtcTCPSocket be what one would expect it to be. r=mjf
https://hg.mozilla.org/integration/autoland/rev/8720dfa5a6e0
Modify mtransport to delegate DNS lookups for TCP sockets to the socket class, instead of using nr_resolver. r=mjf
https://hg.mozilla.org/integration/autoland/rev/8e9a13c508c6
Prevent WebrtcTCPSocket from using an IP version that is inconsistent with the requested local address. r=mjf
https://hg.mozilla.org/integration/autoland/rev/cf60a787b5ad
Add/improve some logging, and add an assertion. r=mjf

Ok, there are multiple problems here. I've worked around most of them, but one still remains: the OS X testers do not seem to be able to reach Google's STUN servers. Flaws in the tests had prevented this problem from causing failures until now.

Joel, any idea why the OS X testers wouldn't be able to reach Google's publicly available STUN servers?

Flags: needinfo?(docfaraday) → needinfo?(jmaher)

ok, the one difference between windows/linux and macosx is that macosx runs on real machines in a datacenter while the others are vm/docker images at AWS.

There must be one of two things going on:

  1. network ports are blocked from the datacenter
  2. the osx machines have some firewall rules

I suspect #1.

:bwc, which ip addresses do we need? the failures look to be on false stun servers, not live ones.

Flags: needinfo?(jmaher) → needinfo?(docfaraday)

So, we cannot predict the remote IP address. Right this second, stun.l.google.com resolves to 173.194.200.127, but that will always be subject to change.

What do you mean by "false stun servers"?

Flags: needinfo?(docfaraday) → needinfo?(jmaher)

I see the test failures being named: TestGatherDNSStunBogusHostnameTcp, VerifyTestStunServerV6FQDN.

Possibly I don't understand things. So basically we need stun*.*.google.com resolved?

:dhouse, do you have a way to determine if these are allowed hosts on the MDC OSX machines?

Flags: needinfo?(jmaher) → needinfo?(dhouse)

:bwc, what can I test for on the mac testers? (Or can the tests be adjusted to log the actual failure, or to test whether the stun host names resolve, if that is the cause of this problem?)

I've tested resolving 'stun.l.google.com' with just ping from a few of the production gecko-t-osx-1014 workers (r7 mac minis in the mozilla datacenters), and they are all able to resolve it (currently to 209.85.144.127):

$ for (( I=0; I<=472; I+=RANDOM%100+1 )); do ssh -o StrictHostKeyChecking=no -o ConnectTimeout=3 -o UserKnownHostsFile=/dev/null t-mojave-r7-$(printf "%03d" $I).test.releng.mdc$(( 2-I/236 )).mozilla.com 'hostname; ping -c1 stun.l.google.com | grep -o "from [0-9\.]*"' 2>/dev/null; done
t-mojave-r7-041.test.releng.mdc2.mozilla.com
from 209.85.144.127
t-mojave-r7-099.test.releng.mdc2.mozilla.com
from 209.85.144.127
t-mojave-r7-256.test.releng.mdc1.mozilla.com
from 209.85.144.127
t-mojave-r7-351.test.releng.mdc1.mozilla.com
from 209.85.144.127
t-mojave-r7-407.test.releng.mdc1.mozilla.com
from 209.85.144.127
Flags: needinfo?(dhouse) → needinfo?(docfaraday)

We aren't having any trouble resolving, but it does appear that we aren't reaching the STUN port (in this case, 19305). We send (UDP) STUN packets, but never see a response. The linux/windows testers get responses just fine.

Flags: needinfo?(docfaraday) → needinfo?(dhouse)

(In reply to Byron Campen [:bwc] from comment #45)

> We aren't having any trouble resolving, but it does appear that we aren't reaching the STUN port (in this case, 19305). We send (UDP) STUN packets, but never see a response. The linux/windows testers get responses just fine.

I tested reaching out on port 19305 with netcat, and it does appear to be blocked. Is 19305 the port that will always be used for STUN, or is there a standard port we can allow through the network? We can ask netops to allow the specific traffic.

Are the windows and linux tests being done from the perf/talos workers within the datacenters? (netops would be able to compare the rules for those to see how to match it for the macs)

Flags: needinfo?(dhouse) → needinfo?(docfaraday)

:dhouse, the linux/windows tests are run at AWS not at MDC1, so there might not be working flows setup for networking.

(In reply to Joel Maher ( :jmaher ) (UTC -0800) from comment #47)

> :dhouse, the linux/windows tests are run at AWS not at MDC1, so there might not be working flows setup for networking.

Thanks! That explains why they aren't getting blocked on linux/windows.

The firewalls in the datacenter do deep packet inspection, and so we'll need to capture and re-test some failed connections to verify we are allowing the traffic (since the firewall needs to know what the traffic does and looks like and not just an allow on the port). If there is a range of ports that could be used, then we can allow that traffic for more than one port.

I tested a stun connection with the nodejs stun library (https://github.com/nodertc/stun), and confirmed that is blocked by the firewall on port 19305:

[dhouse@t-mojave-r7-256.test.releng.mdc1.mozilla.com ~]$ node <<<"require('stun').request('stun.l.google.com:19305',(e,r)=>{console.log(e||r.getXorAddress())});"
Error: timeout
[...]
house@home:~$ node <<<"require('stun').request('stun.l.google.com:19305',(e,r)=>{console.log(e||r.getXorAddress())});"
{ port: 60803, family: 'IPv4', address: 'my_ip_addr' }

Port 19305 is what Google's stun servers have been using for a long time, so it probably won't change on us. I am not sure why they chose that port; 3478 is supposed to be the standard port. If Google were to change that port, I would think that it would change to 3478, so ensuring 3478 is also not blocked would be a good idea (this would also allow us to switch our tests over to a different STUN server, if it became necessary).
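
For illustration, the difference only shows up in the port spelled out in the STUN URL (these entries are illustrative, not the test configuration):

const iceServers = [
  { urls: "stun:stun.l.google.com:19305" },    // the port Google has used for a long time
  { urls: "stun:stun.stunprotocol.org:3478" }  // the standard STUN port
];
// A urls entry with no port, e.g. "stun:stun.example.org", implies the default 3478.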

Flags: needinfo?(docfaraday)

:bwc, :ctb added a firewall policy to allow this, and I tested stun.l.google.com:19305 from a mac tester in both datacenters; it works now (I also tried another server on port 3478, which works as well).
Can you re-try the tasks?

[dhouse@t-mojave-r7-256.test.releng.mdc1.mozilla.com ~]$ node <<<"require('stun').request('stun.l.google.com:19305',(e,r)=>{console.log(e||r.getXorAddress())});"                             
{ port: 38160, family: 'IPv4', address: '63.245.208.129' }
[dhouse@t-mojave-r7-256.test.releng.mdc1.mozilla.com ~]$ node <<<"require('stun').request('stun.stunprotocol.org:3478',(e,r)=>{console.log(e||r.getXorAddress())});"
{ port: 32902, family: 'IPv4', address: '63.245.208.129' }
Flags: needinfo?(docfaraday)
Flags: needinfo?(docfaraday)
  • Allow nr_resolver to be used for TCP when not running in e10s/socket process mode.
  • Init IPv6 STUN/TURN servers appropriately.
  • Fix bug that was preventing STUN server hostname from being configured.
  • Disable some tests that relied on STUN TCP that hasn't been available for a long time.
    (This went unnoticed due to the previous problem)
  • A small logging improvement.

Depends on D101660

Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b32d3600aab9
Fix a typo (srvflx -> srflx) in this test. r=ng
https://hg.mozilla.org/integration/autoland/rev/86a4f3321529
Ensure that TCP is not blocked in this test, and allow loopback to be used because that's what happens in CI sometimes. r=ng
https://hg.mozilla.org/integration/autoland/rev/8bd218b64217
Make the log module for WebrtcTCPSocket be what one would expect it to be. r=mjf
https://hg.mozilla.org/integration/autoland/rev/dfae2287a9e9
Modify mtransport to delegate DNS lookups for TCP sockets to the socket class, instead of using nr_resolver. r=mjf
https://hg.mozilla.org/integration/autoland/rev/fc28a9d072fa
Prevent WebrtcTCPSocket from using an IP version that is inconsistent with the requested local address. r=mjf
https://hg.mozilla.org/integration/autoland/rev/66cd9bf00b3d
Add/improve some logging, and add an assertion. r=mjf
https://hg.mozilla.org/integration/autoland/rev/6ee3e9718f05
Get ice_unittest working. r=mjf
Regressions: 1705563