Open
Bug 1461606
Opened 6 years ago
Updated 1 year ago
Creating ServerSocket doesn't fail if port is bound by external application via multicast address
Categories
(Remote Protocol :: Marionette, defect, P3)
Tracking
(Not tracked)
NEW
People
(Reporter: whimboo, Unassigned)
References
Details
Today I discovered that when another application on the system has already bound itself via a multicast address to the port 2828 which Marionette uses by default, Firefox still creates a ServerSocket instance and doesn't fail with an `NS_ERROR_SOCKET_ADDRESS_IN_USE` error. The error is only visible if another instance of Firefox already bound to that port via a unicast address. Steps to reproduce: 1) Run `ncat -k -l 2828` 2) Run `mach marionette test -vv --gecko-log -` This bound ncat to: > % lsof -nP -iTCP:2828 > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > ncat 13963 henrik 3u IPv6 0x66002f962daf1099 0t0 TCP *:2828 (LISTEN) > ncat 13963 henrik 4u IPv4 0x66002f961ea01e81 0t0 TCP *:2828 (LISTEN) In the gecko log output you will see: > 1526370899262 Marionette DEBUG New connections are accepted > 1526370899264 Marionette INFO Listening on port 2828 Using a unicast address for ncat instead (`ncat -k -l localhost 2828`) results in the expected behavior: > % lsof -nP -iTCP:2828 > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > ncat 15006 henrik 3u IPv4 0x66002f961f3a97e1 0t0 TCP 127.0.0.1:2828 (LISTEN) The same also applies to the Marionette client which doesn't fail in `is_port_available()` because it uses the `socket.SO_REUSEADDR` flag. Removing that flag results in the expected IOError. Now starting the Marionette tests with the port 2828 in `LISTEN` status for a multicast address Marionette hangs for about 60s during each start of Firefox before new connections from the client are getting accepted: > 1526374256640 Marionette DEBUG New connections are accepted > 1526374256641 Marionette INFO Listening on port 2828 > [..] > 1526374316179 Marionette DEBUG Accepted connection 0 from 127.0.0.1:62943 > 1526374316206 Marionette DEBUG Closed connection 0 > 1526374316309 Marionette DEBUG Accepted connection 1 from 127.0.0.1:62944 > 1526374316318 Marionette DEBUG Closed connection 1 > 1526374316319 Marionette DEBUG Accepted connection 2 from 127.0.0.1:62945 > 1526374316336 Marionette TRACE 2 -> [0,1,"WebDriver:NewSession",{}] So I wonder if we should use the same logic as geckodriver to find a random free port, and get rid of the usage of `SO_REUSEADDR`. It would allow us to ensure that we really connect to an instance of Firefox under the control of Marionette and not to a probably inappropriately closed socket in `TIME_WAIT` state. A solution like stated on bug 1240830 would be better but still wouldn't be that easy to implement. A socket handling explanation which I found useful can be found here: https://stackoverflow.com/questions/14388706/socket-options-so-reuseaddr-and-so-reuseport-how-do-they-differ-do-they-mean-t For Windows that is important: A socket with SO_REUSEADDR can always bind to exactly the same source address and port as an already bound socket, even if the other socket did not have this option set when it was bound. This behavior is somewhat dangerous because it allows an application "to steal" the connected port of another application. Andreas, IMO this is an important thing to fix. So what do you think?
Flags: needinfo?(ato)
Reporter | ||
Updated•6 years ago
|
Priority: -- → P3
Comment 1•6 years ago
|
||
My understanding is, with the big caveat that my knowledge of networking is limited, that multicast in TCP is not a thing. The examples you gave above use TCP exclusively, but differentiate in that they use AF_INET and AF_INET6. Let me try to explain. Most modern operating systems, with the exception of OpenBSD, implement ‘dual-stacked’ IP architectures that support both IPv4 and IPv6 simultaneously. It is important to keep two things in mind: firstly that TCP ports and UDP ports are entirely separate, and that they are features of the transport layer protocols, not of the IP layer as such; and secondly, that multicasting is a one-to-many broadcasting system that practically does not make sense in a stream-oriented one-to-one protocol such as TCP. Consequently, talking about unicasting vs. multicasting here is misleading. Ports being a feature of the protocol means you it is possible to bind one TCP port as well as one UDP port to the same number. In dual-stacked IP architectures you can further multiply this by binding two of them to the IPv4 layer only and the other two to IPv6 only. This means you can have four distinct ports listening on two layers, for two different protocols. So when you ask (nmap’s) ncat to listen on port 2828, it relies on a set of defaults that are different to those of a standard socket. Your example showed that it bound to both IPv4 and IPv6 ports on TCP. I can confirm this on Debian: > % ncat -l 2828 & > [1] 15017 > % lsof -i :2828 > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > ncat 15017 ato 3u IPv6 1302799 0t0 TCP *:2828 (LISTEN) > ncat 15017 ato 4u IPv4 1302800 0t0 TCP *:2828 (LISTEN) However, if you ask ncat to listen on a specific host, it will use the type of the address as the basis for the protocol it chooses. Listening on 127.0.0.1 will open a TCP port on IPv4: > % ncat -l 127.0.0.1 2828 & > [1] 15389 > % lsof -i :2828 > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > ncat 15389 ato 3u IPv4 1305864 0t0 TCP localhost:2828 (LISTEN) In your example you bind to "localhost", but what "localhost" means is system-dependent due to host overrides in e.g. /etc/hosts. On my Debian system it happens to look up ::1, but it may not on other systems. A quick test I did shows that "ncat -l localhost 2828" binds to both IPv4 and v6 on Debian, but only to IPv6 on my macOS system, and apparently only IPv4 on your system. The same principle goes for the _type of address_ you ask it to listen for. If you explicitly listen on an IPv6 address it will apply the IPV6_V6ONLY socket option, similar to passing the -6 flag: > % ncat -l ::1 2828 & > [1] 15417 > % lsof -i :2828 > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > ncat 15417 ato 3u IPv6 1312899 0t0 TCP localhost:2828 (LISTEN) Since "ncat -l 2828", out of no other conviction than its own, binds on both layers we will see the expected IOError when trying to run the Mn tests: > % ncat -l 2828 & > [1] 15463 > % lsof -i :2828 > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > ncat 15463 ato 3u IPv6 1312971 0t0 TCP *:2828 (LISTEN) > ncat 15463 ato 4u IPv4 1312972 0t0 TCP *:2828 (LISTEN) > % ./mach marionette-test -z > […] > IOError: Port localhost:2828 is unavailable. > […] The same goes if we explicitly pick the IPv4 layer through -4 or choosing 127.0.0.1: > % ncat -l 127.0.0.1 2828 & > [1] 15499 > % lsof -i :2828 > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > ncat 15499 ato 3u IPv4 1305957 0t0 TCP localhost:2828 (LISTEN) > % ./mach marionette-test -z > […] > IOError: Port localhost:2828 is unavailable. > […] On the other hand the tests will run just fine when ncat does not reserve the IPv4 port, because the Python client specifically checks "localhost:2828" in https://searchfox.org/mozilla-central/rev/2b9779c59390ecc47be7a70d99753653d8eb5afc/testing/marionette/client/marionette_driver/marionette.py#563: > % ncat -l ::1 2828 & > [1] 15537 > % lsof -i :2828 > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > ncat 15537 ato 3u IPv6 1313864 0t0 TCP localhost:2828 (LISTEN) > % ./mach marionette-test -z > […] > 0:02.63 SUITE_START: marionette-test - running 84 tests > 0:02.63 TEST_START: testing/marionette/harness/marionette_harness/tests/unit/test_window_minimize.py > 0:02.63 TEST_END: SKIP > […] Since the client prevents us from reaching the server, I looked into how the server behaves, and nsIServerSocket’s default also appears to be to listen on IPv4-only: > % ./mach run -marionette --headless & > [1] 16263 > % 0:00.33 /home/ato/src/gecko/obj-x86_64-pc-linux-gnu/dist/bin/firefox -marionette --headless -no-remote -profile /home/ato/src/gecko/obj-x86_64-pc-linux-gnu/tmp/profile-default > *** You are running in headless mode. > 1526393456143 Marionette INFO Listening on port 2828 > % lsof -i :2828 > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > firefox 16295 ato 80u IPv4 1308171 0t0 TCP localhost:2828 (LISTEN) Since this bug is starting to look more and more like https://bugzilla.mozilla.org/show_bug.cgi?id=1257719, the right thing to do here would probably be to make use of IN6ADDR_SETV4MAPPED, which is a technique for making IPv6 sockets also listen on IPv4 and map IPv4 connections to the IPv6 layer using the ::ffff:* namespace. An approach to this can be found in https://stackoverflow.com/a/16763601. (In reply to Henrik Skupin (:whimboo) from comment #0) > The same also applies to the Marionette client which doesn't fail > in `is_port_available()` because it uses the `socket.SO_REUSEADDR` > flag. Removing that flag results in the expected IOError. SO_REUSEADDR on the “is this port available” check probably does make some sense, since the intention is not to fail because there might have been previous connections open when the server process was killed. But using it on the _client connection_ to Marionette in transport.py I agree does indeed not make any sense. > So I wonder if we should use the same logic as geckodriver to find > a random free port, and get rid of the usage of `SO_REUSEADDR`. It > would allow us to ensure that we really connect to an instance > of Firefox under the control of Marionette and not to a probably > inappropriately closed socket in `TIME_WAIT` state. A solution > like stated on bug 1240830 would be better but still wouldn't be > that easy to implement. Yes, I do think using a randomly chosen port would do away with some of the problems you describe with regards to the TIME_WAIT period, or at least make them less prominent. This naturally does come with some minuscule chance of raciness due to the fact that the port is atomically allocated, then freed, then bound again, but seems overall like it would solve the particular problem you are describing. > Andreas, IMO this is an important thing to fix. So what do you > think? IMHO it is not so important to fix, but we do appear to have pinpointed a couple of orthogonal problems: (1) SO_REUSEADDR on client connections are probably not necessary. (2) Having the system atomically allocate a free port would increase reliability due to socket timeout durations. Since we cannot have true atomicity due to the client picking the port, using a geckodriver-esque approach for the time being makes sense as it has proven sufficiently stable. (3) nsIServerSocket doesn’t appear to support IPv6, and we don’t know if it will support IPv6-mapped-IPv4-addresses.
Flags: needinfo?(ato)
Updated•2 years ago
|
Severity: normal → S3
Updated•1 year ago
|
Product: Testing → Remote Protocol
You need to log in
before you can comment on or make changes to this bug.
Description
•