Closed Bug 1472168 Opened 6 years ago Closed 6 years ago

CiDuty/BuildDuty VPN issues

Categories

(Infrastructure & Operations :: Corporate VPN: Support requests, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dlabici, Unassigned)

Details

Hello,

CiDuty (previously known as BuildDuty) has a number of issues when working over the VPN. Our environment and the problems are listed below.

Workstation(s):
OS: Ubuntu 16.{04-10}
Connecting with: OpenVPN (using the certificate generated on sso.m.o)

Problems:
1) After successfully connecting to the VPN, publicly available websites (such as bugzilla, mozilla.com, irccloud, etc.) become unreachable. Everything that's private (behind the VPN) is accessible.
2) We get randomly disconnected after 30-60 minutes. This is a serious problem, as we run scripts that require a constant connection to the VPN and can take up to 5-8 hours.
3) We successfully connect to the VPN, but whenever we try to reach any private services, the requests hang (not a timeout exactly; we can wait and wait and nothing happens).

The problems listed above appeared about 2 weeks ago and show up on a couple of machines (main workstations). Is there anything we can do to get this fixed?
The briefest of answers is "these are linux-specific client issues and you probably need to upgrade to Ubuntu 18", but I'll try to detail what's going on.

1) This "some work/some don't" behavior is usually related to your NetworkManager configuration, since Linux does routing differently. Network icon in the top right, then the VPN name, Edit, IPv4 Settings, Routes, 'Use this connection only for resources on its network' - is this checked? It should be. If unchecked (the default), NetworkManager will try to route bugzilla.mozilla.org (a public site in AWS) through the VPN, instead of across the public Internet (which is actually where the site lives). And since the site isn't "there" (on the VPN), you never get to the site. If the checkbox is checked, we'll need more information and to consider this more. More down at item 3.

2) The disconnects are from a bug in openvpn < 2.4.4. Ubuntu 16 ships with 2.3.10. 2.4.4 ships with Ubuntu 18, or a later openvpn can be compiled if you're so inclined (it's not in official backport channels). https://community.openvpn.net/openvpn/ticket/904

I hate that we didn't detect this in VPN-server-upgrade testing, but once we found it, since it's a client bug with a way forward, we've discussed it and decided to keep the server where it is. ServiceDesk sent a note to people who we detected were hitting this issue. I don't see your account being knocked offline at the 1 hour mark (which is how we built the list), but I see from phonebook some of your teammates who ServiceDesk would have reached out to (rmutter, apop, riman, bcrisan).

3) This is unclear (in item 1 you say you can get to private items), so we'd need specific situations in order to look further ("this user doing this action works, doing that action didn't"). If you can provide those, please also include your NM config from /etc/NetworkManager/system-connections/YOUR_CONNECTION_NAME.
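For item 2, a quick way to tell whether a given client falls in the affected range is to compare its version against 2.4.4. The following is only a rough sketch: the function names are made up for illustration, and it assumes the version banner contains a dotted three-part number such as "2.3.10".

```python
import re

# Per the comment above, the disconnect bug affects openvpn releases before 2.4.4.
FIXED_VERSION = (2, 4, 4)


def parse_openvpn_version(banner):
    """Return (major, minor, patch) from text such as 'OpenVPN 2.3.10 x86_64-pc-linux-gnu'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"no version number found in {banner!r}")
    return tuple(int(part) for part in match.groups())


def has_disconnect_bug(banner):
    """True if the client version is older than the release that carries the fix."""
    return parse_openvpn_version(banner) < FIXED_VERSION


if __name__ == "__main__":
    # Sample banners only; substitute the first line printed by your own client.
    for sample in ("OpenVPN 2.3.10 x86_64-pc-linux-gnu", "OpenVPN 2.4.4 x86_64-pc-linux-gnu"):
        print(sample, "->", "affected" if has_disconnect_bug(sample) else "not affected")
```

Feeding it the first line printed by `openvpn --version` on a workstation should be enough. Versions are compared as integer tuples so that 2.3.10 sorts before 2.4.4, which a plain string comparison would not guarantee in general.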
1) The IPv4 setting has been "Use this connection only for resources on its network" since the beginning. It used to work quite well.

2) You are right on this one. I filed the bug to cover the team's issues (all members).

3) Here are the contents of my /etc/NetworkManager/system-connections/Mozilla:

[connection]
id=Mozilla
uuid=0921626b-0427-42e7-b305-6e46585eaa8a
type=vpn
permissions=user:danut.labici:;
secondaries=

[vpn]
ta-dir=1
connection-type=password-tls
password-flags=1
remote=openvpn.scl3.mozilla.com
cipher=AES-256-CBC
comp-lzo=yes
cert-pass-flags=0
dev-type=tun
username=dlabici
cert=/home/danut.labici/.openvpn/openvpn.scl3.mozilla.com/cert.crt
ca=/home/danut.labici/.openvpn/openvpn.scl3.mozilla.com/ca.crt
key=/home/danut.labici/.openvpn/openvpn.scl3.mozilla.com/key.key
ta=/home/danut.labici/.openvpn/openvpn.scl3.mozilla.com/ta.key
dev=tun
service-type=org.freedesktop.NetworkManager.openvpn

[ipv4]
dns-search=
method=auto
never-default=true

[ipv6]
addr-gen-mode=stable-privacy
dns-search=
ip6-privacy=0
method=auto
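In a saved connection like the one above, the 'Use this connection only for resources on its network' checkbox shows up as never-default=true in the [ipv4] section. A minimal sketch for checking that across several machines, assuming the file parses as plain INI (the function name and the trimmed sample are illustrative only):

```python
import configparser


def uses_vpn_only_for_its_network(keyfile_text):
    """Return True if the [ipv4] section of the connection has never-default=true."""
    # Only '=' is treated as a delimiter so values containing ':' stay intact.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(keyfile_text)
    value = parser.get("ipv4", "never-default", fallback="false")
    return value.strip().lower() == "true"


# Trimmed-down sample based on the config posted above.
SAMPLE = """\
[connection]
id=Mozilla
type=vpn
permissions=user:danut.labici:;

[ipv4]
dns-search=
method=auto
never-default=true
"""

if __name__ == "__main__":
    print(uses_vpn_only_for_its_network(SAMPLE))
```

A missing key is treated as unchecked here, in line with the earlier comment that unchecked is the default.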
Met with Danut on Vidyo to discuss.

We set 'reneg-sec' to 0 on his connection, which should cause him to be booted by the server instead of by his own client. The client-side renegotiation timing is probably why his disconnects were being initiated on his end rather than by the server. This doesn't fix anything about the 1 hour timeout, but it does make it more likely that I will detect him.

The root of the issue still comes back to the openvpn version. Workarounds were discussed but were ultimately unworkable: a screen session for long jobs won't work because of ssh-agent needs, and backporting openvpn to Ubuntu 16 was pretty nontrivial when I looked (though someone with better build skills might find it easy).

The config looks mostly correct, and I've not been able to replicate the case where "Use this connection only for resources on its network" doesn't solve issue 1, but I'll keep trying to find what in NetworkManager is causing this.
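For anyone wanting to apply the same 'reneg-sec' change to a saved NetworkManager connection without going through the GUI, here is a rough sketch. It assumes the OpenVPN plugin stores this option under the key reneg-seconds in the [vpn] section; that key name is my assumption and is worth confirming by setting the value once in the connection editor and looking at the file. The function name is made up for illustration.

```python
import configparser
import io


def set_reneg_seconds(keyfile_text, seconds=0):
    """Return keyfile text with reneg-seconds set in the [vpn] section."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # preserve key case as written in the file
    parser.read_string(keyfile_text)
    if not parser.has_section("vpn"):
        raise ValueError("connection has no [vpn] section")
    # 'reneg-seconds' is assumed to be the plugin's name for OpenVPN's reneg-sec.
    parser.set("vpn", "reneg-seconds", str(seconds))
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    sample = "[vpn]\nconnection-type=password-tls\nremote=openvpn.scl3.mozilla.com\n"
    print(set_reneg_seconds(sample))
```

The function only transforms text; writing the result back to the connection file needs root, and NetworkManager has to reload the connection before the change takes effect.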
As of bug 1473746, the hourly disconnect issue has been mitigated, because it turned out to affect more than just 'older linux clients'. This bug is still open because of item 1, the "dns doesn't resolve internals" problem: we have reports of it, but I've been unable to replicate it personally or find a root cause.
I'm going to close this out, because we haven't gotten reports of the DNS issues since this bug was filed. I am not entirely thrilled that I don't have an answer here, but barring more evidence I know I'm not going to make progress. If this recurs, feel free to reopen or refer back, with a new set of information / demonstration of the breakage.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED