Closed Bug 604139 Opened 14 years ago Closed 14 years ago

Build VPN's DNS periodically unresponsive

Categories

(Infrastructure & Operations Graveyard :: NetOps, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mozilla, Assigned: ravi)

Details

Attachments

(3 files)

From home (not in MV), trying to reach buildbot-master1.build.scl1.mozilla.com

This seems very odd:

(buildbot-0.8.1)host-3-122:buildbotcustom asasaki$ nslookup buildbot-master1.build.mozilla.org
Server:         10.2.74.123
Address:        10.2.74.123#53

buildbot-master1.build.mozilla.org      canonical name = buildbot-master1.build.scl1.mozilla.com.
Name:   buildbot-master1.build.scl1.mozilla.com
Address: 10.12.48.111

(buildbot-0.8.1)host-3-122:buildbotcustom asasaki$ nslookup buildbot-master1.build.scl1.mozilla.com
Server:         10.2.74.123
Address:        10.2.74.123#53

Name:   buildbot-master1.build.scl1.mozilla.com
Address: 10.12.48.111

(buildbot-0.8.1)host-3-122:buildbotcustom asasaki$ telnet !$ 8010
telnet buildbot-master1.build.scl1.mozilla.com 8010
buildbot-master1.build.scl1.mozilla.com: nodename nor servname provided, or not known

(buildbot-0.8.1)host-3-122:buildbotcustom asasaki$ ssh cltbld@buildbot-master1.build.scl1.mozilla.com
ssh: Could not resolve hostname buildbot-master1.build.scl1.mozilla.com: nodename nor servname provided, or not known

From firefox:
http://buildbot-master1.build.scl1.mozilla.com:8010/builders/Rev3 MacOSX Snow Leopard 10.6.2 places debug test mochitest-other

Firefox can't find the server at buildbot-master1.build.scl1.mozilla.com.
Assignee: network-operations → ravi
(buildbot-0.8.1)host-3-122:buildbotcustom asasaki$ ssh cltbld@10.12.48.111
cltbld@10.12.48.111's password: 
Last login: Tue Oct 12 16:16:12 2010 from bm-vpn01.build.mozilla.org
[cltbld@buildbot-master1 ~]$ 

routing isn't broken.
(buildbot-0.8.1)host-3-122:buildbotcustom asasaki$ ssh cltbld@buildbot-master1.build.mozilla.org
cltbld@buildbot-master1.build.mozilla.org's password:

so build.m.o works for buildbot-master1 but not build.scl1.m.c.
And build.mozilla.org is Build's namespace, so everything appears to be how it should be, no?  I don't want to pollute mozilla.com with build hosts.
what?

I would like to be able to reach my build hosts by CNAME or A record, via build-vpn.

The build hosts are already in *.build.scl1.mozilla.com, but are not reachable from build-vpn.

There is no pollution afaict.
I thought the goal was to have build.mozilla.com work.  Either way I am unable to reproduce your problem.

I may have to check when not also on-net (read: home).

host-5-240:~ ravi$ grep server /etc/resolv.conf 
nameserver 10.2.74.123

host-5-240:~ ravi$ nc -vz buildbot-master1.build.scl1.mozilla.com 22
Connection to buildbot-master1.build.scl1.mozilla.com 22 port [tcp/ssh] succeeded!


host-5-240:~ ravi$ netstat -nr | grep '10.12'
10.12.48/22        10.2.171.21        UGSc            1        0    tun0
Status: NEW → ASSIGNED
(In reply to comment #5)
> I thought the goal was to have build.mozilla.com work.

I have the sneaking suspicion joduinn was confused re: *.build.mozilla.com, as it's the first I've ever heard of it.

> I may have to check when not also on-net (read: home).
> 
> host-5-240:~ ravi$ grep server /etc/resolv.conf 
> nameserver 10.2.74.123

akira-sasakis-macbook:Desktop asasaki$ grep server /etc/resolv.conf 
# nameserver 10.0.1.1
# nameserver 10.2.74.123
nameserver 10.2.74.123
 
> host-5-240:~ ravi$ nc -vz buildbot-master1.build.scl1.mozilla.com 22
> Connection to buildbot-master1.build.scl1.mozilla.com 22 port [tcp/ssh]
> succeeded!

akira-sasakis-macbook:Desktop asasaki$ nc -vz buildbot-master1.build.scl1.mozilla.com 22
nc: getaddrinfo: nodename nor servname provided, or not known

> host-5-240:~ ravi$ netstat -nr | grep '10.12'
> 10.12.48/22        10.2.171.21        UGSc            1        0    tun0

akira-sasakis-macbook:Desktop asasaki$ netstat -nr | grep '10.12'
10.12.48/22        10.2.171.45        UGSc            0       15    tun0
It would be helpful if someone else WFH could confirm this as well... may speed things a little.
I'm WFH and able to connect:

lukas-blakks-macbook:~ lsb$ grep server /etc/resolv.conf
nameserver 10.2.74.123
lukas-blakks-macbook:~ lsb$ nc -vz buildbot-master1.build.scl1.mozilla.com 22
Connection to buildbot-master1.build.scl1.mozilla.com 22 port [tcp/ssh] succeeded!
lukas-blakks-macbook:~ lsb$ netstat -nr | grep '10.12'
10.12.48/22        10.2.171.29        UGSc            1        0    tun0
Yup, looks like it works for nthomas as well. Updating summary.
Summary: *.build.scl1.m.c hosts unreachable from build-vpn → *.build.scl1.m.c hosts unreachable from build-vpn for Aki at home
Yeah, it does. I have seen some issues where it doesn't, which may be related to long running vpn sessions (viscosity is reconnecting in the background?) and/or also running other VPNs. Nothing I can give a list of steps to reproduce for, but I'll keep an eye out.
(In reply to comment #6)
> (In reply to comment #5)
> > I thought the goal was to have build.mozilla.com work.
> 
> I have the sneaking suspicion joduinn was confused re: *.build.mozilla.com, as
> it's the first I've ever heard of it.

Yes, I misunderstood. There is no build.mozilla.com. There is build.mozilla.org which is an alias/ANAME/CNAME to build.{location}.mozilla.com.

From irc with ravi, this is already in place.
Started working for me, stopped working for nthomas.

I'm wondering if there's some limit of connections we're hitting.
Summary: *.build.scl1.m.c hosts unreachable from build-vpn for Aki at home → *.build.scl1.m.c hosts unreachable from build-vpn for nthomas at home
(In reply to comment #12)
> Started working for me, stopped working for nthomas.
> 
> I'm wondering if there's some limit of connections we're hitting.

FF stopped working for me - could not get dns resolution for bm-foopy-build.mozilla.org

after restarting viscosity's build-vpn link, now can see via FF but not command line ssh

my /etc/resolv.conf is now missing all mozilla name servers, which previously were being added by Viscosity
Summary: *.build.scl1.m.c hosts unreachable from build-vpn for nthomas at home → build-vpn flaky for out-of-MV users
I'm hitting this too:
foo-ix-blah:~ bhearsum$ grep server /etc/resolv.conf 
# nameserver 64.71.255.198
nameserver 10.2.74.123
foo-ix-blah:~ bhearsum$ nc -vz buildbot-master1.build.scl1.mozilla.com 22
nc: connect to buildbot-master1.build.scl1.mozilla.com port 22 (tcp) failed: Connection refused
nc: connect to buildbot-master1.build.scl1.mozilla.com port 22 (tcp) failed: Connection refused
foo-ix-blah:~ bhearsum$ netstat -nr | grep '10.12'
10.12.48/22        10.2.171.29        UGSc            1        0    tun0
foo-ix-blah:~ bhearsum$ host buildbot-master1.build.scl1.mozilla.com
buildbot-master1.build.scl1.mozilla.com has address 10.12.48.111
foo-ix-blah:~ bhearsum$ nc -vz 10.12.48.111 22
Connection to 10.12.48.111 22 port [tcp/ssh] succeeded!

DNS configuration

resolver #1
  domain : wittydomain.in
  search domain[0] : mozilla.org
  search domain[1] : build.mozilla.org
  search domain[2] : wittydomain.in
  nameserver[0] : 64.71.255.198
  order   : 200000

resolver #2
  domain : mozilla.org
  nameserver[0] : 10.2.74.123
  order   : 100600

resolver #3
  domain : build.mozilla.org
  nameserver[0] : 10.2.74.123
  order   : 100601

resolver #4
  domain : local
  options : mdns
  timeout : 2
  order   : 300000

resolver #5
  domain : 254.169.in-addr.arpa
  options : mdns
  timeout : 2
  order   : 300200

resolver #6
  domain : 8.e.f.ip6.arpa
  options : mdns
  timeout : 2
  order   : 300400

resolver #7
  domain : 9.e.f.ip6.arpa
  options : mdns
  timeout : 2
  order   : 300600

resolver #8
  domain : a.e.f.ip6.arpa
  options : mdns
  timeout : 2
  order   : 300800

resolver #9
  domain : b.e.f.ip6.arpa
  options : mdns
  timeout : 2
  order   : 301000


It looks like I don't have a resolver for anything but mozilla.org and build.mozilla.org.
foo-ix-blah:~ bhearsum$ nc -vz buildbot-master1.build.mozilla.org 22
Connection to buildbot-master1.build.mozilla.org 22 port [tcp/ssh] succeeded!
In Comment 14 I'm unclear what produced the output

DNS configuration

resolver #1
[...]

but the resolver that is assigned by the VPN is recursive.  mozilla.org and build.mozilla.org are the only domains assigned for your search path.

Either way I have a few things I can look at that may be being tickled here.
Not sure how familiar you are with Mac OS X, but it uses a more complicated system for determining which DNS server to use for which domains, and scutil --dns gives a list of resolvers to use for which domains.
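As a rough illustration of that selection (a simplified sketch, not Apple's actual algorithm): each resolver in the scutil --dns output that lists a domain handles names under that domain, and anything that matches no domain goes to the default resolver. The pick_resolver helper below is hypothetical, and it assumes the most specific (longest) matching domain wins.

```shell
#!/bin/sh
# pick_resolver NAME DOMAIN...
# Hypothetical helper: prints the longest DOMAIN that is a suffix of NAME
# on a label boundary, or "default" when no DOMAIN matches.
pick_resolver() {
    name=$1
    shift
    best=
    for domain in "$@"; do
        case "$name" in
            "$domain" | *".$domain")
                # Keep the most specific (longest) match seen so far.
                if [ "${#domain}" -gt "${#best}" ]; then
                    best=$domain
                fi
                ;;
        esac
    done
    printf '%s\n' "${best:-default}"
}
```

With only mozilla.org and build.mozilla.org pushed by the VPN, "pick_resolver buildbot-master1.build.mozilla.org mozilla.org build.mozilla.org" prints build.mozilla.org, while the same call for buildbot-master1.build.scl1.mozilla.com prints default, meaning that name would be sent to the non-VPN nameserver.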
I'm familiar with that derivative of BSD, specifically FreeBSD, which I have been using since 1995.  OSX-specific variances like the DNS behavior you described are new to me.  It is also my primary working OS.

I've been VPN'd (admittedly from the office, but that shouldn't matter), and have a while loop that runs nc against the host on tcp/22 and tcp/8010 every 30 seconds, and it hasn't failed yet.

The next time it fails for anyone, can you please include a timestamp (noting offset if not PST)?
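For anyone who wants to run the same kind of check, a minimal sketch of such a loop follows. It is not the exact loop used here: the function names probe and probe_loop and the 5-second connect timeout are assumptions. Each result is logged with a UTC timestamp so no offset needs to be noted.

```shell
#!/bin/sh
# Hypothetical helper: probe one host/port and print a timestamped result.
probe() {
    host=$1
    port=$2
    # -z: connect without sending data; -w 5: give up after 5 seconds.
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
        status=ok
    else
        status=FAIL
    fi
    # Log in UTC so the timestamp is unambiguous.
    printf '%s %s:%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$host" "$port" "$status"
}

# Probe tcp/22 and tcp/8010 every INTERVAL seconds (default 30) until interrupted.
probe_loop() {
    host=$1
    interval=${2:-30}
    while :; do
        probe "$host" 22
        probe "$host" 8010
        sleep "$interval"
    done
}
```

Running "probe_loop buildbot-master1.build.scl1.mozilla.com 30" in a spare terminal and redirecting the output to a file gives a log that can be pasted into the bug when a failure shows up. Note that a DNS failure and a refused connection both show up as FAIL here, so it is still worth retrying by IP when that happens.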

Has anyone examined the host in question to validate there isn't anything going on with it?
(In reply to comment #18)
> Has anyone examined the host in question to validate there isn't anything going
> on with it?

(Sorry for the slow reply here)

Definitely haven't seen any problems with buildbot-master1 or buildbot-master2 -- any issues would be very visible through burning builds.
have not seen a repeat of the issue I was having earlier - but I am also running with a hack/workaround for smtp access to avoid the issue.  otherwise build-vpn has been quietly stable today
What is the overall status here?  Things still unstable or have they worked themselves out?
wfm currently.
akira-sasakis-macbook:wrk asasaki$ nc -vz mv-moz2-linux-ix-slave01.build.mozilla.org 22
nc: getaddrinfo: nodename nor servname provided, or not known

and now not.
(this is a box i'm currently ssh'ed into in a 2nd terminal window, and just sshed there via ip)
Summary: build-vpn flaky for out-of-MV users → Build VPN's DNS periodically unresponsive
I reconfigured the build VPN to push different DNS.  You will need to disconnect and reconnect to the Build VPN to get these changes.

Hopefully this will shake loose any of the issues people have been having.
(In reply to comment #25)
> I reconfigured the build VPN to push different DNS.  You will need to
> disconnect and reconnect to the Build VPN to get these changes.
> 
> Hopefully this will shake loose any of the issues people have been having.

I'm no longer able to send email (cannot find smtp.m.o) while connected to Build-VPN. This used to work for me before yesterday.
works for me (just sent a test msg to john).  I'm using the "Alternate DNS" option in viscosity.  scutil --dns gives:

resolver #1
  domain : r.igoro.us
  search domain[0] : mozilla.org
  search domain[1] : build.mozilla.org
  search domain[2] : r.igoro.us
  search domain[3] : igoro.us
  nameserver[0] : 172.16.1.8
  order   : 200000

resolver #2
  domain : mozilla.org
  nameserver[0] : 10.12.75.10
  nameserver[1] : 10.12.75.12
  order   : 100600

resolver #3
  domain : build.mozilla.org
  nameserver[0] : 10.12.75.10
  nameserver[1] : 10.12.75.12
  order   : 100601

(and the usual 169.254, IPv6, local,etc.)

As I understand it, the first is used as the default nameserver (that's mine - 172.16 is my home network).  The remainder are supplemental and are used for matching domain names.
(In reply to comment #26)
> (In reply to comment #25)
> > I reconfigured the build VPN to push different DNS.  You will need to
> > disconnect and reconnect to the Build VPN to get these changes.
> > 
> > Hopefully this will shake loose any of the issues people have been having.
> 
> I'm no longer able to send email (cannot find smtp.m.o) while connected to
> Build-VPN. This used to work for me before yesterday.

Also, fyi, connecting to smtp.m.o fails consistently for me both outside of Mozilla, and on the Mozilla 650castro wifi.
You're saying when you telnet to smtp.mozilla.org it fails everywhere?  Internally, over VPN, or externally?  Sounds to me like an isolated problem with your laptop; otherwise this would be a significantly more widespread problem.
Previously I was able to stay connected to build-vpn for long periods of time and have it mostly work (DNS would be flaky).
Currently (past couple days) I'm finding my idle ssh connections freeze, and I need to kill them and re-ssh.
(I'm currently in MV, sshing to MV and MPT while connected to build-vpn)
All this weekend I've been forced to switch between dns names and ips to deal with build.mozilla.org hosts while running staging release runs.
So far most of the data points in this bug are less than helpful.

Can we isolate a particular OS that is having these issues?  OSX, Linux, Windows?
What method are you using to VPN?
Are you using a stock configuration or one you modified yourself?
How about routes?
Resolv.conf?
(In reply to comment #33)
> So far most of the data points in this bug are less than helpful.
> 
> Can we isolate a particular OS that is having these issues?  OSX, Linux,
> Windows?

I'm on OSX; that may be true for everyone having issues.

> What method are you using to VPN?

Viscosity 1.2.1, connected to Build-VPN only.

> Are you using a stock configuration or one you modified yourself?

Afaik, there is no stock configuration for Build-VPN. We all modified the MPT vpn file to work against Build-VPN.

> How about routes?

I have no custom routes afaik.

> Resolv.conf?

If you're asking if I manually change resolv.conf, I haven't.
Attachment #485909 - Attachment description: build.sjc1 Viscosity configugration → build.sjc1 Viscosity configuration
I attached a new Viscosity configuration which you should extract and install.  You will then need to set in Viscosity global preferences:

  Preferences -> Advanced -> Network Settings -> [x] Use alternate DNS support

There is also a behavior of OSX[1] which may be negatively impacting things.  You may have success without running the three commands below, but if you want to go down that path, the work-around is:

  sudo /usr/libexec/PlistBuddy -c "Add :StrictUnicastOrdering bool true" /System/Library/LaunchDaemons/com.apple.mDNSResponder.plist
  sudo launchctl unload /System/Library/LaunchDaemons/com.apple.mDNSResponder.plist
  sudo launchctl load /System/Library/LaunchDaemons/com.apple.mDNSResponder.plist

Additional discussion about this feature/limitation can be found at superuser.com[2].

I tested with success:

  svn.mozilla.org (63.245.208.18[35])
  mail.mozilla.com (63.245.208.167)
  mail.mozilla.org (63.245.208.162)
  munin.ops.sjc1.mozilla.com (10.2.10.168)
  bounceradmin.private.sjc1.mozilla.com (10.2.75.6)

and now more recently with

  graphs.mozilla.org (63.245.208.194).

Additionally access is open to both the corp and pub networks as referenced in an email to release@

[1] http://support.apple.com/kb/HT4030
[2] http://superuser.com/questions/84144/how-is-dns-used-by-individual-processes
Thanks for the VPN config. To be clear, it covers all of the build network (sjc, scl, and castro), not just sjc?
Affirmative.
Comment on attachment 485909 [details]
build.sjc1 Viscosity configuration

It seems that I can't resolve anything on the build vpn anymore and the updated config is viscosity specific.  Do you have one that would work with the command line client?
No I don't.  I run Viscosity and as far as I know that is what Ops/IT supports.  Upon connect /etc/resolv.conf gets modified such that the only active config is:

  nameserver 10.12.75.10
  nameserver 10.12.75.12
  search build.scl1.mozilla.com build.mozilla.org sjc1.mozilla.com mozilla.org  mozilla.com mozilla.org build.mozilla.org

I also did the plist modifications as noted in Comment #37.

I'm currently connected to the VPN with Viscosity and am able to reach everything I noted in Comment #37.

Can you provide any details about your session to help with troubleshooting?

  /etc/resolv.conf
  netstat -nr
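To make that kind of report easier to gather in one go, here is a minimal sketch of a collection helper. The name collect_diag and the section headers are made up for illustration; it only runs commands already mentioned in this bug (netstat -nr, scutil --dns) plus a UTC timestamp.

```shell
#!/bin/sh
# Hypothetical helper: dump the resolver and routing state of the current session.
collect_diag() {
    echo "== date (UTC) =="
    date -u
    echo "== /etc/resolv.conf =="
    cat /etc/resolv.conf 2>/dev/null || echo "(not readable)"
    echo "== netstat -nr =="
    netstat -nr 2>/dev/null || echo "(netstat unavailable)"
    # scutil only exists on OS X; skip it quietly elsewhere.
    if command -v scutil >/dev/null 2>&1; then
        echo "== scutil --dns =="
        scutil --dns
    fi
}
```

Running "collect_diag > vpn-diag.txt" while connected (and again right after a failure) produces a file that can be attached here.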
(In reply to comment #41)
> No I don't.  I run Viscosity and as far as I know that is what Ops/IT supports.

What about for linux and windows?
(In reply to comment #42)
> (In reply to comment #41)
> > No I don't.  I run Viscosity and as far as I know that is what Ops/IT supports.
> 
> What about for linux and windows?

Viscosity is basically a GUI for OpenVPN, so its configs can be used in Linux as-is.

Windows-based VPNs normally have the same config items that are listed in the OpenVPN config file; you may just need to enter them manually.
(In reply to comment #43)
> (In reply to comment #42)
> > (In reply to comment #41)
> > > No I don't.  I run Viscosity and as far as I know that is what Ops/IT supports.
> > 
> > What about for linux and windows?
> 
> Viscosity is basically a GUI for OpenVPN, so its configs can be used in Linux
> as-is

That's true for Tunnelblick, but Viscosity does mangle the config files.  It is trivial to import a .ovpn file into Viscosity, but I don't know about the other way.

> Windows-based VPNs normally have the same config items that are listed in the
> OpenVPN config file, you may just need to manually enter them.

the windows gui has a directory that you drop keys, certs and ovpn files into and the client understands those files and makes them available.

I tried with this .conf file on the windows client and it didn't work at all.
I attached a tgz that contains all the certificates, configs, and scripts less the package resolvconf[1] that I used to get my Ubuntu install to work with OpenVPN.

Fetch, extract, move build.ovpn/update-resolv-conf to /etc/openvpn/ if it is not there already and install resolvconf.

Invoke OpenVPN from the build.ovpn dir with

  openvpn --config vpn1.build.pub.sjc1.mozilla.com.conf

Once I get a Windows instance going I will submit a similar config for it.

I do not think it is reasonable to expect Linux distributions that can't get resolvconf installed to be supported.

[1] http://en.wikipedia.org/wiki/Resolvconf
I tried extracting that zip into my configs directory and it didn't pick up the configuration.  for MPT and MV, the configuration tarballs [1][2] include the .ovpn as the actual configuration file.

I was able to get this working by doing:

tar zxf build.ovpn.tgz
cd build.ovpn
mv vpn1.build.pub.sjc1.mozilla.com.conf build.ovpn
mv * /c/Program\ Files\ \(x86\)/OpenVPN/config/

I commented out the up and down lines in order for this to work on Windows, where there is no resolvconf (or need for it).
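For reference, that edit can also be scripted. This is only a sketch: it assumes the config uses plain "up <script>" and "down <script>" directives (I have not reproduced the attached file here), and comment_out_hooks is a made-up name. It prints the modified config to stdout and leaves the original file alone.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with OpenVPN "up" and "down" directives
# commented out. Directives such as "down-pre" are left untouched because
# the pattern requires whitespace right after "up" or "down".
comment_out_hooks() {
    sed -E 's/^([[:space:]]*)(up|down)([[:space:]])/\1# \2\3/' "$1"
}
```

For example, "comment_out_hooks build.ovpn > build-windows.ovpn" writes a copy with the hooks disabled; build-windows.ovpn is just an example name.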

Once the modified .ovpn file was in my configs directory, I was able to connect to VPN.  Thanks for the resolvconf tip, that'll be great on my linux machine :)


[1] https://intranet.mozilla.org/JumpHost#Common_Files
[2] https://intranet.mozilla.org/IT_MPT-RemoteAccess#Common_Files
(In reply to comment #47)
> I tried extracting that zip into my configs directory and it didn't pick up the
> configuration.  for MPT and MV, the configuration tarballs [1][2] include the
> .ovpn as the actual configuration file.

sorry, i forgot to specify that this is for windows.
The custom DNS for build *VPN* has been in place for about 3 days now and there have been no reported issues.  Additionally I've had a test laptop at the office connected for 28 hours with no issues.

With the custom configurations I've provided it appears MacOS, Linux (Ubuntu), and Windows are able to connect without issues.

I'm calling this done.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
nthomas and bear and I can confirm that attachment 485909 [details] (the *first* viscosity config on this bug) seems to work fine - it includes config for mozilla.com as well as mozilla.org.  I'm not sure how the second attached config is different, but if it ain't broke, don't fix it.
I'm a dork - it sez right there, "OpenVPN connection".  So this is all good.
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard