Closed Bug 992940 Opened 11 years ago Closed 10 years ago

403 errors from ship it dev/stage

Categories

(Infrastructure & Operations :: Corporate VPN: Support requests, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: jabba)

Details

I'm getting 403 errors from both https://ship-it-dev.allizom.org/ and https://ship-it.allizom.org/. I never get prompted for authentication, so that might be why. I also noticed that requests to ship-it-dev are going to backend servers with "stage" in their name. E.g.:
➜ release-kickoff git:(master) curl -IL https://ship-it-dev.allizom.org
HTTP/1.1 403 Forbidden
Server: Apache
X-Backend-Server: generic1.stage.webapp.phx1.mozilla.com
Vary: Accept-Encoding
Content-Type: text/html; charset=iso-8859-1
Strict-Transport-Security: max-age=15768000 ; includeSubDomains
Date: Mon, 07 Apr 2014 14:46:34 GMT
Transfer-Encoding: chunked
Connection: Keep-Alive
X-Cache-Info: caching
Not a blocker, but it hurts our ability to sanity check things before pushing to prod.
o_O i am prompted for auth at both locations.
$ curl -I https://ship-it-dev.allizom.org/
HTTP/1.1 401 Authorization Required
Server: Apache
X-Backend-Server: generic1.dev.webapp.phx1.mozilla.com
WWW-Authenticate: Basic realm="ship-it LDAP Login"
Content-Type: text/html; charset=iso-8859-1
Strict-Transport-Security: max-age=15768000 ; includeSubDomains
Date: Mon, 07 Apr 2014 16:32:17 GMT
Transfer-Encoding: chunked
Connection: Keep-Alive
$ curl -I https://ship-it.allizom.org/
HTTP/1.1 401 Authorization Required
Server: Apache
X-Backend-Server: generic1.stage.webapp.phx1.mozilla.com
WWW-Authenticate: Basic realm="ship-it LDAP Login"
Content-Type: text/html; charset=iso-8859-1
Strict-Transport-Security: max-age=15768000 ; includeSubDomains
Date: Mon, 07 Apr 2014 16:32:12 GMT
Transfer-Encoding: chunked
Connection: Keep-Alive
I wonder if this started when we first switched to the Mozilla VPN...it's been quite a while since I've tried to access one of these. I do get prompted if I try the same thing on the old jumphost, for what it's worth:
[bhearsum@vpn1.dmz.releng.scl3 ~]$ curl -I https://ship-it-dev.allizom.org/
HTTP/1.1 401 Authorization Required
Server: Apache
X-Backend-Server: generic1.dev.webapp.phx1.mozilla.com
WWW-Authenticate: Basic realm="ship-it LDAP Login"
Content-Type: text/html; charset=iso-8859-1
Strict-Transport-Security: max-age=15768000 ; includeSubDomains
Date: Mon, 07 Apr 2014 16:36:10 GMT
Transfer-Encoding: chunked
Connection: Keep-Alive
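The failing and working responses above differ in three places: the status code, the X-Backend-Server value, and whether a WWW-Authenticate header is present. A minimal Python sketch for pulling those three fields out of saved `curl -I` output follows. The function name `summarize_response` and the trimmed sample strings are illustrative assumptions, not something used in this bug.

```python
def summarize_response(raw: str) -> dict:
    """Extract the status code and two telltale headers from `curl -I` output.

    Assumes `raw` holds a single response block (status line, then headers).
    """
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    status_code = int(lines[0].split()[1])  # "HTTP/1.1 403 Forbidden" -> 403
    headers = {}
    for ln in lines[1:]:
        # Split on the first colon only, so values containing ":" stay intact.
        name, _, value = ln.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "status": status_code,
        "backend": headers.get("x-backend-server", ""),
        "auth_prompt": "www-authenticate" in headers,
    }


# Trimmed copies of the responses quoted in this bug.
BROKEN = """HTTP/1.1 403 Forbidden
Server: Apache
X-Backend-Server: generic1.stage.webapp.phx1.mozilla.com
Date: Mon, 07 Apr 2014 14:46:34 GMT
"""

WORKING = """HTTP/1.1 401 Authorization Required
Server: Apache
X-Backend-Server: generic1.dev.webapp.phx1.mozilla.com
WWW-Authenticate: Basic realm="ship-it LDAP Login"
Date: Mon, 07 Apr 2014 16:32:17 GMT
"""
```

Running `summarize_response` on both samples makes the mismatch easy to see: the broken request for ship-it-dev lands on a stage backend and carries no auth challenge.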
Taking down the priority and will try and find someone to work on this as it's paging on call
Severity: major → normal
(In reply to Ryan Watson [:w0ts0n] from comment #3)
> Taking down the priority and will try and find someone to work on this as
> it's paging on call
Any luck finding someone to have a look?
I believe :fubar was looking at this.
Flags: needinfo?(klibby)
Negative, sorry.
Flags: needinfo?(klibby)
Ben, can you tell me the locations (VPNs) where you are having trouble? It looks like it works from some locations and not others. I suspect that one of the numerous new VPN IP spaces has not made it into the relevant Apache allow sections. If you can get me this info I will try to hunt down the IP block that is missing. Also, can you try to connect to dev again from the broken location? I might be able to catch something in the logs that way. Thanks
(In reply to Jason Crowe [:jd] from comment #7)
> Ben,
>
> Can you tell me the locations (VPNs) where you are having trouble? It looks
> like it works from some locations and not others. I suspect that one of the
> numerous new VPN IP spaces has not made it into the relevant Apache allow
> sections. If you can get me this info I will try to hunt down the IP block
> that is missing.
Does "Mozilla VPN" help?
> Also, can you try to connect to dev again from the broken location? I might
> be able to catch something in the logs that way.
Just did that now. My tunnel IP is 10.22.248.78, and the other end is 10.22.248.77 - not sure if that helps though.
Okay, that last update was incorrect. Here is what is going on. There are no externally published DNS records for these two domains, as they are intended to be accessed only while on a VPN. The issue you noted in comment 1 is due to the fall-through record for allizom.org:
*.allizom.org A 63.245.217.83
So basically any allizom.org lookup that does not exist gets that IP. This accounts for you getting the stage web cluster, as that is the IP for the generic stage VIP.
Next, from the old jumphosts and from the Mozilla VPN (for myself and cturra) the connection works (we all get the correct DNS records and presumably routes). This means that wherever you are connecting from, you are not getting the correct DNS and/or routes. My suspicion is that your user does not have the correct LDAP bits set for the Mozilla VPN. I will kick this bug over to the folks who know the dark magic that makes the new VPNs work and hopefully they can get things sorted.
For the VPN peeps: there are three LDAP groups listed in the Apache vhost configs. One called releng that has access to all 3 environments, one called shipit that has access to stage and prod, and one called shipitdev that has access to the dev site. Not sure if any of this is relevant. Good luck :)
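To illustrate the fall-through behaviour described above, here is a small Python sketch of a lookup against a public and an internal view of the zone. The zone dictionaries and the `lookup` helper are simplified assumptions for illustration (real wildcard semantics are defined in RFC 4592 and are more subtle); the addresses are the ones quoted in this bug.

```python
# Simplified, hypothetical views of the zone. The public view only has the
# wildcard; the internal view has explicit records for the ship-it hosts.
PUBLIC_VIEW = {"*.allizom.org": "63.245.217.83"}
INTERNAL_VIEW = {
    "ship-it-dev.allizom.org": "10.8.81.226",
    "ship-it.allizom.org": "10.8.81.227",
}


def lookup(zone, name):
    """Return the address for `name`, falling back to a wildcard record.

    An exact match wins. Otherwise leading labels are replaced with "*"
    one at a time until a wildcard record matches or nothing is left.
    """
    if name in zone:
        return zone[name]
    labels = name.split(".")
    for i in range(1, len(labels)):
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in zone:
            return zone[wildcard]
    return None


# A client seeing the public view gets the generic stage VIP for both names.
print(lookup(PUBLIC_VIEW, "ship-it-dev.allizom.org"))
# A client seeing the internal view gets the private address.
print(lookup(INTERNAL_VIEW, "ship-it-dev.allizom.org"))
```

This is why a request for ship-it-dev can end up on a backend with "stage" in its name: the name never resolved to the dev host in the first place.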
Assignee: server-ops-webops → vpn-support
Component: WebOps: Product Delivery → Mozilla VPN: Support requests
QA Contact: nmaul → jdow
Can we please get this looked at soon? Without a dev or stage, it's much easier to bust production, which gets in the way of shipping releases.
Severity: normal → major
jabba has been paged
Severity: major → normal
I haven't unraveled this too far yet, but I did just add bhearsum to vpn_shipit vpn group. :bhearsum, can you disconnect/reconnect the mozilla vpn and give it a go, so I can get a data point?
Sorry - I didn't mean for anyone to get paged. By soon, I meant on the scale of days. I still get 403s from https://ship-it-dev.allizom.org/ and https://ship-it.allizom.org/. Dev didn't prompt me for credentials, but stage did.
Ok, so I started down the path to unravel this. It looks like this host is hosted on a shared cluster that is normally fronted by the publicly accessible Zeus cluster. However, it is intended that this vhost only be reachable via VPN, so there is an explicit deny entry for the public Zeus cluster's internal IPs, meaning that it will only allow connections coming in via the internal VPN connection. This is the case for prod and stage, but dev has that deny commented out. Therefore I'd guess that via the public IP that :jd mentions in comment 9, you could access dev, but not stage or prod. So that part makes sense. The fact that you are not able to access stage or prod tells me that you are likely resolving the public IP for those - at least for stage, which you resolve via the wildcard DNS record. There is no public DNS record for prod (ship-it.mozilla.org), but looking through this bug's comments, it looks like prod is not in question at all.
So, given all that info, I deduce that you are receiving a public view of DNS instead of a private view when on the VPN, and potentially not across the board. It sounds like you might be using a flavor of Ubuntu's NetworkManager to create the VPN connection. If you tail your syslog when connecting, I believe you'll find some mention of the local dnsmasq instance "assigning" specific domains to specific nameservers - probably based on the pushed search domains, of which I don't believe we push mozilla.org - only mozilla.com and maybe some other subdomains. That means the NetworkManager/dnsmasq integration is *not* assigning mozilla.org/allizom.org to the VPN's nameservers, but rather to your standard ones, provided by your router or ISP.
So, all that is to say that if you are using Ubuntu, I can see a clear reason why this specific thing is failing in this specific way. My recommended fix is to add /etc/hosts entries for the following three hosts:
ship-it-dev.allizom.org has address 10.8.81.226
ship-it.allizom.org has address 10.8.81.227
ship-it.mozilla.org has address 10.8.81.228
This particular vhost is somewhat unique in how it is set up, with a private-only address on a public-only domain, and this is how it seems to manifest itself. OS X clients wouldn't have this issue since they would universally set the VPN's nameservers for all lookups, and Ubuntu is the only one I know of that does the "smart" dnsmasq resolving (maybe other distros do too - not sure).
I believe if this were to be fixed properly, we would need to move the ship-it vhosts to actually live on 63.245.x.x IP space *and* set specific ACLs in Zeus or a firewall to restrict to VPN hosts *and* push a specific route for these hosts through the VPN. That would likely mean that the *allizom.org domains need to be on their own IPs, separate from any shared VIPs, if they are currently on shared VIPs. That all sounds like a bigger project, but I do recommend it at some point, as this current setup is definitely something I would consider technical debt that needs to be paid down. The quick/easy fix right now is for :bhearsum to set /etc/hosts entries on his client.
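The addresses above are in the "has address" form printed by the `host` command, whereas /etc/hosts expects the address first and the name second. A minimal Python sketch of that conversion follows; the helper name `to_hosts_lines` is a hypothetical choice, and the sketch only builds the lines without writing to /etc/hosts.

```python
import ipaddress

# The three records quoted in this comment, one per line.
HOST_OUTPUT = """\
ship-it-dev.allizom.org has address 10.8.81.226
ship-it.allizom.org has address 10.8.81.227
ship-it.mozilla.org has address 10.8.81.228
"""


def to_hosts_lines(host_output):
    """Convert 'NAME has address IP' lines into '/etc/hosts'-style lines."""
    entries = []
    for line in host_output.splitlines():
        name, sep, addr = line.strip().partition(" has address ")
        if not sep:
            continue  # skip anything that is not an address record
        ipaddress.ip_address(addr)  # raises ValueError if the address is malformed
        entries.append(f"{addr}\t{name}")
    return entries


for entry in to_hosts_lines(HOST_OUTPUT):
    print(entry)
```

All three addresses are in private (RFC 1918) space, which fits the description of a private-only address on a public-only domain.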
Jabba, thank you so much for digging into this, I truly appreciate it. I don't think I've ever had the pushed domains work correctly with NetworkManager. I looked just now, and I have build.mozilla.org, mozilla.org, and mozilla.com listed as search domains for my VPN connection. I added allizom.org to that list, reconnected, and now both of these environments are working correctly. I think we can mark this as fixed, given that? As for the long term, who knows! We've been talking a little bit about RelEng self hosting small apps like this in AWS, but we're still not sure what we want to do going forward. Things will all be changing when the datacentres go away anyways, I think?
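As a rough model of why adding allizom.org to the search domains helped, the Python sketch below picks a resolver by domain suffix, which is approximately what the NetworkManager/dnsmasq integration described earlier does. The function `pick_nameserver` and the resolver labels are illustrative assumptions; real dnsmasq matching has more rules than this.

```python
def pick_nameserver(hostname, vpn_domains, vpn_ns="vpn-resolver", default_ns="isp-resolver"):
    """Send a lookup to the VPN resolver only if the name falls under a VPN domain.

    The resolver names are placeholder labels, not real servers.
    """
    host = hostname.rstrip(".").lower()
    for domain in vpn_domains:
        d = domain.strip(".").lower()
        # Match the domain itself or any name beneath it.
        if host == d or host.endswith("." + d):
            return vpn_ns
    return default_ns


# Search domains listed on the VPN connection before and after the change.
BEFORE = ["build.mozilla.org", "mozilla.org", "mozilla.com"]
AFTER = BEFORE + ["allizom.org"]

print(pick_nameserver("ship-it-dev.allizom.org", BEFORE))
print(pick_nameserver("ship-it-dev.allizom.org", AFTER))
```

With the original list, allizom.org names fall through to the default resolver and hit the public wildcard; with allizom.org added, they go to the VPN's nameservers.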
Yeah, things are always changing, so I don't think we need to prioritize this, but we should learn from this that we shouldn't mix private/public namespace and IP space for this reason. I'm glad that adding the allizom.org domain to the search domains fixed it.
Assignee: vpn-support → jdow
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED