Closed
Bug 992940
Opened 11 years ago
Closed 10 years ago
403 errors from ship it dev/stage
Categories
(Infrastructure & Operations :: Corporate VPN: Support requests, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: jabba)
Details
I'm getting 403 errors from both https://ship-it-dev.allizom.org/ and https://ship-it.allizom.org/. I never get prompted for authentication, so that might be why.
I also noticed that requests to ship-it-dev are going to backend servers with stage in their name. Eg:
➜ release-kickoff git:(master) curl -IL https://ship-it-dev.allizom.org
HTTP/1.1 403 Forbidden
Server: Apache
X-Backend-Server: generic1.stage.webapp.phx1.mozilla.com
Vary: Accept-Encoding
Content-Type: text/html; charset=iso-8859-1
Strict-Transport-Security: max-age=15768000 ; includeSubDomains
Date: Mon, 07 Apr 2014 14:46:34 GMT
Transfer-Encoding: chunked
Connection: Keep-Alive
X-Cache-Info: caching
Not a blocker, but it hurts our ability to sanity check things before pushing to prod.
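A comparison like the one above (status code plus backend name) can be scripted. The following is a minimal sketch, not an existing tool: the helper names `summarize_headers` and `backend_environment` are hypothetical, it only reads the first response in the pasted `curl -I` output, and it assumes the environment is the second dot-separated label of the `X-Backend-Server` value, as in the hostnames seen here.

```python
def summarize_headers(raw: str) -> dict[str, str]:
    """Pull the status code and backend name out of pasted `curl -I` output.

    Assumes `raw` is non-empty and starts with a status line such as
    "HTTP/1.1 403 Forbidden".
    """
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    summary = {"status": lines[0].split()[1]}
    for ln in lines[1:]:
        name, sep, value = ln.partition(":")
        # Header names are case-insensitive, so compare in lowercase.
        if sep and name.strip().lower() == "x-backend-server":
            summary["backend"] = value.strip()
    return summary


def backend_environment(backend: str) -> str:
    """Guess the environment from a name like generic1.stage.webapp... (assumed naming)."""
    return backend.split(".")[1]
```

Fed the response above, `summarize_headers` reports status `403` and `backend_environment` reports `stage`, which is the mismatch being described for a request to ship-it-dev.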
Comment 1•11 years ago
o_O i am prompted for auth at both locations.
$ curl -I https://ship-it-dev.allizom.org/
HTTP/1.1 401 Authorization Required
Server: Apache
X-Backend-Server: generic1.dev.webapp.phx1.mozilla.com
WWW-Authenticate: Basic realm="ship-it LDAP Login"
Content-Type: text/html; charset=iso-8859-1
Strict-Transport-Security: max-age=15768000 ; includeSubDomains
Date: Mon, 07 Apr 2014 16:32:17 GMT
Transfer-Encoding: chunked
Connection: Keep-Alive
$ curl -I https://ship-it.allizom.org/
HTTP/1.1 401 Authorization Required
Server: Apache
X-Backend-Server: generic1.stage.webapp.phx1.mozilla.com
WWW-Authenticate: Basic realm="ship-it LDAP Login"
Content-Type: text/html; charset=iso-8859-1
Strict-Transport-Security: max-age=15768000 ; includeSubDomains
Date: Mon, 07 Apr 2014 16:32:12 GMT
Transfer-Encoding: chunked
Connection: Keep-Alive
Reporter
Comment 2•11 years ago
I wonder if this started when we first switched to the Mozilla VPN... it's been quite a while since I've tried to access one of these. I do get prompted if I try the same thing on the old jumphost, for what it's worth:
[bhearsum@vpn1.dmz.releng.scl3 ~]$ curl -I https://ship-it-dev.allizom.org/
HTTP/1.1 401 Authorization Required
Server: Apache
X-Backend-Server: generic1.dev.webapp.phx1.mozilla.com
WWW-Authenticate: Basic realm="ship-it LDAP Login"
Content-Type: text/html; charset=iso-8859-1
Strict-Transport-Security: max-age=15768000 ; includeSubDomains
Date: Mon, 07 Apr 2014 16:36:10 GMT
Transfer-Encoding: chunked
Connection: Keep-Alive
Comment 3•11 years ago
Taking down the priority and will try and find someone to work on this as it's paging on call
Severity: major → normal
Reporter
Comment 4•11 years ago
(In reply to Ryan Watson [:w0ts0n] from comment #3)
> Taking down the priority and will try and find someone to work on this as
> it's paging on call
Any luck finding someone to have a look?
Comment 7•11 years ago
Ben,
Can you tell me the locations (VPNs) where you are having trouble? It looks like it works from some locations and not others. I suspect that one of the numerous new VPN IP spaces has not made it into the relevant Apache allow sections. If you can get me this info I will try to hunt down the IP block that is missing.
Also, can you try to connect to dev again from the broken location? I might be able to catch something in the logs that way.
Thanks
Reporter
Comment 8•11 years ago
(In reply to Jason Crowe [:jd] from comment #7)
> Ben,
>
> Can you tell me the locations (VPNs) where you are having trouble? It looks
> like it works from some locations and not others. I suspect that one of the
> numerous new VPN IP spaces has not made it into the relevant Apache allow
> sections. If you can get me this info I will try to hunt down the IP block
> that is missing.
Does "Mozilla VPN" help?
> Also, can you try to connect to dev again from the broken location? I might
> be able to catch something in the logs that way.
Just did that now. My tunnel IP is 10.22.248.78, and the other end is 10.22.248.77 - not sure if that helps though.
Comment 9•11 years ago
Okay, that last update was incorrect. Here is what is going on. There are no externally published DNS records for these two domains, as they are intended to be accessed only while on a VPN. The issue you noted in comment 1 is due to the fall-through record for allizom.org:
*.allizom.org A 63.245.217.83
So basically any allizom.org lookup that does not exist gets that IP. This accounts for you getting the stage web cluster, as that is the IP for the generic stage VIP.
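To tell which DNS view a client is getting, the resolved address can be compared against the wildcard IP. This is a rough sketch under assumptions: `classify_resolution` is a hypothetical helper, and it treats any RFC 1918 private address as the internal (VPN) view, which matches the 10.x addresses given later in comment 14 but is otherwise a guess about the internal layout.

```python
import ipaddress

# The wildcard A record for *.allizom.org quoted above.
WILDCARD_IP = ipaddress.ip_address("63.245.217.83")


def classify_resolution(resolved_ip: str) -> str:
    """Label an already-resolved address as wildcard, internal, or other."""
    addr = ipaddress.ip_address(resolved_ip)
    if addr == WILDCARD_IP:
        return "wildcard"  # public fall-through record, i.e. the stage VIP
    if addr.is_private:
        return "internal"  # assumed to mean the VPN-side DNS view answered
    return "other"
```

The input could come from something like `socket.gethostbyname("ship-it-dev.allizom.org")` run on the affected client; a result of `wildcard` would point at the public DNS view being used.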
Next, from the old jumphosts and from the Mozilla VPN (for myself and cturra) the connection works (we all get the correct DNS records and presumably routes). This means that wherever you are connecting from you are not getting the correct DNS and/or routes.
My suspicion is that your user does not have the correct LDAP bits set for the Mozilla VPN. I will kick this bug over to the folks who know the dark magic that makes the new VPNs work and hopefully they can get things sorted.
For the VPN peeps: there are three LDAP groups listed in the Apache vhost configs. One called releng that has access to all 3 environments, one called shipit that has access to stage and prod, and one called shipitdev that has access to the dev site. Not sure if any of this is relevant. Good luck :)
Assignee: server-ops-webops → vpn-support
Component: WebOps: Product Delivery → Mozilla VPN: Support requests
QA Contact: nmaul → jdow
Reporter
Comment 10•10 years ago
Can we please get this looked at soon? Without a dev or stage, it's much easier to bust production, which gets in the way of shipping releases.
Severity: normal → major
Assignee
Comment 12•10 years ago
I haven't unraveled this too far yet, but I did just add bhearsum to vpn_shipit vpn group. :bhearsum, can you disconnect/reconnect the mozilla vpn and give it a go, so I can get a data point?
Reporter
Comment 13•10 years ago
Sorry - I didn't mean for anyone to get paged. By soon, I meant on the scale of days.
I still get 403s from https://ship-it-dev.allizom.org/ and https://ship-it.allizom.org/. Dev didn't prompt me for credentials, but stage did.
Assignee
Comment 14•10 years ago
Ok, so I started down the path to unravel this. It looks like this host is hosted on a shared cluster that is normally fronted by the publicly accessible Zeus cluster. However, this vhost is intended to be reachable only via VPN, so there is an explicit deny entry for the public Zeus cluster's internal IPs, meaning that it will only allow connections coming in via the internal VPN connection. This is the case for prod and stage, but dev has that deny commented out. Therefore I'd guess that via the public IP that :jd mentions in comment 9, you could access dev, but not stage or prod. So that part makes sense. The fact that you are not able to access stage or prod tells me that you are likely resolving the public IP for those, at least for stage, which you resolve via the wildcard DNS record. There is no public DNS record for prod (ship-it.mozilla.org), and looking through this bug's comments, it looks like prod is not in question at all.
So, given all that info, I deduce that you are receiving a public view of DNS instead of a private view when on the VPN, and potentially not across the board. It sounds like you might be using a flavor of Ubuntu's NetworkManager to create the VPN connection. If you tail your syslog when connecting, I believe you'll find some mention of the local dnsmasq instance "assigning" specific domains to specific nameservers, probably based on the pushed search domains. I don't believe we push mozilla.org, only mozilla.com and maybe some other subdomains. That means the NetworkManager/dnsmasq integration is *not* assigning mozilla.org/allizom.org to the VPN's nameservers, but rather to your standard ones, provided by your router or ISP.
So, all that is to say: if you are using Ubuntu, I can see a clear reason why this specific thing is failing in this specific way. My recommended fix is to add /etc/hosts entries for the following three hosts:
ship-it-dev.allizom.org has address 10.8.81.226
ship-it.allizom.org has address 10.8.81.227
ship-it.mozilla.org has address 10.8.81.228
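The three lines above are in `host` output format, whereas /etc/hosts expects the address first, followed by the name. A small sketch of the conversion, assuming only the "NAME has address IP" line shape shown here (`to_hosts_lines` is a hypothetical helper, not part of any tool mentioned in this bug):

```python
HOST_OUTPUT = """\
ship-it-dev.allizom.org has address 10.8.81.226
ship-it.allizom.org has address 10.8.81.227
ship-it.mozilla.org has address 10.8.81.228
"""


def to_hosts_lines(host_output: str) -> list[str]:
    """Turn `host`-style A-record lines into /etc/hosts lines (address, tab, name)."""
    lines = []
    for line in host_output.splitlines():
        name, sep, address = line.partition(" has address ")
        if sep:  # skip anything that is not an A-record line
            lines.append(f"{address.strip()}\t{name.strip()}")
    return lines
```

Printing `"\n".join(to_hosts_lines(HOST_OUTPUT))` gives three lines that could be appended to /etc/hosts on the client.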
This particular vhost is somewhat unique in how it is set up with a private-only address on a public-only domain, and this is how it seems to manifest itself. OSX clients wouldn't have this issue since they would universally set the VPN's nameservers for all lookups, and Ubuntu is the only one I know of that does the "smart" dnsmasq resolving (maybe other distros do too - not sure).
I believe if this were to be fixed properly, we would need to move the ship-it vhosts to actually live on 63.245.x.x IP space *and* set specific ACLs in Zeus or a firewall to restrict to VPN hosts *and* push a specific route for these hosts through the VPN. That would likely mean that the *.allizom.org domains need to be on their own IPs, separate from any shared VIPs, if they are currently on shared VIPs. That all sounds like a bigger project, but I do recommend it at some point, as this current setup is definitely something I would consider technical debt that needs to be paid down. The quick/easy fix right now is for :bhearsum to set /etc/hosts entries on his client.
Reporter
Comment 15•10 years ago
Jabba, thank you so much for digging into this, I truly appreciate it.
I don't think I've ever had the pushed domains work correctly with NetworkManager. I looked just now, and I have build.mozilla.org, mozilla.org, and mozilla.com listed as search domains for my VPN connection. I added allizom.org to that list, reconnected, and now both of these environments are working correctly. I think we can mark this as fixed, given that?
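This result fits the suffix-based nameserver selection described in comment 14. The sketch below is a simplified model of that behaviour, not dnsmasq's or NetworkManager's actual implementation; `pick_resolver` and the "vpn"/"default" labels are illustrative names only.

```python
def pick_resolver(hostname: str, vpn_domains: set[str]) -> str:
    """Return "vpn" if the name falls under a VPN-routed domain, else "default"."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check the name itself and then each parent domain, longest first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in vpn_domains:
            return "vpn"
    return "default"


# Search domains listed for the VPN connection before the change.
before = {"build.mozilla.org", "mozilla.org", "mozilla.com"}
# The same list with allizom.org added.
after = before | {"allizom.org"}
```

With `before`, `pick_resolver("ship-it-dev.allizom.org", before)` returns `"default"`, so the lookup would go to the router or ISP resolver and hit the wildcard record; with `after` it returns `"vpn"`.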
As for the long term, who knows! We've been talking a little bit about RelEng self hosting small apps like this in AWS, but we're still not sure what we want to do going forward. Things will all be changing when the datacentres go away anyways, I think?
Assignee
Comment 16•10 years ago
Yeah, things are always changing, so I don't think we need to prioritize this, but we should learn from this that we shouldn't mix private/public namespace and IP space for this reason. I'm glad that adding the allizom.org domain to the search domains fixed it.
Assignee: vpn-support → jdow
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED