Closed
Bug 604688
Opened 14 years ago
Closed 11 years ago
(tracker) move services on dm-wwwbuild01 to *.{pub,pvt}.build.mozilla.org virtualhosts
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mozilla, Assigned: dustin)
Details
(Whiteboard: [vm][networking])
The fact that build.mozilla.org is both a machine hostname and our primary subdomain is a constant source of communication confusion.
It's currently named dm-wwwbuild01? internally.
To change this, I think we need to change tbpl, Try, cruncher, and a number of developer-facing URLs (pending.html etc.).
This won't be low-touch and will require coordination with other teams, but will hopefully remove a self-inflicted cause of confusion.
Reporter | ||
Comment 1•14 years ago
|
||
We could also accomplish this by setting up a second box, making it duplicate in functionality (or better!), pointing everything to that box, and then shutting off and repurposing the current build.mozilla.org box.
Reporter | ||
Comment 2•14 years ago
|
||
I really don't know what status whiteboard entry to use here.
Comment 3•14 years ago
|
||
(In reply to comment #1)
> We could also accomplish this by setting up a second box, making it duplicate
> in functionality (or better!), pointing everything to that box, and then
> shutting off and repurposing the current build.mozilla.org box.
Could we CNAME build.mozilla.org to something instead of starting over?
Reporter | ||
Comment 4•14 years ago
|
||
The whole point is to get rid of the name build.mozilla.org in DNS entirely, so we know that whenever the words "build dot mozilla dot org" pass our lips, we're referring to the subdomain.
Comment 5•14 years ago
|
||
We could keep it live in the interim and switch tools to the new CNAME.
Updated•14 years ago
|
Whiteboard: [vm][networking]
Assignee | ||
Comment 6•14 years ago
|
||
So there's been some discussion of names in IRC, but one thing we didn't cover is the DNS questions. Copying zandr to chase these down:
1. Can the DNS handle a public hostname in the *.build.mozilla.org subdomain, e.g., api.build.mozilla.org or clobberer.build.mozilla.org?
2. How does IT feel, in general, about using CNAME and vhosts to dissociate services from particular machines?
In the interests of someone taking a stand, and assuming the answers to the above are roughly "no" and "good", I'm proposing:
releng-ws.mozilla.org - menu, contact info, links
clobberer.releng-ws.mozilla.org - clobberer
status.releng-ws.mozilla.org - status dashboard
api.releng-ws.mozilla.org - buildapi/self-serve
where all of these are CNAMEs for the same machine, and that machine is publicly available?
The advantage to doing this rename at the same time as the beefing-up (bug 617129) is that we can do this slowly, even hosting both the new and old names on the existing VM for a while. And we get to keep the VM and hostname around until bug 618109 (regarding internal services) is also finished.
There are infrasec concerns here, too. In general, I think we could use some guidance from zandr when mental cycles are available.
Assignee | ||
Comment 7•14 years ago
|
||
Amy, can you answer some of the questions in the previous comment?
Assignee | ||
Comment 8•14 years ago
|
||
ravi, zandr (oops, forgot to copy you earlier)
1. Can the DNS handle a public hostname in the *.build.mozilla.org subdomain,
e.g., api.build.mozilla.org or clobberer.build.mozilla.org?
Comment 9•14 years ago
|
||
Yes, we can accommodate a public and a private IP existing in build.mozilla.org. What I would very much like to avoid, however, is having split horizon where $HOSTNAME.build.mozilla.org has both a public and a private IP. I'm not sure if that is possible with the RelEng infrastructure and will defer to your teams to try to avoid creating this condition.
Also re: Comment 3 for those interested in knowing:
build.mozilla.org is a zone apex with an SOA record, and a name that holds an SOA record cannot also be a CNAME.
Assignee | ||
Comment 10•14 years ago
|
||
OK, this is now creating blocking bugs, so we should get moving. Based on what ravi suggests, I'd like to create CNAMEs
pub.build.mozilla.org (menu)
clobberer.pub.build.mozilla.org
buildapi.pub.build.mozilla.org
trychooser.pub.build.mozilla.org
all pointing to the external IP for dm-wwwbuild01. These should resolve both inside and outside the build VPN, and as ravi asked in comment 9 should resolve to the same address. I hope I've understood correctly.
I'm happy to add other names to this list, as we discover more apps that we're hosting on this system.
Aki, does this sound good? Ravi?
Severity: normal → major
Priority: P3 → P2
Assignee | ||
Comment 11•14 years ago
|
||
Amy points out that dm-wwwbuild01 will resolve in mtv1 to an address that non-build people cannot access, so simply making CNAMEs to dm-wwwbuild01 won't work. So a more correct and detailed suggestion:
dm-wwwbuild01-ext.mozilla.org IN A 63.245.208.186
which is the same as dm-wwwbuild01's external address
clobberer.pub.build.mozilla.org IN CNAME dm-wwwbuild01-ext.mozilla.org.
set to resolve identically both inside and outside the build network. Same for all of the other aliases (buildapi, trychooser, etc.)
We can come up with another solution for internal clobberer requests (from slaves).
Once we're agreed on this, I'll break it down into bugs for each service, as they will each have their hairy bits.
Comment 12•14 years ago
|
||
As I mentioned to Dustin, I think what should probably happen is that we create an A record that points to 63.245.208.186 that does not have an internal 10.x.x.x record (so there are no issues with separate dns views). We can then CNAME each of the services to that external mozilla.org A record (so we only have to change one record should it need to change in the future).
Some things that might be impacted are SSL certificates or things that do forward and reverse dns matching. Do we use anything sensitive to that on bmo?
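A minimal sketch of the record layout proposed here, modeled as a toy lookup table rather than a real DNS client. The names and the address come from comment 10 and comment 11; the `resolve` helper and the `RECORDS` table are hypothetical and exist only to show why a single A record behind per-service CNAMEs means one change moves every service.

```python
# Toy model of the proposed records; this does not query real DNS.
RECORDS = {
    # Single A record with no internal 10.x.x.x counterpart (address from comment 11).
    "dm-wwwbuild01-ext.mozilla.org": ("A", "63.245.208.186"),
    # One CNAME per service, all pointing at that A record.
    "clobberer.pub.build.mozilla.org": ("CNAME", "dm-wwwbuild01-ext.mozilla.org"),
    "buildapi.pub.build.mozilla.org": ("CNAME", "dm-wwwbuild01-ext.mozilla.org"),
    "trychooser.pub.build.mozilla.org": ("CNAME", "dm-wwwbuild01-ext.mozilla.org"),
}


def resolve(name, records=RECORDS, max_hops=8):
    """Follow CNAMEs until an A record is reached and return its address."""
    for _ in range(max_hops):
        rtype, value = records[name]
        if rtype == "A":
            return value
        name = value  # CNAME: continue with the target name
    raise RuntimeError("CNAME chain too long")


if __name__ == "__main__":
    for service in ("clobberer", "buildapi", "trychooser"):
        print(service, resolve(f"{service}.pub.build.mozilla.org"))
```

Replacing only the `dm-wwwbuild01-ext.mozilla.org` entry changes what every service name resolves to, which is the point of the indirection.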
Assignee | ||
Comment 13•14 years ago
|
||
I don't believe so, but we can work that out in the per-service bugs. Using a pseudo-subdomain like pub.build.mozilla.org will let us use wildcard SSL certs if necessary.
Comment 14•14 years ago
|
||
But build.mozilla.org is already a subdomain...
Assignee | ||
Comment 15•14 years ago
|
||
Ravi, Amy, and I should talk briefly about this via IRC tomorrow. Ravi, do you want to grab us when you have a chance?
Reporter | ||
Comment 16•14 years ago
|
||
(In reply to comment #10)
> Aki, does this sound good?
It a) stops using a CNAME that conflicts with our SOA/subdomain, and b) has a CNAME per service so we can move them easily. And I don't object to any other specifics, so thumbs up from me.
I do think there's a build.m.o SSL cert on dm-wwwbuild01, but I don't know specifics.
Reporter | ||
Comment 17•14 years ago
|
||
[18:19] <aki> catlee-away: will an external address for clobberer break buildslaves' access ?
[18:19] <aki> or will they use the internal ip
[18:19] <catlee-away> that depends
[18:19] <catlee-away> all they need is to be able to hit it without ldap auth
[18:19] <catlee-away> but if we lock down the build network, then they'll need an internal ip
[18:20] <catlee-away> or an exception in the firewall
[18:21] <aki> hm, should we put that in the bug?
[18:21] <bear-afk> so many things may be hitting *.build.mozilla.org from inside that a single ip exception would cover it
[18:21] <catlee-away> I don't care if we hit an external ip or not
[18:21] <catlee-away> as long as it works
[18:21] <bear-afk> still something that should be spelled out in the bug
[18:22] <catlee-away> multi-homed hosts always seem to cause problems
[18:22] <bear-afk> this isn't necessarily multihomed
I *think* clobberer is the only one of the above that the slaves use, but I might be mistaken.
Comment 18•14 years ago
|
||
We also pull a bunch of stuff from build.m.o for talos. Some of these files are private.
Assignee | ||
Comment 19•14 years ago
|
||
Right - I'll dissect the services on sub-bugs. All of them will need some care. Internal services can hit things via a different hostname, so that's not a problem.
I'm not sure what ravi meant in comment 14 - build.mozilla.org is a subdomain, but as I understand it we can still use an SSL cert with *.pub.build.mozilla.org that will cover any SSL'd services we put there. Ravi, does that about cover it? Any other objections or tweaks to the plan?
If not, I'll file some dependent bugs for the services I know of and get this ball rolling.
Comment 20•14 years ago
|
||
I believe Ravi meant that you can't use (for example) a *.mozilla.org wildcard SSL certificate for something under *.build.mozilla.org or *.pub.build.mozilla.org, as RFC 2818 dictates that wildcard certificates only go to the first sub-component ("E.g., *.a.com matches foo.a.com but not bar.foo.a.com. f*.com matches foo.com but not bar.com." --RFC2818).
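To make the wildcard rule concrete, here is a simplified sketch of single-label wildcard matching. The function name `wildcard_matches` is hypothetical, and the sketch only handles a whole-label `*` in the leftmost position; it does not implement partial-label patterns such as `f*.com`, and it is not a substitute for a real TLS library's hostname check.

```python
def wildcard_matches(pattern, hostname):
    """Return True if a certificate name pattern covers hostname.

    Simplified: '*' is only honored as the entire leftmost label and
    stands for exactly one non-empty label.
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False  # a wildcard never spans more than one label
    if p_labels[0] == "*":
        return h_labels[0] != "" and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


if __name__ == "__main__":
    for pattern, host in [
        ("*.mozilla.org", "build.mozilla.org"),
        ("*.mozilla.org", "clobberer.pub.build.mozilla.org"),
        ("*.pub.build.mozilla.org", "clobberer.pub.build.mozilla.org"),
    ]:
        print(pattern, host, wildcard_matches(pattern, host))
```

Under this rule a `*.mozilla.org` certificate does not cover names under pub.build.mozilla.org, while a `*.pub.build.mozilla.org` certificate covers each service name one level below it.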
Assignee | ||
Updated•14 years ago
|
Assignee | ||
Comment 21•14 years ago
|
||
I'll take this for a bit to sort out the IT vs. releng parts of it, and make sure the IT side gets done.
Assignee: nobody → dustin
Assignee | ||
Comment 22•13 years ago
|
||
So I need some downtime in which I can puppetize dm-wwwbuild01 and make sure that it continues serving all of the fun stuff it serves now. Once that's done, it's relatively straightforward to move one service at a time to a vhost, using puppet.
This can easily ride along with any other downtimes, and does not deserve its own.
Flags: needs-treeclosure?
Assignee | ||
Comment 23•13 years ago
|
||
I know we've had workweek and whatnot, but I'd like to get this moving again - can we schedule a downtime for this next week?
Comment 24•13 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] from comment #23)
> I know we've had workweek and whatnot, but I'd like to get this moving again
> - can we schedule a downtime for this next week?
Yeah, I'm looking to. How is 9am EDT on Wednesday for you?
Assignee | ||
Comment 25•13 years ago
|
||
Sounds good to me.
Assignee | ||
Comment 26•13 years ago
|
||
Per comment 22, this downtime is for puppetizing the host - bug 674665, which is infra-only because it involves infra puppet configs. This is using existing infra classes, so it makes some non-obvious changes to the Apache configuration. I *think* that the resulting system will work the same way the existing system does, but I can't be sure. Hence the downtime.
Once this is in place, we'll need to watch for build failures related to changes in services provided by this system. I know about clobberer and the talos downloads, but there may be subtleties in how those are implemented, or other unknown services.
As for rollback, I'll back up /etc/httpd before puppetizing, and that will provide a potential rollback strategy.
Summary: rename build.mozilla.org (the machine, not the subdomain) → (tracker) move services on dm-wwwbuild01 to *.{pub,pvt}.build.mozilla.org virtualhosts
Updated•13 years ago
|
Flags: needs-treeclosure? → needs-treeclosure+
Assignee | ||
Comment 27•13 years ago
|
||
This puppet change was landed successfully.
Flags: needs-treeclosure+
Assignee | ||
Comment 28•13 years ago
|
||
So everything from here on out can be done without significant risk, as follows:
1. Set up a new vhost, copying the config out of that for build.mozilla.org
2. Test out that vhost by using staging slaves or pointing a select few devs at it or whatever
3. Point everything to that vhost
4. Verify and wait until nothing's looking at build.mozilla.org anymore
5. Remove the copied config from build.mozilla.org
I'll take one of the dependent bugs, as a model, and then hopefully releng can handle scheduling and testing the rest -- I'll do the Puppet/Apache changes.
Assignee | ||
Updated•13 years ago
|
Assignee: dustin → nobody
Assignee | ||
Updated•13 years ago
|
Assignee: nobody → dustin
Assignee | ||
Comment 30•13 years ago
|
||
This work will continue in parallel with work in bug 774354 to move these new services to a web cluster.
Assignee | ||
Comment 31•12 years ago
|
||
http{,s}://build.mozilla.org is still hosted on relengweb1.dmz.scl3, and will not be supported on the new releng cluster, in hopes we can finally close this almost-two-year-old bug :)
My rough plan is as follows:
- serve everything from virtualhosts on the new releng cluster (bug 774354)
- make sure bugs are on file to start using those virtualhosts (blocking this bug)
- add 301 redirects to build.mozilla.org paths where/when it won't cause failures
- keep build.mozilla.org hosted on relengweb1 as-is until everyone's satisfied
(cert expires 3/2/14, so let's call that the deadline)
- kill relengweb1 when build.mozilla.org is no longer used.
I'll put this bug back in the releng queue after the first step, as the blocking bugs are out of scope for me, but I'm still happy to help where I can. I'll do the 301's and VM-killing when the time comes.
I surveyed the logs for http{,s}://build.mozilla.org for yesterday (August 6), to make sure there's nothing we're missing still on the host. Here's what I found:
http:
/builds - see below
/tryserver-symbols - bug 702337
/clobberer - bug 657024
/talos - bug 657046
/trychooser - already 302'ing
https:
/buildapi - all POSTs; see below
/clobberer - bug 657024
/trychooser - already 302'ing
/tryserver-builds - only bots, unused per 729667 comment 10
/update-bump-unit-tests - bug 657361
For http://build.mozilla.org/builds, bug 657359 comment 5 explains most of the content (except buildfaster.csv.gz). All of this content is rsync'd from cruncher regularly, so it's easy to mirror it elsewhere while still serving it at its existing URL. I'll take care of that in bug 657359, then file a bug blocking this one to change incoming links.
For https://build.mozilla.org/buildapi, all of the incoming requests (and there are lots) are from autoland. So we can fix that up, then I can add a 301. I'll file a bug blocking this one to make the autoland fix.
As for the 301's:
already 302'ing, change to 301:
http://build.mozilla.org/trychooser -> http://trychooser.pub.build.mozilla.org
https://build.mozilla.org/trychooser -> http://trychooser.pub.build.mozilla.org
pending bug 657359 + link-fixing bug:
http://build.mozilla.org/builds -> http://builddata.pub.build.mozilla.org/reports
pending autoland fix:
https://build.mozilla.org/buildapi -> https://secure.pub.build.mozilla.org/buildapi
pending bug 657024:
https://build.mozilla.org/clobberer -> https://secure.pub.build.mozilla.org/clobberer
https://build.mozilla.org/clobberer-stage -> https://secure.pub.build.mozilla.org/clobberer-stage
http://build.mozilla.org/clobberer -> https://secure.pub.build.mozilla.org/clobberer
http://build.mozilla.org/clobberer-stage -> https://secure.pub.build.mozilla.org/clobberer-stage
pending bug 702337:
http://build.mozilla.org/tryserver-symbols -> ??
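The redirect list in this comment can be sketched as a lookup table. This is an illustration only, not the Puppet/Apache configuration that was actually deployed: the `REDIRECTS` table and `redirect_target` helper are hypothetical, and carrying the rest of the path and the query string over to the target is an assumption. /tryserver-symbols is left out because its target is still undecided above.

```python
from urllib.parse import urlsplit

# (scheme, path prefix) -> target, copied from the list in this comment.
REDIRECTS = {
    ("http", "/trychooser"): "http://trychooser.pub.build.mozilla.org",
    ("https", "/trychooser"): "http://trychooser.pub.build.mozilla.org",
    ("http", "/builds"): "http://builddata.pub.build.mozilla.org/reports",
    ("https", "/buildapi"): "https://secure.pub.build.mozilla.org/buildapi",
    ("https", "/clobberer"): "https://secure.pub.build.mozilla.org/clobberer",
    ("https", "/clobberer-stage"): "https://secure.pub.build.mozilla.org/clobberer-stage",
    ("http", "/clobberer"): "https://secure.pub.build.mozilla.org/clobberer",
    ("http", "/clobberer-stage"): "https://secure.pub.build.mozilla.org/clobberer-stage",
}


def redirect_target(url):
    """Return the 301 target for an old build.mozilla.org URL, or None."""
    parts = urlsplit(url)
    if parts.hostname != "build.mozilla.org":
        return None
    for (scheme, prefix), target in REDIRECTS.items():
        if parts.scheme != scheme:
            continue
        # Match on whole path segments so /clobberer does not catch /clobberer-stage.
        if parts.path == prefix or parts.path.startswith(prefix + "/"):
            result = target + parts.path[len(prefix):]  # assumption: keep the suffix
            if parts.query:
                result += "?" + parts.query
            return result
    return None


if __name__ == "__main__":
    print(redirect_target("http://build.mozilla.org/builds/pending/pending.html"))
```

Matching on segment boundaries matters here because /clobberer and /clobberer-stage share a prefix but have different targets.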
Assignee | ||
Comment 32•12 years ago
|
||
301's for trychooser converted from 302's.
301's listed as "pending bug 657024" above are added in puppet.
301's listed as "pending autoland fix" above are added to puppet.
That leaves:
pending bug 657359 + link-fixing bug:
http://build.mozilla.org/builds -> http://builddata.pub.build.mozilla.org/reports
pending bug 702337:
http://build.mozilla.org/tryserver-symbols -> ??
Assignee | ||
Comment 33•12 years ago
|
||
The remaining work on this bug is in the dependent bugs, and all are releng tasks. I'm cc'd, so I'll take care of the 301's as necessary.
Assignee: dustin → nobody
Reporter | ||
Comment 34•12 years ago
|
||
I think relengweb1 has become our tooltool server as well.
Assignee | ||
Comment 35•12 years ago
|
||
Correct:
dustin@Lorentz ~ $ host tooltool.pub.build.mozilla.org
tooltool.pub.build.mozilla.org is an alias for relengweb-zlb.vips.scl3.mozilla.com.
relengweb-zlb.vips.scl3.mozilla.com has address 63.245.215.17
Assignee | ||
Comment 36•12 years ago
|
||
(In reply to Aki Sasaki [:aki] from comment #34)
> I think relengweb1 has become our tooltool server as well.
And to be clear, this is not hosted on http://build.mozilla.org, so not related to this bug.
The action remaining on this bug is in the dependencies.
Assignee | ||
Comment 37•12 years ago
|
||
I just did one more sweep of the access logs for build.mozilla.org, both http/https. Other than a bunch of bots and pen-testers, everything of interest is either /talos (bug 657046) or /builds (bug 657359). So, we're close on this!
Once those are finished, this host will only serve 301's. At that point, I'll move it to the new cluster.
Assignee | ||
Comment 38•12 years ago
|
||
All that remains is bug 657046 - ironically probably the most production-sensitive of the services on this VM!
Assignee | ||
Comment 39•12 years ago
|
||
On Sept 30, we'll take down http://build.mozilla.org and https://build.mozilla.org permanently.
Assignee | ||
Comment 40•12 years ago
|
||
I see an authenticated user downloading /talos/zips/tp5n.zip from outside the build network. If that should be allowed, let me know. My impression is that that's not expected and should be disallowed in the new implementation.
Flags: needinfo?(coop)
Comment 41•12 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] from comment #40)
> I see an authenticated user downloading /talos/zips/tp5n.zip from outside
> the build network. If that should be allowed, let me know. My impression
> is that that's not expected and should be disallowed in the new
> implementation.
I can't think of a reason why that access should exist. Let's nix it. We can find another way to get people those files should they really need them.
Flags: needinfo?(coop)
Assignee | ||
Comment 42•12 years ago
|
||
Thanks - that shouldn't be hard to add if necessary, at a sub-URI of https://secure.pub.b.m.o.
Assignee | ||
Comment 43•12 years ago
|
||
Ok, I have build.m.o implemented on the releng cluster. It's actually three vhosts with the same name:
1. http://build.mozilla.org where it resolves to an internal IP
2. http://build.mozilla.org where it resolves to an external IP
3. https://build.mozilla.org where it resolves to an external IP
1 and 2 are similar, except that 2 does not serve talos. There's no internal counterpart to 3 - I don't see any such accesses in the logs. 1 is the only production-critical vhost. I tested it by comparing the old and new with:
curl -H "Host: build.mozilla.org" http://10.22.74.128/talos/findlinks/index.html
curl -H "Host: build.mozilla.org" http://10.22.74.160/talos/findlinks/index.html
(that being the first file I found that wasn't huge and binary).
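The same old-versus-new comparison could be scripted, as a rough sketch. The `fetch` and `same_response` helpers are hypothetical; the two addresses are the ones in the curl commands above, and treating the first as the old server and the second as the new one is an assumption based on the order given. Note that `urllib` raises `HTTPError` for 4xx/5xx responses, so this only compares successful fetches.

```python
import urllib.request


def fetch(ip, path, host="build.mozilla.org", timeout=10):
    """GET http://<ip><path> with an explicit Host header, like curl -H."""
    req = urllib.request.Request(f"http://{ip}{path}", headers={"Host": host})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read()


def same_response(old_ip, new_ip, path, fetcher=fetch):
    """Return True if both servers give the same status and body for path."""
    return fetcher(old_ip, path) == fetcher(new_ip, path)


if __name__ == "__main__":
    # Addresses taken from the curl commands; reachable only inside the build network.
    print(same_response("10.22.74.128", "10.22.74.160",
                        "/talos/findlinks/index.html"))
```

The `fetcher` parameter is there so the comparison logic can be exercised without network access.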
I'll file a CAB bug to move this last vhost off of the old server, which involves both CNAME changes (internally) and taking over the existing IP for build.mozilla.org in zeus (externally).
Assignee | ||
Comment 44•12 years ago
|
||
http{,s}://build.mozilla.org is now hosted on the releng cluster. This bug will stay open until Sept 30, per comment 39. Bug 657046 still blocks this change, but there are two whole months to close it!
Assignee: nobody → dustin
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Assignee | ||
Comment 45•11 years ago
|
||
Dependencies are closed, so http{,s}://build.mozilla.org, both internally and externally, has 45 days to live.
Assignee | ||
Comment 46•11 years ago
|
||
For September to date:
Internal:
10.22.81.211 - - [02/Sep/2013:00:01:01 -0700] "GET /builds/last-job-per-slave.txt HTTP/1.0" 301 277 "-" "Wget/1.12 (linux-gnu)"
10.22.81.211 - - [02/Sep/2013:18:04:11 -0700] "GET /builds/pending/pending.html HTTP/1.1" 301 275 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) Gecko/20100101 Firefox/25.0"
10.22.81.211 - - [03/Sep/2013:00:01:01 -0700] "GET /builds/last-job-per-slave.txt HTTP/1.0" 301 277 "-" "Wget/1.12 (linux-gnu)"
10.22.81.211 - - [03/Sep/2013:06:33:56 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [03/Sep/2013:06:52:32 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [03/Sep/2013:07:49:34 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [03/Sep/2013:08:03:37 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [03/Sep/2013:11:49:50 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [03/Sep/2013:12:17:46 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [03/Sep/2013:12:57:32 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [03/Sep/2013:13:14:53 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [03/Sep/2013:14:36:00 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [03/Sep/2013:14:49:41 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [03/Sep/2013:15:11:20 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [04/Sep/2013:08:14:57 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [04/Sep/2013:08:25:00 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [04/Sep/2013:08:52:13 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [04/Sep/2013:09:04:16 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [04/Sep/2013:13:45:25 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [05/Sep/2013:07:31:48 -0700] "GET /talos/zips/talos.fcbb9d7d3c78.zip HTTP/1.1" 200 17031973 "-" "Python-urllib/2.6"
10.22.81.211 - - [05/Sep/2013:07:31:48 -0700] "GET /talos/zips/talos.fcbb9d7d3c78.zip HTTP/1.1" 200 17031973 "-" "Python-urllib/2.6"
10.22.81.211 - - [05/Sep/2013:07:54:39 -0700] "GET /talos/zips/retry.zip HTTP/1.1" 200 3510 "-" "Python-urllib/2.7"
10.22.81.211 - - [05/Sep/2013:08:02:29 -0700] "GET /talos/zips/talos.fcbb9d7d3c78.zip HTTP/1.1" 200 17031973 "-" "Python-urllib/2.6"
External:
- lots of bots -- you can tell which bots don't cache 301's
- a few dozen hits to / from a copy of Firefox 4.0 on Windows.
- a dozen or so /clobberer hits that look like they're from a browser (301's)
- a few /builds/pending hits, similar (301's)
- a few /tryserver-builds hits that seem to be pingbacks from blog software
So I think the turn-off on the 30th is going to be uneventful.
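A survey like this one can be approximated with a short script. This is a sketch, not the tool used for the numbers above: the `LOG_RE` pattern and `summarize` helper are hypothetical, and the pattern assumes the Apache combined log format seen in the excerpt.

```python
import re
from collections import Counter

# Matches the leading fields of an Apache combined-format log line.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines):
    """Count hits per (top-level path, status code); skip unparseable lines."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        path = m.group("path").split("?", 1)[0]
        top = "/" + path.lstrip("/").split("/", 1)[0]
        counts[(top, int(m.group("status")))] += 1
    return counts


if __name__ == "__main__":
    import sys
    for (top, status), n in sorted(summarize(sys.stdin).items()):
        print(f"{n:6d} {status} {top}")
```

Fed the internal log on standard input, it prints one line per top-level path and status, which makes the remaining 200s (real traffic) easy to tell apart from the 301s.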
Assignee | ||
Comment 47•11 years ago
|
||
last reminder sent
Assignee | ||
Comment 48•11 years ago
|
||
httpd config removed
Assignee | ||
Comment 49•11 years ago
|
||
A records removed:
build.mozilla.org A 10.22.74.160
build.mozilla.org A 63.245.215.17
Kim, it looks like we're still seeing a few hits to retry.zip per hour. On the fifth, we assumed these were from rebuilds on try, but it seems the URL is still present in
http://hg.mozilla.org/build/mozharness/annotate/ee2caa1098b9/configs/android/android_panda_talos_releng.py#l15
which you're the last to touch. Can you change that to the new URL, http://talos-bundles.pvt.build.mozilla.org/zips/retry.zip?
TODO: verify that VIP 63.245.215.17 is now unused and remove it from DNS and Zeus.
Flags: needinfo?(kmoir)
Assignee | ||
Comment 51•11 years ago
|
||
* TIG 63.245.215.17 removed from VS releng and releng-https
* TIG 63.245.215.17 deleted
* Forward, reverse DNS for 63.245.215.17 removed.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED