Closed Bug 729667 Opened 12 years ago Closed 12 years ago

re-create the services on dm-wwwbuild01 in scl3

Categories

(Release Engineering :: General, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Details

(Whiteboard: [scl3])

Attachments

(1 file)

This host (VM) proxies and serves a number of small things, both internally to the build network and externally.  It sits on the build network and, as I understand it, has a NAT entry pointing to it.  It serves a few xxx.pub.build.mozilla.org vhosts, a few xxx.pvt.build.mozilla.org vhosts, and a bunch of funny paths at http://build.mozilla.org -- all with varying degrees of IP-based and LDAP-based auth.

The system is puppeted, although it hasn't been built from scratch with puppet, so we may find pieces missing and it will need some testing.

Jake, can you give me some advice on how this would be best set up in scl3?  Hit me up in IRC for details I haven't included here.
Sorry, I have no knowledge about this system, and I'm not sure who would. You said it's a VM... perhaps we can just float it over to SCL3 (and possibly change some IPs or DNS names around to match)?

Perhaps Dan has some thoughts on how complicated that might be... CC!
From IRC discussion with Jake, we shouldn't try to mix a physical move with a significant reorganization of this service.  So, we should rebuild this machine, using puppet, on a new ESX host in scl3, and migrate services there, then continue the ongoing process of hosting build stuff at subdomains, rather than as paths at http://build.mozilla.org.

We discussed one difference from the current arrangement, though.  Presently, traffic comes in three ways:
 - there's a NAT entry that allows the world to talk directly to Apache on dm-wwwbuild01
 - build-network hosts talk directly to the host
 - https://secure.pub.build.mozilla.org has the SSL terminated by zeus

The difference would be to handle all of those terminations in zeus.  That gets us the ability to redirect services to new hosts easily, the non-SPOFfiness of multiple backend servers when required, and all the other goodness that zeus brings.

Because releng is its own business unit, we can't put a zeus VIP on the build network, so we'd need to think about how to replicate the second item in the list above.  I'd be fine with having the build hosts connect to a non-build VIP for that purpose, with proper ACLs applied in Zeus.
^^ amy, your thoughts?
QA Contact: cshields → arich
Since build hosts already connect to a variety of non-build network IPs, I think this is a workable idea.  My biggest concern would be the stability of zeus (since the phx setup has proven less than stable).  Are there tools on this host that will close the tree if they are unreachable from the build network at any time?
Yup. We download Talos and some other tools from here at the start of every talos job. These jobs will burn if build.m.o goes down.
We should get someone with more expertise on Zeus and how it's failing in phx1 to chime in here -- AIUI it is a fairly narrow set of circumstances that cause it to fail under high load.  The collateral damage was to other (relatively very low-load) services hosted on the same Zeus cluster.  I'm not sure what the architecture in scl3 will look like, in terms of what build.m.o would be sharing with.

Corey, can you enlighten us?
Zeus should be fine for this in SCL3.

I would like to see a different hostname - this is asking for a deprecated naming scheme that we are trying to avoid in SCL3.
The zeus cluster in bug 730386 isn't doing multicast, and is intended for uses like this, so I think we should go ahead with it.  Zeus buys a lot of flexibility and is a well-travelled service demarcation point within Mozilla.
Summary: re-create dm-wwwbuild01 in scl3 → re-create the services on dm-wwwbuild01 in scl3
Ugh, this host is *really* hard to reverse-engineer.  Here's what I've got:

>  pvt pub secure auth description       current vhost                    notes
>  --- --- ------ ---- ----------------- -------------------------------- ----------
>       Y    Y    LDAP clobberer UI      build.m.o/clobberer              bug 657024
>   Y             ACL  clobberer API     build.m.o/clobberer              bug 657024
>   Y             ACL  talos bundles     build.m.o/talos                  bug 657046
>       Y    Y    LDAP  ---"--- via ssl  build.m.o/talos                  bug 657046
>       Y          -   builddata (old)   build.m.o/builds                 bug 657359; unused?
>       Y    Y    LDAP  ---"--- via ssl  build.m.o/builds                 bug 657359; unused?
>       Y          -   builddata (new)   builddata.pub.m.o/buildjson      bug 657359, bug 712317
>       Y    Y    LDAP buildapi (old)    build.mozilla.org/buildapi       proxies to cruncher
>       Y    Y    LDAP buildapi (new)    secure.pub.bmo/buildapi          proxies to buildapi01
>       Y    B     -   update tests      build.m.o/update-bump-unit-tests bug 657361
>       Y          -   trychooser        trychooser.pub.bmo
>   Y             ACL  stackwalker       stackwalker.pvt.bmo
>       Y          -   mobile-dashboard  mobile-dashboard.pub.bmo
>   Y              -   signed-binaries   build.m.o/signed-binaries        unused?
>   Y             ACL  runtime-binaries  runtime-binaries.pvt.bmo         used by tooltool client
>            Y     -   tryserver-builds  build.m.o/tryserver-builds       unused?
>       Y          -   tryserver-symbols build.m.o/tryserver-symbols

'B' for secure means both (available at http: and https:)

I will endeavor to replicate this exactly in scl3, so first -- releng, is this accurate?  Are tree-critical things missing?  Are the "unused" parts really unused?

Second, here's how I'm going to structure this:

I'll build a VM named releng-web1 using the same puppet manifests that are building dm-wwwbuild01.  I'll request netapp shares for its interesting data (/builds, /symbols, and /var/www/html).

Zeus will serve two VIPs, let's call them pub-bmo-vip and pvt-bmo-vip.  The former is a public IP, and the latter is private (I'm not sure what VLAN is appropriate here - casey, input?).

  pub-bmo-vip:80 -> releng-web1:80 for all pub, non-secure vhosts above
  pub-bmo-vip:443 -> releng-web1:81 for https://secure.pub.build.mozilla.org
  pvt-bmo-vip:80 -> releng-web1:82 for all pvt vhosts above

Zeus will be configured to restrict access to pvt-bmo-vip:80 to the releng networks.
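Zeus's own ACL syntax isn't shown in this bug, so as a rough illustration only: the rule "pvt-bmo-vip:80 answers only the releng networks" boils down to a source-address membership check like the Python sketch below. The network ranges are placeholders (the actual releng ranges aren't listed here), and the function name is hypothetical.

```python
import ipaddress

# Placeholder ranges, NOT the real releng networks (they aren't listed in
# this bug); substitute the actual ones.
RELENG_NETWORKS = [
    ipaddress.ip_network("10.12.0.0/16"),
    ipaddress.ip_network("10.250.0.0/16"),
]


def allowed_on_pvt_vip(client_ip: str) -> bool:
    """Return True if client_ip falls inside one of the releng networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in RELENG_NETWORKS)


if __name__ == "__main__":
    for ip in ("10.12.48.7", "63.245.215.17"):
        print(ip, "allowed" if allowed_on_pvt_vip(ip) else "denied")
```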

Jakem, does this seem ok to you?

This sets us up for a few nice things:
 - I can finish bug 604688
 - we can expand the cluster as needed (for availability or load)
   - to hardware or more VMs
 - we can set up zeus to proxy directly to e.g., buildapi if necessary
 - we can build out new hosts for particular applications if they become high-load (for example, stackwalker killed dm-wwwbuild01 when we turned it on; this could be isolated to its own host and only kill itself)

So, for me, the next steps are:
 1. verify the above list
 2. request VM and puppetize
 3. request netapp shares
 4. request VIPs
 5. network flows for VIPs and the VM
then wait for bug 730386 (zeus in scl3).  We can then start cutting things over one service at a time -- with luck, with no downtime.
(In reply to Dustin J. Mitchell [:dustin] from comment #9)
> Ugh, this host is *really* hard to reverse-engineer.  Here's what I've got:
> 
> >  pvt pub secure auth description       current vhost                    notes
> >  --- --- ------ ---- ----------------- -------------------------------- ----------
> >       Y    Y    LDAP clobberer UI      build.m.o/clobberer              bug 657024
> >   Y             ACL  clobberer API     build.m.o/clobberer              bug 657024

We also have stage and preproduction clobberer instances.

> >   Y             ACL  talos bundles     build.m.o/talos                  bug 657046
> >       Y    Y    LDAP  ---"--- via ssl  build.m.o/talos                  bug 657046

I'm surprised LDAP access is configured for talos, and can't see it in the configs (wouldn't claim any great expertise there though). Worth a double check.

> >       Y          -   builddata (old)   build.m.o/builds                 bug 657359; unused?
> >       Y    Y    LDAP  ---"--- via ssl  build.m.o/builds                 bug 657359; unused?

There's a lot of cruft in there. bug 731090 moved the canonical location for builds*js.gz files to http://builddata.pub.build.mozilla.org/buildjson/ (I blogged about it but would be interesting to know if anyone is still hitting build.m.o for those files, maybe add a redirect). There are some things to delete (eg pending2/), and others that are generated on cruncher that need to be published elsewhere.

> >            Y     -   tryserver-builds  build.m.o/tryserver-builds       unused?

Unused now, they get published to ftp.m.o.

> >       Y          -   tryserver-symbols build.m.o/tryserver-symbols

Bug 702337 covers publishing these to ftp.m.o too, but requires build system & try chooser changes, and probably the bigger netapp in SCL3. Reproduce as-is for now.

You're quite right to remove the locations /1.8-nightly, /1.9-nightly, /mozilla-central-nightly, and directories /var/www/html/build/patches and patches-staging.
In addition to Nick's comments:

> >       Y          -   trychooser        trychooser.pub.bmo

I believe this is just a static html page with some js to make it go. Lukas would know best here. I'm not sure if it merits its own subdomain.

> >   Y             ACL  stackwalker       stackwalker.pvt.bmo

stackwalker.pvt.bmo is no longer used

> >   Y              -   signed-binaries   build.m.o/signed-binaries        unused?

this is no longer used
(In reply to Nick Thomas [:nthomas] from comment #10)
> (In reply to Dustin J. Mitchell [:dustin] from comment #9)
> > Ugh, this host is *really* hard to reverse-engineer.  Here's what I've got:
> > 
> > >  pvt pub secure auth description       current vhost                    notes
> > >  --- --- ------ ---- ----------------- -------------------------------- ----------
> > >       Y    Y    LDAP clobberer UI      build.m.o/clobberer              bug 657024
> > >   Y             ACL  clobberer API     build.m.o/clobberer              bug 657024
> 
> We also have stage and preproduction clobberer instances.

Yep, those will go where this goes.

> > >   Y             ACL  talos bundles     build.m.o/talos                  bug 657046
> > >       Y    Y    LDAP  ---"--- via ssl  build.m.o/talos                  bug 657046
> 
> I'm surprised LDAP access is configured for talos, and can't see it in the
> configs (wouldn't claim any great expertise there though). Worth a double
> check.

Well, it works, but given this I'll assume it's not required.

> > >       Y          -   builddata (old)   build.m.o/builds                 bug 657359; unused?
> > >       Y    Y    LDAP  ---"--- via ssl  build.m.o/builds                 bug 657359; unused?
> 
> There's a lot of cruft in there. bug 731090 moved the canonical location for
> builds*js.gz files to http://builddata.pub.build.mozilla.org/buildjson/ (I
> blogged about it but would be interesting to know if anyone is still hitting
> build.m.o for those files, maybe add a redirect). There are some things to
> delete (eg pending2/), and others that are generated on cruncher that need
> to be published elsewhere.

So I think that last part means that both need to be maintained, at least for now.

> You're quite right to remove the locations /1.8-nightly, /1.9-nightly,
> /mozilla-central-nightly, and directories /var/www/html/build/patches and
> patches-staging.

Excellent.

(In reply to Chris AtLee [:catlee] from comment #11)
> In addition to Nick's comments:
> 
> > >       Y          -   trychooser        trychooser.pub.bmo
> 
> I believe this is just a static html page with some js to make it go. Lukas
> would know best here. I'm not sure if it merits its own subdomain.

*Everything* should have its own subdomain.  Yes, this site is pretty simple.

> > >   Y             ACL  stackwalker       stackwalker.pvt.bmo
> 
> stackwalker.pvt.bmo is no longer used

Good news - I'll kill it.

> > >   Y              -   signed-binaries   build.m.o/signed-binaries        unused?
> 
> this is no longer used

Ditto.
>  1. verify the above list
  done
>  2. request VM and puppetize
  bug 739360
>  3. request netapp shares
  bug 739454
>  4. request VIPs
>  5. network flows for VIPs and the VM
  I'll wait for bug 730386 for that.
The VM is built and puppetized with the same manifests as dm-wwwbuild01, but it isn't serving yet: the hostname doesn't resolve on the host itself, which prevents httpd from starting up.

The external VIP is allocated and configured in zeus.

The netapp share will be created soon.

I'll pick this up on Monday, but the schedule looks doable.
I added the netapp share at /mnt/netapp/relengweb, and symlinked /builds, /symbols, and /var/www/html into it.  I've rsynced /var/www/html, and /symbols is in process.  I added PHP to the system for clobberer.  I removed stackwalker.

Zeus isn't on any good VLANs to act as VIP for access from the build network, so the build network will continue to hit this system directly.

I requested flows in bug 741600.  I'm setting up the SSL VIPs now.

(In reply to Nick Thomas [:nthomas] from comment #10)
> > >       Y          -   builddata (old)   build.m.o/builds                 bug 657359; unused?
> > >       Y    Y    LDAP  ---"--- via ssl  build.m.o/builds                 bug 657359; unused?
> 
> There's a lot of cruft in there. bug 731090 moved the canonical location for
> builds*js.gz files to http://builddata.pub.build.mozilla.org/buildjson/ (I
> blogged about it but would be interesting to know if anyone is still hitting
> build.m.o for those files, maybe add a redirect). There are some things to
> delete (eg pending2/), and others that are generated on cruncher that need
> to be published elsewhere.

I need some more detail here -- how are things in here generated on cruncher, and how do they get onto dm-wwwbuild01?  Can you break out the individual pieces that are in here?
OK, Zeus is set up to send all clear traffic to :82, and all SSL traffic to :81.  Build-network-originated traffic will go to :80.

There are two VIPs for SSL - one for https://build.mozilla.org and one for https://secure.pub.build.mozilla.org; the former will hopefully go away eventually.
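To summarize where a request ends up on the new host, here is a tiny lookup-table sketch in Python. The origin labels ("zeus", "build-network") are made-up names for illustration, not Zeus configuration; the port numbers are the ones given in this bug, including the internal :443 SSL vhost that was added later on.

```python
# relengweb1 backend ports as described in this bug:
#  - Zeus forwards clear (HTTP) traffic to :82
#  - Zeus terminates SSL and forwards that traffic to :81
#  - build-network hosts bypass Zeus and hit :80 directly
#  - an SSL vhost on :443 for build-network hosts was added later
BACKEND_PORTS = {
    ("zeus", "http"): 82,
    ("zeus", "https"): 81,
    ("build-network", "http"): 80,
    ("build-network", "https"): 443,
}


def backend_port(origin: str, scheme: str) -> int:
    """Return the relengweb1 port a request lands on.

    origin is "zeus" or "build-network"; scheme is "http" or "https".
    Raises KeyError for combinations this sketch does not model.
    """
    return BACKEND_PORTS[(origin, scheme)]
```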
DB ACLs requested in bug 741656
I've re-created the crontasks on dm-wwwbuild01 in puppet, too, so this is basically all set as far as I'm concerned.  Remaining:

 * flows and DB ACLs (blocking testing of clobberer, talos bundles, old buildapi)
 * info on *incoming* data, e.g., builds/, which will break when we turn off dm-wwwbuild01
 * validation by releng
 * finish syncing
 * DNS changes to make this primary
(In reply to Dustin J. Mitchell [:dustin] from comment #15)
> I need some more detail here -- how are things in here generated on
> cruncher, and how do they get onto dm-wwwbuild01?  Can you break out the
> individual pieces that are in here?

Everything is generated into or static in cruncher:/var/www/html/builds/. root@cruncher runs a cron job that rsyncs to syncbld@dm-wwwbuild01:/var/www/html/build/builds/ every minute.

Excluding the builds*.js.gz files, and rearranging for logical ordering:
addons/
build.xml
changelog
build/
docs/
jquery-1.6.4.min.js
jquery-latest.js
jquery.metadata.js
jquery.tablesorter.js
jquery.tablesorter.min.js
tests/
themes/
All of this is from jquery.tablesorter used by coop's last-job-per-slave.html. Moved into a jquery.tablesorter directory and updated coop's generating script.

last-job-per-slave.html - coop's cron job
lastbuild.css - for the above

slaves_needing_reboot.txt - possibly superseded by briarpatch, cue coop

last-job-per-slave.txt  - nthomas's cron job, superseded by coop's html so disabled & removed

buildfaster.csv.gz - generated by catlee for a-team, cue catlee

dataTables-1.6/  - unused, moved away
pending2/  - old, deleted

builds-pending.js
builds-running.js
pending.html
running.html
jquery/
DataTables-1.7.1/
First four are generated by me, not aware of them being consumed by anyone and they can be gotten from buildapi anyway. Disabled & deleted.

pending/ - my graphs, I switched them to consume buildapi from buildapi01
Depends on: 742045
Here's the current status (S = "secure?"; K = "ok?")

description       S current vhost                    K notes
----------------- - -------------------------------- - ----------
clobberer UI      Y build.m.o/clobberer              Y   
clobberer API       build.m.o/clobberer              Y see below
talos bundles       build.m.o/talos                  Y   
 ---"--- via ssl  Y build.m.o/talos                  Y   
builddata (old)     build.m.o/builds                 Y need more info for sources
 ---"--- via ssl  Y build.m.o/builds                 Y need more info
builddata (new)     builddata.pub.m.o/buildjson      Y   
buildapi (old)    Y build.mozilla.org/buildapi       Y   
buildapi (new)    Y secure.pub.bmo/buildapi          Y   
update tests        build.m.o/update-bump-unit-tests Y
trychooser          trychooser.pub.bmo               Y   
stackwalker         stackwalker.pvt.bmo              Y (removed)
mobile-dashboard    mobile-dashboard.pub.bmo         Y   
signed-binaries     build.m.o/signed-binaries        Y (ignored)
runtime-binaries    runtime-binaries.pvt.bmo         Y (ACL works)
tryserver-builds  Y build.m.o/tryserver-builds       Y (ignored)
tryserver-symbols   build.m.o/tryserver-symbols      Y   

All of the flows and ACLs are in place, so almost everything checks out.

It turns out, contrary to popular opinion, that clobberer API requests start at http and are redirected to https, although no authentication is required.  The urllib2.urlopen(..) in clobberer.py follows this 302.  This is complicated on the new host, since I've assumed that build-internal access will be unencrypted.  I'd like to change this so that internal access to clobberer is unencrypted.  I've implemented this already, and verified it works from a staging slave, but please pay particular attention to testing that!
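To see why clobberer.py never noticed the http-to-https hop: urllib's default opener follows a 302 transparently. clobberer.py uses Python 2's urllib2; the sketch below uses Python 3's urllib.request, which behaves the same way through its HTTPRedirectHandler. Because a self-contained demo can't easily stand up TLS, the local test server redirects to another plain-http path instead; the "/secure" prefix and the query string are hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/clobberer/"):
            # Stand-in for the http -> https redirect: 302 to another path.
            self.send_response(302)
            self.send_header("Location", "/secure" + self.path)
            self.end_headers()
        else:
            body = b"clobber data"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_following_redirect():
    """Start a local server, request /clobberer/..., return (final URL, body)."""
    server = HTTPServer(("127.0.0.1", 0), Handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        url = f"http://127.0.0.1:{port}/clobberer/index.php?branch=elm"
        # urlopen follows the 302 without any extra code from the caller.
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.geturl(), resp.read()
    finally:
        server.shutdown()
        server.server_close()
```

The caller only ever sees the final response, which is why a redirect can silently add a hop (or, on the new host, an encrypted hop) without the client code changing.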

The existing crontabs to get data onto this host are "push" crontasks, and that causes problems with this new host because they're using a non-LDAP destination account, and because push is hard to reverse-engineer, while pull is not.  So, I'd prefer to pull.

The crontasks to replicate are:
 - talos dirty profiles from cruncher
 - /builds/ from cruncher (as detailed by nthomas above)
and the new crontask to add is
 - sync try symbols from dm-wwwbuild01 (temporarily)

The new, temporary crontask is in place.  The pulls from cruncher require a new flow, sadly - bug 742124.  Even so, this is ready to test.
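A pull-style crontask amounts to running rsync on relengweb1 with the remote host as the source. As a minimal sketch, the Python below only builds the argv (it doesn't run anything); the host name "cruncher" is used as a short name, the optional user is hypothetical, and the paths are assumed from nthomas's description above. It deliberately leaves out --delete, matching the note about the temporary symbol sync.

```python
import shlex
from typing import List, Optional


def rsync_pull_command(remote_host: str, remote_path: str, local_path: str,
                       user: Optional[str] = None) -> List[str]:
    """Build an rsync argv that pulls remote_path into local_path.

    --delete is intentionally omitted, so files removed upstream are kept
    locally. A trailing slash on remote_path makes rsync copy the
    directory's contents rather than the directory itself.
    """
    source = f"{remote_host}:{remote_path}"
    if user:
        source = f"{user}@{source}"
    return [
        "rsync",
        "-a",         # archive mode: recurse, preserve perms/times/links
        "-e", "ssh",  # use ssh as the transport
        source,
        local_path,
    ]


if __name__ == "__main__":
    cmd = rsync_pull_command("cruncher", "/var/www/html/builds/",
                             "/var/www/html/build/builds/")
    print(shlex.join(cmd))
```

Pulling means the credentials and schedule live on the host that owns the data's final location, which is easier to discover later than a push job hidden in another machine's crontab.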

To test externally, add the following to your /etc/hosts:
63.245.215.17 build.mozilla.org trychooser.pub.build.mozilla.org builddata.pub.build.mozilla.org mobile-dashboard.pub.build.mozilla.org
63.245.215.21 secure.pub.build.mozilla.org

To test internally, add the following to your (or a dev slave's) /etc/hosts:
10.22.74.128 build.mozilla.org trychooser.pub.build.mozilla.org builddata.pub.build.mozilla.org mobile-dashboard.pub.build.mozilla.org
63.245.215.21 secure.pub.build.mozilla.org

Please post here with successful *and* unsuccessful test results!
Depends on: 742131
Based on success for both:

[root@relengweb1.dmz.scl3 ~]# CLEANUP_PASSWORD='<cough>' /usr/bin/php /var/www/html/build/stage-clobberer/cleanup.php
[root@relengweb1.dmz.scl3 ~]# export CLEANUP_PASSWORD='<ahem>'; /usr/bin/php /var/www/html/build/clobberer/cleanup.php
[root@relengweb1.dmz.scl3 ~]#

I've re-enabled the cleanup crontasks.

This is fairly time-critical - if possible, I'd like to move this service to scl3 during the downtime next week, rather than just trying to migrate the VM in sjc1.
Assignee: dustin → nobody
Component: Server Operations: RelEng → Release Engineering
From the email thread about the releng VPN, rail has indicated that the clobberer UI should work from the build VPN (but of course slaves should not be expected to provide LDAP creds).

The problem is that internally, build.mozilla.org resolves to relengweb1.dmz.scl3, which wasn't doing SSL.  So I've added an SSL vhost on :443 to this host, and I *think* this works now.  But please test it out.
(In reply to Nick Thomas [:nthomas] from comment #19)
> slaves_needing_reboot.txt - possibly superseded by briarpatch, cue coop

Generated as a side-effect of last-job-per-slave.html. I'll change the script to stop creating it.
I'm testing external access right now, and will test internal access afterwards.
External testing:

description       K notes
----------------- - -----
clobberer UI      Y    
clobberer API     ? Unsure what needs testing here
talos bundles     N                      
 ---"--- via ssl  Y    
builddata (old)   Y gz file gunzip-ed and displayed
 ---"--- via ssl  Y gz file prompted to download
builddata (new)   N Forbidden, expected dir listing
buildapi (old)    Y   
buildapi (new)    Y   
update tests      Y
trychooser        Y   
mobile-dashboard  Y   
signed-binaries   Y
runtime-binaries  Y
tryserver-builds  N redirecting to https?
tryserver-symbols Y http only

Internal testing coming up.
Internal testing:

description       K notes
----------------- - -----
clobberer UI      Y    
clobberer API     ? Unsure what needs testing here
talos bundles     Y                      
 ---"--- via ssl  N    
builddata (old)   Y gz file gunzip-ed and displayed
 ---"--- via ssl  N
builddata (new)   N Forbidden, expected dir listing
buildapi (old)    N   
buildapi (new)    Y https  
update tests      Y
trychooser        Y   
mobile-dashboard  Y   
signed-binaries   N 
runtime-binaries  Y
tryserver-builds  N redirecting to https?
tryserver-symbols Y http only

talos bundles via ssl, builddata via ssl, buildapi (old), and signed binaries all time out while "Connecting...." Makes me wonder if there's something systemic going on there...missing/faulty ssl redirect or somesuch.
(In reply to Chris Cooper [:coop] from comment #26)
> Internal testing:

Should note for posterity that I performed this testing on moz2-darwin10-slave01.
(In reply to Chris Cooper [:coop] from comment #25)
> External testing:

> clobberer API     ? Unsure what needs testing here
  only that it doesn't work externally!

dustin@cerf ~ $ wget 'http://build.mozilla.org/clobberer/index.php?master=http%253A%252F%252Fbuildbot-master12.build.scl1.mozilla.com%253A8001%252F&slave=w32-ix-slave40&builddir=elm-w32-ntly&branch=elm&buildername=WINNT+5.2+elm+nightly'
--2012-04-06 17:56:02--  http://build.mozilla.org/clobberer/index.php?master=http%253A%252F%252Fbuildbot-master12.build.scl1.mozilla.com%253A8001%252F&slave=w32-ix-slave40&builddir=elm-w32-ntly&branch=elm&buildername=WINNT+5.2+elm+nightly
Resolving build.mozilla.org (build.mozilla.org)... 63.245.215.17
Connecting to build.mozilla.org (build.mozilla.org)|63.245.215.17|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-04-06 17:56:02 ERROR 403: Forbidden.
 (as expected)

> talos bundles     N      
  that's an expected "N", right?  Talos bundles aren't public.              

> builddata (new)   N Forbidden, expected dir listing
  ah, I didn't have this vhost enabled via zeus.  Fixed.

> tryserver-builds  N redirecting to https?
  dead per comment 10

> Internal testing:

> clobberer API     ? Unsure what needs testing here
  that an API call (as made from builders) works

>   talos  via ssl  N    
  not expected to work per nthomas

> builddata via ssl N
  needs a flow - see below

> builddata (new)   N Forbidden, expected dir listing
  this is working in my testing:

dustin@cerf ~/tmp $ wget http://builddata.pub.build.mozilla.org/buildjson/
--2012-04-06 18:29:09--  http://builddata.pub.build.mozilla.org/buildjson/
Resolving builddata.pub.build.mozilla.org (builddata.pub.build.mozilla.org)... 10.22.74.128
Connecting to builddata.pub.build.mozilla.org (builddata.pub.build.mozilla.org)|10.22.74.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html.1' ... 118,165     96.2K/s   in 1.2s    

2012-04-06 18:29:10 (96.2 KB/s) - `index.html.1' saved [118165]

> buildapi (old)    N   
  needs a flow - see below

> signed-binaries   N 
  dead per comment 11

> tryserver-builds  N redirecting to https?
  dead per comment 10

> talos bundles via ssl, builddata via ssl, buildapi (old), and signed
> binaries all timeout while "Connecting...." Makes me wonder if there's
> something systemic going on there...missing/faulty ssl redirect or somesuch.

Ah! We need a flow for ssl from the build network.  bug 743377.
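The "hangs at Connecting...." symptom is the usual signature of a missing flow: the firewall silently drops the SYN, so the client waits for a timeout, whereas a host that is reachable but not listening answers immediately with a refusal. A minimal Python probe that tells those cases apart (the function name is hypothetical; other socket errors, such as "network unreachable", are left to propagate):

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt as 'open', 'refused' or 'timeout'.

    'timeout' usually means packets are being dropped in between (e.g. no
    firewall flow for this path); 'refused' means the host answered but
    nothing is listening on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
```

For example, probe("build.mozilla.org", 443) from a build-network host would be expected to return "timeout" before the flow exists and "open" afterwards.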

Summary: some things are fixed, the remainder will be when the flow is in.

I gave the wrong /etc/hosts for internal testing.  All of the *.pub.build.mozilla.org should still go through zeus (IPs starting with 63).  So that's:

10.22.74.128 build.mozilla.org runtime-binaries.pvt.build.mozilla.org
63.245.215.17 builddata.pub.build.mozilla.org trychooser.pub.build.mozilla.org mobile-dashboard.pub.build.mozilla.org
63.245.215.21 secure.pub.build.mozilla.org

With that in place, I tested the following URLs, with expected and actual results:
> http://build.mozilla.org/clobberer/ - UI with no auth - OK
> https://build.mozilla.org/clobberer/ - UI with LDAP auth - FAIL (bug 743377)
> http://build.mozilla.org/talos/zips/tp5.zip - download - OK (from talos boxes anyway)
> https://build.mozilla.org/talos/zips/tp5.zip - download - FAIL (bug 743377)
> http://build.mozilla.org/builds/ - dir listing - OK
> https://build.mozilla.org/builds/ - dir listing - FAIL (bug 743377)
> http://builddata.pub.build.mozilla.org/buildjson - dir listing - OK
> http://build.mozilla.org/buildapi/ - redir to https - OK
> https://build.mozilla.org/buildapi/ - buildapi home - FAIL (bug 743377)
> https://secure.pub.build.mozilla.org/buildapi/ - new buildapi - OK
> http://build.mozilla.org/update-bump-unit-tests - dir listing - OK
> http://trychooser.pub.build.mozilla.org - trychooser - OK
> http://mobile-dashboard.pub.build.mozilla.org/ - dashboard - OK
> http://runtime-binaries.pvt.build.mozilla.org/README.txt - readme - OK
> http://build.mozilla.org/tryserver-symbols/firefox-12.0-WINNT-20120402141734-symbols.txt - symbols - OK
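The corrected internal /etc/hosts entries encode one rule: names mapped to a 63.x address go through zeus, while build.mozilla.org and the pvt names hit relengweb1 (10.22.74.128) directly. As a sketch, the Python below parses hosts-style lines (address followed by names, first address seen for a name wins) and classifies each name; the helper names are hypothetical, and "63.0.0.0/8" is simply a literal encoding of "IPs starting with 63".

```python
import ipaddress
from typing import Dict

HOSTS_TEXT = """\
10.22.74.128 build.mozilla.org runtime-binaries.pvt.build.mozilla.org
63.245.215.17 builddata.pub.build.mozilla.org trychooser.pub.build.mozilla.org mobile-dashboard.pub.build.mozilla.org
63.245.215.21 secure.pub.build.mozilla.org
"""


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each hostname to its address, keeping the first address seen."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            mapping.setdefault(name, addr)
    return mapping


def path_for(addr: str) -> str:
    """'zeus' for addresses starting with 63, otherwise 'direct'."""
    ip = ipaddress.ip_address(addr)
    return "zeus" if ip in ipaddress.ip_network("63.0.0.0/8") else "direct"


if __name__ == "__main__":
    for name, addr in parse_hosts(HOSTS_TEXT).items():
        print(f"{name:45} {addr:15} {path_for(addr)}")
```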

The external /etc/hosts was correct:

63.245.215.17 build.mozilla.org builddata.pub.build.mozilla.org trychooser.pub.build.mozilla.org mobile-dashboard.pub.build.mozilla.org
63.245.215.21 secure.pub.build.mozilla.org

With that in place, I tested the following URLs, with expected and actual results (all OK!):
> http://build.mozilla.org/clobberer/ - 403 - OK
> https://build.mozilla.org/clobberer/ - UI with LDAP auth - OK
> http://build.mozilla.org/talos/zips/tp5.zip - 403 - OK
> https://build.mozilla.org/talos/zips/tp5.zip - LDAP, then download - OK
> http://build.mozilla.org/builds/ - dir listing - OK
> https://build.mozilla.org/builds/ - forbidden - OK
> http://builddata.pub.build.mozilla.org/buildjson - OK
> http://build.mozilla.org/buildapi/ - redir to https - OK
> https://build.mozilla.org/buildapi/ - buildapi home - OK
> https://secure.pub.build.mozilla.org/buildapi/ - new buildapi - OK
> http://build.mozilla.org/update-bump-unit-tests - dir listing - OK
> http://trychooser.pub.build.mozilla.org - trychooser - OK
> http://mobile-dashboard.pub.build.mozilla.org/ - dashboard - OK
> http://runtime-binaries.pvt.build.mozilla.org/README.txt - 403 - OK
> http://build.mozilla.org/tryserver-symbols/firefox-12.0-WINNT-20120402141734-symbols.txt - symbols - OK

(note that "expected" means "what dm-wwwbuild01 does" - we can tinker with that after the move)
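The URL lists above are effectively a smoke test, so they could be rerun after each change. A minimal Python sketch follows, with a few (URL, expected status) pairs taken from the external list; the status codes for "dir listing" entries are assumed to be 200, and the helper names are hypothetical. Note that urlopen follows redirects (so "redir to https" checks see the final status) and that this sketch sends no LDAP credentials, so auth-protected URLs would report 401.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple

# (url, expected status) pairs from the external test list above.
CHECKS = [
    ("http://build.mozilla.org/clobberer/", 403),
    ("http://build.mozilla.org/builds/", 200),
    ("http://runtime-binaries.pvt.build.mozilla.org/README.txt", 403),
]


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as HTTPError


def run_checks(checks: Iterable[Tuple[str, int]],
               fetch_status: Callable[[str], int] = http_status) -> List[str]:
    """Return failure messages; an empty list means every URL matched."""
    failures = []
    for url, expected in checks:
        actual = fetch_status(url)
        if actual != expected:
            failures.append(f"{url}: expected {expected}, got {actual}")
    return failures
```

Passing fetch_status as a parameter keeps the comparison logic testable without network access; connection failures (URLError) are not caught here and will abort the run.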
Depends on: 743377
re-testing internal access now that the flow is in place:

> https://build.mozilla.org/clobberer/ - UI with LDAP auth - FAIL (bug 743377)
  OK
> https://build.mozilla.org/talos/zips/tp5.zip - download - FAIL (bug 743377)
  OK (actually download after LDAP auth)
> https://build.mozilla.org/builds/ - dir listing - FAIL (bug 743377)
  OK (actually 403 forbidden after LDAP auth)
> https://build.mozilla.org/buildapi/ - buildapi home - FAIL (bug 743377)
  OK (actually buildapi home after LDAP auth)

the "actually" are all confirmed against dm-wwwbuild01.

So, as far as my list of URLs above is concerned, this is ready to go.

Remaining:
 * further testing by releng
 * begin uploading symbols to relengweb1.dmz.scl3.mozilla.com
   * make sure --delete isn't in the temporary symbol-sync'ing rsync
   * may require host-key additions? bug 742045?
   * we can add a CNAME if that's helpful, but usually SSH keys and hosts go together
 * change DNS to resemble the /etc/hosts in the previous comment
   * no downtime required

The first two items are for releng, so this is a releng bug for the moment.  I or Amy can change DNS given a bit of warning to make sure it's right.
Depends on: 741774
(In reply to Dustin J. Mitchell [:dustin] from comment #29) 
> Remaining:
>  * further testing by releng

I retested the outstanding internal items that dustin indicated above with corrected /etc/hosts entry, and they all work as expected now. Note: I had to use a new host for this (talos-r3-fed64-002) since moz2-darwin10-slave01 is now being decommissioned (bug 744174).

>  * begin uploading symbols to relengweb1.dmz.scl3.mozilla.com
>    * make sure --delete isn't in the temporary symbol-sync'ing rsync
>    * may require host-key additions? bug 742045?
>    * we can add a CNAME if that's helpful, but usually SSH keys and hosts go
> together

Who has been handling symbols-related work on the releng side so far? 

Nick: do you have time to verify this? I see you're already handling some of it in bug 742045.
Component: Release Engineering → Release Engineering: Developer Tools
Priority: -- → P2
QA Contact: arich → lsblakk
Whiteboard: [scl3]
All set to send symbols to relengweb1.dmz.scl3.mozilla.com. ssh config is deployed and verified access, permissions, etc work:
 https://bugzilla.mozilla.org/show_bug.cgi?id=742131#c2
I apologize if this is obvious somewhere and I'm missing it, but are we on track to shut down the old dm-wwwbuild01 VM by the end of this week?

Thanks in advance!
Corey, it's been migrated to a bm-vmware* host in bug 739787, so there's no rush.  That said, we may still make that timeline :)
Based on comment 29, we're at
 * testing - good per comment 30
 * symbols are good per comment 32
 * --delete removed from sync in puppet (just now)

So I think we're ready for
 * switch to upload symbols to relengweb1 (releng)
 * DNS change (me, with releng signoff)
Blocks: 748814
>  * switch to upload symbols to relengweb1 (releng)
Assignee: nobody → dustin
Attachment #618317 - Flags: review?(catlee)
Attachment #618317 - Flags: review?(catlee) → review+
DNS was changed.

We discovered that build.m.o:{80,443} didn't work from either releng VPN -- they didn't have a route to 10.22/16. ravi added this on the *new* VPN only (OK per coop).

The old VPN works with a manual 'ip route add 10.22.74.128 dev tun0'
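The difference between the two VPN fixes is route specificity: the new VPN carries the whole 10.22.0.0/16 block, while the manual `ip route add 10.22.74.128 dev tun0` on the old VPN installs a single-host (/32) route. The Python sketch below models this with a longest-prefix-match lookup; the default route via "eth0" is a hypothetical stand-in for whatever the client's normal route is.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[ipaddress.IPv4Network, str]

# Old VPN client: hypothetical default route plus the manual host route.
OLD_VPN: List[Route] = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),
    (ipaddress.ip_network("10.22.74.128/32"), "tun0"),
]

# New VPN client: the 10.22/16 route added by ravi.
NEW_VPN: List[Route] = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),
    (ipaddress.ip_network("10.22.0.0/16"), "tun0"),
]


def lookup(table: List[Route], dest: str) -> Optional[str]:
    """Longest-prefix match: return the device of the most specific route."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, dev) for net, dev in table if addr in net]
    if not matches:
        return None
    _, dev = max(matches, key=lambda m: m[0].prefixlen)
    return dev
```

With the /32 workaround only relengweb1 itself (10.22.74.128) goes over tun0; any other 10.22 host would still fall through to the default route.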

So, we're done here.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment on attachment 618317 [details] [diff] [review]
buildbot-configs.patch

In production since noon PT.
Blocks: 748387
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Incidentally, the old host is still up, but I've shut down crond and httpd.  We'll leave it that way until the ESX host gets powered down, in case anyone needs to rsync anything off of it.
Depends on: 750184
Product: mozilla.org → Release Engineering
Component: Tools → General