Closed Bug 628405 Opened 13 years ago Closed 12 years ago

move tiger hosts to mtv1 as contingent support for firefox 3.6 in mtv1

Categories

(Infrastructure & Operations :: RelOps: General, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: arich)

References

Details

(Whiteboard: [buildmasters][talos] [sjc1 evac])

On 22/01/11 01:31 AM, Dustin J. Mitchell wrote:
> I think this system is unused.  Can someone confirm this?  I'll remove
> it from the 'Masters' wiki page.

Catlee's reply:

I believe it's still running OSX 10.4 talos tests for older branches, although I can't connect to it via build-vpn, only mpt-vpn.

----

If this machine is still necessary and is doing useful work, let's resuscitate it.  Otherwise, let's turn it off (assuming it's a VM) and reallocate any slaves it still has to somewhere useful.
It's running OSX Tiger slaves (Darwin 8.8.1) on the branches for Firefox 3.5 and 3.6. While 3.5 may not be long for this world, we'll have 3.6 around for some time yet. That said, we've turned talos off on other branches as the release drivers weren't looking at them (somewhat inexplicably IMO).

We could move the buildbot master elsewhere if you'd like to consolidate off an ancient machine.
Yes, it's very much still active.

http://talos-master.mozilla.org:8010/one_line_per_build

If we keep this VM, it should be made accessible from build vpn.
OK, assuming that this is not only active but producing results that someone is paying attention to, I'll morph this into a bug to bring it into the flock.

I can't load that URL from the build VPN, the MV VPN, or the MPT VPN.
Summary: put talos-master.mozilla.org out of its misery? → bring talos-master.mozilla.org into the build VPN
And the slaves too, I suspect; they'll be in .mozilla.org as well.
Can we punt this over to IT once we have the list of machines/VMs that need to change networks?
Priority: -- → P3
Whiteboard: [buildmasters][talos]
3.6 EOL tracking bug?
Summary: bring talos-master.mozilla.org into the build VPN → kill talos-master.mozilla.org when we EOL 3.6
I just bothered to gather the information about the reporting state of the tiger slaves.

There are only two ways that the output of these slaves can even be noticed:
1) load up http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox3.6 and click back several times until you reach a changeset landing [1]
2) load up graphs or graphs-new, specifically decide to look something up about 3.6, and care about loading up 8.8.1 [2]

In other words:
1) it doesn't show up on tbpl and therefore does not exist
2) there are no regression emails and therefore it does not exist

(In reply to Chris Cooper [:coop] from comment #5)
> Can we punt this over to IT once we have the list of machines/VMs that need
> to change networks?
>
talos-rev2-tiger[01-14].mozilla.org
talos-master.mozilla.org

[1] http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox3.6&maxdate=1320984374&hours=24&legend=0&norules=1
[2] http://graphs-new.mozilla.org/graph.html#tests=[[72,10,3]]&sel=none&displayrange=30&datatype=running
From philor, the only visibility of these machines is graphs and possibly the regression finder (although comment 7 suggests not), but these are the only test systems running OS X 10.4 (tiger), which is still supported for 3.6.x.

Given that, and assuming the mighty 3.6.x will live on forever, we will need to move these machines to scl1.  That will require:

* building a new talos-master-style buildmaster (0.7.10?) in scl1
* finding space for the (remaining, functional) minis in scl1
* downtime to move the minis
* reconfiguration of the minis by releng
* debugging (e.g., for any unknown network flows)
Assignee: nobody → dustin
Summary: kill talos-master.mozilla.org when we EOL 3.6 → move talos-master.mozilla.org and talos-rev2-* to scl1, and put them in the build network.
We'll wait until the end of the month to see if we can kill this support.  That means lumping this in with the rev5 builders as a decision of product support vs. significant cost to move the hardware.

For reference, this will require *rebuilding* talos-master, which may be a lot of work since it's an ancient system.  So this will require some time-critical elbow-grease from releng, too -- not just IT hours and dollars.
For reference, these talos boxes are running with 'mozqa', not 'cltbld', and a suitably ancient password.  They're x86.
Blocks: 715337
Morphing the title of this bug to reflect what we need to do to support 3.6 in MTV1 and moving it into the relops queue for action. 

The 3.6 EOL is right down to the wire for the sjc1 evac, so it is quite likely that these machines will never be used after the end of April, but we plan to move them before then because scrambling to do so in the last week of the sjc1 move-out would be unwise. Since these machines will likely never be used after April, build/test speed is not a primary concern, but ease of movement/management is.

The current plan of record is to move the following to MTV1:

talos-rev2-tiger[01-14].mozilla.org
moz2-darwin9-slave[45-54].build.sjc1.mozilla.com
rebuild or clone talos-master.mozilla.org (it's a vmware vm)

I chose the highest numbered darwin9 slaves in hopes that they are in the best shape, but if there are 10 other slaves out of the pool, we can take those, instead.
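
For anyone who needs the full host list rather than the bracket shorthand, here is a minimal sketch that expands the two ranges above into individual hostnames. The helper name expand_hosts is hypothetical (it is not part of any existing releng tooling), and it assumes each pattern contains at most one zero-padded [start-end] range.

```python
import re


def expand_hosts(pattern):
    """Expand 'name[01-14].domain' into individual hostnames.

    Hypothetical helper: assumes at most one [start-end] range per
    pattern and preserves the zero padding of the start value.
    """
    match = re.search(r"\[(\d+)-(\d+)\]", pattern)
    if match is None:
        # No range present, so the pattern is already a single hostname.
        return [pattern]
    start, end = match.group(1), match.group(2)
    width = len(start)
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


# The two ranges named in the plan of record.
hosts = (
    expand_hosts("talos-rev2-tiger[01-14].mozilla.org")
    + expand_hosts("moz2-darwin9-slave[45-54].build.sjc1.mozilla.com")
)

for host in hosts:
    print(host)
```

That comes to 14 tiger minis plus 10 darwin9 slaves, 24 machines in all, not counting the talos-master VM.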

Meanwhile, we can keep the rest of the 3.6 support infrastructure in sjc1 running up until the last week, still connected to talos-master.  In the last week, the remaining r2 minis and all of the xserves will be decommissioned.

Does this plan meet the requirements stated in previous comments?
Component: Release Engineering → Server Operations: RelEng
Priority: P3 → --
QA Contact: release → zandr
Summary: move talos-master.mozilla.org and talos-rev2-* to scl1, and put them in the build network. → contingent support for firefox 3.6 in mtv1
Depends on: 715411
We should also bring the DeployStudio server, deploystudio.build.sjc1.mozilla.com, in the event that we need to recover images off of it and/or reinstall any of these machines.
Note that the DS server is a parallels VM on bm-parallels01.  That xserve is moving to scl3 to support a VM for kev.

Options are:
 - try to deploy these DS images using the DS server in mtv1
 - bring some extra machines and hope not to do many reimages over a year (some historical analysis could help), doing any absolutely necessary reimages by hand with Disk Utility
 - try to build a mini with the same DS image as is on the parallels VM
 - re-purpose cb-parallels01 once seamonkey is done with it (a long pole)
I'd like to build the new buildbot server in mtv.  Who can work with me to set it up so we can bring over one of the tiger machines and one of the moz2-darwin9 machines to test?
Assignee: dustin → arich
Priority: -- → P2
Whiteboard: [buildmasters][talos] → [buildmasters][talos] [sjc1 evac]
Amy, this is one of the few move tasks that can happen now, before the rush in scl3 - maybe a new bug for the new master VM/config is warranted?
I have created buildbot-master28 in mtv so that releng can set up an older buildbot config that mirrors that of talos-master.  Once that's set up we can move one of the tiger hosts to see if we can replicate the functionality in sjc1 (and eventually move all of the tiger hosts).

I've created a new bug in releng to configure this buildbot master.
Assignee: arich → catlee
Depends on: 727912
Summary: contingent support for firefox 3.6 in mtv1 → move tiger hosts to mtv1 as contingent support for firefox 3.6 in mtv1
Assignee: catlee → arich
Blocks: releng-scl3
Per coop's email, we will not be migrating any of these hosts from sjc1.  They will be decommissioned when the datacenter is, if not before.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations