In order to move hosts off of the corp vmware cluster in mtv1, we need to increase the capacity of the mtv1 kvm cluster. Right now we have two nodes with 4 cores and 12G of RAM each (for a total capacity of 4 cores and 12G).

The following nodes already exist on this cluster:

ganglia3.build.mtv1.mozilla.com                  image+rhel-60    kvm2  512M
mv-production-puppet-new.build.mtv1.mozilla.com  image+centos-55  kvm1  2.0G
ns1.build.mtv1.mozilla.com                       image+centos-55  kvm2  768M
ns2.build.mtv1.mozilla.com                       image+centos-55  kvm2  768M
tools-staging-master02.mv.mozilla.com            image+centos-55  kvm2  4.0G

ns1 and ns2 are designed to replace mv-buildproxy01, and mv-production-puppet-new is designed to replace mv-production-puppet.

Assuming that the geriatric master is going to stay on the vmware cluster, we definitely need to move the following host (which is not already accounted for above):

* test-master01 (which will become buildbot-master01): 2 cores/6G

We may also need to move the following hosts that support the n900s:

* staging-mobile-master: 2 cores/6G
* production-mobile-master: 2 cores/6G
* mobile-image03

And there is one vmware vm that's likely mislabeled and might also need to move:

* test-master02 (is really a production buildbot master?)

At minimum (to move test-master01 and have any CPU allocations left), we will require another node with 4 cores and another 12G of RAM (to match the existing servers). If we move any other hosts, we will require a RAM upgrade as well as an additional node, probably to at least 24G per node.
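The RAM arithmetic behind "we will require another node" can be sketched as follows (a back-of-the-envelope check using only the figures quoted above; hostnames abbreviated):

```python
# RAM already allocated to guests on the cluster, in GiB (from the table above).
existing = {
    "ganglia3": 0.5,
    "mv-production-puppet-new": 2.0,
    "ns1": 0.75,
    "ns2": 0.75,
    "tools-staging-master02": 4.0,
}

capacity_gib = 12                  # usable capacity quoted above
used_gib = sum(existing.values())  # 8.0 GiB already committed
after_move = used_gib + 6          # test-master01 wants 2 cores / 6 GiB

# 14 GiB > 12 GiB, hence the extra 4-core/12G node at minimum.
needs_new_node = after_move > capacity_gib
print(used_gib, after_move, needs_new_node)
```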
(In reply to comment #0)
>
> And there is one vmware vm that's likely mislabeled and might also need to
> move:
>
> test-master02 (is really a production buildbot master?)
>
test-master02 is not an active buildmaster
(In reply to comment #1)
> test-master02 is not an active buildmaster

For the record, it is - see bug 675793.
(In reply to comment #1)
> (In reply to comment #0)
> >
> > And there is one vmware vm that's likely mislabeled and might also need to
> > move:
> >
> > test-master02 (is really a production buildbot master?)
> >
> test-master02 is not an active buildmaster

ok, so i'm being told that, for some silly reason, the name of the vm does not map to the name of the build master. So vm test-master02 == buildbot-master3. OMGWTF
We should purchase:

* 1 IX Systems IX1204R with 24G of RAM (4G DIMMs)
* 6 4G DIMMs to upgrade the other two KVM servers to 24G

Optionally, if we decide that RAM is cheap and we'll want to reuse these machines later, purchase the new machine with 48G and purchase 18 4G DIMMs to upgrade the other two servers to 48G and fully populate each DIMM slot.
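As a sanity check on the DIMM counts (a sketch; assumes the two existing servers sit at 12G today, per comment 0, and that all upgrades use 4G DIMMs):

```python
dimm_gib = 4
nodes = 2  # the two existing KVM servers

# Baseline plan: 12G -> 24G on each existing node.
dimms_24 = ((24 - 12) // dimm_gib) * nodes  # 3 per node * 2 nodes = 6

# Optional plan: 12G -> 48G on each existing node.
dimms_48 = ((48 - 12) // dimm_gib) * nodes  # 9 per node * 2 nodes = 18

print(dimms_24, dimms_48)
```

Both totals match the quantities in the order above.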
This cluster will be one of the stragglers in mtv1, and may even be stuck there forever with the mobile devices. However, if the nodes have more RAM, that means we can probably pull all but 2 nodes out and move them to scl3 when the time comes, rather than being stuck with more, lower-RAM systems propping up mobile. IMHO, we should make the nodes in this cluster interchangeable with the nodes in the scl1 cluster, for optimal fungibility when moving to scl3.
Should we upgrade disk while we're at it?
We need the CPU power of the nodes as well as the memory capacity. If we want to make these interchangeable with the scl1 nodes, though, we should upgrade the RAM to 48G and also add disks, yes.
per meeting with IT yesterday:

* hardware not ordered yet. The mtv cluster needs to be increased so IT can migrate from ESX to KVM in 650castro.
To be clear, this is so releng VMs can be migrated off of the (non-releng) corp vmware servers onto the kvm servers in mtv1. IT needs to reclaim space on the corp vmware servers.
We should be able to retire production-mobile-master and staging-mobile-master (and all n900s) after 7.0 ships in 6 weeks.
By my count, we need to migrate test-master01 and test-master02 (which is actually buildbot-master3). The rest of the vms on the IT vmware cluster can be retired this year and are not worth moving (this includes the completion of ns1 and ns2 to replace mv-buildproxy01).
The replacement hardware hasn't been ordered yet. We're also waiting on 7.0 to ship as per comment 10.
Hardware on order and tracked in the ordering spreadsheet.
Hardware has arrived and is sitting next to LOL. Memory upgrades done to the two existing machines. Need to rack, cable, and install the new machine.
New machine is installed and in the cluster, and cluster verify passes.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations