Bug 676560 (Closed) · Opened 13 years ago · Closed 13 years ago

increase capacity of mtv1 kvm cluster

Categories

(Infrastructure & Operations :: RelOps: General, task)

Hardware: x86
OS: macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: RESOLVED FIXED

People

(Reporter: arich, Assigned: arich)

Details

In order to move hosts off of the corp vmware cluster in mtv1, we need to increase the capacity of the mtv1 kvm cluster. Right now we have two nodes with 4 cores and 12G of RAM each; since one node's worth of resources has to stay free to absorb a failover, the total usable capacity is 4 cores and 12G. The following VMs already run on this cluster:

host                                             OS image         node  RAM
ganglia3.build.mtv1.mozilla.com                  image+rhel-60    kvm2  512M
mv-production-puppet-new.build.mtv1.mozilla.com  image+centos-55  kvm1  2.0G
ns1.build.mtv1.mozilla.com                       image+centos-55  kvm2  768M
ns2.build.mtv1.mozilla.com                       image+centos-55  kvm2  768M
tools-staging-master02.mv.mozilla.com            image+centos-55  kvm2  4.0G

ns1 and ns2 are designed to replace mv-buildproxy01, and mv-production-puppet-new is designed to replace mv-production-puppet.
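
For reference, a quick Python sketch of the arithmetic behind that load (the instance sizes come from the list above; "usable" follows the one-node failover reading):

    # Back-of-the-envelope check of the current load, using the instance
    # sizes listed above (MiB units so the 512M/768M entries stay exact).
    GIB = 1024

    instances = {
        "ganglia3.build.mtv1.mozilla.com": 512,
        "mv-production-puppet-new.build.mtv1.mozilla.com": 2 * GIB,
        "ns1.build.mtv1.mozilla.com": 768,
        "ns2.build.mtv1.mozilla.com": 768,
        "tools-staging-master02.mv.mozilla.com": 4 * GIB,
    }

    allocated = sum(instances.values())  # 8192 MiB = 8.0G
    usable = 12 * GIB                    # one node's worth
    print(f"{allocated / GIB:.1f}G of {usable // GIB}G usable, "
          f"{(usable - allocated) // GIB}G free")
    # -> 8.0G of 12G usable, 4G free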


Assuming that the geriatric master is going to stay on the vmware cluster, we definitely need to move the following host, which is not already accounted for above:

test-master01 (which will become buildbot-master01): 2 cores/6G

We may also need to move the following hosts that support the n900s:

staging-mobile-master: 2 cores/6G
production-mobile-master: 2 cores/6G
mobile-image03

And there is one vmware vm that's likely mislabeled and might also need to move:

test-master02 (is really a production buildbot master?)

At minimum (to move test-master01 and still have any CPU allocation left), we will need another node with 4 cores and 12G of RAM, matching the existing servers.

If we move any of the other hosts, we will need the additional node plus a RAM upgrade, probably to at least 24G per node.
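
A quick sanity check of those sizing numbers, summing the candidate moves listed above (test-master02 and mobile-image03 have no specs in this bug, so they are omitted):

    # Candidate moves and their specs, as listed in this bug.
    candidates = {
        "test-master01": (2, 6),             # (cores, GiB RAM) -- definite
        "staging-mobile-master": (2, 6),     # possible, n900 support
        "production-mobile-master": (2, 6),  # possible, n900 support
    }

    cores = sum(c for c, _ in candidates.values())
    ram = sum(r for _, r in candidates.values())
    print(f"worst case: +{cores} cores / +{ram}G")
    # -> worst case: +6 cores / +18G

    # One extra 4-core/12G node covers test-master01 alone; adding the two
    # mobile masters (+18G on top of the ~8G already allocated) overruns
    # even three 12G nodes' usable capacity, hence the >=24G/node upgrade.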
(In reply to comment #0)

> 
> And there is one vmware vm that's likely mislabeled and might also need to
> move:
> 
> test-master02 (is really a production buildbot master?)
> 

test-master02 is not an active buildmaster
(In reply to comment #1)
> test-master02 is not an active buildmaster

For the record, it is - see bug 675793.
(In reply to comment #1)
> (In reply to comment #0)
> 
> > 
> > And there is one vmware vm that's likely mislabeled and might also need to
> > move:
> > 
> > test-master02 (is really a production buildbot master?)
> > 
> 
> test-master02 is not an active buildmaster

ok, so i'm being told that, for some silly reason, the name of the vm does not map to the name of the build master

so vm test-master02 == buildbot-master3

OMGWTF
We should purchase:

1x IX Systems IX1204R with 24G of RAM (4G DIMMs)
6x 4G DIMMs to upgrade the other two KVM servers to 24G

Optionally, if we decide that RAM is cheap and we'll want to reuse these machines later, purchase the new machine with 48G and 18 more 4G DIMMs to bring the other two servers to 48G as well, fully populating every DIMM slot.
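
The DIMM counts check out; a minimal sketch of the arithmetic, assuming the existing DIMMs stay in place:

    # DIMM math for the two options above: two existing servers at 12G each,
    # upgraded with 4G DIMMs (assumes the existing DIMMs stay in place).
    def dimms_needed(current_g, target_g, servers=2, dimm_g=4):
        return (target_g - current_g) // dimm_g * servers

    print(dimms_needed(12, 24))  # -> 6  (the 24G option)
    print(dimms_needed(12, 48))  # -> 18 (the 48G option)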
This cluster will be one of the stragglers in mtv1, and may even be stuck there forever with the mobile devices.  However, if the nodes have more RAM, that means we can probably pull all but 2 nodes out and move them to scl3 when the time comes, rather than being stuck with more, lower-RAM systems propping up mobile.

IMHO, we should make the nodes in this cluster interchangeable with the nodes in the scl1 cluster, for optimal fungibility when moving to scl3.
Should we upgrade disk while we're at it?
We need the CPU power of the nodes as well as the memory capacity.  If we want to make these interchangeable with the scl1 nodes, though, we should upgrade the RAM to 48G and also add disks, yes.
per meeting with IT yesterday:

* hardware not ordered yet. The mtv cluster needs to be increased so IT can migrate from ESX to KVM in 650castro.
To be clear, this is so releng VMs can be migrated off of the (non-releng) corp vmware servers onto the kvm servers in mtv1.  IT needs to reclaim space on the corp vmware servers.
We should be able to retire production-mobile-master and staging-mobile-master (and all n900s) after 7.0 ships in 6 weeks.
By my count, we need to migrate test-master01 and test-master02 (which is actually buildbot-master3).  The rest of the vms on the IT vmware cluster can be retired this year and are not worth moving (this includes finishing ns1 and ns2 so they can replace mv-buildproxy01).
The replacement hardware hasn't been ordered yet.  We're also waiting on 7.0 to ship as per comment 10.
Assignee: zandr → arich
Hardware on order and tracked in the ordering spreadsheet.
Hardware has arrived and is sitting next to LOL.  Memory upgrades done to the two existing machines.  Need to rack, cable, and install the new machine.
New machine is installed and in the cluster, and cluster verify passes.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations