Bug 739227 (Closed) — Opened 12 years ago, Closed 12 years ago

please create ganglia servers for two vlans in releng.scl3

Categories

(Infrastructure & Operations :: Infrastructure: Other, task)

Platform: x86 macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arich, Unassigned)

Details

After some architectural discussion with jabba, it makes sense to have a ganglia server on each vlan where we're collecting ganglia metrics so that we can use the check_ganglia nagios check to verify that data is correctly getting to the servers.
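For context, a check like check_ganglia typically works by querying gmond's XML dump (served on TCP port 8649) and verifying that a host's metrics are fresh. Below is a minimal, illustrative sketch of that freshness logic; the sample XML, the host name, and the 300-second threshold are assumptions for demonstration, not values taken from this bug.

```python
# Sketch of a check_ganglia-style freshness check: parse a gmond XML
# dump and report how stale a host's metric data is. The sample XML,
# host name, and threshold below are illustrative assumptions.
import time
import xml.etree.ElementTree as ET

def metric_age(xml_text, host, metric, now=None):
    """Return seconds since `metric` was last reported for `host`,
    or None if the host or metric is absent from the XML."""
    now = now if now is not None else time.time()
    root = ET.fromstring(xml_text)
    for h in root.iter("HOST"):
        if h.get("NAME") == host:
            for m in h.iter("METRIC"):
                if m.get("NAME") == metric:
                    # REPORTED is a unix timestamp of the last update
                    return now - int(h.get("REPORTED", 0))
    return None

# Hypothetical gmond output for one host in the releng cluster.
SAMPLE = """
<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
 <CLUSTER NAME="releng" LOCALTIME="1333000000">
  <HOST NAME="puppet1.srv.releng.scl3.mozilla.com" REPORTED="1333000000">
   <METRIC NAME="load_one" VAL="0.15" TYPE="float"/>
  </HOST>
 </CLUSTER>
</GANGLIA_XML>
"""

age = metric_age(SAMPLE, "puppet1.srv.releng.scl3.mozilla.com",
                 "load_one", now=1333000060)
print("CRITICAL" if age is None or age > 300 else "OK")  # prints "OK"
```

Putting a ganglia server on each vlan means this kind of check can confirm end-to-end that metrics from hosts on that vlan are actually reaching their collector.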

Please therefore create a ganglia server for vlan 275 (and migrate any ganglia collection on that vlan to the new server) and another for vlan 248 (and migrate the puppet and buildbot clusters to it).

Please also let me know what the new server names are so that I can update the nagios checks.
They should be identical to ganglia1.dmz.releng.scl3.mozilla.com; name them:

ganglia1.private.releng.scl3.mozilla.com
ganglia1.srv.releng.scl3.mozilla.com
Component: Server Operations → Server Operations: Virtualization
QA Contact: phong → dparsons
Assignee: server-ops → mburns
ganglia1.dmz.releng.scl3: 2xCPU, 4GB RAM, 40GB HDD (via esx_sata_releng).

Should I use the releng CentOS5-x64-ref template for this, or the standard RHEL6 template?
Using the rhel6 template, like ganglia1.dmz.releng.scl3. Should be finished shortly.
ganglia1.private.releng.scl3:
2xCPU, 4GB RAM, 40GB HDD, vlan275 @ https://inventory.mozilla.org/en-US/systems/show/5758/

ganglia1.srv.releng.scl3:
2xCPU, 4GB RAM, 40GB HDD, vlan275 @ https://inventory.mozilla.org/en-US/systems/show/5761/
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Assignee: mburns → server-ops-infra
Status: RESOLVED → REOPENED
Component: Server Operations: Virtualization → Server Operations: Infrastructure
QA Contact: dparsons → jdow
Resolution: FIXED → ---
Reopening since the servers aren't actually set up yet.
I went to boot the VMs in vSphere and noticed that ganglia1.srv.releng.scl3 was using the wrong network adapter (vlan275 instead of vlan248).  I switched that, but it's still not sending a request to the DHCP server.  Since I don't know the root password for the VM template, I can't log in to verify whether it's strictly a network problem.  This will require more investigation on the vmware end.

ganglia1.private.releng.scl3.mozilla.com is up (though it doesn't appear to have gotten its hostname from DHCP for some reason).
We don't set hostnames via DHCP; Puppet sets them. I'm puppetizing ganglia1.private.releng.scl3 right now.
ganglia1.srv.releng.scl3.mozilla.com is up as well.
Are we able to shut down the old vm in sjc1 now that this is complete?
No, the old vm will stay up until we move the vmware servers in sjc1.  It's still serving the hosts in sjc1.
(In reply to Amy Rich [:arich] [:arr] from comment #10)
> No, the old vm will stay up until we move the vmware servers in sjc1.  It's
> still serving the hosts in sjc1.

The vmware server that this is running on is going down in Monday's move - so we need to do something about this, migrate it to scl3 and make flows?
or move it onto bm-vmware* a la bug 739787
(In reply to Dustin J. Mitchell [:dustin] from comment #12)
> or move it onto bm-vmware* a la bug 739787

That defeats the purpose, really...

Best I can tell, the bm-vmware nodes span 6 chassis. Moving those chassis will be blocked until they are vacated, meaning the bm-vmware nodes have to be vacated, too.

That said, it looks like the highest concentration is nodes 1, 2, 4, and 5 in the same chassis, so if you consolidated onto that one we could pull the rest.

But until then, a lot of other moves are going to block on getting those chassis relocated to create chassis space in SCL3.
The one in sjc1 has already been migrated to the releng vmware cluster, and that ganglia server is unrelated to these two.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: Server Operations: Infrastructure → Infrastructure: Other
Product: mozilla.org → Infrastructure & Operations