Closed Bug 729090 Opened 12 years ago Closed 12 years ago

please rename and reinstall talos-r4-lion-083 and talos-r4-snow-081 as mac-signing1 & 2

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: bhearsum, Assigned: dividehex)

I'll be puppetizing these in bug 729077, but let's start with a fresh OS.
This should be a fresh install of 10.6.8 from media, since I don't think we have a vanilla 10.6.8 install in DS (we should create one after doing this install).
Assignee: server-ops-releng → jwatkins
colo-trip: --- → scl1
bhearsum: are you tied to these particular hosts?  It would make life logistically easier if it was talos-r4-lion-083 and talos-r4-snow-081 since they're already sharing a chassis.
(In reply to Amy Rich [:arich] [:arr] from comment #2)
> bhearsum: are you tied to these particular hosts?  It would make life
> logistically easier if it was talos-r4-lion-083 and talos-r4-snow-081 since
> they're already sharing a chassis.

Per IRC, no strong objection. snow 081 is already loaned to dividehex, and lion 083 is now offline and ready for re-imaging.
Summary: please rename and reinstall talos-r4-{lion,snow}-085 as mac-signing1 & 2 → please rename and reinstall talos-r4-lion-083 and talos-r4-snow-081 as mac-signing1 & 2
Inventory(dhcp) has been updated along with dns.
I installed OS X Snow Leopard (10.6.8) from an install disc and took a clean image of it.  Both mac-signing1 and mac-signing2 now have this clean image.
talos-r4-lion-083 and talos-r4-snow-081 have been disabled in nagios.
:bhearsum: Can you tell us what type of nagios checks you would like to have on mac-signing1 and mac-signing2?
Can we have the same ones that signing1 & signing2 have? Is it possible to have ganglia on these too?
bhearsum: ganglia is something that releng adds/configures with puppet. NRPE as well (which we'll need to add any nagios checks beyond ping).  You might want to coordinate with jhford on this since he's done some of the nagios work for the r5 builders.

FYI: since signing1/2 are linux boxes, I don't think they'll wind up having exactly the same set of checks.

Are there any checks that we should add to both the linux and the mac signing hosts as far as processes, etc?  Right now the linux signers don't have any service-specific checks at all.
(In reply to Amy Rich [:arich] [:arr] from comment #9)
> bhearsum: ganglia is something that releng adds/configures with puppet. NRPE
> as well (which we'll need to add any nagios checks beyond ping).  You might
> want to coordinate with jhford on this since he's done some of the nagios
> work for the r5 builders.

Yeah, I'll have to figure out how to deploy Ganglia to a Mac. I wasn't sure if there was some stuff that IT needed to do server side to make that work though.

> FYI: since signing1/2 are linux boxes, I don't think they'll wind up having
> exactly the same set of checks.

No? I see Ganglia IO, NTP Time, PING, Swap, avg load, disk space, and "signing-server" on the Linux signers. I can't think of any from that list we don't want on the Mac machines too...

> Are there any checks that we should add to both the linux and the mac
> signing hosts as far as processes, etc?  Right now the linux signers don't
> have any service-specific checks at all.

I think they do, actually: https://nagios.mozilla.org/nagios/cgi-bin/extinfo.cgi?type=2&host=signing1.build.scl1&service=signing1.build.scl1+-+signing-server

We'll want the same check for the Mac machines - they'll be running similar instances.
I know there was a bug about ganglia on the mac for the foopies (or was it nrpe?), but I think that got killed as a wontfix.  For ganglia, since we're using multicast, there's nothing to be done on the server end.  gmond on the client just starts broadcasting to the correct IP and it magically shows up on the server.
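
For readers unfamiliar with why multicast needs no per-client server setup, here is a minimal Python sketch of the mechanism described above. It is an illustration only, not gmond's code. The group and port (239.2.11.71:8649) are the stock gmond.conf defaults and are an assumption here; the values used in this deployment are not stated in the bug. The helper names are hypothetical.

```python
import ipaddress
import socket

# Assumption: stock gmond.conf defaults, not necessarily this deployment's values.
MCAST_GROUP = "239.2.11.71"
MCAST_PORT = 8649


def is_multicast(addr: str) -> bool:
    """Return True if addr falls in the IP multicast range."""
    return ipaddress.ip_address(addr).is_multicast


def membership_request(group: str) -> bytes:
    """Build an 8-byte ip_mreq: group address, then local interface (any)."""
    return socket.inet_aton(group) + socket.inet_aton("0.0.0.0")


def make_sender(ttl: int = 1) -> socket.socket:
    """Client side: a plain UDP socket; the TTL limits how far packets travel."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    bytes([ttl]))
    return sock


def make_listener(group: str = MCAST_GROUP, port: int = MCAST_PORT) -> socket.socket:
    """Collector side: join the group once; any sender on the group is then heard."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

A new client would call something like `make_sender().sendto(payload, (MCAST_GROUP, MCAST_PORT))`. The listener never learns about the client in advance, which is why nothing has to change on the server end. The payload is left abstract because real gmond packets use their own encoding, which this sketch does not reproduce.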

Oh, I missed the service check!  Funny, since I set it up.  Har har.
As far as the generic checks, what I meant was that they may or may not be available as part of the default Mac NRPE plugins (what we do for mac builders doesn't include a few of the ones we do for linux).  We'll work together to see what's available.
(In reply to Amy Rich [:arich] [:arr] from comment #11)
> I know there was a bug about ganglia on the mac for the foopies (or was it
> nrpe?), but I think that got killed as a wontfix.  For ganglia, since we're
> using multicast, there's nothing to be done on the server end.  gmond on the
> client just starts broadcasting to the correct IP and it magically shows up
> on the server.

Alright... I'll see what I can figure out with Ganglia. I hope we can have some sort of disk/cpu/memory/etc. monitoring on them, but it's not the end of the world if we can't.

> Oh, I missed the service check!  Funny, since I set it up.  Har har.
> As far as the generic checks, what I meant was that they may or may not be
> available as part of the default Mac NRPE plugins (what we do for mac
> builders doesn't include a few of the ones we do for linux).  We'll work
> together to see what's available.

Aaaaaaah, OK. Makes sense.
mac-signing1 is not responding to pings and mac-signing2 is not accessible via ssh or vnc.  Both of these servers need to be restarted.

Once this is done, I will be able to install nrpe and finish adding the nagios checks to them.
I enabled both ssh and vnc since neither host had them turned on.  Also rebooted mac-signing1.
These hosts have been turned over.  Nagios checks have been added, minus Ganglia.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Thank you Jake!
Status: RESOLVED → VERIFIED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations