Closed Bug 765966 Opened 12 years ago Closed 12 years ago

relabs-puppet.build.mtv1.mozilla.com appears to be down.

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Callek, Unassigned)

Details

The puppetAgain puppetmaster, relabs-puppet.build.mtv1.mozilla.com, is unreachable for me.

I don't *think* anything production in mtv1 needs it, but I do need this host to bring up linux-foopy, so I would appreciate a quick grab.

CC'ed :arr and :dustin since I haven't seen (and can't find) any nagios alerts identifying it as down, and I certainly can't ping. We should surely get this monitored.

nthomas claims that "inventory has kvm and ganeti references in it"

For reference, I never double-checked that this host was up after the Sunday mtv1 colo work, and I don't think bear did either. It wasn't until just now that I wanted to log into it, to test a puppet change I was making in my environment.
Inventory : https://inventory.mozilla.org/systems/show/3822/

Marked as temp, so I'm not sure what the status of this machine is.
colo-trip: --- → mtv1
It's a dev/labs/sandbox machine with no production impact.  We can bring it back up next week if someone doesn't get to it before then.  puppetagain should fail over to the server in scl1 if the one in relabs is down.
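As an illustration of the failover behaviour described above, here is a minimal sketch of picking the first reachable master from a preference-ordered list. It is not how puppetagain actually implements failover; the function names are made up for this example, the scl1 hostname in the usage note is a hypothetical placeholder, and port 8140 is assumed because it is Puppet's default master port.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def pick_master(
    masters: Iterable[str],
    port: int = 8140,  # assumption: Puppet's default master port
    probe: Callable[[str, int], bool] = tcp_reachable,
) -> Optional[str]:
    """Return the first master, in preference order, that answers; else None."""
    for host in masters:
        if probe(host, port):
            return host
    return None
```

Usage would look something like `pick_master(["relabs-puppet.build.mtv1.mozilla.com", "puppet.example.scl1"])`, where the second name stands in for whatever the scl1 master is really called. A result of `None` means no master answered, which matches the situation reported later in this bug.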
It is temporary, it shouldn't be monitored, and it's on the relabs cluster which is still down due to the mtv1 maintenance.  Your hosts should be able to work against another master.  We should also get the relabsXX.build.mtv1.mozilla.com hosts back up, and given that all of the relops people are in Berlin, that remains with the SRE team.

We do need to set up a real master in mtv1, but it's not a high priority right now.

Leaving in Server Ops to bring back up relabs{01..08}.build.mtv1.mozilla.com.  Currently, the hosts are not pingable.  The iLOs are pingable, but none respond on tcp/443.  These are relops labs machines, so are not causing a production outage, but among other things they're blocking our ability to do demos at our offsite, so getting this fixed today would be great.
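For anyone re-checking these hosts, a rough sketch of the tcp/443 check follows. It expands the relabs{01..08} range into hostnames and probes a port on each. It only covers the TCP part (no ICMP ping), the helper names are invented for this example, and the iLO hostnames are not given in this bug, so the caller has to supply them.

```python
import socket
from typing import Callable, Dict, Iterable, List

DOMAIN = "build.mtv1.mozilla.com"


def expand_hosts(prefix: str = "relabs", first: int = 1, last: int = 8,
                 domain: str = DOMAIN) -> List[str]:
    """Expand a range like relabs{01..08} into fully qualified hostnames."""
    return [f"{prefix}{n:02d}.{domain}" for n in range(first, last + 1)]


def port_open(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if host accepts a TCP connection on port within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(hosts: Iterable[str], port: int = 443,
           probe: Callable[[str, int], bool] = port_open) -> Dict[str, bool]:
    """Map each hostname to whether the probe succeeded on the given port."""
    return {host: probe(host, port) for host in hosts}
```

Calling `report(expand_hosts())` checks the eight relabs hosts themselves; to check the iLOs, pass their names in place of `expand_hosts()`.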
(In reply to Dustin J. Mitchell [:dustin] from comment #3)
> It is temporary, it shouldn't be monitored, and it's on the relabs cluster
> which is still down due to the mtv1 maintenance. 

Figured it was down due to mtv1 maintenance, but the SREs couldn't find any doc on "where" this was located. Nor did I notice it was down while the SREs were in mtv1 during the extended Sunday downtime, *because* it wasn't monitored. IMO anything being used should be monitored, even if the response when something happens is just a "non crit, we'll get to it when we can" ack.

> Your hosts should be able to work against another master.

They are not; see the private e-mail I sent you for more details.

> We should also get the
> relabsXX.build.mtv1.mozilla.com hosts back up, and given that all of the
> relops people are in Berlin, that remains with the SRE team.

Sure, I didn't actually expect one of you to be around to do so, just that the SREs couldn't find info, and we don't have docs on it (findable by me or them). Thanks for giving a "where to look".

> We do need to set up a real master in mtv1, but it's not a high priority
> right now.

If this master is temporary and non-monitorable, then a real master is a priority [imo, not confirmed with rest of releng] (it can happen shortly after Q2), since we want to migrate to linux foopies soon, and mtv1 is the only place we can host tegras atm... which is the work that this being down is blocking on my end.
relabs{01..08}.build.mtv1.mozilla.com are back up.
However, I don't know where relabs-puppet.build.mtv1 is hosted, and that's still down.
OK, so after digging I think I discovered that relabs-puppet.build.mtv1 is hosted on kvm-relabs.build.mtv1 (which is nowhere to be found in the inventory!).
The kvm node appears to be up, but my key is not there and the root passwords from the gpg files don't work.
Yeah, kvm-relabs is a floating IP for the KVM cluster in relabs, so no inventory entry.  I'll get the VMs back up.
It's back!

Callek, your test box was pretty severely messed up -- I'm not sure what you did to /var/lib/puppet/ssl?  Anyway, re-running puppetize.sh fixed it.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard