Verify nagios monitoring on dm-ausstage01 & dp-ausstage01



8 years ago
6 years ago


(Reporter: nthomas, Assigned: jabba)





8 years ago
Between 3:30am and 5am PDT on the 17th I got cron mail about being out of space on, but can't see any nagios alerts for the same period. Please check what nagios monitoring is in place for that box and for

Comment 1

8 years ago
dp-nagios01 is a host in Phoenix. It is managed by ops and uses a completely different nagios server, one that releng cannot access and that, as far as I know, does not notify releng of anything. I'm CCing Rob and Justin since they've both been involved with that nagios server.
Assignee: server-ops-releng → arich

Comment 2

8 years ago
I had checked the IRC channel where IT nagios reports and didn't see anything there, but I don't know if it reports everything.

Comment 3

8 years ago
Yep, that box never got more than a ping check when it got set up:

22:36 < jabba> nagios-sjc1: status dm-ausstage01:*
22:36 < nagios-sjc1> jabba: dm-ausstage01:avg load is OK: OK - load average: 0.05, 0.04,
22:36 < nagios-sjc1> jabba: dm-ausstage01:disk - /opt is OK: DISK OK - free space: /opt 49809 MB (69% inode=20%):
22:36 < nagios-sjc1> jabba: dm-ausstage01:PING is OK: PING OK - Packet loss = 0%, RTA = 1.76 ms
22:36 < nagios-sjc1> jabba: dm-ausstage01:root partition is OK: DISK OK - free space: / 18911 MB (52% inode=59%):
22:37 < jabba> nagios-phx1: status dp-ausstage01.phx:*
22:37 < nagios-phx1> jabba: dp-ausstage01.phx:PING is OK: PING OK - Packet loss = 0%, RTA = 0.76 ms

I can add monitoring to it, but it will involve me puppetizing it.
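
For reference, the disk checks shown for dm-ausstage01 follow the usual Nagios plugin convention: one status line plus an exit code of 0 (OK), 1 (WARNING), or 2 (CRITICAL). Below is a minimal Python sketch of such a check, not the plugin actually deployed here. The function names `classify` and `check_disk` and the 20%/10% thresholds are assumptions made for illustration, and the status line only loosely imitates the format in the log above.

```python
import os

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(free_pct, inode_pct, warn_pct=20.0, crit_pct=10.0):
    """Map free-space and free-inode percentages to an exit code.

    The thresholds are illustrative defaults, not values from this bug.
    """
    worst = min(free_pct, inode_pct)
    if worst < crit_pct:
        return CRITICAL
    if worst < warn_pct:
        return WARNING
    return OK


def check_disk(path, warn_pct=20.0, crit_pct=10.0):
    """Return (exit_code, status_line) for the filesystem holding path."""
    st = os.statvfs(path)  # Unix only
    free_mb = st.f_bavail * st.f_frsize // (1024 * 1024)
    # Guard against pseudo-filesystems that report zero blocks or inodes.
    free_pct = 100.0 * st.f_bavail / st.f_blocks if st.f_blocks else 100.0
    inode_pct = 100.0 * st.f_favail / st.f_files if st.f_files else 100.0
    code = classify(free_pct, inode_pct, warn_pct, crit_pct)
    line = "DISK %s - free space: %s %d MB (%d%% inode=%d%%)" % (
        LABELS[code], path, free_mb, free_pct, inode_pct)
    return code, line
```

A wrapper script would call `check_disk("/opt")`, print the returned status line, and exit with the returned code so that nagios can pick up the state.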
Assignee: arich → jdow

Comment 4

8 years ago
I puppetized the host and added basic monitoring to it:

10:55 < jabba> nagios-phx1: status dp-ausstage01.phx:*
10:55 < nagios-phx1> jabba: dp-ausstage01.phx:avg load has not yet been checked.
10:55 < nagios-phx1> jabba: dp-ausstage01.phx:disk - /opt has not yet been checked.
10:55 < nagios-phx1> jabba: dp-ausstage01.phx:PING is OK: PING OK - Packet loss = 0%, RTA = 1.27 ms
10:55 < nagios-phx1> jabba: dp-ausstage01.phx:root partition has not yet been checked.

It's set up the same as dm-ausstage01 (reports to #sysadmins)
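
A host that only has a ping check, as this one did, can be spotted from the bot's replies. The following Python sketch parses status lines in the format pasted in this bug and lists hosts whose only service is PING. The regular expression is inferred from the lines quoted here rather than from any documented bot format, and the names `services_by_host` and `ping_only`, as well as the `PENDING` label for services that have not yet been checked, are assumptions.

```python
import re
from collections import defaultdict

# Matches bot replies such as:
#   22:37 < nagios-phx1> jabba: dp-ausstage01.phx:PING is OK: PING OK - ...
#   10:55 < nagios-phx1> jabba: dp-ausstage01.phx:avg load has not yet been checked.
# Pattern inferred from the pasted logs only (assumption).
STATUS_RE = re.compile(
    r"^\d{2}:\d{2} < (?P<bot>[^>]+)> \S+: "
    r"(?P<host>[^:\s]+):(?P<service>.+?) "
    r"(?:is (?P<state>[A-Z]+):|has not yet been checked)"
)


def services_by_host(lines):
    """Return {host: {service: state}} for every line that parses."""
    hosts = defaultdict(dict)
    for line in lines:
        m = STATUS_RE.match(line)
        if m:
            # Lines without an explicit state are the "not yet checked" form.
            hosts[m["host"]][m["service"]] = m["state"] or "PENDING"
    return dict(hosts)


def ping_only(hosts):
    """Return the sorted hosts whose only known service is PING."""
    return sorted(h for h, svcs in hosts.items() if set(svcs) == {"PING"})
```

Feeding it the replies from Comment 3 would flag dp-ausstage01.phx, while the replies above would not, since the load and disk services now exist even though they are still pending.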
Last Resolved: 8 years ago
Resolution: --- → FIXED

Comment 5

8 years ago
Component: Server Operations: RelEng → RelOps
Product: → Infrastructure & Operations