Verify nagios monitoring on dm-ausstage01 & dp-ausstage01

RESOLVED FIXED

Status

Infrastructure & Operations
RelOps
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: nthomas, Assigned: jabba)

(Reporter)

Description

6 years ago
Between 3:30am and 5am PDT on the 17th I got cron mail about being out of space on dp-ausstage01.phx.mozilla.com:/opt, but can't see any nagios alerts for the same period. Please check what nagios monitoring is in place for that box and for dm-ausstage01.mozilla.org.
dp-nagios01 is a host in Phoenix that's managed by ops. It has a completely different nagios server, which releng doesn't have access to and which, as far as I know, does not notify releng of anything. I'm CCing Rob and Justin since they've both been involved with that nagios server.
Assignee: server-ops-releng → arich
(Reporter)

Comment 2

6 years ago
I had checked the IRC channel where IT nagios reports and didn't see anything there, but I don't know if it reports everything.
(Assignee)

Comment 3

6 years ago
Yep, that box never got more than a ping check when it got set up:

22:36 < jabba> nagios-sjc1: status dm-ausstage01:*
22:36 < nagios-sjc1> jabba: dm-ausstage01:avg load is OK: OK - load average: 0.05, 0.04, 0.00
22:36 < nagios-sjc1> jabba: dm-ausstage01:disk - /opt is OK: DISK OK - free space: /opt 49809 MB (69% inode=20%):
22:36 < nagios-sjc1> jabba: dm-ausstage01:PING is OK: PING OK - Packet loss = 0%, RTA = 1.76 ms
22:36 < nagios-sjc1> jabba: dm-ausstage01:root partition is OK: DISK OK - free space: / 18911 MB (52% inode=59%):
22:37 < jabba> nagios-phx1: status dp-ausstage01.phx:*
22:37 < nagios-phx1> jabba: dp-ausstage01.phx:PING is OK: PING OK - Packet loss = 0%, RTA = 0.76 ms



I can add monitoring to it, but it will involve me puppetizing it.
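
For illustration only, here is a rough sketch of the kind of free-space check that dp-ausstage01 was missing. It is not the actual Nagios disk plugin or the configuration used on these hosts; the function names and the 20%/10% thresholds are hypothetical. The only things taken as given are the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and an output line loosely modelled on the "DISK OK - free space" messages in the log above.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(free_pct, warn_pct=20.0, crit_pct=10.0):
    """Map a free-space percentage to a Nagios state.

    The default thresholds are hypothetical, not the ones used in production.
    """
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def check_disk(path, warn_pct=20.0, crit_pct=10.0):
    """Return (state, message) for the filesystem that contains `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: filesystem reports zero size"
    free_pct = 100.0 * usage.free / usage.total
    state = classify(free_pct, warn_pct, crit_pct)
    free_mb = usage.free // (1024 * 1024)
    message = (
        f"DISK {LABELS[state]} - free space: {path} {free_mb} MB ({free_pct:.0f}%)"
    )
    return state, message
```

A wrapper script would print the message and exit with the returned state, for example by calling check_disk("/opt") and passing the state to sys.exit(), so that Nagios can pick up the result.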
Assignee: arich → jdow
(Assignee)

Comment 4

6 years ago
I puppetized the host and added basic monitoring to it:

10:55 < jabba> nagios-phx1: status dp-ausstage01.phx:*
10:55 < nagios-phx1> jabba: dp-ausstage01.phx:avg load has not yet been checked.
10:55 < nagios-phx1> jabba: dp-ausstage01.phx:disk - /opt has not yet been checked.
10:55 < nagios-phx1> jabba: dp-ausstage01.phx:PING is OK: PING OK - Packet loss = 0%, RTA = 1.27 ms
10:55 < nagios-phx1> jabba: dp-ausstage01.phx:root partition has not yet been checked.

It's set up the same as dm-ausstage01 (reports to #sysadmins)
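
To double-check coverage like this without reading the bot output by eye, something along the following lines could compare a host's reported services against an expected list. This is a sketch under assumptions: the line format is inferred only from the replies pasted in this bug (the text after the "jabba: " prefix), the WARNING/CRITICAL/UNKNOWN wording is a guess since only OK and "has not yet been checked" appear here, and parse_status and missing_services are hypothetical helper names.

```python
import re

# Matches "<host>:<service> is <STATE>: <output>" or
# "<host>:<service> has not yet been checked."
STATUS_RE = re.compile(
    r"^(?P<host>[^:\s]+):(?P<service>.+?) "
    r"(?:is (?P<state>OK|WARNING|CRITICAL|UNKNOWN): (?P<output>.*)"
    r"|has not yet been checked\.)$"
)


def parse_status(line):
    """Parse one bot reply (without the nick prefix) into a dict, or None."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "service": match.group("service"),
        # A service that has not run yet has no state; call it PENDING.
        "state": match.group("state") or "PENDING",
        "output": match.group("output") or "",
    }


def missing_services(lines, host, expected):
    """Return the expected services that `host` does not report at all."""
    parsed = (parse_status(line) for line in lines)
    seen = {p["service"] for p in parsed if p and p["host"] == host}
    return sorted(set(expected) - seen)
```

Fed the single PING line that dp-ausstage01.phx returned in comment 3, and the four services that dm-ausstage01 reports as the expected list, missing_services would flag avg load, disk - /opt, and root partition as absent.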
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Reporter)

Comment 5

6 years ago
Thanks!
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations