Closed
Bug 768746
Opened 13 years ago
Closed 13 years ago
Add nagios checks for all new windows infrastructure machines
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mlarrain, Assigned: mlarrain)
Details
I have installed the nagios client on dc7.releng.ad.mozilla.com [10.12.69.19]
and I need assistance setting up the server-side checks for this machine. They can mirror the checks that dc01.winbuild.scl1.mozilla.com [10.12.40.13] has set up.
Updated•13 years ago
Assignee: server-ops-releng → mlarrain
Comment 1•13 years ago
The client and server side checks need to be added for:
dc1.ad.mozilla.com
dc2.ad.mozilla.com
dc6.releng.ad.mozilla.com
dc7.releng.ad.mozilla.com
wds1.releng.ad.mozilla.com
(any others I'm missing?)
Summary: Setup server side checks for dc7.releng.ad → Add nagios checks for all new windows infrastructure machines
Updated•13 years ago
Assignee: mlarrain → dustin
Comment 2•13 years ago
storage1.releng.ad.mozilla.com
Comment 3•13 years ago
OK, these are in (not storage1 - turns out we'll be killing it).
However, DNS isn't working yet, so nagios isn't monitoring most of them.
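A quick way to see which of the new hosts still lack DNS records is a lookup loop like the one below. This is only a sketch: the host list is copied from comment 1, and the `unresolved` helper and its `resolver` parameter are hypothetical names introduced here, not part of nagios or the puppet setup.

```python
import socket

# Hostnames copied from comment 1 of this bug.
HOSTS = [
    "dc1.ad.mozilla.com",
    "dc2.ad.mozilla.com",
    "dc6.releng.ad.mozilla.com",
    "dc7.releng.ad.mozilla.com",
    "wds1.releng.ad.mozilla.com",
]


def unresolved(hosts, resolver=socket.gethostbyname):
    """Return the hosts whose names the resolver cannot look up.

    `resolver` is a hypothetical hook (defaulting to the stdlib lookup)
    so the function can be exercised without real DNS.
    """
    missing = []
    for host in hosts:
        try:
            resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(host)
    return missing
```

Calling `unresolved(HOSTS)` from the nagios server would list the names that do not resolve from there, which should roughly match the hosts nagios is not yet monitoring.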
Comment 4•13 years ago
storage1 has been killed off. There are also kms1.ad.mozilla.com and kms2.ad.mozilla.com, which will be configured as the KMS and WSUS servers.
Comment 5•13 years ago
More accurately, dc6, dc7, and wds1 are added to admin1 for monitoring in releng.
I just added dc1, dc2, and kms1 to nagios via puppet, all in the appropriate DCs.
So far:
18:19 < nagios-releng-scl1> [18] dc7.releng.ad:disk - C is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
18:22 < nagios-releng-scl1> [19] wds1.releng.ad:disk - C is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
and dc6 was already red in the web UI, so I downtimed it too.
I'll leave those fixes up to Matt.
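For whoever picks up the fixes, the affected hosts can be pulled out of the IRC alert lines with something like the following. It is a minimal sketch: the regular expression is an assumption based only on the two lines quoted above, and `handshake_failures` is a hypothetical helper name.

```python
import re

# Assumed layout of the bot's alert lines, inferred from the two quoted above:
#   HH:MM < bot> [n] host:service is STATE: message
ALERT_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}) < ?(?P<bot>[^>]+)> \[(?P<num>\d+)\] "
    r"(?P<host>[^:]+):(?P<service>.+?) is (?P<state>[A-Z]+): (?P<message>.*)$"
)


def handshake_failures(lines):
    """Return hosts (in first-seen order) whose alert mentions a failed SSL handshake."""
    hosts = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue  # not an alert line in the assumed format
        if "SSL handshake" in match.group("message"):
            host = match.group("host")
            if host not in hosts:
                hosts.append(host)
    return hosts
```

Feeding it the two alert lines from this comment yields the dc7 and wds1 host names; lines that do not fit the assumed format are skipped rather than guessed at.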
Assignee: dustin → mlarrain
Comment 6•13 years ago
kms1, dc1, and dc2 are failing their NRPE checks too.
Comment 7•13 years ago
Also, those paged infra oncall.
Can this be fixed? Let them page Matt only or just alert in #somechannel?
Thanks!
Comment 8•13 years ago
They are supposed to page oncall, although these particular alerts were bogus. Matt *also* gets paged. As this windows forest comes online, Matt will be working with oncall to make sure y'all can solve the problems that come up.
These all got fixed on Friday, or they'd have been paging all weekend.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•12 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations