Closed Bug 792042 Opened 12 years ago Closed 12 years ago

audit dns and nagios for ad.mozilla.com and releng.ad.mozilla.com domain controllers

Categories

(Infrastructure & Operations :: RelOps: General, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arich, Assigned: dustin)

Details

I noticed a number of missing or misconfigured nagios bits for the domain controllers, then dug deeper and found missing DNS information for them as well.  We need an audit to make sure that all of the correct information is in place.
Fixed:

* missing PTR records for dc3 and dc8
* nagios for dc1 incorrectly part of the config at admin1.infra.scl1.mozilla.com.  moved to scl3 nagios config in puppet.
* no nagios service checks for dc8, added.

Still needs fixing:
* no nagios checks for dc3.  added to scl3 nagios config in puppet, but all but ping are failing.  likely nrpe is not installed or is not allowed through the host or router firewall.
* no nagios service checks for dc9.  added, but windows services failing.
* all hosts are missing NIC information in inventory; some may be labeled building when they're actually production.
* missing SRV records for dc4
* no nagios checks for dc4, added to phx1, but all are failing including ping. likely nrpe is not installed or is not allowed through the host or router firewall and the host is down.

Is dc4 installed at all?
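
For the PTR part of this audit, a minimal sketch of a forward/reverse consistency check follows. It assumes only the Python standard library; the function name `audit_ptr` and the hostnames in the usage line are hypothetical placeholders, not the real domain controller names. SRV records are not covered, since the standard library has no SRV lookup.

```python
import socket


def audit_ptr(hostnames, forward=socket.gethostbyname,
              reverse=socket.gethostbyaddr):
    """Return {hostname: problem} for hosts whose A and PTR records disagree.

    `forward` and `reverse` default to the system resolver and can be
    swapped out, which makes the check testable without real DNS.
    """
    problems = {}
    for name in hostnames:
        try:
            addr = forward(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems[name] = f"no A record: {exc}"
            continue
        try:
            ptr_name, _aliases, _addrs = reverse(addr)
        except OSError as exc:  # socket.herror is a subclass of OSError
            problems[name] = f"no PTR record for {addr}: {exc}"
            continue
        # Compare case-insensitively and ignore a trailing root dot.
        if ptr_name.rstrip(".").lower() != name.rstrip(".").lower():
            problems[name] = f"PTR for {addr} points to {ptr_name}"
    return problems


if __name__ == "__main__":
    # Hypothetical names; substitute the real domain controller FQDNs.
    for host, problem in audit_ptr(["dc3.example.test"]).items():
        print(f"{host}: {problem}")
```

Hosts that resolve cleanly in both directions are omitted from the result, so an empty dict means nothing was flagged.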
Assignee: server-ops-releng → dustin
(In reply to Amy Rich [:arich] [:arr] from comment #1)
> * no nagios checks for dc3.  added to scl3 nagios config in puppet, but all
> but ping are failing.  likely nrpe is not installed or is not allowed
> through the host or router firewall.

I installed nsclient++ on dc3, and nc -vz works fine on port 5666 when it's running (and not when it's not running), but it's not giving data back to nagios:
  CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages

> * missing SRV records for dc4
> * no nagios checks for dc4, added to phx1, but all are failing including
> ping. likely nrpe is not installed or is not allowed through the host or
> router firewall and the host is down.
> 
> Is dc4 installed at all?

Nope, AFAIK.
Opening the firewall completely on dc3 didn't help.
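
The `nc -vz` test on port 5666 described above can be scripted so it is repeatable across hosts. This is a minimal sketch using only the Python standard library; the function name `port_open` and the host name in the usage line are hypothetical.

```python
import socket


def port_open(host, port=5666, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical host name; substitute the real domain controller.
    print(port_open("dc3.example.test"))
```

As with `nc -vz`, a True result only shows that something is accepting TCP connections on the port. It does not show that the listener will return data to nagios, which matches the "Received 0 bytes from daemon" error seen while the port test succeeded.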
(In reply to Amy Rich [:arich] [:arr] from comment #1)
> * no nagios service checks for dc9.  added, but windows services failing.

I removed the "Windows Services" check, since IMHO it's bogus and even if it worked it wouldn't tell us much of use.  See bug 792464.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations