Closed Bug 720074 Opened 12 years ago Closed 12 years ago

Fix NTP nagios check to look at ntpq peers, rather than running an ntpdate

Categories

(Infrastructure & Operations :: Infrastructure: Other, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: ashish)

Details

(Whiteboard: [8/20])

The current NTP check essentially runs ntpdate against an NTP server supplied as a check argument (it uses a source port other than 123 so that it can run simultaneously with ntpd).

This has given bogus results in two ways:

 - the check argument is a hostname that is resolved on the client, so the check fails when DNS fails (this has been releng's most reliable indicator of DNS troubles, actually)

 - the check argument may not be the same as the server to which the host is synchronizing, and in fact we've seen both
   - the host unable to synchronize against the check argument, but fine with its configured server; and
   - the host unable to synchronize with its configured server, but fine with the check argument

I think a better check would be to monitor the peers locally:

[root@buildbot-master23 ~]# ntpq -n -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.12.48.11     38.229.71.1      3 u    9   64   37    0.560   33.271   0.323

Essentially, you'll want to make sure that at least one line is marked with a * (the peer currently selected for synchronization), and that the offset isn't too great. The sync process takes a long time, so we'll want to work out the "soft down" period appropriately to avoid false positives.
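A rough sketch of the peer-based check described above, in Python. It assumes the `ntpq -n -c peers` layout shown in this bug: one tally character in the first column, followed by ten whitespace-separated fields with the offset (in milliseconds) in the ninth. The function names and the 50 ms / 100 ms thresholds are made up for illustration and are not taken from any existing plugin.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_peers(output):
    """Parse `ntpq -n -c peers` output into (tally, remote, offset_ms) tuples."""
    peers = []
    for line in output.splitlines():
        tally, fields = line[:1], line[1:].split()
        if len(fields) != 10:
            continue  # blank line or the ===== separator
        try:
            offset_ms = float(fields[8])
        except ValueError:
            continue  # column header line ("offset" is not a number)
        peers.append((tally, fields[0], offset_ms))
    return peers


def evaluate(peers, warn_ms=50.0, crit_ms=100.0):
    """Return (exit_code, message). Thresholds are hypothetical defaults."""
    selected = [p for p in peers if p[0] == "*"]
    if not selected:
        return CRITICAL, "NTP CRITICAL: no peer selected for synchronization"
    _, remote, offset_ms = selected[0]
    if abs(offset_ms) >= crit_ms:
        return CRITICAL, f"NTP CRITICAL: offset {offset_ms} ms from {remote}"
    if abs(offset_ms) >= warn_ms:
        return WARNING, f"NTP WARNING: offset {offset_ms} ms from {remote}"
    return OK, f"NTP OK: offset {offset_ms} ms from {remote}"


def run_check():
    """Query the local ntpd and evaluate its peers."""
    try:
        result = subprocess.run(
            ["ntpq", "-n", "-c", "peers"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (OSError, subprocess.SubprocessError) as exc:
        return UNKNOWN, f"NTP UNKNOWN: could not run ntpq: {exc}"
    return evaluate(parse_peers(result.stdout))
```

A plugin wrapper would call `run_check()`, print the message, and exit with the returned code. Nothing here involves DNS or a server passed as a check argument, so the check only reports on the server the host is actually configured to use. The "soft down" period would still need to be handled on the Nagios side (for example through the service's retry settings) rather than in the script.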

I'm very unlikely to find time to hack on this soon, but would appreciate feedback on the idea.  It's also not a horribly difficult script to write, if someone wants to try their hand (nudge nudge amy).
Assignee: dustin → jdow
Assignee: jdow → ashish
Component: Server Operations → Server Operations: Infrastructure
QA Contact: cshields → jdow
We use check_ntp_time rather than check_ntp, which I _think_ used ntpdate. check_ntp_time mentions "It is independent of any commandline programs or external libraries". That said, nagios-plugins also ships with check_ntp_peer, so a peer-based check would be fairly trivial to implement:

Notes:
 Use this plugin to check the health of an NTP server. It supports
 checking the offset with the sync peer, the jitter and stratum. This
 plugin will not check the clock offset between the local host and NTP
 server; please use check_ntp_time for that purpose.
We've agreed to update Nagios to use IP addresses for the NTP server instead of the hostname. I'll do this on 08/20.
Status: NEW → ASSIGNED
Whiteboard: [8/20]
$ svn commit mozilla/services.pp -m "Updating Time Sync check to query against NTP server IP instead of hostname - Bug 720074"
Sending        mozilla/services.pp
Transmitting file data .
Committed revision 45430.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: Server Operations: Infrastructure → Infrastructure: Other
Product: mozilla.org → Infrastructure & Operations