The current NTP check essentially runs ntpdate to an NTP server supplied as a check argument (it uses a source port other than 123 so that it can run simultaneously with ntpd). This has given bogus results in two ways:

- the check argument is a hostname that is resolved on the client, so the check fails when DNS fails (this has been releng's most reliable indicator of DNS troubles, actually)
- the check argument may not be the same as the server to which the host is synchronizing, and in fact we've seen both
  - the host unable to synchronize against the check argument, but fine with its configured server; and
  - the host unable to synchronize with its configured server, but fine with the check argument

I think a better check would be to monitor the peers locally:

    [root@buildbot-master23 ~]# ntpq -n -c peers
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    *10.12.48.11     184.108.40.206   3 u    9   64   37    0.560   33.271   0.323

Essentially, you'll want to make sure that at least one line has a *, and that the offset isn't too great. The sync process takes a long time, so we'll want to work out the "soft down" period appropriately to avoid false positives.

I'm very unlikely to find time to hack on this soon, but would appreciate feedback on the idea. It's also not a horribly difficult script to write, if someone wants to try their hand (nudge nudge amy).
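To make the proposal concrete, here is a minimal sketch of such a local peer check. It is only an illustration under assumptions, not an existing plugin: the function names (`parse_peers`, `evaluate`, `run_check`) and the 50 ms / 100 ms thresholds are hypothetical, and it assumes the ten-column `ntpq -n -c peers` layout shown above, with the offset reported in milliseconds. The "soft down" period is not handled here; that would presumably be left to the Nagios retry settings.

```python
import subprocess

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes

# Sample output copied from the bug report above.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.12.48.11     184.108.40.206   3 u    9   64   37    0.560   33.271   0.323
"""


def parse_peers(output):
    """Return a list of (tally, remote, offset_ms) for each peer row."""
    peers = []
    for line in output.splitlines():
        fields = line.split()
        # Peer rows have 10 columns; skip the header and the ==== separator.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        # The first character of the row is the tally code ("*" marks the
        # peer currently selected for synchronization; a space means none).
        tally = line[0]
        remote = fields[0] if tally == " " else fields[0][1:]
        try:
            offset_ms = float(fields[8])
        except ValueError:
            continue  # not a well-formed peer row
        peers.append((tally, remote, offset_ms))
    return peers


def evaluate(peers, warn_ms=50.0, crit_ms=100.0):
    """Map parsed peers to (exit_code, message). Thresholds are assumptions."""
    synced = [p for p in peers if p[0] == "*"]
    if not synced:
        return CRITICAL, "CRITICAL: no peer selected for synchronization"
    _, remote, offset_ms = synced[0]
    msg = f"synced to {remote}, offset {offset_ms:.3f} ms"
    if abs(offset_ms) >= crit_ms:
        return CRITICAL, "CRITICAL: " + msg
    if abs(offset_ms) >= warn_ms:
        return WARNING, "WARNING: " + msg
    return OK, "OK: " + msg


def run_check():
    """Query the local ntpd and evaluate its peers (requires ntpq on PATH)."""
    out = subprocess.run(
        ["ntpq", "-n", "-c", "peers"],
        capture_output=True, text=True, check=True,
    ).stdout
    return evaluate(parse_peers(out))


# Demonstrate on the sample from the bug rather than a live ntpd.
print(evaluate(parse_peers(SAMPLE))[1])
```

A real plugin would call `run_check()`, print the message, and exit with the returned code. Because the check only talks to the local ntpd, it depends neither on DNS nor on a separately configured check argument.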
Assignee: jdow → ashish
Component: Server Operations → Server Operations: Infrastructure
QA Contact: cshields → jdow
We use check_ntp_time rather than check_ntp; I _think_ it was check_ntp that used ntpdate. The check_ntp_time documentation says "It is independent of any commandline programs or external libraries".

That said, nagiosplugins also ships with check_ntp_peer, so the proposed check would be fairly trivial to implement. From its notes:

    Use this plugin to check the health of an NTP server. It supports checking the offset with the sync peer, the jitter and stratum. This plugin will not check the clock offset between the local host and NTP server; please use check_ntp_time for that purpose.
We've agreed to update Nagios to use the NTP server's IP address instead of its hostname. I'll do this on 08/20.
Status: NEW → ASSIGNED
$ svn commit mozilla/services.pp -m "Updating Time Sync check to query against NTP server IP instead of hostname - Bug 720074"
Sending        mozilla/services.pp
Transmitting file data .
Committed revision 45430.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Component: Server Operations: Infrastructure → Infrastructure: Other
Product: mozilla.org → Infrastructure & Operations