[root@buildbot-master04 buildbot.d]# /usr/lib64/nagios/plugins/check_ntp_time -H 10.12.75.12
NTP CRITICAL: No response from NTP server
[root@buildbot-master04 buildbot.d]# /usr/lib64/nagios/plugins/check_ntp_time -H 10.12.75.10
NTP OK: Offset 0.0001378059387 secs|offset=0.000138s;60.000000;120.000000;

ntp.build.mozilla.org resolves to one of the two addresses above, so hosts sometimes try and sync to 10.12.75.12 and then are forever stuck in this state:

[root@buildbot-master06 ~]# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.12.75.12     .INIT.          16 u    -   64    0    0.000    0.000   0.000
I'm going to kick this up to general ops, since I suspect this is controlled by puppet. I'm wondering whether ns1 and ns2 should be obtaining time from border1.sj and border2.sj at all.

[root@ns2 ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 border1.sj.mozi .INIT.          16 u    -   64    0    0.000    0.000   0.000
 border2.sj.mozi .INIT.          16 u    -   64    0    0.000    0.000   0.000
 mradm01.mozilla 10.2.74.5        3 u   19   64    7    5.963    1.927   9.847
 LOCAL(0)        .LOCL.          10 l   16   64    7    0.000    0.000   0.001

[root@ns1 ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 border1.sj.mozi .STEP.          16 u    - 1024    0    0.000    0.000   0.000
 border2.sj.mozi .STEP.          16 u    - 1024    0    0.000    0.000   0.000
*mradm01.mozilla 10.2.74.5        3 u   48 1024  377    3.125    0.579   0.842
 LOCAL(0)        .LOCL.          10 l   40   64  377    0.000    0.000   0.0

Any of our machines that try to use ns2 for ntp sit and wait in INIT.
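The stuck state in the ntpq output above (stratum 16, reach 0, no peer marked with `*`) could be detected mechanically. The following is only a rough sketch, assuming the usual ten-column `ntpq -p` layout; the names `Peer`, `parse_ntpq_peers`, `unreachable_peers` and `has_sync_peer` are made up for illustration and are not part of any existing check.

```python
from typing import List, NamedTuple


class Peer(NamedTuple):
    tally: str    # leading status character, e.g. "*" for the current sync peer
    remote: str
    refid: str
    stratum: int
    reach: int    # reachability register (printed in octal by ntpq)


def parse_ntpq_peers(output: str) -> List[Peer]:
    """Parse the peer rows of `ntpq -p` / `ntpq -pn` output.

    Assumption: only the symbolic tally characters * + - # are recognised;
    letter tally codes (such as x or o) are not handled in this sketch.
    """
    peers = []
    for line in output.splitlines():
        row = line.strip()
        # Skip blank lines, the column header and the ===== separator.
        if not row or row.startswith("=") or row.split()[0] == "remote":
            continue
        tally = ""
        if row[0] in "*+-#":
            tally, row = row[0], row[1:]
        fields = row.split()
        if len(fields) < 7:
            continue  # not a peer row
        try:
            stratum = int(fields[2])
            reach = int(fields[6], 8)  # the reach column is octal
        except ValueError:
            continue  # malformed row, ignore it
        peers.append(Peer(tally, fields[0], fields[1], stratum, reach))
    return peers


def unreachable_peers(peers: List[Peer]) -> List[Peer]:
    """Peers that have not answered any of the last eight polls."""
    return [p for p in peers if p.reach == 0]


def has_sync_peer(peers: List[Peer]) -> bool:
    """True if ntpd has selected a peer to synchronise to."""
    return any(p.tally == "*" for p in peers)
```

Fed the ns1 output, this would report border1.sj.mozi and border2.sj.mozi as unreachable while still finding a sync peer; fed the buildbot-master06 output, it would find no sync peer at all.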
Assignee: server-ops-releng → server-ops
I looked through the infra puppet configs, and it seems we're modifying /etc/ntp/step-tickers but no longer /etc/ntp.conf itself (that section is commented out). step-tickers is fine when ntp is starting and doing an ntpdate to perform its initial sync, but it doesn't dictate which peers the machine uses to stay in sync once ntp has started.

I've added a section to puppet/trunk//manifests/classes/ntp.pp to specify the scl1 core1 and core2 machines as step-tickers, and I've manually modified /etc/ntp.conf on ns1 and ns2.infra.scl1 to also point at core1/core2. The build machines will continue to point at ntp.build.mozilla.org.

In general, though, I think it might still be a better idea to have build machines include more than one ntp server in their ntp.conf (and if we do that with IPs, we can hand it out via dhcp).
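To illustrate the "more than one ntp server" idea, here is a minimal sketch that renders `server` lines for ntp.conf from a list of addresses. It is not the puppet change described above: the helper name `render_ntp_servers` is hypothetical, the addresses are simply the two that ntp.build.mozilla.org resolved to earlier in this bug, and using the `iburst` option is an assumption rather than the current config.

```python
from typing import Iterable


def render_ntp_servers(addresses: Iterable[str], iburst: bool = True) -> str:
    """Render one ntp.conf `server` line per unique address.

    Blank entries and duplicates are dropped; input order is preserved.
    """
    lines = []
    seen = set()
    for addr in addresses:
        addr = addr.strip()
        if not addr or addr in seen:
            continue
        seen.add(addr)
        # `iburst` only speeds up the initial sync; it is optional.
        lines.append(f"server {addr} iburst" if iburst else f"server {addr}")
    return "\n".join(lines) + "\n" if lines else ""


# Assumed example input: the two addresses seen for ntp.build.mozilla.org.
print(render_ntp_servers(["10.12.75.10", "10.12.75.12"]), end="")
```

With both addresses listed explicitly, ntpd can keep syncing to whichever server answers instead of depending on which single address DNS happened to return.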
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations