Closed Bug 1051558 Opened 10 years ago Closed 10 years ago

mac-v2-signing3.srv.releng.scl3.mozilla.com has ntp troubles

Categories

(Infrastructure & Operations :: RelOps: Puppet, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: dustin)

Attachments

(1 file)

Whatever it was that nthomas said he did that caused nagios to shut up for a while has fallen out of my scrollback, but

<nagios-releng>: Sat 21:10:03 PDT [4166] mac-v2-signing3.srv.releng.scl3.mozilla.com:ntp time is WARNING: NTP WARNING: Offset 7.954523087 secs (http://m.mozilla.org/ntp+time)

every five minutes.
Actually, I haven't handled an NTP issue on these new hosts, so I dunno what's up here. Looks like puppet hasn't run in a few days on this host or on mac-v2-signing1, and there are no reports at all in foreman for mac-v2-signing2.
mac-v2-signing2 isn't up yet - still waiting for ben to finish with that host.

running
  launchctl load /Library/LaunchDaemons/com.mozilla.puppet.plist
seems to put puppet back into the list of running launchd services -- I wonder why it's not there to start with.

So, both NTP and Puppet may need some investigation on Mavericks.
Assignee: nobody → relops
Component: General Automation → RelOps: Puppet
Product: Release Engineering → Infrastructure & Operations
QA Contact: catlee → dustin
Version: unspecified → other
Assignee: relops → dustin
Puppet appears to be starting a lot on this host, too.

        <key>StartInterval</key>
        <integer>1800</integer>

but maybe Apple changed that to be in milliseconds.  It'd be just like them..
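
For reference, launchd.plist(5) documents StartInterval as a number of seconds, so 1800 should mean one run every 30 minutes. A minimal sketch of checking that value with Python's plistlib follows; the surrounding plist wrapper is an assumption added so the fragment parses, and only the StartInterval key and value come from this host's configuration.

```python
import plistlib

# Only the StartInterval key/value is taken from the fragment above;
# the <plist>/<dict> wrapper is an assumption so the fragment parses.
PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
        <key>StartInterval</key>
        <integer>1800</integer>
</dict>
</plist>
"""


def start_interval_minutes(plist_bytes):
    """Return the launchd StartInterval (documented in seconds) as minutes."""
    job = plistlib.loads(plist_bytes)
    return job["StartInterval"] / 60


print(start_interval_minutes(PLIST))
```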
I can't reproduce this now, with the fixes in bug 1050513.  I'm going to hope this is a consequence of something there (probably the missing administrator user).
Depends on: 1050513
(In reply to Phil Ringnalda (:philor) from comment #0)
> Whatever it was that nthomas said he did that caused nagios to shut up for a
> while has fallen out of my scrollback, but
> 
> <nagios-releng>: Sat 21:10:03 PDT [4166]
> mac-v2-signing3.srv.releng.scl3.mozilla.com:ntp time is WARNING: NTP
> WARNING: Offset 7.954523087 secs (http://m.mozilla.org/ntp+time)
> 
> every five minutes.

I just kill the ntpd process when I see this. It starts back up automatically and the offset disappears.
Sure enough:

[root@mac-v2-signing3.srv.releng.scl3.mozilla.com ~]# ntpdate -q time
server 63.245.214.135, stratum 1, offset 2.968165, delay 0.02620
server 63.245.214.136, stratum 1, offset 2.968192, delay 0.02649
12 Aug 13:24:04 ntpdate[56615]: step time server 63.245.214.135 offset 2.968165 sec
[root@mac-v2-signing3.srv.releng.scl3.mozilla.com ~]# ps auxww | grep ntpd
root           47857   0.0  0.0  2446572   1320   ??  Ss    1:46PM   0:00.12 /usr/sbin/ntpd -c /private/etc/ntp-restrict.conf -n -g -p /var/run/ntpd.pid -f /var/db/ntp.drift
root           56617   0.0  0.0  2423368    284 s000  R+    1:24PM   0:00.00 grep ntpd

[root@mac-v2-signing3.srv.releng.scl3.mozilla.com ~]# grep ntp /var/log/system.log
Aug 12 02:17:15 mac-v2-signing3.srv.releng.scl3.mozilla.com ntpd[47857]: SYNC state ignoring +1.131017 s
Aug 12 02:57:54 mac-v2-signing3.srv.releng.scl3.mozilla.com ntpd[47857]: ntpd: time set +1.428719 s
Aug 12 05:52:43 mac-v2-signing3.srv.releng.scl3.mozilla.com ntpd[47857]: SYNC state ignoring +1.278912 s
Aug 12 06:38:27 mac-v2-signing3.srv.releng.scl3.mozilla.com ntpd[47857]: ntpd: time set +1.613630 s

[root@mac-v2-signing3.srv.releng.scl3.mozilla.com ~]# systemsetup -getnetworktimeserver
Network Time Server: time.mozilla.org

[root@mac-v2-signing3.srv.releng.scl3.mozilla.com ~]# ntpq
ntpq> assoc
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 59561  9024   yes   yes  none    reject   reachable  2
ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 time2.scl3.mozi .CDMA.           1 u 1073   64  357  106.230  2944.05 1926.68
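
For anyone reading the peers line above: ntpq reports delay, offset, and jitter in milliseconds, and reach is an octal shift register of the last eight polls. A rough sketch of decoding that single line is below. The function name is hypothetical, and it assumes a plain ten-column line; it strips a leading tally character only if one is present.

```python
def parse_peer_line(line):
    """Decode one line of `ntpq> peers` output (ten whitespace-separated columns)."""
    remote, refid, st, t, when, poll, reach, delay, offset, jitter = line.split()
    reach_bits = int(reach, 8)  # reach is printed in octal
    return {
        "remote": remote.lstrip("*+-#xo~"),  # drop a tally code if one is attached
        "refid": refid,
        "stratum": int(st),
        "polls_ok_of_last_8": bin(reach_bits).count("1"),
        "offset_seconds": float(offset) / 1000.0,  # ntpq prints milliseconds
        "jitter_seconds": float(jitter) / 1000.0,
    }


LINE = " time2.scl3.mozi .CDMA.           1 u 1073   64  357  106.230  2944.05 1926.68"
peer = parse_peer_line(LINE)
print(peer["offset_seconds"], peer["polls_ok_of_last_8"])
```

Read this way, the peer is reachable (7 of the last 8 polls answered) but the local clock is roughly 2.9 seconds off, which matches the ntpdate offset above.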

..a little googling later..

https://discussions.apple.com/thread/5604114

So, Apple broke ntpd and didn't update manpages, replacing its clock adjustment with a broken tool called pacemaker.  Somehow that doesn't surprise me.

[root@mac-v2-signing3.srv.releng.scl3.mozilla.com ~]# ps auxww | grep pace
root            6010   0.0  0.0  2447228    596   ??  Ss   Thu12PM   0:41.26 /usr/libexec/pacemaker -b -e 0.0001 -a 10

It looks like the best solution is to build a custom ntpd and then shoot pacemaker in the head.
Hah, Coop, I don't know if that was general advice or if you were aware of this particular Mavericky suckage, but you're absolutely right:

[root@mac-v2-signing3.srv.releng.scl3.mozilla.com ~]# killall ntpd
[root@mac-v2-signing3.srv.releng.scl3.mozilla.com ~]# ntpdate -q time
server 63.245.214.136, stratum 1, offset -0.000193, delay 0.02641
server 63.245.214.135, stratum 1, offset -0.000167, delay 0.02625
12 Aug 13:54:58 ntpdate[68656]: adjust time server 63.245.214.135 offset -0.000167 sec
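
To compare the two `ntpdate -q` runs (before and after killing ntpd), a small sketch like the following pulls the per-server offsets out of the output. The function name and regular expression are illustrative assumptions; the sample text is copied from the transcripts in this bug.

```python
import re

# Matches lines such as:
#   server 63.245.214.135, stratum 1, offset 2.968165, delay 0.02620
SERVER_RE = re.compile(r"^server\s+(\S+),\s+stratum\s+\d+,\s+offset\s+([-+]?\d+\.\d+),")


def server_offsets(output):
    """Map each server address in `ntpdate -q` output to its offset in seconds."""
    offsets = {}
    for line in output.splitlines():
        match = SERVER_RE.match(line.strip())
        if match:
            offsets[match.group(1)] = float(match.group(2))
    return offsets


BEFORE = """server 63.245.214.135, stratum 1, offset 2.968165, delay 0.02620
server 63.245.214.136, stratum 1, offset 2.968192, delay 0.02649"""
AFTER = """server 63.245.214.136, stratum 1, offset -0.000193, delay 0.02641
server 63.245.214.135, stratum 1, offset -0.000167, delay 0.02625"""

worst_before = max(abs(v) for v in server_offsets(BEFORE).values())
worst_after = max(abs(v) for v in server_offsets(AFTER).values())
print(worst_before, worst_after)
```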

Jake found http://www.atmythoughts.com/living-in-a-tech-family-blog/2014/2/28/what-time-is-it which is a much nicer description of the problem, and the first comment says 

---
I settled on a different solution: I created a LaunchDaemon plist that runs "/usr/bin/killall ntpd" once every hour. As you wrote, ntpd will start back up since it's a daemon, and when it does, it seems to correctly update the system time.
---

Since this whole operating system is a hack, that seems like a very appropriate fix.  And way easier than building a new package!
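
A sketch of what such a LaunchDaemon could look like, generated with Python's plistlib. This is not the attached patch: the label `com.example.killntpd` is hypothetical, and the one-hour interval is taken from the blog comment quoted above.

```python
import plistlib

# Hypothetical job definition; the Label is made up for illustration.
job = {
    "Label": "com.example.killntpd",
    # launchd restarts ntpd after it is killed, and the restart resyncs the clock.
    "ProgramArguments": ["/usr/bin/killall", "ntpd"],
    # StartInterval is in seconds: 3600 runs the job once an hour.
    "StartInterval": 3600,
}

plist_xml = plistlib.dumps(job).decode("utf-8")
print(plist_xml)
```

The printed XML would be installed under /Library/LaunchDaemons and loaded with launchctl, the same way as the puppet plist earlier in this bug.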
Attached patch bug1051558.patch
Will stepping the clock a second or two in either direction every hour hurt signing operations?

Also, do you have any thoughts on how to get word to Apple that this is broken?
Attachment #8471860 - Flags: review?(bhearsum)
Comment on attachment 8471860 [details] [diff] [review]
bug1051558.patch

I don't think this should harm signing in any way...
Attachment #8471860 - Flags: review?(bhearsum) → review+
Attachment #8471860 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED