Closed Bug 894201 Opened 12 years ago Closed 12 years ago

Time drifting on buildbot-master81.srv.releng.scl3

Categories

(Infrastructure & Operations :: RelOps: Puppet, task)

Platform: x86, All
Type: task
Priority: Not set
Severity: critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: dustin)

Details

Attachments

(2 files)

e.g.
Mon 19:35:14 PDT [490] buildbot-master81.srv.releng.scl3.mozilla.com:ntp time is WARNING: NTP WARNING: Offset 32.10488713 secs

vmtoolsd is running, but the VM config magic to sync to the host clock may be missing. catlee has fixed it up for the moment. Since this is a critical component in scheduling buildbot jobs, please talk to catlee or #releng before making any changes. We should also check buildbot-master82 through 90 (??), which are also ESX VMs in scl3.
These are the VMs that were re-kicked from the templates in bug 867593. You should add clocksource=pit to the kernel command line in grub.conf and reboot at your earliest convenience to pick it up.
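As a rough illustration of the change requested above, here is a minimal Python sketch. It assumes a GRUB-legacy /boot/grub/grub.conf as used on RHEL 6, where each boot entry has an indented "kernel" line. The helper name set_kernel_arg is hypothetical and is not taken from the PuppetAgain patch. It appends clocksource=pit to every kernel line, or replaces an existing clocksource= value, so running it twice changes nothing. The new argument only takes effect after a reboot.

```python
import re

# Matches GRUB-legacy boot-entry lines such as "\tkernel /vmlinuz-... ro root=...".
KERNEL_LINE = re.compile(r"^\s*kernel\s")


def set_kernel_arg(grub_conf: str, key: str = "clocksource", value: str = "pit") -> str:
    """Return grub.conf text with key=value on every kernel line (hypothetical helper).

    An existing key=... token is replaced rather than duplicated, so running
    this twice gives the same result as running it once.
    """
    arg = f"{key}={value}"
    # A whitespace-preceded "<key>=<anything>" token already on the line.
    existing = re.compile(r"(?<=\s)" + re.escape(key) + r"=\S*")
    out = []
    for line in grub_conf.splitlines(keepends=True):
        if KERNEL_LINE.match(line):
            body = line.rstrip("\r\n")
            ending = line[len(body):]  # preserve the original line ending
            if existing.search(body):
                body = existing.sub(arg, body)
            else:
                body = body.rstrip() + " " + arg
            line = body + ending
        out.append(line)
    return "".join(out)
```

Keep a backup of grub.conf before writing the result back.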
Thanks, let's move this back to one of our components. RelEng, it looks like we're not managing grub.conf in PuppetAgain.
Assignee: server-ops-virtualization → nobody
Severity: normal → critical
Component: Server Operations: Virtualization → Release Engineering: Machine Management
QA Contact: dparsons → armenzg
Thanks, Greg! Who can pick this up?
Component: Release Engineering: Machine Management → RelOps: Puppet
Product: mozilla.org → Infrastructure & Operations
QA Contact: armenzg → dustin
I can probably fix this today.
Assignee: nobody → dustin
Attached patch bug894201.patch
Attachment #776458 - Flags: review?(nthomas)
Comment on attachment 776458 bug894201.patch

Passing this over to Rail, who will have more background on it. One query, though: will this affect non-ESX machines too? Perhaps a condition like
  if ($::virtual != "vmware") {
would be more specific. That's from http://hg.mozilla.org/build/puppet/file/826ad3caf69e/modules/ntp/manifests/daemon.pp#l7
Attachment #776458 - Flags: review?(nthomas) → review?(rail)
Comment on attachment 776458 bug894201.patch

(In reply to Nick Thomas [:nthomas] from comment #6)
> Comment on attachment 776458
> bug894201.patch
>
> Passing this over to Rail, who will have more background on it. One query
> though, will this affect non-ESX machines too ? Perhaps a condition like
> if ($::virtual != "vmware") {
> is more specific. That's from
> http://hg.mozilla.org/build/puppet/file/826ad3caf69e/modules/ntp/manifests/daemon.pp#l7

http://hg.mozilla.org/build/puppet/file/826ad3caf69e/modules/hardware/manifests/init.pp#l35 checks for that already.
Attachment #776458 - Flags: review?(rail) → review+
Attachment #776458 - Flags: checked-in+
I rebooted buildbot-master{82..89}. Catlee's going to reboot 81 now, since it's the scheduler master.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
This is happening again on bm81...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
install ntp
(In reply to Dan Parsons [:lerxst] from comment #11)
> install ntp

Because Lerxst didn't explain :( ... Per IRC, I was explaining why we don't have ntp installed/running on VMware: VMware breaks ntp's assumptions, so it doesn't help us. Lerxst said that's wrong "because we have it everywhere" (I asked him to comment here, hoping he'd explain why that's wrong and why he recommended installing ntp). I don't feel comfortable making a one-off change, though, without the "why" being understood by people closer to the issue.
[18:13:08] atoll http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
[18:13:12] atoll > Note: VMware recommends you to use NTP instead of VMware Tools periodic time synchronization. NTP is an industry standard and ensures accurate timekeeping in your guest. You may have to open the firewall (UDP 123) to allow NTP traffic.
[18:13:32] atoll > The configuration directive tinker panic 0 instructs NTP not to give up if it sees a large jump in time.
[18:14:00] atoll Callek: please tell "multiple people" to confirm their assumptions with the virtualization team, since they may have out-of-date/inaccurate information about NTP.

:dustin, :gcox (others) what say you?
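The tinker panic 0 directive quoted above lives in ntp.conf. A minimal Python sketch of adding it, assuming the file is handled as plain text and following the commonly given advice to place the directive at the very top of the file, might look like this. ensure_tinker_panic_zero is a hypothetical helper, not part of ntpd or the Puppet ntp module.

```python
def ensure_tinker_panic_zero(ntp_conf: str) -> str:
    """Return ntp.conf text with 'tinker panic 0' as its first line (hypothetical helper).

    Any existing 'tinker panic ...' line is dropped first, so the directive
    appears exactly once, at the top.
    """
    kept = [
        line
        for line in ntp_conf.splitlines()
        if line.split()[:2] != ["tinker", "panic"]
    ]
    return "\n".join(["tinker panic 0", *kept]) + "\n"
```

ntpd only reads the file at startup, so it would need a restart to pick up the change.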
Flags: needinfo?(gcox)
Flags: needinfo?(dustin)
Pretty sure I remember ntpd running when I looked earlier, but I could be mistaken, since it's chkconfig'ed off on 81. It's on on other VMs like admin1a/admin1b/ns1/nagios1.private.releng, so you've surely got precedent. Config it and fire it up.
Flags: needinfo?(gcox)
I ran 'ntpdate ntp.build.mozilla.org' in the meantime.
Here's the VMware doc to back up lerxst's comment: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427

buildbot-master81 is RHEL 6.2, 64bit, SMP w/ processors. The output of 'vmware-toolbox-cmd timesync status' is 'Disabled'. Presumably 82 through 89 are the same. If the restriction at http://mxr.mozilla.org/build/source/puppet/modules/ntp/manifests/daemon.pp#7 is removed, it looks like we'll get http://mxr.mozilla.org/build/source/puppet/modules/ntp/templates/ntp.conf.erb, which lines up pretty closely with what VMware recommends. I think we just need to figure out whether any other machines would also be affected by that change, and how to remove the clocksource=pit.
I'm certainly not an expert here, but I'll be happy to review a patch. We just *added* clocksource=pit, so now we'll remove it??
Flags: needinfo?(dustin)
OK, so I have it on good authority that we should have *both* clocksource=pit and be running ntp.
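Since the conclusion is that these VMs need both the kernel argument and a running ntpd, a quick post-reboot check can confirm each piece. The sketch below assumes a Linux guest with /proc/cmdline, the sysfs file /sys/devices/system/clocksource/clocksource0/current_clocksource, and the ntpstat utility, whose exit status is 0 when synchronised, 1 when not synchronised, and 2 when the state cannot be determined. The helper names are hypothetical. It prints the clocksource the kernel actually chose rather than assuming it is pit, because the kernel may fall back to another clocksource if the requested one is unavailable.

```python
import shutil
import subprocess
from pathlib import Path

# ntpstat exit codes, as documented in its man page.
NTPSTAT_STATES = {0: "synchronised", 1: "not synchronised", 2: "indeterminate"}


def has_boot_arg(cmdline: str, arg: str = "clocksource=pit") -> bool:
    """True if the kernel command line contains arg as a whole token."""
    return arg in cmdline.split()


def describe_ntpstat(returncode: int) -> str:
    """Map an ntpstat exit status to a readable state."""
    return NTPSTAT_STATES.get(returncode, f"unknown ({returncode})")


def main() -> None:
    cmdline = Path("/proc/cmdline")
    current = Path("/sys/devices/system/clocksource/clocksource0/current_clocksource")
    if cmdline.exists():
        print("clocksource=pit on kernel command line:", has_boot_arg(cmdline.read_text()))
    if current.exists():
        print("active clocksource:", current.read_text().strip())
    if shutil.which("ntpstat"):
        rc = subprocess.run(["ntpstat"], capture_output=True).returncode
        print("ntpstat:", describe_ntpstat(rc))


if __name__ == "__main__":
    main()
```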
Attachment #779863 - Flags: review?(rail)
Attachment #779863 - Flags: review?(rail) → review+
ntpstat exists and reports offsets of 30-70ms for buildbot-master81 through 89. Just the docs at https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Modules/ntp left to update now?
Thanks for the reminder on that :)
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED