Closed
Bug 894201
Opened 12 years ago
Closed 12 years ago
Time drifting on buildbot-master81.srv.releng.scl3
Categories
(Infrastructure & Operations :: RelOps: Puppet, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Assigned: dustin)
Details
Attachments
(2 files)
1.02 KB, patch | rail: review+ | dustin: checked-in+
3.67 KB, patch | rail: review+ | dustin: checked-in+
eg
Mon 19:35:14 PDT [490] buildbot-master81.srv.releng.scl3.mozilla.com:ntp time is WARNING: NTP WARNING: Offset 32.10488713 secs
vmtoolsd is running but the VM config magic to sync to the host clock may be missing.
catlee has fixed it up for the moment. Since this is a critical component in scheduling buildbot jobs please talk to catlee or #releng before making any changes. We should also check buildbot-master82 through 90 (??) which are also ESX VMs in scl3.
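For anyone checking the other masters, here is a minimal sketch of pulling the offset out of an alert line like the one above. The helper names and the warn/crit thresholds are made up for illustration; they are not the real Nagios settings.

```python
import re

# Hypothetical helper: matches the "Offset <seconds> secs" part of the alert text.
OFFSET_RE = re.compile(r"Offset\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s+secs")


def parse_ntp_offset(alert_line):
    """Return the reported offset in seconds as a float, or None if absent."""
    match = OFFSET_RE.search(alert_line)
    return float(match.group(1)) if match else None


def classify_offset(offset_secs, warn=1.0, crit=60.0):
    """Classify by absolute offset. Thresholds are placeholders (assumption)."""
    magnitude = abs(offset_secs)
    if magnitude >= crit:
        return "CRITICAL"
    if magnitude >= warn:
        return "WARNING"
    return "OK"


line = ("Mon 19:35:14 PDT [490] buildbot-master81.srv.releng.scl3.mozilla.com:"
        "ntp time is WARNING: NTP WARNING: Offset 32.10488713 secs")
offset = parse_ntp_offset(line)
print(offset, classify_offset(offset))
```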
Comment 1•12 years ago
These guys were the ones that were rekicked off the templates in 867593.
You should add clocksource=pit to the kernel command line in grub.conf and reboot to pick it up at your earliest convenience.
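A rough sketch of that edit, assuming a legacy GRUB grub.conf where each boot entry has a line starting with `kernel`. The function name and the sample entry (kernel version, root device) are invented for illustration, and this is not the Puppet patch attached to this bug.

```python
def add_kernel_arg(grub_conf_text, arg="clocksource=pit"):
    """Append arg to every active 'kernel' line that does not already carry it."""
    patched = []
    for line in grub_conf_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        words = body.split()
        # Commented-out entries start with '#kernel' or '#', so they are skipped.
        if words and words[0] == "kernel" and arg not in words:
            body = body.rstrip() + " " + arg
        patched.append(body + ending)
    return "".join(patched)


# Made-up sample entry, not copied from any real master.
sample = (
    "default=0\n"
    "title Red Hat Enterprise Linux\n"
    "\troot (hd0,0)\n"
    "\tkernel /vmlinuz-2.6.32 ro root=/dev/sda1 quiet\n"
    "\tinitrd /initramfs-2.6.32.img\n"
)
patched = add_kernel_arg(sample)
print(patched)
```

Running it a second time leaves the text unchanged, so it is safe to apply repeatedly. The new argument only takes effect after a reboot.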
| Reporter | ||
Comment 2•12 years ago
Thanks, let's move this back to one of our components.
RelEng, looks like we're not managing grub.conf in PuppetAgain.
Assignee: server-ops-virtualization → nobody
Severity: normal → critical
Component: Server Operations: Virtualization → Release Engineering: Machine Management
QA Contact: dparsons → armenzg
| Assignee | ||
Comment 3•12 years ago
Thanks, Greg! Who can pick this up?
Component: Release Engineering: Machine Management → RelOps: Puppet
Product: mozilla.org → Infrastructure & Operations
QA Contact: armenzg → dustin
| Assignee | ||
Comment 5•12 years ago
Attachment #776458 -
Flags: review?(nthomas)
| Reporter | ||
Comment 6•12 years ago
Comment on attachment 776458 [details] [diff] [review]
bug894201.patch
Passing this over to Rail, who will have more background on it. One query though: will this affect non-ESX machines too? Perhaps a condition like
if ($::virtual != "vmware") {
is more specific. That's from http://hg.mozilla.org/build/puppet/file/826ad3caf69e/modules/ntp/manifests/daemon.pp#l7
Attachment #776458 -
Flags: review?(nthomas) → review?(rail)
| Assignee | ||
Comment 7•12 years ago
Comment on attachment 776458 [details] [diff] [review]
bug894201.patch
That condition is just a little farther up in the original:
https://github.com/djmitche/releng-puppet/blob/master/modules/hardware/manifests/init.pp#L35
Comment 8•12 years ago
Comment on attachment 776458 [details] [diff] [review]
bug894201.patch
(In reply to Nick Thomas [:nthomas] from comment #6)
> Comment on attachment 776458 [details] [diff] [review]
> bug894201.patch
>
> Passing this over to Rail, who will have more background on it. One query
> though, will this affect non-ESX machines too ? Perhaps a condition like
> if ($::virtual != "vmware") {
> is more specific. That's from
> http://hg.mozilla.org/build/puppet/file/826ad3caf69e/modules/ntp/manifests/
> daemon.pp#l7 is
http://hg.mozilla.org/build/puppet/file/826ad3caf69e/modules/hardware/manifests/init.pp#l35 checks for that already.
Attachment #776458 -
Flags: review?(rail) → review+
| Assignee | ||
Updated•12 years ago
Attachment #776458 -
Flags: checked-in+
| Assignee | ||
Comment 9•12 years ago
I rebooted buildbot-master{82..89}. Catlee's going to reboot 81 now, since it's the scheduler master.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 10•12 years ago
This is manifesting again on bm81....
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 11•12 years ago
install ntp
Comment 12•12 years ago
(In reply to Dan Parsons [:lerxst] from comment #11)
> install ntp
Because Lerxst didn't explain :(...
Per IRC, I was explaining why we don't have ntp installed/running in VMware: VMware breaks ntp's assumptions, so ntp doesn't help us. Lerxst said that's wrong "because we have it everywhere" (I asked him to comment here, hoping he'd explain why that's wrong and why he recommended installing ntp).
I don't feel comfortable making a one-off change without the "why" understood by people closer to the issue though.
Comment 13•12 years ago
[18:13:08] atoll http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
[18:13:12] atoll > Note: VMware recommends you to use NTP instead of VMware Tools periodic time synchronization. NTP is an industry standard and ensures accurate timekeeping in your guest. You may have to open the firewall (UDP 123) to allow NTP traffic.
[18:13:32] atoll > The configuration directive tinker panic 0 instructs NTP not to give up if it sees a large jump in time.
[18:14:00] atoll Callek: please tell "multiple people" to confirm their assumptions with the virtualization team, since they may have out-of-date/inaccurate information about NTP.
:dustin, :gcox (others) what say you?
Flags: needinfo?(gcox)
Flags: needinfo?(dustin)
Comment 14•12 years ago
Pretty sure I remember ntpd running when I looked earlier, but I could be mistaken since it's chkconfig'ed off on 81.
It's on on other VMs like admin1a/admin1b/ns1/nagios1.private.releng, so, you've shirley got precedent. Config it and fire it up.
Flags: needinfo?(gcox)
| Reporter | ||
Comment 15•12 years ago
I ran 'ntpdate ntp.build.mozilla.org' in the meantime.
| Reporter | ||
Comment 16•12 years ago
Here's the VMware doc to back up lerxst's comment:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
buildbot-master81 is RHEL 6.2, 64bit, SMP w/ processors. The output of 'vmware-toolbox-cmd timesync status' is 'Disabled'. Presumably 82 through 89 are the same.
If the restriction at
http://mxr.mozilla.org/build/source/puppet/modules/ntp/manifests/daemon.pp#7
is removed it looks like we'll get
http://mxr.mozilla.org/build/source/puppet/modules/ntp/templates/ntp.conf.erb
which lines up pretty closely with what VMware recommends. I think we just need to figure out whether any other machines would also be affected by that change, and how to remove the clocksource=pit.
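As a quick sanity check on the rendered config, here is a small sketch that tests whether an ntp.conf contains the `tinker panic 0` directive quoted from the VMware KB in comment 13. The function name is hypothetical and the sample config is made up apart from the server name mentioned in comment 15; it does not reproduce ntp.conf.erb.

```python
def has_tinker_panic_zero(ntp_conf_text):
    """Return True if an active 'tinker' line sets panic to 0."""
    for raw in ntp_conf_text.splitlines():
        # Drop trailing comments, then split into whitespace-separated words.
        words = raw.split("#", 1)[0].split()
        if words[:1] != ["tinker"]:
            continue
        # tinker takes name/value pairs, e.g. "tinker panic 0".
        for name, value in zip(words[1::2], words[2::2]):
            if name != "panic":
                continue
            try:
                if float(value) == 0:
                    return True
            except ValueError:
                pass  # malformed value; keep looking
    return False


# Made-up sample config for illustration only.
sample_conf = (
    "# keep ntpd running through large time jumps\n"
    "tinker panic 0\n"
    "server ntp.build.mozilla.org\n"
)
print(has_tinker_panic_zero(sample_conf))
```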
| Assignee | ||
Comment 17•12 years ago
I'm certainly not an expert here, but I'll be happy to review a patch. We just *added* clocksource=pit, so now we'll remove it??
Flags: needinfo?(dustin)
| Assignee | ||
Comment 18•12 years ago
OK, so I have it on good authority that we should have *both* clocksource=pit and be running ntp.
| Assignee | ||
Comment 19•12 years ago
Attachment #779863 -
Flags: review?(rail)
Updated•12 years ago
Attachment #779863 -
Flags: review?(rail) → review+
| Assignee | ||
Comment 20•12 years ago
Comment on attachment 779863 [details] [diff] [review]
bug894201-p2.patch
Checked in yesterday:
http://hg.mozilla.org/build/puppet/rev/420e6c49c922
Attachment #779863 -
Flags: checked-in+
| Reporter | ||
Comment 21•12 years ago
ntpstat exists and reports offsets of 30-70ms for buildbot-master81 through to 89. Just the docs at https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Modules/ntp to update now?
| Assignee | ||
Comment 22•12 years ago
Thanks for the reminder on that :)
| Assignee | ||
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED