Closed Bug 1257171 Opened 8 years ago Closed 8 years ago

Ensure to use internal NTP servers for all Windows and Linux VMs

Categories

(Mozilla QA Graveyard :: Infrastructure, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: whimboo)

Details

Similar to bug 1217564 for OS X. I noticed that some of our Windows VMs have a clock drift of about half a minute. mm-win7-64-1 is even off by 54s, and this is causing trouble with lots of remote hosts.
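A minimal sketch of how such a drift could be flagged, assuming the offset in seconds has already been obtained from some time query. The function name drift_exceeds and the 30s threshold in the example are hypothetical and not taken from this bug.

```shell
#!/bin/sh
# Succeed (exit 0) when the absolute clock offset exceeds a threshold.
# Both arguments are in seconds; the offset may be negative or fractional.
# drift_exceeds is a hypothetical helper name used only for illustration.
drift_exceeds() {
    awk -v off="$1" -v max="$2" 'BEGIN {
        off += 0; max += 0          # force numeric interpretation
        if (off < 0) off = -off     # compare the absolute value
        exit !(off > max)           # exit 0 when the drift is too large
    }'
}

# Example: the 54s reported for mm-win7-64-1 against an assumed 30s limit
#   drift_exceeds 54 30 && echo "clock drift too large"
```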

Problems occur e.g. when using the default NTP server time.windows.com: no connection can be made to this host. With our local NTP server (10.22.75.40) everything works fine.

Chris, what is the recommended setting for NTP in Windows VMs? I find the above kind of suspicious given that we haven't seen such problems in all these years. Has something changed?
Flags: needinfo?(cknowles)
This sounds like a firewall blocking the request - the "local works, but external doesn't" certainly points that way.  I'm unaware of any infra changes in this respect, but I'm not in the path of firewall changes, and might not know about them.

Windows has poor troubleshooting tools, so I'm looking at your Linux boxes to help nail this down.

Doing ntpdate <server> works only if it's internal - any external source I use is timing out - tried windows and nist.  UDP is hard to test - can't telnet, and nc gives little information as to why it failed.
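A minimal sketch of the reachability check described above, assuming the classic ntpdate tool is installed. The -q (query only), -u (unprivileged source port) and -t (timeout in seconds) options are ntpdate options. The function name check_ntp_servers and the NTP_QUERY override variable are hypothetical and only there so the probe can be swapped out. The sketch also assumes that ntpdate exits non-zero when no server answers.

```shell
#!/bin/sh
# Probe each given NTP server with a query-only request and report
# whether it answered. The clock is never changed by this check.
# NTP_QUERY is a hypothetical hook to replace the probe command.
NTP_QUERY="${NTP_QUERY:-ntpdate -q -u -t 2}"

check_ntp_servers() {
    for server in "$@"; do
        # Intentionally unquoted so the command and its options split.
        if $NTP_QUERY "$server" >/dev/null 2>&1; then
            echo "$server reachable"
        else
            echo "$server unreachable"
        fi
    done
}

# Example: compare an internal and an external time source
#   check_ntp_servers 10.22.75.40 time.windows.com
```

If the internal server answers and the external one does not, that supports the firewall theory without needing telnet or nc.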

I can see two paths forward: 1) set the Windows boxes to use the internal time source, or 2) get netops (possibly by way of opsec?) to open access from all your VLANs to time.windows.com on UDP port 123.
Flags: needinfo?(cknowles)
Talked with Q on IRC and he proposed making use of the internal NTP servers. We should update this for everything, which also includes Linux. OS X was already done via bug 1217564.
Summary: Windows VMs have connection problems with remote NTP servers → Ensure to use internal NTP servers for all Windows and Linux VMs
Let me know the specific path you'd like to take and I'm happy to make the changes on the templates.
I have added a new batch file on fs1 which sets the local NTP servers and restarts the service. The script can be found here:

\\fs1.qa.scl3.mozilla.com\config\set_ntp_hosts.cmd

Chris, I think we can defer any work on templates until May when I do a general update of all systems. I can take care of those things then. For now I will update the active VMs.
I noticed that our Jenkins master machines also do not have NTP installed and show a difference of >1min right now. I will have to update those as well.

The command to use is: sudo nano /etc/ntp.conf && sudo service ntp reload && sudo ntpd -q -g -x -n

In the editor, we add the following entries to the config file:

server 10.22.75.40 iburst
server 10.22.75.41 iburst
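For updating many hosts, the manual edit in nano could be replaced by a non-interactive step. The following is only a sketch under that assumption: the function name add_ntp_servers is hypothetical, and it appends each entry only if that exact line is not already in the file, so running it twice does no harm.

```shell
#!/bin/sh
# Append "server <addr> iburst" lines to an ntp.conf-style file,
# skipping entries that are already present.
# add_ntp_servers is a hypothetical helper name used for illustration.
add_ntp_servers() {
    conf="$1"
    shift
    for addr in "$@"; do
        line="server $addr iburst"
        # -x matches the whole line, -F treats the pattern as a fixed string
        grep -qxF "$line" "$conf" 2>/dev/null || echo "$line" >> "$conf"
    done
}

# Example (the real file needs root, so run the whole script via sudo):
#   add_ntp_servers /etc/ntp.conf 10.22.75.40 10.22.75.41
```

The service still has to be reloaded afterwards, as in the command above.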

Everything has already been updated in the ESX documentation.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Every host is now in sync again for mozmill-ci staging:
http://mm-ci-staging.qa.scl3.mozilla.com:8080/computer/
I updated the Ubuntu VM for mozmill-ci production, which corrected its time. It looks way better now, even without any slaves being updated.

http://mm-ci-production.qa.scl3.mozilla.com:8080/computer/

I will refrain from updating all production slave nodes now. I will wait for a reply from Chris on bug 1258736. There might be a more elegant way to get this solved for all other machines.

I'm closing this bug as fixed now.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Mozilla QA → Mozilla QA Graveyard