Closed Bug 396253 Opened 12 years ago Closed 12 years ago

build-console.build.mozilla.org clock is 25mins slow

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Unassigned)

During the FF2007rc2 release, we discovered that the clock on build-console.build.mozilla.org was 25 minutes slower than the clocks on the slaves. I just checked, and build-console is also 25 minutes slower than my machine:

h-132:~ john$ date
Fri Sep 14 18:58:11 PDT 2007
h-132:~ john$ ssh -l cltbld build-console.build.mozilla.org
cltbld@build-console.build.mozilla.org's password: 
Last login: Fri Sep 14 16:48:35 2007 from cm-vpn01.mozilla.org
[cltbld@build-console ~]$ 
[cltbld@build-console ~]$ date
Fri Sep 14 18:33:10 PDT 2007
[cltbld@build-console ~]$
Does this mean build-console is not configured to use ntp?
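
For reference, the gap between the two `date` readings above can be worked out mechanically. This is only a minimal sketch: the helper name `parse_date_output` is made up for illustration, and it assumes both readings carry the same zone abbreviation (PDT here), so the zone field is simply dropped before parsing.

```python
from datetime import datetime


def parse_date_output(line: str) -> datetime:
    """Parse `date` output such as 'Fri Sep 14 18:58:11 PDT 2007'.

    Hypothetical helper: the zone abbreviation (fifth field) is dropped,
    which is only safe because both readings use the same zone.
    """
    fields = line.split()
    without_zone = " ".join(fields[:4] + fields[5:])
    return datetime.strptime(without_zone, "%a %b %d %H:%M:%S %Y")


# The two readings copied from the transcript above.
local = parse_date_output("Fri Sep 14 18:58:11 PDT 2007")
remote = parse_date_output("Fri Sep 14 18:33:10 PDT 2007")

lag = local - remote
print(lag)  # 0:25:01
```

The remote reading was taken after the ssh login completed, so the real lag is somewhat larger than this figure.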
Priority: -- → P3
IIRC ntp doesn't work very well in a VM, and it's recommended to let the VMware tools keep the clock in sync with the host hardware. Do I have that right, Paul?
vmware-guestd seems to be running on build-console, although I didn't start up X to see what settings are enabled. Also I checked that clock=pit is one of the kernel arguments in /boot/grub/grub.conf.

Interestingly, staging-build-console is fine. build-console is on bm-vmware08 (where it is named build-console-local), while staging-build-console is on bm-vmware01. Time lags for the VMs on those hosts (L = Linux, W = Windows):
 
  bm-vmware08 (host, L):          correct time
  build-console-local (L):        25 min 10 sec slow
  cerberus-vm (W):                40 seconds fast
  fxexp-win32-tbox (W):           40 seconds fast
  production-pacifica-vm (W):     36 seconds fast
  production-prometheus-vm (L):   7 seconds fast
  staging-pacifica-vm (W):        40 seconds fast
  
  bm-vmware01 (host, L):          correct
  fxdbug-linux-tbox (L):          1 second slow
  l10n-linux-tbox (L):            correct
  staging-build-console (L):      1 second slow
  staging-prometheus-vm (L):      correct

These are relative to my laptop time, which was synced up to the Apple time server before starting.

It's interesting that all the Windows boxes on -08 are also out (by about 40 seconds), and the other Linux box is a little off, which suggests something on the host. When I had a look in the logs on bm-vmware08 I couldn't see anything obvious. This really needs someone with more ESX/VMware experience to take it forward now.
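
The per-host pattern in the table can be made explicit with a short sketch. The offsets are copied from the measurements above (in seconds, positive meaning fast); the 5-second tolerance and the name `drifting` are assumptions chosen for illustration only.

```python
# Offsets in seconds relative to the laptop clock, taken from the table above.
# Positive = fast, negative = slow.
offsets = {
    "bm-vmware08": {
        "build-console-local": -(25 * 60 + 10),
        "cerberus-vm": 40,
        "fxexp-win32-tbox": 40,
        "production-pacifica-vm": 36,
        "production-prometheus-vm": 7,
        "staging-pacifica-vm": 40,
    },
    "bm-vmware01": {
        "fxdbug-linux-tbox": -1,
        "l10n-linux-tbox": 0,
        "staging-build-console": -1,
        "staging-prometheus-vm": 0,
    },
}

TOLERANCE_SECONDS = 5  # hypothetical threshold, not from the bug


def drifting(guests, tolerance):
    """Return the guests whose absolute offset exceeds the tolerance."""
    return sorted(name for name, off in guests.items() if abs(off) > tolerance)


for host, guests in offsets.items():
    print(host, drifting(guests, TOLERANCE_SECONDS))
```

With that tolerance every guest on bm-vmware08 is flagged and none on bm-vmware01, which matches the suspicion that the cause is specific to that host.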
On bm-vmware08 (or in VI if the VM is shut down), you should check whether tools.syncTime is set to "TRUE". If it's not, you'll need to shut down the VM, set it, and then restart the VM.

Main things to remember:
* Only run ntp on the vmware host machine (never on the VMs)
* Install VMware guest tools on all VMs
* Make sure to set clock=pit in grub config
* Set tools.syncTime to TRUE for all VMs before you start them (this can only be done while the VM is shut down)
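
The clock=pit item in the list above could be checked with something like the following. It is a rough sketch that assumes a legacy GRUB config where each boot entry has a line starting with `kernel`; the function name and the sample config text are hypothetical, not taken from any of these machines.

```python
def kernel_lines_missing_arg(grub_conf: str, arg: str = "clock=pit"):
    """Return the `kernel` lines in a grub.conf that lack the given argument."""
    missing = []
    for line in grub_conf.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip comments
        fields = stripped.split()
        if fields and fields[0] == "kernel" and arg not in fields[1:]:
            missing.append(stripped)
    return missing


# Hypothetical sample config, for illustration only.
sample = """\
default=0
title Example (with clock=pit)
    kernel /vmlinuz-example ro root=LABEL=/ clock=pit
title Example (without)
    kernel /vmlinuz-example ro root=LABEL=/
"""

print(kernel_lines_missing_arg(sample))
```

On a real guest the text would come from /boot/grub/grub.conf, e.g. `open("/boot/grub/grub.conf").read()`.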
build-console-local has tools.syncTime = "FALSE", so that looks like the problem. All the other boxes on bm-vmware08 have the value TRUE.

Once the builds for 2.0.0.7 RC2 are done we could fix the config. The setting is made at
  Select VM in VI client
    > Edit Settings
      > Options
      > Configuration Parameters
        > Add Row, Name: tools.syncTime, Value: TRUE
Grepping over the storage for the VMs, these other boxes also have the problem (vmx file - VI display name):

build-console.vmx            - build-console (off)
crazyhorse.vmx               - crazyhorse (28 seconds fast)
fxnewref-linux-tbox (1).vmx  - mrz-fxnewref-linux-tbox-iscsi (off)

I went ahead and fixed these ones. 
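
A scan like the one described above could also be scripted. The sketch below assumes the .vmx files hold the setting as a line of the form `tools.syncTime = "FALSE"` (the form quoted earlier in this bug); the function names and the storage path are placeholders. A file with no tools.syncTime line at all is reported too, on the assumption that it deserves a manual look.

```python
import re
from pathlib import Path
from typing import List, Optional

# Matches a line such as: tools.syncTime = "TRUE"
_SYNC_RE = re.compile(
    r'^\s*tools\.syncTime\s*=\s*"([^"]*)"\s*$',
    re.IGNORECASE | re.MULTILINE,
)


def sync_time_setting(vmx_text: str) -> Optional[bool]:
    """Return True/False for the tools.syncTime value, or None if absent."""
    match = _SYNC_RE.search(vmx_text)
    if match is None:
        return None
    return match.group(1).strip().upper() == "TRUE"


def find_unsynced(storage_root: str) -> List[str]:
    """List .vmx files under storage_root where syncTime is not TRUE."""
    suspect = []
    for vmx in sorted(Path(storage_root).rglob("*.vmx")):
        if sync_time_setting(vmx.read_text(errors="replace")) is not True:
            suspect.append(str(vmx))
    return suspect


if __name__ == "__main__":
    # Placeholder path; substitute wherever the VM storage is mounted.
    for path in find_unsynced("/path/to/vm/storage"):
        print(path)
```

This only reads the files. Changing the value still has to happen with the VM shut down, as noted earlier.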

Corrected steps are:
  Select VM in VI client
    > Edit Settings
      > Options
      > Advanced   Logging
      > Configuration Parameters
        > Change value for tools.syncTime from FALSE to TRUE
(In reply to comment #4)
> Once the builds for 2.0.0.7 RC2 are done we could fix the config.

Yeah, I agree with holding off until after we finish the 2007 release. Nice background investigation though, good to know. Thanks!
Assignee: build → nobody
QA Contact: mozpreed → build
Assignee: nobody → nrthomas
Priority: P3 → P2
Shut down the VM and made the change in comment #5, but the time was still about 30 seconds fast after restarting it (the same as before I started; someone must have reset it previously).

When I started VNC and ran vmware-toolbox, it had time syncing turned on; there were no running ntp processes.

For 2.0.0.9, I've set the time manually. We can revisit this afterwards.
Assignee: nrthomas → nobody
Priority: P2 → P3
Seems like it's been ok for a while now, looks spot-on at the moment.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering