Closed Bug 457770 Opened 16 years ago Closed 16 years ago

several tinderboxes with bad date/time

Categories

(Release Engineering :: General, defect, P2)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aja+bugzilla, Assigned: joduinn)

Details

Several linux-slave boxen have wrong date/time.
Possibly fallout from recent kernel upgrades.
Aja originally filed this because the buildids of the 27th and 28th TraceMonkey nightlies for Linux are wrong. Upon investigation:

moz2-linux-slave01 : Mon Sep 29 15:18:00 PDT 2008
moz2-linux-slave02 : Mon Sep 29 15:18:23 PDT 2008
moz2-linux-slave03 : Mon Sep 29 15:18:34 PDT 2008
moz2-linux-slave04 : Mon Sep 29 00:22:04 PDT 2008
moz2-linux-slave05 : Mon Sep 29 15:18:52 PDT 2008
moz2-linux-slave06 : Mon Sep 29 08:23:16 PDT 2008
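Spotting the outliers in a survey like the one above can be done mechanically: convert each reported time of day to seconds and flag any host that is more than a few minutes away from a reference box. This is a minimal sketch, assuming the `host : date` line format shown above, that every host reports the same day and time zone, and that the first line (slave01) is the reference. The helper name `flag_skewed_hosts` and the 300-second default threshold are hypothetical choices, not something from this bug.

```shell
#!/usr/bin/env bash
# flag_skewed_hosts (hypothetical helper): read lines of the form
#   "host : Mon Sep 29 15:18:00 PDT 2008"
# on stdin and print every host whose time of day differs from the FIRST
# host's by more than $1 seconds (default 300).
# Assumption: all lines report the same day and the same time zone.
flag_skewed_hosts() {
    local max_skew="${1:-300}"
    awk -v max="$max_skew" '
        {
            # Field 6 is HH:MM:SS in this format.
            split($6, t, ":")
            secs = t[1] * 3600 + t[2] * 60 + t[3]
            if (NR == 1) ref = secs      # first host is the reference clock
            diff = secs - ref
            if (diff < 0) diff = -diff
            if (diff > max) print $1
        }'
}
```

The threshold absorbs the few seconds of lag from collecting the dates one host at a time, e.g. with something like `for h in moz2-linux-slave0{1..6}; do echo "$h : $(ssh root@$h date)"; done | flag_skewed_hosts`. Against the list above it would flag slave04 and slave06.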

Both of those nightlies were built on slave06. It looks like the two machines with the updated kernel are at fault, perhaps because we need to use clocksource=pit instead of clock=pit as a kernel boot arg.
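If the clock=pit theory is right, the fix is a one-word change on the kernel command line. This is a minimal sketch, assuming a GRUB-legacy style `kernel ...` line as used on CentOS/RHEL 5 (normally in /boot/grub/grub.conf). The helper name `fix_clock_arg` is hypothetical, and it only rewrites the text it is given; it does not edit any file.

```shell
#!/usr/bin/env bash
# fix_clock_arg (hypothetical helper): read grub.conf-style text on stdin and
# replace a standalone "clock=pit" boot argument with "clocksource=pit".
# The match is anchored on whitespace or line boundaries, so only a
# standalone clock=pit argument is rewritten.
fix_clock_arg() {
    sed -E 's/(^|[[:space:]])clock=pit([[:space:]]|$)/\1clocksource=pit\2/g'
}
```

To review the change before applying it by hand, something like `fix_clock_arg < /boot/grub/grub.conf | diff /boot/grub/grub.conf -` shows exactly which lines would differ.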

But weren't both of these going to be downgraded, John?
(In reply to comment #1)
> Aja, originally filed this because the buildid's of the 27th & 28th tracemonkey
> nightlies for linux are wrong. Upon investigation:
> 
> moz2-linux-slave01 : Mon Sep 29 15:18:00 PDT 2008
> moz2-linux-slave02 : Mon Sep 29 15:18:23 PDT 2008
> moz2-linux-slave03 : Mon Sep 29 15:18:34 PDT 2008
> moz2-linux-slave04 : Mon Sep 29 00:22:04 PDT 2008
> moz2-linux-slave05 : Mon Sep 29 15:18:52 PDT 2008
> moz2-linux-slave06 : Mon Sep 29 08:23:16 PDT 2008
> 
> Both those nightlies were done on 06. Looks like the two machines that have the
> kernel updated are at fault, perhaps because we need to use clocksource=pit
> instead of clock=pit as a kernel boot arg. 
Urgh. Sorry. 


> But, both of these were getting downgraded John ?
Actually, when I talked with bhearsum last week, he was in favor of leaving slave04 and slave06 as they are and upgrading the other VMs to match.

I don't mind whether we upgrade them all or downgrade them all. I just want all the slaves to be on the same kernel version.

Grabbing this bug, as I caused it by starting the kernel updates for the mobile work.
Assignee: nobody → joduinn
Blocks: 430200
Priority: -- → P2
(In reply to comment #2)
> > But, both of these were getting downgraded John ?
> Actually, talking with bhearsum last week, he was in favor of leaving
> slave04,06 and upgrading the other VMs to match.
> 

IIRC, upgrading the kernel allowed us to use the 'vm.mmap_min_addr = 4096' option. If we need that option, we need to upgrade the kernels. If we don't need or want it, let's downgrade the two slaves instead of upgrading the rest.
A quick survey of the build & unittest slaves shows the following kernel versions in use:
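Whether a running kernel exposes that setting can be checked without a reboot, because sysctl settings under `vm.` appear as files under /proc/sys/vm/. This is a minimal sketch. The function name `mmap_min_addr_status` is hypothetical, and its optional path argument exists only so the check can be exercised against a test file.

```shell
#!/usr/bin/env bash
# mmap_min_addr_status (hypothetical helper): report whether the running
# kernel exposes vm.mmap_min_addr and, if so, its current value.
# $1 optionally overrides the /proc path (useful for testing).
mmap_min_addr_status() {
    local path="${1:-/proc/sys/vm/mmap_min_addr}"
    if [ -r "$path" ]; then
        echo "supported: $(cat "$path")"
    else
        echo "unsupported"
    fi
}
```

On a kernel that has it, `sysctl -w vm.mmap_min_addr=4096` sets the value for the current boot, and a `vm.mmap_min_addr = 4096` line in /etc/sysctl.conf makes it persist across reboots.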

moz2-linux-slave1.build.mozilla.org  2.6.18-53.1.6.el5
moz2-linux-slave02.build.mozilla.org 2.6.18-53.1.21.el5
moz2-linux-slave03.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave04.build.mozilla.org 2.6.18-92.1.10.el5
moz2-linux-slave05.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave06.build.mozilla.org 2.6.18-92.1.10.el5
moz2-linux-slave07.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave08.build.mozilla.org 2.6.18-53.1.19.el5 
moz2-linux-slave09.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave10.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave11.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave12.build.mozilla.org 2.6.18-53.1.19.el5

From this, it looks like slave1, slave02, slave04 and slave06 will need kernel version changes to bring them to 2.6.18-53.1.19.el5; all the other VMs are fine.
moz2-linux-slave04 is now booted into the older (preexisting) kernel. I tried to upgrade the kernel to 2.6.18-53.1.19.el5 using these instructions:
https://bugzilla.mozilla.org/show_bug.cgi?id=407796#c78
...but hit:
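The survey and the comparison against the target version can be scripted. This is a sketch under these assumptions: `uname -r` is run over ssh as root on each slave, the caller passes in the host list, and the helper names `survey_kernels` and `hosts_needing_change` are hypothetical.

```shell
#!/usr/bin/env bash
# survey_kernels (hypothetical helper): print "host kernel-version" for each
# host given as an argument, running uname -r remotely over ssh.
survey_kernels() {
    local h
    for h in "$@"; do
        printf '%s %s\n' "$h" "$(ssh "root@$h" uname -r)"
    done
}

# hosts_needing_change (hypothetical helper): read "host version" lines on
# stdin and print the hosts whose version differs from the target in $1.
hosts_needing_change() {
    awk -v target="$1" 'NF >= 2 && $2 != target { print $1 }'
}
```

For example, `survey_kernels moz2-linux-slave02.build.mozilla.org moz2-linux-slave03.build.mozilla.org | hosts_needing_change 2.6.18-53.1.19.el5` lists only the hosts that still need a kernel change. Keeping the ssh step separate from the comparison means the comparison can also be run over a saved survey.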

# yumdownloader kernel-2.6.18-53.1.19.el5
Loading "fastestmirror" plugin
Loading mirror speeds from cached hostfile
 * base: mirror.its.uidaho.edu
 * updates: pubmirrors.reflected.net
 * addons: mirror.hmc.edu
 * extras: ftp.osuosl.org
No Match for argument kernel-2.6.18-53.1.19.el5
Nothing to download
#

These cut-and-paste instructions *used* to work; I've used them before. Is it possible that this kernel is now too old and has been aged off the mirrors? Or am I missing something?
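One way to answer that is to ask yum which kernel builds the configured repositories still carry and look for the exact version. This is a sketch, assuming a yum release that supports `--showduplicates` (without it, `yum list` shows only the newest build) and the usual three-column `yum list` output of name.arch, version-release and repository. The helper name `kernel_version_listed` is hypothetical.

```shell
#!/usr/bin/env bash
# kernel_version_listed (hypothetical helper): read `yum list` output on
# stdin and succeed (exit status 0) if a kernel.<arch> line carries the
# version-release given in $1.
kernel_version_listed() {
    awk -v want="$1" '$1 ~ /^kernel\./ && $2 == want { found = 1 } END { exit !found }'
}
```

Typical use would be `yum --showduplicates list kernel | kernel_version_listed 2.6.18-53.1.19.el5 || echo "not available"`. Installed packages appear in the same listing, so a match on an "installed" line does not by itself prove the mirrors still carry that build.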
(In reply to comment #5)
> These instructions cut-and-paste *used* to work; I've used them before. Is it
> possible that this kernel is now too old and has now been aged off? Or am I
> missing something?

I would guess that that's the case.
On moz2-linux-slave07, all of the necessary RPMs are in root's home directory. To install them, do the following on each offending slave:
scp root@moz2-linux-slave07:*.rpm .
rpm --force -U kernel*.rpm

Reboot.
As root, while in the /root directory, I followed the steps in comment #7. After the reboot, I confirmed that the kernel version was correct.
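The post-reboot check amounts to comparing `uname -r` against the expected version. This is a sketch. `check_running_kernel` is a hypothetical name, and its optional second argument stands in for `uname -r` so the comparison can be tested without rebooting anything.

```shell
#!/usr/bin/env bash
# check_running_kernel (hypothetical helper): succeed if the running kernel
# (or the version passed as $2, for testing) matches the expected version $1.
check_running_kernel() {
    local want="$1"
    local have="${2:-$(uname -r)}"
    if [ "$have" = "$want" ]; then
        echo "ok: running $have"
    else
        echo "mismatch: running $have, expected $want" >&2
        return 1
    fi
}
```

Run as `check_running_kernel 2.6.18-53.1.19.el5` on each slave after it comes back up; a non-zero exit status means the box booted into a different kernel.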

Noticed that the time/date was still wrong. Enabled VMware Tools on moz2-linux-slave04 and confirmed that the time/date is now correct. After finishing the VMware Tools install, I had to manually restart the network connection on the console using:
# /etc/init.d/network start


moz2-linux-slave04 is now fixed.
moz2-linux-slave06.build.mozilla.org now fixed.
moz2-linux-slave02.build.mozilla.org now fixed.
OK, kernels are now consistent across the Linux slaves in the m-c pool and the Linux ref image, and all clocks are now running correctly.

moz2-linux-slave1.build.mozilla.org being deleted in bug#460887.
moz2-linux-slave01.build.mozilla.org being created in bug#460889.

moz2-linux-slave02.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave03.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave04.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave05.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave06.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave07.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave08.build.mozilla.org 2.6.18-53.1.19.el5 
moz2-linux-slave09.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave10.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave11.build.mozilla.org 2.6.18-53.1.19.el5
moz2-linux-slave12.build.mozilla.org 2.6.18-53.1.19.el5
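A final consistency check over a survey like this reduces to counting distinct versions. This is a sketch, assuming the same `host version` line format as the list above. `pool_is_consistent` is a hypothetical name.

```shell
#!/usr/bin/env bash
# pool_is_consistent (hypothetical helper): read "host version" lines on
# stdin and succeed only if exactly one distinct version appears.
# Blank or one-field lines are ignored; empty input counts as inconsistent.
pool_is_consistent() {
    awk 'NF >= 2 { print $2 }' | sort -u | awk 'END { exit (NR != 1) }'
}
```

Because it only reads text, the same check works on a pasted survey or on live output piped in from the slaves.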

Closing.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering