Closed Bug 373235 Opened 17 years ago Closed 17 years ago

Build farm DST issues

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: preed, Assigned: mrz)

Details

We'll have to deal with the same DST issues (http://en.wikipedia.org/wiki/Energy_Policy_Act_of_2005#Change_to_daylight_saving_time) in the build farm as well.
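For reference, here is what changed. Under the Energy Policy Act of 2005, starting in 2007 US DST begins on the second Sunday in March and ends on the first Sunday in November. Previously it ran from the first Sunday in April to the last Sunday in October. The helpers below are hypothetical and use plain calendar arithmetic, not anything from the build-farm tooling. They compute both sets of 2007 dates, which marks the windows where an unpatched machine would be an hour off.

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday(): Monday == 0 ... Sunday == 6


def nth_sunday(year: int, month: int, n: int) -> date:
    """Return the n-th Sunday (1-based) of the given month."""
    first = date(year, month, 1)
    offset = (SUNDAY - first.weekday()) % 7  # days until the first Sunday
    return first + timedelta(days=offset + 7 * (n - 1))


def last_sunday(year: int, month: int) -> date:
    """Return the last Sunday of the given month."""
    if month == 12:
        last = date(year, 12, 31)
    else:
        last = date(year, month + 1, 1) - timedelta(days=1)
    # Walk back from the last day of the month to the nearest Sunday.
    return last - timedelta(days=(last.weekday() - SUNDAY) % 7)


def us_dst_window(year: int, new_rules: bool) -> tuple[date, date]:
    """DST start/end dates under the pre-2007 or the 2007+ US rules."""
    if new_rules:
        return nth_sunday(year, 3, 2), nth_sunday(year, 11, 1)
    return nth_sunday(year, 4, 1), last_sunday(year, 10)


old = us_dst_window(2007, new_rules=False)
new = us_dst_window(2007, new_rules=True)
print("old rules:", old[0], "to", old[1])  # old rules: 2007-04-01 to 2007-10-28
print("new rules:", new[0], "to", new[1])  # new rules: 2007-03-11 to 2007-11-04
# An unpatched machine is an hour off from new[0] until old[0],
# and again from old[1] until new[1].
```

March 29, 2007, the date of the off-by-an-hour Tinderbox log later in this bug, falls inside the first window. April 1 is the following Sunday, when the old rules would switch an unpatched machine to DST on their own.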

Unfortunately, it's not as simple as patching all of the machines, because there are configuration management issues.

On the Mac side (bm-xserve0N), coop found that you can't just install Apple's timezone patch; you'd also need to install a kernel upgrade. We can probably do that for the trunk tinderboxen, but for 1.8/1.8.0 we need to investigate more before applying updates. So for now, can we just turn off time syncing and change the time on these machines manually? (If there's a better solution that doesn't involve changing the kernel config, that'd be awesome.)

On the VM side, I think the simplest way to approach this is to see if we can apply the fix to the ESX host/kernel side, and then reboot (possibly suspend/resume) all of the VMs. They should pick up the new time from the service console. I think this will avoid having to patch all of the guest OSs.

We'll need to come up with a longer-term strategy, since this obviously isn't sustainable, but since it's coming up quickly, we need to do something.

The build team can help with whatever solution we decide to implement; it would be helpful to get suggestions, too, in case we've forgotten/missed something huge that would make this easier (or harder, for that matter).
(In reply to comment #0)
> (If there's a better
> solution that doesn't involve changing the kernel config, that'd be awesome.)

Switch to UTC until April (or stay on UTC)

> On the VM side, I think the simplest way to approach this is to see if we can
> apply the fix to the ESX host/kernel side, and then reboot (possibly
> suspend/resume) all of the VMs. They should pick up the new time from the
> service console. I think this will avoid having to patch all of the guest OSs.

Red Hat's tzdata has been correct for PST since 2005, so you may already have the fix.

There are a number of critical and non-critical VMware patches, including a tzdata update, that should be applied. Might as well get those in if you're planning a reboot.


OS: Linux → All
Hardware: PC → All
(In reply to comment #1)

> Switch to UTC until April (or stay on UTC)

I don't think this will work; the build machines use time to report into the server machine, and if they're not in the local time zone, the tinderbox pages will be skewed incorrectly.

rhelmer/coop: can you confirm I'm remembering this right? In particular, I think you ran into this issue, rhelmer; doesn't tbox do something weird like use the client time for the start time and the server time for the end time?

> There are a number of critical and non-critical VMWare patches, including
> tzdata that should be updated.  Might as well get those in if you're planning a
> reboot.

Could we get a list of these patches for 2.5.2 and 3.0.1? I care less about 3.0.1, but for 2.5.2, I'd rather patch the tz issue by itself for now, if we can, and then do an upgrade cycle later, when we're not under such time--haha--pressures.
For 3.0.1, http://www.vmware.com/download/vi/vi3_patches.html .

ESX 2 is @ http://www.vmware.com/download/esx/esx2_patches.html .

You could probably get away with using UTC in ESX and only updating the guest OS.
Assignee: server-ops → mrz
ESX boxes:
bm-vmware01 - VMware ESX Server 2.5.2 build-16390
bm-vmware02 - VMware ESX Server 2.5.2 build-16390
bm-vmware07 - VMware ESX Server 3.0.1 build-37303
bm-vmware03 - VMware ESX Server 2.5.2 build-16390
bm-vmware04 - VMware ESX Server 2.5.2 build-16390
bm-vmware05 - VMware ESX Server 2.5.2 build-16390
bm-vmware06 - VMware ESX Server 3.0.1 build-32039
bm-vmware-swap - VMware ESX Server 3.0.1 build-32039
ESX 2.5 servers updated.  

I'm not clear on what you want me to do on the Apples: manually move the clock forward Saturday evening? Switch to UTC?

How do you want to handle the time on the host OSes that have the wrong timezone configs?
Everything is done except for the xserves.  We are waiting on a list from build (per conversation on Friday).
List of xserves:

# community network
cg-xserve01.mozilla.com
cg-xserve02.mozilla.com
cg-xserve03.mozilla.com

# build network
bm-xserve01.build.mozilla.org
bm-xserve02.build.mozilla.org
bm-xserve03.build.mozilla.org
bm-xserve04.build.mozilla.org
bm-xserve05.build.mozilla.org
bm-xserve06.build.mozilla.org
bm-xserve07.build.mozilla.org
bm-xserve08.build.mozilla.org
bm-xserve09.build.mozilla.org
Amended list (03 and 06 became community Macs)

(In reply to comment #7)
> # build network
> bm-xserve01.build.mozilla.org
> bm-xserve02.build.mozilla.org
> bm-xserve04.build.mozilla.org
> bm-xserve05.build.mozilla.org
> bm-xserve07.build.mozilla.org
> bm-xserve08.build.mozilla.org
> bm-xserve09.build.mozilla.org
> bm-xserve10.build.mozilla.org
> bm-xserve11.build.mozilla.org
> bm-xserve12.build.mozilla.org

I don't know the VNC logins for any of these, and I'm guessing I also don't know any other logins.
(In reply to comment #9)
> I don't know vnc logins for any of these and I'm guessing I also don't know any
> other logins.

I only tried xserve04, but it was the new cltbld password, which is expected.
I don't quite understand how an off-by-one-hour issue could cause this, but ever since the change to DST over the weekend, the partial updates seem to show up on stage around 8 to 9 hours later than before.
Whiteboard: Still need VNC & logins to Xservers
(In reply to comment #12)
> I don't quite understand why this should be from an off by one hour type of
> issue, but ever since the change to DST over the weekend, the partial updates
> seem to show up on stage around 8 to 9 hours later than before.

Since the change over to DST, we've had multiple outages of several days with no updates being generated, and these outages have had *absolutely nothing to do with DST*. We're also trying to slam through updates for various firedrill releases, which obviously take precedence.

I would be interested to see whether update generation timing normalizes over the next week.
Checked all of these - times & timezones are all correct.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
The full log on the machine that creates windows trunk nightlies says:

-->Tinderbox Config Info<--------------------------
Begin: Thu Mar 29 06:48:27 2007

This is from the log file of the build that began at 7:48:27 PDT today.
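The same instant looks different depending on which offset the machine applies. Printed with a fixed UTC-8 offset (PST, which an unpatched machine still assumes on March 29) and a fixed UTC-7 offset (PDT, correct under the 2007 rules), the times differ by exactly one hour. This sketch uses fixed offsets from Python's `datetime.timezone` so it doesn't depend on a tz database. The UTC instant is derived from the 7:48:27 PDT start time quoted above, and the format string only approximates the Tinderbox `Begin:` line.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8), "PST")  # standard time (old rules on Mar 29)
PDT = timezone(timedelta(hours=-7), "PDT")  # daylight time (2007 rules on Mar 29)

# 7:48:27 PDT on 2007-03-29, converted to a UTC instant.
start_utc = datetime(2007, 3, 29, 7, 48, 27, tzinfo=PDT).astimezone(timezone.utc)

# Roughly the shape of the Tinderbox "Begin:" line.
fmt = "%a %b %d %H:%M:%S %Y"
unpatched = start_utc.astimezone(PST).strftime(fmt)
patched = start_utc.astimezone(PDT).strftime(fmt)

print("unpatched (PST):", unpatched)                 # Thu Mar 29 06:48:27 2007
print("patched   (PDT):", patched)                   # Thu Mar 29 07:48:27 2007
print("UTC:            ", start_utc.strftime(fmt))   # Thu Mar 29 14:48:27 2007
```

The UTC line has no DST transitions at all, which is why switching to UTC was suggested. As noted earlier in this bug, though, the Tinderbox pages expect local time.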
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Which build machine are you referring to?
(In reply to comment #16)
> Which build machine are you referring to?
> 

The machine identified as "WINNT 5.2 fx-win32-tbox Dep Nightly" on this page:

http://tinderbox.mozilla.org/Firefox/
Clock is off an hour, but I don't know what the "right" fix is. It's missing SP2, so presumably no patching was done for some build-related reason? Perhaps it was never "paused" like preed had suggested for the other VM guest OSes?

How do you want me to fix this?
Whiteboard: Still need VNC & logins to Xservers
Wasn't set to sync time anyway, so I just moved the clock forward an hour.
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
(In reply to comment #19)
> Wasn't set to sync time anyways so I just moved the clock forward and hour.
> 

OK, but won't that just make it be an hour off in the opposite direction come Sunday?
Probably.  I didn't get any guidance on the best fix short of patching the OS.  If that's the case on Sunday, we'll move the clock back.
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations