Closed Bug 769971 Opened 13 years ago Closed 13 years ago

zimbra outage, 30/6/2012

Categories

(Infrastructure & Operations :: Infrastructure: Other, task)

x86
macOS
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cshields, Assigned: justdave)

17:44 <@justdave> zimbra servers started paging for non-responsiveness about 5:15
17:45 <@justdave> the graphs show everything fell off a cliff at 5pm
17:45 <@justdave> which happens to be midnight gmt
17:45 <@justdave> too much coincidence for it to be anything other than leap second, even though RH claims rhel6 was immune to the leap second hang
17:45 <@justdave> ok, confirmed reboot cleaned it up
17:45 <@justdave> (restarting zimbra did not)
17:46 <@justdave> yeah, java's what hung
17:46 <@justdave> maybe java had its own issue with the leap second

Will update when we know more..
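For anyone checking the "5pm is midnight gmt" observation in the log: a minimal sketch of the arithmetic, assuming the IRC timestamps are in a UTC-7 zone (the log does not name the zone, but 5pm local equalling midnight GMT implies that offset). The 2012 leap second was inserted at the end of 30 June UTC (23:59:60), which Python's datetime cannot represent, so the sketch uses the midnight that immediately follows it. The names LOCAL and to_local are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time in the log is UTC-7 (not stated in the bug;
# inferred from "5pm ... happens to be midnight gmt").
LOCAL = timezone(timedelta(hours=-7))


def to_local(utc_dt):
    """Convert a timezone-aware UTC datetime to the assumed local zone."""
    return utc_dt.astimezone(LOCAL)


# First instant after the leap second (23:59:60 UTC on 2012-06-30).
after_leap = datetime(2012, 7, 1, 0, 0, tzinfo=timezone.utc)
local = to_local(after_leap)

print(local.strftime("%Y-%m-%d %H:%M"))  # 2012-06-30 17:00
```

Under that assumption the drop in the graphs at 5pm lines up with the leap second insertion, and the pages at about 5:15 follow roughly fifteen minutes later.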
As of about 18:18 Zimbra should be all back up. This required a reboot across the board. This leap second bug seems common with RHEL6/CentOS6 servers running Java processes, according to just about every other sysadmin on Twitter right now. According to Dave, the services themselves were never fully down, except during the reboots, which had to be done to clear the load. Up until then mail delivery would have been slow at best.
I upgraded the kernels before rebooting on the first few servers, in case there was anything helpful in the newer kernel to fix the problem. This was turning out to take a long time, so I did a couple without upgrading just to see if the problem would go away, and those succeeded, so I proceeded to just do straight reboots on everything else to clear everything out as quickly as possible. No more is really known at this point other than that things seem to be working again. We'll probably hear more from outside soon, since it seemed to affect half the Internet according to Twitter.
See Also: → 769972, 769973
guess I should mark this resolved, stuff's been doing fine for a few hours.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Server Operations: Infrastructure → Infrastructure: Other
Product: mozilla.org → Infrastructure & Operations