Bug 769971
Opened 13 years ago
Closed 13 years ago
zimbra outage, 30/6/2012
Categories
(Infrastructure & Operations :: Infrastructure: Other, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: cshields, Assigned: justdave)
17:44 <@justdave> zimbra servers started paging for non-responsiveness about 5:15
17:45 <@justdave> the graphs show everything fell off a cliff at 5pm
17:45 <@justdave> which happens to be midnight gmt
17:45 <@justdave> too much coincidence for it to be anything other than leap second, even though RH claims rhel6 was immune to the leap second hang
17:45 <@justdave> ok, confirmed reboot cleaned it up
17:45 <@justdave> (restarting zimbra did not)
17:46 <@justdave> yeah, java's what hung
17:46 <@justdave> maybe java had its own issue with the leap second
Will update when we know more.
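To sanity-check the "5pm happens to be midnight GMT" observation from the log, here is a minimal sketch of the conversion. It assumes the graph timestamps are US Pacific Daylight Time (UTC-7); the log does not state the timezone, so that offset and the variable names are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the "5pm" in the log is Pacific Daylight Time (UTC-7).
PDT = timezone(timedelta(hours=-7), name="PDT")

# Local time at which the graphs "fell off a cliff" on 30 June 2012.
local_cliff = datetime(2012, 6, 30, 17, 0, tzinfo=PDT)

# Convert to UTC to compare against the leap second boundary.
utc_cliff = local_cliff.astimezone(timezone.utc)

print(utc_cliff.isoformat())  # 2012-07-01T00:00:00+00:00
```

Under that assumption, the drop lines up with the UTC day boundary at which the 30 June 2012 leap second was inserted, which fits the suspicion in the log.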
Comment 1 (Reporter) • 13 years ago
As of about 18:18, Zimbra should be fully back up. This required rebooting every server.
This leap second bug seems common on RHEL6/CentOS6 servers running Java processes, according to just about every other sysadmin on Twitter right now.
According to Dave, the services were never fully down except during the reboots, which were needed to clear the load. Until then, mail delivery would have been slow at best.
Comment 2 (Assignee) • 13 years ago
I upgraded the kernels before rebooting the first few servers, in case a newer kernel contained anything that would help with the problem. That was taking a long time, so I rebooted a couple without upgrading to see whether the problem would go away. Those came back fine, so I did straight reboots on everything else to clear things out as quickly as possible.
Nothing more is really known at this point, other than that things seem to be working again. We'll probably hear more from outside soon, since it seems to have affected half the Internet according to Twitter.
Comment 3 (Assignee) • 13 years ago
guess I should mark this resolved, stuff's been doing fine for a few hours.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated • 12 years ago
Component: Server Operations: Infrastructure → Infrastructure: Other
Product: mozilla.org → Infrastructure & Operations