Bug 769972 - Java is choking on leap second.
Status: RESOLVED FIXED
Product: Mozilla Metrics
Classification: Other
Component: Metrics Operations
: unspecified
: All All
: -- blocker (vote)
: Unreviewed
Assigned To: Eric Ziegenhorn :ericz
Reported: 2012-06-30 17:49 PDT by Eric Ziegenhorn :ericz
Modified: 2012-09-05 07:06 PDT

Description Eric Ziegenhorn :ericz 2012-06-30 17:49:17 PDT
Java apps such as Hadoop and ElasticSearch don't appear to be working on our servers.  We believe this is related to tonight's leap second, because the problem started at midnight GMT.
Comment 1 Eric Ziegenhorn :ericz 2012-06-30 17:50:31 PDT
Elevating to blocker.  I believe we need to restart Java everywhere, and possibly reboot servers but need some feedback from Hadoop owners, etc.
Comment 2 Corey Shields [:cshields] 2012-06-30 17:53:41 PDT
opening this bug up
Comment 3 Pedro Alves 2012-06-30 18:45:13 PDT
Still needs to be confirmed, but I was able to fix one of the issues with an elasticsearch server that I have installed by manually adjusting the date with the date command (see "date --help"). There was a service restart involved, but no reboots.
Comment 4 Daniel Einspanjer [:dre] [:deinspanjer] 2012-06-30 19:01:33 PDT
We are updating kernels and rebooting HBase clusters right now.
Comment 5 Ricardo Pardini 2012-06-30 19:27:14 PDT
For those machines that shouldn't be rebooted:

/etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;

Then restart affected Java applications.
This stops ntpd, manually sets the clock to the current date and time, and prints the date before and after to confirm it.
You may or may not get the bug back after restarting ntpd.
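
As a rough sketch of what the one-liner above does: the format string %m%d%H%M%C%y.%S builds the MMDDhhmmCCYY.ss form that date accepts as a set-time argument, so the clock is set to the time it already shows. The function names and the DRY_RUN switch below are hypothetical, added only for illustration, and the /etc/init.d/ntp path is assumed to match your system as it does in the command above. By default nothing is changed; the commands are only printed.

```shell
#!/bin/sh
# Sketch only: current_stamp, reset_clock and DRY_RUN are illustrative names,
# not part of the original workaround.

# Build the MMDDhhmmCCYY.ss string used by the workaround.
# %m month, %d day, %H hour, %M minute, %C century, %y two-digit year, %S seconds.
current_stamp() {
    date +"%m%d%H%M%C%y.%S"
}

# Stop ntpd and set the clock to the current time.
# Only prints the commands unless DRY_RUN is set to 0 (needs root in that case).
reset_clock() {
    stamp=$(current_stamp)
    if [ "${DRY_RUN:-1}" = "0" ]; then
        /etc/init.d/ntp stop   # assumed init script path, as in the comment above
        date                   # show the time before
        date "$stamp"          # set the clock to the captured time
        date                   # show the time after
    else
        echo "would run: /etc/init.d/ntp stop"
        echo "would run: date $stamp"
    fi
}
```

Calling reset_clock with DRY_RUN unset prints the two commands; setting DRY_RUN=0 first and calling it as root would actually run them.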
Comment 6 Corey Shields [:cshields] 2012-06-30 19:55:29 PDT
(In reply to Ricardo Pardini from comment #5)
> For those machines that shouldn't be rebooted:
> 
> /etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;

we are injecting this fix into all systems through our base puppet module as we speak
Comment 7 Mina Naguib 2012-06-30 20:06:06 PDT
For what it's worth, we've stabilized our java apps across our servers simply via:

date; date `date +"%m%d%H%M%C%y.%S"`; date;

The CPU of the JVMs drops instantly when that is run. There was no need to stop/restart ntpd nor the JVMs themselves.
Comment 8 Ricardo Pardini 2012-06-30 20:09:15 PDT
(In reply to Mina Naguib from comment #7)

> The CPU of the JVMs drops instantly when that is run. There was no need to
> stop/restart ntpd nor the JVMs themselves.

I've got mixed results: some machines go back to 100% CPU when ntpd is restarted, some don't. Out of uncertainty, I'm keeping ntpd stopped for now. I will bring some back online later and report.
Comment 9 Ricardo Pardini 2012-06-30 21:02:21 PDT
> I'm keeping ntpd stopped for now.
> I will bring some back online later and report.

I've brought ntpd back online now on all my servers, and it seems stable.
Restarting ntpd definitely caused the CPU issue to reappear earlier, but it no longer does.
Comment 10 Eric Ziegenhorn :ericz 2012-06-30 21:07:06 PDT
Socorro and most Hadoop stuff are back up.  Everything else Hadoop-related can wait until Monday.
Comment 11 Corey Shields [:cshields] 2012-06-30 21:29:07 PDT
For reference, this is the fix that is getting pushed out:  http://blog.mozilla.org/it/2012/06/30/mysql-and-the-leap-second-high-cpu-and-the-fix/
