Closed Bug 769972 Opened 8 years ago Closed 8 years ago

Java is choking on leap second.

Categories

(Mozilla Metrics :: Metrics Operations, task)

task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED
Unreviewed

People

(Reporter: ericz, Assigned: ericz)

References

Details

Java apps such as Hadoop and ElasticSearch are failing on our servers, and Java itself doesn't appear to be working.  We believe this is related to tonight's leap second because the problems started at midnight GMT.
Elevating to blocker.  I believe we need to restart Java everywhere and possibly reboot servers, but we need some feedback from the Hadoop owners, etc. first.
Severity: critical → blocker
opening this bug up
Group: metrics-private
Still needs to be confirmed, but I was able to fix one of the issues on an elasticsearch server I have installed by manually setting the date with the date command (see "date --help"). A service restart was involved, but no reboot.
See Also: → 769973, 769971
We are updating kernels and rebooting HBase clusters right now.
For those machines that shouldn't be rebooted:

/etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;

Then restart affected Java applications.
The command above stops ntpd, prints the date, sets the clock manually to the current time, and prints the date again to confirm.
You may or may not get the bug back after restarting ntpd.
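
For anyone unsure what the inner date call expands to, here is a minimal sketch of the clock-reset step only (it does not touch ntpd). The function names build_stamp and reset_clock and the APPLY dry-run guard are hypothetical, added for illustration; they are not part of the one-liner posted above.

```shell
#!/bin/sh
# Sketch of the "set the clock to the current time" step from the one-liner.
# build_stamp, reset_clock and APPLY are illustrative names, not from the thread.

# Print the current time as MMDDhhmmCCYY.ss, the positional format
# that date accepts when setting the system clock.
build_stamp() {
    date +"%m%d%H%M%C%y.%S"
}

# Set the clock to "now". By default this only prints what it would run;
# set APPLY=1 (and run as root) to actually call date with the stamp.
reset_clock() {
    stamp=$(build_stamp)
    if [ "${APPLY:-0}" = "1" ]; then
        date "$stamp"
    else
        echo "would run: date $stamp"
    fi
}
```

Calling reset_clock with APPLY unset prints the command instead of running it, which makes it easy to check the stamp before changing anything on a production machine.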
Assignee: nobody → eziegenhorn
(In reply to Ricardo Pardini from comment #5)
> For those machines that shouldn't be rebooted:
> 
> /etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;

We are injecting this fix into all systems through our base Puppet module as we speak.
For what it's worth, we've stabilized our java apps across our servers simply via:

date; date `date +"%m%d%H%M%C%y.%S"`; date;

The CPU of the JVMs drops instantly when that is run. There was no need to stop/restart ntpd nor the JVMs themselves.
(In reply to Mina Naguib from comment #7)

> The CPU of the JVMs drops instantly when that is run. There was no need to
> stop/restart ntpd nor the JVMs themselves.

I've got mixed results: some machines go back to 100% CPU when ntpd is restarted, some don't. Out of uncertainty, I'm keeping ntpd stopped for now. I will bring some back online later and report.
> I'm keeping ntpd stopped for now.
> I will bring some back online later and report.

I've brought ntpd back online now on all my servers, and it seems stable.
Restarting ntpd definitely caused the CPU issue to reappear earlier, but it no longer does.
Socorro and most Hadoop stuff are back up.  Everything else Hadoop-related can wait until Monday.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED