Closed Bug 730986 Opened 12 years ago Closed 12 years ago

Socorro staging hbase having connection blips

Categories

(Mozilla Metrics :: Hadoop/HBase Operations, defect)

Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
Unreviewed

People

(Reporter: laura, Assigned: tmary)

The logged error looks like this:
http://lonnen.pastebin.mozilla.org/1493971

Specifically on the HBase side, this looks like the root cause:
IOError(message='org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@1b45e2d5 closed')

It seems to recover on its own, but this doesn't seem like a good thing.
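Since the error seems to clear on its own, one possible client-side stopgap is to rebuild the connection and retry the call. The following is only a minimal sketch, not Socorro's actual code: `call_with_reconnect`, `make_client`, and `operation` are hypothetical names, and the caller would need to pass the Thrift-generated `IOError` class via `retriable`, since it is not the same class as Python's builtin `IOError`.

```python
import time


def call_with_reconnect(make_client, operation, retries=3, delay=1.0,
                        retriable=(IOError,)):
    """Run operation(client), rebuilding the client after a retriable error.

    make_client and operation are hypothetical callables supplied by the
    caller; retriable should list the exception classes worth retrying.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        client = make_client()  # fresh connection for every attempt
        try:
            return operation(client)
        except retriable as error:
            last_error = error
            if attempt < retries - 1:
                # Linear backoff before the next attempt.
                time.sleep(delay * (attempt + 1))
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A retry like this only hides short blips; it would not help when the cluster is actually down.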
Happening again, but it looks actively down now. Upping severity: we have a release tomorrow and it would be good if it were fixed.
Severity: normal → major
Blocks: 726570
AFAIK the HBase Master and at least one HBase RegionServer (node7.generic.metrics.sjc1.mozilla.com) are alive, and the thrifttester service reports that the service is alive, so I'm not sure which parts are broken.

--
Restarting HBase daemons on all nodes - hopefully this fixes the connection issues (temporary)

--
Just to make sure, this is the old hbase staging cluster in sjc1, not the new one in phx, correct?
(In reply to Daniel Einspanjer :dre [:deinspanjer] from comment #4)
> Just to make sure, this is the old hbase staging cluster in sjc1, not the
> new one in phx, correct?

Yes.

-

HBase Thrift service is up on at least 4 nodes ATM 
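
A quick way to repeat this kind of per-node check is a plain TCP probe of the Thrift port. This is a rough sketch under assumptions: `thrift_port_open` and `check_nodes` are hypothetical helper names, and 9090 is only the HBase Thrift default port, so this cluster may listen elsewhere.

```python
import socket


def thrift_port_open(host, port=9090, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_nodes(hosts, port=9090):
    """Map each host name to whether its Thrift port accepted a connection."""
    return {host: thrift_port_open(host, port) for host in hosts}
```

For example, `check_nodes(["node7.generic.metrics.sjc1.mozilla.com"])` would report whether that node accepts connections. Note that an open port only shows the Thrift listener is up; it does not prove the HBase connection behind it is healthy, which is the failure reported in this bug.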

--
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Version: 1.0 → unspecified