Closed Bug 1112508 Opened 11 years ago Closed 11 years ago

Socorro - Thrift connection on sp-admin01.phx1.mozilla.com is CRITICAL: CRITICAL: Thrift connection to socorro-thrift-single.zlb.phx1.mozilla.com is not viable

Categories

(Infrastructure & Operations Graveyard :: WebOps: Socorro, task)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dgarvey, Unassigned)


Details

(Whiteboard: [id=nagios1.private.phx1.mozilla.com:413664])

+++ This bug was initially created as a clone of Bug #1112492 +++

Automated alert report from nagios1.private.phx1.mozilla.com:

Hostname: sp-admin01.phx1.mozilla.com
Service: Socorro - Thrift connection
State: CRITICAL
Output: CRITICAL: Thrift connection to socorro-thrift-single.zlb.phx1.mozilla.com is not viable
Runbook: http://m.allizom.org/Socorro+-+Thrift+connection
As per :rhelmer (via email)

> We've been getting very frequent timeouts hitting production hbase via thrift,
> it is holding up crashmoving at this point.
>
> So, crashes are being collected to local FS on collectors, but crashmover is
> giving up when hbase times out and not moving them into hbase or s3.
>
> The only option I can think of right now is to turn hbase writes off, but I don't
> want to do that unilaterally at midnight :) Collectors should survive overnight
> but we will likely have to backfill in the morning.
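
For readers unfamiliar with the failure mode described in the email, here is a rough sketch of a mover that leaves crashes on the local filesystem when the backend write times out, so they can be backfilled later. This is not Socorro's crashmover code. The function `move_crashes`, the `store` callable, and the one-file-per-crash spool layout are all hypothetical names chosen for illustration, and the use of `TimeoutError` to signal a backend timeout is an assumption.

```python
from pathlib import Path
from typing import Callable, List


def move_crashes(spool_dir: Path, store: Callable[[Path], None]) -> List[Path]:
    """Try to store each spooled crash; keep failures on disk for backfill.

    `store` is a hypothetical stand-in for the HBase/S3 write. It is assumed
    to raise TimeoutError when the backend does not respond in time.
    Returns the crash files that are still waiting in the spool directory.
    """
    pending: List[Path] = []
    # sorted() materializes the listing, so deleting files below is safe.
    for crash_file in sorted(spool_dir.iterdir()):
        try:
            store(crash_file)
        except TimeoutError:
            # Leave the file in place so a later run can backfill it.
            pending.append(crash_file)
            continue
        # Remove the local copy only after the write succeeded.
        crash_file.unlink()
    return pending
```

In this sketch a timeout never discards data: the crash stays in the spool, which matches the email's expectation that collectors survive overnight and a backfill follows once the backend is healthy again.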
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard