Closed Bug 1112508 Opened 9 years ago Closed 9 years ago

Socorro - Thrift connection on sp-admin01.phx1.mozilla.com is CRITICAL: CRITICAL: Thrift connection to socorro-thrift-single.zlb.phx1.mozilla.com is not viable

Categories

(Infrastructure & Operations Graveyard :: WebOps: Socorro, task)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dgarvey, Unassigned)

Details

(Whiteboard: [id=nagios1.private.phx1.mozilla.com:413664])

+++ This bug was initially created as a clone of Bug #1112492 +++

Automated alert report from nagios1.private.phx1.mozilla.com:

Hostname: sp-admin01.phx1.mozilla.com
Service:  Socorro - Thrift connection
State:    CRITICAL
Output:   CRITICAL: Thrift connection to socorro-thrift-single.zlb.phx1.mozilla.com is not viable

Runbook:  http://m.allizom.org/Socorro+-+Thrift+connection
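
For anyone triaging this alert by hand, a rough sketch of a first reachability probe follows. It is not the Nagios check itself, whose implementation is not shown in this bug. The function name `thrift_port_reachable` is hypothetical, and port 9090 is an assumption (the usual HBase Thrift default), not something confirmed here. A successful TCP connect only shows that the load balancer accepts connections; it does not prove that Thrift calls complete.

```python
import socket


def thrift_port_reachable(host, port=9090, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Port 9090 is an assumed default, not taken from the alert.
    """
    try:
        # create_connection resolves the name and applies the timeout to the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    host = "socorro-thrift-single.zlb.phx1.mozilla.com"
    print(host, "reachable" if thrift_port_reachable(host) else "NOT reachable")
```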

As per :rhelmer (via email):
> We've been getting very frequent timeouts hitting production hbase via thrift,
> it is holding up crashmoving at this point.
>
> So, crashes are being collected to local FS on collectors, but crashmover is
> giving up when hbase times out and not moving them into hbase or s3.
>
> The only option I can think of right now is to turn hbase writes off, but I
> don't want to do that unilaterally at midnight :) Collectors should survive
> overnight but we will likely have to backfill in the morning.
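
The failure mode described above (a write times out, the mover gives up, the crash stays on the collector's local disk for later backfill) can be sketched roughly as below. This is an illustration only, not Socorro's crashmover code. `move_crash`, `store`, and the retry parameters are hypothetical names, and a real Thrift client would likely raise its own timeout exception rather than the built-in `TimeoutError`.

```python
import time


def move_crash(crash_id, store, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call store(crash_id) up to `retries` times with exponential backoff.

    Returns True once the write succeeds. Returns False if every attempt
    timed out, meaning the crash is left where it is for a later backfill.
    """
    for attempt in range(retries):
        try:
            store(crash_id)
            return True
        except TimeoutError:
            # Back off before the next attempt, but not after the final one.
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    return False
```

Returning False rather than raising keeps the decision with the caller: it can leave the file on the local filesystem and continue with the next crash, which matches the "backfill in the morning" plan.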
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard