Verify Zeus thrift_check.py properly reports failures to Zeus for Socorro staging

RESOLVED FIXED

Status

Product: Infrastructure & Operations
Component: WebOps: Other
Status: RESOLVED FIXED
Reported: 5 years ago
Modified: 4 years ago

People

(Reporter: solarce, Assigned: solarce)

solarce (Assignee)

Description

5 years ago
We're seeing ongoing connectivity issues with the Socorro staging HBase pool, but Zeus is not recording any errors for its backend nodes, as it does for the Postgres pool.

Work with :tmary to stop HBase on one of the Socorro staging nodes (hp-node62 through hp-node69) and confirm that the Zeus check (https://pp-zlb01.phx.mozilla.net:9090/apps/zxtm/index.fcgi?section=Extra%20Files%3AExternProgMonitors) is working properly.
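Judging by the failure log in Comment 1 ("Monitor exited, exit code 1"), a Zeus external program monitor runs a script against each node and treats a non-zero exit status as a node failure. The following is a minimal, hypothetical sketch of such a check, not the actual thrift_check.py. It assumes Zeus passes the node as `--ipaddr=` and `--port=` arguments (the `--timeout` flag is invented for illustration), and it only tests that a TCP connection to the Thrift port (9090) can be opened, whereas the real script may issue an actual Thrift call.

```python
#!/usr/bin/env python3
"""Hypothetical external-program monitor for an HBase Thrift node.

Assumptions (not taken from the real thrift_check.py):
- the load balancer calls this with --ipaddr=<ip> --port=<port>;
- exit status 0 means healthy, any non-zero status means failed;
- a successful TCP connect is treated as "healthy" (no Thrift call is made).
"""
import argparse
import socket
import sys


def check_node(ipaddr: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to ipaddr:port opens within timeout."""
    try:
        with socket.create_connection((ipaddr, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Thrift port health check")
    parser.add_argument("--ipaddr", required=True)
    parser.add_argument("--port", type=int, default=9090)
    # Hypothetical flag, added for illustration only.
    parser.add_argument("--timeout", type=float, default=5.0)
    # Ignore any extra arguments the load balancer may pass.
    args, _unknown = parser.parse_known_args(argv)

    if check_node(args.ipaddr, args.port, args.timeout):
        return 0
    # Emit a reason; assumes the monitor log records the program's output.
    print(f"cannot connect to {args.ipaddr}:{args.port}")
    return 1


if __name__ == "__main__":
    sys.exit(main())
```

Printing a reason on failure is a deliberate choice: the Comment 1 log shows "no output generated", so, assuming Zeus captures the script's output, a short message would make failures easier to diagnose.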

solarce (Assignee)

Comment 1

5 years ago
Per IRC and the Zeus logs:

20:34:13       tmary | solarce: done

[30/Jul/2012:11:34:25 -0700]	WARN	monitors/socorro-thrift-check	monitorfail	Monitor has detected a failure in node '10.8.100.62:9090': Monitor exited, exit code 1, no output generated
[30/Jul/2012:11:34:25 -0700]	SERIOUS	pools/socorro-thrift-stage:9090	nodes/10.8.100.62:9090	nodefail	Node 10.8.100.62 has failed - A monitor has detected a failure
Status: NEW → ASSIGNED

solarce (Assignee)

Comment 2

5 years ago
Confirmed happy again; the monitor is working:

[30/Jul/2012:11:45:56 -0700]	INFO	monitors/socorro-thrift-check	monitorok	Monitor is working for node '10.8.100.62:9090'.
[30/Jul/2012:11:45:57 -0700]	INFO	pools/socorro-thrift-stage:9090	nodes/10.8.100.62:9090	nodeworking	Node 10.8.100.62 is working again
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
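The quoted Zeus event-log lines are enough to confirm both transitions for node 10.8.100.62. As a hypothetical helper (not part of Zeus or of this bug's tooling), the sketch below extracts events and measures how long a node stayed marked failed. It assumes the tab-separated layout seen in the quoted lines: a bracketed timestamp, a severity, one or more object names, the event name (e.g. `nodefail`, `nodeworking`), and a message. That layout is inferred from these few lines, not from Zeus documentation.

```python
"""Hypothetical parser for Zeus event-log lines like those quoted above.

Assumed layout (inferred, not documented): tab-separated fields of
[timestamp], severity, object name(s), event name, message.
"""
from datetime import datetime
from typing import Iterable, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    when: datetime
    level: str
    objects: Tuple[str, ...]  # e.g. ("pools/...", "nodes/10.8.100.62:9090")
    name: str                 # e.g. "nodefail", "monitorok"
    message: str


def parse_line(line: str) -> Event:
    fields = line.rstrip("\n").split("\t")
    # "[30/Jul/2012:11:34:25 -0700]" -> timezone-aware datetime
    when = datetime.strptime(fields[0].strip("[]"), "%d/%b/%Y:%H:%M:%S %z")
    return Event(when, fields[1], tuple(fields[2:-2]), fields[-2], fields[-1])


def outage_seconds(lines: Iterable[str], node: str) -> Optional[float]:
    """Seconds from the first nodefail to the next nodeworking for node."""
    failed_at = None
    for event in map(parse_line, lines):
        if f"nodes/{node}" not in event.objects:
            continue  # skip monitor lines and other nodes
        if event.name == "nodefail" and failed_at is None:
            failed_at = event.when
        elif event.name == "nodeworking" and failed_at is not None:
            return (event.when - failed_at).total_seconds()
    return None  # no complete fail/recover pair found


if __name__ == "__main__":
    log = [
        "[30/Jul/2012:11:34:25 -0700]\tSERIOUS\tpools/socorro-thrift-stage:9090"
        "\tnodes/10.8.100.62:9090\tnodefail\tNode 10.8.100.62 has failed",
        "[30/Jul/2012:11:45:57 -0700]\tINFO\tpools/socorro-thrift-stage:9090"
        "\tnodes/10.8.100.62:9090\tnodeworking\tNode 10.8.100.62 is working again",
    ]
    print(outage_seconds(log, "10.8.100.62:9090"))  # → 692.0
```

Matching on the event name rather than the message text keeps the check independent of the human-readable wording, which may differ between Zeus versions.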