Closed Bug 778805 Opened 8 years ago Closed 8 years ago

Verify Zeus thrift_check.py properly reports failures to Zeus for Socorro staging

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

x86
macOS
task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bburton, Assigned: bburton)

We're seeing ongoing connectivity issues with Socorro staging's HBase pool, but Zeus is not recording any errors for the backend nodes, unlike for the Postgres pool.

Work with :tmary to stop HBase on one of the Socorro Staging nodes (hp-node62 - hp-node69) and confirm the Zeus check (https://pp-zlb01.phx.mozilla.net:9090/apps/zxtm/index.fcgi?section=Extra%20Files%3AExternProgMonitors) is working properly.
Per IRC and the Zeus logs:

20:34:13       tmary | solarce: done

[30/Jul/2012:11:34:25 -0700]	WARN	monitors/socorro-thrift-check	monitorfail	Monitor has detected a failure in node '10.8.100.62:9090': Monitor exited, exit code 1, no output generated
[30/Jul/2012:11:34:25 -0700]	SERIOUS	pools/socorro-thrift-stage:9090	nodes/10.8.100.62:9090	nodefail	Node 10.8.100.62 has failed - A monitor has detected a failure
Status: NEW → ASSIGNED
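
For illustration, the failure above ("exit code 1, no output generated") is the kind of result an external program monitor produces when its check fails. The following is a minimal sketch of such a check, not the actual thrift_check.py: it only tests that a TCP connection to the node succeeds, whereas a real Thrift check would presumably make a Thrift call. The argument names --ipaddr and --port, and the names check_node and main, are assumptions for this example.

```python
import argparse
import socket


def check_node(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def main(argv=None):
    """Parse arguments, run the check, and return a process exit code."""
    parser = argparse.ArgumentParser(description="Hypothetical TCP health check")
    # Assumed argument names; adjust to whatever the load balancer passes.
    parser.add_argument("--ipaddr", required=True)
    parser.add_argument("--port", type=int, required=True)
    parser.add_argument("--timeout", type=float, default=5.0)
    # Ignore any extra arguments the caller may add.
    args, _unknown = parser.parse_known_args(argv)

    ok = check_node(args.ipaddr, args.port, args.timeout)
    if not ok:
        # Print a reason so a failure is not silent.
        print(f"cannot connect to {args.ipaddr}:{args.port}")
    # Exit code 0 means healthy; non-zero means failed.
    return 0 if ok else 1
```

A script built on this would end with `sys.exit(main())`, so that a non-zero exit code reports the failure to whatever runs the check.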
Confirmed the node is healthy again and the monitor is working:

[30/Jul/2012:11:45:56 -0700]	INFO	monitors/socorro-thrift-check	monitorok	Monitor is working for node '10.8.100.62:9090'.
[30/Jul/2012:11:45:57 -0700]	INFO	pools/socorro-thrift-stage:9090	nodes/10.8.100.62:9090	nodeworking	Node 10.8.100.62 is working again
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
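
As a rough cross-check of the log lines quoted in this bug, the sketch below parses them and computes how long the node was marked failed. It assumes the fields are tab-separated exactly as pasted here (timestamp, severity, one or more object paths, event tag, message); parse_event and outage_seconds are hypothetical helper names, not part of any Zeus tooling.

```python
from datetime import datetime

# Log lines copied from this bug, with tabs written explicitly.
SAMPLE = [
    "[30/Jul/2012:11:34:25 -0700]\tWARN\tmonitors/socorro-thrift-check\tmonitorfail\tMonitor has detected a failure in node '10.8.100.62:9090': Monitor exited, exit code 1, no output generated",
    "[30/Jul/2012:11:34:25 -0700]\tSERIOUS\tpools/socorro-thrift-stage:9090\tnodes/10.8.100.62:9090\tnodefail\tNode 10.8.100.62 has failed - A monitor has detected a failure",
    "[30/Jul/2012:11:45:56 -0700]\tINFO\tmonitors/socorro-thrift-check\tmonitorok\tMonitor is working for node '10.8.100.62:9090'.",
    "[30/Jul/2012:11:45:57 -0700]\tINFO\tpools/socorro-thrift-stage:9090\tnodes/10.8.100.62:9090\tnodeworking\tNode 10.8.100.62 is working again",
]


def parse_event(line):
    """Split one tab-separated event log line into its parts."""
    fields = line.rstrip("\n").split("\t")
    timestamp = datetime.strptime(fields[0].strip("[]"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "time": timestamp,
        "severity": fields[1],
        "objects": fields[2:-2],  # e.g. pools/..., nodes/...
        "event": fields[-2],      # e.g. nodefail, nodeworking
        "message": fields[-1],
    }


def outage_seconds(lines, node):
    """Seconds between the first nodefail and the next nodeworking for node."""
    failed_at = None
    for line in lines:
        ev = parse_event(line)
        if f"nodes/{node}" not in ev["objects"]:
            continue  # skip monitor-level events and other nodes
        if ev["event"] == "nodefail" and failed_at is None:
            failed_at = ev["time"]
        elif ev["event"] == "nodeworking" and failed_at is not None:
            return (ev["time"] - failed_at).total_seconds()
    return None


print(outage_seconds(SAMPLE, "10.8.100.62:9090"))  # 692.0 (11 min 32 s)
```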
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard