Closed Bug 778805 Opened 8 years ago Closed 8 years ago
Verify Zeus thrift_check.py properly reports failures to Zeus for socorro staging
We're seeing ongoing connectivity issues to Socorro staging's HBase pool, but Zeus is not recording any errors with backend nodes, unlike the Postgres pool. Work with :tmary to stop HBase on one of the Socorro Staging nodes (hp-node62 - hp-node69) and confirm the Zeus check (https://pp-zlb01.phx.mozilla.net:9090/apps/zxtm/index.fcgi?section=Extra%20Files%3AExternProgMonitors) is working properly.
Per IRC and Zeus logs:
20:34:13 tmary | solarce: done
[30/Jul/2012:11:34:25 -0700] WARN monitors/socorro-thrift-check monitorfail Monitor has detected a failure in node '10.8.100.62:9090': Monitor exited, exit code 1, no output generated
[30/Jul/2012:11:34:25 -0700] SERIOUS pools/socorro-thrift-stage:9090 nodes/10.8.100.62:9090 nodefail Node 10.8.100.62 has failed - A monitor has detected a failure
Status: NEW → ASSIGNED
Confirmed happy again, monitor is working:
[30/Jul/2012:11:45:56 -0700] INFO monitors/socorro-thrift-check monitorok Monitor is working for node '10.8.100.62:9090'.
[30/Jul/2012:11:45:57 -0700] INFO pools/socorro-thrift-stage:9090 nodes/10.8.100.62:9090 nodeworking Node 10.8.100.62 is working again
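For reference, the log lines above show Zeus marking the node failed when the monitor exits with code 1 and marking it working again once the monitor succeeds. The sketch below illustrates that exit-code contract only. It is not the actual check script from this bug: it tests plain TCP reachability rather than speaking Thrift, and the `--ipaddr` and `--port` argument names, along with the `check_node` and `main` helpers, are assumptions made for illustration.

```python
import argparse
import socket


def check_node(host, port, timeout=5.0):
    """Return 0 if a TCP connection to host:port succeeds, otherwise 1.

    A non-zero return value is meant to be used as the process exit code,
    which is what the load balancer logged as a monitor failure above.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return 1


def main(argv):
    # Argument names are assumed for this sketch, not taken from the bug.
    parser = argparse.ArgumentParser(description="Hypothetical node check")
    parser.add_argument("--ipaddr", required=True)
    parser.add_argument("--port", type=int, required=True)
    # Ignore any extra arguments the caller may pass.
    args, _unknown = parser.parse_known_args(argv)
    return check_node(args.ipaddr, args.port)


# When run as a script, finish with: raise SystemExit(main(sys.argv[1:]))
```

With HBase stopped on a node, a check of this shape would exit 1 for that node, matching the monitorfail entry; once the service is back, it would exit 0.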
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard