We're seeing ongoing connectivity issues with Socorro staging's HBase pool, but Zeus is not recording any errors for the backend nodes, unlike with the Postgres pool. Work with :tmary to stop HBase on one of the Socorro staging nodes (hp-node62 - hp-node69) and confirm that the Zeus check (https://pp-zlb01.phx.mozilla.net:9090/apps/zxtm/index.fcgi?section=Extra%20Files%3AExternProgMonitors) is working properly.
Per IRC and Zeus logs:
20:34:13 tmary | solarce: done
[30/Jul/2012:11:34:25 -0700] WARN monitors/socorro-thrift-check monitorfail Monitor has detected a failure in node '10.8.100.62:9090': Monitor exited, exit code 1, no output generated
[30/Jul/2012:11:34:25 -0700] SERIOUS pools/socorro-thrift-stage:9090 nodes/10.8.100.62:9090 nodefail Node 10.8.100.62 has failed - A monitor has detected a failure
Confirmed happy again, monitor is working:
[30/Jul/2012:11:45:56 -0700] INFO monitors/socorro-thrift-check monitorok Monitor is working for node '10.8.100.62:9090'.
[30/Jul/2012:11:45:57 -0700] INFO pools/socorro-thrift-stage:9090 nodes/10.8.100.62:9090 nodeworking Node 10.8.100.62 is working again
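
For reference, a minimal sketch of what an external program monitor of this kind might look like. This is not the actual socorro-thrift-check script, which is not shown in this bug. It assumes the load balancer passes the node address as --ipaddr= and --port= arguments (an assumption about the calling convention) and relies on the behaviour seen in the logs above, where a non-zero exit code marks the node as failed. The names node_is_up, parse_args and main are hypothetical, and a plain TCP connect is a weaker test than a real Thrift call.

```python
import socket
import sys


def node_is_up(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def parse_args(argv):
    """Collect --key=value arguments into a dict (assumed argument format)."""
    args = {}
    for item in argv:
        if item.startswith("--") and "=" in item:
            key, value = item[2:].split("=", 1)
            args[key] = value
    return args


def main(argv):
    """Return 0 if the node accepts connections, 1 if not, 2 on bad arguments."""
    args = parse_args(argv)
    try:
        host = args["ipaddr"]
        port = int(args["port"])
    except (KeyError, ValueError):
        print("usage: --ipaddr=HOST --port=PORT", file=sys.stderr)
        return 2
    if node_is_up(host, port):
        return 0
    print("connect to %s:%d failed" % (host, port), file=sys.stderr)
    return 1
```

To run it as a standalone script, end the file with sys.exit(main(sys.argv[1:])). With HBase stopped on a node, a call such as main(["--ipaddr=10.8.100.62", "--port=9090"]) would be expected to return 1, matching the "exit code 1" failure recorded above.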