Closed Bug 1284487 Opened 8 years ago Closed 8 years ago

Unable to deploy Treeherder when zlb1.ops.scl3 down

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Unassigned)

In bug 1284456, zlb1 has been taken offline temporarily.

The other Zeus nodes are handling requests to treeherder.{allizom,mozilla}.org fine. However, the drain/undrain commands used during deployments are hardcoded to zlb1, so they now fail.

Our chief deployment script [1] calls /root/bin/restart-jobs, which sources /root/bin/th_functions.sh, which in turn uses /root/bin/zxtmpool, which contains:

# List Zeus endpoints
my $zeus_scl3 = 'REDACTED_IP:9090'; # 'zlb1.ops.scl3.mozilla.com:9090';
my $zeus_phx1 = 'REDACTED_IP:9090'; # 'zlb8.ops.phx1.mozilla.com:9090';

For example, running restart-jobs directly:
[emorley@treeherderadm.private.scl3 ~]$ sudo /root/bin/restart-jobs -p web

syntax error at line 1, column 0, byte 0 at /usr/lib64/perl5/XML/Parser.pm line 187
500 Connect failed: connect: Connection refused; Connection refused
NOTICE: treeherder1.webapp.scl3.mozilla.com hasn't drained in 300s. Please verify active connections!
NOTICE: Push script will abort in 300s if still in wait
...


This may end up being wontfix, since:
* Treeherder is moving to Heroku soon
* I'm not sure how the Zeus nodes are set up, and zlb1 might still be the most appropriate node to make API calls to (e.g. if it's the master)
* zlb1 will presumably be back online today

However, I was under the impression that the Treeherder drain/undrain script is used by other sites too, which may not be moving to Heroku any time soon, so it may still be good to make it handle this case more gracefully :-)
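
One way the "more graceful" handling could look is to try a list of candidate Zeus API endpoints in order and use the first one that accepts a connection, rather than relying on a single hardcoded node. The following is only a rough sketch in Python (the language of update.py), not how zxtmpool actually works: the function name first_reachable and the idea of passing a list of endpoints are assumptions made for illustration, and a plain TCP connect only shows that the port is open, not that the node is the right one to send API calls to.

```python
import socket
from typing import Iterable, Optional


def first_reachable(endpoints: Iterable[str], timeout: float = 3.0) -> Optional[str]:
    """Return the first "host:port" endpoint that accepts a TCP connection.

    Hypothetical helper for illustration only; returns None if no endpoint
    in the list can be reached within the timeout.
    """
    for endpoint in endpoints:
        # Split on the last colon so the host part may itself contain colons.
        host, _, port = endpoint.rpartition(":")
        try:
            # Open and immediately close a connection as a reachability probe.
            with socket.create_connection((host, int(port)), timeout=timeout):
                return endpoint
        except OSError:
            # Refused, timed out, or unresolvable: fall through to the next one.
            continue
    return None


# Example usage (the endpoint names are placeholders, not real Zeus nodes):
#   endpoint = first_reachable(["zeus-a.example:9090", "zeus-b.example:9090"])
#   if endpoint is None:
#       raise SystemExit("No Zeus API endpoint reachable; aborting before drain")
```

Failing fast when no endpoint is reachable would also avoid the situation in the output above, where the script waits on a drain that was never actually requested.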


[1] https://github.com/mozilla/treeherder/blob/8c1e8f9fccea18b1606c7bb3ea9e2808a808a7af/deployment/update/update.py#L130
Flags: needinfo?(klibby)
Summary: Treeherder deployments broken when zlb1.ops.scl3 down → Unable to deploy Treeherder when zlb1.ops.scl3 down
See Also: → 1284505
zxtmpool has been updated to use the external.zlb CNAME.
Flags: needinfo?(klibby)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED