Bug 1284487
Opened 8 years ago
Closed 8 years ago
Unable to deploy Treeherder when zlb1.ops.scl3 down
Categories
(Tree Management :: Treeherder: Infrastructure, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Unassigned)
References
Details
In bug 1284456, zlb1 has been taken offline temporarily. The other Zeus nodes are handling requests to treeherder.{allizom,mozilla}.org fine, however for the actual deployments, the drain/undrain commands are hardcoded to zlb1, so now fail.

Our chief deployment script [1] calls /root/bin/restart-jobs, which sources /root/bin/th_functions.sh, which in turn uses /root/bin/zxtmpool, which contains:

# List Zeus endpoints
my $zeus_scl3 = 'REDACTED_IP:9090'; # 'zlb1.ops.scl3.mozilla.com:9090';
my $zeus_phx1 = 'REDACTED_IP:9090'; # 'zlb8.ops.phx1.mozilla.com:9090';

eg just running restart-jobs directly:

[emorley@treeherderadm.private.scl3 ~]$ sudo /root/bin/restart-jobs -p web
syntax error at line 1, column 0, byte 0 at /usr/lib64/perl5/XML/Parser.pm line 187
500 Connect failed: connect: Connection refused; Connection refused
NOTICE: treeherder1.webapp.scl3.mozilla.com hasn't drained in 300s. Please verify active connections!
NOTICE: Push script will abort in 300s if still in wait
...

This may end up being wontfix, since:
* Treeherder moving to Heroku soon
* I'm not sure how the Zeus nodes are set up, and zlb1 still might be the most appropriate to make API calls to (eg if it's the master)
* zlb1 will presumably be back online today

However I was under the impression the treeherder drain/undrain script was used by other sites too, which may not be moving to Heroku any time soon - so may still be good to make it handle this case more gracefully :-)

[1] https://github.com/mozilla/treeherder/blob/8c1e8f9fccea18b1606c7bb3ea9e2808a808a7af/deployment/update/update.py#L130
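
One way the "handle this case more gracefully" idea could look is a simple endpoint fallback: try each Zeus API endpoint in order and use the first one that accepts a connection, rather than a single hardcoded host. The sketch below is in Python (the language of update.py), not the Perl of zxtmpool, and is not the actual fix applied to this bug. The names `tcp_reachable` and `pick_endpoint` and the placeholder hostnames are hypothetical, and it assumes that any reachable node is an acceptable target for drain/undrain API calls, which the report itself leaves open.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

# (host, port) pair for a load balancer control API.
Endpoint = Tuple[str, int]


def tcp_reachable(endpoint: Endpoint, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and DNS failures.
        return False


def pick_endpoint(
    endpoints: Iterable[Endpoint],
    is_up: Callable[[Endpoint], bool] = tcp_reachable,
) -> Optional[Endpoint]:
    """Return the first endpoint that passes the is_up check, else None."""
    for endpoint in endpoints:
        if is_up(endpoint):
            return endpoint
    return None


if __name__ == "__main__":
    # Hypothetical placeholder hosts; the real node list is not given in this bug.
    candidates = [("zeus-a.example.invalid", 9090), ("zeus-b.example.invalid", 9090)]
    chosen = pick_endpoint(candidates)
    if chosen is None:
        print("No control endpoint reachable; aborting before drain")
    else:
        print("Using control endpoint %s:%d" % chosen)
```

Failing fast when no endpoint is reachable would also avoid the 300s drain wait shown in the output above, since the script would know up front that the drain request never reached a load balancer.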
Flags: needinfo?(klibby)
Reporter
Updated • 8 years ago
Summary: Treeherder deployments broken when zlb1.ops.scl3 down → Unable to deploy Treeherder when zlb1.ops.scl3 down
Updated • 8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED