Unable to deploy Treeherder when zlb1.ops.scl3 down

RESOLVED FIXED

Status

Tree Management
Treeherder: Infrastructure
P1
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: emorley, Unassigned)

(Reporter)

Description

2 years ago
In bug 1284456, zlb1 has been taken offline temporarily.

The other Zeus nodes are handling requests to treeherder.{allizom,mozilla}.org fine. However, for the actual deployments, the drain/undrain commands are hardcoded to zlb1, so they now fail.

Our chief deployment script [1] calls /root/bin/restart-jobs, which sources /root/bin/th_functions.sh, which in turn uses /root/bin/zxtmpool, which contains:

# List Zeus endpoints
my $zeus_scl3 = 'REDACTED_IP:9090'; # 'zlb1.ops.scl3.mozilla.com:9090';
my $zeus_phx1 = 'REDACTED_IP:9090'; # 'zlb8.ops.phx1.mozilla.com:9090';

For example, just running restart-jobs directly:
[emorley@treeherderadm.private.scl3 ~]$ sudo /root/bin/restart-jobs -p web

syntax error at line 1, column 0, byte 0 at /usr/lib64/perl5/XML/Parser.pm line 187
500 Connect failed: connect: Connection refused; Connection refused
NOTICE: treeherder1.webapp.scl3.mozilla.com hasn't drained in 300s. Please verify active connections!
NOTICE: Push script will abort in 300s if still in wait
...


This may end up being wontfix, since:
* Treeherder is moving to Heroku soon
* I'm not sure how the Zeus nodes are set up, and zlb1 might still be the most appropriate node to make API calls to (e.g. if it's the master)
* zlb1 will presumably be back online today

However, I was under the impression that the Treeherder drain/undrain script is used by other sites too, which may not be moving to Heroku any time soon, so it may still be good to make it handle this case more gracefully :-)
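
One way the script could handle an unreachable node more gracefully is to probe a list of Zeus API endpoints and use the first one that answers, instead of hardcoding a single node. The following is only a minimal Python sketch of that idea, not the actual zxtmpool code (which is Perl). The endpoint names and the tcp_probe and pick_endpoint helpers are hypothetical, and it assumes that any reachable node can accept drain/undrain API calls, which may not hold if zlb1 is the master.

```python
import socket

# Hypothetical endpoint list for illustration only; the real addresses
# are redacted in zxtmpool and the second name is made up.
ZEUS_ENDPOINTS = [
    "zlb1.ops.scl3.mozilla.com:9090",
    "zlb-fallback.example.invalid:9090",
]


def tcp_probe(endpoint, timeout=3.0):
    """Return True if a TCP connection to 'host:port' succeeds."""
    host, _, port = endpoint.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, DNS failures and timeouts.
        return False


def pick_endpoint(endpoints, probe=tcp_probe):
    """Return the first endpoint that answers the probe, else raise."""
    for endpoint in endpoints:
        if probe(endpoint):
            return endpoint
    raise RuntimeError("no Zeus endpoint reachable: " + ", ".join(endpoints))
```

With something like this, the deploy could abort immediately with a clear error when no node is reachable, rather than waiting out the 300s drain timeout shown above.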


[1] https://github.com/mozilla/treeherder/blob/8c1e8f9fccea18b1606c7bb3ea9e2808a808a7af/deployment/update/update.py#L130
Flags: needinfo?(klibby)
(Reporter)

Updated

2 years ago
Summary: Treeherder deployments broken when zlb1.ops.scl3 down → Unable to deploy Treeherder when zlb1.ops.scl3 down
(Reporter)

Updated

2 years ago
See Also: → bug 1284505

Updated

2 years ago
No longer blocks: 1284456
Depends on: 1284456

Updated

2 years ago
Depends on: 1284876
zxtmpool has been updated to use the external.zlb CNAME.
Flags: needinfo?(klibby)

Updated

2 years ago
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED