Closed Bug 1058732 Opened 10 years ago Closed 9 years ago

[docker-worker] be more aggressive with shutdowns

Categories

(Taskcluster :: Workers, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: jlal, Unassigned, Mentored)

Details

Right now the docker worker only shuts down if it is idle (by its internal standards), but it's entirely possible for docker to fail to start correctly, or for some other startup condition to occur, which causes the worker node to stay running forever (until someone manually kills it). Our start scripts should be smart enough to try N times to start correctly and then shut down.
Mentor: jlal
In some ways I like this for its simplicity. It's basically:
$ node ./bin/docker-worker  # Maybe repeat a few times
$ sudo shutdown -h now

I've done the above before in a similar setting, and it works... until something breaks :)
Then the aws-provisioner will continue to spawn nodes, and the nodes will start, fail, and shut down.
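
For reference, the retry-then-shutdown wrapper discussed in this bug might look roughly like the sketch below. It is only an illustration under stated assumptions: the variable names (MAX_ATTEMPTS, RETRY_DELAY, WORKER_CMD, SHUTDOWN_CMD) and the function name are hypothetical and not part of docker-worker, and a zero exit status is assumed to mean the worker stopped on its own. It has the weakness noted above: a broken node simply powers off and gets replaced.

```shell
#!/bin/sh
# Sketch only: all names below are illustrative, not docker-worker settings.
MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"      # how many times to try starting the worker
RETRY_DELAY="${RETRY_DELAY:-10}"       # seconds to wait between attempts
WORKER_CMD="${WORKER_CMD:-node ./bin/docker-worker}"
SHUTDOWN_CMD="${SHUTDOWN_CMD:-sudo shutdown -h now}"

start_or_shutdown() {
  attempt=1
  status=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    echo "starting worker (attempt $attempt of $MAX_ATTEMPTS)"
    # Word splitting of $WORKER_CMD is intentional so it can carry arguments.
    if $WORKER_CMD; then
      # Assumption: exit status 0 means the worker exited cleanly.
      status=0
      break
    fi
    attempt=$((attempt + 1))
    # Only pause if another attempt is coming.
    if [ "$attempt" -le "$MAX_ATTEMPTS" ]; then
      sleep "$RETRY_DELAY"
    fi
  done
  # Whether the worker exited cleanly or never started, stop the node.
  $SHUTDOWN_CMD
  return "$status"
}
```

A start script would call start_or_shutdown as its last step. The commands are kept in variables so the loop can be exercised with harmless stand-ins (for example SHUTDOWN_CMD="echo SHUTDOWN") before it is pointed at a real node.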

So I think we need to add a ping end-point so that aws-provisioner can ping the node to see whether docker-worker is running, and then put this shutdown condition into the aws-provisioner. We should also limit the number of worker starts that can fail before alarms go off and, eventually, aws-provisioner stops provisioning the _broken_ workerType.

There are other reasons for having a secondary instance-killer built into the aws-provisioner too.
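
The failure limit proposed in this comment could be sketched as a small counter kept per workerType. This is a hypothetical illustration, not aws-provisioner code: the thresholds, variable names, and function name are all assumptions, and how the ping itself is performed is left out, so the caller is expected to pass in "ok" or "fail" for each started node.

```shell
#!/bin/sh
# Sketch only: thresholds and names are illustrative assumptions.
ALARM_AFTER="${ALARM_AFTER:-3}"        # consecutive failed starts before alarming
DISABLE_AFTER="${DISABLE_AFTER:-5}"    # consecutive failed starts before provisioning stops
failed_starts=0
worker_type_state=active               # one of: active, alarm, disabled

# record_start_result ok|fail
# Updates the consecutive-failure count and the resulting state.
record_start_result() {
  if [ "$1" = ok ]; then
    # A node that answers the ping resets the streak.
    failed_starts=0
  else
    failed_starts=$((failed_starts + 1))
  fi
  if [ "$failed_starts" -ge "$DISABLE_AFTER" ]; then
    worker_type_state=disabled
  elif [ "$failed_starts" -ge "$ALARM_AFTER" ]; then
    worker_type_state=alarm
  else
    worker_type_state=active
  fi
}
```

The provisioning loop would check worker_type_state before spawning another node and skip a workerType that is disabled, which breaks the start, fail, and shut down cycle described above.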
Component: TaskCluster → Docker-Worker
Product: Testing → Taskcluster
With the recent changes to secrets handling, if the worker crashes it cannot retrieve the worker secrets, so it sets its capacity to 0 and waits out the billing cycle. This behavior was chosen so that we don't get into a startup/crash/respawn cycle with all the workers.

As for workers starting up and not doing anything, I have not seen this issue in a long time. Closing this issue for now, as it's a year old and nobody has reported it as necessary.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Component: Docker-Worker → Workers