Closed Bug 1225811 Opened 9 years ago Closed 7 years ago

Release the monkeys!

Categories

(Taskcluster :: Operations and Service Requests, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: pmoore, Unassigned)

Details

We could maybe benefit from a Chaos Monkey rampaging through our TaskCluster kingdom, exposing dark corners and unlit paths.

The canonical Chaos Monkey used at Netflix operates within an AWS autoscaling group. Since we are not deploying applications directly under our own AWS account but are instead using Heroku for apps (with the exception of s3copyproxy), the original Chaos Monkey might not serve us well. The Janitor Monkey and Conformity Monkey may well be useful though; see details here: https://github.com/Netflix/SimianArmy/wiki

Most of our instances run on spot nodes, which need no attention from a monkey: they are already regularly brought down at inopportune times.

Some things we could consider letting the janitor/conformity monkeys loose on are:
* snapshots
* AMIs
* S3 buckets
* on-demand instances

I'm not sure if there are any monkeys out in the wild that already pillage Heroku resources; that might also be something to think about. I'm not sure if Heroku provides APIs to bring down individual nodes, but maybe some kind of janitor monkey would be good at cleaning up unused application environments etc.

I also like the idea of a monkey that can interrupt TCP/IP sessions between nodes, etc. Or maybe we have enough network failures already without needing a monkey to cause additional ones.

Business goals:
* increase stability by detecting potential areas of failure
* reduce costs by auto-clean up of (undesired) objects
* improve consistency of resource management with conformity monkey

Affected applications:
* if we can find or implement a Heroku monkey, then all the taskcluster core components
* In AWS: s3copyproxy (any others?)

Other affected resources:
* snapshots, AMIs, on-demand nodes, loan/user machines, S3 buckets, ...

Not affected:
* docker-worker spot instances (AFAIK we don't support on-demand instances yet)
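
To make the janitor idea a bit more concrete, here is a rough sketch of the kind of rule such a monkey might apply: flag resources that are unused and older than some retention period. Everything in it is an assumption for illustration: the `Resource` record, the `find_cleanup_candidates` helper, the 30-day threshold and the sample inventory are hypothetical, and it does not call AWS, Heroku or the SimianArmy code. It only reports candidates and deletes nothing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Resource:
    """Hypothetical inventory record; not a real AWS or Heroku type."""
    kind: str          # e.g. "snapshot", "ami", "s3-bucket", "on-demand-instance"
    ident: str
    created: datetime
    in_use: bool


def find_cleanup_candidates(resources, now, max_age=timedelta(days=30)):
    """Return unused resources older than max_age, oldest first.

    Nothing is deleted here; the result is only a report for review.
    """
    stale = [r for r in resources if not r.in_use and now - r.created > max_age]
    return sorted(stale, key=lambda r: r.created)


# Arbitrary fixed date so the example is reproducible.
now = datetime(2020, 1, 1)

# Made-up inventory; a real monkey would have to list these from the provider.
inventory = [
    Resource("snapshot", "snap-old", now - timedelta(days=90), in_use=False),
    Resource("ami", "ami-live", now - timedelta(days=200), in_use=True),
    Resource("s3-bucket", "bucket-new", now - timedelta(days=2), in_use=False),
    Resource("on-demand-instance", "i-forgotten", now - timedelta(days=45), in_use=False),
]

candidates = find_cleanup_candidates(inventory, now)
for r in candidates:
    print(f"{r.kind} {r.ident}: unused, flagged for cleanup review")
```

Keeping the rule report-only means a wrong "in use" signal costs a false alarm rather than a deleted AMI; actual deletion could be a separate, explicitly approved step.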
Component: General → Discussion
Component: Discussion → Operations
Found in triage. While this would be nice, I don't think we'll be investing effort in this any time soon.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Besides, spot already does this for us.
Component: Operations → Operations and Service Requests