Closed
Bug 1225811
Opened 9 years ago
Closed 7 years ago
Release the monkeys!
Categories
(Taskcluster :: Operations and Service Requests, task)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: pmoore, Unassigned)
Details
We could maybe benefit from a Chaos Monkey rampaging through our TaskCluster kingdom, exposing dark corners and unlit paths.
The canonical Chaos Monkey used at Netflix operates within an AWS autoscaling group. Since we do not deploy applications directly under our own AWS account but instead use Heroku for apps (with the exception of s3copyproxy), the original Chaos Monkey might not serve us well.
The Janitor Monkey and Conformity Monkey may well be useful, though. Details: https://github.com/Netflix/SimianArmy/wiki
Most of our instances run on spot nodes, which need no attention from a monkey: they are already regularly brought down at inopportune times. Some things we could consider letting the janitor/conformity monkeys loose on are:
* Snapshots
* AMIs
* S3 buckets
* on-demand instances
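A rough sketch of what a janitor pass over those resource types might look like. Everything here is hypothetical: the `Resource` record, the `find_cleanup_candidates` helper and the 30-day idle threshold are made up for illustration and are not taken from Janitor Monkey or from any AWS API. A real version would have to build the inventory from the cloud provider first; this only shows the selection rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Resource:
    """Hypothetical inventory record; not a real AWS or Taskcluster type."""
    kind: str            # e.g. "snapshot", "ami", "s3-bucket", "on-demand-instance"
    resource_id: str
    last_used: datetime
    in_use: bool = False


def find_cleanup_candidates(resources, now, max_idle=timedelta(days=30)):
    """Return resources that are not in use and have been idle longer than max_idle.

    The 30-day default is an arbitrary assumption, not a recommended policy.
    """
    return [r for r in resources if not r.in_use and now - r.last_used > max_idle]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    inventory = [
        Resource("snapshot", "snap-example-old", now - timedelta(days=90)),
        Resource("ami", "ami-example-busy", now - timedelta(days=90), in_use=True),
        Resource("s3-bucket", "bucket-example-new", now - timedelta(days=1)),
    ]
    for r in find_cleanup_candidates(inventory, now):
        # A cautious janitor would only report here and leave deletion to a human.
        print(f"candidate for cleanup: {r.kind} {r.resource_id}")
```

Keeping the selection rule separate from any deletion step means the monkey could run in report-only mode first, which seems like the safer way to start.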
I'm not sure if there are any monkeys out in the wild that already pillage Heroku resources; that might also be something to think about. I don't know whether Heroku provides APIs to bring down individual nodes, but maybe some kind of janitor monkey would be good at cleaning up unused application environments etc.
I also like the idea of a monkey that can interrupt TCP/IP sessions between nodes, etc. Then again, maybe we have enough network failures already without needing a monkey to cause additional ones.
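To make the network-interruption idea a bit more concrete, here is a minimal sketch of fault injection at the application level. It does not touch real TCP/IP sessions; it only wraps a callable and makes a configurable fraction of calls fail with `ConnectionResetError`, so that a client's retry handling can be exercised. The `FlakyLink` class and its `failure_rate` parameter are hypothetical names invented for this example.

```python
import random


class FlakyLink:
    """Wraps a callable and fails a fraction of calls, imitating a dropped connection.

    Hypothetical helper for illustration; it simulates failures in-process and
    does not interfere with actual network traffic.
    """

    def __init__(self, send, failure_rate, rng=None):
        if not 0.0 <= failure_rate <= 1.0:
            raise ValueError("failure_rate must be between 0 and 1")
        self._send = send
        self._failure_rate = failure_rate
        # A seeded random.Random can be passed in to make a test run repeatable.
        self._rng = rng if rng is not None else random.Random()

    def __call__(self, *args, **kwargs):
        # random() returns a value in [0.0, 1.0), so a rate of 0 never fails
        # and a rate of 1 always fails.
        if self._rng.random() < self._failure_rate:
            raise ConnectionResetError("injected failure (chaos sketch)")
        return self._send(*args, **kwargs)


if __name__ == "__main__":
    flaky_upload = FlakyLink(lambda payload: len(payload), failure_rate=0.3)
    for attempt in range(5):
        try:
            print("sent bytes:", flaky_upload(b"artifact"))
        except ConnectionResetError as exc:
            print("attempt", attempt, "failed:", exc)
```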
Business goals:
* increase stability by detecting potential areas of failure
* reduce costs by automatically cleaning up (undesired) objects
* improve consistency of resource management with the conformity monkey
Affected applications:
* if we can find or implement a Heroku monkey, then all the Taskcluster core components
* in AWS: s3copyproxy (any others?)
Other affected resources:
* snapshots, AMIs, on-demand nodes, loan/user machines, S3 buckets, ...
Not affected:
* docker-worker spot instances (AFAIK we don't support on-demand instances yet)
Updated•9 years ago
Component: General → Discussion
Updated•9 years ago
Component: Discussion → Operations
Comment 1•7 years ago
Found in triage.
While this would be nice, I don't think we'll be investing effort in this any time soon.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Comment 2•7 years ago
Besides, spot already does this for us.
Updated•6 years ago
Component: Operations → Operations and Service Requests