use elastic load balancing for EC2 collectors

RESOLVED FIXED

Status

Product: Socorro
Component: Infra
Opened: 3 years ago
Last modified: a year ago

People

(Reporter: rhelmer, Unassigned)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

3 years ago
We should be able to minimize cost and also handle large crash spikes by using the Elastic Load Balancer feature to automatically spin collector EC2 nodes up and down.

Comment 1

2 years ago
It sounds like we're thinking of setting up autoscaling here, rather than using a feature of ELBs.

To that end, I checked whether this was already set up, and found some bugs in the update infrastructure script that had prevented the scaling policies from being applied properly.

I've run this myself, and now we have scaling policies:

---
as-prod-collector-scale-down
Execute policy when:
as-prod-collector-CPULow breaches the alarm threshold: CPUUtilization < 20 for 3 consecutive periods of 300 seconds
for the metric dimensions
AutoScalingGroupName = as-prod-collector
Take the action:
Remove 3 instances
And then wait:
300 seconds before allowing another scaling activity

as-prod-collector-scale-up
Execute policy when:
as-prod-collector-CPUHig breaches the alarm threshold: CPUUtilization > 70 for 300 seconds
for the metric dimensions
AutoScalingGroupName = as-prod-collector
Take the action:
Add 6 instances
And then wait:
300 seconds before allowing another scaling activity
---
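To make the two policies above concrete, here is a minimal Python simulation of how they would resize the group over a run of 300-second periods. It is a simplified model, not how CloudWatch or Auto Scaling actually evaluate alarms: it assumes one average CPUUtilization sample per period, treats each alarm as firing on every period its condition holds, and uses hypothetical `min_size`/`max_size` limits that are not part of the dump.

```python
from typing import List, Optional

PERIOD_S = 300      # alarm evaluation period, from the policy dump above
COOLDOWN_S = 300    # "wait 300 seconds before allowing another scaling activity"
SCALE_UP_BY = 6     # as-prod-collector-scale-up: "Add 6 instances"
SCALE_DOWN_BY = 3   # as-prod-collector-scale-down: "Remove 3 instances"


def simulate(cpu_by_period: List[float], capacity: int,
             min_size: int = 1, max_size: int = 100) -> List[int]:
    """Return the group size at the end of each 300-second period.

    Simplified model (assumption): one average CPU sample per period, and an
    alarm fires on every period where its condition holds. min_size and
    max_size are hypothetical group limits.
    """
    sizes: List[int] = []
    last_action_t: Optional[int] = None
    for i, cpu in enumerate(cpu_by_period):
        t = (i + 1) * PERIOD_S  # time at which this period's alarms are evaluated
        last_three = cpu_by_period[max(0, i - 2): i + 1]

        delta = 0
        if cpu > 70:  # CPUHig alarm: CPUUtilization > 70 for one period
            delta = SCALE_UP_BY
        elif len(last_three) == 3 and all(c < 20 for c in last_three):
            delta = -SCALE_DOWN_BY  # CPULow alarm: < 20 for 3 consecutive periods

        # The cooldown blocks any new action until COOLDOWN_S has passed.
        cooled_down = last_action_t is None or t - last_action_t >= COOLDOWN_S
        if delta and cooled_down:
            new_capacity = max(min_size, min(max_size, capacity + delta))
            if new_capacity != capacity:
                capacity = new_capacity
                last_action_t = t
        sizes.append(capacity)
    return sizes


# A hypothetical crash spike followed by a quiet period.
print(simulate([50, 85, 90, 60, 15, 10, 12, 8], capacity=10))
```

For this sample spike it prints `[10, 16, 22, 22, 22, 22, 19, 16]`. Because the cooldown equals the evaluation period in this model, it never actually blocks an action; raising `COOLDOWN_S` shows its effect.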

Should we scale up more aggressively? The scale-up policy currently adds 6 instances, but I'm considering (and would like feedback on) adding, say, 12 or 18 instead.
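One way to frame the question is how many scale-up actions, and how much cooldown time, it takes to absorb a spike at each step size. The sketch below assumes a hypothetical spike from 10 to 40 collectors and ignores instance boot time and alarm evaluation delay, so the times it reports are lower bounds, not predictions.

```python
import math


def scale_up_actions(current: int, target: int, step: int) -> int:
    """How many 'Add <step> instances' actions are needed to reach target."""
    return max(0, math.ceil((target - current) / step))


def min_ramp_seconds(actions: int, cooldown_s: int = 300) -> int:
    """Lower bound on time from the first to the last action, set by the cooldown."""
    return max(0, actions - 1) * cooldown_s


current, target = 10, 40  # hypothetical spike, not measured data
for step in (6, 12, 18):
    n = scale_up_actions(current, target, step)
    overshoot = current + n * step - target  # instances beyond what was needed
    print(f"+{step}: {n} actions, >= {min_ramp_seconds(n)} s, overshoot {overshoot}")
```

With these numbers, adding 6 takes 5 actions and at least 1200 s with no overshoot, while adding 12 or 18 takes 3 or 2 actions (at least 600 s or 300 s) but overshoots by 6 instances that keep costing money until the scale-down policy removes them.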

https://github.com/mozilla/socorro-infra/pull/182 should fix the script errors so that applying these policies becomes an automated part of our infrastructure, runnable as a Jenkins job.
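For reference, here is a hypothetical sketch of the request parameters such a job could send to AWS to recreate the two policies above. It is not the socorro-infra script from the pull request; it only builds keyword arguments for boto3's `put_scaling_policy` (Auto Scaling) and `put_metric_alarm` (CloudWatch). The `Average` statistic and the `AWS/EC2` namespace are assumptions, since the policy dump does not state them.

```python
from typing import Any, Dict, Tuple

GROUP = "as-prod-collector"


def policy_and_alarm(name: str, adjustment: int, alarm_name: str,
                     comparison: str, threshold: float,
                     evaluation_periods: int) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build kwargs for put_scaling_policy and put_metric_alarm (hypothetical helper)."""
    policy = {
        "AutoScalingGroupName": GROUP,
        "PolicyName": name,
        "PolicyType": "SimpleScaling",
        "AdjustmentType": "ChangeInCapacity",
        "ScalingAdjustment": adjustment,  # +6 adds instances, -3 removes them
        "Cooldown": 300,                  # seconds before another scaling activity
    }
    alarm = {
        "AlarmName": alarm_name,
        "Namespace": "AWS/EC2",           # assumption: EC2 CPU metrics
        "MetricName": "CPUUtilization",
        "Statistic": "Average",           # assumption: not shown in the dump
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": GROUP}],
        "Period": 300,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": comparison,
        # AlarmActions must hold the policy ARN returned by put_scaling_policy.
    }
    return policy, alarm


SCALE_UP = policy_and_alarm("as-prod-collector-scale-up", 6,
                            "as-prod-collector-CPUHig", "GreaterThanThreshold", 70.0, 1)
SCALE_DOWN = policy_and_alarm("as-prod-collector-scale-down", -3,
                              "as-prod-collector-CPULow", "LessThanThreshold", 20.0, 3)

# With boto3 installed and AWS credentials configured, a job could apply them:
#   import boto3
#   autoscaling, cloudwatch = boto3.client("autoscaling"), boto3.client("cloudwatch")
#   for policy, alarm in (SCALE_UP, SCALE_DOWN):
#       arn = autoscaling.put_scaling_policy(**policy)["PolicyARN"]
#       cloudwatch.put_metric_alarm(AlarmActions=[arn], **alarm)
```

Keeping the parameters as plain data makes the thresholds and step sizes easy to review (for example, when changing the scale-up step from 6 to 12 or 18) without touching the code that talks to AWS.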

Comment 2

a year ago
We are in prod, and have been for a while.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED