Closed Bug 1223744 Opened 9 years ago Closed 8 years ago

Dynamically auto-scale WebApp instances based on load

Categories

(Cloud Services :: Server: Location, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hschlichting, Assigned: ckolos)

References

Details

There is a good chance that MLS will see spikes of traffic to the country API from new clients that exceed the current capacity. These spikes are likely to last at most several days, with request volumes reaching thousands of requests per second, possibly 10k requests/sec or more. We don't expect to see a sustained increase in traffic beyond the current capacity.

The spike would occur from all Firefox Desktop clients hitting the service once while refreshing their cached GeoIP / country information. As Firefox Desktop updates have staggered rollouts, there shouldn't be a completely abrupt spike.

Since this only concerns the country API, we only need to scale the WebApp role to meet the increased demand. These queries don't hit Redis or the database.

A single EC2 instance of the current size should handle around 2500 requests/sec. One of these requests takes ~1.5ms to process and there are 4 application processes running on each instance ((1000ms/1s) / (1.5ms/request) * 4 processes ≈ 2667 requests/sec). This number was confirmed in load testing benchmarks earlier this year.
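
As a rough check of the arithmetic above, here is a minimal sketch that recomputes the theoretical per-instance throughput and estimates how many instances a 10k requests/sec spike would need. The function names are made up for illustration, and the estimate assumes throughput scales linearly with instance count, which the report does not state.

```python
import math

# Figures taken from the report above.
PROCESS_TIME_MS = 1.5        # time to process one request
PROCESSES_PER_INSTANCE = 4   # application processes per EC2 instance
MEASURED_RPS = 2500          # rounded per-instance capacity quoted in the report


def instance_capacity(process_time_ms=PROCESS_TIME_MS,
                      processes=PROCESSES_PER_INSTANCE):
    """Theoretical requests/sec for one instance (hypothetical helper)."""
    return 1000.0 / process_time_ms * processes


def instances_needed(target_rps, per_instance_rps=MEASURED_RPS):
    """Instances required for target_rps, assuming linear scaling."""
    return math.ceil(target_rps / per_instance_rps)


print(round(instance_capacity()))   # theoretical requests/sec per instance
print(instances_needed(10_000))     # instances for a 10k requests/sec spike
```

With the report's numbers this gives about 2667 requests/sec per instance in theory, and 4 instances at full utilization for 10k requests/sec. In practice more headroom would be needed, since instances should not run at 100% CPU.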

Since a single instance gets us a significant increase in capacity, I'd suggest auto-scaling actions that increase the ASG size in increments of 1. As a trigger we might use average CPU across the instances reaching 70% or 80%, though we probably have proven configs from other services for this.
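
To illustrate the suggested policy, the following sketch simulates repeated scale-up steps of 1 under a fixed load. It is not the deployed configuration. It assumes average CPU grows linearly with request rate and hits 100% at 2500 requests/sec per instance, and `max_size` is a hypothetical ASG limit that does not come from this bug.

```python
PER_INSTANCE_RPS = 2500  # per-instance capacity quoted in the report


def avg_cpu_percent(load_rps, instances):
    """Estimated average CPU, assuming CPU is linear in request rate."""
    return min(100.0, load_rps / (instances * PER_INSTANCE_RPS) * 100.0)


def scale_up_until_stable(load_rps, start=1, threshold=70.0,
                          step=1, max_size=10):
    """Add `step` instances while average CPU is at or above `threshold`.

    max_size is a hypothetical upper bound on the ASG size.
    """
    instances = start
    while (avg_cpu_percent(load_rps, instances) >= threshold
           and instances < max_size):
        instances = min(instances + step, max_size)
    return instances


print(scale_up_until_stable(10_000))  # ASG size once CPU drops below 70%
```

Under these assumptions, a steady 10k requests/sec with a 70% trigger settles at 6 instances (about 67% average CPU), two more than the bare minimum of 4 at full utilization.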
Blocks: 1223761
Depends on: 1239375
Stacks containing this configuration were deployed during the week of Jan 10 2016. This went live in prod on Jan 14th 2016.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED