Closed Bug 1447785 Opened 7 years ago Closed 7 years ago

large spike in number of running instances for balrog prod on Feb 5

Categories

(Cloud Services :: Operations: Miscellaneous, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: oremj)

Details

I noticed this while staring at Datadog today: on Feb 5th (around 0800 UTC) we jumped from ~40 instances to ~90 instances. I don't see a spike in requests around that time, nor am I aware of a deployment happening.

Screenshot: https://screenshots.firefox.com/1nMdAZCf25LZBOpL/app.datadoghq.com

It's not causing any problems, but it seems like something we should be able to explain. I see instances cycling around this time, which makes me wonder whether we switched to smaller instances and now need more of them to handle the traffic.
Flags: needinfo?(oremj)
I was occasionally getting paged because the app would scale down too far and then get a huge burst of requests, which would degrade the service until it had time to scale back up. So I pinned it to never go below 70 instances, which seems to be about what we need to handle the traffic bursts. It's possible that on Feb 5th I manually flipped it to 90 for recovery.
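
For illustration, the effect of a minimum instance count on a load-based scaling target can be sketched roughly as below. This is not the actual Balrog scaling configuration: only the floor of 70 comes from the comment above, while the ceiling, the per-instance capacity, and the function names are assumptions made up for the example.

```python
import math

# Only the floor of 70 comes from the comment above; everything else is assumed.
MIN_INSTANCES = 70
MAX_INSTANCES = 120        # hypothetical ceiling, not stated in this bug
RPS_PER_INSTANCE = 50.0    # hypothetical per-instance capacity, not stated in this bug


def desired_instances(requests_per_sec: float) -> int:
    """Instance count a purely load-based policy would ask for."""
    return math.ceil(requests_per_sec / RPS_PER_INSTANCE)


def clamped_instances(requests_per_sec: float) -> int:
    """Apply the floor and ceiling to the load-based target."""
    target = desired_instances(requests_per_sec)
    return max(MIN_INSTANCES, min(target, MAX_INSTANCES))
```

Under these assumed numbers, a quiet period that would otherwise settle at about 40 instances is held at 70, so a sudden burst lands on capacity that is already running instead of waiting for a scale-up.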
Assignee: nobody → oremj
Flags: needinfo?(oremj)
Ahhhh, okay. Thanks!
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED