Closed Bug 1009029 Opened 11 years ago Closed 11 years ago

Spin up Bouncer VMs in SCL3

Categories

(Infrastructure & Operations :: Virtualization, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gozer, Assigned: cknowles)

Currently, bouncer is being served out of PHX1 from bouncer{1..10}.webapp.phx1.mozilla.com. I would like to request some VMs to spin up a bouncer cluster in SCL3. As for the exact number of VMs and capacity, I'd like to request something similar in total capacity to what is being used in PHX1. That could be 4 bigger VMs instead of 10 nodes, for instance. Also, looking at the historical usage of that cluster, it looks like it's somewhat over-sized, so that could factor in as well.
So, bouncer{1..10}.webapp.phx1.mozilla.com are still hardware at this point. However, I looked at them as if they were going to be P2V'd, which is kinda what we're dealing with here. Based on that, it looks like 1 CPU and 8G RAM will more than suffice for them.

None of the bouncers have gone above 10% CPU in the last week, and for the vast majority of that time they have been below 2% CPU in use. (These are 8-core machines, so on a single core that works out to at most 80%, with a normal load of 20ish%.) RAM's stayed pretty consistent at 6-7G in actual use.

So then the question is, if you agree with those specs and the nodes scale linearly, how many do you want? 10 (at 1 core, 8G)? 5 (at 2 cores, 16G)? You mention that the cluster in PHX1 may be oversized; how would you recommend scaling that down? I wouldn't recommend going up to or beyond 4 cores, 16G, as then the inefficiencies in scheduling all that start to eat into the performance increase.

If they don't scale linearly, or you think I've (entirely plausibly) missed something, let me know and we can work out how to proceed.

CJK
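The per-core arithmetic in the comment above can be sketched roughly as follows. This is only an illustration: the function name is hypothetical, and it assumes load scales linearly with core count and that a VM core performs like one of the hardware cores, neither of which the thread measures.

```python
def single_core_equivalent(host_util_pct: float, host_cores: int,
                           target_cores: int = 1) -> float:
    """Project whole-host CPU utilisation onto a VM with fewer cores.

    Assumption (not from the bug): perfectly linear scaling and
    identical per-core performance between hardware and VM.
    """
    return host_util_pct * host_cores / target_cores


# Figures quoted in the comment: 8-core hosts, 10% peak, under 2% typical.
peak = single_core_equivalent(10, 8)     # peak load projected onto 1 core
typical = single_core_equivalent(2, 8)   # typical load projected onto 1 core

print(f"peak on 1 core: {peak:.0f}%, typical on 1 core: {typical:.0f}%")
```

Under those assumptions the typical figure comes out a little below the "20ish%" quoted, since 2% was stated as an upper bound for most of the week.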
Assignee: server-ops-virtualization → cknowles
(In reply to Chris Knowles [:cknowles] from comment #1)
> So, bouncer{1..10}.webapp.phx1.mozilla.com are still hardware at this point
> - however, I looked at them as if they were going to be P2V'd - which is
> kinda what we're dealing with here.

Yes, more or less. I considered for a moment just moving some of the HW from PHX1 to SCL3, but that seemed more trouble than it would be worth.

> So, based on that, looks like 1 CPU and 8G RAM will more than suffice for
> them.

I think so, and it will be easily proven otherwise if that turns out not to be the case. But the numbers I see agree with you there. Bouncer is a heavily used service, but it's very lightweight on the server side.

> None of the bouncers in the last week have gone above 10% CPU and for the
> vast majority of that time, have been below 2% CPU in use. (8 cores, so at
> most, that should be 80% with a normal load of 20ish%. for 1 core.) RAM's
> stayed pretty consistent at 6-7G in actual use.

And that's a slightly high estimate, most of that RAM being used for caches/buffers.

> So then the question is, if you agree with those specs, and the nodes scale
> linearly, how many do you want? 10? (at 1 core, 8G) 5? (2 core, 16G)?

I'd start with fewer VMs, personally, so 5 x (2 cores / 8G), keeping the possibility of growing the cluster if it shows a need to.

> You mention that the cluster in PHX1 may be oversized - how would you
> recommend scaling that down?

For now, I don't feel like I have quite enough data to back that up, but back-of-the-envelope math tells me it could shrink by 40%-50% without affecting service levels at this time. And that's without the new cluster in SCL3. Assuming this goes well, I would think we'd probably end up with 2 clusters of 4-5 nodes each and have plenty of capacity to spare for bursts.

It's important to remember that Bouncer is the download redirector, so it will see spikes during releases, and it is a very critical user-facing service. I'd want to not play it too close to the edge of what we need and keep it somewhat over-provisioned, just in case.

> I wouldn't recommend going up-to/beyond 4 core, 16G, as then the
> inefficiencies in scheduling all that starts to eat into the performance
> increase.

Yeah, that's not needed.

> If they don't scale linearly, or you think I've (entirely plausibly) missed
> something, let me know and we can work out how to proceed.

Yes, they do scale linearly. The only non-linear scaling point is external: it's a MySQL-driven PHP app, so if anything causes scaling issues, that's where it's going to happen.
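To compare the layouts floated so far, a small sketch like the one below totals cores and RAM per option. The `Layout` class and its field names are hypothetical; the node counts and per-node sizes are the ones mentioned in the comments above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """One candidate cluster layout (hypothetical helper, not a real tool)."""
    nodes: int
    cores_per_node: int
    ram_gb_per_node: int

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def total_ram_gb(self) -> int:
        return self.nodes * self.ram_gb_per_node


# Options taken from the discussion above.
options = {
    "10 x (1 core, 8G)": Layout(10, 1, 8),
    "5 x (2 cores, 16G)": Layout(5, 2, 16),
    "5 x (2 cores, 8G)": Layout(5, 2, 8),
}

for name, layout in options.items():
    print(f"{name}: {layout.total_cores} cores, {layout.total_ram_gb}G RAM total")
```

All three options total 10 cores. The 5 x (2 cores, 8G) variant halves the total RAM, which fits the remark that most of the 6-7G in use is caches/buffers.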
So, to wrap up what I think I understand...

names: bouncer{1..5}.webapp.scl3.mozilla.com
OS: RHEL6_x86-64
CPU: 2
RAM: 8G
Disk: Default (40G)

With the understanding that we may need to spin up 6..10 in the future if traffic needs it, or the estimates were low.

Just hit me with a yes/no, and we can get spinning on these.

CJK
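For anyone scripting against these names, the `bouncer{1..5}` shorthand above is shell-style brace expansion over a numeric range. A minimal Python sketch of the same expansion, with a hypothetical helper name, might look like this:

```python
def expand_hosts(prefix: str, first: int, last: int, domain: str) -> list[str]:
    """Expand prefix{first..last}.domain into explicit hostnames.

    Hypothetical helper for illustration; the range is inclusive,
    matching shell brace expansion such as bouncer{1..5}.
    """
    return [f"{prefix}{i}.{domain}" for i in range(first, last + 1)]


# The five SCL3 nodes requested in this bug.
scl3_hosts = expand_hosts("bouncer", 1, 5, "webapp.scl3.mozilla.com")

# The possible later growth mentioned above (nodes 6..10).
scl3_growth = expand_hosts("bouncer", 6, 10, "webapp.scl3.mozilla.com")

for host in scl3_hosts:
    print(host)
```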
(In reply to Chris Knowles [:cknowles] from comment #3)
> So, to wrap up what I think I understand...
>
> names: bouncer{1..5}.webapp.scl3.mozilla.com
> OS: RHEL6_x86-64
> CPU: 2
> RAM: 8G
> Disk: Default (40G)

Disk space could even be lower than that if you want to spare some; 20G would be more than sufficient.

> With the understanding that we may need to spin up 6..10 in the future if
> traffic needs it, or the estimates were low.

Yes, but I would consider that *very* unlikely, since the PHX1 cluster will still be there to catch the load.

> just hit me with a yes/no, and we can get spinning on these.

Yes please!
Alright, bouncer{1..5}.webapp.scl3.m.c have been spun up. Initial puppet has been applied, and initial Nagios has been put in and should be updating shortly. They are ready for your customizations. Let me know if you need further assistance.

CJK
Blocks: 1010453
Awesome turn-around time, thanks!
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations