Closed Bug 1047392 Opened 11 years ago Closed 11 years ago

p2v phx bouncer1-5 web nodes

Categories

(Infrastructure & Operations :: Virtualization, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gozer, Assigned: nmaul)

Details

(Keywords: p2v, Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/672] [vm-p2v:5])

After bug 1047073, I'd like to also request P2Ving the bouncer nodes in phx, specifically, these are:
- bouncer1.webapp.phx1.mozilla.com
- bouncer2.webapp.phx1.mozilla.com
- bouncer3.webapp.phx1.mozilla.com
- bouncer4.webapp.phx1.mozilla.com
- bouncer5.webapp.phx1.mozilla.com
- bouncer6.webapp.phx1.mozilla.com
- bouncer7.webapp.phx1.mozilla.com
- bouncer8.webapp.phx1.mozilla.com
- bouncer9.webapp.phx1.mozilla.com
- bouncer10.webapp.phx1.mozilla.com
If you need to coordinate with me, so I can drain them out of Zeus, just let me know. Thanks!
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/672]
Pausing for a bit on this, as there need to be some capacity planning conversations - look for an email from :cshields later today.
Per a verbal in the webops standup meeting: webops to drain 1-5 and leave drained; storvirt to opportunistically p2v 1-5 and then power off, pending new hardware in phx1 or an emergency, and leave emergency instructions for MOC while the V's remain off.
1-5 are drained and ready to go at any time.
Assignee: server-ops-virtualization → nmaul
alright, starting on bouncer1 - will set nagios downtimes for expiration on Monday 8/11, as we expect to have that capacity in place by then.
So, funny story - bouncers 1, 2, and 4 can't be reached from the P2V host - they can ping P2V but P2V can't ping them, so I'm not able to P2V them at this time - I can reach 3 and 5 however, and will P2V those for happiness' sake. If you have other directions/desires, let me know.
Keywords: p2v
So, looks like 3 and 5 are just liars. Anytime I try to do anything more than ping or simple ssh, i.e. any real data transfer, they go non-responsive on the network. So, I can't P2V these - I need good working network access - the P2V host has an interface in the webapp vlan - so it's all local, and I was able to get about 5% through one of 7 conversions - it's just not designed to work with a dodgy network. So, how can I help? Can I set up new VMs? Can I P2V some of the others?
Well, with continued reduction of load on the seamicros, I was able to P2V bouncer3.webapp.phx1 - which is now virtual, and off until the previously mentioned capacity comes online. I have hopes that I will be able to get 5 P2V'd as well, but so far, 1,2,4 remain unresponsive. If you'd like to change direction, please let me know, else, I'll try to P2V 5 first thing Monday.
And it's now first thing Monday morning. 1,2,4 remain unresponsive, 5 also has gone unresponsive. So, we've P2V'd 3, but I'm not sure how much you care - what would you like me to do about 1,2,4,5? I can create new blank VMs for them ... let me know of any other desires.
Yeah, not critical to do anything with them right now, we've got plenty of capacity on bouncer6-10 on the other seamicro chassis. We can put this on the back-burner for at least a few days/weeks. Related question: what specs (CPU/mem) does the p2v'd bouncer3 have? When we have the UCS capacity, we can spin up 3, and p2v any of 6-10 (probably just 2 of them, even). I wouldn't be surprised if 1 VM can handle the load entirely, but I definitely want at least 2 for better resiliency.
bouncer3 is currently at 2 CPU and 8G RAM - which matches the VM bouncers in SCL3 - which is my memory of what was verbaled - if that's wrong, let me know and we can suss out the right path forward. And yup - we're on tenterhooks, waiting for the UCS spinup. Glad to hear we're continuing to look at true needs - that's *really* helpful. Let me know if you've got questions/concerns.
So, thanks to continuing work on stabilizing the seamicros - looks like I'll be able to P2V 1-5 after all (they're still down, but now we can turn one blade on and evacuate). I've gotten 1 & 2 this morning so far, and I'll see if I can't get 4 & 5 later today - that way we'll have 1-5 ready to roll when the new capacity is in place (hopefully the physical rack & stack will happen today). Then we can see about pivoting to purely virtual. Sound like a plan?
And now 1-5 have all been P2V'd and are waiting on capacity.
Capacity has been added, bouncer{1-5}.webapp.phx1.mozilla.com are now up and should be ready for you to test/use. Let me know of any concerns.
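For anyone scripting checks against these hosts: the bouncer{1-5} shorthand above is informal notation, not valid shell brace expansion (bash would want {1..5}). A minimal sketch of expanding it into the individual hostnames, where the helper name expand_hosts is hypothetical and not part of any tool mentioned in this bug:

```python
import re
from typing import List


def expand_hosts(pattern: str) -> List[str]:
    """Expand a 'name{A-B}.suffix' shorthand into individual hostnames.

    Hypothetical helper for illustration only; a pattern without a
    {A-B} range is returned unchanged as a single-element list.
    """
    match = re.fullmatch(r"(.*)\{(\d+)-(\d+)\}(.*)", pattern)
    if match is None:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    # The range is inclusive on both ends, matching the shorthand's intent.
    return [f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    for host in expand_hosts("bouncer{1-5}.webapp.phx1.mozilla.com"):
        print(host)
```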
bouncer1-5 seem good, so I've enabled them in ZLB and they're serving traffic. All seems well. If/when we proceed with decommissioning all our seamicro nodes, we should consider if we even need 6-10. For now I've left them on since they're behaving just fine, but if that chassis starts to act up we should be able to shut them all down and not replace or P2V them. In light of that, I'm going to close this out even though we haven't done anything with 6-10. If they start breaking, we'll just kill them outright.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Summary: p2v phx bouncer web nodes → p2v phx bouncer1-5 web nodes
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/672] → [kanban:https://kanbanize.com/ctrl_board/4/672] [vm-p2v:5]
Product: mozilla.org → Infrastructure & Operations