Bug 1200800
Opened 9 years ago
Closed 9 years ago
Create 2 Ubuntu 14.04 64bit VMs to replace mm-ci-staging.qa.scl3.mozilla.com and mm-ci-production.qa.scl3.mozilla.com
Categories
(Infrastructure & Operations :: Virtualization, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: whimboo, Assigned: cknowles)
References
Details
(Whiteboard: [qa-automation-blocked][vm-create:2][vm-delete:2])
In bug 1200139 we see constant Java crashes because no more memory is available. The reason seems to be that we hit the 32-bit boundary. Sadly, both machines were initially set up with a 32-bit OS. We should change that to 64-bit OSes. For the specs, please use the same as for the current machines. For mm-ci-production this will be: 8GB RAM, 2 CPUs, 16G /, 50G /data. I don't have the specs handy for staging, but those will be easy to find. Chris, it would be great if you could do this soon, given that a crash on each of the last few days has brought our CI system down. Thanks!
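For reference, a rough sketch of the arithmetic behind the 32-bit boundary, assuming the crashes really are caused by the address space limit as suspected above. The function name is made up for illustration, and the usable limit for a Java heap on a 32-bit OS is lower in practice, since the kernel and native libraries share the same address space.

```python
GIB = 2**30  # bytes per GiB


def max_addressable_gib(pointer_bits: int) -> float:
    """Upper bound on what a single process can address, in GiB."""
    return 2**pointer_bits / GIB


if __name__ == "__main__":
    installed_ram_gib = 8  # mm-ci-production spec quoted in this bug
    limit = max_addressable_gib(32)
    print(f"32-bit address space per process: {limit:.0f} GiB")
    print(f"Installed RAM: {installed_ram_gib} GiB")
    # Anything above the limit is out of reach for one 32-bit process.
    print(f"Out of reach for one process: {installed_ram_gib - limit:.0f} GiB")
```

With a 64-bit OS and JVM the per-process bound is far above the 8GB in the box, so the installed RAM becomes the practical limit.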
Flags: needinfo?(cknowles)
Updated (Reporter) • 9 years ago
Whiteboard: [qa-automation-blocked]
Comment 1 (Assignee) • 9 years ago
Assuming I can get a quick answer to this, I should be able to at least start on that tomorrow. The question is this: am I destroying the current mm-ci-{staging,production} and spinning up new ones, or am I creating new ones with new names (mm-ci-{staging,production}-new?) and then we'll clean up and reconcile later?
Flags: needinfo?(cknowles)
Comment 2 (Reporter) • 9 years ago
Chris also asked me on IRC and I gave a reply there. Just to sync up here: we need two new VMs, and we cannot replace the current ones right now. I need some time to set up the servers, during which the service would not be available. That means we will have to swap machines once the new VMs are finished.
Comment 3 (Assignee) • 9 years ago
Alright, those boxes exist. Not puppeted, per your usual; generated from your templates.
mm-ci-production-new.qa.scl3.mozilla.com: as specced.
mm-ci-staging-new.qa.scl3.mozilla.com: same drive geometry, 1 CPU and 2G RAM to match its predecessor.
I have not added them to Nagios, as these are temporary as named.
Assignee: server-ops-virtualization → cknowles
Whiteboard: [qa-automation-blocked] → [qa-automation-blocked][vm-create:2]
Comment 4 (Reporter) • 9 years ago
So I configured both VMs for mozmill-ci and it seems to work fine. Chris, I would like to replace the staging machine before the weekend, so that we have test results from over the weekend and can make a decision about production, most likely early next week. It would be best to coordinate the switch (IP change, DNS name change) via IRC. I will be around the whole day tomorrow. Please let me know when it would work best for you. Thanks!
Comment 5 (Assignee) • 9 years ago
I should be available in the morning 0700-ish Eastern ... (any earlier than that and you're likely running into non-caffeinated me, which has all sorts of risks.) I'll ping on IRC.
Comment 6 (Assignee) • 9 years ago
Pinged, did the switcheroo. We now have mm-ci-staging-old.qa.scl3 and mm-ci-staging.qa.scl3 (the VM created in this bug). :whimboo verified that things look acceptable and will monitor to make sure all is well before we schedule the prod cutover. From IRC, potentially Mon/Tues, though I'm flexible. VMs renamed in inventory, vsphere (migrated datastores to make the names "real"), and spreadsheets. Let me know if you need anything else here. Thanks!
Comment 7 (Reporter) • 9 years ago
I checked the new staging VM and everything is working perfectly. I cannot see anything that has broken since we swapped those VMs. I think we can replace production when Chris is back and QA doesn't have to run any tests. So I hope it will be tomorrow.
Status: NEW → ASSIGNED
Comment 8 (Assignee) • 9 years ago
I will be around on Tuesday, though I have a dentist appointment ~0845 Eastern. Other than that I should be available. Let me know the timing for when QA is able to let us have this.
Comment 9 (Reporter) • 9 years ago
Robert, will there be any beta build to test tomorrow or do we have enough time for the transition to the new box? Thanks.
Flags: needinfo?(kairo)
Comment 10 • 9 years ago
I was out yesterday, but yes, today we are running update tests. Usually we do that every Tuesday and Friday.
Flags: needinfo?(kairo)
Comment 11 (Reporter) • 9 years ago
Thanks Robert. Chris, so we will do it tomorrow then! It should give us enough time until Friday.
Comment 12 (Assignee) • 9 years ago
Alright Henrik, I'll ping you in the AM and we can work on getting it switched over.
Comment 13 (Assignee) • 9 years ago
Alright, switched the prod machines, and we now have mm-ci-production.qa.scl3 (the new prod box) and mm-ci-production-old.qa.scl3. I've updated inventory, spreadsheets, and done storage migration to make the names stick. Per IRC conversation with :whimboo - will reapproach on Monday 9/14 to determine if the -old machines can be removed.
Comment 14 (Reporter) • 9 years ago
We didn't run any tests for the latest 41.0b9 last Friday. This happened on both staging and production. I don't think this is related to our replacement here, but rather a Pulse problem. I will have to figure that out first before we can finally close this bug.
Comment 15 (Assignee) • 9 years ago
Alright, moved the reminder to reapproach about decom to later. Let me know if you need anything from me on this.
Comment 16 (Reporter) • 9 years ago
Ok, it turns out that the underlying issue reported in my last comment is not related to this host replacement. So we are fine here. I don't see why we should keep the old boxes around any longer.
No longer depends on: 1204488
Comment 17 (Assignee) • 9 years ago
Alright, the -old VMs are powered off. No Nagios, no Puppet, no NFS; since they've changed IP addresses, really nothing is left. I will keep these down for a week and then destroy them, unless you let me know that you need them back.
Comment 18 (Assignee) • 9 years ago
Alright, a week has passed, no screaming. The -old boxes are deleted from disk and removed from inventory. They were never puppeted by that name and never RHN'd; no NFS and no backups. They're all gone. Closing things out.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [qa-automation-blocked][vm-create:2] → [qa-automation-blocked][vm-create:2][vm-delete:2]
Comment 19 (Reporter) • 9 years ago
Thanks Chris.