1200800 - Create 2 Ubuntu 14.04 64bit VMs to replace mm-ci-staging.qa.scl3.mozilla.com and mm-ci-production.qa.scl3.mozilla.com

Reporter

Description

9 years ago

In bug 1200139 we see constant Java crashes because no more memory is available. The reason seems to be that we are hitting the 32-bit address space limit. Unfortunately, both machines were initially set up with a 32-bit OS. We should switch them to a 64-bit OS.

For the specs, please use the same as the current machines. For mm-ci-production that is: 8 GB RAM, 2 CPUs, 16 GB /, 50 GB /data. I don't have the staging specs handy, but they should be easy to find.

Chris, it would be great if you could do this soon, given that the crash has brought our CI system down on each of the last few days. Thanks!

Flags: needinfo?(cknowles)

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Updated

9 years ago

Whiteboard: [qa-automation-blocked]

Chris Knowles [:cknowles]

Assignee

Comment 1

9 years ago

Assuming I can get a quick answer to this, I should be able to at least start on it tomorrow.

The question is this: am I destroying the current mm-ci-{staging,production} and spinning up new ones, or am I creating new ones with new names (mm-ci-{staging,production}-new?) and we'll clean up and reconcile later?

Flags: needinfo?(cknowles)

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 2

9 years ago

Chris also asked me on IRC and I replied there. Just to sync up here: we need two new VMs, and we cannot replace the current ones right now. I need some time to set up the servers, during which the service would not be available. That means we will have to swap the machines once the new VMs are finished.

Chris Knowles [:cknowles]

Assignee

Comment 3

9 years ago

Alright, those boxes exist. Not puppeted, per your usual; generated from your templates.

mm-ci-production-new.qa.scl3.mozilla.com: as specced.
mm-ci-staging-new.qa.scl3.mozilla.com: same drive geometry, 1 CPU and 2 GB RAM to match its predecessor.

I have not added them to Nagios, since they are temporary under these names.

Assignee: server-ops-virtualization → cknowles

Whiteboard: [qa-automation-blocked] → [qa-automation-blocked][vm-create:2]

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 4

•

9 years ago

I configured both VMs for mozmill-ci, and everything seems to work fine.

Chris, I would like to replace the staging machine before the weekend, so that we have test results from over the weekend to make a decision about production, most likely early next week.

It would be best to coordinate the switch (IP change, DNS name change) via IRC. I will be around all day tomorrow. Please let me know what time works best for you. Thanks!

Chris Knowles [:cknowles]

Assignee

Comment 5

•

9 years ago

I should be available in the morning, around 0700 Eastern (any earlier than that and you're likely running into non-caffeinated me, which has all sorts of risks).

I'll ping on IRC.

Chris Knowles [:cknowles]

Assignee

Comment 6

•

9 years ago

Pinged, did the switcheroo. We now have mm-ci-staging-old.qa.scl3 and mm-ci-staging.qa.scl3 (the VM created in this bug).

:whimboo verified that things look acceptable and will monitor to make sure all is well before scheduling the prod cutover. From IRC, potentially Mon/Tue, though I'm flexible.

VMs renamed in inventory, vSphere (migrated datastores to make the names "real"), and spreadsheets.

Let me know if you need anything else here.  Thanks!

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 7

•

9 years ago

I checked the new staging VM and everything is working perfectly. I cannot see anything that has broken since we swapped the VMs. I think we can replace production once Chris is back and QA doesn't have to run any tests, so hopefully tomorrow.

Status: NEW → ASSIGNED

Chris Knowles [:cknowles]

Assignee

Comment 8

•

9 years ago

I will be around on Tuesday. I have a dentist appointment at ~0845 Eastern, but other than that I should be available. Let me know when QA can let us have the machine.

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 9

•

9 years ago

Robert, will there be a beta build to test tomorrow, or do we have enough time for the transition to the new box? Thanks.

Flags: needinfo?(kairo)

Robert Kaiser

Comment 10

•

9 years ago

I was out yesterday, but yes, today we are running update tests. We usually run them every Tuesday and Friday.

Flags: needinfo?(kairo)

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 11

•

9 years ago

Thanks, Robert. Chris, let's do it tomorrow then! That should give us enough time before Friday.

Chris Knowles [:cknowles]

Assignee

Comment 12

•

9 years ago

Alright, Henrik, I'll ping you in the AM and we can work on getting it switched over.

Chris Knowles [:cknowles]

Assignee

Comment 13

•

9 years ago

Alright, switched the prod machines, and we now have mm-ci-production.qa.scl3 (the new prod box) and mm-ci-production-old.qa.scl3.

I've updated inventory, spreadsheets, and done storage migration to make the names stick.

Per IRC conversation with :whimboo, will revisit on Monday 9/14 to determine whether the -old machines can be removed.

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 14

•

9 years ago

We didn't run any tests for the latest 41.0b9 last Friday, on either staging or production. I don't think this is related to the replacement here; it looks more like a Pulse problem. I will have to figure that out before we can close this bug.

Chris Knowles [:cknowles]

Assignee

Comment 15

•

9 years ago

Alright, I've pushed back the reminder to revisit decommissioning. Let me know if you need anything from me on this.

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Updated

•

9 years ago

Depends on: 1204488

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 16

•

9 years ago

OK, it turns out that the issue reported in my last comment is not related to this host replacement, so we are fine here. I don't see any reason to keep the old boxes around any longer.

No longer depends on: 1204488

Chris Knowles [:cknowles]

Assignee

Comment 17

•

9 years ago

Alright, the -old VMs are powered off. No Nagios, no Puppet, no NFS (since they've changed IP addresses); really, nothing is left.

I will keep these down for a week and then destroy them, unless you let me know that you need them back.

Chris Knowles [:cknowles]

Assignee

Comment 18

•

9 years ago

Alright, a week has passed with no screaming. The -old boxes are deleted from disk and removed from inventory. They were never puppeted under that name, never RHN'd, and had no NFS and no backups, so they're all gone. Closing things out.

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Whiteboard: [qa-automation-blocked][vm-create:2] → [qa-automation-blocked][vm-create:2][vm-delete:2]

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 19

•

9 years ago

Thanks Chris.

Categories

(Infrastructure & Operations :: Virtualization, task)

Tracking

(Not tracked)

People

(Reporter: whimboo, Assigned: cknowles)

Details

(Whiteboard: [qa-automation-blocked][vm-create:2][vm-delete:2])

Security

(public)
