Closed Bug 1226180 Opened 9 years ago Closed 7 years ago

decommission t-yosemite-r5 machines

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arich, Assigned: arich)

Attachments

(3 files)

We need to do rolling decommissions of the t-yosemite-r5 machines so we have enough space for the new r7 machines.
To make life easier for dcops, I'm going to pick machines all from the same rack that are (mostly) sharing chassis together. There are one or two outliers that share a chassis with an OS X build machine.

Here are the first 48 machines we're going to decommission (also listed in https://docs.google.com/spreadsheets/d/1o4C9aUDmyIwn7VgAxur2_GIlOwx7f92hii_oL8lnjMU/edit#gid=72561714):

t-yosemite-r5-0008.test.releng.scl3.mozilla.com
t-yosemite-r5-0009.test.releng.scl3.mozilla.com
t-yosemite-r5-0010.test.releng.scl3.mozilla.com
t-yosemite-r5-0011.test.releng.scl3.mozilla.com
t-yosemite-r5-0051.test.releng.scl3.mozilla.com
t-yosemite-r5-0054.test.releng.scl3.mozilla.com
t-yosemite-r5-0055.test.releng.scl3.mozilla.com
t-yosemite-r5-0056.test.releng.scl3.mozilla.com
t-yosemite-r5-0057.test.releng.scl3.mozilla.com
t-yosemite-r5-0058.test.releng.scl3.mozilla.com
t-yosemite-r5-0059.test.releng.scl3.mozilla.com
t-yosemite-r5-0060.test.releng.scl3.mozilla.com
t-yosemite-r5-0061.test.releng.scl3.mozilla.com
t-yosemite-r5-0062.test.releng.scl3.mozilla.com
t-yosemite-r5-0063.test.releng.scl3.mozilla.com
t-yosemite-r5-0064.test.releng.scl3.mozilla.com
t-yosemite-r5-0065.test.releng.scl3.mozilla.com
t-yosemite-r5-0066.test.releng.scl3.mozilla.com
t-yosemite-r5-0067.test.releng.scl3.mozilla.com
t-yosemite-r5-0068.test.releng.scl3.mozilla.com
t-yosemite-r5-0069.test.releng.scl3.mozilla.com
t-yosemite-r5-0070.test.releng.scl3.mozilla.com
t-yosemite-r5-0071.test.releng.scl3.mozilla.com
t-yosemite-r5-0072.test.releng.scl3.mozilla.com
t-yosemite-r5-0073.test.releng.scl3.mozilla.com
t-yosemite-r5-0074.test.releng.scl3.mozilla.com
t-yosemite-r5-0075.test.releng.scl3.mozilla.com
t-yosemite-r5-0076.test.releng.scl3.mozilla.com
t-yosemite-r5-0077.test.releng.scl3.mozilla.com
t-yosemite-r5-0078.test.releng.scl3.mozilla.com
t-yosemite-r5-0079.test.releng.scl3.mozilla.com
t-yosemite-r5-0080.test.releng.scl3.mozilla.com
t-yosemite-r5-0081.test.releng.scl3.mozilla.com
t-yosemite-r5-0082.test.releng.scl3.mozilla.com
t-yosemite-r5-0083.test.releng.scl3.mozilla.com
t-yosemite-r5-0084.test.releng.scl3.mozilla.com
t-yosemite-r5-0085.test.releng.scl3.mozilla.com
t-yosemite-r5-0086.test.releng.scl3.mozilla.com
t-yosemite-r5-0087.test.releng.scl3.mozilla.com
t-yosemite-r5-0088.test.releng.scl3.mozilla.com
t-yosemite-r5-0089.test.releng.scl3.mozilla.com
t-yosemite-r5-0090.test.releng.scl3.mozilla.com
t-yosemite-r5-0091.test.releng.scl3.mozilla.com
t-yosemite-r5-0093.test.releng.scl3.mozilla.com
t-yosemite-r5-0104.test.releng.scl3.mozilla.com
t-yosemite-r5-0105.test.releng.scl3.mozilla.com
t-yosemite-r5-0106.test.releng.scl3.mozilla.com
t-yosemite-r5-0107.test.releng.scl3.mozilla.com
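
The batch above can also be written compactly as number ranges. A minimal sketch, assuming a hypothetical helper `expand_hosts` (illustrative only, not part of slavealloc or any releng tooling), that expands inclusive ranges into the fully qualified names listed in this comment:

```python
# Hypothetical helper for illustration; not part of slavealloc or buildbot-configs.
DOMAIN = "test.releng.scl3.mozilla.com"


def expand_hosts(prefix, ranges, domain=DOMAIN):
    """Expand inclusive (start, end) ranges into zero-padded FQDNs."""
    hosts = []
    for start, end in ranges:
        for n in range(start, end + 1):
            hosts.append("%s-%04d.%s" % (prefix, n, domain))
    return hosts


# First batch, transcribed from the list in this comment.
BATCH_1 = [(8, 11), (51, 51), (54, 91), (93, 93), (104, 107)]
batch_1_hosts = expand_hosts("t-yosemite-r5", BATCH_1)

if __name__ == "__main__":
    # Print the count first so it can be compared against the planned batch size.
    print(len(batch_1_hosts))
    for host in batch_1_hosts:
        print(host)
```

Checking the length of the expanded list against the planned batch size is a cheap way to catch a mistyped range before handing the list to dcops.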

Coop, can you please disable them and create patches to pull them out of configs (unless you want to do one big config cleanup at the end)?

Dcops: can you please plan out which of the new machines are going where and put the PDU, switch, rack, etc info in https://docs.google.com/spreadsheets/d/1o4C9aUDmyIwn7VgAxur2_GIlOwx7f92hii_oL8lnjMU/edit#gid=1373260922 ? We're starting at t-yosemite-r7-0065.test.releng.scl3.mozilla.com since we've already racked and installed 0001 - 0064. 

Once that information is in place, I'll update inventory with the new hosts' info and modify nagios.
Flags: needinfo?(vle)
Flags: needinfo?(coop)
Attached you can find the production_config.py patch.
Attachment #8689583 - Flags: review?(kmoir)
Comment on attachment 8689583 [details] [diff] [review]
bug1226180_production_config.py.patch

You need to disable the machines in slavealloc before we can land this change.  Also, we will need patches to remove the machines from slavealloc once the existing jobs have completed.
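
The ordering in this review comment (disable first, let running jobs finish, then remove) can be sketched as a simple check. This is only an illustration: the dictionary stands in for whatever slavealloc reports, and the field names `enabled` and `running_job` are assumptions, not the slavealloc schema.

```python
# Illustrative only: field names are assumptions, not the slavealloc schema.
def ready_for_removal(slaves):
    """Return names that are disabled and have no running job, sorted."""
    return sorted(
        name
        for name, state in slaves.items()
        if not state["enabled"] and not state["running_job"]
    )


example = {
    "t-yosemite-r5-0008": {"enabled": False, "running_job": False},
    # Disabled, but still finishing a job: not safe to remove yet.
    "t-yosemite-r5-0009": {"enabled": False, "running_job": True},
    # Not yet disabled: must be disabled before the config change lands.
    "t-yosemite-r5-0010": {"enabled": True, "running_job": False},
}

if __name__ == "__main__":
    print(ready_for_removal(example))
```

Only hosts that pass both conditions would be candidates for the follow-up removal patch.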
Attachment #8689583 - Flags: review?(kmoir) → review+
Disabled them in slavealloc and added "decommed" as a note on each of them.
do you guys plan to format these machines as well?
No, we are not going to reuse them.
(In reply to Van Le [:van] from comment #5)

I believe the last we talked, we could just hand them over to the disposal company with a chain of custody that verified they disposed of the disk in an approved way, right Van?
:arr, yup we can do that during the disposal process. i wasn't sure if you guys were going to donate them like last time so a pre-wipe would be required.
I think the decision was that donation was too time-consuming for everyone involved.
Flags: needinfo?(coop)
I've commented out this batch of machines from nagios so that we don't get alerts for them as buildbot stops.
Attachment #8689583 - Flags: checked-in+
arr: do you know when these machines will be decommissioned and new ones moved into their racks?  We're seeing high pending counts on the yosemite pools so we are anxious to add the new machines
Flags: needinfo?(arich)
I enabled 65-86 because I saw they were puppetized, they are now running jobs.
Flags: needinfo?(arich)
Dcops has been replacing the 48 r5 minis in the first batch yesterday and today. They're turning them on in batches of 10 to install.
Flags: needinfo?(vle)
Kim is decommissioning the following machines today so dcops can work on staging their replacements (and installing them on Friday):


t-yosemite-r5-0012.test.releng.scl3.mozilla.com
t-yosemite-r5-0013.test.releng.scl3.mozilla.com
t-yosemite-r5-0014.test.releng.scl3.mozilla.com
t-yosemite-r5-0015.test.releng.scl3.mozilla.com
t-yosemite-r5-0016.test.releng.scl3.mozilla.com
t-yosemite-r5-0017.test.releng.scl3.mozilla.com
t-yosemite-r5-0018.test.releng.scl3.mozilla.com
t-yosemite-r5-0019.test.releng.scl3.mozilla.com
t-yosemite-r5-0020.test.releng.scl3.mozilla.com
t-yosemite-r5-0021.test.releng.scl3.mozilla.com
t-yosemite-r5-0022.test.releng.scl3.mozilla.com
t-yosemite-r5-0023.test.releng.scl3.mozilla.com
t-yosemite-r5-0024.test.releng.scl3.mozilla.com
t-yosemite-r5-0025.test.releng.scl3.mozilla.com
t-yosemite-r5-0026.test.releng.scl3.mozilla.com
t-yosemite-r5-0027.test.releng.scl3.mozilla.com
t-yosemite-r5-0028.test.releng.scl3.mozilla.com
t-yosemite-r5-0052.test.releng.scl3.mozilla.com
t-yosemite-r5-0053.test.releng.scl3.mozilla.com
t-yosemite-r5-0029.test.releng.scl3.mozilla.com
t-yosemite-r5-0030.test.releng.scl3.mozilla.com
t-yosemite-r5-0031.test.releng.scl3.mozilla.com
t-yosemite-r5-0032.test.releng.scl3.mozilla.com
t-yosemite-r5-0033.test.releng.scl3.mozilla.com
They are disabled in slavealloc, just waiting for the jobs to finish.  Van said he would pull them after PT lunch
The following hosts have been decommissioned:

t-yosemite-r5-0034.test.releng.scl3.mozilla.com
t-yosemite-r5-0035.test.releng.scl3.mozilla.com
t-yosemite-r5-0036.test.releng.scl3.mozilla.com
t-yosemite-r5-0037.test.releng.scl3.mozilla.com
t-yosemite-r5-0038.test.releng.scl3.mozilla.com
t-yosemite-r5-0003.test.releng.scl3.mozilla.com
t-yosemite-r5-0039.test.releng.scl3.mozilla.com
t-yosemite-r5-0040.test.releng.scl3.mozilla.com
t-yosemite-r5-0095.test.releng.scl3.mozilla.com
t-yosemite-r5-0096.test.releng.scl3.mozilla.com
t-yosemite-r5-0097.test.releng.scl3.mozilla.com
t-yosemite-r5-0098.test.releng.scl3.mozilla.com
t-yosemite-r5-0099.test.releng.scl3.mozilla.com
t-yosemite-r5-0100.test.releng.scl3.mozilla.com
t-yosemite-r5-0101.test.releng.scl3.mozilla.com
t-yosemite-r5-0102.test.releng.scl3.mozilla.com
t-yosemite-r5-0103.test.releng.scl3.mozilla.com
t-yosemite-r5-0094.test.releng.scl3.mozilla.com
Attached you can find the production_config.py patch, updated to remove the yosemite_r5 machines that have been decommissioned.
Attachment #8707330 - Flags: feedback?(kmoir)
Attachment #8707330 - Flags: feedback?(kmoir) → feedback+
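
The kind of cleanup this patch performs can be sketched as a filter over a list of slave names. This is a sketch under stated assumptions: it assumes the config keeps a flat list of short slave names, and the real production_config.py may be organized differently. The helper names `short_name` and `remove_decommissioned` are hypothetical.

```python
# Assumption: the config holds a flat list of short slave names; the real
# production_config.py may be structured differently.
def short_name(fqdn):
    """Strip the domain from a fully qualified host name."""
    return fqdn.split(".", 1)[0]


def remove_decommissioned(slaves, decommissioned_fqdns):
    """Return slaves without the decommissioned hosts, keeping the original order."""
    gone = {short_name(h) for h in decommissioned_fqdns}
    return [s for s in slaves if s not in gone]


# Small example using hosts already reported as decommissioned in this bug.
configured = ["t-yosemite-r5-%04d" % n for n in range(1, 12)]
decommed = [
    "t-yosemite-r5-0003.test.releng.scl3.mozilla.com",
    "t-yosemite-r5-0008.test.releng.scl3.mozilla.com",
    "t-yosemite-r5-0009.test.releng.scl3.mozilla.com",
    "t-yosemite-r5-0010.test.releng.scl3.mozilla.com",
    "t-yosemite-r5-0011.test.releng.scl3.mozilla.com",
]
remaining = remove_decommissioned(configured, decommed)

if __name__ == "__main__":
    for name in remaining:
        print(name)
```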
Comment on attachment 8707330 [details] [diff] [review]
bug1226180_production_config.py_v2.patch

I wanted to ask you for review, not feedback.
Attachment #8707330 - Flags: review?(kmoir)
Attachment #8707330 - Flags: review?(kmoir) → review+
We're aiming for 2016-02-09 to do the hardware work for batch 5.
The following hosts have been decommissioned:

t-snow-r4-0147.test.releng.scl3.mozilla.com
t-snow-r4-0148.test.releng.scl3.mozilla.com
t-snow-r4-0149.test.releng.scl3.mozilla.com
t-snow-r4-0150.test.releng.scl3.mozilla.com
t-snow-r4-0151.test.releng.scl3.mozilla.com
t-snow-r4-0152.test.releng.scl3.mozilla.com
t-snow-r4-0154.test.releng.scl3.mozilla.com
t-snow-r4-0155.test.releng.scl3.mozilla.com
t-snow-r4-0156.test.releng.scl3.mozilla.com
t-snow-r4-0157.test.releng.scl3.mozilla.com
t-snow-r4-0158.test.releng.scl3.mozilla.com
t-snow-r4-0159.test.releng.scl3.mozilla.com
t-snow-r4-0161.test.releng.scl3.mozilla.com
t-snow-r4-0162.test.releng.scl3.mozilla.com
t-snow-r4-0163.test.releng.scl3.mozilla.com
t-yosemite-r5-0041.test.releng.scl3.mozilla.com
t-yosemite-r5-0043.test.releng.scl3.mozilla.com
t-yosemite-r5-0044.test.releng.scl3.mozilla.com
t-yosemite-r5-0045.test.releng.scl3.mozilla.com
t-yosemite-r5-0046.test.releng.scl3.mozilla.com
t-yosemite-r5-0047.test.releng.scl3.mozilla.com
t-yosemite-r5-0048.test.releng.scl3.mozilla.com
t-yosemite-r5-0049.test.releng.scl3.mozilla.com
t-yosemite-r5-0050.test.releng.scl3.mozilla.com
t-yosemite-r5-0006.test.releng.scl3.mozilla.com
t-yosemite-r5-0005.test.releng.scl3.mozilla.com
t-yosemite-r5-0004.test.releng.scl3.mozilla.com
Attached patch bug1226180.patch
Attachment #8717568 - Flags: review?(vlad.ciobancai)
Attachment #8717568 - Flags: review?(vlad.ciobancai)
Attachment #8717568 - Flags: checked-in+
batch5 swap and reimage completed.
:arr do you have the hostnames for the new mac minis in batch5?
I think the hostnames are covered in bug 1247372.
latest patch in production
Depends on: 1264407
Could probably do 0001, 0002 and 0007 now, apparently it's been 117 days since there was last any work for them.
Depends on: 1317878
(In reply to Phil Ringnalda (:philor) from comment #28)
> Could probably do 0001, 0002 and 0007 now, apparently it's been 117 days
> since there was last any work for them.

Yes please!
I did some DB cleaning for the t-yosemite-r5 machines. 
From what I could see, 0001, 0002 and 0007 are still online with "status=production" in inventory.
:alin: are those ready to be decommissioned now?
Flags: needinfo?(aselagea)
Yes, they are.
Flags: needinfo?(aselagea)
All references to t-yosemite-r5 removed from nagios:
   efbe01f..c581bf1  master -> master

1, 2, and 7 shut down and decommissioned in inventory.
Van, please unrack these for disposal/donation.
Flags: needinfo?(vle)
1, 2, and 7 unracked. Will wipe/pull drives and dispose.
Flags: needinfo?(vle)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED