Make required changes so we can re-purpose all talos-r4-lion machines as talos-r4-snow machines

RESOLVED FIXED

Status

Infrastructure & Operations
RelOps
5 years ago
5 years ago

People

(Reporter: armenzg, Assigned: arr)

Details

I've created this etherpad to help us coordinate this:
https://etherpad.mozilla.org/releng-10-7-10-6
We can add state under each line to help.

I would like to keep it as accurate as possible so we can keep track of which machine used to be which. This is very helpful for the sheriffs and to determine what hardware replacements have already happened on a specific machine.

What will the sequence of steps be?
I remember we talked about it; however, I can't remember the specifics.
Mind if we meet in the morning to make sure that I got things right?

FYI, the machines are currently idle.

IIRC we wanted to have things in a ready state by the end of Wednesday so I could re-image the machines little by little on Thursday and Friday.

Do I have to make changes on the machine to update the hostname, or is that updated through inventory?
> 
> Do I have to make changes on the machine to update the hostname, or is that
> updated through inventory?

The hostnames will need to change in inventory before reimaging.  I'll set the default (unknown) group to match the OSX 10.6 imaging workflow.

When you are ready to reimage (assuming inventory has been updated and DHCP/DNS has propagated), the steps will be:
1. delete the old computer record from the r4-lion group in Deploy Studio
2. log in to the mini to be reimaged
3. issue a bless command and reboot (/usr/sbin/bless --netboot --server bsdp://10.12.48.8; reboot)

The mini will reboot and begin the imaging workflow under the default group.  Hostname will be looked up from DNS during this process and will be set on the computer and a new deploystudio record will appear with the new hostname under the default group.
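
As a rough illustration of steps 2 and 3 above, here is a minimal Python sketch that wraps the bless-and-reboot command in an ssh call per host. The bless command and BSDP server address are the ones quoted above; the ssh user, the function names, and the dry-run behaviour are assumptions made for the example, not something confirmed in this bug.

```python
import shlex
import subprocess

# Command quoted in the steps above (netboot from the Deploy Studio server, then reboot).
BLESS_AND_REBOOT = "/usr/sbin/bless --netboot --server bsdp://10.12.48.8; reboot"


def build_reimage_command(host, user="root"):
    """Return the argv list that runs bless + reboot on `host` over ssh.

    `user` is a hypothetical default; use whatever account can run bless.
    """
    return ["ssh", f"{user}@{host}", BLESS_AND_REBOOT]


def reimage(hosts, dry_run=True, runner=subprocess.run):
    """Build (and optionally run) the reimage command for each host.

    With dry_run=True the commands are only printed, so they can be reviewed
    before anything is rebooted.
    """
    commands = [build_reimage_command(host) for host in hosts]
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            # The ssh session is cut off by the reboot, so a non-zero exit
            # status is expected and not treated as an error here.
            runner(argv, check=False)
    return commands
```

Calling `reimage(["talos-r4-snow-085"])` only prints the command; passing `dry_run=False` would actually run it, and that should only happen after the old Deploy Studio record has been deleted (step 1).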

Armen, if you would like, I could walk you through the process in deploy studio.
(Assignee)

Comment 2

5 years ago
The main bit of information we need here is which 10.7 machines we are NOT repurposing (if we're doing all or just some subset).  We'll take care of doing all the inventory and deploystudio updates and you guys will just need to do the bless and reboot.
(Assignee)

Updated

5 years ago
Flags: needinfo?(armenzg)
(Assignee)

Updated

5 years ago
Assignee: relops → arich
Depends on: 937125
(Assignee)

Updated

5 years ago
Depends on: 943946
(Assignee)

Comment 3

5 years ago
uberj and I worked to update inventory/DNS/DHCP today.  
I've filed bug 943946 to update nagios.  
I've reimaged talos-r4-snow-085 as a test.  
Armen, can you please verify, and I can start the process of doing the rest (and what I don't get to, you guys can do on Thursday/Friday)?

The one thing that we discussed but did not have a plan for updating was the outlet labels on the PDUs.  Armen, are you guys going to handle that?  Ask DCOps?  Leave it as is until we move to scl3?
(In reply to Amy Rich [:arich] [:arr] from comment #3)
> The one thing that we discussed but did not have a plan for updating was the
> outlet labels on the PDUs.  Armen, are you guys going to handle that?  Ask
> DCOps?  Leave it as is until we move to scl3?

I've made the executive decision not to update the PDU labeling, since it is too manual and we have no need for it. I've nevertheless documented it on the etherpad, lines 15 to 104.

On the other hand, we will need the physical re-labeling done.
Filed as bug 944023.
Blocks: 937125
No longer depends on: 937125
Flags: needinfo?(armenzg)
No longer blocks: 881374
No longer blocks: 937125
(Assignee)

Comment 5

5 years ago
talos-r4-snow-085 - talos-r4-snow-120 are done imaging, and I've kicked off the rest of them (up to talos-r4-snow-170) to run overnight.  Hopefully by the time you get in in the morning they'll be all done.  :}
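
For anyone scripting against the same host range, a small Python sketch that enumerates the names. It assumes the three-digit, zero-padded numbering seen in the hostnames in this bug; the function name is made up for the example.

```python
def snow_hosts(first, last, prefix="talos-r4-snow-"):
    """Return hostnames prefix + NNN for first..last inclusive (zero-padded to 3 digits)."""
    return [f"{prefix}{number:03d}" for number in range(first, last + 1)]


# The batch left to run overnight, per the comment above: 121 through 170.
overnight_batch = snow_hosts(121, 170)
```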
(Assignee)

Comment 6

5 years ago
All but talos-r4-snow-166 have been reimaged.  That one would not reboot, and I opened up a bug to get PDU info in inventory for it.

I also noticed this morning that 103 and 104 are not responding to ping, so they may need a reboot.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
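
A quick way to spot machines like the two mentioned above that do not answer ping after a mass reimage is sketched below in Python. It shells out to the system `ping` with `-c 1` (one packet); the function names and the timeout value are assumptions for illustration, and the check function is injectable so the logic can be exercised without touching the network.

```python
import subprocess


def responds_to_ping(host, timeout_s=5):
    """Return True if a single ping to `host` succeeds within `timeout_s` seconds."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No reply in time, or ping could not be executed at all.
        return False
    return result.returncode == 0


def unresponsive(hosts, check=responds_to_ping):
    """Return the subset of `hosts` for which `check` reports no response."""
    return [host for host in hosts if not check(host)]
```

For example, `unresponsive(["talos-r4-snow-103", "talos-r4-snow-104"])` would return whichever of the two is still down, assuming "103 and 104" refer to those hostnames.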
(In reply to Amy Rich [:arich] [:arr] from comment #6)
> All but talos-r4-snow-166 have been reimaged.  That one would not reboot,
> and I opened up a bug to get PDU info in inventory for it.
> 
I can't find a bug for this host. I can ping it and reboot it; however, I cannot ssh into it.
Flags: needinfo?(arich)
(Assignee)

Comment 8

5 years ago
Please open new bugs for any further work, since the mass reimage is done and everything after this is break/fix.
Flags: needinfo?(arich)
(In reply to Amy Rich [:arich] [:arr] from comment #8)
> Please open new bugs for any further work, since the mass reimage is done
> and everything after this is break/fix.

Sounds good.