Closed Bug 816299 Opened 12 years ago Closed 11 years ago

relocate 5 original production panda chassis and prototype chassis to p10

Categories

(Infrastructure & Operations :: DCOps, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arich, Unassigned)

Details

In order to integrate the original chassis with the p<number> structure, we'd like to eventually move the original five production chassis (2-6) into p10 (where there's only one chassis slated to go at the moment).

All of the original chassis have 12 boards in them instead of 11.  DCOps, is this going to be an issue?

If not, can we start by orchestrating the move of panda-chassis-006 to p10?  https://releng.etherpad.mozilla.org/foopy-pandas-mapping  shows who's using what at the moment, and that one is free.  We'll need to coordinate to fix up the hostnames, rack and switch port assignments, nagios, etc.
>All of the original chassis have 12 boards in them instead of 11.  DCOps, is this going to be an issue?

Well, it depends on whether we intend to populate the whole rack sooner or later. Currently, we are using 2x 48-port switches for the pandas. There are currently 11 boards + 1 relay = 12 connections for each chassis installed in the new cage.

2x48 = 96 available ports
8 chassis x 12 connections = 96 ports needed

We can work around that if there isn't an issue using the 24-port top-of-rack switch.
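
The port budget in this comment can be checked with a short calculation. This is only a sketch: the function name and the example rack mix (five 12-board chassis plus three 11-board chassis) are assumptions for illustration, not taken from any tooling or plan in this bug. It counts one switch port per board plus one per relay.

```python
# Illustrative port-budget check; all names here are hypothetical.
SWITCH_PORTS = 2 * 48  # two 48-port switches for the pandas


def ports_needed(boards_per_chassis):
    """Each chassis needs one port per board plus one for its relay."""
    return sum(boards + 1 for boards in boards_per_chassis)


standard_rack = [11] * 8            # eight chassis with 11 boards each
mixed_rack = [12] * 5 + [11] * 3    # assumed mix: five original chassis unmodified

for label, rack in (("standard", standard_rack), ("mixed", mixed_rack)):
    needed = ports_needed(rack)
    verdict = "fits" if needed <= SWITCH_PORTS else f"over by {needed - SWITCH_PORTS}"
    print(f"{label}: {needed} of {SWITCH_PORTS} ports ({verdict})")
```

With 11 boards per chassis the two switches are exactly full, so any chassis that keeps its 12th board connected needs a port from somewhere else.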
I'd prefer to keep this rack identical to all the other racks. The loss of 5 panda boards should be immeasurable, given the size of the current pool.
I agree with Derek; I think we should take one panda out of each chassis so they match the others.
colo-trip: --- → scl1
When we pull the panda in the 12th slot, please record its serial number in inventory so we don't lose track of it or accidentally re/mis-name it in the future.
:arr, can we start moving the 5 chassis on Monday?
Also, instead of pulling the panda out of the 12th slot, can we leave it mounted, just not connected, and mark it as a spare? They're easily accessible if we need to remove them.
If we leave one in there, it won't be identical to the others, so if that's the motivation, we should remove it from the chassis.  We can only move chassis 6 at this point, because the rest are in use.
Van: Only chassis 6 is ready to move, and I want to coordinate that with dcops so we're doing the work at the same time unless dcops is going to handle all of the name change work in inventory (key value store as well as the primary name), dns, nagios, etc.
Jake would also like to move the prototype chassis (the first one) to p10 as well.  So we can coordinate moving chassis 5 and chassis 1 at a time that works for both dcops and relops (me).
Summary: relocate 5 original production panda chassis to p10 → relocate 5 original production panda chassis and prototype chassis to p10
I should have said chassis 6 in comment 9, not chassis 5.
panda-chassis-006 was successfully moved; we coordinated the work in the Google spreadsheet:

https://docs.google.com/a/mozilla.com/spreadsheet/ccc?key=0AhyKG0L2cstIdHk3WWlfSHJqN3dJSlFoV2laRVhJOVE#gid=0

The things that needed to be updated include:

* inventory
** hostname
** switch
** rack
** slot
** relay
** imaging server
** all of the adapter information except the interface (which remained nic0) and the MAC address
* DNS
* nagios (after DNS)

I did the last panda by hand and marked it as a spare in the possession of dcops (and added this bug number in the notes section).
FYI I've also made tabs for the other 4 production machines if you want to front load information in the spreadsheet.

We're still debating what to do with panda-chassis-001.
Bug 829238 is about vacating chassis 2..5, at which point we can do this again.
OK, we're clear to move chassis 2..5 at any time.  Same process as before.

Let's start by filling in the rack and slot information in the google doc in comment 11, if that's known beforehand.  Then, let me know what day you plan to do this, and I'll get inventory and DNS set up, nagios downtime'd, etc. beforehand, and test when they're back in place.
I don't have access to the Google spreadsheet.  What are the new hostnames of the pandas that were moved from chassis 6 to p10?
The hostname atom (panda-xxxx) hasn't changed.  Just the fqdn has - they're in p10.releng.scl1 now.  You can see them in the mozpool UI.
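
To illustrate the naming scheme described here, a minimal sketch follows. The helper name `panda_fqdn` and the `RELENG_DOMAIN` constant are hypothetical, the full `releng.scl1.mozilla.com` suffix is taken from the lookups elsewhere in this bug, and the four-digit zero-padded numbering is assumed from names such as panda-0070.

```python
# Hypothetical helper: the short name (panda-xxxx) stays fixed across a move;
# only the pod portion of the domain changes.
RELENG_DOMAIN = "releng.scl1.mozilla.com"


def panda_fqdn(number, pod="p10"):
    """Build the FQDN for a panda board in the given pod."""
    return f"panda-{number:04d}.{pod}.{RELENG_DOMAIN}"


print(panda_fqdn(70))  # panda-0070.p10.releng.scl1.mozilla.com
```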
So, according to mozpool, pandas 522-609 are all listed as "fix-boot-scr†", do they need to be re-imaged?  Bug 826694 seems to indicate not.   Also, inventory still lists them as in p6.
Yes, they'll need to be reimaged.  It can't hurt, anyway.

As for the names, looks like a goof in updating inventory - they weren't in p6 before, either!  Amy, do you want to look at that?
Also, their dns entries say p6, and some of them are pingable

mozillas-MacBook-Pro-2:mobile kmoir$ nslookup panda-0522
Server:		10.26.75.40
Address:	10.26.75.40#53

panda-0522.build.mozilla.org	canonical name = panda-0522.p6.releng.scl1.mozilla.com.
Name:	panda-0522.p6.releng.scl1.mozilla.com
Address: 10.12.133.65

mozillas-MacBook-Pro-2:mobile kmoir$ nslookup panda-0570
Server:		10.26.75.40
Address:	10.26.75.40#53

panda-0570.build.mozilla.org	canonical name = panda-0570.p6.releng.scl1.mozilla.com.
Name:	panda-0570.p6.releng.scl1.mozilla.com
Address: 10.12.133.113
kmoir: chassis 6, not p6.  The pandas that were moved were 70-80 (and 81 got removed).  The ones that still need to move are 22-68.
There's been some discussion above, so to reiterate, this is waiting on additional data in the Google spreadsheet about the remaining chassis.
Flags: needinfo?(vle)
I've just filled in the missing fields in the spreadsheet.

Van
Flags: needinfo?(vle)
I've updated inventory, but I need to update the A records for the panda-relays and wait for the new DHCP autogen stuff to propagate.  I'll do that tomorrow when able.
I've updated dns, so this should propagate shortly.
I've also updated nagios.
Should panda-0022-0069 have p10 in their fqdn?  

The pandas from 70-80 do

mozillas-MacBook-Pro-2:buildbot-configs kmoir$ nslookup panda-0069
Server:		10.26.75.40
Address:	10.26.75.40#53

panda-0069.build.mozilla.org	canonical name = panda-0069.build.scl1.mozilla.com.
Name:	panda-0069.build.scl1.mozilla.com
Address: 10.12.52.173

mozillas-MacBook-Pro-2:buildbot-configs kmoir$ nslookup panda-0069.p10.build.scl1.mozilla.com
Server:		10.26.75.40
Address:	10.26.75.40#53

** server can't find panda-0069.p10.build.scl1.mozilla.com: NXDOMAIN

mozillas-MacBook-Pro-2:buildbot-configs kmoir$ nslookup panda-0070
Server:		10.26.75.40
Address:	10.26.75.40#53

panda-0070.build.mozilla.org	canonical name = panda-0070.p10.releng.scl1.mozilla.com.
Name:	panda-0070.p10.releng.scl1.mozilla.com
Address: 10.12.137.73
It's panda-0022.p10.releng.scl1.mozilla.com, etc., not build.scl1.mozilla.com.  Also, the last panda in each chassis is being removed to match the other production chassis.
That's 33, 45, 57, and 69 if you don't have those numbers mapped.
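
The numbers of the removed boards follow from the chassis layout. The sketch below assumes contiguous numbering with 12 boards per original chassis and chassis 2 starting at panda 22, which is inferred from the ranges quoted in this bug rather than from inventory; the function name is hypothetical.

```python
# Assumed layout, inferred from the ranges in this bug: chassis 2 starts at
# panda 22 and each original chassis holds 12 consecutively numbered boards.
FIRST_BOARD = 22
BOARDS_PER_CHASSIS = 12


def last_board(chassis):
    """Return the number of the panda in the 12th slot of chassis 2-6."""
    if not 2 <= chassis <= 6:
        raise ValueError("numbering is only inferred for chassis 2-6")
    first = FIRST_BOARD + (chassis - 2) * BOARDS_PER_CHASSIS
    return first + BOARDS_PER_CHASSIS - 1


spares = [last_board(c) for c in range(2, 7)]
print(spares)  # [33, 45, 57, 69, 81]
```

Chassis 6 is included for completeness; its 12th board (81) was already pulled when that chassis moved.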
thanks Amy, I guess the records that map, for example

Last login: Tue Jan 15 13:55:47 2013 from vpn1.dmz.releng.scl3.mozilla.com
[cltbld@buildbot-master13 ~]$ nslookup  panda-0022
Server:		10.12.48.19
Address:	10.12.48.19#53

Name:	panda-0022.build.scl1.mozilla.com
Address: 10.12.52.123
 
to build.scl1.mozilla.com are old; should they be removed?
Hi,

We intend to keep the extra panda boards that we remove in rack 101-15 at scl1, on one of the unused Mac mini shelves.

Van
I had been planning on waiting till the boards were actually removed, but I've removed the old records and repointed the CNAMEs since you asked.
Please also reimage all of the SD cards for chassis 2-5 with the new preseed image when you move them today.
We have moved chassis[2-5] to the new cage and the relays are all accessible.

[vle@natasha]~% /usr/sbin/fping panda-relay-00{2..5}.p10.releng.scl1.mozilla.com
panda-relay-002.p10.releng.scl1.mozilla.com is alive
panda-relay-003.p10.releng.scl1.mozilla.com is alive
panda-relay-004.p10.releng.scl1.mozilla.com is alive
panda-relay-005.p10.releng.scl1.mozilla.com is alive

I updated the spare pandas in inventory with their new location, r101-15, slot 25.0[1-5].

FYI, the spares are panda-00[33,45,57,69,81].
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Also, all the SD cards have been reimaged with the new preseed image.
Product: mozilla.org → Infrastructure & Operations