Closed Bug 803354 Opened 12 years ago Closed 12 years ago

please switch to non-os specific host names for new windows test slaves

Categories

(Infrastructure & Operations :: RelOps: General, task)

Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: hwine, Unassigned)

Details

(Whiteboard: [reit-win8])

within the world of windows test slaves, we'd like to re-image to a new os with zero to minor changes to the rest of the configs.

Step 1 is to not bake the OS name into the DNS name of the machine. As I understand it, we're limited to max of 15 characters, so let's go with:
  ixt-slaveXXXX

("ix" node used for "t"est) That's 14 so leaves us room to grow to > 10K nodes of these :)
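A quick way to sanity-check the length budget is sketched below. It assumes the 15-character cap quoted above (on Windows this is commonly the NetBIOS computer-name limit); the function names and the zero-padded numbering are illustrative only, not anything RelEng actually runs. Counted this way, the ixt-slaveXXXX template comes to 13 characters, so there may be a little more headroom than stated.

```python
# Sketch only: MAX_HOSTNAME_LEN is the 15-character limit quoted in this bug.
MAX_HOSTNAME_LEN = 15


def slave_hostname(index: int, digits: int = 4, prefix: str = "ixt-slave") -> str:
    """Build a zero-padded hostname such as ixt-slave0042 and enforce the cap."""
    name = f"{prefix}{index:0{digits}d}"
    if len(name) > MAX_HOSTNAME_LEN:
        raise ValueError(f"{name!r} is {len(name)} chars, limit is {MAX_HOSTNAME_LEN}")
    return name


def max_nodes(prefix: str = "ixt-slave") -> int:
    """Count the distinct numeric suffixes that fit under the cap for a prefix."""
    return 10 ** (MAX_HOSTNAME_LEN - len(prefix))


if __name__ == "__main__":
    print(slave_hostname(42), len(slave_hostname(42)))
    print(max_nodes())
```

With the "ixt-slave" prefix, six digits fit under the cap, which is comfortably above the 10K nodes mentioned.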
(In reply to Hal Wine [:hwine] from comment #0)
> within the world of windows test slaves, we'd like to re-image to a new os
> with zero to minor changes to the rest of the configs.
> 
> Step 1 is to not bake the OS name into the DNS name of the machine. As I
> understand it, we're limited to max of 15 characters, so let's go with:
>   ixt-slaveXXXX
> 
> ("ix" node used for "t"est) That's 14 so leaves us room to grow to > 10K
> nodes of these :)

bikeshed warning:

Assuming we intend to keep OS==windows* for these, rather than moving back and forth from Win to Linux etc., I'd prefer something like

win-ixtXXXX

or at the very least baking "win" into the name somehow.
This breaks a longstanding tradition (I say 'tradition' because I'm not sure it was ever intentional) of being able to tell what a slave is *supposed* to be doing based on its hostname. 

Isn't this going to make it awfully hard to debug slave lists in configs when all the slaves are ixt-slaveXXXX?
Why is not having the OS baked into the DNS name a requirement?

We could do a few other things:
- Use buildbot slave names that contain the OS name, so win-ixt01234 in buildbot would be running on hostname ixt-slave01234. Personally I would find it pretty confusing to have the hostname not match the slave name, but we have a similar situation in AWS right now where the slave names aren't in DNS.

- Add cnames from win-ixt01234 -> ixt-slave01234. RelEng uses the cnames in our configs. I don't know how we'd avoid listing each possible slave for each platform though.
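
To make the CNAME option concrete, here is a rough sketch that generates zone-file style records for the alias scheme above. The five-digit padding follows the win-ixt01234 example; the function names and the platform list are hypothetical, and the records are relative names with no zone origin filled in.

```python
def cname_record(os_tag: str, index: int, digits: int = 5) -> str:
    """Return one record aliasing an OS-tagged name to the neutral hostname."""
    alias = f"{os_tag}-ixt{index:0{digits}d}"
    target = f"ixt-slave{index:0{digits}d}"
    return f"{alias} IN CNAME {target}"


def all_records(os_tags, indices):
    """One alias per (platform, slave) pair, which is the listing problem noted above."""
    return [cname_record(tag, i) for tag in os_tags for i in indices]


if __name__ == "__main__":
    print(cname_record("win", 1234))
    # Hypothetical platform tags, purely for illustration.
    print(len(all_records(["win", "w864"], range(1, 101))))
```

The record count grows as platforms times slaves, which is why listing every possible slave for each platform is hard to avoid with this approach.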
I'd prefer to have the OS name be part of the hostname, but I can see how that would be extra work to change (I'm assuming because of the IT intervention required to change DNS).

Possible solution: preallocate IP address ranges and DNS names in blocks, where each block is a pool of machines with a common OS.

For example:

win-ixt-[1..1000] in block 10.12.0.0/16
linux-ixt-[1..1000] in block 10.13.0.0/16

To switch from one pool to another, change the IP address and host name of the machine to the same number in another pool, e.g. win-ixt-100 becomes linux-ixt-100.

A downside of this approach is that it's very wasteful of IP addresses.
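
A small sketch of the pool idea, using the two example blocks above. Mapping machine number N to the Nth address in the block is an assumption made here for illustration, as are the names POOLS, pool_host and unused_fraction.

```python
import ipaddress

# Example blocks from the proposal above; the pool size matches [1..1000].
POOLS = {
    "win": ipaddress.ip_network("10.12.0.0/16"),
    "linux": ipaddress.ip_network("10.13.0.0/16"),
}
POOL_SIZE = 1000


def pool_host(os_tag: str, number: int):
    """Return (hostname, address) for a machine number in the given OS pool."""
    if not 1 <= number <= POOL_SIZE:
        raise ValueError(f"machine number {number} is outside 1..{POOL_SIZE}")
    # Assumed mapping: machine N takes the Nth address after the network address.
    address = POOLS[os_tag].network_address + number
    return f"{os_tag}-ixt-{number}", address


def unused_fraction(os_tag: str) -> float:
    """Fraction of usable addresses in the block that the pool never touches."""
    usable = POOLS[os_tag].num_addresses - 2  # minus network and broadcast
    return 1 - POOL_SIZE / usable


if __name__ == "__main__":
    print(pool_host("win", 100))
    print(pool_host("linux", 100))
    print(f"{unused_fraction('win'):.1%} of the block is unused")
```

Under these assumptions more than 98% of each /16 sits idle, which is the wastefulness in question.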

A second option would be for releng to manage our own DNS configuration.
To be clear, I'm agnostic on this.  I've already set up all the new w8 talos hosts as t-w864-ix-NNN in DNS and inventory, but I can go back and change them before the hardware gets in.  

callek: we would not be changing between Windows and Linux.  All of the Windows hosts will be on one VLAN and all of the Linux hosts will be on another because of DHCP scopes and requirements.

jhopkins: If the point is to be able to move them around, allocating them in blocks does not work.  We are definitely not going to split out DNS allocations, no.
Doing "last call" for concerns in email - will update in a few days with final decision.
What was the verdict on this?
We found that the graph DB would not be able to cope with this.

We have to have the OS in the hostname until we move to datazilla, which won't happen before Q1 for m-c.
So, resolve/invalid?
yep - so close, but so far :(
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations