please switch to non-os specific host names for new windows test slaves

RESOLVED INVALID

Status

Infrastructure & Operations
RelOps
RESOLVED INVALID
5 years ago
5 years ago

People

(Reporter: hwine, Unassigned)

Details

(Whiteboard: [reit-win8])

(Reporter)

Description

5 years ago
Within the world of Windows test slaves, we'd like to re-image to a new OS with zero to minor changes to the rest of the configs.

Step 1 is to not bake the OS name into the DNS name of the machine. As I understand it, we're limited to max of 15 characters, so let's go with:
  ixt-slaveXXXX

("ix" node used for "t"est) That's 14 so leaves us room to grow to > 10K nodes of these :)
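
A quick sketch of the length budget, assuming the 15-character limit is the NetBIOS computer-name limit (the description does not say which limit is meant). The helper names are made up for illustration.

```python
# Assumption: the 15-character cap is the NetBIOS computer-name limit.
MAX_NAME_LEN = 15
PREFIX = "ixt-slave"  # prefix proposed in the description


def slave_hostname(index: int, digits: int) -> str:
    """Build a name like ixt-slave0001 with a zero-padded index."""
    return f"{PREFIX}{index:0{digits}d}"


def max_index_digits(prefix: str = PREFIX) -> int:
    """Number of index digits that still fit after the prefix."""
    return MAX_NAME_LEN - len(prefix)


for digits in (4, 5, max_index_digits()):
    name = slave_hostname(1, digits)
    print(name, len(name), len(name) <= MAX_NAME_LEN)
```

Four digits give a 13-character name and five give 14, so up to six digits would still fit within 15 characters.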
(In reply to Hal Wine [:hwine] from comment #0)
> within the world of windows test slaves, we'd like to re-image to a new os
> with zero to minor changes to the rest of the configs.
> 
> Step 1 is to not bake the OS name into the DNS name of the machine. As I
> understand it, we're limited to max of 15 characters, so let's go with:
>   ixt-slaveXXXX
> 
> ("ix" node used for "t"est) That's 14 so leaves us room to grow to > 10K
> nodes of these :)

bikeshed warning:

Assuming we intend to keep OS==windows* for these, rather than moving back and forth from Win to Linux etc., I'd prefer something like

win-ixtXXXX

or at the very least baking "win" into the name somehow.

Comment 2

5 years ago
This breaks a longstanding tradition (I say 'tradition' because I'm not sure it was ever intentional) of being able to tell what a slave is *supposed* to be doing based on its hostname. 

Isn't this going to make it awfully hard to debug slave lists in configs when all the slaves are ixt-slaveXXXX?
Why is not having the OS baked into the dns name a requirement?

We could do a few other things:
- Use buildbot slave names that contain the OS name, so win-ixt01234 in buildbot would be running on hostname ixt-slave01234. Personally, I would find it pretty confusing to have the hostname not match the slave name, but we have a similar situation in AWS right now, where the slave names aren't in DNS.

- Add CNAMEs from win-ixt01234 -> ixt-slave01234. RelEng uses the CNAMEs in our configs. I don't know how we'd avoid listing each possible slave for each platform though.
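
The two options above both come down to a mapping between an OS-specific slave name and an OS-neutral hostname. A minimal sketch of that mapping, using the example names from this comment; the function names, the name pattern, and the five-digit padding are assumptions, not anything buildbot or the DNS tooling provides:

```python
import re

# Hypothetical pattern for OS-specific slave names such as win-ixt01234.
SLAVE_NAME_RE = re.compile(r"^(?P<os>[a-z0-9]+)-ixt(?P<num>\d+)$")


def hostname_for_slave(slave_name: str) -> str:
    """Map an OS-specific slave name to the OS-neutral hostname."""
    match = SLAVE_NAME_RE.match(slave_name)
    if match is None:
        raise ValueError(f"unrecognised slave name: {slave_name!r}")
    return f"ixt-slave{match.group('num')}"


def cname_records(os_name: str, numbers, digits: int = 5):
    """List (alias, target) pairs, one CNAME per slave for one platform."""
    return [
        (f"{os_name}-ixt{n:0{digits}d}", f"ixt-slave{n:0{digits}d}")
        for n in numbers
    ]


print(hostname_for_slave("win-ixt01234"))
print(cname_records("win", [1, 2]))
```

Note that the CNAME list still has one entry per slave per platform, which is the listing problem raised above.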
I'd prefer to have the OS name be part of the hostname, but I can see how that would be extra work to change (I'm assuming because of the IT intervention required to change DNS).

Possible solution: preallocate IP address ranges and DNS names in blocks, where each block is a pool of machines with a common OS.

For example:

win-ixt-[1..1000] in block 10.12.0.0/16
linux-ixt-[1..1000] in block 10.13.0.0/16

To switch a machine from one pool to another, change its IP address and host name to the entry with the same number in the other pool, e.g. win-ixt-100 becomes linux-ixt-100.

A downside of this approach is that it's very wasteful of IP addresses.
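
A rough sketch of the block scheme and its cost, using the two example blocks above. The rule that machine N gets the address at offset N within its block is an assumption made for illustration, as are the function names.

```python
import ipaddress

# Example blocks from this comment; one pool of 1000 machines per OS.
POOLS = {
    "win": ipaddress.ip_network("10.12.0.0/16"),
    "linux": ipaddress.ip_network("10.13.0.0/16"),
}
POOL_SIZE = 1000


def pool_entry(os_name: str, number: int):
    """Return (hostname, address) for machine `number` in the given pool."""
    if not 1 <= number <= POOL_SIZE:
        raise ValueError(f"number must be in 1..{POOL_SIZE}")
    network = POOLS[os_name]
    # Assumption: machine N sits at offset N from the block's base address.
    return f"{os_name}-ixt-{number}", network.network_address + number


def unused_addresses(os_name: str) -> int:
    """Addresses in the block that the pool never uses."""
    return POOLS[os_name].num_addresses - POOL_SIZE


print(pool_entry("win", 100))    # before the switch
print(pool_entry("linux", 100))  # same machine after the switch
print(unused_addresses("win"))
```

Each /16 holds 65536 addresses for a pool of 1000 names, which is where the waste comes from.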

A second option would be for releng to manage our own DNS configuration.
To be clear, I'm agnostic on this.  I've already set up all the new w8 talos hosts as t-w864-ix-NNN in DNS and inventory, but I can go back and change them before the hardware gets in.  

callek: we would not be changing between Windows and Linux. All of the Windows hosts will be on one VLAN and all of the Linux hosts will be on another because of DHCP scopes and requirements.

jhopkins: If the point is to be able to move them around, allocating them in blocks does not work.  We are definitely not going to split out DNS allocations, no.
(Reporter)

Comment 6

5 years ago
Doing "last call" for concerns in email - will update in a few days with final decision.
What was the verdict on this?

Comment 8

5 years ago
We found that the graph DB would not be able to cope with this.

We have to have the OS in the hostname until we move to Datazilla, which won't happen before Q1 for m-c.
So, resolve/invalid?
(Reporter)

Comment 10

5 years ago
yep - so close, but so far :(
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → INVALID
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations