Closed Bug 704975 Opened 13 years ago Closed 11 years ago

Set hostnames automatically

Categories

(Release Engineering :: General, defect, P4)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Unassigned)

Details

(Whiteboard: [slaveminator])

This project will take a lot of work, but it will help reduce maintenance effort to a minimum.

I will use [slaveminator] as a whiteboard tag to keep track of any bugs that optimize the setup and maintenance of our pool of slaves.

Other projects, like the "buildbot start" check, should take higher priority than this work.

Part of this work is also to make sure that, once a slave has the correct name, it syncs up with puppet/OPSI.

[4:08pm] armenzg: hi
[4:08pm] armenzg: I am trying to figure out what options we have to automatically set the hostname for a slave that has been re-imaged
[4:08pm] armenzg: right now each slave depending on the OS has a different set of manual steps to follow
[4:08pm] armenzg: what options/solutions could we look into?
[4:09pm] armenzg: I am trying to analyze each manual step we have for a machine that has been re-imaged and we want it to go back to the production pool
[4:09pm] armenzg: and find out how to solve each - if possible
[4:25pm] arr: the best option is to let dhcp set the hostname
[4:25pm] arr: (when possible)
[4:26pm] armenzg: oh cool
[4:27pm] bear: armenzg - we are (should) be able to have dhcp set the name, except for osx it seems
[4:27pm] armenzg: what changes would I have to do to the machines to set that info from DNS?
[4:27pm] armenzg: is this available to most OSes?
[4:27pm] armenzg: oh
[4:27pm] • armenzg notes osx seems to be the exception
[4:28pm] bear: me saying osx is also a generalization from memory - it may be solvable, I just know that we have had issues with osx recently and had to force the host name using scutil
[4:29pm] bear: this may be something we could "fix" using something that would extend slavealloc - but it would need researching to see if the single exception (osx) is worth such a tool to be created
[4:30pm] armenzg: ah
[4:30pm] arr: I recommend a fix for that (OS X)
[4:32pm] bear: yea, if we can fix OS X then we can go back to an IT-sane method for the others
[4:37pm] armenzg: do we have a bug filed for that issue? I believe jhford-work has tons of notes in a bug but not sure where that is
[4:37pm] armenzg: is there anything in IT's court that would need to be done for any of this work to happen?
[4:37pm] jhford-work: armenzg: my experience is that for macs it just doesn't work right
[4:37pm] arr: I made the recommendation on the initial bug.  I think someone in releng closed it
[4:38pm] armenzg: I think it has value for us in the long term
[4:38pm] jhford-work: for whatever reason, they lose their heads and use a fallback $LONG_USER's-mac-mini(XXX)
[4:38pm] jhford-work: it does
[4:38pm] arr: armenzg: no, it would be a puppet change, nothing for IT
[4:38pm] jhford-work: but its not working right now
[4:38pm] jhford-work: arr: automatic hostnames?
[4:38pm] armenzg: arr: do you have by any chance info for what is needed for every OS?
[4:39pm] arr: I suggested setting the bonjour name via puppet so that, in the event that dhcp failed, it would still get the right hostname
[4:39pm] armenzg: or what the concept is named in case I want to google for it
[4:39pm] armenzg: ?
[4:39pm] arr: (because that's what it uses if it can not reach dhcp or dns)
[4:39pm] arr: armenzg: I don't
[4:39pm] jhford-work: arr: but what happens if the hostname changes?
[4:40pm] jhford-work: and it fails to get the hostname automatically
[4:40pm] arr: jhford-work: that's why I recommend having puppet check dns
[4:40pm] jhford-work: if we are doing that, why not use that check to set the HostName instead of bonjour?
[4:40pm] arr: because if the host changes name, you don't get the right hostname
[4:41pm] arr: whereas you would if you were using dhcp
[4:41pm] jhford-work: but it's not guaranteed that the dhcp hostname will be read
[4:41pm] jhford-work: so when it fails, we're in the same state we're in now
[4:41pm] arr: which is why you set the bonjour name every time puppet runs if it does not match the dns name
[4:42pm] jhford-work: but if the hostname doesn't match, it won't run puppet at all
[4:42pm] jhford-work: match a known node
[4:43pm] arr: I thought you were auto-signing puppet stuff?
[4:43pm] arr: so if a host came up with a different name, it would just auto-sign and move on?
[4:44pm] jhford-work: we don't have real autosign, and if the value that bash -c hostname returns on the slave isn't in the manifest, it doesn't know what to do
[4:45pm] jhford-work: worse, if its the wrong value, it re-syncs what the slave *used* to be
[4:45pm] bear: armenzg - the puppet master selection process may also have to be part of the solution 
[4:45pm] jhford-work: i mean, i have set the HostName on 160 slaves with csshX in the last little bit
[4:46pm] jhford-work: having a script that does all the dns lookup stuff to set things is also great
[4:46pm] bear: we may have to back up and find out for sure how hostname fallback works in osx and if there are any plists that we can use to control it
[4:47pm] arr: it works like this:
[4:47pm] arr: dhcp -> dns -> bonjour
[4:47pm] arr: which is why I suggested setting the bonjour hostname via puppet
[4:47pm] bear: that seems simple enough to setup a test for
[4:47pm] arr: (reverse dns, I should say)
[4:48pm] armenzg: this is so interesting
[4:49pm] jhford-work: manually setting the hostname happens once per reimage or hostname change
[4:49pm] jhford-work: i am not saying its unimportant, i just think we might want to work on other things first
[4:49pm] jhford-work: and ftr, i've done it for all of the talos-r4 machines
[4:49pm] jhford-work: it took ~10 minutes to set on all the hosts
[4:50pm] arr: (bear: and fyi, the plist file is /Library/Preferences/SystemConfiguration/preferences.plist)
[4:50pm] bear: armenzg and I (I think) are trying to figure out a way that requires no intervention once IT has reimaged a slave
[4:51pm] arr: (but you dont need to edit the plist file, there's a command to set the three different hostnames)
[4:51pm] armenzg: exactly what bear says
[4:51pm] jhford-work: bear: we are almost there
[4:51pm] jhford-work: *unless* the slave defaults to the bonjour hostname
[4:51pm] bear: sure, we are getting the info set in our brains so we can move on to the final bits of the problem
[4:51pm] jhford-work: cool
[4:52pm] armenzg: we're aiming for the reimaging scenario rather than the first time setup
[4:52pm] jhford-work: so... doing the scutil --set HostName in puppet *if* the slave has the right hostname from dhcp seems like the ideal case
[4:52pm] jhford-work: sure
[4:52pm] jhford-work: so the success case is dhcp works, and what is set by dhcp is set permanently
[4:53pm] jhford-work: and if the hostname needs to change, run "hostname newname" then sync with puppet and that'd set the HostName
[4:53pm] jhford-work: to whatever you set hostname to
[4:54pm] armenzg: I guess filing a releng bug with all this info would have all we need to tackle it at some point, right?
[4:54pm] jhford-work: probably?
[4:54pm] bear: armenzg - please do - we need, IMO, to start tracking this for the work I think you and I will be doing
[4:54pm] jhford-work: we should also figure out why they fail to get hostnames from dhcp when they fail
[4:54pm] arr: jhford-work: yeah, that was my suggestion, except using the bonjour name, since that's the automatic fall back if both dhcp and reverse dns fail
[4:54pm] bear: yea, I think we will need to double check all of the steps just to make sure
[4:55pm] arr: so the host will continue to try to use dhcp
[4:55pm] arr: instead of never trying dhcp again
[4:55pm] bear: armenzg - do you want to use a new whiteboard for this so we can group them?
[4:56pm] jhford-work: if we want to do that, we should just reboot (and notify us) when the machine has a hostname containing ClientBuilder
[4:56pm] arr: but rebooting might still not get you a hostname if there really is a dhcp problem
[4:56pm] jhford-work: caching the last dns/dhcp name on the previous boot in bonjour hostname feels like it could cause problems
[4:56pm] bear: that may be something we have to add in the slavealloc part
[4:57pm] arr: jhford-work: such as?
[4:57pm] armenzg: bear: [slaveminator]
[4:57pm] bear: take the slave offline and raise a flag
[4:57pm] armenzg:
[4:57pm] bear: armenzg \o/
[4:57pm] bear: dooo IT
[4:57pm] jhford-work: but if dhcp isn't working, would the slave even have the right ip?
[4:57pm] arr: you do a sanity check to make sure that the current hostname is set to the right string pattern.  if it matches, you set the bonjour name (if it's different than the hostname)
[4:57pm] arr: jhford-work: the mac will reuse the last IP it had
[4:58pm] jhford-work: but like caching bonjour name, that might not be correct
[4:58pm] jhford-work: so if dhcp isn't working and returning the right ip/hostname, the slave can't know what's correct
[4:58pm] jhford-work: (the thing that might not be correct is the hostname/ip, not your statement)
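
The sanity check arr describes in the log (confirm the current hostname matches the expected string pattern, then set the bonjour name if it differs) could be sketched roughly as below. This is an illustration only, not the actual puppet change: the function name plan_hostname_sync, the SLAVE_NAME pattern, and the example.com hostnames are all assumptions made for the example. The only real command it refers to is scutil --set LocalHostName, which sets the Bonjour name on OS X. The function returns the commands instead of running them, so it can be tried without touching a machine.

```python
import re

# Hypothetical naming pattern (an assumption): lowercase, dash-separated
# parts ending in a numeric suffix, e.g. "talos-r4-snow-001".
SLAVE_NAME = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*-\d+$")


def plan_hostname_sync(current_hostname, bonjour_name, pattern=SLAVE_NAME):
    """Return the scutil commands needed to cache a good hostname.

    current_hostname: what `hostname` reports (ideally supplied by dhcp).
    bonjour_name:     what `scutil --get LocalHostName` reports.
    """
    # Bonjour names carry no domain part, so compare against the short name.
    short = current_hostname.split(".", 1)[0]

    # If the current name does not look like a pool name (for example the
    # "...-mac-mini" fallback), do nothing: caching it would only make
    # the wrong name stick.
    if not pattern.match(short):
        return []

    commands = []
    if bonjour_name != short:
        # Only the fallback name is changed, so the host keeps trying
        # dhcp and reverse dns first on later boots.
        commands.append(["scutil", "--set", "LocalHostName", short])
    return commands


if __name__ == "__main__":
    # Example values only; on a real slave these would come from
    # `hostname` and `scutil --get LocalHostName`.
    for cmd in plan_hostname_sync("talos-r4-snow-001.build.example.com",
                                  "Users-Mac-mini"):
        print(" ".join(cmd))
```

A name that fails the pattern yields an empty list, which is the point where the "take the slave offline and raise a flag" idea from the log would fit in.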
Product: mozilla.org → Release Engineering
This is happening now.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED