Closed
Bug 1440062
Opened 6 years ago
Closed 6 years ago
[MDC2] kickstart releng-puppet1/2 in MDC2
Categories
(Infrastructure & Operations :: RelOps: Puppet, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dividehex, Assigned: dividehex)
Attachments
(3 files, 1 obsolete file)
Before we can kickstart the rest of the VMs in mdc2, we need the puppet environment up and running.
I'm getting a failure downloading the kickstart cfg file in mdc2. In mdc1 we had this same problem, and the cause was a missing network flow for http to admin1a from [srv,test].releng.mdc1.
I tested reaching over to the mdc1 admin server for the kickstart (also failed). My previous screenshot showed that. I'm updating with the correct mdc2 admin connection failure.
Attachment #8952848 -
Attachment is obsolete: true
Changing the kickstart cfg URL in grub works:
http://admin1.vips.private.mdc2.mozilla.com/kickstart/profiles/pa-c65-64-vmware.cfg
instead of:
http://10.50.75.31/kickstart/profiles/pa-c65-64-vmware.cfg

After that, the CentOS install needs to download the 6.5 image. It doesn't have anywhere to do that from (it cannot reach the scl3 or mdc1 puppet masters through port 80/http). So I checked the official CentOS 6.5 mirror and I'm using that for releng-puppet2; the two are not the exact same image. So I'm thinking to image puppet1 first from there, and then I can re-image puppet1 from puppet2 once I have puppet2 built from the correct image on puppet1.
http://mirrors.kernel.org/centos/6/os/x86_64/images/install.img

No need to diff, they are a different size:
```
-rw-r--r-- 1 puppetsync puppetsync 144060416 Nov 29 2013 /data/repos/yum/mirrors/centos/6.5/os/x86_64/images/install.img
-rw-rw-r-- 1 dhouse dhouse 146558976 Mar 28 2017 ./install.img
```
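For cases where the sizes happen to match, the comparison above could be scripted roughly as follows. This is only a sketch: `same_image` is a hypothetical helper name, not something from the puppet repo, and it assumes both images are available as local files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any existing tooling): report whether
# two installer images are byte-identical.
# Usage: same_image <path-a> <path-b>
# Returns 0 if identical, 1 if different, 2 if a file cannot be read.
same_image() {
    local a="$1" b="$2"
    local size_a size_b
    if [ ! -r "$a" ] || [ ! -r "$b" ]; then
        echo "cannot read one of: $a $b" >&2
        return 2
    fi
    # wc -c gives the byte count portably; $(( )) strips any padding.
    size_a=$(( $(wc -c < "$a") ))
    size_b=$(( $(wc -c < "$b") ))
    if [ "$size_a" -ne "$size_b" ]; then
        echo "different size: $size_a vs $size_b bytes"
        return 1
    fi
    # Same size, so fall back to a byte-for-byte comparison.
    if cmp -s "$a" "$b"; then
        echo "identical"
        return 0
    fi
    echo "same size, different content"
    return 1
}
```

With the two files listed above, `same_image /data/repos/yum/mirrors/centos/6.5/os/x86_64/images/install.img ./install.img` would take the size branch and return 1.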
I requested network flows to allow HTTP, HTTPS, and the puppet ports to the puppet masters from all mdc2 releng vlans (srv, relabs, test, wintest).
Some flows were fixed in bug 1440157. I am kickstarting releng-puppet1 again. I changed the kickstart.cfg hostname to use admin1.vips, and created a "repos" CNAME to point at releng-puppet1.srv.releng.mdc1 (the http connection still fails to the scl3 puppetmasters).
I created the CNAME "puppet" so that the puppetize script can reach over to the mdc1 puppet master. Also, I manually pulled the certs.sh because that fails (perhaps python urllib2 is failing on ssl).
```
curl --user user:pass --insecure https://puppet/deploy/getcert.cgi > certs.sh
#then re-start puppetize.sh
```
During the puppet first run, I get a timeout on ssh to the scl3 puppet master:
```
root@releng-puppet1.srv.releng.mdc2.mozilla.com (Cron Daemon) wrote:
> ssh: connect to host releng-puppet2.srv.releng.scl3.mozilla.com port 22: Connection timed out
> rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
> rsync error: unexplained error (code 255) at io.c(600) [receiver=3.0.6]
```
These steps allow the puppetize to continue.
(In reply to Dave House [:dhouse] from comment #7)
> During the puppet first run, I get a timeout on ssh to the scl3 puppet
> master:
> ```
> root@releng-puppet1.srv.releng.mdc2.mozilla.com (Cron Daemon) wrote:
>
> > ssh: connect to host releng-puppet2.srv.releng.scl3.mozilla.com port 22: Connection timed out
> > rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
> > rsync error: unexplained error (code 255) at io.c(600) [receiver=3.0.6]
> ```

So rsync is failing from scl3. I may change it to mdc1 temporarily.

> These steps allow the puppetize to continue.

The previous CNAME and curl of certs.sh allowed the puppetize to continue.
I've re-kickstarted both releng-puppet1 and releng-puppet2 in mdc2 so that they are continually retrying to puppetize (cert first) against releng-puppet2.srv.releng.scl3 (every 60 seconds until success).
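The retry behaviour described above (try, wait 60 seconds, try again until it works) can be sketched as a small shell loop. The actual puppetize script is not shown in this bug, so this is an illustration only: `retry_until_success` is a hypothetical name, and the attempt cap is an added safety assumption rather than something the real script is known to have.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "retry every N seconds until success" loop.
# Usage: retry_until_success <interval-seconds> <max-attempts> <command> [args...]
# The max-attempts cap is an assumption added here so the loop cannot spin forever.
retry_until_success() {
    local interval="$1" max_attempts="$2"
    shift 2
    local attempt=1
    while true; do
        # Run the command exactly as given; a zero exit status ends the loop.
        if "$@"; then
            echo "succeeded on attempt $attempt"
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
}
```

A call such as `retry_until_success 60 1440 ./fetch-cert.sh` (where `fetch-cert.sh` is a placeholder for the cert step) would retry once a minute for up to a day.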
Comment 10•6 years ago (Assignee)
This is on hold until VMware infrastructure is racked, powered on, and available in MDC2.
Updated•6 years ago (Assignee)
Assignee: dhouse → jwatkins
Comment 11•6 years ago (Assignee)
Attachment #8958522 -
Flags: review?(klibby)
Comment 12•6 years ago
Comment on attachment 8958522
bug1440062_update_moco_config_mdc2.patch

Review of attachment 8958522:
-----------------------------------------------------------------

lgtm
Attachment #8958522 -
Flags: review?(klibby) → review+
Comment 13•6 years ago (Assignee)
The /data partition and LVM are set up on both masters. I'm currently rsyncing /data from releng-puppet2.srv.releng.scl3 to releng-puppet2.srv.releng.mdc2. From there I'll rsync /data between the two mdc2 puppetmasters, which should be much faster.
Comment 14•6 years ago (Assignee)
Both puppet masters are up and running. The /data partitions have been fully synced and I've added CNAMEs for puppet and repos to test, wintest, and srv. I've also added the new puppet masters to the puppetagain-apt A records.

I was able to successfully reimage t-yosemite-r7-235 and get it to puppetize, so we should be just about ready to go on reimaging the yosemite mac minis. I just need to double check the default deploystudio group before green lighting :van on the rest of the minis.

When I attempted to kickstart the log aggregators, I ran into the issue :dhouse ran into in c#1, c#2, and c#3: the URL does not fetch the kickstart profile. It seems Dave was able to use the VIPs as a workaround, but this will need to be troubleshot and fixed ASAP. AFAIK, this worked fine when we kickstarted all of releng in MDC1, so why does it not work now? I'm fairly sure this is a netops/firewall issue since I wasn't able to see any incoming http requests using tcpdump on the admin host that serves the kickstart profile. The traffic is being blocked somewhere in between.
Comment 15•6 years ago
Took a look in Panorama as well as the admin1a.private.mdc2 logs. I see the initial pxe boot work in the admin1a logs: the fetching of the pxe boot and needed bootstrap files. I do *not* see any attempt at fetching the kickstart file (which is to say "I see the problem you describe"). The request never seems to reach the admin server.

Looking at Panorama - I had thought we were logging everything, accepts and denies, so I would expect to see entries in the log for the traffic of the fetching of pxe boot - but I don't. I don't see anything other than a couple of pings being done ~10 hours ago. So, I'm in the land of "either my assumptions about logging are off, or something is just not right here." I'm searching in both cases for entries mentioning log-aggregator1's IP of 10.51.48.60.

In the interests of getting some eyes on this - NI'ing rmfd, as he's our netops resource for this project.
Flags: needinfo?(dmurphy)
Comment 16•6 years ago (Assignee)
I was able to replicate this from the puppet master in mdc2, since I wasn't able to break out into a shell from the CentOS installer on the log aggregator. Using curl on the master and tcpdump on the admin hosts shows no traffic. cknowles was kind enough to check the Panorama logs and screenshot the denies. See attached picture.
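A reachability check along these lines could look like the sketch below. The exact curl invocation used is not recorded in this bug, so `probe_url` is a hypothetical helper and the 5-second default timeout is an assumption; the flags (`-f`, `-s`, `--max-time`, `-o`) are standard curl options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether a URL can be fetched within a timeout.
# Usage: probe_url <url> [timeout-seconds]   (timeout defaults to 5, an assumption)
probe_url() {
    local url="$1" timeout="${2:-5}"
    # -f: treat HTTP errors as failures; -s: no progress output;
    # --max-time: give up after the timeout; -o /dev/null: discard the body.
    if curl -fs --max-time "$timeout" -o /dev/null "$url" 2>/dev/null; then
        echo "reachable: $url"
        return 0
    fi
    echo "unreachable: $url"
    return 1
}
```

Run from an mdc2 host, comparing `probe_url http://10.50.75.31/kickstart/profiles/pa-c65-64-vmware.cfg` with `probe_url http://admin1.vips.private.mdc2.mozilla.com/kickstart/profiles/pa-c65-64-vmware.cfg` would show whether only the direct address is blocked.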
Comment 17•6 years ago
Looking further at the rules - I can't find anywhere that the admin host (10.[48,50].75.31) is allowed to serve HTTP to the rest of the network. However, the VIPs (10.[48,50].122.5) are totally allowed to do so, per the rules.

So, two paths I can see - either we go with how things are set in the firewall currently and use the VIPs, *or* we shift the rules that mention the VIPs to mention the direct addresses as well.
Comment 18•6 years ago (Assignee)
(In reply to Chris Knowles [:cknowles] from comment #17)
> So, two paths I can see - either we go with how things are set in the
> firewall currently and use the VIPs, *or* we shift the rules that mention
> the VIPs to mention the direct addresses as well.

I agree with using the VIPs as the 'right' way of fixing this.
Comment 19•6 years ago
Alright, in looking at hiera/datacenter/mdc2.yaml there's a line:

# Update to admin1.vips.private.mdc2 once Bug 1428843 is resolved
pxe_ks_listen_ip: '10.50.75.31'

That bug is resolved, so I updated it to 10.50.122.5 and committed in abc7d820ec0c32c4071bcaf3ee7d1fc393abd723.

Let me know if I can do anything else.
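To confirm which address a checkout of the hiera data currently carries, a quick lookup like the following may help. It is a sketch under assumptions: `hiera_value` is a hypothetical helper, it only handles a flat top-level `key: 'value'` line like the one quoted above, and it is not a real YAML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a flat top-level key in a YAML file.
# Usage: hiera_value <yaml-file> <key>
# Assumes a simple "key: 'value'" line; nested keys and multi-line values
# are out of scope for this sketch.
hiera_value() {
    local file="$1" key="$2"
    # Split each line on ": " and print the value of the first matching key,
    # then strip single and double quotes.
    awk -v key="$key" -F': *' '$1 == key { print $2; exit }' "$file" | tr -d "'\""
}
```

For example, `hiera_value hiera/datacenter/mdc2.yaml pxe_ks_listen_ip` should print 10.50.122.5 once the commit above is checked out.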
Comment 20•6 years ago (Assignee)
(In reply to Chris Knowles [:cknowles] from comment #19)
> Alright, in looking at hiera/datacenter/mdc2.yaml there's a line:
>
> # Update to admin1.vips.private.mdc2 once Bug 1428843 is resolved
> pxe_ks_listen_ip: '10.50.75.31'
>
> That bug is resolved, so updated to 10.50.122.5 and committed in
> abc7d820ec0c32c4071bcaf3ee7d1fc393abd723
>
> Let me know if I can do anything else.

Thanks! Puppet has propagated and kickstarts are working in MDC2.

<clearing NI on dmurphy>
Flags: needinfo?(dmurphy)
Updated•6 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED