Closed Bug 1130548 Opened 10 years ago Closed 10 years ago

Update base AMIs so builders and masters can use the same

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Details

Attachments

(4 files, 2 obsolete files)

Begun in https://github.com/mozilla/build-cloud-tools/pull/28 but that caused

10:53:44 INFO - + tar -Jxf gcc.tar.xz
10:53:44 INFO - tar (child): xz: Cannot exec: No such file or directory
10:53:44 INFO - tar (child): Error is not recoverable: exiting now
10:53:44 INFO - tar: Child returned status 2
10:53:44 INFO - tar: Error is not recoverable: exiting now

when run outside of a mock environment. I need to figure out what's causing xz to be installed on a regular (non-AWS) builder, and why that's not happening here.
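Since `tar -J` execs the external `xz` binary, a missing xz only surfaces when the tarball is unpacked. A minimal sketch of a pre-flight check, assuming Python is available on the image; `missing_tools` is a hypothetical helper and not part of build-cloud-tools:

```python
import shutil

def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]

# Example use before unpacking gcc.tar.xz:
#   missing = missing_tools(["tar", "xz"])
#   if missing:
#       raise SystemExit("missing required tools: " + ", ".join(missing))
```

The `which` parameter is only there so the lookup can be replaced in tests.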
Attached file diffs.txt (obsolete)
On a throwaway Amazon Linux host, I ran `yum -d 1 -c /chroot/etc/yum-local.cfg -y --installroot=/chroot groupinstall Core` against a yum-local.cfg pointing to the puppetagain repos. I also created a VM with VMware and puppetized it with the wrong puppet password. The diffs are in this attachment.
Attached file diffs.txt
This strips the leading /\d+:/ -- I don't know what that means, but presumably it's a difference in how Anaconda (in install.log) and Yum (in yum.log) record their activities.
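A minimal sketch of that kind of comparison, assuming each log has already been reduced to one package entry per line. The function names are hypothetical and this is not the script used to produce the attachment:

```python
import re

# Leading "<digits>:" prefix that appears on some package entries.
_PREFIX = re.compile(r"^\d+:")

def normalize(lines):
    """Return the set of package entries with any leading '<digits>:' stripped."""
    return {_PREFIX.sub("", line.strip()) for line in lines if line.strip()}

def compare(anaconda_lines, yum_lines):
    """Return (only_in_anaconda, only_in_yum) as sorted lists."""
    anaconda = normalize(anaconda_lines)
    yum = normalize(yum_lines)
    return sorted(anaconda - yum), sorted(yum - anaconda)
```

Stripping the prefix from both inputs means it does not matter which of the two logs records it.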
Attachment #8562854 - Attachment is obsolete: true
That's pretty huge. Time to dive into anaconda and see what it's up to. For what it's worth, this is the KS file: http://hg.mozilla.org/build/puppet/file/292f582b1657/setup/centos6-kickstart.cfg.erb#l33 and it only specifies @core -rhgb
Oh, interesting (h/t tmary): https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Installation_Guide/s1-kickstart2-packageselection.html

---
Note that the Core and Base groups are always selected by default, so it is not necessary to specify them in the %packages section.
---

So KS *is* installing Base. I'll try comparing KS against a groupinstall of Core and Base, and see what the differences are.
And indeed, yum groupinstall Core Base installs EXACTLY the 397 packages that the kickstart script does. I'll make an update to the KS script so I don't forget what I've learned today, and revert to using Core and Base in the base AMIs too.
Attached patch bug1130548-puppetagain.patch (obsolete)
Attachment #8562899 - Flags: review?(jwatkins)
Comment on attachment 8562899 (bug1130548-puppetagain.patch): er, hang on, it looks a little different in the live PXE configs
Attachment #8562899 - Attachment is obsolete: true
Attachment #8562899 - Flags: review?(jwatkins)
(add -subscription-manager, since that's what we've been using to date)
Attachment #8562901 - Flags: review?(jwatkins)
The thing that triggered all of this was finding ssmtp installed on the base AMI, while buildmasters wanted postfix. Switching from one to the other must be done in a transaction, and puppet doesn't support transactions. I suspect the difference is that the Core and Base groups are installed with all yum repos activated (base, updates, epel, and releng), while Anaconda only installs against base. I'm verifying now.
That didn't really make a difference:

--- just-core-base 2015-02-11 12:41:47.863150600 -0500
+++ just-core-base-all-repos 2015-02-11 13:09:01.594539097 -0500
@@ -1,3 +1,4 @@
+
 abrt-2.0.8-21.el6.centos.x86_64
 abrt-addon-ccpp-2.0.8-21.el6.centos.x86_64
 abrt-addon-kerneloops-2.0.8-21.el6.centos.x86_64
@@ -101,7 +102,7 @@
 fontconfig-2.8.0-3.el6.x86_64
 fprintd-0.1-21.git04fd09cfa.el6.x86_64
 fprintd-pam-0.1-21.git04fd09cfa.el6.x86_64
-freetype-2.3.11-14.el6_3.1.x86_64
+freetype-2.4.12-6.el6.1.x86_64
 gamin-0.1.10-9.el6.x86_64
 gawk-3.1.7-10.el6.x86_64
 gdbm-1.8.0-36.el6.x86_64
@@ -234,7 +235,7 @@
 lzo-2.03-3.1.el6.x86_64
 m4-1.4.13-5.el6.x86_64
 mailx-12.4-7.el6.x86_64
-make-3.81-20.el6.x86_64
+make-3.82-19.el6.x86_64
 MAKEDEV-3.24-6.el6.x86_64
 man-1.6f-32.el6.x86_64
 man-pages-3.22-20.el6.noarch
@@ -376,7 +377,7 @@
 vim-enhanced-7.2.411-1.8.el6.x86_64
 vim-minimal-7.2.411-1.8.el6.x86_64
 virt-what-1.11-1.2.el6.x86_64
-wget-1.12-1.8.el6.x86_64
+wget-1.15-2.el6.x86_64
 which-2.19-6.el6.x86_64
 wireless-tools-29-5.1.1.el6.x86_64
 words-3.0-17.el6.noarch

The only differences are newer versions of freetype, make, and wget, and none of these lists include ssmtp. So I think this is a false lead. Something else must be installing ssmtp as a dependency (via the virtual), and only when postfix isn't installed first. So, a puppet ordering problem. Yuck.
I have also manually confirmed that all of the additional_packages are included in Core/Base: dhclient, openssh-server, kernel, grub, lvm2, yum.
The error for masters is

Wed Feb 11 10:55:29 -0800 2015 Puppet (err): Execution of '/bin/rpm -e ssmtp-2.61-15.el6.x86_64' returned 1: error: Failed dependencies:
    /usr/bin/mailq is needed by (installed) nagios-plugins-mailq-1.4.15-2.el6.x86_64
    /usr/sbin/sendmail is needed by (installed) cronie-1.4.4-12.el6.x86_64
Wed Feb 11 10:55:29 -0800 2015 /Stage[main]/Packages::Postfix/Package[ssmtp]/ensure (err): change from 2.61-15.el6 to absent failed: Execution of '/bin/rpm -e ssmtp-2.61-15.el6.x86_64' returned 1: error: Failed dependencies:
    /usr/bin/mailq is needed by (installed) nagios-plugins-mailq-1.4.15-2.el6.x86_64
    /usr/sbin/sendmail is needed by (installed) cronie-1.4.4-12.el6.x86_64

So either nagios-plugins or cronie is pulling in ssmtp as the default, preventing postfix from being installed. I'll need to do some hacking in puppet to manage to use a transaction.
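For reading errors like this across many hosts, a small sketch that extracts which installed packages block the removal and what they require. It assumes the "X is needed by (installed) Y" wording shown in the log above; `blocking_dependencies` is a hypothetical name:

```python
import re

# Matches "<capability> is needed by (installed) <package>" as printed by rpm -e.
_NEEDED = re.compile(r"(\S+) is needed by \(installed\) (\S+)")

def blocking_dependencies(rpm_output):
    """Map each blocking installed package to the capabilities it requires."""
    blockers = {}
    for capability, package in _NEEDED.findall(rpm_output):
        blockers.setdefault(package, []).append(capability)
    return blockers
```

Applied to the output above, this points at nagios-plugins-mailq (needing /usr/bin/mailq) and cronie (needing /usr/sbin/sendmail).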
The packages don't actually conflict, and `alternatives` prefers postfix in auto mode. So I think we could just remove the reference to ssmtp, and make the `alternatives` run depend on postfix being installed. That's certainly a lot simpler than setting up a transaction. However, it would leave us in a mix of states, with ssmtp installed on some hosts and not on others, depending on the order puppet runs in. So I'm going to use the transaction anyway.
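One common way to swap two packages in a single transaction is `yum shell`, which queues `remove` and `install` commands and applies them together on `run`. The sketch below is an assumption about how that could look, not the contents of the attached patch; `swap_script` and `swap_packages` are hypothetical names:

```python
import subprocess

def swap_script(remove, install):
    """Build a `yum shell` script that removes and installs in one transaction."""
    return "remove {}\ninstall {}\nrun\n".format(remove, install)

def swap_packages(remove, install, runner=subprocess.run):
    """Feed the script to `yum -y shell` on stdin.

    With the default runner, check=True raises CalledProcessError
    if yum exits with a non-zero status.
    """
    script = swap_script(remove, install)
    return runner(["yum", "-y", "shell"], input=script, text=True, check=True)

# Example (requires root and yum): swap_packages("ssmtp", "postfix")
```

Because both operations are resolved together, the providers of /usr/sbin/sendmail and /usr/bin/mailq are replaced rather than removed, so the dependent packages stay satisfied.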
Summary: Switch builders to use the new Core-only base AMI → Update base AMIs so builders and masters can use the same
Attachment #8562956 - Flags: review?(jwatkins)
Testing:
* in progress for masters (non-production)
* I'll build a golden AMI tomorrow and see how it flies
Attachment #8562999 - Flags: review?(rail)
Master worked fine.
Attachment #8562999 - Flags: review?(rail) → review+
I'm testing the golden AMI now. The patch is merged, but can be backed out if the test fails.
2015-02-12 08:46:49,534 - INFO - AMI created
2015-02-12 08:46:49,534 - INFO - ID: ami-e0e8a388, name: spot-bld-linux64-2015-02-12-16-21
2015-02-12 08:46:49,662 - INFO - AMI spot-bld-linux64-2015-02-12-16-21 (ami-e0e8a388) is ready
I spot-checked one, and I see runner running. And xz is installed. So I'm cautiously optimistic. I'm also going to kill the AMI, just in case.
There are currently 23 instances in use1 running this image (all build, not try).
Attachment #8562901 - Flags: review?(jwatkins) → review+
Attachment #8562956 - Flags: review?(jwatkins) → review+
I see at least one green build (hard to tell if there are more from the times) in looking at one of the remaining hosts, so I'm calling this good.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED