(Splitting off from bug 1126428 as this is getting complicated.) This involves:

* building a new AMI
  * CentOS 6.5
  * using only the "Core" group, not "Base"; otherwise ssmtp gets installed
* creating a 'generic-server' instance type
  * based on the new AMIs
  * with the correct configuration to actually boot
* instantiating rpmbuilder1 with it
* updating the 'buildbot-master' instance type to use the new AMIs
* re-instantiating each master with it
There's some background work I've already done in https://github.com/mozilla/build-cloud-tools/pull/26#issuecomment-72981849, but I'll try to summarize the important bits here, for archaeological purposes.

The Core/Base bit is important. We've successfully puppetized toplevel::server hosts on CentOS 6.5 in scl3, but on EC2:

> Wed Feb 04 15:53:52 -0800 2015 Puppet (err): Execution of '/bin/rpm -e ssmtp-2.61-15.el6.x86_64' returned 1: error: Failed dependencies:
>     /usr/bin/mailq is needed by (installed) nagios-plugins-mailq-1.4.15-2.el6.x86_64
>     /usr/sbin/sendmail is needed by (installed) cronie-1.4.4-12.el6.x86_64
> Wed Feb 04 15:53:52 -0800 2015 /Stage[main]/Packages::Postfix/Package[ssmtp]/ensure (err): change from 2.61-15.el6 to absent failed: Execution of '/bin/rpm -e ssmtp-2.61-15.el6.x86_64' returned 1: error: Failed dependencies:
>     /usr/bin/mailq is needed by (installed) nagios-plugins-mailq-1.4.15-2.el6.x86_64
>     /usr/sbin/sendmail is needed by (installed) cronie-1.4.4-12.el6.x86_64

I don't see any record of ssmtp ever being installed on an onsite host (bm103), so I wonder if it's being installed as part of the base image by aws_create_ami. Timestamps in /var/log/yum.log on rpmpackager1 suggest that's the case.

cronie and nagios-plugins-mailq require a virtual package that's currently being satisfied by ssmtp, but could also be satisfied by postfix. However, making the switch requires doing both the removal and the install within a single yum transaction, and puppet (still) doesn't support yum transactions.

In general, EC2 instances' initialization should be as similar to kickstart as possible. And kickstart only installs Core; the rest is up to puppet.
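For the record, the swap could be done in a single transaction outside of puppet with yum's shell mode. This is only a sketch of the idea; whether it would run via a puppet exec or during AMI creation is an open question:

```shell
# Hypothetical one-shot swap: remove ssmtp and install postfix in the
# same yum transaction, so the virtual provides (/usr/sbin/sendmail,
# /usr/bin/mailq) that cronie and nagios-plugins-mailq depend on are
# never left unsatisfied. Puppet can't express this itself.
yum shell -y <<'EOF'
remove ssmtp
install postfix
run
EOF
```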
That appears to have caused:

> blkfront: xvda: barriers disabled
>  xvda: xvda1 xvda2
> blkfront: xvdb: barriers disabled
>  xvdb: unknown partition table
> blkfront: xvdc: barriers disabled
>  xvdc: unknown partition table
> Refined TSC clocksource calibration: 2793.267 MHz.
> Switching to clocksource tsc
> input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input4
> dracut Warning: No root device "block:/dev/disk/by-label/root_dev" found
>
> dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.
>
> dracut Warning: Signal caught!
>
> dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.
> Kernel panic - not syncing: Attempted to kill init!
> Pid: 1, comm: init Not tainted 2.6.32-431.el6.x86_64 #1
> Call Trace:
>  [<ffffffff815271fa>] ? panic+0xa7/0x16f
>  [<ffffffff81077622>] ? do_exit+0x862/0x870
>  [<ffffffff8118a865>] ? fput+0x25/0x30
>  [<ffffffff81077688>] ? do_group_exit+0x58/0xd0
>  [<ffffffff81077717>] ? sys_exit_group+0x17/0x20
>  [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
> dracut Warning: No root device "block:/dev/disk/by-label/root_dev" found
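Since dracut is searching for a filesystem labeled root_dev, one quick check is to attach the volume to a rescue instance and inspect the label. The device name below is an assumption about where the attach lands, not something from this bug:

```shell
# Assumption: the broken root volume is attached to a rescue instance
# as /dev/xvdf, with the root filesystem on partition 1.
e2label /dev/xvdf1
# If the label is missing or wrong, set it to match what the kernel
# command line expects (root=LABEL=root_dev):
e2label /dev/xvdf1 root_dev
```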
The problem there is that 'lvm2' isn't installed by Core. Neither is 'yum', come to think of it.
...nor wget, although aws_create_instance can install that itself.
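A rough sketch of plugging those gaps at AMI-build time follows. The mount point and the way the kernel version is discovered are assumptions for illustration, not what aws_create_ami actually does:

```shell
# Assumption: the image's root filesystem is mounted at /mnt/ami-root
# during AMI creation.
MOUNT=/mnt/ami-root

# Core omits these: lvm2 is needed so dracut can find the root volume,
# yum is needed by puppet's package provider, wget by bootstrap scripts.
yum --installroot="$MOUNT" -y install lvm2 yum wget

# Rebuild the initramfs inside the image so dracut picks up the LVM
# modules; use the image's kernel version, not the build host's.
KVER=$(ls "$MOUNT/lib/modules" | head -n1)
chroot "$MOUNT" dracut --force "/boot/initramfs-$KVER.img" "$KVER"
```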
Created attachment 8560008 [details]
PR

There are some funny bits buried in this PR, so please do have a look and say something if you don't like what you see.
Oh, I'd also like to update configs/bld-linux64 to use these new base images. I can try that Friday morning, if that's easiest.
This bug's done. I have some updates for buildmasters in Bug 1130176 and probably updates to the base AMI in bug 1130548.