Closed
Bug 1342518
Opened 7 years ago
Closed 5 years ago
Update releng amis to have larger /boot
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: dhouse, Assigned: dhouse)
References
Details
Attachments
(1 file, 1 obsolete file)
Bug 1342518 - enlarge /boot to 256M (60 bytes, text/x-github-pull-request)
Flags: dividehex: review+, rail: review+, dividehex: checked-in+
We cannot fit two of the recent kernels for CentOS 6.5 into the 60 MB /boot on the current AMIs. Quoting :arr on #specops re: bug 1330695, where a kernel update failed for lack of space:
@arr> ami-4dc07a26 and ami-58246f30 both have tiny /boot, too
Looking over the AMIs:

us-east-1

AMI ID: ami-4dc07a26 (snap-7e19a509, 50 GB)
Instances: pushapkworker-1 (t2.micro), signing-linux-1 (t2.micro), signing-linux-3 (t2.micro)
AMI Name: centos-65-x86_64-hvm-base-2015-08-28-15-51 (314336048151/centos-65-x86_64-hvm-base-2015-08-28-15-51)
Creation date: August 28, 2015 at 10:02:27 AM UTC-6
Tags: moz-created=1440795748, moz-instance-family=c3, moz-type=base, moz-virtualization-type=hvm

AMI ID: ami-58246f30 (snap-1067f19f, 50 GB)
Instances: signingworker-3 (t2.micro), buildbot-master138 (m3.large), buildbot-master137 (m3.large), buildbot-master69 (m3.large), releng-puppet1 (c3.xlarge), dev-master2 (m3.medium), buildbot-master128 (m3.large), signingworker-1 (t2.micro), buildbot-master01 (m3.large)
AMI Name: centos-65-x86_64-hvm-base-2015-02-11-20-33 (314336048151/centos-65-x86_64-hvm-base-2015-02-11-20-33)
Creation date: February 11, 2015 at 1:40:59 PM UTC-7
Tags: moz-created=1423705259, moz-instance-family=c3, moz-type=base, moz-virtualization-type=hvm
The instances spun up with build-cloud-tools get a 64 MB /boot from here: https://github.com/mozilla-releng/build-cloud-tools/blob/0010b72a4690d370ecd4b8714af5f559e8849dbd/cloudtools/scripts/aws_create_ami.py#L56
Steps for resize:
1. Check/fix filesystem for / and /boot
2. Create MBR and /boot partition on a new disk
   2a. Partition /boot and set as boot
   2b. Copy from old /boot to new /boot
   2c. Copy MBR to new disk
3. Move / to the new disk
   3a. Create LVM partition on the new disk
   3b. Shrink current / filesystem to minimal used size
   3c. Add new LVM volume to the group
   3d. Copy LVM disk to the new volume
   3e. Remove old LVM volume
4. Snapshot the new disk
5. Test a new machine with the new disk snapshot

As performed for the first AMI, snap-7e19a509:

$ e2fsck -f /dev/cloud_root/lv_root
e2fsck 1.42.12 (29-Aug-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
root_dev: 31402/3276800 files (0.1% non-contiguous), 464763/13090816 blocks
$ dd if=/dev/sdb of=mbr bs=512 count=1
$ dd if=mbr of=/dev/sdc bs=446 count=1 # skip partition table, bytes 446-512
$ dd if=/dev/sdb1 of=/dev/sdc1 # copy /boot
$ parted -s -a optimal /dev/sdc -- mkpart primary ext2 2048s 512MiB
$ parted /dev/sdc align-check optimal 1
$ parted -s -a optimal /dev/sdc -- mkpart primary ext2 512MiB -1s
$ parted /dev/sdc align-check optimal 2
$ parted -s /dev/sdc -- set 1 boot on
$ parted -s /dev/sdc -- set 2 lvm on
$ mkfs.ext2 /dev/sdc1
$ pvcreate /dev/sdc2
$ vgextend cloud_root /dev/sdc2

The new disk's partition table:
/dev/sdc1 *      2048   1048575    523264  83 Linux
/dev/sdc2     1048576 104857599  51904512  8e Linux LVM

$ resize2fs /dev/cloud_root/lv_root -M
resize2fs 1.42.12 (29-Aug-2014)
Resizing the filesystem on /dev/cloud_root/lv_root to 388871 (4k) blocks.
The filesystem on /dev/cloud_root/lv_root is now 388871 (4k) blocks long.
$ pvmove /dev/sdb2
Insufficient free space: 12784 extents needed, but only 12671 available
Unable to allocate mirror extents for pvmove0.
Failed to convert pvmove LV to mirrored
$ bc -l <<< '12671 * 8192'
103800832
$ lvreduce --size 103800832s /dev/cloud_root/lv_root
WARNING: Reducing active logical volume to 49.50 GiB
THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce lv_root? [y/n]: y
Size of logical volume cloud_root/lv_root changed from 49.50 GiB (12672 extents) to 49.50 GiB (12671 extents).
Logical volume lv_root successfully resized
$ pvmove /dev/sdb2
/dev/sdb2: Moved: 0.0%
/dev/sdb2: Moved: 1.9%
[...]
$ vgreduce cloud_root /dev/sdb2
Removed "/dev/sdb2" from volume group "cloud_root"
$ pvremove /dev/sdb2

Requested snapshot of new disk: snap-03289d578a0bef4ed
Created Image: centos-65-x86_64-hvm-base-2017-03-17-16-00 (ami-2c7cd63a)
Steps for the second AMI were slightly different (no need to resize the logical volume before mirroring it to the new disk):

e2fsck -f /dev/cloud_root/lv_root
dd if=/dev/xvdb of=mbr bs=512 count=1
dd if=mbr of=/dev/xvdc bs=446 count=1 # skip partition table, bytes 446-512
dd if=/dev/xvdb1 of=/dev/xvdc1 # copy /boot
parted -s /dev/xvdc -- mklabel msdos
parted -s -a optimal /dev/xvdc -- mkpart primary ext2 2048s 512MiB
parted /dev/xvdc align-check optimal 1
parted -s -a optimal /dev/xvdc -- mkpart primary ext2 512MiB -1s
parted /dev/xvdc align-check optimal 2
parted -s /dev/xvdc -- set 1 boot on
parted -s /dev/xvdc -- set 2 lvm on
mkfs.ext2 /dev/xvdc1
pvcreate /dev/xvdc2
vgextend cloud_root /dev/xvdc2
resize2fs /dev/cloud_root/lv_root -M
pvmove /dev/xvdb2
[...]
vgreduce cloud_root /dev/xvdb2
pvremove /dev/xvdb2
lvextend -l +100%FREE --resizefs /dev/cloud_root/lv_root

Requested snapshot of new disk: snap-0b1eb8cb621f22757
Created Image: centos-65-x86_64-hvm-base-2017-03-20-12-00 (ami-3833842e)
Hi Jake, could you tell me how to test/use the AMIs, and whether I need to do the resize differently? For these two AMIs, I expanded the /boot partition to 500 MB and moved the / partition (reducing the LVM group, partition, and filesystem from 49.94 GB to 49.5 GB to keep the image at 50 GB). I expect I need to change the AMI specified in the build-cloud-tools config for each of the servers and then build new instances of each machine; I am not sure whether that is a good plan, or whether there are files/config not in the scripts that I would need to copy and set up manually. I wonder if it might be less risky (fewer missed configs/files) and faster to resize the disks on each machine manually instead of rebuilding them on the updated AMIs.
Flags: needinfo?(jwatkins)
Comment 6•7 years ago
So I was under the impression that the aws_create_ami.py script allowed you to create AMIs from the ground up, and that that was where the 64 MB /boot limit originated. If that is still the case, then we will need to modify the script [1][2] and rebuild the AMIs with that tool. Once that is done, you can change the various ami_configs to the new AMI IDs. At that point, we need to decide whether to terminate and rebuild each service with aws_create_instance.py, or to log in to each instance and manipulate the EBS volumes in place. If an instance is instance-store backed (non-HVM), it may still need a terminate/rebuild anyway. :rail might be a better resource for answering this question. NI :rail for a sanity check here.
[1] https://github.com/mozilla-releng/build-cloud-tools/blob/0010b72a4690d370ecd4b8714af5f559e8849dbd/cloudtools/scripts/aws_create_ami.py#L37
[2] https://github.com/mozilla-releng/build-cloud-tools/blob/0010b72a4690d370ecd4b8714af5f559e8849dbd/cloudtools/scripts/aws_create_ami.py#L55
Flags: needinfo?(jwatkins) → needinfo?(rail)
Comment 7•7 years ago
I'd just bump https://github.com/mozilla-releng/build-cloud-tools/blob/0010b72a4690d370ecd4b8714af5f559e8849dbd/cloudtools/scripts/aws_create_ami.py#L56 to something like 128 or 256M. It won't help with the existing AMIs/instances, though; we'll need to resize those manually or remove the existing kernel before we install the new one.
Flags: needinfo?(rail)
Attachment #8850650 - Flags: review?(rail)
Attachment #8850650 - Flags: review?(jwatkins)
Attachment #8850650 - Attachment description: github pr → Bug 1342518 - enlarge /boot to 256M
Updated•7 years ago
Attachment #8850650 - Flags: review?(rail) → review+
Comment 9•7 years ago
Comment on attachment 8850650 [details] [review]
Bug 1342518 - enlarge /boot to 256M

r+ and merged
Attachment #8850650 - Flags: review?(jwatkins)
Attachment #8850650 - Flags: review+
Attachment #8850650 - Flags: checked-in+
Assignee | ||
Comment 10•7 years ago
Thanks!
Assignee | ||
Comment 11•7 years ago
Hi Rail, I have some questions about my next step on this. Is it reasonable for me to create all new AMIs? Is there someone (or a group) who usually does the base AMI builds? Could you suggest a good set to start with — bld-linux64? (If we get a critical patch before we've rebuilt the services, then I'll need to do as you said and "resize them manually or remove the existing kernel before we install the new one".)

It looks like I can do the following (I'm looking at the wiki here, https://wiki.mozilla.org/ReleaseEngineering:AWS#Create_AMI, and the mention here, https://wiki.mozilla.org/ReleaseEngineering/How_To/Work_with_Golden_AMIs#Base_AMI):

(aws_manager)[buildduty@aws-manager1.srv.releng.scl3.mozilla.com aws_manager]$ python /builds/aws_manager/cloud-tools/scripts/aws_create_ami.py --config bld-linux64 --region us-east-1 --secrets /builds/aws_manager/secrets/aws-secrets.json --key-name aws-releng --ssh-key /home/buildduty/.ssh/aws-ssh-key ref-centos-6-x86_64-hvm-base

However, that fails (I'm attaching the instance's system log). Do I need to use a different config to create the base, or do I need to run this from aws-manager2?
Flags: needinfo?(rail)
Comment 12•7 years ago
The procedure sounds correct to me. Can you paste some logs from aws-manager? There is usually a $hostname.log file there.
Flags: needinfo?(rail)
Assignee | ||
Comment 13•7 years ago
The script doesn't save a log file, but here is the stdout/stderr from when I run it:

(aws_manager)[buildduty@aws-manager1.srv.releng.scl3.mozilla.com aws_manager]$ python /builds/aws_manager/cloud-tools/scripts/aws_create_ami.py --verbose --config bld-linux64 --region us-east-1 --secrets /builds/aws_manager/secrets/aws-secrets.json --key-name aws-releng --ssh-key /home/buildduty/.ssh/aws-ssh-key ref-centos-6-x86_64-hvm-base
INFO:cloudtools.aws.instance:instance Instance:i-0c0f790b65f4265b4 created, waiting to come up
DEBUG:cloudtools.aws:waiting for Instance:i-0c0f790b65f4265b4 availability
INFO:cloudtools.fabric:Using public DNS [ec2-50-19-57-178.compute-1.amazonaws.com]
run: date
DEBUG:cloudtools.aws.instance:hit error waiting for instance to come up
[ec2-50-19-57-178.compute-1.amazonaws.com] run: date
DEBUG:cloudtools.aws.instance:hit error waiting for instance to come up
[repeated until I interrupt it]
Assignee | ||
Comment 14•7 years ago
I was using the wrong config. I needed to use a config JSON from ami_configs:

(aws_manager)[buildduty@aws-manager1.srv.releng.scl3.mozilla.com build-cloud-tools-dhouse]$ python scripts/aws_create_ami.py --verbose --config centos-65-x86_64-hvm-base --region us-east-1 --secrets /builds/aws_manager/secrets/aws-secrets.json --key-name dhouse-test --ssh-key /home/buildduty/.ssh/aws-ssh-key ref-centos-6-x86_64-hvm-base
[...]
# also stuck in the "error waiting for instance to come up" loop, but packages were installed, etc.
Comment 15•7 years ago
yeah, it creates an instance outside of the VPC, and aws-manager has issues accessing it. Can you try to repeat the same thing from "your laptop"? I vaguely remember running those outside of SCL3.
Attachment #8856968 - Attachment is obsolete: true
Updated•5 years ago
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX