Don't require puppet or DNS to launch new instances

RESOLVED FIXED

Status

Product: Release Engineering
Component: General Automation
Status: RESOLVED FIXED
Reported: 3 years ago
Modified: 3 years ago

People

(Reporter: catlee, Assigned: rail)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(11 attachments, 5 obsolete attachments)

2.99 KB, patch (catlee: review+)
1.43 KB, patch (catlee: review+)
3.38 KB, patch (catlee: review+)
3.64 KB, patch (catlee: review+)
1000 bytes, patch (Callek: review+)
1.35 KB, patch (dustin: review+)
93.22 KB, patch
1.61 KB, patch (dustin: review+)
1.61 KB, patch (dustin: review+)
4.04 KB, patch (dustin: review+)
4.88 KB, patch (dustin: review+)
(Reporter)

Description

3 years ago
Currently all our instances run puppet on boot, which requires valid forward and reverse DNS. This is sub-optimal for several reasons:

- Running puppet on boot means we're waiting longer before we can get real work done on the machine
- Adding new instances is painful since it takes 10-20 minutes for DNS changes to propagate
- Inventory and AWS get out-of-sync easily if instances are being added/deleted.
- We need to keep a pool of detached network interfaces to allocate to spot instances. This artificially limits how many spot instances we can have running at once, and unnecessarily complicates our code.

I'd like to revamp our AMI process and at the same time remove our dependency on puppet and DNS.

The process will look something like this:

- Create a base root snapshot for our target OS (e.g. CentOS 6.4)
- Create two boot snapshots, one for HVM and one for PV virtualization.
- Create a pair of base AMIs that have the boot snapshot as the "root" device, and the root snapshot as a second EBS volume. The boot volume mounts the root volume on boot.

(we have up to this part working)

- For our various end worker types (e.g. bld-linux64, try-linux64), create a reference instance from the base AMI.
- Run puppet on the reference instance so it gets all the required configuration, packages, etc. installed.
- Disable/remove puppet on the instance
- Create a snapshot from the puppetized reference instance's root volume.
- Create new AMIs for HVM and PV using the new root snapshot and the existing boot snapshots for each virtualization type

At this point we should be left with AMIs that we can launch directly as on-demand or spot instances and that don't require puppet or DNS to function properly.
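
The last step of the process above (one AMI per virtualization type, with the boot snapshot as the root device and the puppetized root snapshot as a second EBS volume) could be sketched roughly as below. This is only an illustration: the snapshot IDs, device names, AMI naming scheme and the helper names `register_image_params` and `plan_amis` are hypothetical. The dictionaries follow the shape of EC2's RegisterImage request, but nothing here calls AWS, and PV images would additionally need a kernel ID, which is left out.

```python
# Hypothetical boot snapshot IDs; real values would come from the
# "create two boot snapshots" step.
BOOT_SNAPSHOTS = {"hvm": "snap-boot-hvm", "paravirtual": "snap-boot-pv"}


def register_image_params(worker_type, virt_type, root_snapshot_id,
                          boot_snapshots=BOOT_SNAPSHOTS):
    """Build keyword arguments shaped like an EC2 RegisterImage request.

    The boot snapshot is mapped as the AMI's root device and the
    puppetized root snapshot as a second EBS volume.
    """
    if virt_type not in boot_snapshots:
        raise ValueError("unknown virtualization type: %s" % virt_type)
    return {
        "Name": "%s-%s" % (worker_type, virt_type),  # assumed naming scheme
        "Architecture": "x86_64",                     # assumed for *-linux64
        "VirtualizationType": virt_type,
        "RootDeviceName": "/dev/sda1",                # assumed device name
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/sda1",
             "Ebs": {"SnapshotId": boot_snapshots[virt_type],
                     "DeleteOnTermination": True}},
            {"DeviceName": "/dev/sdf",                # assumed device name
             "Ebs": {"SnapshotId": root_snapshot_id,
                     "DeleteOnTermination": True}},
        ],
    }


def plan_amis(worker_type, root_snapshot_id):
    """Return one parameter set per virtualization type (HVM and PV)."""
    return [register_image_params(worker_type, virt, root_snapshot_id)
            for virt in sorted(BOOT_SNAPSHOTS)]


if __name__ == "__main__":
    for params in plan_amis("bld-linux64", "snap-root-bld-linux64"):
        snapshots = [m["Ebs"]["SnapshotId"]
                     for m in params["BlockDeviceMappings"]]
        print(params["Name"], snapshots)
```

Keeping the parameter building separate from the actual API call makes it easy to review what would be registered before touching AWS.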
(Assignee)

Comment 1

3 years ago
We also should figure out how to allocate/release hostnames used by buildslave to connect to masters. Releasing may be tricky for spot instances. Maybe we need a service to check for used but dead hostnames.

Comment 2

Have you considered using Packer to build AMIs from templates?
http://www.packer.io/intro

You could also reuse existing puppet manifests:
http://www.packer.io/docs/provisioners/puppet-masterless.html
(Assignee)

Updated

3 years ago
Assignee: nobody → rail
(Assignee)

Updated

3 years ago
Depends on: 989814

Comment 3

If you don't use CentOS 6.2, please use CentOS 6.5, since that's what we'll be supporting on onsite hardware.  Preferably that would be created from the repos in puppet, so we don't have minor/release version differences between AWS and onsite.

Comment 4

(:thumbsup: for the idea by the way!)
(Assignee)

Updated

3 years ago
Depends on: 1001714
(Assignee)

Comment 5

3 years ago
Created attachment 8418374 [details] [diff] [review]
aws_publish_amis.diff

publish all available AMIs somewhere accessible from everywhere!
Attachment #8418374 - Flags: review?(catlee)
(Reporter)

Updated

3 years ago
Attachment #8418374 - Flags: review?(catlee) → review+
(Assignee)

Comment 6

3 years ago
Comment on attachment 8418374 [details] [diff] [review]
aws_publish_amis.diff

https://hg.mozilla.org/build/cloud-tools/rev/5a73c7cf1d5f
Attachment #8418374 - Flags: checked-in+
(Assignee)

Comment 7

3 years ago
Created attachment 8418384 [details] [diff] [review]
puppet_aws_publish_amis.diff

Enable publishing
Attachment #8418384 - Flags: review?(catlee)

Comment 8

For my reference -- the AMIs themselves are *not* public, just the https://s3.amazonaws.com/mozilla-releng-amis/amis.json file?
(Assignee)

Comment 9

3 years ago
(In reply to Dustin J. Mitchell [:dustin] (PTO until ~5/20) from comment #8)
> For my reference -- the AMIs themselves are *not* public, just the
> https://s3.amazonaws.com/mozilla-releng-amis/amis.json file?

Correct, the AMIs may have some secrets.
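
For illustration, a consumer of the published index might pick the newest AMI for a given region and worker type along these lines. The schema is an assumption: the field names `id`, `region`, `moz_type` and `created`, the sample data and the helper `latest_ami` are all made up here, and the real amis.json may look different.

```python
import json

# Hypothetical sample of what a published index could contain.
SAMPLE_INDEX = """
[
  {"id": "ami-00000001", "region": "us-east-1",
   "moz_type": "try-linux64", "created": "2014-05-01T00:00:00Z"},
  {"id": "ami-00000002", "region": "us-east-1",
   "moz_type": "try-linux64", "created": "2014-05-07T00:00:00Z"},
  {"id": "ami-00000003", "region": "us-east-1",
   "moz_type": "tst-linux64", "created": "2014-05-08T00:00:00Z"}
]
"""


def latest_ami(index_json, region, moz_type):
    """Return the ID of the newest matching AMI, or None if there is none.

    Timestamps are assumed to be ISO 8601 strings in UTC, so comparing
    them as plain strings orders them chronologically.
    """
    amis = json.loads(index_json)
    candidates = [a for a in amis
                  if a["region"] == region and a["moz_type"] == moz_type]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a["created"])["id"]


if __name__ == "__main__":
    print(latest_ami(SAMPLE_INDEX, "us-east-1", "try-linux64"))
```

Only metadata lives in the index, so publishing it does not expose the AMIs themselves.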
(Assignee)

Comment 10

3 years ago
Created attachment 8418823 [details] [diff] [review]
configs

To make the testing part simpler I'd prefer to add some slaves not backed by network interfaces. Still need to add them to slavealloc.
Attachment #8418823 - Flags: review?(catlee)
(Assignee)

Comment 11

3 years ago
Created attachment 8418826 [details] [diff] [review]
configs

err, some garbage removed
Attachment #8418823 - Attachment is obsolete: true
Attachment #8418823 - Flags: review?(catlee)
Attachment #8418826 - Flags: review?(catlee)
(Assignee)

Comment 12

3 years ago
Created attachment 8418832 [details] [diff] [review]
configs

Bah, padding!
Attachment #8418826 - Attachment is obsolete: true
Attachment #8418826 - Flags: review?(catlee)
Attachment #8418832 - Flags: review?(catlee)
(Reporter)

Updated

3 years ago
Attachment #8418384 - Flags: review?(catlee) → review+
(Reporter)

Updated

3 years ago
Attachment #8418832 - Flags: review?(catlee) → review+
(Assignee)

Comment 13

3 years ago
Comment on attachment 8418832 [details] [diff] [review]
configs

https://hg.mozilla.org/build/buildbot-configs/rev/84f9cd267546
Attachment #8418832 - Flags: checked-in+
(Assignee)

Comment 14

3 years ago
Comment on attachment 8418384 [details] [diff] [review]
puppet_aws_publish_amis.diff

remote:   https://hg.mozilla.org/build/puppet/rev/19b038a962c5
remote:   https://hg.mozilla.org/build/puppet/rev/3e6c51f3e8ec
Attachment #8418384 - Flags: checked-in+
(Assignee)

Comment 15

3 years ago
(In reply to Rail Aliiev [:rail] from comment #14)
> Comment on attachment 8418384 [details] [diff] [review]
> puppet_aws_publish_amis.diff
> 
> remote:   https://hg.mozilla.org/build/puppet/rev/19b038a962c5
> remote:   https://hg.mozilla.org/build/puppet/rev/3e6c51f3e8ec

I had to adjust the IAM policies for the aws-manager user: I added s3:* actions for the mozilla-releng-amis bucket.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1399488079000",
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::mozilla-releng-amis/*"
      ]
    }
  ]
}
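
As a sketch, the same policy document can be generated from a bucket name and a list of actions, which makes it easier to try a narrower action list later. The helper name `bucket_policy` is hypothetical; the default values reproduce the policy above.

```python
import json


def bucket_policy(bucket, actions=("s3:*",), sid="Stmt1399488079000"):
    """Build an IAM policy allowing `actions` on every object in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Action": list(actions),
                # "<bucket>/*" matches the objects inside the bucket only.
                "Resource": ["arn:aws:s3:::%s/*" % bucket],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(bucket_policy("mozilla-releng-amis"), indent=2))
```

Note that a resource of the form `arn:aws:s3:::bucket/*` covers objects only; bucket-level actions such as listing would need the bucket ARN itself as an additional resource.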
(Assignee)

Comment 16

3 years ago
I added the "golden" DNS entries in both regions, just in case (we will be copying the AMIs across the regions):

invtool A create --ip 10.134.49.65 --fqdn try-linux64-ec2-golden.build.releng.use1.mozilla.com  --private  --description "Golden AMI"
invtool PTR create --ip 10.134.49.65 --target try-linux64-ec2-golden.build.releng.use1.mozilla.com  --private --description "Golden AMI"
invtool A create --ip 10.134.49.4 --fqdn tst-linux64-ec2-golden.build.releng.use1.mozilla.com  --private  --description "Golden AMI"
invtool PTR create --ip 10.134.49.4 --target tst-linux64-ec2-golden.build.releng.use1.mozilla.com  --private --description "Golden AMI"
invtool A create --ip 10.134.49.89 --fqdn tst-linux32-ec2-golden.build.releng.use1.mozilla.com  --private  --description "Golden AMI"
invtool PTR create --ip 10.134.49.89 --target tst-linux32-ec2-golden.build.releng.use1.mozilla.com  --private --description "Golden AMI"
invtool A create --ip 10.132.49.90 --fqdn try-linux64-ec2-golden.build.releng.usw2.mozilla.com  --private  --description "Golden AMI"
invtool PTR create --ip 10.132.49.90 --target try-linux64-ec2-golden.build.releng.usw2.mozilla.com  --private --description "Golden AMI"
invtool A create --ip 10.132.50.36 --fqdn tst-linux64-ec2-golden.build.releng.usw2.mozilla.com  --private  --description "Golden AMI"
invtool PTR create --ip 10.132.50.36 --target tst-linux64-ec2-golden.build.releng.usw2.mozilla.com  --private --description "Golden AMI"
invtool A create --ip 10.132.49.98 --fqdn tst-linux32-ec2-golden.build.releng.usw2.mozilla.com  --private  --description "Golden AMI"
invtool PTR create --ip 10.132.49.98 --target tst-linux32-ec2-golden.build.releng.usw2.mozilla.com  --private --description "Golden AMI"
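
The twelve commands above follow one pattern (an A record and a PTR record per golden host), so they could also be generated from a small table. A minimal sketch, using only the invtool flags shown above; the helper names are hypothetical and the script only prints the commands, it does not run them.

```python
# (IP address, worker type, region) for each golden instance, as listed above.
GOLDEN_HOSTS = [
    ("10.134.49.65", "try-linux64", "use1"),
    ("10.134.49.4", "tst-linux64", "use1"),
    ("10.134.49.89", "tst-linux32", "use1"),
    ("10.132.49.90", "try-linux64", "usw2"),
    ("10.132.50.36", "tst-linux64", "usw2"),
    ("10.132.49.98", "tst-linux32", "usw2"),
]


def invtool_commands(ip, moz_type, region, description="Golden AMI"):
    """Return the A and PTR creation commands for one golden host."""
    fqdn = "%s-ec2-golden.build.releng.%s.mozilla.com" % (moz_type, region)
    return [
        'invtool A create --ip %s --fqdn %s --private --description "%s"'
        % (ip, fqdn, description),
        'invtool PTR create --ip %s --target %s --private --description "%s"'
        % (ip, fqdn, description),
    ]


def all_commands(hosts=GOLDEN_HOSTS):
    """Return the commands for every host, in table order."""
    commands = []
    for ip, moz_type, region in hosts:
        commands.extend(invtool_commands(ip, moz_type, region))
    return commands


if __name__ == "__main__":
    print("\n".join(all_commands()))
```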

Comment 17

Merged and deployed to production.
(Assignee)

Updated

3 years ago
Depends on: 1007967
(Assignee)

Updated

3 years ago
Depends on: 1011257
(Assignee)

Updated

3 years ago
Depends on: 1008241
(Assignee)

Comment 18

3 years ago
Created attachment 8427257 [details] [diff] [review]
Use spot request tags

in the new world we won't be using network interface tags. This code works fine for the current setup as well.
Attachment #8427257 - Flags: review?(catlee)
(Reporter)

Updated

3 years ago
Attachment #8427257 - Flags: review?(catlee) → review+
(Assignee)

Comment 19

3 years ago
Comment on attachment 8427257 [details] [diff] [review]
Use spot request tags

https://hg.mozilla.org/build/cloud-tools/rev/e7161a6a9cd9
Attachment #8427257 - Flags: checked-in+
(Assignee)

Comment 20

3 years ago
Created attachment 8429331 [details] [diff] [review]
no-puppet2-cloud-tools.diff

This version is in semi-production now (running in parallel with some hacks to avoid collisions). Still need to address some inline todos and remove some of them when the code lands.
Attachment #8429331 - Flags: feedback?(catlee)
(Assignee)

Updated

3 years ago
Depends on: 1016579
(Assignee)

Updated

3 years ago
Depends on: 1017634
(Assignee)

Comment 21

3 years ago
Created attachment 8432480 [details] [diff] [review]
no-puppet2-cloud-tools-1.diff
Attachment #8429331 - Attachment is obsolete: true
Attachment #8429331 - Flags: feedback?(catlee)
Attachment #8432480 - Flags: feedback?(catlee)
(Assignee)

Updated

3 years ago
Depends on: 1019013
(Assignee)

Comment 22

3 years ago
Current spot capacity (evenly split across 2 regions):

tst-linux64: 200 old + 900 new
tst-linux32: 200 old + 700 new
bld-linux64: 200 old + 300 new
try-linux64: 200 old + 300 new
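
For reference, the totals implied by these numbers can be worked out as follows. The figures are copied from the list above; the helper name `summarize` is hypothetical.

```python
# (old, new) spot slots per worker type, copied from the list above.
CAPACITY = {
    "tst-linux64": (200, 900),
    "tst-linux32": (200, 700),
    "bld-linux64": (200, 300),
    "try-linux64": (200, 300),
}
REGIONS = 2  # capacity is evenly split across two regions


def summarize(capacity=CAPACITY, regions=REGIONS):
    """Return {worker type: (total, per region)} plus an "all" entry."""
    summary = {}
    for name, (old, new) in capacity.items():
        total = old + new
        summary[name] = (total, total // regions)
    grand_total = sum(total for total, _ in summary.values())
    summary["all"] = (grand_total, grand_total // regions)
    return summary


if __name__ == "__main__":
    for name, (total, per_region) in sorted(summarize().items()):
        print("%-12s total=%d per_region=%d" % (name, total, per_region))
```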
(Assignee)

Comment 23

3 years ago
I deleted the following ranges of network interfaces to free up some IP space for new style instances:

tst-linux64-spot-600..999, leaving 300 network interfaces per region
tst-linux64-spot-600..799, leaving 300 network interfaces per region

If everything goes as expected, I'm going to shrink the range again today.
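
Ranges written as `prefix-NNN..MMM` can be expanded into individual interface names with a small helper like the one below. This is a sketch only: it assumes the ranges are inclusive on both ends and zero-padded to the width of the start number, and `expand_range` is a hypothetical name, not part of cloud-tools.

```python
import re

# Matches e.g. "tst-linux64-spot-600..999" as prefix, start and end.
_RANGE = re.compile(r"^(?P<prefix>.+-)(?P<start>\d+)\.\.(?P<end>\d+)$")


def expand_range(spec):
    """Expand "prefix-NNN..MMM" into a list of names, both ends included."""
    match = _RANGE.match(spec)
    if match is None:
        raise ValueError("not a range: %r" % spec)
    start, end = match.group("start"), match.group("end")
    width = len(start)  # keep the zero padding of the start number
    return ["%s%0*d" % (match.group("prefix"), width, n)
            for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    names = expand_range("tst-linux64-spot-600..999")
    print(len(names), names[0], names[-1])
```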
(Assignee)

Comment 24

3 years ago
moved the following ranges:

bld-linux64-spot-001..099
bld-linux64-spot-300..399
try-linux64-spot-001..099
try-linux64-spot-300..399

At this point all bld and try instances are supposed to use the new system
(Assignee)

Updated

3 years ago
Blocks: 1019869
(Assignee)

Comment 25

3 years ago
Created attachment 8433755 [details] [diff] [review]
stop running instance2ami

no need to run this anymore
Attachment #8433755 - Flags: review?(dustin)
(Assignee)

Comment 26

3 years ago
Created attachment 8433756 [details] [diff] [review]
0002-Remove-the-cronjob.patch

once it's deleted we can delete the code
Attachment #8433756 - Flags: review?(dustin)

Comment 27

Comment on attachment 8433755 [details] [diff] [review]
stop running instance2ami

r- because http://mxr.mozilla.org/build/source/puppet/modules/aws_manager/manifests/cron.pp#61 still exists. Change that to absent, check with a --noop run that it does what we think, and you can have an "I don't need to see this again" r+.
Attachment #8433755 - Flags: review?(dustin) → review-

Comment 28

Comment on attachment 8433755 [details] [diff] [review]
stop running instance2ami

err ignore me
Attachment #8433755 - Flags: review- → review+
(Assignee)

Comment 29

3 years ago
Comment on attachment 8433755 [details] [diff] [review]
stop running instance2ami

remote:   https://hg.mozilla.org/build/puppet/rev/6d48aa7a1c09
remote:   https://hg.mozilla.org/build/puppet/rev/bdbe9d44e822
Attachment #8433755 - Flags: checked-in+
Attachment #8433756 - Flags: review?(dustin) → review+
(Assignee)

Comment 30

3 years ago
Created attachment 8434157 [details] [review]
use slavealloc for reportor
Attachment #8434157 - Flags: review?(bhearsum)
(Assignee)

Comment 31

3 years ago
Comment on attachment 8433756 [details] [diff] [review]
0002-Remove-the-cronjob.patch

remote:   https://hg.mozilla.org/build/puppet/rev/b914ae871f39
remote:   https://hg.mozilla.org/build/puppet/rev/b0bb38b07cda
Attachment #8433756 - Flags: checked-in+

Comment 32

Comment on attachment 8434157 [details] [review]
use slavealloc for reportor

Looks like catlee merged this already...
Attachment #8434157 - Attachment is obsolete: true
Attachment #8434157 - Flags: review?(bhearsum)
(Assignee)

Comment 33

3 years ago
Created attachment 8434905 [details] [diff] [review]
no-puppet2-cloud-tools-1.diff

The current version. I plan to land this version today and stop the one running in parallel.
Attachment #8432480 - Attachment is obsolete: true
Attachment #8432480 - Flags: feedback?(catlee)
(Assignee)

Comment 34

3 years ago
Comment on attachment 8434905 [details] [diff] [review]
no-puppet2-cloud-tools-1.diff

https://hg.mozilla.org/build/cloud-tools/rev/2aff440e33b0
Attachment #8434905 - Flags: checked-in+
(Assignee)

Comment 35

3 years ago
Created attachment 8435001 [details] [diff] [review]
watch_pending_puppet.diff

certs no more!
Attachment #8435001 - Flags: review?(dustin)

Comment 36

Comment on attachment 8435001 [details] [diff] [review]
watch_pending_puppet.diff

\o/
Attachment #8435001 - Flags: review?(dustin) → review+
(Assignee)

Comment 37

3 years ago
Comment on attachment 8435001 [details] [diff] [review]
watch_pending_puppet.diff

remote:   https://hg.mozilla.org/build/puppet/rev/10f5fdcc56b7
remote:   https://hg.mozilla.org/build/puppet/rev/5158abe705cf
Attachment #8435001 - Flags: checked-in+
(Assignee)

Updated

3 years ago
Blocks: 1022368
(Assignee)

Comment 38

3 years ago
Added a wiki page regarding generated AMIs: https://wiki.mozilla.org/ReleaseEngineering/How_To/Manage_spot_AMIs
(Assignee)

Comment 39

3 years ago
Created attachment 8440788 [details] [diff] [review]
delete_amis-puppet.diff

Deletes old AMIs (leaving the last 10) once a day
Attachment #8440788 - Flags: review?(dustin)
Attachment #8440788 - Flags: review?(dustin) → review+
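
The retention rule could be sketched as below. This is not the code from the patch: it assumes the limit of 10 applies per worker type, and the record fields (`id`, `moz_type`, `created`) and the helper name `amis_to_delete` are hypothetical.

```python
def amis_to_delete(amis, keep_last=10):
    """Return IDs of AMIs to delete, keeping the newest `keep_last` per type.

    `amis` is a list of dicts with "id", "moz_type" and "created" keys,
    where "created" is an ISO 8601 string (so string order is time order).
    """
    by_type = {}
    for ami in amis:
        by_type.setdefault(ami["moz_type"], []).append(ami)

    doomed = []
    for moz_type in sorted(by_type):
        newest_first = sorted(by_type[moz_type],
                              key=lambda a: a["created"], reverse=True)
        # Everything after the first `keep_last` entries is old enough to go.
        doomed.extend(a["id"] for a in newest_first[keep_last:])
    return doomed


if __name__ == "__main__":
    sample = [{"id": "ami-%02d" % day, "moz_type": "try-linux64",
               "created": "2014-06-%02dT00:00:00Z" % day}
              for day in range(1, 13)]
    print(amis_to_delete(sample))
```

Returning the list instead of deleting directly leaves room for a dry run before the daily job removes anything.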
(Assignee)

Comment 40

3 years ago
Comment on attachment 8440788 [details] [diff] [review]
delete_amis-puppet.diff

remote:   https://hg.mozilla.org/build/puppet/rev/13efd6060f41
remote:   https://hg.mozilla.org/build/puppet/rev/2c386824150f
Attachment #8440788 - Flags: checked-in+
(Assignee)

Comment 41

3 years ago
Created attachment 8441546 [details] [diff] [review]
amis-puppet.diff

The last piece!
Attachment #8441546 - Flags: review?(dustin)

Comment 42

Comment on attachment 8441546 [details] [diff] [review]
amis-puppet.diff

Will it be annoying to have to add new slave types here?
Attachment #8441546 - Flags: review?(dustin) → review+
(Assignee)

Comment 43

3 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #42)
> Comment on attachment 8441546 [details] [diff] [review]
> amis-puppet.diff
> 
> Will it be annoying to have to add new slave types here?

Not a big deal now. We can refactor the code in the future to use a config file.
(Assignee)

Comment 44

3 years ago
Comment on attachment 8441546 [details] [diff] [review]
amis-puppet.diff

remote:   https://hg.mozilla.org/build/puppet/rev/d296651583b6
remote:   https://hg.mozilla.org/build/puppet/rev/548945c8a4c1
Attachment #8441546 - Flags: checked-in+
(Assignee)

Comment 45

3 years ago
Created attachment 8442034 [details] [diff] [review]
fix paths

err, forgot to adjust according to production cwd.
Attachment #8442034 - Flags: review?(dustin)

Comment 46

Comment on attachment 8442034 [details] [diff] [review]
fix paths

Not a change I really understand, but from a puppet perspective this is fine.
Attachment #8442034 - Flags: review?(dustin) → review+
(Assignee)

Comment 47

3 years ago
Comment on attachment 8442034 [details] [diff] [review]
fix paths

https://hg.mozilla.org/build/puppet/rev/3c949a7ef18e
Attachment #8442034 - Flags: checked-in+
(Assignee)

Comment 48

3 years ago
All works fine here! \o/
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED