Closed
Bug 986477
Opened 11 years ago
Closed 11 years ago
Don't require puppet or DNS to launch new instances
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: rail)
References
Details
Attachments
(11 files, 5 obsolete files)
2.99 KB, patch | catlee: review+ | rail: checked-in+
1.43 KB, patch | catlee: review+ | rail: checked-in+
3.38 KB, patch | catlee: review+ | rail: checked-in+
3.64 KB, patch | catlee: review+ | rail: checked-in+
1000 bytes, patch | Callek: review+ | rail: checked-in+
1.35 KB, patch | dustin: review+ | rail: checked-in+
93.22 KB, patch | rail: checked-in+
1.61 KB, patch | dustin: review+ | rail: checked-in+
1.61 KB, patch | dustin: review+ | rail: checked-in+
4.04 KB, patch | dustin: review+ | rail: checked-in+
4.88 KB, patch | dustin: review+ | rail: checked-in+
Currently all our instances run puppet on boot, which requires valid forward and reverse DNS. This is sub-optimal for several reasons:
- Running puppet on boot means we're waiting longer before we can get real work done on the machine
- Adding new instances is painful since it takes 10-20 minutes for DNS changes to propagate
- Inventory and AWS get out-of-sync easily if instances are being added/deleted.
- We need to keep a pool of detached network interfaces to allocate to spot instances. This artificially limits how many spot instances we can have running at once, and unnecessarily complicates our code.
I'd like to revamp our AMI process and at the same time remove our dependency on puppet and DNS.
The process will look something like this:
- Create a base root snapshot for our target OS (e.g. CentOS 6.4)
- Create two boot snapshots, one for HVM and one for PV virtualization.
- Create a pair of base AMIs that have the boot snapshot as the "root" device, and the root snapshot as a second EBS volume. The boot volume mounts the root volume on boot.
(we have up to this part working)
- For our various end worker types (e.g. bld-linux64, try-linux64), create a reference instance from the base AMI.
- Run puppet on the reference instance so it gets all the required configuration, packages, etc. installed.
- Disable/remove puppet on the instance
- Create snapshot from the puppetized reference instance's root volume.
- Create new AMIs for HVM, PV using the new root snapshot and the existing boot snapshots for each virtualization type
At this point we should be left with AMIs that we can launch directly as on-demand or spot instances and that don't require puppet or DNS to function properly.
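As a rough illustration of the last step, the sketch below builds the request parameters for one AMI per virtualization type, with the boot snapshot as the root device and the puppetized root snapshot as a second EBS volume. The keys follow the shape of the EC2 RegisterImage request; the device names, the AMI naming scheme, and the snapshot IDs are assumptions made for the example, not what cloud-tools actually uses. A PV image would most likely also need a kernel ID, which is left out here.

```python
def register_image_kwargs(name, virtualization_type, boot_snapshot_id, root_snapshot_id):
    """Build keyword arguments shaped like an EC2 RegisterImage request.

    Device names below are assumptions for illustration only.
    """
    if virtualization_type not in ("hvm", "paravirtual"):
        raise ValueError("unknown virtualization type: %r" % virtualization_type)
    return {
        "Name": name,
        "Architecture": "x86_64",
        "VirtualizationType": virtualization_type,
        "RootDeviceName": "/dev/sda1",
        "BlockDeviceMappings": [
            # The small boot volume is what the instance actually boots from.
            {"DeviceName": "/dev/sda1",
             "Ebs": {"SnapshotId": boot_snapshot_id, "DeleteOnTermination": True}},
            # The puppetized root volume is attached second and mounted by the boot volume.
            {"DeviceName": "/dev/sdf",
             "Ebs": {"SnapshotId": root_snapshot_id, "DeleteOnTermination": True}},
        ],
    }


def ami_pair(worker_type, boot_snapshots, root_snapshot_id):
    """Return one request per virtualization type, all sharing the same root snapshot."""
    return {
        virt: register_image_kwargs(
            "%s-%s" % (worker_type, virt), virt, boot_snapshots[virt], root_snapshot_id)
        for virt in sorted(boot_snapshots)
    }
```

The point of the split is that only the root snapshot changes when a worker type is re-puppetized; the two boot snapshots are reused for every worker type.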
Assignee | ||
Comment 1•11 years ago
|
||
We should also figure out how to allocate/release hostnames used by buildslave to connect to masters. Releasing may be tricky for spot instances. Maybe we need a service to check for used but dead hostnames.
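One possible shape for that check, as a minimal sketch: compare the hostnames that are marked as allocated against the hostnames of instances that are actually alive, and hand out the first free name on allocation. The function names and the idea of passing plain lists are hypothetical; a real service would get these lists from slavealloc and from AWS.

```python
def stale_hostnames(allocated, live):
    """Hostnames that are marked as used but have no live instance behind them."""
    return sorted(set(allocated) - set(live))


def allocate_hostname(pool, allocated):
    """Return the first hostname in the pool that is not allocated, or None."""
    taken = set(allocated)
    for name in pool:
        if name not in taken:
            return name
    return None
```

Anything returned by `stale_hostnames` could then be released back to the pool, which covers spot instances that were terminated without a chance to clean up.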
Comment 2•11 years ago
|
||
Have you considered using Packer to build AMIs from templates?
http://www.packer.io/intro
You could also reuse existing puppet manifests:
http://www.packer.io/docs/provisioners/puppet-masterless.html
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → rail
Comment 3•11 years ago
|
||
If you don't use CentOS 6.2, please use CentOS 6.5, since that's what we'll be supporting on onsite hardware. Preferably that would be created from the repos in puppet, so we don't have minor/release version differences between AWS and onsite.
Comment 4•11 years ago
|
||
(:thumbsup: for the idea by the way!)
Assignee | ||
Comment 5•11 years ago
|
||
publish all available AMIs somewhere accessible from everywhere!
Attachment #8418374 -
Flags: review?(catlee)
Reporter | ||
Updated•11 years ago
|
Attachment #8418374 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 6•11 years ago
|
||
Comment on attachment 8418374 [details] [diff] [review]
aws_publish_amis.diff
https://hg.mozilla.org/build/cloud-tools/rev/5a73c7cf1d5f
Attachment #8418374 -
Flags: checked-in+
Comment 8•11 years ago
|
||
For my reference -- the AMIs themselves are *not* public, just the https://s3.amazonaws.com/mozilla-releng-amis/amis.json file?
Assignee | ||
Comment 9•11 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] (PTO until ~5/20) from comment #8)
> For my reference -- the AMIs themselves are *not* public, just the
> https://s3.amazonaws.com/mozilla-releng-amis/amis.json file?
Correct, the AMIs may have some secrets.
Assignee | ||
Comment 10•11 years ago
|
||
To make the testing part simpler I'd prefer to add some slaves not backed by network interfaces. Still need to add them to slavealloc.
Attachment #8418823 -
Flags: review?(catlee)
Assignee | ||
Comment 11•11 years ago
|
||
err, some garbage removed
Attachment #8418823 -
Attachment is obsolete: true
Attachment #8418823 -
Flags: review?(catlee)
Attachment #8418826 -
Flags: review?(catlee)
Assignee | ||
Comment 12•11 years ago
|
||
Bah, padding!
Attachment #8418826 -
Attachment is obsolete: true
Attachment #8418826 -
Flags: review?(catlee)
Attachment #8418832 -
Flags: review?(catlee)
Reporter | ||
Updated•11 years ago
|
Attachment #8418384 -
Flags: review?(catlee) → review+
Reporter | ||
Updated•11 years ago
|
Attachment #8418832 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 13•11 years ago
|
||
Comment on attachment 8418832 [details] [diff] [review]
configs
https://hg.mozilla.org/build/buildbot-configs/rev/84f9cd267546
Attachment #8418832 -
Flags: checked-in+
Assignee | ||
Comment 14•11 years ago
|
||
Comment on attachment 8418384 [details] [diff] [review]
puppet_aws_publish_amis.diff
remote: https://hg.mozilla.org/build/puppet/rev/19b038a962c5
remote: https://hg.mozilla.org/build/puppet/rev/3e6c51f3e8ec
Attachment #8418384 -
Flags: checked-in+
Assignee | ||
Comment 15•11 years ago
|
||
(In reply to Rail Aliiev [:rail] from comment #14)
> Comment on attachment 8418384 [details] [diff] [review]
> puppet_aws_publish_amis.diff
>
> remote: https://hg.mozilla.org/build/puppet/rev/19b038a962c5
> remote: https://hg.mozilla.org/build/puppet/rev/3e6c51f3e8ec
I had to adjust the IAM policies for the aws-manager user and add s3:* actions for the mozilla-releng-amis bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1399488079000",
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::mozilla-releng-amis/*"
      ]
    }
  ]
}
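For reference, a policy document of this shape can be generated rather than hand-edited. The sketch below is only an illustration: `bucket_policy` is a hypothetical helper, and the optional `Sid` is omitted. Note that the `/*` resource covers the objects in the bucket, not the bucket itself.

```python
import json


def bucket_policy(bucket, actions=("s3:*",)):
    """Build an IAM policy document allowing `actions` on every object in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": ["arn:aws:s3:::%s/*" % bucket],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(bucket_policy("mozilla-releng-amis"), indent=2))
```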
Assignee | ||
Comment 16•11 years ago
|
||
I added the "golden" DNS entries in both regions, just in case (we will be copying the AMIs across the regions):
invtool A create --ip 10.134.49.65 --fqdn try-linux64-ec2-golden.build.releng.use1.mozilla.com --private --description "Golden AMI"
invtool PTR create --ip 10.134.49.65 --target try-linux64-ec2-golden.build.releng.use1.mozilla.com --private --description "Golden AMI"
invtool A create --ip 10.134.49.4 --fqdn tst-linux64-ec2-golden.build.releng.use1.mozilla.com --private --description "Golden AMI"
invtool PTR create --ip 10.134.49.4 --target tst-linux64-ec2-golden.build.releng.use1.mozilla.com --private --description "Golden AMI"
invtool A create --ip 10.134.49.89 --fqdn tst-linux32-ec2-golden.build.releng.use1.mozilla.com --private --description "Golden AMI"
invtool PTR create --ip 10.134.49.89 --target tst-linux32-ec2-golden.build.releng.use1.mozilla.com --private --description "Golden AMI"
invtool A create --ip 10.132.49.90 --fqdn try-linux64-ec2-golden.build.releng.usw2.mozilla.com --private --description "Golden AMI"
invtool PTR create --ip 10.132.49.90 --target try-linux64-ec2-golden.build.releng.usw2.mozilla.com --private --description "Golden AMI"
invtool A create --ip 10.132.50.36 --fqdn tst-linux64-ec2-golden.build.releng.usw2.mozilla.com --private --description "Golden AMI"
invtool PTR create --ip 10.132.50.36 --target tst-linux64-ec2-golden.build.releng.usw2.mozilla.com --private --description "Golden AMI"
invtool A create --ip 10.132.49.98 --fqdn tst-linux32-ec2-golden.build.releng.usw2.mozilla.com --private --description "Golden AMI"
invtool PTR create --ip 10.132.49.98 --target tst-linux32-ec2-golden.build.releng.usw2.mozilla.com --private --description "Golden AMI"
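Each host needs a matching A and PTR record, so the commands above could also be generated from a table of (IP, FQDN) pairs. This is a sketch only: the helper names are made up, and it uses just the invtool flags that appear in the commands above.

```python
GOLDEN_HOSTS = [
    ("10.134.49.65", "try-linux64-ec2-golden.build.releng.use1.mozilla.com"),
    ("10.134.49.4", "tst-linux64-ec2-golden.build.releng.use1.mozilla.com"),
    ("10.134.49.89", "tst-linux32-ec2-golden.build.releng.use1.mozilla.com"),
    ("10.132.49.90", "try-linux64-ec2-golden.build.releng.usw2.mozilla.com"),
    ("10.132.50.36", "tst-linux64-ec2-golden.build.releng.usw2.mozilla.com"),
    ("10.132.49.98", "tst-linux32-ec2-golden.build.releng.usw2.mozilla.com"),
]


def invtool_commands(ip, fqdn, description="Golden AMI"):
    """Return the forward (A) and reverse (PTR) invtool commands for one host."""
    common = '--private --description "%s"' % description
    return [
        "invtool A create --ip %s --fqdn %s %s" % (ip, fqdn, common),
        "invtool PTR create --ip %s --target %s %s" % (ip, fqdn, common),
    ]


def all_commands(hosts):
    """Flatten the A/PTR command pairs for every (ip, fqdn) entry, in order."""
    commands = []
    for ip, fqdn in hosts:
        commands.extend(invtool_commands(ip, fqdn))
    return commands


if __name__ == "__main__":
    print("\n".join(all_commands(GOLDEN_HOSTS)))
```

Generating both records from one entry keeps forward and reverse DNS from drifting apart.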
Comment 17•11 years ago
|
||
Merged and deployed to production.
Assignee | ||
Comment 18•11 years ago
|
||
In the new world we won't be using network interface tags. This code works fine for the current setup as well.
Attachment #8427257 -
Flags: review?(catlee)
Reporter | ||
Updated•11 years ago
|
Attachment #8427257 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 19•11 years ago
|
||
Comment on attachment 8427257 [details] [diff] [review]
Use spot request tags
https://hg.mozilla.org/build/cloud-tools/rev/e7161a6a9cd9
Attachment #8427257 -
Flags: checked-in+
Assignee | ||
Comment 20•11 years ago
|
||
This version is in semi-production now (running in parallel with some hacks to avoid collisions). Still need to address some inline todos and remove some of them when the code lands.
Attachment #8429331 -
Flags: feedback?(catlee)
Assignee | ||
Comment 21•11 years ago
|
||
Attachment #8429331 -
Attachment is obsolete: true
Attachment #8429331 -
Flags: feedback?(catlee)
Attachment #8432480 -
Flags: feedback?(catlee)
Assignee | ||
Comment 22•11 years ago
|
||
Current spot capacity (evenly split across 2 regions):
tst-linux64: 200 old + 900 new
tst-linux32: 200 old + 700 new
bld-linux64: 200 old + 300 new
try-linux64: 200 old + 300 new
Assignee | ||
Comment 23•11 years ago
|
||
I deleted the following ranges of network interfaces to free up some IP space for new style instances:
tst-linux64-spot-600..999, leaving 300 network interfaces per region
tst-linux64-spot-600..799, leaving 300 network interfaces per region
If everything goes as expected, I'm going to shrink the range again today.
Assignee | ||
Comment 24•11 years ago
|
||
moved the following ranges:
bld-linux64-spot-001..099
bld-linux64-spot-300..399
try-linux64-spot-001..099
try-linux64-spot-300..399
At this point all bld and try instances are supposed to use the new system
Assignee | ||
Comment 25•11 years ago
|
||
no need to run this anymore
Attachment #8433755 -
Flags: review?(dustin)
Assignee | ||
Comment 26•11 years ago
|
||
once it's deleted we can delete the code
Attachment #8433756 -
Flags: review?(dustin)
Comment 27•11 years ago
|
||
Comment on attachment 8433755 [details] [diff] [review]
stop running instance2ami
r- because http://mxr.mozilla.org/build/source/puppet/modules/aws_manager/manifests/cron.pp#61 still exists. Change that to absent, test that it does what we think with a --noop run, and you can have an "I don't need to see this again" r+.
Attachment #8433755 -
Flags: review?(dustin) → review-
Comment 28•11 years ago
|
||
Comment on attachment 8433755 [details] [diff] [review]
stop running instance2ami
err ignore me
Attachment #8433755 -
Flags: review- → review+
Assignee | ||
Comment 29•11 years ago
|
||
Comment on attachment 8433755 [details] [diff] [review]
stop running instance2ami
remote: https://hg.mozilla.org/build/puppet/rev/6d48aa7a1c09
remote: https://hg.mozilla.org/build/puppet/rev/bdbe9d44e822
Attachment #8433755 -
Flags: checked-in+
Updated•11 years ago
|
Attachment #8433756 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 30•11 years ago
|
||
Attachment #8434157 -
Flags: review?(bhearsum)
Assignee | ||
Comment 31•11 years ago
|
||
Comment on attachment 8433756 [details] [diff] [review]
0002-Remove-the-cronjob.patch
remote: https://hg.mozilla.org/build/puppet/rev/b914ae871f39
remote: https://hg.mozilla.org/build/puppet/rev/b0bb38b07cda
Attachment #8433756 -
Flags: checked-in+
Comment 32•11 years ago
|
||
Comment on attachment 8434157 [details] [review]
use slavealloc for reportor
Looks like catlee merged this already...
Attachment #8434157 -
Attachment is obsolete: true
Attachment #8434157 -
Flags: review?(bhearsum)
Assignee | ||
Comment 33•11 years ago
|
||
The current version. I plan to land this version today and stop the one running in parallel.
Attachment #8432480 -
Attachment is obsolete: true
Attachment #8432480 -
Flags: feedback?(catlee)
Assignee | ||
Comment 34•11 years ago
|
||
Comment on attachment 8434905 [details] [diff] [review]
no-puppet2-cloud-tools-1.diff
https://hg.mozilla.org/build/cloud-tools/rev/2aff440e33b0
Attachment #8434905 -
Flags: checked-in+
Comment 36•11 years ago
|
||
Comment on attachment 8435001 [details] [diff] [review]
watch_pending_puppet.diff
\o/
Attachment #8435001 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 37•11 years ago
|
||
Comment on attachment 8435001 [details] [diff] [review]
watch_pending_puppet.diff
remote: https://hg.mozilla.org/build/puppet/rev/10f5fdcc56b7
remote: https://hg.mozilla.org/build/puppet/rev/5158abe705cf
Attachment #8435001 -
Flags: checked-in+
Assignee | ||
Comment 38•11 years ago
|
||
Added a wiki page regarding generated AMIs: https://wiki.mozilla.org/ReleaseEngineering/How_To/Manage_spot_AMIs
Assignee | ||
Comment 39•11 years ago
|
||
Deletes old AMIs (keeping the last 10) once a day
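The selection logic behind such a cleanup might look like the sketch below. It is not the attached patch: the function name, the (AMI ID, creation time) tuples, and the ISO timestamp format are assumptions for illustration. In practice the list would probably be built per slave type so that each type keeps its own last 10.

```python
def amis_to_delete(amis, keep_last=10):
    """Return the IDs of all but the newest `keep_last` AMIs, oldest first.

    `amis` is an iterable of (ami_id, creation_time) pairs; ISO 8601
    timestamps in the same timezone sort correctly as plain strings.
    """
    ordered = sorted(amis, key=lambda ami: ami[1])
    cutoff = max(len(ordered) - keep_last, 0)
    return [ami_id for ami_id, _ in ordered[:cutoff]]
```

With fewer than `keep_last` AMIs available the function returns an empty list, so a daily run never removes the only usable images.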
Attachment #8440788 -
Flags: review?(dustin)
Updated•11 years ago
|
Attachment #8440788 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 40•11 years ago
|
||
Comment on attachment 8440788 [details] [diff] [review]
delete_amis-puppet.diff
remote: https://hg.mozilla.org/build/puppet/rev/13efd6060f41
remote: https://hg.mozilla.org/build/puppet/rev/2c386824150f
Attachment #8440788 -
Flags: checked-in+
Comment 42•11 years ago
|
||
Comment on attachment 8441546 [details] [diff] [review]
amis-puppet.diff
Will it be annoying to have to add new slave types here?
Attachment #8441546 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 43•11 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] from comment #42)
> Comment on attachment 8441546 [details] [diff] [review]
> amis-puppet.diff
>
> Will it be annoying to have to add new slave types here?
Not a big deal now. We can refactor the code in the future to use a config file.
Assignee | ||
Comment 44•11 years ago
|
||
Comment on attachment 8441546 [details] [diff] [review]
amis-puppet.diff
remote: https://hg.mozilla.org/build/puppet/rev/d296651583b6
remote: https://hg.mozilla.org/build/puppet/rev/548945c8a4c1
Attachment #8441546 -
Flags: checked-in+
Assignee | ||
Comment 45•11 years ago
|
||
err, forgot to adjust according to production cwd.
Attachment #8442034 -
Flags: review?(dustin)
Comment 46•11 years ago
|
||
Comment on attachment 8442034 [details] [diff] [review]
fix paths
Not a change I really understand, but from a puppet perspective this is fine.
Attachment #8442034 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 47•11 years ago
|
||
Comment on attachment 8442034 [details] [diff] [review]
fix paths
https://hg.mozilla.org/build/puppet/rev/3c949a7ef18e
Attachment #8442034 -
Flags: checked-in+
Assignee | ||
Comment 48•11 years ago
|
||
All works fine here! \o/
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
Component: General Automation → General