consolidate node definitions

RESOLVED FIXED

Status

Infrastructure & Operations
RelOps: Puppet
4 years ago

People

(Reporter: dustin, Assigned: arr)

Attachments

(1 attachment, 3 obsolete attachments)

Now that all of our buildslave nodes are pretty much in scl3 or AWS, we can shorten the list quite a bit by using regexps.
(Assignee)

Comment 1

4 years ago
Created attachment 8459600 [details] [diff] [review]
bug-1040926.patch

This is a first stab at this, to get verification of how we should regex these. In some cases I left things split out even though they include the same classes, just because the regex would have been a mess. In others, I glommed together a bunch of things that we may eventually want to split out (e.g. openstack). I think I managed to cover everything that was in the file before. Feel free to do a syntactic check too, if desired.
Attachment #8459600 - Flags: review?(jwatkins)
Attachment #8459600 - Flags: review?(dustin)
Comment on attachment 8459600 [details] [diff] [review]
bug-1040926.patch

Review of attachment 8459600 [details] [diff] [review]:
-----------------------------------------------------------------

r- for the missing 'include diamond'.  The rest is just commentary.

::: manifests/moco-nodes.pp
@@ +22,2 @@
>  node /tst-.*\.build\.aws-.*\.mozilla\.com/ {
> +    # tst-anything in any region of the build.aws mozilla zones

I don't think there's anything left with these hostnames.  You can verify with a grep of /var/log/messages on the puppetmasters.

@@ +25,5 @@
>      include toplevel::slave::releng::test::headless
>  }
>  
>  node /tst-.*\.test\.releng\.(use1|usw2)\.mozilla\.com/ {
> +    # tst-anything in any region of the test.releng mozilla zones

Why tst-.* here, but .*-\d+ in scl3?

@@ +49,2 @@
>  node /bld-.*\.build\.releng\.(use1|usw2)\.mozilla.com/ {
> +    # any bld-(something) host in the use1 and usw2 releng build zones

This could be combined with the rule above for scl3, too

@@ +75,1 @@
>      # dev-* hosts are *always* staging

Not sure these should be combined -- the try hosts have 'include diamond', and not the dev hosts.  That include is missing here.

@@ +95,5 @@
>  
>  ## puppetmasters
>  
>  node /puppetmaster-\d+\..*\.aws-.*\.mozilla\.com/ {
> +    # all puppet masters in the legacy aws zones

These are gone, too.

@@ +119,5 @@
>  
>  ## openstack admin servers
>  
> +node /(ironic|glance|keystone|horizon|neutron)\d+\.admin\.cloud\.releng\.scl3\.mozilla\.com/ {
> +    # all openstack servers

This is probably already broken out by type in Jake's environment, so I don't see a great benefit to this cleanup.

@@ +167,5 @@
> +    # RPM and DPKG package servers
> +    include toplevel::server::pkgbuilder
> +}
> +
> +node /celery\d+.srv.releng.scl3.mozilla.com/ {

Needs a comment

@@ +287,4 @@
>      include toplevel::server::buildmaster::mozilla
>  }
>  
> +node /buildbot-master(5[5-9]|6[0-5])\.srv\.releng\.(scl3|use1|usw2)\.mozilla\.com" {

I'd be interested to hear commentary from releng folks on this -- I think the "free masters" are instances which are just waiting for an assignment to a pool,  so the most common change will be to assign one or a few to a pool.  As-is, that's just filling in some node defs, but with this patch it involves editing some regexps - a bit more error-prone.
Attachment #8459600 - Flags: review?(dustin) → review-
Assignee: relops → arich
Comment on attachment 8459600 [details] [diff] [review]
bug-1040926.patch

Review of attachment 8459600 [details] [diff] [review]:
-----------------------------------------------------------------

::: manifests/moco-nodes.pp
@@ +79,1 @@
>      include instance_metadata::diamond

There's an `include diamond` in here, it just doesn't show up in the diff.
Attachment #8459600 - Flags: review- → review+
Comment on attachment 8459600 [details] [diff] [review]
bug-1040926.patch

Review of attachment 8459600 [details] [diff] [review]:
-----------------------------------------------------------------

::: manifests/moco-nodes.pp
@@ +287,4 @@
>      include toplevel::server::buildmaster::mozilla
>  }
>  
> +node /buildbot-master(5[5-9]|6[0-5])\.srv\.releng\.(scl3|use1|usw2)\.mozilla\.com" {

fwiw I agree: trying to edit this regex when needing to "take" a free master for other work will be pretty error-prone, and it will be harder to identify which colo zone a specific master number is in (e.g. if I need a use and a usw master for a specific slave type, I won't easily know which ones to use with the new format).
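
To make the concern concrete, here is a minimal sketch that expands the consolidated pattern into the master numbers it covers. Python's `re` module is used only as a stand-in for Puppet's regex matching (the simple constructs involved should behave the same way), and pulling out master 61 is an arbitrary choice for illustration, not something proposed in this bug.

```python
import re

# Consolidated pattern from the patch under review, closed as a regex here
# rather than with the stray double quote that appears in the diff.
FREE_MASTERS = re.compile(
    r"buildbot-master(5[5-9]|6[0-5])\.srv\.releng\.(scl3|use1|usw2)\.mozilla\.com"
)


def covered_numbers(pattern, region="use1", candidates=range(50, 70)):
    """Return the master numbers in `candidates` that `pattern` matches in `region`."""
    return [
        n
        for n in candidates
        if pattern.search(f"buildbot-master{n}.srv.releng.{region}.mozilla.com")
    ]


# The alternation covers masters 55 through 65, in every listed region alike.
print(covered_numbers(FREE_MASTERS))

# Taking master 61 out of the free pool means rewriting the alternation by hand.
WITHOUT_61 = re.compile(
    r"buildbot-master(5[5-9]|6[02-5])\.srv\.releng\.(scl3|use1|usw2)\.mozilla\.com"
)
print(covered_numbers(WITHOUT_61))
```

Because the same numbers match in scl3, use1 and usw2, the pattern itself carries no information about which region a given master lives in, which is the second half of the objection.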
(Assignee)

Comment 5

4 years ago
Created attachment 8459633 [details] [diff] [review]
bug-1040926-1.patch

According to the use1 and usw2 puppet logs, the following machines are still compiling catalogs:

bld-linux64-ec2-001.build.releng.use1.mozilla.com
bld-linux64-ec2-023.build.releng.use1.mozilla.com
bld-linux64-ec2-025.build.releng.use1.mozilla.com
bld-linux64-ec2-045.build.releng.use1.mozilla.com
bld-linux64-ec2-golden.build.releng.use1.mozilla.com
buildbot-master01.srv.releng.use1.mozilla.com
buildbot-master02.srv.releng.use1.mozilla.com
buildbot-master03.srv.releng.use1.mozilla.com
buildbot-master04.srv.releng.usw2.mozilla.com
buildbot-master05.srv.releng.usw2.mozilla.com
buildbot-master06.srv.releng.usw2.mozilla.com
buildbot-master113.srv.releng.use1.mozilla.com
buildbot-master114.srv.releng.use1.mozilla.com
buildbot-master115.srv.releng.usw2.mozilla.com
buildbot-master116.srv.releng.usw2.mozilla.com
buildbot-master117.bb.releng.use1.mozilla.com
buildbot-master118.bb.releng.usw2.mozilla.com
buildbot-master51.srv.releng.use1.mozilla.com
buildbot-master52.srv.releng.use1.mozilla.com
buildbot-master53.srv.releng.usw2.mozilla.com
buildbot-master54.srv.releng.usw2.mozilla.com
buildbot-master66.srv.releng.usw2.mozilla.com
buildbot-master67.srv.releng.use1.mozilla.com
buildbot-master68.srv.releng.usw2.mozilla.com
buildbot-master69.srv.releng.use1.mozilla.com
buildbot-master70.srv.releng.use1.mozilla.com
buildbot-master71.srv.releng.use1.mozilla.com
buildbot-master72.srv.releng.usw2.mozilla.com
buildbot-master73.srv.releng.usw2.mozilla.com
buildbot-master74.srv.releng.usw2.mozilla.com
buildbot-master75.srv.releng.use1.mozilla.com
buildbot-master76.srv.releng.use1.mozilla.com
buildbot-master77.srv.releng.use1.mozilla.com
buildbot-master78.srv.releng.usw2.mozilla.com
buildbot-master79.srv.releng.usw2.mozilla.com
buildbot-master91.srv.releng.usw2.mozilla.com
buildbot-master94.srv.releng.use1.mozilla.com
dev-linux64-ec2-yurenju.dev.releng.use1.mozilla.com
releng-puppet1.srv.releng.use1.mozilla.com
releng-puppet1.srv.releng.usw2.mozilla.com
releng-puppet2.srv.releng.use1.mozilla.com
releng-puppet2.srv.releng.usw2.mozilla.com
try-linux64-ec2-golden.try.releng.use1.mozilla.com
tst-emulator64-ec2-golden.test.releng.use1.mozilla.com
tst-linux32-ec2-golden.test.releng.use1.mozilla.com
tst-linux64-ec2-golden.test.releng.use1.mozilla.com

And one of the scl3 puppet masters is also taking care of:
dev-linux64-ec2-iconnolly.dev.releng.use1.mozilla.com


Per that, I've removed:

node /tst-.*\.build\.aws-.*\.mozilla\.com/
node /puppetmaster-\d+\..*\.aws-.*\.mozilla\.com/

This is also why there's a .* instead of a digit match (golden images, etc) for:
node /tst-.*\.test\.releng\.(use1|usw2)\.mozilla\.com/
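
A quick sketch of why the digit match would not work for these hosts, using Python's `re` as a stand-in for Puppet's regex matching. The hostnames come from the log listing above; the `DIGITS_ONLY` pattern is a hypothetical stricter variant written only for comparison.

```python
import re

# Golden-image hosts taken from the puppet log listing above.
GOLDEN_HOSTS = [
    "tst-emulator64-ec2-golden.test.releng.use1.mozilla.com",
    "tst-linux32-ec2-golden.test.releng.use1.mozilla.com",
    "tst-linux64-ec2-golden.test.releng.use1.mozilla.com",
]

# Pattern kept in the patch.
WILDCARD = re.compile(r"tst-.*\.test\.releng\.(use1|usw2)\.mozilla\.com")

# Hypothetical variant that insists on a numeric suffix (not part of the patch).
DIGITS_ONLY = re.compile(r"tst-.*-\d+\.test\.releng\.(use1|usw2)\.mozilla\.com")

for host in GOLDEN_HOSTS:
    # The wildcard pattern matches every golden host; the digit-only one matches none.
    print(host, bool(WILDCARD.search(host)), bool(DIGITS_ONLY.search(host)))
```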


The node defs /bld-.*\.build\.releng\.(use1|usw2)\.mozilla.com/ and /b-linux64-\w+-\d+.build.releng.scl3.mozilla.com/ differ in that the former also includes diamond and instance_metadata::diamond.
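
One side note on the two patterns as written: some of the dots are unescaped, so they match any character rather than a literal dot. The sketch below shows the effect, again with Python's `re` standing in for Puppet's matching. The "bogus" hostnames are made up for illustration, and the fully escaped variant is a suggestion rather than what landed.

```python
import re

# Patterns as quoted in this comment, unescaped dots included.
BLD_AWS = re.compile(r"bld-.*\.build\.releng\.(use1|usw2)\.mozilla.com")
B_SCL3 = re.compile(r"b-linux64-\w+-\d+.build.releng.scl3.mozilla.com")

# Hosts from the puppet log listing above, both matched as intended.
aws_hosts = [
    "bld-linux64-ec2-001.build.releng.use1.mozilla.com",
    "bld-linux64-ec2-golden.build.releng.use1.mozilla.com",
]
print([bool(BLD_AWS.search(h)) for h in aws_hosts])

# An unescaped "." accepts any character, so this made-up name matches too.
print(bool(BLD_AWS.search("bld-x.build.releng.use1.mozillaXcom")))

# Fully escaped variant of the scl3 pattern (a suggestion, not the landed patch).
B_SCL3_STRICT = re.compile(r"b-linux64-\w+-\d+\.build\.releng\.scl3\.mozilla\.com")

# Hypothetical name with "x" wherever a dot should be.
bogus = "b-linux64-hp-0001xbuildxrelengxscl3xmozillaxcom"
print(bool(B_SCL3.search(bogus)), bool(B_SCL3_STRICT.search(bogus)))
```

In practice no real host is likely to collide this way, so this is a tidiness point more than a bug.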

The mess that is the buildbot-master node definitions makes me sad, but I'll defer to bug 891859 to clean that up and leave the separate node defs in place.

Jake, what about the casper and openstack node defs? Are they split out in your env just waiting to be checked in?
Attachment #8459600 - Attachment is obsolete: true
Attachment #8459600 - Flags: review?(jwatkins)
Attachment #8459633 - Flags: review?(jwatkins)
Attachment #8459633 - Flags: review?(dustin)
Attachment #8459633 - Flags: review?(dustin) → review+
(In reply to Amy Rich [:arich] [:arr] from comment #5)
> 
> Jake, what about the casper and openstack node defs? Are they split out in
> your env just waiting to be checked in?

As for the openstack node defs, yes, each of those defs is distinguished by different toplevel classes in my puppet env. They will get refactored when I land them and will probably be consolidated into a few server defs (e.g. controllers, compute nodes and network nodes), since a server per API service is a bit overkill. Regarding the casper nodes, each of the 3 node defs would have been distinguished if the puppet modules had been finished. If we are not going to continue with casper, we should probably remove the defs and move the machines over to relabs as spares for testing and poking at.
Attachment #8459633 - Flags: review?(jwatkins) → review+
(Assignee)

Comment 7

4 years ago
Created attachment 8460210 [details] [diff] [review]
bug-1040926-2.patch

Okay, here I've removed the deprecated buildbot masters, and added back the split for openstack and casper.
Attachment #8460210 - Flags: review?(jwatkins)
Attachment #8460210 - Flags: review?(dustin)
Attachment #8460210 - Flags: review?(coop)
(Assignee)

Updated

4 years ago
Attachment #8459633 - Attachment is obsolete: true

Updated

4 years ago
Duplicate of this bug: 1042019

Updated

4 years ago
Attachment #8460210 - Flags: review?(coop) → review+
Attachment #8460210 - Flags: review?(dustin) → review+
Attachment #8460210 - Flags: review?(jwatkins) → review+
(Assignee)

Comment 9

4 years ago
checked in and waiting on a merge to production (dustin, can you do the honors, please?)
Comment on attachment 8460210 [details] [diff] [review]
bug-1040926-2.patch

backed out: http://hg.mozilla.org/build/puppet/rev/a09b4e59ae96

due to snow bustage.
Attachment #8460210 - Flags: checked-in-
(Assignee)

Comment 12

4 years ago
Looking at the manifest, I'm not exactly sure why this failed.  My only speculation is that the node definition doesn't like having a wildcard match right at the beginning.  The errors it generated were:

Puppet (err): Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid tag -d.test.releng.scl3.mozilla.com

but the offending line is:

node /.*-\d+.test.releng.scl3.mozilla.com/ {

Which seems like perfectly valid syntax.

Perhaps changing this to the following would fix the problem:

node /(t|r4).*-\d+\.test\.releng\.scl3\.mozilla\.com/ {
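
One possible explanation, sketched below, is that a name (and from it a tag) is derived from the regex source. The derivation rule and the tag validity rule used here are assumptions, not something confirmed in this bug: lowercase the source, drop every character outside `[-\w:.]`, strip leading dots, and require the result to start with a word character. Under those assumptions the failing regex produces exactly the string in the error message. Python's `re` is used as a stand-in.

```python
import re


def derived_name(node_regex_source):
    """Assumed derivation: lowercase, drop chars outside [-\\w:.], strip leading dots."""
    cleaned = re.sub(r"[^-\w:.]", "", node_regex_source.lower())
    return cleaned.lstrip(".")


# Assumed tag rule: a tag must begin with a word character.
VALID_TAG = re.compile(r"^\w[-\w:.]*$")

# Regex sources from this comment: the failing line and the proposed replacement.
failing = r".*-\d+.test.releng.scl3.mozilla.com"
proposed = r"(t|r4).*-\d+\.test\.releng\.scl3\.mozilla\.com"

for source in (failing, proposed):
    name = derived_name(source)
    # Print the derived name and whether it passes the assumed tag rule.
    print(name, bool(VALID_TAG.match(name)))
```

If that is what happens, any node regex whose first surviving character is not alphanumeric would hit the same error, and starting the pattern with a literal character avoids it.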
(Assignee)

Comment 14

4 years ago
Created attachment 8460385 [details] [diff] [review]
bug-1040926-3.patch

Okay, based on the assumption that it's not correctly expanding a node def regex that begins with a wildcard, I've added a t in front of the testers regex and split r4-mini-001 back out into its own node def.

I tested on t-snow-r4-0058 (disabled for being crashy, and was down), and r4-mini-001, and neither exhibited the tags error.
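
For reference, a small sketch of why r4-mini-001 needs its own node def once the testers regex starts with a literal t. The patch text is not quoted in this comment, so the pattern below is an assumed shape, and the domain suffix appended to the two short hostnames is also an assumption. Python's `re` stands in for Puppet's matching.

```python
import re

# Assumed shape of the revised testers regex (the actual patch is not quoted here).
TESTERS = re.compile(r"t.*-\d+\.test\.releng\.scl3\.mozilla\.com")

# Short names come from this comment; the domain suffix is an assumption.
hosts = [
    "t-snow-r4-0058.test.releng.scl3.mozilla.com",
    "r4-mini-001.test.releng.scl3.mozilla.com",
]

for host in hosts:
    # The t-prefixed tester matches; r4-mini-001 has no "t" before its number.
    print(host, bool(TESTERS.search(host)))
```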
Attachment #8460210 - Attachment is obsolete: true
Attachment #8460385 - Flags: review?(dustin)
Comment on attachment 8460385 [details] [diff] [review]
bug-1040926-3.patch

Applied on top of a re-landing of the original patch
Attachment #8460385 - Flags: review?(dustin) → review+
(Assignee)

Comment 16

4 years ago
http://hg.mozilla.org/build/puppet/rev/8edd1255e11d
That change seems to have done the trick.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED