Closed Bug 552058 Opened 14 years ago Closed 14 years ago

re-architect puppet infrastructure

Categories

(Release Engineering :: General, defect, P4)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

References

Details

(Whiteboard: [puppet][q2goal])

Attachments

(11 files, 4 obsolete files)

166.71 KB, patch
catlee
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
24.07 KB, text/plain
bhearsum
: checked-in+
Details
7.77 KB, patch
bear
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
5.84 KB, patch
bear
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
1.85 KB, patch
bear
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
1.06 KB, patch
catlee
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
69.29 KB, patch
bear
: review+
catlee
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
599 bytes, patch
bear
: review+
catlee
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
114.10 KB, patch
rail
: review+
catlee
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
16.59 KB, patch
catlee
: review+
rail
: review+
Details | Diff | Splinter Review
726 bytes, patch
rail
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
Our Puppet infrastructure isn't scaling very well, and it certainly will not scale to remote datacenters. We need to make it handle a high volume of slaves better, cope with remote datacenters, and perhaps do other things. Here are a few ideas off the top of my head:
* Point slaves to a dumb proxy that feeds them to one of many masters
* Move off serving files over NFS to serving files with the Puppet file server


I think this needs to happen before we roll out machines in a second colo.
Whiteboard: [puppet]
We are hoping to have the new scratchbox ready for deployment in the not-too-distant future. This is going to be a 3.0-4.5GB file. Are we going to be able to handle this with the current infrastructure?
I have no way to know for sure, but I would *guess* that it would strain the NFS server significantly. There are ways to work around it; ping me for details.
I'll be driving work on this this quarter.
Assignee: nobody → bhearsum
Whiteboard: [puppet] → [puppet][q2goal]
So...this is a big patch with mostly just ground work for refactoring. Here's the high level details:
* Segregate nodes into platform/arch/type
* Use new locations for all files ($platform_{http,file}root)
* Create new fileserver configs
* Kill off some unused/pointless stuff (comment out bits, {build,sandbox}-network classes that only include one thing)
* Add test-manifests.sh for rudimentary testing
* Add DMG creator
* Switch all file {} types to use puppet:// (see the sketch after this list)
* Switch all Mac devtools tarballs and macports repo to DMGs
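
For anyone unfamiliar with the puppet:// switch mentioned above, here's a minimal sketch of the idea. The mount name, paths, and file name are assumptions for illustration, not the actual configs:

# fileserver.conf: expose the per-platform file store through the
# Puppet file server instead of relying on the NFS mount
[darwin9]
    path /N/production/darwin9
    allow *

# manifest: file {} sources now point at the Puppet file server
file { "/tools/python-2.6.4.dmg":
    source => "puppet:///darwin9/python-2.6.4.dmg",
    mode   => 644,
}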

I've tested this on every slave type, and it works correctly in staging.

Deployment is a little tricky, here's my plan:
* Do final sync of /N/puppet-files-reorg/staging -> /N/puppet-files-reorg/production
* Land manifests
* Update production-puppet repo, fileserver symlink
* Move /N/puppet-files to /N/puppet-files.old, set read-only
* Move /N/puppet-files-reorg to /N/puppet-files
* Do a test Puppet run on one of each slave type

My next planned steps are as follows:
* Convert Linux devtools tarballs to RPMs
* Convert scripts or other things that depend on /N to file{} + exec {} or some other thing that doesn't need the NFS share
* Separate site-production.pp into site-castro.pp and site-mpt.pp, and setup a new master in Castro to care for those slaves

I may also look at reorganizing the configs as a whole, as originally planned, but that's not the most crucial bit of this.
Attachment #444703 - Flags: review?(catlee)
Whoops, forgot to mention one big caveat here: all of the tarballs which were switched to DMGs will install *again*. This isn't an issue bandwidth-wise, because they'll hit the cache, but before deploying this it would be best to mark them all as already installed to avoid unnecessary installs. DMGs are tracked by creating files like /var/db/.puppet_pkgdmg_installed_python-2.6.4.dmg, so it's simply a matter of touching those files. The tricky part is finding a good way to do it.
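
A minimal sketch of that one-off marking step (the package names here are made up; the real list would be whatever DMGs the manifests now manage):

#!/bin/sh
# Pretend the converted DMGs are already installed so Puppet's pkgdmg
# provider doesn't reinstall them on the first run after deployment.
for pkg in python-2.6.4.dmg mercurial-1.5.1.dmg; do
    touch "/var/db/.puppet_pkgdmg_installed_${pkg}"
done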
I also uncommented the cleanup at the end of create-dmg.sh.
Attachment #444703 - Attachment is obsolete: true
Attachment #444937 - Flags: review?(catlee)
Attachment #444703 - Flags: review?(catlee)
Attachment #444937 - Attachment is obsolete: true
Attachment #445168 - Flags: review?(catlee)
Attachment #444937 - Flags: review?(catlee)
Blocks: 564914
No longer blocks: 564914
Attachment #445168 - Flags: review?(catlee) → review+
Comment on attachment 445168 [details] [diff] [review]
phase 1 with consistent node names

I landed this and am currently working through bustage.
Attachment #445168 - Flags: checked-in+
Blocks: 566337
Blocks: 566333
I had to fix a few bustages, mostly related to merging. I did a test on one of each of the platforms:
centos 64-bit
centos 32-bit
darwin10 build
darwin9 build
darwin10 test
darwin9 test
fedora 64-bit
fedora 32-bit

After fixing all the bustage, there is one new, but non-blocking, issue that came up:
* darwin10 slaves require updated pkgdmg.rb for Package to work correctly. Puppet syncs this file out, but because it's already running when the file is replaced any package {} checks will still fail. This sucks pretty bad for new slaves that come up since they'll almost certainly fail their first build.
For posterity, here's what I've been using for staging -> production syncs of files:
rsync --delete -av --include="**usr/local" --exclude=local /N/staging/ /N/production/

I'll be wrapping this with something soon.
Rail, I briefly tested this RPM and it seems to work fine. I'd love to hear some feedback from somebody with more experience building RPMs on the SPEC file itself, though.

I tried using built-in macros such as %setup and %configure, but %setup seems to require that %name is the same as the directory the source tarball creates. And %configure seems to set way too many flags that we don't need, and screws up the --prefix we want.

I had to set AutoReqProv: no because without it, the package somehow depended on /usr/local/bin/python and other nonsensical things. For Twisted, Zope, Buildbot, etc. I intend to set dependencies on the other custom packages (e.g., Twisted will have Requires: python25).

I had to set the package name to 'python25' to avoid overriding the system Python install.
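
For reference, a skeletal version of what such a spec looks like -- the prefix, version, and summary below are illustrative assumptions, not the attached file:

Name:           python25
Version:        2.5.1
Release:        1
Summary:        Python 2.5 for build slaves, installed under a custom prefix
License:        PSF
Source0:        Python-%{version}.tar.bz2
# without this, rpmbuild auto-detects bogus requirements like
# /usr/local/bin/python
AutoReqProv:    no

%description
Custom Python build that stays out of the way of the system Python.

%prep
# -n is needed because the tarball unpacks to Python-2.5.1, not python25-2.5.1
%setup -q -n Python-%{version}

%build
./configure --prefix=/tools/python-2.5.1
make

%install
make install DESTDIR=%{buildroot}

%files
/tools/python-2.5.1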
Attachment #446301 - Flags: feedback?(rail)
Attached file flexible python.spec
(In reply to comment #11)
> I tried using built-in Macros such as %setup and %configure, but %setup seems
> to require that %name is the same as the directory the source tarball creates.
> And configure seems to set way too many flags that we don't need, and screws up
> the --prefix we want.

Attaching a spec file based on the original used in Centos 5 and yours.

I removed all the patches because we don't need them, left the optimization flags as is, and redefined _prefix, so now we can use the default %configure (I had to tweak _mandir, _infodir and _localstatedir, because they somehow didn't pick up _prefix). Now you need to change only 3 lines to compile an RPM for Python 2.6.
 
> I had to set AutoReqProv: no because without it, the package somehow depended
> on /usr/local/bin/python and other non-sensible things.

sed -i 's,/usr/local/bin/python,/usr/bin/env python,g' Lib/cgi.py
solves the /usr/local problem. I'd remove AutoReqProv to be sure that we have all needed libraries installed. 

> For Twisted, Zope,
> Buildbot, etc. I intend to set dependencies on the other custom packages. (eg,
> Twisted will have Requires: python25

+1
 
> I had to set the package name to 'python25' to avoid overriding the system
> Python install.

+1

Feel free to ask for review of your specs/rpms!
Attachment #446301 - Flags: feedback?(rail) → feedback+
Thanks for your comments, Rail. I'll give that spec file a try.
Depends on: 567149
Attachment #446752 - Flags: review?(bear) → review+
Comment on attachment 446752 [details] [diff] [review]
fix busted linux nodes, dependencies for mac machines

changeset:   154:95f0f0595a8a
Attachment #446752 - Flags: checked-in+
Apparently resources within functions are private, and cannot be accessed outside of them.
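
Concretely (resource names here are illustrative), that means requiring the define instance itself rather than the Package resource declared inside it:

# before: fails -- Package["python-2.6.4"] is declared inside the
# install_package define and isn't visible from outside it
require => Package["python-2.6.4"],

# after: reference the define instance itself
require => Install_package["python-2.6.4.dmg"],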
Attachment #446760 - Flags: review?(bear)
Attachment #446760 - Flags: review?(bear) → review+
Comment on attachment 446760 [details] [diff] [review]
use Install_package[] instead of Package[]

changeset:   155:5576ec374af9
Attachment #446760 - Flags: checked-in+
I was overzealous in adding these, apparently. We only need to upgrade pkgdmg.rb on 10.6, thus we only need Package {}'s to depend on it in 10.6.
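
In manifest terms this boils down to something like the following sketch, which only belongs in the darwin10 nodes/classes (the provider path and mount are guesses for illustration):

# darwin10 only: every package {} must wait for the patched DMG provider,
# otherwise the first run on a fresh slave uses the stock pkgdmg.rb
Package {
    require => File["pkgdmg.rb"],
}

file { "pkgdmg.rb":
    # path is an assumption; it is wherever the slave's Puppet install
    # keeps its package providers
    path   => "/usr/lib/ruby/site_ruby/1.8/puppet/provider/package/pkgdmg.rb",
    source => "puppet:///darwin10/pkgdmg.rb",
}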
Attachment #446764 - Flags: review?(bear)
Attachment #446764 - Flags: review?(bear) → review+
Comment on attachment 446764 [details] [diff] [review]
remove pkgdmg.rb dependency for darwin9 installs

changeset:   156:622b58fa049d
Attachment #446764 - Flags: checked-in+
Attachment #446769 - Flags: review?(catlee) → review+
Comment on attachment 446769 [details] [diff] [review]
fix bad merge on staging slaves, the includes too

changeset:   157:b8a823a3a9c7
Attachment #446769 - Flags: checked-in+
Some (all?) of the darwin9 slaves have this sort of log:
notice: Starting catalog run
notice: //Node[build]/base/osx/Exec[check-for-macports]/returns: executed successfully
err: //Node[build]/base/osx/Exec[mount-nfs]/returns: change from notrun to 0 failed: /sbin/mount /N && /bin/sleep 10 returned 1 instead of 0 at /etc/puppet/manifests/os/osx.pp:232
notice: //Node[build]/base/osx/Exec[refresh-automount]/returns: executed successfully
notice: Finished catalog run in 1.84 seconds

I think this is because /N is already mounted.
(In reply to comment #22)
> Some (all?) of the darwin9 slaves have this sort of log:
> notice: Starting catalog run
> notice: //Node[build]/base/osx/Exec[check-for-macports]/returns: executed
> successfully
> err: //Node[build]/base/osx/Exec[mount-nfs]/returns: change from notrun to 0
> failed: /sbin/mount /N && /bin/sleep 10 returned 1 instead of 0 at
> /etc/puppet/manifests/os/osx.pp:232
> notice: //Node[build]/base/osx/Exec[refresh-automount]/returns: executed
> successfully
> notice: Finished catalog run in 1.84 seconds
> 
> I think this is because /N is already mounted.

Thanks for catching this. It's not hurting anything, merely annoying, so I'm going to roll it in with the next larger patch.
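
One way to roll that fix in (a sketch only, not necessarily how it will land) is to guard the exec so it only runs when /N isn't mounted yet:

exec { "mount-nfs":
    command => "/sbin/mount /N && /bin/sleep 10",
    # skip the mount when /N is already mounted, instead of failing
    unless  => "/sbin/mount | /usr/bin/grep -q ' /N '",
}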
I was expecting to see it run some other checks before the catalog finished, and so assumed it was aborting when mount-nfs had a non-zero exit status. Is that not the case?
(In reply to comment #24)
> I was expecting to see it check some other checks before the catalog finished,
> and so assumed it was aborting when mount-nfs had a non-zero exit status. Is
> that not the case ?

It would only abort checks which require[] the mount-nfs one. It would even explicitly list them in the log. AFAICT this isn't interfering with anything.
Summary of this patch:
* Convert Linux tarballs to RPMs
* Convert separate Mercurial class into RPM in devtools
* Separate install_package into install_rpm and install_dmg
* Remove fstab entirely from Mac machines
* Remove NFS mount from Linux 64-bit machines
* Remove unused files/vars

The check {} in install_rpm is a bit tricky, let me know if the comment doesn't sufficiently explain it.

I think that's it! I did test runs on 64 and 32-bit Linux machines which involved:
* Running Puppet, and verifying that no packages were re-installed
* Moving /tools away, running Puppet, and diffing (diff results here: http://spreadsheets.google.com/pub?key=tn9krgLEm8JhdFdLp_3LfIQ&output=html)
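
To make the shape of the split clearer, here's a rough sketch of what the two defines look like -- the commands, variables, paths, and timeout are assumptions, not the patch itself:

define install_rpm() {
    exec { "install-$name":
        # download first, then install; feeding the http:// URL straight
        # to rpm caused problems with the huge Scratchbox package
        command => "/usr/bin/wget -q -O/tmp/$name ${platform_httproot}/$name && /bin/rpm -U /tmp/$name; /bin/rm -f /tmp/$name",
        # only run when check-for-rpm.sh says the package (or an old
        # tarball install of the same thing) isn't already present
        unless  => "/usr/local/bin/check-for-rpm.sh ${platform_httproot}/$name",
        timeout => 3600,
    }
}

define install_dmg() {
    package { "$name":
        ensure   => installed,
        provider => pkgdmg,
        source   => "${platform_httproot}/$name",
    }
}

# usage
install_rpm { "gcc45-4.5.0-1.el5.i386.rpm": }
install_dmg { "python-2.6.4.dmg": }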
Attachment #448845 - Flags: review?(catlee)
Attachment #448845 - Flags: review?(bear)
Attached patch check-for-rpm.sh
Here's the check-for-rpm.sh script. I was originally just passing the http:// link to rpm, but I had some issues with the Scratchbox RPM causing a segfault.

Note that with the wget, we need at least 1.2GB free on each 32-bit slave to roll this out (which I've already verified we have). It sucks. We might be able to drop this check altogether after this is rolled out everywhere, since we won't have to worry about pre-existing installs anymore.
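
Roughly, the idea is the following (a sketch with assumed details, not the reviewed script; the assumed exit convention is 0 = already present, non-zero = install needed, so it can be used as an exec "unless" check):

#!/bin/sh
# Sketch only: decide whether an RPM still needs installing on this slave.
URL="$1"
PKG="/tmp/$(basename "$URL")"

# Download the package instead of pointing rpm at the http:// URL
# directly; this is why each 32-bit slave needs ~1.2GB free during
# the rollout.
/usr/bin/wget -q -O "$PKG" "$URL" || exit 2

# If any file the package would install is missing, we need to install.
for f in $(/bin/rpm -qlp "$PKG" 2>/dev/null); do
    if [ ! -e "$f" ]; then
        /bin/rm -f "$PKG"
        exit 1
    fi
done

/bin/rm -f "$PKG"
exit 0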
Attachment #448847 - Flags: review?(catlee)
Attachment #448847 - Flags: review?(bear)
Attachment #448847 - Flags: review?(bear) → review+
Comment on attachment 448845 [details] [diff] [review]
convert Linux tarballs to RPMs, a few other minor things

wow, it's a huge patch but it will make package management so much nicer.
Attachment #448845 - Flags: review?(bear) → review+
A couple things I forgot to mention:
* Rail is converting GCC 4.5 to an RPM in bug 559964
* Two slaves which have been rebuilt with the new manifests (mv-moz2-linux-slave01 and moz2-linux64-slave01) have been cycling in staging without issue.
Depends on: 570141
Attachment #448845 - Flags: review?(catlee) → review+
Comment on attachment 448847 [details] [diff] [review]
check-for-rpm.sh

r+ with the duplicate check for already-installed taken out, since it's handled in the puppet manifests.
Attachment #448847 - Flags: review?(catlee) → review+
I landed all of the spec files I could find to the newly created rpm-sources repository.

http://hg.mozilla.org/build/rpm-sources
Comment on attachment 448845 [details] [diff] [review]
convert Linux tarballs to RPMs, a few other minor things

changeset:   174:839dbf91332d
Attachment #448845 - Flags: checked-in+
Comment on attachment 448847 [details] [diff] [review]
check-for-rpm.sh

This is in staging+production now, minus the rpm -ql bit.
Attachment #448847 - Flags: checked-in+
Comment on attachment 448845 [details] [diff] [review]
convert Linux tarballs to RPMs, a few other minor things

This landing went very well. I manually tested each of: 32-bit centos, 64-bit centos, darwin9 build, darwin10 build -- which all synced up properly on the first and second runs.

I've been watching the log on the master and found one small bustage: talos_osx.pp was still using install_package instead of install_dmg. That has been fixed now, and I haven't seen any other issues.

All of the slaves which I synced up by hand have run a build successfully.
This is my first pass on multi-location support for our Puppet infrastructure. I worked through this based on what will need to happen when new slaves come up. For slaves in MPT, nothing changes, because the ref platforms are all hardcoded to talk with the Puppet server there.

Non-MPT slaves will connect to the MPT puppet master on first boot and receive an updated configuration file, pointing them to the right master. Next time they sync up (generally, at next boot) they'll connect to their local puppet master and receive whatever updates are available.

I've tested this and it works well for all of our build/test slaves, with one caveat: they will not be up to date with all of the changes until their second boot. For build machines this isn't so bad, because they head to staging before production. Test machines, however, connect directly to the production masters after syncing with Puppet once. Because of this, there's the possibility of developer-visible bustage when new machines come up.

Forcing a reboot after the configuration file is synced up may fix this issue, I haven't tested it yet.

I'd love to get any feedback anyone has on this patch before going further with it.
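
To make the bootstrap step concrete, here is a sketch of the idea (the config file name, mount, and ownership are assumptions, not the actual patch):

# Served by the MPT master to slaves that belong to another location.
# On the next run (normally the next boot) the slave reads this config
# and talks to its local master instead.
file { "/etc/puppet/puppetd.conf":
    source => "puppet:///configs/mv-production/puppetd.conf",
    owner  => "root",
    mode   => 644,
}

# The puppetd.conf handed out for mv slaves would contain something like:
#   [puppetd]
#       server = mv-production-puppet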
Attachment #456085 - Flags: feedback?(rail)
Attachment #456085 - Flags: feedback?(catlee)
Attachment #456085 - Flags: feedback?(bear)
Comment on attachment 456085 [details] [diff] [review]
mostly finished multi-location support

Except for the ownership of the /home/cltbld/.fonts.conf file (it should be cltbld:cltbld), this looks good.
Attachment #456085 - Flags: feedback?(rail) → feedback+
I chatted with some folks in #puppet and they said that the best way to reload the Puppet config mid-run is rebooting, so I'll be trying that.

A side-effect of making that change is the need to ensure that nothing runs before Puppet is done. This is already the case for test machines, 32-bit linux build machines, and possibly 64-bit linux build machines. Shouldn't be a big deal for the remaining platforms.
Comment on attachment 456085 [details] [diff] [review]
mostly finished multi-location support

Sorry for the delayed f+; I had looked this over a while ago but must not have saved it properly.
Attachment #456085 - Flags: feedback?(bear) → feedback+
Attachment #456085 - Flags: feedback?(catlee)
OK, this is finally ready for final review. Comment #35 still applies, plus the following:
- Started syncing all of the puppet runner scripts; added buildbot blocking and error catching to all of them
- Moved config files to ${local_fileroot} to allow for easier syncing between Puppet masters.
- Lots of merges from default, mostly affecting slaves in site files.

I've got a second patch to post, with diffs on the configuration files and startup scripts.

Tomorrow I'm going to write out my plan for rolling this out, as it's going to be a non-trivial affair.
Attachment #456085 - Attachment is obsolete: true
Attachment #462614 - Flags: review?(rail)
Attachment #462614 - Flags: review?(catlee)
This patch shows all of the startup scripts being updated for better error handling and buildbot blocking. Blocking works differently on different platforms. On CentOS, Puppet runs with --no-daemonize, which blocks any other init scripts from running until it's done. On Fedora, the Puppet runner script also takes care of starting Buildbot. On Mac, we use the "WatchPaths" feature of launchd, which launches buildbot when one of the files it is watching is modified.

This patch also shows all of the staging configuration files. Before landing I'll be creating/updating the mpt and mv production ones. I'll also need to give the MPT production master copies of the mv and staging configuration files so it can effectively move slaves.
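
For the Mac piece, the launchd side looks roughly like this (the label and paths are assumptions for illustration, not the shipped plist):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>org.mozilla.start-buildbot</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/start-buildbot.sh</string>
    </array>
    <!-- launchd starts buildbot when the watched file changes; the
         puppet runner script touches it once a run finishes -->
    <key>WatchPaths</key>
    <array>
        <string>/var/puppet/state/puppet-run-done</string>
    </array>
</dict>
</plist>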
Attachment #462758 - Flags: review?(rail)
Attachment #462758 - Flags: review?(catlee)
Comment on attachment 446301 [details]
spec file for Python 2.5.1 (32-bit centos)

This didn't end up landing as is: http://hg.mozilla.org/build/rpm-sources/file/c8fe426b3bdb/python25/centos5-i686/python25.spec
Attachment #446301 - Attachment is obsolete: true
Comment on attachment 446461 [details]
flexible python.spec

This one landed
Attachment #446461 - Flags: checked-in+
Depends on: 584409
I found that the host key accept script is really slow: O(number of hosts in site.pp). It's getting to the point where runs start to overlap each other, which just makes things worse. There's no reason for it to be this slow; this patch changes it to loop over unaccepted keys rather than all valid hosts.
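
Something along these lines (the hostname filter is an assumption standing in for whatever validation the real script does):

#!/bin/sh
# Iterate over pending certificate requests only, so the run time is
# proportional to the number of unaccepted keys, not to every host in
# site.pp.
for host in $(puppetca --list); do
    case "$host" in
        *.build.mozilla.org|*.mozilla.org)
            puppetca --sign "$host"
            ;;
    esac
done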
Attachment #462808 - Flags: review?(rail)
Roll-out plan:
* Sync updated scripts to production file store
* Create/verify configs files on production, staging, and mv-production
* Land puppet manifests patch

Once that's done, mv based slaves will start getting switched over to mv-production-puppet as they sync up. There will be burning on 10.5 and 10.6 build machines located in Castro because they sync up every 30 minutes, not just at boot.

The next part is kind of tricky. All Linux build machines (even the ones in MPT) will be re-creating their host keys because of the config file change. We need to clear their keys out on mpt-production-puppet, but not until after each one syncs up. It's going to be a bit of a cat-and-mouse game because of that: watch the logs for Linux slaves connecting, then clear their keys.

Mac build machines are not affected by this.

After that, I'll be watching for any failures and verifying that all slaves are connected to the correct master and syncing up.

Finally, once the Castro machines are all done migrating, we need to remove their old host keys from mpt-production-puppet.


Assuming positive reviews by the end of the week, I plan to land in a downtime on Monday morning.
Attachment #462808 - Flags: review?(rail) → review+
Attachment #462614 - Flags: review?(catlee) → review+
Comment on attachment 462758 [details] [diff] [review]
diff of production vs staging files

Looks fine.
Attachment #462758 - Flags: review?(rail) → review+
Attachment #462614 - Flags: review?(rail) → review+
Attachment #462758 - Flags: review?(catlee) → review+
Blocks: 585605
Comment on attachment 462808 [details] [diff] [review]
speed up host key accept script

Landed
Attachment #462808 - Flags: checked-in+
Comment on attachment 462614 [details] [diff] [review]
multilocation, ready for review

This landed in ac510f20b930

There were two bustage fixes and a few additions/removals of slaves from site files as unreviewed follow-ups.
Attachment #462614 - Flags: checked-in+
Most slaves are up and running successfully after this landing. A few aren't and there are a few more checks to do. That's being tracked in bug 586443. This bug is FIXED, woo!
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 593734
Product: mozilla.org → Release Engineering