Socorro staging (and production) are currently installed manually (except for PHP, which is periodically updated from SVN via cron). We would like the install/update process for Socorro staging to be automatic instead.

The current process is something like:

1) basic server install + dependencies (IT)
2) configuration of dependencies like Apache, PostgreSQL, HBase (IT, Metrics)
3) svn checkout/update of Socorro services from googlecode SVN (IT)
4) Socorro configs edited manually (IT)
5) start/restart updated services (Socorro scripts, crontab, Apache mod_python/wsgi)

For the purposes of this bug, I am only interested in automating steps #3, #4, and #5. I want to focus on automating what we do now before we try to get fancier.

The simplest place to start seems like a shell script for each component that:

1) changes to the right directory and does an svn checkout (this will do an update if the directory already exists)
2) copies the configs from a known location (an internal repo would be a good place)
3) restarts required services

This could be driven by cron on staging, but I'd prefer it if we could trigger an update from outside staging (specifically, I want to do it from the Socorro job on hudson.mozilla.org). For example, if the hudson user could be allowed to SSH in to the staging boxes and run a single command, that would be fine.

Finally, we should check the following into an internal repository, if it's not already there:

1) any hand-modified config files
2) any modified crontab
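The three-step per-component script could be sketched roughly as below. The SVN URL layout, config location, and install paths are illustrative assumptions, not the real staging values:

```shell
#!/bin/sh
# Hypothetical per-component updater. URL layout, config repo path, and
# service names are illustrative assumptions, not the real staging values.
update_component() {
    comp=$1
    # 1) svn checkout into a fixed location (acts as an update if it exists)
    svn checkout "http://socorro.googlecode.com/svn/trunk/$comp" \
        "/data/socorro/$comp" || return 1
    # 2) copy configs from a known (internally versioned) location
    rsync -a "/etc/socorro/$comp/" "/data/socorro/$comp/config/" || return 1
    # 3) restart the matching service
    /sbin/service "socorro-$comp" restart
}
```

With absolute paths throughout, this could equally be run by cron or via a single SSH command from the hudson user.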
(In reply to comment #0)
> The simplest place to start seems like a shell script for each component, that:
> 1) changes to the right directory and does an svn checkout (this will do an
> update if the directory already exists)
> 2) copies the configs from a known location (internal repo would be a good
> place)
> 3) restart required services

After discussion with jabba, it looks like this is just as easy to do directly from puppet (instead of writing an install shell script for each component). To make both puppet-driven and manual startup/restart of services easier, I am going to create init scripts (filed bug 608024 for that). The only shell command we should have to run is the SVN checkout/update (#1 above; I think we can eliminate the change of directory too if we use absolute paths).
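The init scripts promised here (bug 608024) would presumably follow the usual start/stop/restart dispatch shape; a minimal skeleton might look like the following, where the echo lines stand in for actually starting and stopping the daemon (service name and behavior are assumptions):

```shell
#!/bin/sh
# Hypothetical skeleton for a socorro-* init script (bug 608024).
# The echoes are placeholders; a real script would manage the daemon
# process and pidfile. Written as a function here for illustration.
socorro_initscript() {
    action=$1
    case "$action" in
        start)   echo "Starting socorro-crashmover" ;;
        stop)    echo "Stopping socorro-crashmover" ;;
        restart) socorro_initscript stop
                 socorro_initscript start ;;
        *)       echo "Usage: $0 {start|stop|restart}" >&2
                 return 2 ;;
    esac
}
```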
(In reply to comment #0)
> Finally, we should check the following into an internal repository, if it's not
> already:
>
> 1) any hand-modified config files
> 2) any modified crontab

It just occurred to me that we're going to want wrapper scripts along the lines of bug 608024 for cron jobs too (to do locking and such). In fact, I happen to know that this is how it's done in staging and production right now :) We should get those scripts checked in, and create a generalized cron wrapper that handles locking as well as configuration. This might also be a useful place to provide hooks for monitoring. Filed bug 608190 for this.
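A generalized cron wrapper of the kind described might look like the sketch below; the lock-file location and interface are assumptions, with flock(1) from util-linux doing the actual locking:

```shell
#!/bin/sh
# Hypothetical generalized cron wrapper: serializes runs of a named job
# with flock(1) so an overrunning job is skipped rather than doubled up.
# Lock path and calling convention are illustrative assumptions.
cron_wrapper() {
    name=$1; shift
    lockfile="/var/tmp/socorro-cron-${name}.lock"
    (
        flock -n 9 || {
            echo "$name: previous run still active, skipping" >&2
            exit 75   # EX_TEMPFAIL
        }
        "$@"          # run the real cron job under the lock
    ) 9>"$lockfile"
}
```

Monitoring hooks could be added around the `"$@"` invocation, e.g. timing the run or reporting a non-zero exit status.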
I presume we'll wait for everything on prod to be squared away before we wipe staging and try this out?
(In reply to comment #3)
> I presume we'll wait for everything on prod to be squared away before we wipe
> staging and try this out?

We talked about this a little in IRC earlier - let's practice "from scratch" deployment in the new test cluster you put together. That will help us get ready for the new cluster in PHX, and work out any possible issues with the basic puppet config.

Since PHX is now a ways off, we may want to use staging to practice upgrading current production (MPT) instead. I think the most critical things to get into puppet for the current production are:

* ganglia install / config
* socorro backup / install

These seem low-risk/high-value, and will start helping us right away. We're going to have to do these by hand if we don't automate it now, and I think the lack of them is actively biting us.

Improvements like the Socorro init script, cron job, config override, and standard install/log directories are bigger changes and will require more testing, so let's plan to roll those out separately (or possibly wait for PHX, depending on timing).
(In reply to comment #4)
> Improvements like the Socorro init script, cron job, config override, and
> standard install/log directories are bigger changes and will require more
> testing, so let's plan to roll those out separately (or possibly wait for PHX,
> depending on timing).

Per the 2010-Nov-10 Socorro meeting, we're going to get as many improvements into staging and current production as we have time for, and roll them out in their own maintenance window, separate from the next Socorro release. We're going to start on this tomorrow.
Created attachment 493777 [details] [diff] [review]
socorro staging install script

This script will download the latest successful build from Hudson and install it into the staging environment. This could be run by puppet, or directly from cron. I expect we'll want to make adjustments to this when we figure out exactly how production installs are going to work.
(In reply to comment #6)
> Created attachment 493777 [details] [diff] [review]
> socorro staging install script

I've been testing this; three issues so far:

1) the script errors out if "socorro.tar.gz" does not exist; it probably shouldn't, so that new installs are supported
2) mware needs an .htpasswd file (incorrectly referred to as ".htaccess" in the Apache config)
3) webapp-php needs config files copied in after install

For #2/#3, right now these files live in /etc/socorro/mware.htpasswd and /etc/socorro/web/*.php respectively (not checked in anywhere).

Jabba, any objections to checking in the above and having puppet manage them? I think the install script should copy them into place, but we should also have a puppet rule to enforce this (for updates, and to make sure things don't diverge).

I'll do some more testing and submit a new install script later today; feel free to review the current one with those issues in mind, though.
Created attachment 493791 [details]
socorro staging install script

Addresses issues from comment 7
Comment on attachment 493791 [details]
socorro staging install script

>if [ -d /etc/socorro/mware.htpasswd ]
>then
> # FIXME should be named .htpasswd not .htaccess
> rsync /etc/socorro/mware.htpasswd /data/socorro/application/.htaccess
>fi

This should be "-f", not "-d"... Besides this, the script tests fine. The old Socorro install is moved out of the way, and I can't find any problems with the new install.
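For reference, the hunk with the review fix applied might look like this; it is wrapped in a function with hypothetical source/destination arguments purely for illustration:

```shell
# Corrected condition: -f tests for a regular file, whereas the original
# -d tested for a directory and so would never match the htpasswd file.
# The src/dst parameters are illustrative, not from the real script.
install_htpasswd() {
    src=$1
    dst=$2
    if [ -f "$src" ]
    then
        # FIXME should be named .htpasswd not .htaccess
        rsync "$src" "$dst"
    fi
}
```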
I think the script should restart httpd and all the socorro services after running, so that it can be run from cron. Otherwise looks to be ready for testing. I can then check it into puppet to have it live in /root/bin or something like that and have puppet manage a cron for stage. This script would/should then also be the way that prod pushes are done (during scheduled windows, of course).
(In reply to comment #10)
> I think the script should restart httpd and all the socorro services after
> running, so that it can be run from cron. Otherwise looks to be ready for
> testing. I can then check it into puppet to have it live in /root/bin or
> something like that and have puppet manage a cron for stage. This script
> would/should then also be the way that prod pushes are done (during scheduled
> windows, of course).

Good idea. Per IRC, I am going to make it restart:

1) httpd (if the init script exists)
2) any init scripts matching "socorro-*"

The only thing stopping us from using this for production is that the URL to pull the build from would not be the one hardcoded in this script:

https://hudson.mozilla.org/job/socorro/lastSuccessfulBuild/artifact/trunk/socorro.tar.gz

Perhaps this should be an argument to the script? On staging, we'd have cron or puppet pass that argument; on production, we'd run it manually with a path to the production version we want installed. Not sure where that version should live: it could be the "releases" section on google code, or a specific Hudson build (Hudson supports release automation, tagging, etc., but I'm not sure we want to use it for this). Laura, what do you think?
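The restart logic described here could be sketched as follows; taking the init-script directory as a parameter is purely for illustration (the real script would presumably hardcode /etc/init.d):

```shell
#!/bin/sh
# Restart httpd (if its init script exists) plus any init scripts
# matching socorro-*, as described in the comment above. The directory
# parameter is an illustrative assumption.
restart_services() {
    initdir=${1:-/etc/init.d}
    [ -x "$initdir/httpd" ] && "$initdir/httpd" restart
    for svc in "$initdir"/socorro-*
    do
        [ -x "$svc" ] && "$svc" restart
    done
    return 0
}
```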
Created attachment 493862 [details]
socorro staging install script

Addresses comment 11: auto service restart, and the URL is now a required argument.

I made the script print a couple of informational messages, for example a warning if the new socorro.tar.gz matches the checksum of the one in the current directory, and success messages like the following (checksum, backup directory in case a rollback is necessary, etc.):

Stopping socorro-crashmover: [ OK ]
Starting socorro-crashmover: [ OK ]
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
Socorro build installed successfully!
Downloaded from https://hudson.mozilla.org/job/socorro/lastSuccessfulBuild/artifact/trunk/socorro.tar.gz
Checksum: dde0f1a28bc26d3e4752266bb51074b8 socorro.tar.gz
Backed up original to /data/socorro.29-11-2010_17_35_29

These messages can be safely /dev/null'd on staging; the script will exit non-zero if something goes wrong.
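The "same checksum" warning can be implemented by comparing md5sums of the fresh download against the tarball already on disk (the checksum line above is md5sum-style output). A sketch, where the function name and skip-vs-warn behavior are assumptions:

```shell
#!/bin/sh
# Returns 0 if the freshly downloaded tarball is byte-identical to the
# previously installed one, so the caller can warn (or skip the install).
# Function name and interface are illustrative assumptions.
same_build() {
    new=$1
    old=$2
    [ -f "$old" ] || return 1   # nothing installed yet: not a duplicate
    [ "$(md5sum < "$new")" = "$(md5sum < "$old")" ]
}
```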
I feel like this is a huge accomplishment. Am I wrong?
Comment on attachment 493862 [details]
socorro staging install script

Laura, should we just check this into socorro/scripts instead of keeping it in private/puppet SVN? This can be called from cron on staging, and called directly for production builds.
I'm going to have puppet just put this script in /root/bin/
This is deployed and working! Thanks Rob!