Closed Bug 833868 Opened 11 years ago Closed 11 years ago

Add cron on Socorro Stage to restart services after a database rebuild

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task, P3)

x86_64
Linux

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: selenamarie, Assigned: bburton)

Details

This cron runs on socorro1.stage.db: 

0 2 * * 6 postgres /var/lib/pgsql/bin/create_replica_stage.sh /var/lib/pgsql/bin/replica_variables.sh

This should be completed by 5am PT on Saturday. 

At that point, we need the following restarted: 

on socorro1.stage.db:
pgbouncer-web
pgbouncer-processor

rest:
stage processors restarted
stage monitor restarted
stage middleware restarted
Assignee: server-ops-webops → bburton
Priority: -- → P3
Restart script: /root/bin/stage-db-refresh.sh

#!/bin/sh

# restarting db stuff
echo "/sbin/service pgbouncer-web restart" | /usr/bin/issue-multi-command -i /root/.ssh/id_rsa db
echo "/sbin/service pgbouncer-processor restart" | /usr/bin/issue-multi-command -i /root/.ssh/id_rsa db

# restarting processors
echo "/sbin/service socorro-processor restart" | /usr/bin/issue-multi-command -i /root/.ssh/id_rsa processors

# restarting monitor
/sbin/service socorro-monitor restart

# restart middleware
echo "/sbin/service httpd restart" | /usr/bin/issue-multi-command -i /root/.ssh/id_rsa mware
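
One way to rehearse the restart order before scheduling it is a dry-run mode. The following is only a sketch under stated assumptions: the `DRY_RUN` variable and the `run_remote` / `run_local` helpers are hypothetical names that do not exist in the script above, and the `issue-multi-command` invocation is copied from that script rather than from its documentation.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper around the restart script above.
# DRY_RUN, run_remote and run_local are illustrative names (assumptions).
DRY_RUN="${DRY_RUN:-1}"                        # defaults to dry-run for safety
SSH_ID_FILE="${SSH_ID_FILE:-/root/.ssh/id_rsa}"

# run_remote GROUP COMMAND...
# In dry-run mode, print what would be sent to the host group; otherwise
# pipe the command to issue-multi-command the same way the script above does.
run_remote() {
    group="$1"
    shift
    if [ "$DRY_RUN" = "1" ]; then
        printf 'DRY-RUN [%s]: %s\n' "$group" "$*"
    else
        echo "$*" | /usr/bin/issue-multi-command -i "$SSH_ID_FILE" "$group"
    fi
}

# run_local COMMAND...
# Same idea for commands that run on the admin host itself.
run_local() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'DRY-RUN [local]: %s\n' "$*"
    else
        "$@"
    fi
}

# Same order as the script above: db, processors, monitor, middleware.
run_remote db /sbin/service pgbouncer-web restart
run_remote db /sbin/service pgbouncer-processor restart
run_remote processors /sbin/service socorro-processor restart
run_local /sbin/service socorro-monitor restart
run_remote mware /sbin/service httpd restart
```

Running it with `DRY_RUN=0` would perform the real restarts; leaving the variable unset only prints the planned commands.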
Proposed cron job

bburton@voltaire [09:22:40] [~/code/mozilla/sysadmins/puppet/trunk/modules/socorro/files/prod/etc-crond] 
-> % svn diff
Index: socorro
===================================================================
--- socorro	(revision 56920)
+++ socorro	(working copy)
@@ -27,3 +27,6 @@
 
 #fixes minidump stuff for b2g
 0 * * * * socorro /data/socorro/application/scripts/crons/cron_fixbrokendumps.sh
+
+# stage restarts after db replica rebuild, https://bugzilla.mozilla.org/show_bug.cgi?id=833868
+00 05 * * 6 root /root/bin/stage-db-refresh.sh
Let me know if these look good and let's coordinate a test run of the cron job on stage tomorrow to confirm it does what we expect
Status: NEW → ASSIGNED
(In reply to Brandon Burton [:solarce] from comment #3)
> Let me know if these look good and let's coordinate a test run of the cron
> job on stage tomorrow to confirm it does what we expect

Looks great.  Thanks!
Script added to puppet and deployed after much testing

notice: /File[/root/bin/stage-db-refresh.sh]/ensure: current_value absent, should be file (noop)
#!/bin/sh
NOW=`date +%Y-%m-%d_%H-%M-%S`
LOGFILE="/var/log/socorro/stage-refresh-$NOW.log"
SSH_ID_FILE="/root/.ssh/socorro-stage-updater"

if [ ! -f $SSH_ID_FILE ];
then
    echo -e "ERROR: could not find $SSH_ID_FILE"
    exit 1
fi

# restarting db stuff
echo -e "STATUS: beginning stage restart at $NOW" > $LOGFILE
echo -e "STATUS: restarting db stuff\n" >> $LOGFILE

echo -e "STATUS: ..restarting pgbouncer-web\n" >> $LOGFILE
echo "/sbin/service pgbouncer-web restart" | /usr/bin/issue-multi-command -i $SSH_ID_FILE db | tee -a $LOGFILE

echo -e "STATUS: ..restarting pgbouncer-processor\n" >> $LOGFILE
echo "/sbin/service pgbouncer-processor restart" | /usr/bin/issue-multi-command -i $SSH_ID_FILE db | tee -a $LOGFILE

# restarting processors
echo -e "STATUS: restarting processors\n" >> $LOGFILE
echo "/sbin/service socorro-processor restart" | /usr/bin/issue-multi-command -i $SSH_ID_FILE processors | tee -a $LOGFILE

# restarting monitor
echo -e "STATUS: restarting monitor\n" >> $LOGFILE
/sbin/service socorro-monitor restart >> $LOGFILE

# restart middleware
echo -e "STATUS: restarting middleware\n" >> $LOGFILE
echo "/sbin/service httpd restart" | /usr/bin/issue-multi-command -i $SSH_ID_FILE mware | tee -a $LOGFILE

echo -e "STATUS: fixing perms on $LOGFILE" >> $LOGFILE
chown socorro:socorro $LOGFILE
echo -e "STATUS: finished, exiting\n" >> $LOGFILE
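
A possible refinement, offered as a sketch and not as what was deployed: the script above repeats the same `echo -e ... >> $LOGFILE` pattern, and a `command | tee -a $LOGFILE` pipeline reports the exit status of `tee`, so a failed restart would not be visible to the script. The helpers below (`log_status` and `run_step`, both hypothetical names) log each step and record a non-zero exit status. They use `printf` because `echo -e` is not portable across `/bin/sh` implementations. The default log path here is a placeholder in `/tmp`, not the path used in production.

```shell
#!/bin/sh
# Hypothetical logging helpers; log_status and run_step are illustrative
# names and are not part of the deployed script.
LOGFILE="${LOGFILE:-/tmp/stage-refresh-example.log}"   # placeholder path

# log_status MESSAGE...
# Append a STATUS line to the log file.
log_status() {
    printf 'STATUS: %s\n' "$*" >> "$LOGFILE"
}

# run_step DESCRIPTION COMMAND...
# Log the description, run the command with its output appended to the log,
# and record an ERROR line if the command exits non-zero.
run_step() {
    desc="$1"
    shift
    log_status "$desc"
    if "$@" >> "$LOGFILE" 2>&1; then
        return 0
    else
        rc=$?
        printf 'ERROR: %s failed with exit status %s\n' "$desc" "$rc" >> "$LOGFILE"
        return "$rc"
    fi
}

# Demonstration with harmless commands instead of real service restarts.
: > "$LOGFILE"
run_step "example step that succeeds" true
run_step "example step that fails" false || log_status "continuing after failure"
log_status "finished, exiting"
```

In the real script, a call such as `run_step "restarting monitor" /sbin/service socorro-monitor restart` would replace the matching `echo` and `service` pair.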
Cron confirmed good and deployed

notice: /File[/etc/cron.d/socorro]/content: 
--- /etc/cron.d/socorro	2013-01-25 14:37:25.000000000 -0800
+++ /tmp/puppet-file20130125-1912-i0p81w-0	2013-01-25 15:10:03.000000000 -0800
@@ -21,4 +21,4 @@
 0 * * * * socorro /data/socorro/application/scripts/crons/cron_fixbrokendumps.sh
 
 # stage restarts after db replica rebuild, https://bugzilla.mozilla.org/show_bug.cgi?id=833868
-15 13 * * 5 root /root/bin/stage-db-refresh.sh
+01 05 * * 6 root /root/bin/stage-db-refresh.sh

info: FileBucket adding {md5}b3b32ae3648c74e8b6198ed0ec8d7222
info: /File[/etc/cron.d/socorro]: Filebucketed /etc/cron.d/socorro to main with sum b3b32ae3648c74e8b6198ed0ec8d7222
notice: /File[/etc/cron.d/socorro]/content: content changed '{md5}b3b32ae3648c74e8b6198ed0ec8d7222' to '{md5}0c1d93e242355449bf624677e1584cb7'
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Per IRC, bumping to Sunday

bburton@voltaire [03:29:10] [~/code/mozilla/sysadmins/puppet/trunk] 
-> % svn diff
Index: modules/webapp/files/socorro-stage/admin/etc-cron.d/socorro
===================================================================
--- modules/webapp/files/socorro-stage/admin/etc-cron.d/socorro	(revision 57162)
+++ modules/webapp/files/socorro-stage/admin/etc-cron.d/socorro	(working copy)
@@ -21,4 +21,4 @@
 0 * * * * socorro /data/socorro/application/scripts/crons/cron_fixbrokendumps.sh
 
 # stage restarts after db replica rebuild, https://bugzilla.mozilla.org/show_bug.cgi?id=833868
-01 05 * * 6 root /root/bin/stage-db-refresh.sh
+01 05 * * 7 root /root/bin/stage-db-refresh.sh

bburton@voltaire [03:29:12] [~/code/mozilla/sysadmins/puppet/trunk] 
-> % svn ci -m "adding stage restart cron, bug 833868"
Sending        trunk/modules/webapp/files/socorro-stage/admin/etc-cron.d/socorro
Transmitting file data .
Committed revision 57163.
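
For readers checking the schedule change: in the fifth (day-of-week) field of a cron entry, 6 is Saturday and 7 is accepted as Sunday (as is 0) by the Vixie-style cron used on most Linux systems. The sketch below pulls that field out of a system crontab line and names the day. The function names `cron_dow_name` and `cron_line_day` are hypothetical, and the sketch only handles a single numeric value, not lists, ranges, or day names.

```shell
#!/bin/sh
# Hypothetical helpers for reading the day-of-week field of a cron line.

# cron_dow_name NUMBER
# Map a numeric cron day-of-week value to its name (0 and 7 are both Sunday).
cron_dow_name() {
    case "$1" in
        0|7) echo "Sunday" ;;
        1)   echo "Monday" ;;
        2)   echo "Tuesday" ;;
        3)   echo "Wednesday" ;;
        4)   echo "Thursday" ;;
        5)   echo "Friday" ;;
        6)   echo "Saturday" ;;
        *)   echo "unknown"; return 1 ;;
    esac
}

# cron_line_day "CRON LINE"
# Extract the fifth whitespace-separated field and name the day.
cron_line_day() {
    dow=$(printf '%s\n' "$1" | awk '{print $5}')
    cron_dow_name "$dow"
}

cron_line_day '01 05 * * 6 root /root/bin/stage-db-refresh.sh'
cron_line_day '01 05 * * 7 root /root/bin/stage-db-refresh.sh'
```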
Confirmed deployed

[bburton@socorroadm.stage.private.phx1 ~]$ sudo cat /etc/cron.d/socorro
MAILTO="cron-socorro@mozilla.com"

# Socorro cron job manager
*/5 * * * * socorro /data/socorro/application/scripts/crons/crontabber.sh

# Generate data for /status page
*/5 * * * * socorro /data/socorro/application/scripts/crons/cron_status.sh

# Modulelist report for crash-analysis
00 17 * * * socorro export JAVA_HOME=/usr/lib/jvm/java-1.6.0;/data/socorro/application/scripts/crons/cron_modulelist.sh

# Correlation report for crash-analysis
05 00 * * * socorro /data/socorro/application/scripts/crons/cron_libraries.sh > /var/log/socorro/cron_libraries.log 2>&1

# Daily CSV report for crash-analysis
55 03 * * * socorro /data/bin/cron_daily_reports.sh

# STAGE ONLY - submit crashes to dev
*/3 * * * * socorro /data/socorro/application/scripts/crons/cron_submitter.sh crash-reports-dev.allizom.org 1000 > /dev/null 2>&1

0 * * * * socorro /data/socorro/application/scripts/crons/cron_fixbrokendumps.sh

# stage restarts after db replica rebuild, https://bugzilla.mozilla.org/show_bug.cgi?id=833868
01 05 * * 7 root /root/bin/stage-db-refresh.sh
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard