Closed
Bug 823186
Opened 12 years ago
Closed 11 years ago
puppetize pg_dump backups
Categories
(Data & BI Services Team :: DB: MySQL, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: scabral, Assigned: scabral)
References
Details
(Whiteboard: [2013q1])
Attachments
(2 files, 9 obsolete files)
Earlier today, Selena and I were trying to find the stage backups for socorro. I checked puppet, and I couldn't find a pg_dump file in puppet.

localhost:trunk scabral$ find . -type f -exec grep -l pg_dump {} \; | grep -v \.svn
./files/specific/internal2.db.phx1.mozilla.com/var/lib/pgsql/jira_backup.sh
./files/specific/internal2.db.phx1.mozilla.com/var/lib/pgsql/mana_backup.sh
localhost:trunk scabral$

(and just for verification:

localhost:trunk scabral$ cat files/specific/internal2.db.phx1.mozilla.com/var/lib/pgsql/*backup.sh
#!/bin/bash
/usr/bin/pg_dump projects_mozilla_org > /backup/jira-`date +%y%m%d_%H%M%S`.dump
#!/bin/bash
/usr/bin/pg_dump mana_mozilla_org > /backup/mana-`date +%y%m%d_%H%M%S`.dump
)

So we should find the backup script for stage, clean it up so it meets standards, and put it in puppet including the cron file. I can help with this, I put the backup scripts in place for MySQL.
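For illustration only, here is a minimal sketch of how the one-line scripts above might be tidied up. It is not the script that ended up in puppet. The function names dump_path and run_dump are hypothetical, and writing to a ".partial" file first is an assumption about what "meets standards" could mean; only the pg_dump path and the date format come from the scripts shown above.

```shell
#!/bin/bash
# Sketch under stated assumptions; dump_path and run_dump are hypothetical names.

# Build a timestamped dump path in the same style as the existing scripts,
# e.g. /backup/mana-130108_020000.dump
dump_path() {
    local backup_dir=$1 label=$2
    printf '%s/%s-%s.dump\n' "$backup_dir" "$label" "$(date +%y%m%d_%H%M%S)"
}

# Dump one database. The output goes to a temporary name first (an assumption,
# not something the original scripts do) so that a failed run never leaves a
# file that looks like a complete backup.
run_dump() {
    local database=$1 target=$2
    if /usr/bin/pg_dump "$database" > "${target}.partial"; then
        mv "${target}.partial" "$target"
    else
        rm -f "${target}.partial"
        return 1
    fi
}
```

A call might look like: run_dump mana_mozilla_org "$(dump_path /backup mana)"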
Assignee | ||
Comment 2•12 years ago
|
||
As per IRC, Matt's OK with Selena working on this.
Comment 3•12 years ago
|
||
From a manual run of a new script we're testing: Started around 12:30pm PT.

Trying to backup breakpad for tp-socorro01-master02.phx1.mozilla.com

real 211m47.838s
user 206m33.543s
sys 2m53.581s
Backup for database tp-socorro01-master02.phx1.mozilla.com/breakpad on 20121219 succeeded

Trying to backup globals for tp-socorro01-master02.phx1.mozilla.com

real 0m0.228s
user 0m0.003s
sys 0m0.007s
Backup for database globals on tp-socorro01-master02.phx1.mozilla.com on 20121219 succeeded
Comment 4•12 years ago
|
||
At least part of this backup is good. Would like to verify, but don't currently have enough disk space anywhere :/

[postgres@socorro1 backups]$ time pg_restore -Fc -a -d breakpad -t reports_20121210 -v --disable-triggers -v -v -v breakpad-db-20121219.dump
pg_restore: connecting to database for restore
pg_restore: disabling triggers for reports_20121210
pg_restore: restoring data for table "reports_20121210"
pg_restore: enabling triggers for reports_20121210
pg_restore: setting owner and privileges for TABLE DATA reports_20121210

real 28m6.136s
user 0m17.220s
sys 0m1.528s
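The single-table spot check above could be wrapped in a small helper so it is easy to repeat for other tables. This is only a sketch: restore_one_table and the DRY_RUN switch are hypothetical names, while the pg_restore flags (-Fc for custom format, -a for data only, -d, -t, --disable-triggers, -v) are the ones used in the command above.

```shell
#!/bin/bash
# Sketch: restore_one_table and DRY_RUN are hypothetical, not part of the scripts on this bug.

# Restore the data of one table from a custom-format dump and time the run.
# With DRY_RUN=1 the command is printed instead of executed.
restore_one_table() {
    local database=$1 table=$2 dumpfile=$3
    local cmd=(pg_restore -Fc -a -d "$database" -t "$table" --disable-triggers -v "$dumpfile")
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '%s\n' "${cmd[*]}"
    else
        time "${cmd[@]}"
    fi
}
```

For example, DRY_RUN=1 restore_one_table breakpad reports_20121210 breakpad-db-20121219.dump prints the command without touching the database.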
Comment 5•12 years ago
|
||
Comment 6•12 years ago
|
||
Added find to delete old files and timestamps on start/finish
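The attachment itself is not reproduced here, so the following is only a guess at the shape of that change. The function names, the "*.dump" pattern, and the retention period passed in are all assumptions; -maxdepth and -delete are GNU/BSD find options.

```shell
#!/bin/bash
# Sketch: prune_old_dumps and with_timestamps are hypothetical names.

# Delete dump files in one directory whose modification time is more than
# keep_days days in the past (find's -mtime +N test).
prune_old_dumps() {
    local backup_dir=$1 keep_days=$2
    find "$backup_dir" -maxdepth 1 -type f -name '*.dump' -mtime +"$keep_days" -delete
}

# Run a command, printing a timestamp before it starts and after it finishes,
# and pass the command's exit status through.
with_timestamps() {
    echo "Started: $(date)"
    "$@"
    local status=$?
    echo "Finished: $(date)"
    return "$status"
}
```

Usage could be as simple as: with_timestamps prune_old_dumps /pgdata/backups 7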
Attachment #694365 -
Attachment is obsolete: true
Assignee | ||
Comment 7•12 years ago
|
||
Can you verify by using the backup to refresh stage?
Comment 8•12 years ago
|
||
It's possible, with the following caveat: I'm not yet sure how long it takes to restore a pg_dump backup - it could be up to three days. :/ It would be better to restore this somewhere else first to understand how long it takes, so that we don't disrupt the staging environment for an uncertain amount of time.
Updated•12 years ago
|
Summary: puppetize stage backups for socorro → puppetize pg_dump backups for socorro that are written to stage
Assignee | ||
Comment 9•11 years ago
|
||
How big is the backup? I can give you 2 machines that have 246G each...
Comment 10•11 years ago
|
||
Can we retry using the pgslave instance? The last time this was attempted the backup taken was bad, the last few, just by size are more in line with what a good backup should be.
Comment 11•11 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #9)
> How big is the backup? I can give you 2 machines that have 246G each...

The restore requires about 750 GB.

(In reply to Matt Pressman [:mpressman] from comment #10)
> Can we retry using the pgslave instance? The last time this was attempted
> the backup taken was bad, the last few, just by size are more in line with
> what a good backup should be.

Retry the restore or the backup itself?
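Since the restore needs far more space than either offered machine has, a pre-flight check before starting a multi-hour restore may be worth having. The sketch below is not from the scripts on this bug; free_gib and has_room_for_restore are hypothetical names, and it works in GiB as reported by df, which is close to but not the same as the GB figures quoted above.

```shell
#!/bin/bash
# Sketch: free_gib and has_room_for_restore are hypothetical helper names.

# Free space, in whole GiB, on the filesystem that holds the given directory.
# Uses POSIX df output (-P) in 1K blocks (-k); column 4 is the available space.
free_gib() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed only if the target directory has at least required_gib GiB free.
has_room_for_restore() {
    local target_dir=$1 required_gib=$2
    local available
    available=$(free_gib "$target_dir")
    [ "$available" -ge "$required_gib" ]
}
```

For the numbers in this thread, something like has_room_for_restore /pgdata 750 || echo "not enough space" would stop a restore before it starts on a 246G machine.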
Comment 12•11 years ago
|
||
retry the restore
Comment 13•11 years ago
|
||
Scheduled a meeting today at 1:30 to review mpressman's backup testing script.
Comment 14•11 years ago
|
||
Script for restores tested on socorro1.dev with a 'test' database.

Here are the TODOs left:
# TODO see if we could do this without the awk statement
# TODO if we don't have REMOTEDUMPFILE or REMOTEDUMPGLOBALS set, fail
# TODO figure out how to do this with the postgres user
# TODO Could we have puppet install the configs?

Plan:
* Monday: Blow away pgslave DB and do a full test restore into a new directory called "testrestore"
* Monitor progress - expect this to take between 24-72 hours to complete
Comment 15•11 years ago
|
||
Comment 16•11 years ago
|
||
Next steps for these scripts:
* Get backup_postgres.sh included in Puppet config for stage, located in /root/bin
* Get restore_postgres.sh included in Puppet config for dev, located in /var/lib/postgres/bin

:mpressman to take care of this config
Updated•11 years ago
|
Assignee: sdeckelmann → mpressman
Comment 17•11 years ago
|
||
The awk statement is gone, and a check that fails if either of the variables REMOTEDUMPFILE and REMOTEDUMPGLOBALS is not set has been put in.
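The attached script is not shown here, so the following is only one way such a check could look, not the check that was actually added. require_vars is a hypothetical name; the two variable names come from the comment above. It relies on bash indirect expansion (${!name}).

```shell
#!/bin/bash
# Sketch: require_vars is a hypothetical helper, not the code in the attachment.

# Return non-zero, naming each offender on stderr, unless every listed
# variable is set to a non-empty value.
require_vars() {
    local name missing=0
    for name in "$@"; do
        if [[ -z ${!name:-} ]]; then
            echo "ERROR: $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Near the top of a restore script this could be used as: require_vars REMOTEDUMPFILE REMOTEDUMPGLOBALS || exit 1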
Updated•11 years ago
|
Attachment #695855 -
Attachment mime type: application/x-shellscript → text/plain
Comment 18•11 years ago
|
||
Comment 19•11 years ago
|
||
Comment 20•11 years ago
|
||
We kicked off a restore of a pg_dump backup.

[postgres@socorro1 ~]$ scripts/restore_postgres.sh
Mon Jan 7 15:17:54 PST 2013
Updated•11 years ago
|
Attachment #698096 -
Attachment is obsolete: true
Updated•11 years ago
|
Attachment #698859 -
Attachment is obsolete: true
Updated•11 years ago
|
Attachment #698895 -
Attachment description: Working Version → pg_dump Restore script
Attachment #698895 -
Attachment mime type: application/x-shellscript → text/plain
Comment 21•11 years ago
|
||
## Restore of breakpad dump from /pgdata/tmp/breakpad-db-20130106.dump succeeded
# Done! Tue Jan 8 03:01:28 PST 2013

[postgres@socorro1 pgslave]$ du -sh .
444G .

Still need to do some verification.
Comment 22•11 years ago
|
||
Adding puppetizing the pg_dump backups for mana to this ticket
Summary: puppetize pg_dump backups for socorro that are written to stage → puppetize pg_dump backups
Comment 23•11 years ago
|
||
Attachment #695855 -
Attachment is obsolete: true
Assignee | ||
Comment 24•11 years ago
|
||
So we've made the postgres2::server::backups::pgdump class at /modules/postgres2/manifests/backups/pgdump.pp (see the file in svn), and the socorro1.stage.db.phx1.mozilla.com entry in manifests/nodes/socorro.pp references it with:

postgres2::server::backups::pgdump { "socorro-stage":
    database => 'breakpad',
    globals => true,
    hostname => 'tp-socorro01-master02',
    cluster => 'main',
    postgres_version => '9.2',
    pg_backup_prefix => '/pgdata/backups',
    frequency => "weekly";
}

But we can't figure out why it's not working (specifically, why /etc/cron.d doesn't have backup_postgres_daily and backup_postgres_weekly).....halp?
Comment 25•11 years ago
|
||
Ok -- I made a few mistakes :) the classname should have been: postgres2::backups::pgdump and then I failed to define a variable needed by the crontab .erbs. Those errors are fixed in a diff against version 55716.
Updated•11 years ago
|
Attachment #699387 -
Attachment is obsolete: true
Comment 26•11 years ago
|
||
Attachment #699437 -
Attachment is obsolete: true
Comment 27•11 years ago
|
||
Still busted. Waiting for someone better with puppet to help out.
Comment 28•11 years ago
|
||
I put this in socorroadm instead of socorro1.db - moved to the correct node, and fixed an error in a template that crept in.
Attachment #699439 -
Attachment is obsolete: true
Updated•11 years ago
|
Assignee: mpressman → sdeckelmann
Comment 29•11 years ago
|
||
bburton@voltaire [04:32:41] [~/code/mozilla/sysadmins/puppet/trunk/modules/postgres2/templates/default]
-> % svn ci -m "updates to template from selenamarie, bug 823186"
Sending default/backup_postgres_daily.erb
Transmitting file data .
Committed revision 55779.

bburton@voltaire [04:34:23] [~/code/mozilla/sysadmins/puppet/trunk/manifests/nodes]
-> % svn ci -m "changes to stage postgres2 class paramaeters, from selenamarie, bug 823186"
Sending nodes/socorro.pp
Transmitting file data .
Committed revision 55781.
Comment 30•11 years ago
|
||
File pushed

info: FileBucket adding {md5}3f41c8d2d455310f833f4192a54f1ac8
info: /Stage[main]/Postgres2::Backups::Pgdump/File[/root/bin/backup_postgres.sh]: Filebucketed /root/bin/backup_postgres.sh to main with sum 3f41c8d2d455310f833f4192a54f1ac8
notice: /Stage[main]/Postgres2::Backups::Pgdump/File[/root/bin/backup_postgres.sh]/content: content changed '{md5}3f41c8d2d455310f833f4192a54f1ac8' to '{md5}40b71cc4c90b2fc10f25ab80e106c4e0'
notice: /Stage[main]/Postgres2::Backups::Pgdump/File[/root/bin/backup_postgres.sh]/mode: mode changed '0700' to '0755'
notice: Finished catalog run in 42.58 seconds
Comment 31•11 years ago
|
||
bburton@voltaire [05:22:19] [~/code/mozilla/sysadmins/puppet/trunk/modules/postgres2/manifests]
-> % svn ci -m "fixes for psql paths, from selenamarie, bug 823186"
Sending manifests/create/database.pp
Sending manifests/create/role.pp
Sending manifests/server.pp
Transmitting file data ...
Committed revision 55791.
Comment 32•11 years ago
|
||
Attachment #699494 -
Attachment is obsolete: true
Comment 33•11 years ago
|
||
Attachment #700544 -
Attachment is obsolete: true
Comment 34•11 years ago
|
||
(In reply to Selena Deckelmann :selenamarie :selena from comment #33)
> Created attachment 700546 [details] [diff] [review]
> Fix three problems :) - cronjob duplicate removal, script path and
> filenaming problem in backup_postgres.sh

Alright - could someone apply the above diff? After that, this ticket can be closed. :)
Assignee | ||
Comment 35•11 years ago
|
||
localhost:trunk scabral$ svn commit -m "updating as per bug https://bugzilla.mozilla.org/show_bug.cgi?id=823186 comment 33-34"
Sending modules/postgres2/files/scripts/backup_postgres.sh
Sending modules/postgres2/templates/default/backup_postgres_weekly.erb
Transmitting file data ..
Committed revision 56012.

Can you verify the file is what you want it to be?
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 36•11 years ago
|
||
Jan 8 16:37:01 socorro1 crond[2172]: (CRON) bad username (/etc/cron.d/backup_postgres_weekly)
Jan 8 16:37:01 socorro1 crond[2172]: (CRON) bad username (/etc/cron.d/backup_postgres_weekly)

The cron job didn't have a username specified. Please add 'root' as the username in /etc/cron.d/backup_postgres_weekly and then the cron will run. Probably has to wait until tomorrow because of svn outage.
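Files in /etc/cron.d need a user field between the five time fields and the command, unlike per-user crontabs. A rough lint for this mistake could look like the sketch below. It is a heuristic, not anything cron itself provides: find_jobs_missing_user is a hypothetical name, a job is flagged when its sixth field starts with "/" (a command path where a username belongs) or when it has fewer than seven fields, and "@reboot"-style lines are skipped rather than checked.

```shell
#!/bin/bash
# Sketch: find_jobs_missing_user is a hypothetical helper and only a heuristic.

# Print job lines in a cron.d-style file that appear to lack a user field.
# Exit status is non-zero if any such line is found.
find_jobs_missing_user() {
    awk '
        /^[ \t]*#/                            { next }  # comment
        /^[ \t]*$/                            { next }  # blank line
        /^[ \t]*@/                            { next }  # @reboot etc., not checked
        /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ { next }  # VAR=value setting
        NF < 7 || $6 ~ /^\//                  { print FILENAME ":" FNR ": " $0; bad = 1 }
        END                                   { exit bad }
    ' "$1"
}
```

Running find_jobs_missing_user /etc/cron.d/backup_postgres_weekly before deploying a template change would have printed the offending line.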
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•11 years ago
|
Assignee: sdeckelmann → scabral
Assignee | ||
Comment 37•11 years ago
|
||
Updated the template to have root:

+++ modules/postgres2/files/scripts/backup_postgres.sh (working copy)
# Backup postgres database on a weekly basis as root user
<% if globals and nodb -%>
0 2 * * 6 root /root/bin/backup_postgres.sh -r <%= hostname %> -p <%= pg_backup_path %> -v <%= postgres_version %> -g -n
<% elsif globals -%>
0 2 * * 6 root /root/bin/backup_postgres.sh -d <%= database %> -r <%= hostname %> -p <%= pg_backup_path %> -v <%= postgres_version %> -g
<% end -%>

Committed revision 56536.
Status: REOPENED → NEW
Comment 39•11 years ago
|
||
This is done.
Status: NEW → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•10 years ago
|
Product: mozilla.org → Data & BI Services Team