Closed Bug 823186 Opened 12 years ago Closed 11 years ago

puppetize pg_dump backups

Categories

(Data & BI Services Team :: DB: MySQL, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: scabral, Assigned: scabral)

References

Details

(Whiteboard: [2013q1])

Attachments

(2 files, 9 obsolete files)

Earlier today, Selena and I were trying to find the stage backups for socorro. I checked puppet, and I couldn't find a pg_dump file in puppet.

localhost:trunk scabral$ find . -type f -exec grep -l pg_dump {} \; | grep -v \.svn
./files/specific/internal2.db.phx1.mozilla.com/var/lib/pgsql/jira_backup.sh
./files/specific/internal2.db.phx1.mozilla.com/var/lib/pgsql/mana_backup.sh
localhost:trunk scabral$

(and just for verification:
localhost:trunk scabral$ cat files/specific/internal2.db.phx1.mozilla.com/var/lib/pgsql/*backup.sh
#!/bin/bash
/usr/bin/pg_dump projects_mozilla_org > /backup/jira-`date +%y%m%d_%H%M%S`.dump
#!/bin/bash
/usr/bin/pg_dump mana_mozilla_org > /backup/mana-`date +%y%m%d_%H%M%S`.dump
)

So we should find the backup script for stage, clean it up so it meets standards, and put it in puppet, including the cron file. I can help with this, since I put the backup scripts in place for MySQL.
Dumitru and I are working on this!
Assignee: mpressman → sdeckelmann
As per IRC, Matt's OK with Selena working on this.
Blocks: 823194
From a manual run of a new script we're testing: 

Started around 12:30pm PT.

Trying to backup breakpad for tp-socorro01-master02.phx1.mozilla.com

real    211m47.838s
user    206m33.543s
sys     2m53.581s
Backup for database tp-socorro01-master02.phx1.mozilla.com/breakpad on 20121219 succeeded
Trying to backup globals for tp-socorro01-master02.phx1.mozilla.com

real    0m0.228s
user    0m0.003s
sys     0m0.007s
Backup for database globals on tp-socorro01-master02.phx1.mozilla.com on 20121219 succeeded
At least part of this backup is good.

Would like to verify, but don't currently have enough disk space anywhere :/

[postgres@socorro1 backups]$ time pg_restore  -Fc -a -d breakpad -t reports_20121210 -v --disable-triggers -v -v -v breakpad-db-20121219.dump
pg_restore: connecting to database for restore
pg_restore: disabling triggers for reports_20121210
pg_restore: restoring data for table "reports_20121210"
pg_restore: enabling triggers for reports_20121210
pg_restore: setting owner and privileges for TABLE DATA reports_20121210

real    28m6.136s
user    0m17.220s
sys     0m1.528s
Added find to delete old files and timestamps on start/finish
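For anyone reading along, here is a rough sketch of what "find to delete old files and timestamps on start/finish" can look like. This is not the attached script. The function names, the `*.dump` pattern and the retention period passed in are assumptions for illustration only.

```shell
#!/bin/bash
# Hypothetical sketch: function names, file pattern and retention are assumptions,
# not taken from the attached backup script.

# Delete dump files whose age, in whole days, is greater than <retention_days>.
# Usage: prune_old_dumps <backup_dir> <retention_days>
prune_old_dumps() {
    local backup_dir="$1"
    local retention_days="$2"
    # Only look at regular files directly inside the backup directory.
    find "$backup_dir" -maxdepth 1 -type f -name '*.dump' -mtime +"$retention_days" -delete
}

# Run a command, printing a timestamp before and after it,
# and return the command's own exit status.
with_timestamps() {
    local status=0
    date
    "$@" || status=$?
    date
    return "$status"
}
```

Something like `with_timestamps prune_old_dumps /pgdata/backups 14` would then log start and finish times around the cleanup. The 14 days is a made-up value.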
Attachment #694365 - Attachment is obsolete: true
Can you verify by using the backup to refresh stage?
It's possible, with the following caveat: 

I'm not yet sure how long it takes to restore a pg_dump backup - it could be up to three days. :/ It would be better to restore this somewhere else first to understand how long it takes, so that we don't disrupt the staging environment for an uncertain amount of time.
Summary: puppetize stage backups for socorro → puppetize pg_dump backups for socorro that are written to stage
How big is the backup? I can give you 2 machines that have 246G each...
Can we retry using the pgslave instance? The last time this was attempted, the backup taken was bad; the last few, judging by size, are more in line with what a good backup should be.
(In reply to Sheeri Cabral [:sheeri] from comment #9)
> How big is the backup? I can give you 2 machines that have 246G each...

The restore requires about 750 GB.
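
To make the space question concrete, a small sketch of a pre-restore check. The helper names `avail_gb` and `fits` are hypothetical. The only figures used are the ones quoted in this bug (about 750 GB needed, 246G per offered machine).

```shell
#!/bin/bash
# Hypothetical helpers for a pre-restore disk space check.

# Print the free space, in whole GiB, of the filesystem holding <dir>.
avail_gb() {
    # df -P gives POSIX-format output; on line 2, column 4 is available 1K blocks.
    df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# Succeed only if <available_gb> is at least <required_gb>.
# Usage: fits <required_gb> <available_gb>
fits() {
    local required_gb="$1"
    local available_gb="$2"
    [ "$available_gb" -ge "$required_gb" ]
}
```

With these numbers, `fits 750 246` fails, and so does `fits 750 492` for both machines combined. A real check might look like `fits 750 "$(avail_gb /pgdata)"`.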

(In reply to Matt Pressman [:mpressman] from comment #10)
> Can we retry using the pgslave instance? The last time this was attempted
> the backup taken was bad, the last few, just by size are more in line with
> what a good backup should be.

Retry the restore or the backup itself?
Retry the restore.
Scheduled a meeting today at 1:30 to review mpressman's backup testing script.
Script for restores tested on socorro1.dev with a 'test' database.

Here are the TODOs left:
# TODO see if we could do this without the awk statement
# TODO if we don't have REMOTEDUMPFILE or REMOTEDUMPGLOBALS set, fail
# TODO figure out how to do this with the postgres user
# TODO Could we have puppet install the configs?

Plan: 

* Monday: Blow away the pgslave DB and do a full test restore into a new directory called "testrestore"
* Monitor progress - expect this to take between 24-72 hours to complete
Next steps for these scripts: 

* Get backup_postgres.sh included in Puppet config for stage, located in /root/bin
* Get restore_postgres.sh included in Puppet config for dev, located in /var/lib/postgres/bin

:mpressman to take care of this config
Assignee: sdeckelmann → mpressman
The awk statement is gone, and a check that makes the script fail if the variables REMOTEDUMPFILE and REMOTEDUMPGLOBALS are not set has been put in.
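A minimal sketch of that kind of guard, for illustration. It is not the code from the attached restore script, and the `require_vars` helper is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical sketch of a "fail if required variables are unset" guard.

# Return non-zero if any of the named variables is unset or empty,
# reporting each missing name on stderr.
# Usage: require_vars NAME [NAME ...]
require_vars() {
    local name value missing=0
    for name in "$@"; do
        # Look up the variable called $name. The names are literals supplied
        # by the script itself, never user input, so eval is safe here.
        eval "value=\${$name-}"
        if [ -z "$value" ]; then
            echo "ERROR: $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Near the top of the script this would be called as `require_vars REMOTEDUMPFILE REMOTEDUMPGLOBALS || exit 1`, so nothing is restored from an undefined path.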
Attachment #695855 - Attachment mime type: application/x-shellscript → text/plain
Attached file Restore script (obsolete) —
We kicked off a restore of a pg_dump backup.

[postgres@socorro1 ~]$ scripts/restore_postgres.sh
Mon Jan  7 15:17:54 PST 2013
Attachment #698096 - Attachment is obsolete: true
Attachment #698859 - Attachment is obsolete: true
Attachment #698895 - Attachment description: Working Version → pg_dump Restore script
Attachment #698895 - Attachment mime type: application/x-shellscript → text/plain
## Restore of breakpad dump from /pgdata/tmp/breakpad-db-20130106.dump succeeded
# Done!
Tue Jan  8 03:01:28 PST 2013

[postgres@socorro1 pgslave]$ du -sh .
444G    .

Still need to do some verification.
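
For reference, the elapsed time between the two timestamps printed by the restore script can be worked out as below. This is only a sketch with hypothetical helper names. It assumes both timestamps are in the same time zone (both are PST here) and that the run crossed exactly one midnight.

```shell
#!/bin/bash
# Hypothetical helpers for turning the two printed timestamps into a duration.

# Convert two-digit HH MM SS fields to seconds since midnight.
hms_to_seconds() {
    # Strip one leading zero so values like 08 are not parsed as octal.
    local h="${1#0}" m="${2#0}" s="${3#0}"
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

# Elapsed seconds for a run that starts on one day and ends on the next.
# Usage: elapsed_overnight <start_seconds> <end_seconds>
elapsed_overnight() {
    echo $(( 86400 - $1 + $2 ))
}

# Format a number of seconds as hours, minutes and seconds.
format_hms() {
    printf '%dh %02dm %02ds\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

start=$(hms_to_seconds 15 17 54)   # Mon Jan  7 15:17:54 PST 2013
end=$(hms_to_seconds 03 01 28)     # Tue Jan  8 03:01:28 PST 2013
format_hms "$(elapsed_overnight "$start" "$end")"
```

That comes to a little under 12 hours, well inside the 24-72 hours we had planned for.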
Adding puppetizing the pg_dump backups for mana to this ticket
Summary: puppetize pg_dump backups for socorro that are written to stage → puppetize pg_dump backups
So we've made the postgres2::server::backups::pgdump class at /modules/postgres2/manifests/backups/pgdump.pp, see the file in svn, and the 

socorro1.stage.db.phx1.mozilla.com

entry in manifests/nodes/socorro.pp references it with:


    postgres2::server::backups::pgdump {
        "socorro-stage":
            database         => 'breakpad',
            globals          => true,
            hostname         => 'tp-socorro01-master02',
            cluster          => 'main',
            postgres_version => '9.2',
            pg_backup_prefix => '/pgdata/backups',
            frequency        => "weekly";
    }


But we can't figure out why it's not working (specifically, why /etc/cron.d doesn't have backup_postgres_daily and backup_postgres_weekly). Halp?
Attached file Diff to add backups to puppet (obsolete) —
Ok -- I made a few mistakes :)

The class name should have been postgres2::backups::pgdump, and I also failed to define a variable needed by the crontab .erb templates.

Those errors are fixed in a diff against version 55716.
Attachment #699387 - Attachment is obsolete: true
Attachment #699437 - Attachment is obsolete: true
Still busted.  Waiting for someone better with puppet to help out.
Attached file This should fix it all. (obsolete) —
I put this in socorroadm instead of socorro1.db - moved to the correct node, and fixed an error in a template that crept in.
Attachment #699439 - Attachment is obsolete: true
Assignee: mpressman → sdeckelmann
bburton@voltaire [04:32:41] [~/code/mozilla/sysadmins/puppet/trunk/modules/postgres2/templates/default] 
-> % svn ci -m "updates to template from selenamarie, bug 823186"
Sending        default/backup_postgres_daily.erb
Transmitting file data .
Committed revision 55779.

bburton@voltaire [04:34:23] [~/code/mozilla/sysadmins/puppet/trunk/manifests/nodes] 
-> % svn ci -m "changes to stage postgres2 class paramaeters, from selenamarie, bug 823186"
Sending        nodes/socorro.pp
Transmitting file data .
Committed revision 55781.
File pushed

info: FileBucket adding {md5}3f41c8d2d455310f833f4192a54f1ac8
info: /Stage[main]/Postgres2::Backups::Pgdump/File[/root/bin/backup_postgres.sh]: Filebucketed /root/bin/backup_postgres.sh to main with sum 3f41c8d2d455310f833f4192a54f1ac8
notice: /Stage[main]/Postgres2::Backups::Pgdump/File[/root/bin/backup_postgres.sh]/content: content changed '{md5}3f41c8d2d455310f833f4192a54f1ac8' to '{md5}40b71cc4c90b2fc10f25ab80e106c4e0'
notice: /Stage[main]/Postgres2::Backups::Pgdump/File[/root/bin/backup_postgres.sh]/mode: mode changed '0700' to '0755'
notice: Finished catalog run in 42.58 seconds
bburton@voltaire [05:22:19] [~/code/mozilla/sysadmins/puppet/trunk/modules/postgres2/manifests] 
-> % svn ci -m "fixes for psql paths, from selenamarie, bug 823186"
Sending        manifests/create/database.pp
Sending        manifests/create/role.pp
Sending        manifests/server.pp
Transmitting file data ...
Committed revision 55791.
(In reply to Selena Deckelmann :selenamarie :selena from comment #33)
> Created attachment 700546 [details] [diff] [review]
> Fix three problems :) - cronjob duplicate removal, script path and
> filenaming problem in backup_postgres.sh

Alright - could someone apply the above diff? After that, this ticket can be closed. :)
localhost:trunk scabral$ svn commit -m "updating as per bug https://bugzilla.mozilla.org/show_bug.cgi?id=823186 comment 33-34"
Sending        modules/postgres2/files/scripts/backup_postgres.sh
Sending        modules/postgres2/templates/default/backup_postgres_weekly.erb
Transmitting file data ..
Committed revision 56012.

Can you verify the file is what you want it to be?
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Jan  8 16:37:01 socorro1 crond[2172]: (CRON) bad username (/etc/cron.d/backup_postgres_weekly)
Jan  8 16:37:01 socorro1 crond[2172]: (CRON) bad username (/etc/cron.d/backup_postgres_weekly)

The cron job didn't have a username specified. 

Please add 'root' as the username in /etc/cron.d/backup_postgres_weekly and then the cron will run.

Probably has to wait until tomorrow because of svn outage.
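
Background for the "bad username" error: unlike a per-user crontab, files in /etc/cron.d need a user field between the five schedule fields and the command. Below is a rough sketch of a check that could catch this before a template ships. The function name and the username pattern are assumptions, and it is a heuristic only (it does not handle `@reboot`-style shortcuts).

```shell
#!/bin/bash
# Hypothetical lint for /etc/cron.d style files.

# Print entries whose sixth field does not look like a username and
# return non-zero if any were found.
# Assumption: usernames match [a-z_][a-z0-9_-]*, so a command path such as
# /root/bin/backup_postgres.sh in the user position is reported.
check_cron_d_users() {
    awk '
        /^[ \t]*#/ { next }                               # skip comments
        /^[ \t]*$/ { next }                               # skip blank lines
        /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ { next }   # skip VAR=value lines
        $6 !~ /^[a-z_][a-z0-9_-]*$/ {
            print FILENAME ": line " NR ": " $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Running `check_cron_d_users /etc/cron.d/backup_postgres_weekly` against the file as it was pushed should flag the entry, because the script path sits where the username belongs.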
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: sdeckelmann → scabral
Updated the template to have root:

+++ modules/postgres2/files/scripts/backup_postgres.sh  (working copy)
# Backup postgres database on a weekly basis as root user
<% if globals and nodb -%>
0 2 * * 6 root /root/bin/backup_postgres.sh -r <%= hostname %> -p <%= pg_backup_path %> -v <%= postgres_version %> -g -n
<% elsif globals -%>
0 2 * * 6 root /root/bin/backup_postgres.sh -d <%= database %> -r <%= hostname %> -p <%= pg_backup_path %> -v <%= postgres_version %> -g
<% end -%>

Committed revision 56536.
Status: REOPENED → NEW
Is there any more work on this?
Whiteboard: [2013q1]
This is done.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Data & BI Services Team