puppetize pg_dump backups

RESOLVED FIXED

Status

Data & BI Services Team
DB: MySQL
RESOLVED FIXED
5 years ago
3 years ago

People

(Reporter: sheeri, Assigned: sheeri)

Tracking

(Blocks: 1 bug)

Details

(Whiteboard: [2013q1])

Attachments

(2 attachments, 9 obsolete attachments)

7.27 KB, text/plain
2.27 KB, patch
(Assignee)

Description

5 years ago
Earlier today, Selena and I were trying to find the stage backups for socorro. I checked puppet, and I couldn't find a pg_dump file in puppet.

localhost:trunk scabral$ find . -type f -exec grep -l pg_dump {} \; | grep -v \.svn
./files/specific/internal2.db.phx1.mozilla.com/var/lib/pgsql/jira_backup.sh
./files/specific/internal2.db.phx1.mozilla.com/var/lib/pgsql/mana_backup.sh
localhost:trunk scabral$

(and just for verification:
localhost:trunk scabral$ cat files/specific/internal2.db.phx1.mozilla.com/var/lib/pgsql/*backup.sh
#!/bin/bash
/usr/bin/pg_dump projects_mozilla_org > /backup/jira-`date +%y%m%d_%H%M%S`.dump
#!/bin/bash
/usr/bin/pg_dump mana_mozilla_org > /backup/mana-`date +%y%m%d_%H%M%S`.dump
)
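The same search can be written a little more directly with a recursive grep. This is only a minimal sketch, assuming a grep that supports --exclude-dir (GNU and BSD grep both do); the helper name find_pg_dump_files is made up for illustration and is not something in the puppet tree.

```shell
# List files under a directory that mention pg_dump, skipping .svn metadata.
# find_pg_dump_files is a hypothetical helper name used only for this sketch.
find_pg_dump_files() {
    # -r: recurse, -l: print matching file names only
    grep -rl --exclude-dir=.svn pg_dump "${1:-.}" | sort
}
```

Run from the top of the checkout, `find_pg_dump_files .` should list the same kind of paths as the find/grep pipeline above.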

So we should find the backup script for stage, clean it up so it meets standards, and put it in puppet, including the cron file. I can help with this; I put the backup scripts in place for MySQL.
Dumitru and I are working on this!
Assignee: mpressman → sdeckelmann
(Assignee)

Comment 2

5 years ago
As per IRC, Matt's OK with Selena working on this.
(Assignee)

Updated

5 years ago
Blocks: 823194
From a manual run of a new script we're testing: 

Started around 12:30pm PT.

Trying to backup breakpad for tp-socorro01-master02.phx1.mozilla.com

real    211m47.838s
user    206m33.543s
sys     2m53.581s
Backup for database tp-socorro01-master02.phx1.mozilla.com/breakpad on 20121219 succeeded
Trying to backup globals for tp-socorro01-master02.phx1.mozilla.com

real    0m0.228s
user    0m0.003s
sys     0m0.007s
Backup for database globals on tp-socorro01-master02.phx1.mozilla.com on 20121219 succeeded
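For readers without the attachment handy: log lines of this shape could come from a small wrapper around each dump step. The following is a sketch under assumptions, not the attached script; the function name run_backup_step and the exact message wording are invented here.

```shell
# run_backup_step LABEL COMMAND [ARGS...]
# Announce a step, run the command, and report success or failure with the date.
# Hypothetical wrapper; the real backup script may be structured differently.
run_backup_step() {
    label=$1
    shift
    echo "Trying to backup $label"
    if "$@"; then
        echo "Backup for $label on $(date +%Y%m%d) succeeded"
    else
        echo "Backup for $label on $(date +%Y%m%d) FAILED" >&2
        return 1
    fi
}
```

A call would look something like `run_backup_step "breakpad for some-host" pg_dump -Fc -f breakpad.dump breakpad`, where the host, file and database names are placeholders.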
At least part of this backup is good.

Would like to verify, but don't currently have enough disk space anywhere :/

[postgres@socorro1 backups]$ time pg_restore  -Fc -a -d breakpad -t reports_20121210 -v --disable-triggers -v -v -v breakpad-db-20121219.dump
pg_restore: connecting to database for restore
pg_restore: disabling triggers for reports_20121210
pg_restore: restoring data for table "reports_20121210"
pg_restore: enabling triggers for reports_20121210
pg_restore: setting owner and privileges for TABLE DATA reports_20121210

real    28m6.136s
user    0m17.220s
sys     0m1.528s
Blocks: 823507
Created attachment 694365 [details]
First rev of a script to run backups from prod to stage for Socorro
Created attachment 695855 [details]
Second rev of backup script, now deletes backups older than 41 days

Added find to delete old files and timestamps on start/finish
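A rough idea of what the find-based cleanup with start/finish timestamps might look like. This is a sketch, not the attached script: the helper name prune_old_backups, the *.dump pattern and the message text are assumptions, and -delete is a GNU/BSD find extension rather than POSIX.

```shell
# prune_old_backups DIR [DAYS]
# Delete *.dump files in DIR last modified more than DAYS days ago (default 41),
# printing a timestamp at start and finish. Hypothetical helper for illustration.
prune_old_backups() {
    dir=$1
    days=${2:-41}
    echo "Prune started: $(date)"
    # -print lists each file as it is removed, which helps when reading cron mail
    find "$dir" -type f -name '*.dump' -mtime +"$days" -print -delete
    echo "Prune finished: $(date)"
}
```

Swapping `-print -delete` for just `-print` gives a dry run that only lists what would be removed.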
Attachment #694365 - Attachment is obsolete: true
(Assignee)

Comment 7

5 years ago
Can you verify by using the backup to refresh stage?
It's possible, with the following caveat: 

I'm not yet sure how long it takes to restore a pg_dump backup - it could be up to three days. :/ It would be better to restore this somewhere else first to understand how long it takes, so that we don't disrupt the staging environment for an uncertain amount of time.
Summary: puppetize stage backups for socorro → puppetize pg_dump backups for socorro that are written to stage
(Assignee)

Comment 9

5 years ago
How big is the backup? I can give you 2 machines that have 246G each...
Can we retry using the pgslave instance? The last time this was attempted, the backup taken was bad; the last few, judging by size alone, are more in line with what a good backup should be.
(In reply to Sheeri Cabral [:sheeri] from comment #9)
> How big is the backup? I can give you 2 machines that have 246G each...

The restore requires about 750 GB.
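Since free space is the blocker here, a pre-flight check before starting a long restore might be worth having. A minimal sketch, assuming a POSIX df; the helper name has_free_space is hypothetical.

```shell
# has_free_space DIR REQUIRED_GB
# Succeeds if the filesystem holding DIR has at least REQUIRED_GB gigabytes free.
# Hypothetical helper; df -P guarantees one output line per filesystem.
has_free_space() {
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    required_kb=$(( $2 * 1024 * 1024 ))
    [ "$avail_kb" -ge "$required_kb" ]
}
```

For example, `has_free_space /pgdata 750 || echo "not enough room for the restore" >&2` would refuse early on one of the 246G machines instead of failing partway through.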

(In reply to Matt Pressman [:mpressman] from comment #10)
> Can we retry using the pgslave instance? The last time this was attempted,
> the backup taken was bad; the last few, judging by size alone, are more in
> line with what a good backup should be.

Retry the restore or the backup itself?
retry the restore
Blocks: 821856
Scheduled a meeting today at 1:30 to review mpressman's backup testing script.
Script for restores tested on socorro1.dev with a 'test' database.

Here's the TODOs left: 
# TODO see if we could do this without the awk statement
# TODO if we don't have REMOTEDUMPFILE or REMOTEDUMPGLOBALS set, fail
# TODO figure out how to do this with the postgres user
# TODO Could we have puppet install the configs?

Plan: 

* Monday: Blow away the pgslave DB and do a full test restore into a new directory called "testrestore"
* Monitor progress - expect this to take between 24-72 hours to complete
Created attachment 698096 [details]
Restore script for pg_dump backups
Next steps for these scripts: 

* Get backup_postgres.sh included in Puppet config for stage, located in /root/bin
* Get restore_postgres.sh included in Puppet config for dev, located in /var/lib/postgres/bin

:mpressman to take care of this config
Assignee: sdeckelmann → mpressman
The awk statement is gone, and a check has been put in to fail if the variables REMOTEDUMPFILE and REMOTEDUMPGLOBALS are not set.
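For reference, a check like that can be quite small. This is a sketch of one way to do it, not the code from the attachment; the function name check_restore_vars and the error text are assumptions.

```shell
# Fail early if either variable the restore depends on is unset or empty.
# check_restore_vars is a hypothetical name; the variable names come from the TODO list.
check_restore_vars() {
    if [ -z "${REMOTEDUMPFILE:-}" ] || [ -z "${REMOTEDUMPGLOBALS:-}" ]; then
        echo "REMOTEDUMPFILE and REMOTEDUMPGLOBALS must both be set" >&2
        return 1
    fi
}
```

Calling `check_restore_vars || exit 1` near the top of the script stops the run before anything is dropped or restored.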
Attachment #695855 - Attachment mime type: application/x-shellscript → text/plain
Created attachment 698859 [details]
Restore script
Created attachment 698895 [details]
pg_dump Restore script
We kicked off a restore of a pg_dump backup.

[postgres@socorro1 ~]$ scripts/restore_postgres.sh
Mon Jan  7 15:17:54 PST 2013
Attachment #698096 - Attachment is obsolete: true
Attachment #698859 - Attachment is obsolete: true
Attachment #698895 - Attachment description: Working Version → pg_dump Restore script
Attachment #698895 - Attachment mime type: application/x-shellscript → text/plain
## Restore of breakpad dump from /pgdata/tmp/breakpad-db-20130106.dump succeeded
# Done!
Tue Jan  8 03:01:28 PST 2013

[postgres@socorro1 pgslave]$ du -sh .
444G    .

Still need to do some verification.
Adding puppetizing the pg_dump backups for mana to this ticket
Summary: puppetize pg_dump backups for socorro that are written to stage → puppetize pg_dump backups
Created attachment 699387 [details]
Diff that was applied - something isn't quite right with the import for the pgdump.pp though :/
Attachment #695855 - Attachment is obsolete: true
(Assignee)

Comment 24

5 years ago
So we've made the postgres2::server::backups::pgdump class at /modules/postgres2/manifests/backups/pgdump.pp, see the file in svn, and the 

socorro1.stage.db.phx1.mozilla.com

entry in manifests/nodes/socorro.pp references it with:


    postgres2::server::backups::pgdump {
        "socorro-stage":
            database         => 'breakpad',
            globals          => true,
            hostname         => 'tp-socorro01-master02',
            cluster          => 'main',
            postgres_version => '9.2',
            pg_backup_prefix => '/pgdata/backups',
            frequency        => "weekly";
    }


But we can't figure out why it's not working (specifically, why /etc/cron.d doesn't have backup_postgres_daily and backup_postgres_weekly). Help?
Created attachment 699437 [details]
Diff to add backups to puppet

Ok -- I made a few mistakes :)

The class name should have been postgres2::backups::pgdump, and I also failed to define a variable needed by the crontab .erb templates.

Those errors are fixed in a diff against version 55716.
Attachment #699387 - Attachment is obsolete: true
Created attachment 699439 [details]
Addresses commits by sheeri in version 55730
Attachment #699437 - Attachment is obsolete: true
Still busted.  Waiting for someone better with puppet to help out.
Created attachment 699494 [details]
This should fix it all.

I had put this in socorroadm instead of socorro1.db. This diff moves it to the correct node and fixes an error that crept into a template.
Attachment #699439 - Attachment is obsolete: true
Assignee: mpressman → sdeckelmann
bburton@voltaire [04:32:41] [~/code/mozilla/sysadmins/puppet/trunk/modules/postgres2/templates/default] 
-> % svn ci -m "updates to template from selenamarie, bug 823186"
Sending        default/backup_postgres_daily.erb
Transmitting file data .
Committed revision 55779.

bburton@voltaire [04:34:23] [~/code/mozilla/sysadmins/puppet/trunk/manifests/nodes] 
-> % svn ci -m "changes to stage postgres2 class paramaeters, from selenamarie, bug 823186"
Sending        nodes/socorro.pp
Transmitting file data .
Committed revision 55781.
File pushed

info: FileBucket adding {md5}3f41c8d2d455310f833f4192a54f1ac8
info: /Stage[main]/Postgres2::Backups::Pgdump/File[/root/bin/backup_postgres.sh]: Filebucketed /root/bin/backup_postgres.sh to main with sum 3f41c8d2d455310f833f4192a54f1ac8
notice: /Stage[main]/Postgres2::Backups::Pgdump/File[/root/bin/backup_postgres.sh]/content: content changed '{md5}3f41c8d2d455310f833f4192a54f1ac8' to '{md5}40b71cc4c90b2fc10f25ab80e106c4e0'
notice: /Stage[main]/Postgres2::Backups::Pgdump/File[/root/bin/backup_postgres.sh]/mode: mode changed '0700' to '0755'
notice: Finished catalog run in 42.58 seconds
bburton@voltaire [05:22:19] [~/code/mozilla/sysadmins/puppet/trunk/modules/postgres2/manifests] 
-> % svn ci -m "fixes for psql paths, from selenamarie, bug 823186"
Sending        manifests/create/database.pp
Sending        manifests/create/role.pp
Sending        manifests/server.pp
Transmitting file data ...
Committed revision 55791.
Created attachment 700544 [details] [diff] [review]
Fix for duplication in the backup_postgres_weekly cron template
Attachment #699494 - Attachment is obsolete: true
Created attachment 700546 [details] [diff] [review]
Fix three problems :) - cronjob duplicate removal, script path and filenaming problem in backup_postgres.sh
Attachment #700544 - Attachment is obsolete: true
(In reply to Selena Deckelmann :selenamarie :selena from comment #33)
> Created attachment 700546 [details] [diff] [review]
> Fix three problems :) - cronjob duplicate removal, script path and
> filenaming problem in backup_postgres.sh

Alright - could someone apply the above diff? After that, this ticket can be closed. :)
(Assignee)

Comment 35

5 years ago
localhost:trunk scabral$ svn commit -m "updating as per bug https://bugzilla.mozilla.org/show_bug.cgi?id=823186 comment 33-34"
Sending        modules/postgres2/files/scripts/backup_postgres.sh
Sending        modules/postgres2/templates/default/backup_postgres_weekly.erb
Transmitting file data ..
Committed revision 56012.

Can you verify the file is what you want it to be?
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Jan  8 16:37:01 socorro1 crond[2172]: (CRON) bad username (/etc/cron.d/backup_postgres_weekly)
Jan  8 16:37:01 socorro1 crond[2172]: (CRON) bad username (/etc/cron.d/backup_postgres_weekly)

The cron job didn't have a username specified. 

Please add 'root' as the username in /etc/cron.d/backup_postgres_weekly and then the cron will run.

Probably has to wait until tomorrow because of svn outage.
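Files in /etc/cron.d need a user field between the schedule and the command, unlike per-user crontabs. A quick lint for that could look like the sketch below. It is a rough heuristic under stated assumptions: the helper name check_crond_user is made up, it only handles five-field schedules (not @reboot-style entries), and it treats a sixth field starting with "/" as a command path where the username should be.

```shell
# check_crond_user FILE
# Print schedule lines whose sixth field looks like a command path instead of a
# username, and return non-zero if any are found. Hypothetical helper.
check_crond_user() {
    awk '
        /^[ \t]*(#|$)/ { next }                          # skip comments and blank lines
        /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ { next }  # skip VAR=value settings
        NF < 7 || $6 ~ /^\// {                           # too few fields, or path in user slot
            print FILENAME ":" NR ": " $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Running `check_crond_user /etc/cron.d/backup_postgres_weekly` after a puppet run would flag the kind of line that produces the "bad username" message.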
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: sdeckelmann → scabral
(Assignee)

Comment 37

5 years ago
Updated the template to have root:

+++ modules/postgres2/files/scripts/backup_postgres.sh  (working copy)
# Backup postgres database on a weekly basis as root user
<% if globals and nodb -%>
0 2 * * 6 root /root/bin/backup_postgres.sh -r <%= hostname %> -p <%= pg_backup_path %> -v <%= postgres_version %> -g -n
<% elsif globals -%>
0 2 * * 6 root /root/bin/backup_postgres.sh -d <%= database %> -r <%= hostname %> -p <%= pg_backup_path %> -v <%= postgres_version %> -g
<% end -%>

Committed revision 56536.
Status: REOPENED → NEW
(Assignee)

Comment 38

5 years ago
Is there any more work on this?
Whiteboard: [2013q1]
This is done.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → Data & BI Services Team