Closed Bug 823186 Opened 12 years ago Closed 11 years ago

puppetize pg_dump backups

Tracking: Not tracked

Status: RESOLVED FIXED

People: Reporter: scabral, Assigned: scabral

Details: Whiteboard: [2013q1]

Attachments (2 files, 9 obsolete files)

First rev of a script to run backups from prod to stage for Socorro (12 years ago, Selena Deckelmann :selenamarie :selena, 1.19 KB, text/plain)
Second rev of backup script, now deletes backups older than 41 days (12 years ago, Selena Deckelmann :selenamarie :selena, 1.27 KB, text/plain)
Restore script for pg_dump backups (11 years ago, Selena Deckelmann :selenamarie :selena, 7.18 KB, text/plain)
Restore script (11 years ago, Matt Pressman [:mpressman], 7.12 KB, application/x-shellscript)
pg_dump Restore script (11 years ago, Matt Pressman [:mpressman], 7.27 KB, text/plain)
Diff that was applied - something isn't quite right with the import for the pgdump.pp though :/ (11 years ago, Selena Deckelmann :selenamarie :selena, 6.92 KB, text/plain)
Diff to add backups to puppet (11 years ago, Selena Deckelmann :selenamarie :selena, 3.68 KB, text/plain)
Addresses commits by sheeri in version 55730 (11 years ago, Selena Deckelmann :selenamarie :selena, 3.39 KB, text/plain)
This should fix it all. (11 years ago, Selena Deckelmann :selenamarie :selena, 2.56 KB, text/plain)
Fix for duplication in the backup_postgres_weekly cron template (11 years ago, Selena Deckelmann :selenamarie :selena, 876 bytes, patch)
Fix three problems :) - cronjob duplicate removal, script path and filenaming problem in backup_postgres.sh (11 years ago, Selena Deckelmann :selenamarie :selena, 2.27 KB, patch)

Sheeri Cabral [:sheeri]

Assignee

Description

12 years ago

Earlier today, Selena and I were trying to find the stage backups for socorro. I checked puppet, and I couldn't find a pg_dump file in puppet.

localhost:trunk scabral$ find . -type f -exec grep -l pg_dump {} \; | grep -v \.svn
./files/specific/internal2.db.phx1.mozilla.com/var/lib/pgsql/jira_backup.sh
./files/specific/internal2.db.phx1.mozilla.com/var/lib/pgsql/mana_backup.sh
localhost:trunk scabral$

(and just for verification:
localhost:trunk scabral$ cat files/specific/internal2.db.phx1.mozilla.com/var/lib/pgsql/*backup.sh
#!/bin/bash
/usr/bin/pg_dump projects_mozilla_org > /backup/jira-`date +%y%m%d_%H%M%S`.dump
#!/bin/bash
/usr/bin/pg_dump mana_mozilla_org > /backup/mana-`date +%y%m%d_%H%M%S`.dump
)

So we should find the backup script for stage, clean it up so it meets standards, and put it in puppet, including the cron file. I can help with this; I put the backup scripts in place for MySQL.

Selena Deckelmann :selenamarie :selena

Comment 1

12 years ago

Dumitru and I are working on this!

Assignee: mpressman → sdeckelmann

Sheeri Cabral [:sheeri]

Assignee

Comment 2

12 years ago

As per IRC, Matt's OK with Selena working on this.

Sheeri Cabral [:sheeri]

Assignee

Updated

12 years ago

Blocks: 823194

Selena Deckelmann :selenamarie :selena

Comment 3

12 years ago

From a manual run of a new script we're testing: 

Started around 12:30pm PT.

Trying to backup breakpad for tp-socorro01-master02.phx1.mozilla.com

real    211m47.838s
user    206m33.543s
sys     2m53.581s
Backup for database tp-socorro01-master02.phx1.mozilla.com/breakpad on 20121219 succeeded
Trying to backup globals for tp-socorro01-master02.phx1.mozilla.com

real    0m0.228s
user    0m0.003s
sys     0m0.007s
Backup for database globals on tp-socorro01-master02.phx1.mozilla.com on 20121219 succeeded

Selena Deckelmann :selenamarie :selena

Comment 4

12 years ago

At least part of this backup is good.

Would like to verify, but don't currently have enough disk space anywhere :/

[postgres@socorro1 backups]$ time pg_restore  -Fc -a -d breakpad -t reports_20121210 -v --disable-triggers -v -v -v breakpad-db-20121219.dump
pg_restore: connecting to database for restore
pg_restore: disabling triggers for reports_20121210
pg_restore: restoring data for table "reports_20121210"
pg_restore: enabling triggers for reports_20121210
pg_restore: setting owner and privileges for TABLE DATA reports_20121210

real    28m6.136s
user    0m17.220s
sys     0m1.528s

Selena Deckelmann :selenamarie :selena

Updated

12 years ago

Blocks: 823507

Selena Deckelmann :selenamarie :selena

Comment 5

12 years ago

Attached file: First rev of a script to run backups from prod to stage for Socorro (obsolete)

Selena Deckelmann :selenamarie :selena

Comment 6

12 years ago

Attached file: Second rev of backup script, now deletes backups older than 41 days (obsolete)

Added find to delete old files and timestamps on start/finish
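
For readers without access to the attachment, here is a minimal sketch of what a find-based cleanup with start/finish timestamps might look like. It is not the attached script: the function name prune_old_backups, the *.dump pattern, and the directory argument are assumptions made for illustration, and -maxdepth and -delete assume GNU or BSD find.

```shell
#!/bin/bash
# Hypothetical sketch, not the attached script: prune old dump files and
# print a timestamp when the cleanup starts and when it finishes.

# prune_old_backups DIR [DAYS]
# Deletes regular files named *.dump directly inside DIR whose age, counted
# in whole 24-hour periods, is greater than DAYS (default 41).
prune_old_backups() {
    local backup_dir="$1"
    local retention_days="${2:-41}"

    echo "Prune started: $(date)"
    # -mtime +N matches files whose age in whole days exceeds N.
    # -print lists each file as it is removed by -delete.
    find "$backup_dir" -maxdepth 1 -type f -name '*.dump' \
        -mtime +"$retention_days" -print -delete
    echo "Prune finished: $(date)"
}
```

Something like `prune_old_backups /pgdata/backups 41` would then run the cleanup; the path here is only borrowed from later comments in this bug as an example.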

Attachment #694365 - Attachment is obsolete: true

Sheeri Cabral [:sheeri]

Assignee

Comment 7

12 years ago

Can you verify by using the backup to refresh stage?

Selena Deckelmann :selenamarie :selena

Comment 8

12 years ago

It's possible, with the following caveat: 

I'm not yet sure how long it takes to restore a pg_dump backup - it could be up to three days. :/ It would be better to restore this somewhere else first to understand how long it takes, so that we don't disrupt the staging environment for an uncertain amount of time.

Selena Deckelmann :selenamarie :selena

Updated

12 years ago

Summary: puppetize stage backups for socorro → puppetize pg_dump backups for socorro that are written to stage

Sheeri Cabral [:sheeri]

Assignee

Comment 9

11 years ago

How big is the backup? I can give you 2 machines that have 246G each...

Matt Pressman [:mpressman]

Comment 10

11 years ago

Can we retry using the pgslave instance? The last time this was attempted, the backup taken was bad; the last few, judging by size alone, are more in line with what a good backup should be.

Selena Deckelmann :selenamarie :selena

Comment 11

11 years ago

(In reply to Sheeri Cabral [:sheeri] from comment #9)
> How big is the backup? I can give you 2 machines that have 246G each...

The restore requires about 750 GB.

(In reply to Matt Pressman [:mpressman] from comment #10)
> Can we retry using the pgslave instance? The last time this was attempted
> the backup taken was bad, the last few, just by size are more in line with
> what a good backup should be.

Retry the restore or the backup itself?
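
Given the roughly 750 GB the restore needs, a pre-flight free-space check could save a failed multi-hour run. The following is only a sketch under assumptions: the function name has_free_space is made up for illustration, and it relies on the POSIX output format of df -Pk (available space in the fourth column, in 1024-byte blocks).

```shell
#!/bin/bash
# Hypothetical pre-flight check, not part of any script attached here.

# has_free_space PATH REQUIRED_GB
# Succeeds if the filesystem holding PATH has at least REQUIRED_GB
# gigabytes (GiB) available.
has_free_space() {
    local path="$1" required_gb="$2" avail_kb
    # df -P forces the portable one-line-per-filesystem layout and -k
    # reports sizes in 1024-byte blocks; line 2, column 4 is "Available".
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}
```

A restore wrapper might call `has_free_space /pgdata 750 || exit 1` before starting; the /pgdata path is an example only.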

Matt Pressman [:mpressman]

Comment 12

11 years ago

retry the restore

Selena Deckelmann :selenamarie :selena

Comment 13

11 years ago

Scheduled a meeting today at 1:30 to review mpressman's backup testing script.

Selena Deckelmann :selenamarie :selena

Comment 14

11 years ago

Script for restores tested on socorro1.dev with a 'test' database.

Here are the TODOs left: 
# TODO see if we could do this without the awk statement
# TODO if we don't have REMOTEDUMPFILE or REMOTEDUMPGLOBALS set, fail
# TODO figure out how to do this with the postgres user
# TODO Could we have puppet install the configs?

Plan: 

* Monday: Blow away the pgslave DB and do a full test restore into a new directory called "testrestore"
* Monitor progress - expect this to take between 24-72 hours to complete

Selena Deckelmann :selenamarie :selena

Comment 15

11 years ago

Attached file: Restore script for pg_dump backups (obsolete)

Selena Deckelmann :selenamarie :selena

Comment 16

11 years ago

Next steps for these scripts: 

* Get backup_postgres.sh included in Puppet config for stage, located in /root/bin
* Get restore_postgres.sh included in Puppet config for dev, located in /var/lib/postgres/bin

:mpressman to take care of this config

Selena Deckelmann :selenamarie :selena

Updated

11 years ago

Assignee: sdeckelmann → mpressman

Matt Pressman [:mpressman]

Comment 17

11 years ago

The awk statement is gone, and a check that fails if the variables REMOTEDUMPFILE and REMOTEDUMPGLOBALS are not set has been put in.
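
As an illustration of that kind of guard (a sketch only, not necessarily how the attached restore script does it), the standard ${VAR:?message} parameter expansion aborts a non-interactive shell when a variable is unset or empty. The function name check_restore_config is hypothetical.

```shell
#!/bin/bash
# Hypothetical guard for the two settings named in this comment.

# check_restore_config
# Aborts the current (sub)shell with a message on stderr if either
# REMOTEDUMPFILE or REMOTEDUMPGLOBALS is unset or empty.
check_restore_config() {
    # ":" is the no-op builtin; it exists here only so the expansions run.
    : "${REMOTEDUMPFILE:?REMOTEDUMPFILE must be set}"
    : "${REMOTEDUMPGLOBALS:?REMOTEDUMPGLOBALS must be set}"
}
```

Calling check_restore_config near the top of a script stops it before any restore work begins; wrapping the call in a subshell, as in `( check_restore_config )`, turns the abort into an ordinary non-zero exit status.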

Selena Deckelmann :selenamarie :selena

Updated

11 years ago

Attachment #695855 - Attachment mime type: application/x-shellscript → text/plain

Matt Pressman [:mpressman]

Comment 18

11 years ago

Attached file: Restore script (obsolete)

Matt Pressman [:mpressman]

Comment 19

11 years ago

Attached file: pg_dump Restore script

Selena Deckelmann :selenamarie :selena

Comment 20

11 years ago

We kicked off a restore of a pg_dump backup.

[postgres@socorro1 ~]$ scripts/restore_postgres.sh
Mon Jan  7 15:17:54 PST 2013

Selena Deckelmann :selenamarie :selena

Updated

11 years ago

Attachment #698096 - Attachment is obsolete: true

Selena Deckelmann :selenamarie :selena

Updated

11 years ago

Attachment #698859 - Attachment is obsolete: true

Selena Deckelmann :selenamarie :selena

Updated

11 years ago

Attachment #698895 - Attachment description: Working Version → pg_dump Restore script

Attachment #698895 - Attachment mime type: application/x-shellscript → text/plain

Selena Deckelmann :selenamarie :selena

Comment 21

11 years ago

## Restore of breakpad dump from /pgdata/tmp/breakpad-db-20130106.dump succeeded
# Done!
Tue Jan  8 03:01:28 PST 2013

[postgres@socorro1 pgslave]$ du -sh .
444G    .

Still need to do some verification.

Selena Deckelmann :selenamarie :selena

Comment 22

11 years ago

Adding puppetizing the pg_dump backups for mana to this ticket

Summary: puppetize pg_dump backups for socorro that are written to stage → puppetize pg_dump backups

Selena Deckelmann :selenamarie :selena

Comment 23

11 years ago

Attached file: Diff that was applied - something isn't quite right with the import for the pgdump.pp though :/ (obsolete)

Attachment #695855 - Attachment is obsolete: true

Sheeri Cabral [:sheeri]

Assignee

Comment 24

11 years ago

So we've made the postgres2::server::backups::pgdump class at /modules/postgres2/manifests/backups/pgdump.pp, see the file in svn, and the 

socorro1.stage.db.phx1.mozilla.com

entry in manifests/nodes/socorro.pp references it with:


    postgres2::server::backups::pgdump {
        "socorro-stage":
            database         => 'breakpad',
            globals          => true,
            hostname         => 'tp-socorro01-master02',
            cluster          => 'main',
            postgres_version => '9.2',
            pg_backup_prefix => '/pgdata/backups',
            frequency        => "weekly";
    }


But we can't figure out why it's not working (specifically, why /etc/cron.d doesn't have backup_postgres_daily and backup_postgres_weekly)... halp?

Selena Deckelmann :selenamarie :selena

Comment 25

11 years ago

Attached file: Diff to add backups to puppet (obsolete)

Ok -- I made a few mistakes :)

The class name should have been postgres2::backups::pgdump, and I also failed to define a variable needed by the crontab .erb templates.

Those errors are fixed in a diff against version 55716.

Selena Deckelmann :selenamarie :selena

Updated

11 years ago

Attachment #699387 - Attachment is obsolete: true

Selena Deckelmann :selenamarie :selena

Comment 26

11 years ago

Attached file: Addresses commits by sheeri in version 55730 (obsolete)

Attachment #699437 - Attachment is obsolete: true

Selena Deckelmann :selenamarie :selena

Comment 27

11 years ago

Still busted.  Waiting for someone better with puppet to help out.

Selena Deckelmann :selenamarie :selena

Comment 28

11 years ago

Attached file: This should fix it all. (obsolete)

I had put this in socorroadm instead of socorro1.db. This diff moves it to the correct node and fixes an error that had crept into a template.

Attachment #699439 - Attachment is obsolete: true

Selena Deckelmann :selenamarie :selena

Updated

11 years ago

Assignee: mpressman → sdeckelmann

Brandon Burton [:solarce]

Comment 29

11 years ago

bburton@voltaire [04:32:41] [~/code/mozilla/sysadmins/puppet/trunk/modules/postgres2/templates/default] 
-> % svn ci -m "updates to template from selenamarie, bug 823186"
Sending        default/backup_postgres_daily.erb
Transmitting file data .
Committed revision 55779.

bburton@voltaire [04:34:23] [~/code/mozilla/sysadmins/puppet/trunk/manifests/nodes] 
-> % svn ci -m "changes to stage postgres2 class paramaeters, from selenamarie, bug 823186"
Sending        nodes/socorro.pp
Transmitting file data .
Committed revision 55781.

Brandon Burton [:solarce]

Comment 30

11 years ago

File pushed

info: FileBucket adding {md5}3f41c8d2d455310f833f4192a54f1ac8
info: /Stage[main]/Postgres2::Backups::Pgdump/File[/root/bin/backup_postgres.sh]: Filebucketed /root/bin/backup_postgres.sh to main with sum 3f41c8d2d455310f833f4192a54f1ac8
notice: /Stage[main]/Postgres2::Backups::Pgdump/File[/root/bin/backup_postgres.sh]/content: content changed '{md5}3f41c8d2d455310f833f4192a54f1ac8' to '{md5}40b71cc4c90b2fc10f25ab80e106c4e0'
notice: /Stage[main]/Postgres2::Backups::Pgdump/File[/root/bin/backup_postgres.sh]/mode: mode changed '0700' to '0755'
notice: Finished catalog run in 42.58 seconds

Brandon Burton [:solarce]

Comment 31

11 years ago

bburton@voltaire [05:22:19] [~/code/mozilla/sysadmins/puppet/trunk/modules/postgres2/manifests] 
-> % svn ci -m "fixes for psql paths, from selenamarie, bug 823186"
Sending        manifests/create/database.pp
Sending        manifests/create/role.pp
Sending        manifests/server.pp
Transmitting file data ...
Committed revision 55791.

Selena Deckelmann :selenamarie :selena

Comment 32

11 years ago

Attached patch: Fix for duplication in the backup_postgres_weekly cron template (obsolete)

Attachment #699494 - Attachment is obsolete: true

Selena Deckelmann :selenamarie :selena

Comment 33

11 years ago

Attached patch: Fix three problems :) - cronjob duplicate removal, script path and filenaming problem in backup_postgres.sh

Attachment #700544 - Attachment is obsolete: true

Selena Deckelmann :selenamarie :selena

Comment 34

11 years ago

(In reply to Selena Deckelmann :selenamarie :selena from comment #33)
> Created attachment 700546 [details] [diff] [review]
> Fix three problems :) - cronjob duplicate removal, script path and
> filenaming problem in backup_postgres.sh

Alright - could someone apply the above diff? After that, this ticket can be closed. :)

Sheeri Cabral [:sheeri]

Assignee

Comment 35

11 years ago

localhost:trunk scabral$ svn commit -m "updating as per bug https://bugzilla.mozilla.org/show_bug.cgi?id=823186 comment 33-34"
Sending        modules/postgres2/files/scripts/backup_postgres.sh
Sending        modules/postgres2/templates/default/backup_postgres_weekly.erb
Transmitting file data ..
Committed revision 56012.

Can you verify the file is what you want it to be?

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Selena Deckelmann :selenamarie :selena

Comment 36

11 years ago

Jan  8 16:37:01 socorro1 crond[2172]: (CRON) bad username (/etc/cron.d/backup_postgres_weekly)
Jan  8 16:37:01 socorro1 crond[2172]: (CRON) bad username (/etc/cron.d/backup_postgres_weekly)

The cron job didn't have a username specified. 

Please add 'root' as the username in /etc/cron.d/backup_postgres_weekly and then the cron will run.
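
For context: files under /etc/cron.d use the system crontab layout, where a user name sits between the five schedule fields and the command. A rough lint for this class of mistake could look like the sketch below. The function name check_cron_users is hypothetical, and the check is deliberately simplified: it only understands five-field schedules (not @reboot-style shortcuts) and treats any first field containing "=" as an environment assignment.

```shell
#!/bin/bash
# Hypothetical lint for rendered cron.d files; not part of the puppet module.

# check_cron_users FILE
# Returns non-zero if any job line in FILE has a sixth field that `id`
# cannot resolve to an existing user.
check_cron_users() {
    local file="$1" status=0
    local min hour dom mon dow user rest
    # The "|| [ -n ... ]" clause keeps a final line without a newline.
    while read -r min hour dom mon dow user rest || [ -n "$min" ]; do
        case "$min" in
            ''|'#'*|*=*) continue ;;   # blank line, comment, or VAR=value
        esac
        # With the user field missing, the command path lands in $user
        # and the lookup fails.
        if ! id -u "$user" >/dev/null 2>&1; then
            echo "bad username '$user' in $file" >&2
            status=1
        fi
    done < "$file"
    return "$status"
}
```

Running `check_cron_users /etc/cron.d/backup_postgres_weekly` after a puppet run would flag a line like the one that produced the log messages above.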

Probably has to wait until tomorrow because of svn outage.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Selena Deckelmann :selenamarie :selena

Updated

11 years ago

Assignee: sdeckelmann → scabral

Sheeri Cabral [:sheeri]

Assignee

Comment 37

11 years ago

Updated the template to have root:

+++ modules/postgres2/files/scripts/backup_postgres.sh  (working copy)
# Backup postgres database on a weekly basis as root user
<% if globals and nodb -%>
0 2 * * 6 root /root/bin/backup_postgres.sh -r <%= hostname %> -p <%= pg_backup_path %> -v <%= postgres_version %> -g -n
<% elsif globals -%>
0 2 * * 6 root /root/bin/backup_postgres.sh -d <%= database %> -r <%= hostname %> -p <%= pg_backup_path %> -v <%= postgres_version %> -g
<% end -%>

Committed revision 56536.

Status: REOPENED → NEW

Sheeri Cabral [:sheeri]

Assignee

Comment 38

11 years ago

Is there any more work on this?

Whiteboard: [2013q1]

Selena Deckelmann :selenamarie :selena

Comment 39

11 years ago

This is done.

Status: NEW → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

10 years ago

Product: mozilla.org → Data & BI Services Team
