Closed Bug 818989 Opened 12 years ago Closed 11 years ago

Backups failed due to bug in 9.2.1 through 9.2.4 - coordinate debugging with a debugging patch

Categories

(Data & BI Services Team :: DB: MySQL, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: selenamarie, Assigned: selenamarie)

References

Details

We had a security and bugfix release for PostgreSQL today. These include fixing issues with routine maintenance commands like CREATE INDEX CONCURRENTLY and some replication bugs. At the earliest convenient time, we should upgrade to 9.2.2. This process is just: * Install new packaged version of 9.2 * Stop socorro * Stop/start databases * Restart socorro
Assignee: server-ops-database → mpressman
9.2.3 will be out by the time it's possible to upgrade.
Summary: Schedule 9.2.2 upgrade for Socorro databases → Schedule 9.2.3 upgrade for Socorro databases
We hit a bug in 9.2.1 today: # Make a backup! Tue Feb 19 00:02:01 UTC 2013 ## Destructively clean up local /data/pgbasebackups/9.2/2013021900/data ## Making base backup transaction log start point: 1097/BC7A0460 pg_basebackup: starting background WAL receiver pg_basebackup: streaming header too small: 25 transaction log end point: 1098/14940E60 pg_basebackup: waiting for background process to finish streaming... pg_basebackup: child process exited with error 1 This bug was fixed here: http://www.postgresql.org/message-id/CABUevEyv+YCQ26eXFKgBetDekeSp_YCJs80m1N=bFy3a47-GOg@mail.gmail.com And the commit is present in 9.2.3 We have 9.2.3 available on backup4, but the server must be upgraded to fix this problem.
Summary: Schedule 9.2.3 upgrade for Socorro databases → Backups failed due to bug in 9.2.1 - Schedule 9.2.3 upgrade for Socorro databases
Blocks: 823507
Tue Feb 19 18:04:10 UTC 2013 ## Destructively clean up local /data/pgbasebackups/9.2/2013021918/data ## Making base backup transaction log start point: 109D/3F26C6D0 pg_basebackup: starting background WAL receiver pg_basebackup: streaming header too small: 25 Ugh. Still happening.
Matt -- Could you confirm that the pg_basebackup is now working since the upgrade. Thanks! -selena
Flags: needinfo?(mpressman)
Since the update to 9.2.4 on master01 we have not hit this issue, here is the output from /var/log/base_backup.log Fri Apr 5 17:58:18 UTC 2013 # Cleaning up old backup directories Nothing to remove! Only 5 backups present. # Done! Fri Apr 5 17:58:18 UTC 2013 # Make a backup! Sat Apr 6 16:02:01 UTC 2013 ## Destructively clean up local /data/pgbasebackups/9.2/2013040616/data ## Making base backup transaction log start point: 1211/AF8076C0 pg_basebackup: starting background WAL receiver transaction log end point: 1212/25DB3D20 pg_basebackup: waiting for background process to finish streaming... pg_basebackup: base backup completed # Done with backup! Sat Apr 6 17:58:13 UTC 2013 # Cleaning up old backup directories 2013032800 # Done! Sat Apr 6 17:58:14 UTC 2013 # Make a backup! Sun Apr 7 16:02:01 UTC 2013 ## Destructively clean up local /data/pgbasebackups/9.2/2013040716/data ## Making base backup transaction log start point: 1218/B33B490 pg_basebackup: starting background WAL receiver transaction log end point: 1218/768B8788 pg_basebackup: waiting for background process to finish streaming... pg_basebackup: base backup completed # Done with backup! Sun Apr 7 17:58:44 UTC 2013 # Cleaning up old backup directories 2013032900 # Done! Sun Apr 7 17:58:45 UTC 2013 # Make a backup! Mon Apr 8 16:02:01 UTC 2013 ## Destructively clean up local /data/pgbasebackups/9.2/2013040816/data ## Making base backup transaction log start point: 121D/3022C9C0 pg_basebackup: starting background WAL receiver transaction log end point: 121D/9ABB6538 pg_basebackup: waiting for background process to finish streaming... pg_basebackup: base backup completed # Done with backup! Mon Apr 8 17:58:52 UTC 2013 # Cleaning up old backup directories 2013040100 # Done! Mon Apr 8 17:58:52 UTC 2013 # Make a backup! Tue Apr 9 16:02:01 UTC 2013 ## Destructively clean up local /data/pgbasebackups/9.2/2013040916/data ## Making base backup transaction log start point: 1222/58023B00 pg_basebackup: starting background WAL receiver transaction log end point: 1222/D65F8F58 pg_basebackup: waiting for background process to finish streaming... pg_basebackup: base backup completed # Done with backup! Tue Apr 9 17:59:01 UTC 2013 # Cleaning up old backup directories 2013040300 # Done!
Flags: needinfo?(mpressman)
Great! Thank you
apparently this is still a problem, since the upgrade we have not had an issue until today's backup: # Make a backup! Wed Apr 10 16:02:01 UTC 2013 ## Destructively clean up local /data/pgbasebackups/9.2/2013041016/data ## Making base backup transaction log start point: 1226/CC2A4360 pg_basebackup: starting background WAL receiver pg_basebackup: streaming header too small: 25 transaction log end point: 1228/48FFE918 pg_basebackup: waiting for background process to finish streaming... pg_basebackup: child process exited with error 1
(In reply to Matt Pressman [:mpressman] from comment #7) > apparently this is still a problem, since the upgrade we have not had an > issue until today's backup: Ugh. Alright. I'm going to have a look at compiling a debugging version of PostgreSQL to see if we can catch this in the act. I have to address a couple other tickets before that unfortunately.
Flags: needinfo?(sdeckelmann)
Matt, Selena - is this still a problem? backups failing seems like a non-trivial thing, is there an update on this bug?
(In reply to Sheeri Cabral [:sheeri] from comment #9) > Matt, Selena - is this still a problem? backups failing seems like a > non-trivial thing, is there an update on this bug? I'm not sure -- maybe matt could check on the backups? Also, monitoring for the backups through nagios would be great.
Flags: needinfo?(sdeckelmann)
Still happening: # Make a backup! Wed Sep 4 16:02:01 UTC 2013 ## Making base backup transaction log start point: 14D5/E1440C28 pg_basebackup: starting background WAL receiver pg_basebackup: streaming header too small: 25 transaction log end point: 14D6/53FBDBF8 pg_basebackup: waiting for background process to finish streaming... pg_basebackup: child process exited with error 1 # Done with backup! Wed Sep 4 18:37:03 UTC 2013 # Cleaning up old backup directories 2013090116 # Done! Looks like we get a few backups succeeding, sometimes.
Summary: Backups failed due to bug in 9.2.1 - Schedule 9.2.3 upgrade for Socorro databases → Backups failed due to bug in 9.2.1 through 9.2.4 - coordinate debugging with a debugging patch
I modified the base_backup.sh script so that if it catches an error, it will re-run until it completes successfully. There is a nagios check for the postgres backups, however it doesn't look like it is applied. Additionally, that check it could/should probably be more robust. I'll work on that as well.
Blocks: 914155
Blocks: 922156
Assignee: mpressman → sdeckelmann
No longer blocks: 922156
Depends on: 922156
Progress today: * installed the following on backup4 to support a build: > sudo yum install -y bison-devel readline-devel zlib-devel openssl-devel wget > sudo yum groupinstall -y 'Development Tools' * Built a copy of Postgres from commit 061b88c732952c59741374806e1e41c1ec845d50, which includes a fix for pg_basebackup. :mpressman is kicking off a backup now to test it.
Had to recompile the 9.2 version. 9.3 is not backward compatible for streaming replication.
No longer depends on: 922156
Confirmed that this fix is in. Version 9.2.6 of Postgres will include the fix.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Data & BI Services Team
You need to log in before you can comment on or make changes to this bug.