Closed Bug 540170 Opened 16 years ago Closed 16 years ago

Need to integrate Vertica backups into Bakbone (corp backup tool)

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dre, Assigned: phong)

References

Details

(Whiteboard: Ready for integration)

Currently, we stage the data from the three cm-metricsdb0[1-3] machines onto the attached drive on cm-metricsetl01 (mounted on /home/dbadmin). We really need to reclaim the space on this drive to use it for the etl01 MySQL database. Could we hook up a USB drive or something to either db01 or etl01 and migrate the Vertica backup staging over to it?
We don't need space on etl01 anymore - we have plenty
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
This bug wasn't supposed to be closed. The Vertica backups are slowly getting larger, and we are now at the point that Nagios is warning that this disk is almost full. We need a better way to stage the Vertica data so it can be backed up: either extra disk(s) that we can copy the snapshot data onto, or a more complex approach where the backup daemon runs a script that takes the snapshot, copies the files directly from the machines, and releases the snapshot when done.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 545761
USB drives aren't rack friendly. It's possible for the short term, but I don't think it's the best solution. It also implies you don't care if that drive dies. Does this need to be high performance? How much space do you need? Will iSCSI work?
Worst case, if it dies and takes more than a day to replace, we don't have backups. It doesn't need to be high performance, but it might need to copy a few hundred GB or so. On normal days the transfer is incremental. 2TB would leave room for growth. iSCSI or maybe NFS might be fine; I could even mount it on the three machines and have them sync their own files to it.
I wiped out a bunch of files on the disk just to get rid of the critical page. The files will come back tonight when it attempts to back up, though, and it looks like we are now out of space because free space hit 0% last night.
Summary: Need 1TB of storage that we can use to stage Vertica data prepped for backup → Need 2TB of storage that we can use to stage Vertica data prepped for backup
Need to get resolution on this. I wiped out half a year of versioncheck data (that can be regenerated) in order to try to make it fit, but we are definitely at a critical stage here.
Severity: normal → major
Summary of the current backup process in case it helps for planning:
* at 00:00 PST, etl01 fires a cron job to begin the backup.
* cm-metricsetl01:/home/dbadmin/vertica_cluster_backup/backup.sh performs the following rough logic:
  * clean up previous run scripts and archive run logs
  * build new run scripts and execute them
  * execute SQL command on Vertica cluster to create a snapshot point for a stable hot backup
  * while that snapshot point is active (i.e. a current transaction)
    * get a list of all data files that are part of this snapshot
    * rsync all those files from each of the three servers onto the etl01 backup staging directory
    * delete any old files that are no longer part of backup (just the --delete flag of rsync)
    * backup config files
  * when rsync is finished, close SQL transaction which closes the snapshot window

We could easily have the backups be taken from each machine individually. The only requirement is that the snapshot SQL command be executed and that the transaction not get closed until all the files are rsynced/backed-up.
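The one hard requirement in the process above (the snapshot must stay open until every copy has finished) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the contents of backup.sh: `open_snapshot`, `close_snapshot`, and `run` are hypothetical callables standing in for the Vertica SQL commands and the process launcher, and `source_dir` and `staging_root` are placeholder paths. Only the rsync `-a` and `--delete` flags are real.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Sequence


@contextmanager
def snapshot_window(open_snapshot: Callable[[], None],
                    close_snapshot: Callable[[], None]) -> Iterator[None]:
    """Keep the snapshot open for the whole with-block, then always close it."""
    open_snapshot()
    try:
        yield
    finally:
        # Runs even if a copy fails, so the snapshot is never left open.
        close_snapshot()


def rsync_command(host: str, source_dir: str, staging_root: str) -> List[str]:
    """Build the rsync argument list for one database host (paths are hypothetical)."""
    return [
        "rsync",
        "-a",        # archive mode: recurse and preserve file attributes
        "--delete",  # drop staged files that no longer exist on the source
        f"{host}:{source_dir}/",
        f"{staging_root}/{host}/",
    ]


def run_backup(hosts: Sequence[str], source_dir: str, staging_root: str,
               open_snapshot: Callable[[], None],
               close_snapshot: Callable[[], None],
               run: Callable[[List[str]], object]) -> None:
    """Copy every host's files while the snapshot is held open."""
    with snapshot_window(open_snapshot, close_snapshot):
        for host in hosts:
            run(rsync_command(host, source_dir, staging_root))
```

In a real job, `run` might be `functools.partial(subprocess.run, check=True)` so that a failed rsync raises an error; the `finally` clause would still close the snapshot in that case.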
Assignee: server-ops → mrz
Severity: major → normal
Assignee: mrz → aravind
To be clear, we don't need the space if we can:
* run a command before the backup job runs
* keep the command open while the backup is running
* end the command when the backup is done (Daniel can give details)

If you can do the above, we don't need the 2TB of space. Aravind and Daniel - please sync up on this.
Assignee: aravind → mrz
Assignee: mrz → aravind
Summary: Need 2TB of storage that we can use to stage Vertica data prepped for backup → Need to integrate Vertica backups into Bakbone (corp backup tool)
Aravind, please ping me or call my extension if you want to discuss further. I've gone through and added more comments to the backup.sh file mentioned in comment #8.
After talking to Daniel, here is what we will do.
1. Mount the three machines that have data to be backed up as NFS mounts on the central machine (cm-metricsetl01).
2. Daniel will give me a script to be called that will quiesce the system and generate a list of files that need to be backed up. This script will exit after the file list has been generated (even though it might spawn off additional processes that keep the db in quiesce mode).
3. Bakbone will read this generated file and back up the files listed in it.
4. After the backup is done, Bakbone will call another script to release the db back into its normal mode.

We have not tested how inclusion lists work with Bakbone, so we will have to watch this system pretty closely for a few days once we put this into place. @Daniel, please comment here once your side of things is ready (the NFS mounts and the scripts).
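As a rough illustration of steps 1 to 3, the sketch below turns per-host file lists into paths under the NFS mounts and writes them to a single list file, one path per line. Everything specific here is an assumption: the mount layout `<mount_root>/<host>/...`, the function names, and the one-path-per-line format. How Bakbone actually consumes inclusion lists is untested, as noted above.

```python
from pathlib import Path, PurePosixPath
from typing import Dict, Iterable, List


def to_mounted_paths(files_by_host: Dict[str, Iterable[str]],
                     mount_root: str) -> List[str]:
    """Map each host-local absolute path to its location under the NFS mount."""
    mounted = []
    for host in sorted(files_by_host):
        for local_path in files_by_host[host]:
            # Strip the leading "/" so the path can be re-rooted under the mount.
            relative = PurePosixPath(local_path).relative_to("/")
            mounted.append(str(PurePosixPath(mount_root) / host / relative))
    return mounted


def write_inclusion_list(paths: Iterable[str], list_file: Path) -> int:
    """Write one path per line and return the number of entries written."""
    entries = list(paths)
    list_file.write_text("".join(f"{p}\n" for p in entries))
    return len(entries)
```

Sorting by host keeps the list stable between runs, which makes it easier to diff two nights' lists while the system is being watched.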
Assignee: aravind → deinspanjer
Whiteboard: [needs script]
Punt back when you have your script.
Group: infra → mozilla-stats
Component: Server Operations → Data/Backend Reports
Product: mozilla.org → Mozilla Stats
QA Contact: mrz → data-reports
Version: other → 0.1
Whiteboard: [needs script] → Script will be provided on 2010-04-01
The backup script was reconfigured to store the files for each db machine on a new directory on that machine. The following four directories should be backed up in order to have a complete backup set:
cm-metricsetl01/home/dbadmin/vertica_cluster_backup
cm-metricsdb01/home/dbadmin/vertica_cluster_backup
cm-metricsdb02/home/dbadmin/vertica_cluster_backup
cm-metricsdb03/home/dbadmin/vertica_cluster_backup
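Since a backup set is only complete when all four directories are present, a small check like the following could flag a partial set. It is a sketch under assumptions: it presumes the backed-up or restored data is laid out as `<root>/<host>/home/dbadmin/vertica_cluster_backup`, and the function name is hypothetical.

```python
from pathlib import Path
from typing import List, Sequence

BACKUP_HOSTS = ["cm-metricsetl01", "cm-metricsdb01", "cm-metricsdb02", "cm-metricsdb03"]
BACKUP_SUBDIR = "home/dbadmin/vertica_cluster_backup"


def missing_backup_dirs(root: Path, hosts: Sequence[str] = BACKUP_HOSTS) -> List[str]:
    """Return the hosts whose backup directory is absent or empty under root."""
    missing = []
    for host in hosts:
        backup_dir = root / host / BACKUP_SUBDIR
        # An empty directory counts as missing: nothing was staged for that host.
        if not backup_dir.is_dir() or not any(backup_dir.iterdir()):
            missing.append(host)
    return missing
```

An empty return value means every host contributed at least one file; it says nothing about whether the files themselves are consistent.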
Assignee: deinspanjer → server-ops
Group: mozilla-stats
Component: Data/Backend Reports → Server Operations
Product: Mozilla Stats → mozilla.org
QA Contact: data-reports → mrz
Whiteboard: Script will be provided on 2010-04-01 → Ready for integration
Version: 0.1 → other
Aravind is our backup dude!
Assignee: server-ops → aravind
302 to Phong.
Assignee: aravind → phong
I will add these to tonight's backups.
we have a policy set up for /home so those should be included. let me know if you want everything else to be excluded and only back up /home/dbadmin/vertica_cluster_backup
Status: REOPENED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Yikes, definitely has to be just the /home/dbadmin/vertica_cluster_backup directory, otherwise you'd be taking two copies of all the data.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
updated policy to only back up /home/dbadmin/vertica_cluster_backup
Status: REOPENED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard