Currently, sometime between 12:30am and 1:30am Pacific time each night, bmo locks up. After a while, connections die with 'too many connection errors', then Apache stops responding. Eventually (20-40 minutes later), everything works as normal.

The backups are apparently being done with the equivalent of:

nice mysqldump bugs | nice gzip -9c > backups.gz

There are several problems with this:

- There aren't any explicit locks being taken, so this could lead to inconsistent backups. We can't totally fix this, since we don't have transactions, but we could do better.
- Since mysqldump has to grab a read lock, no one can write to the db (including to log in). I suspect the |nice| lowers the dump's priority in favour of the people trying to log in, who then sit waiting on the write lock. That's just a guess, though, but you could probably confirm it with some processlist output while the backup is in progress.

The minimal solution is to use mysqldump --opt, but it's probably better to use mysqlhotcopy instead. That would lock the entire db for as long as it takes to copy all the tables somewhere else - see the man page for more details. On IRC, myk said that would only be a couple of minutes, which is probably sensible. (See the sketch below for what either option would look like.)
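A rough sketch of both options, assuming the backup still runs from cron on the db host; the mysqlhotcopy target directory here is just an example:

# option 1 (minimal change): let mysqldump lock the tables itself and stream rows
# (--opt turns on --lock-tables, --quick, --extended-insert and friends in one flag)
nice mysqldump --opt bugs | nice gzip -9c > backups.gz

# option 2: mysqlhotcopy locks and flushes the tables, copies the raw table files
# to a directory, then releases the lock - a much shorter lock window
nice mysqlhotcopy bugs /var/backups/bugs_copy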
ping? The backups regularly lock up bugzilla at around 1am Pacific, for 20 minutes or so, while everyone waits on write locks for various things. I don't know how long mysqlhotcopy will take, but it can't be any slower, and should be much faster. If you're paranoid^Wcautious, you could do a dump once a week, rather than daily, or something.
I've got this command running nightly on buttmonkey now:

cd /var/lib/mysql; mv bugs-bak.tar.gz bugs-bak-old.tar.gz; mysqlhotcopy bugs && tar --create --to-stdout --remove-files bugs_copy | gzip - > bugs-bak.tar.gz && rm -rf bugs_copy

Ping me in a week, and if all is well I'll remove the existing backup code and rely on this one instead (possibly moving this one to b.m.o in the process).
OK. I'm not sure how much the tar/gzip will give you, though, since you have binary data. You may want to leave off the rm -rf at the end at some point, just to see if it's worth it.
The DB is 3.2GB at the moment and 1.6GB compressed, so compression seems to be useful. The trailing "rm -rf" is a puzzler; shouldn't "--remove-files" cause tar to remove bugs_copy after archiving it? Or does it only remove the files and not the directory?
My docs don't have a --remove-files option, so I don't know what it does... You probably want to do a 'real' backup weekly, though, just for sanity's sake.
Hmm, mothra doesn't have a remove-files option either, so I guess it's irrelevant. The man page on buttmonkey says it removes files but doesn't mention directories. What's a "real" backup?
mysqldump is what I meant by a 'real' backup. Maybe I'm just paranoid. OTOH, there are replicated backups on a separate machine now, and the chances of that machine and both of bmo's disks dying are probably fairly low, let alone that happening and the backups dying somehow... I'm just thinking back to that corrupted index problem we had a while back, plus a general feeling of paranoia. How long does the mysqlhotcopy take compared to the dump, btw?
We fixed the corrupted index problems by dumping the data and reimporting it. Doing a mysqldump now doesn't give us anything that a hotcopy doesn't, since we can always dump/reimport the hotcopy later (rough sketch below). hotcopy takes about 5 minutes on mothra; dump takes somewhere around an hour.
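For the record, a rough sketch of what "dump/reimport the hotcopy later" could look like; the paths and init script here are assumptions, and it relies on the tables being plain MyISAM files:

# hypothetical restore: with mysqld stopped, copy the hotcopied table files
# back into the data directory and restart
/etc/init.d/mysql stop
tar xzf bugs-bak.tar.gz -C /tmp            # unpacks the bugs_copy directory
cp /tmp/bugs_copy/* /var/lib/mysql/bugs/
chown -R mysql:mysql /var/lib/mysql/bugs
/etc/init.d/mysql start
# and if a plain SQL dump is wanted, take one from the restored copy
mysqldump --opt bugs | gzip -9c > bugs-dump.sql.gz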
Is this still an issue? Can't the shadow db be backed up now that it's reliable?
Now that we're using real replication rather than the shadowdb stuff, a backup of the replica would be fine. I haven't noticed this lockup for a while. Whether that's due to lower load, or a faster machine/disks/etc., I'm not sure.
We've been doing this for the last three months. I've been getting cronmail with the results of "mysqlhotcopy --allowold --keepold bugs /data/backups" on megalon (the replicant server) every night since then.
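For completeness, a hypothetical crontab entry for this on megalon - the schedule and the MAILTO address are made up, and cron mails the command's output, which is where the cronmail comes from:

MAILTO=sysadmins@example.org
# nightly hotcopy of the replicated bugs db; --allowold/--keepold rotate the
# previous copy to *_old instead of failing or deleting it
15 1 * * * mysqlhotcopy --allowold --keepold bugs /data/backups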