Status

bugzilla.mozilla.org
General
RESOLVED FIXED
15 years ago
7 years ago

People

(Reporter: bbaetz, Assigned: myk)

(Reporter)

Description

15 years ago
Currently, some time between 12:30am and 1:30am Pacific time each night, bmo
locks up. After a while, connections die with 'too many connection errors',
then Apache stops responding. Eventually (20-40 minutes later), everything works
as normal again.

The backups are apparently being done with the equivalent of:

nice mysqldump bugs | nice gzip -9c > backups.gz

There are several problems with this:

- There aren't any explicit locks being taken, so this could lead to
inconsistent backups. We can't totally fix this, since we don't have
transactions, but we could do better.

- Since mysqldump has to grab a read lock, no one can write to the db (including
to log in). I suspect that the |nice| lowers the dump's priority below that of
the people who are trying to log in and waiting to write. That's just a guess,
though, but you could probably confirm it with some processlist output while the
backup is in progress.
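One way to test that guess is to snapshot the process list every few seconds while the backup runs and count the threads waiting on a table lock. A minimal bash sketch, assuming the mysql client can authenticate non-interactively (e.g. via ~/.my.cnf); the function names, log path and defaults are hypothetical, and the lock-wait state strings it greps for vary between MySQL versions.

```shell
#!/bin/bash
# Hypothetical helpers for watching lock contention during the backup.
# Assumes the mysql client reads credentials from ~/.my.cnf.

# Append a timestamped SHOW FULL PROCESSLIST snapshot to a log file,
# SAMPLES times, INTERVAL seconds apart.
sample_processlist() {
    local samples=${1:-60} interval=${2:-10} logfile=${3:-/tmp/processlist.log}
    local i=0
    while [ "$i" -lt "$samples" ]; do
        {
            date '+--- %Y-%m-%d %H:%M:%S'
            mysql -e 'SHOW FULL PROCESSLIST'
        } >> "$logfile"
        sleep "$interval"
        i=$((i + 1))
    done
}

# Count logged threads that were waiting on a table lock. Older servers
# report the state "Locked"; newer ones "Waiting for table ... lock".
count_lock_waits() {
    grep -c -E 'Locked|Waiting for table' "${1:-/tmp/processlist.log}"
}
```

Starting `sample_processlist 360 10 &` just before the nightly dump would cover roughly an hour.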

The minimal solution is to use mysqldump --opt, but it's probably better to use
mysqlhotcopy instead. This would lock the entire db for as long as it takes to
copy all the tables somewhere else - see the man page for more details. On IRC,
myk said that that would only be a couple of minutes, which is probably sensible.
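Both suggestions can be sketched as small bash functions. This is an illustration only: the function names and destination paths are hypothetical, credentials are assumed to come from ~/.my.cnf, and mysqlhotcopy only works with file-based tables such as MyISAM (it was dropped from later MySQL releases). With --opt, mysqldump locks the tables of the database being dumped (via --lock-tables) for the duration of the dump; mysqlhotcopy flushes and locks the tables only while it copies their files.

```shell
#!/bin/bash
# Hypothetical nightly backup helpers for the "bugs" database.

# Option 1: keep mysqldump, but add --opt so the tables are locked while
# they are dumped (a consistent snapshot without transactions).
dump_backup() {
    local db=${1:-bugs}
    local out=${2:-/var/backups/$db.sql.gz}   # hypothetical destination
    local -a st
    mysqldump --opt "$db" | gzip -9c > "$out"
    st=("${PIPESTATUS[@]}")
    # Succeed only if both mysqldump and gzip succeeded.
    [ "${st[0]}" -eq 0 ] && [ "${st[1]}" -eq 0 ]
}

# Option 2: mysqlhotcopy locks and flushes the tables, copies the raw
# table files, then releases the locks. --allowold renames an existing
# target instead of aborting.
hotcopy_backup() {
    local db=${1:-bugs}
    local dest=${2:-/var/backups}             # hypothetical destination
    mysqlhotcopy --allowold "$db" "$dest"
}
```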
(Reporter)

Comment 1

15 years ago
ping? The backups regularly lock up bugzilla at around 1am Pacific, for 20
minutes or so, while everyone keeps waiting on the write locks for various things.

I don't know how long mysqlhotcopy will take, but it can't be any slower, and
should be much faster. If you're paranoid^Wcautious, you could do a dump once a
week, rather than daily, or something.
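A sketch of that compromise: a hotcopy every night plus a full mysqldump one night a week. Bash, with a hypothetical function name and backup directory; the day-of-week check uses `date +%u` (1 is Monday, 7 is Sunday), and picking Sunday is arbitrary.

```shell
#!/bin/bash
# Hypothetical nightly entry point: hotcopy every night, plus a full
# mysqldump once a week as an extra sanity backup.
nightly_backup() {
    local dest=${1:-/var/backups}     # hypothetical backup directory
    local dow=${2:-$(date +%u)}       # day of week: 1 = Monday ... 7 = Sunday
    mysqlhotcopy --allowold bugs "$dest" || return 1
    if [ "$dow" -eq 7 ]; then
        mysqldump --opt bugs | gzip -9c > "$dest/bugs-weekly.sql.gz"
    fi
}
```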
(Assignee)

Comment 2

15 years ago
I've got this command running nightly on buttmonkey now:

cd /var/lib/mysql; mv bugs-bak.tar.gz bugs-bak-old.tar.gz; mysqlhotcopy bugs &&
tar --create --to-stdout --remove-files bugs_copy | gzip - > bugs-bak.tar.gz &&
rm -rf bugs_copy

Ping me in a week, and if all is well I'll remove the existing backup code and
rely on this one instead (possibly moving this one to b.m.o in the process).
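A somewhat more defensive variant of that nightly job, as a bash sketch. Assumptions: the data directory is /var/lib/mysql, and mysqlhotcopy run without a target directory leaves its copy in <datadir>/bugs_copy, which is what the command above relies on. It writes the archive with `--file=-`, since GNU tar documents `--to-stdout` as an extraction option, and it builds the new archive under a temporary name so that a failed run never replaces the last good backup. The function name is hypothetical.

```shell
#!/bin/bash
# Defensive sketch of the nightly hotcopy job (hypothetical function name).
# The body is a subshell "( ... )" so the cd and variables stay local.
nightly_hotcopy() (
    datadir=${1:-/var/lib/mysql}   # assumed MySQL data directory
    db=${2:-bugs}
    copy=${db}_copy                # where mysqlhotcopy puts its copy by default
    cd "$datadir" || exit 1

    # Copy the tables; mysqlhotcopy holds its locks only while copying.
    mysqlhotcopy "$db" || exit 1

    # Archive under a temporary name so a failure cannot clobber the last
    # good backup. --file=- sends the archive to stdout.
    tar --create --file=- "$copy" | gzip > "$db-bak.tar.gz.new"
    rc=("${PIPESTATUS[@]}")
    if [ "${rc[0]}" -ne 0 ] || [ "${rc[1]}" -ne 0 ]; then
        rm -f "$db-bak.tar.gz.new"
        exit 1
    fi

    # Rotate: current -> old, new -> current.
    if [ -e "$db-bak.tar.gz" ]; then
        mv -f "$db-bak.tar.gz" "$db-bak-old.tar.gz" || exit 1
    fi
    mv -f "$db-bak.tar.gz.new" "$db-bak.tar.gz" || exit 1

    # Delete the uncompressed copy explicitly instead of relying on tar's
    # --remove-files, which not every tar provides.
    rm -rf "$copy"
)
```

Because the function body runs in a subshell, calling `nightly_hotcopy /var/lib/mysql bugs` from cron leaves the caller's working directory untouched.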
(Reporter)

Comment 3

15 years ago
OK. I'm not sure how much the tar/gzip will give you, though, since you have
binary data. You may want to leave off the rm -rf at the end at some point, just
to see if it's worth it.
(Assignee)

Comment 4

15 years ago
The DB is 3.2GB at the moment and 1.6GB compressed, so compression seems to be
useful.  The trailing "rm -rf" is a puzzler; shouldn't "--remove-files" cause
tar to remove bugs_copy after archiving it?  Or does it only remove the files
and not the directory?
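The ratio is simple to compute: 1.6GB out of 3.2GB means the archive is 50% of the original size. A small bash sketch for checking it on later runs, assuming the uncompressed hotcopy directory is kept around for that run (i.e. the trailing rm -rf is skipped); the function names are hypothetical, and byte counts come from wc -c rather than du so filesystem block rounding does not skew them.

```shell
#!/bin/bash
# Integer percentage of the original size that remains after compression.
percent_of() {
    local raw=$1 packed=$2
    echo $(( packed * 100 / raw ))
}

# Compare the uncompressed tar stream of a hotcopy directory with its
# gzip'd archive, in bytes.
compression_report() {
    local dir=$1 archive=$2
    local raw packed
    raw=$(tar --create --file=- "$dir" | wc -c)
    packed=$(wc -c < "$archive")
    echo "raw=$raw bytes, compressed=$packed bytes ($(percent_of "$raw" "$packed")%)"
}
```

With the figures above, `percent_of 3200 1600` prints 50.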
(Reporter)

Comment 5

15 years ago
My docs don't have a --remove-files option, so I don't know what it does...

You probably want to do a 'real' backup weekly, though, just for sanity's sake.
(Assignee)

Comment 6

15 years ago
Hmm, mothra doesn't have a remove-files option either, so I guess it's
irrelevant.  The man page on buttmonkey says it removes files but doesn't
mention directories.  What's a "real" backup?
(Reporter)

Comment 7

15 years ago
mysqldump is what I meant by a 'real' backup. Maybe I'm just paranoid. OTOH,
there are replicated backups on a separate machine now, and the chances of that
machine and both of bmo's disks dying are probably fairly low, and for that to
happen and the backups to die somehow...

I'm just thinking back to that corrupted index problem we had a while back, plus
a feeling of paranoia.

How long does the mysqlhotcopy take compared to the dump, btw?
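To answer that with numbers, each command can be wrapped in a simple timer. A minimal bash sketch; the `elapsed` helper is hypothetical, and the /tmp targets in the usage comments are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: run a command, report its wall-clock duration on
# stderr, and pass its exit status through unchanged.
elapsed() {
    local start end rc
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "$1 took $((end - start))s (exit $rc)" >&2
    return "$rc"
}

# Example usage (placeholder paths):
#   elapsed mysqlhotcopy --allowold bugs /tmp/hotcopy-test
#   elapsed sh -c 'mysqldump --opt bugs > /tmp/bugs-test.sql'
```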
(Assignee)

Comment 8

15 years ago
We fixed the corrupted indexes problem by dumping the data and reimporting it.
Doing a mysqldump now doesn't give us anything that a hotcopy doesn't, since we
can always dump/reimport the hotcopy later. hotcopy takes about 5 minutes on
mothra; dump takes somewhere around an hour.
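The dump/reimport step could look roughly like this bash sketch: dump one database and load the dump into a freshly created one, which rebuilds every table and index from the rows. The function name, database names and dump path are hypothetical. Dumping a hotcopy rather than the live database would additionally require the copied table files to sit in a data directory that a MySQL server is serving, which this sketch assumes rather than handles.

```shell
#!/bin/bash
# Hypothetical sketch: rebuild a database by dumping it and loading the
# dump into a new, empty database.
dump_and_reimport() {
    local src=${1:-bugs} dst=${2:-bugs_rebuilt}
    local dumpfile=${3:-/tmp/$src.sql}        # hypothetical scratch file
    mysqldump --opt "$src" > "$dumpfile" || return 1
    mysqladmin create "$dst" || return 1
    mysql "$dst" < "$dumpfile"
}
```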
Is this still an issue?  Can't shadow be backed up now it's reliable?
(Reporter)

Comment 10

14 years ago
Now that we're using real replication rather than the shadowdb stuff, a backup
of the replica would be fine. I haven't noticed this lockup for a while.
Whether that's due to lower load or a faster machine/disks/etc., I'm not sure.
We've been doing this for the last three months.

I've been getting cronmail with the results of "mysqlhotcopy --allowold
--keepold bugs /data/backups" on megalon (the replicant server) every night
since then.
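A crontab entry along these lines would produce that nightly cronmail, since cron mails any output of a job to the crontab's owner (or to MAILTO, if set). The bash helper below adds it idempotently; the 01:30 schedule and the function and variable names are hypothetical, and only the mysqlhotcopy command itself comes from this bug.

```shell
#!/bin/bash
# Hypothetical cron line for the replica; the schedule is an assumption.
BACKUP_CRON_LINE='30 1 * * * mysqlhotcopy --allowold --keepold bugs /data/backups'

# Add the line to the current user's crontab unless it is already there.
install_backup_cron() {
    local current
    current=$(crontab -l 2>/dev/null)
    if printf '%s\n' "$current" | grep -qxF "$BACKUP_CRON_LINE"; then
        return 0   # already installed
    fi
    printf '%s\n%s\n' "$current" "$BACKUP_CRON_LINE" | crontab -
}
```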
Status: NEW → RESOLVED
Last Resolved: 14 years ago
Resolution: --- → FIXED
Component: Bugzilla: Other b.m.o Issues → General
Product: mozilla.org → bugzilla.mozilla.org