Closed
Bug 669710
Opened 14 years ago
Closed 13 years ago
backups for blog.mozilla.com are broken
Categories
(Data & BI Services Team :: DB: MySQL, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nmaul, Assigned: mpressman)
Details
It looks like the backups for blog_mozilla_com_wp database (c01 cluster) are broken. On tm-backup01, the .sql.gz files are only 331 bytes, and contain only the boilerplate comments you usually find in mysqldumps (time zone, collation, etc).
We need to get this fixed, but also find out how long it's been broken for. I have another bug to restore data for a particular blog back to how it was on 5/29. This will likely mean going to tape, unless you can figure out a better way to isolate it.
Running the mysqldump command by hand:
[root@tm-backup01 blog_mozilla_com_wp]# mysqldump --defaults-file=/data/c01/c01.cnf --socket=/var/lib/mysql/c01.sock --max_allowed_packet=1G --allow-keywords --add-drop-table --add-locks --no-autocommit --routines --extended-insert --create-options blog_mozilla_com_wp > ~/jakem-test.sql
mysqldump: Got error: 1016: Can't open file: './blog_mozilla_com_wp/wp_37_commentmeta.frm' (errno: 24) when using LOCK TABLES
Return code for that is 2.
(I've left ~/jakem-test.sql in place so you can see the actual dump file yourselves... although there's nothing substantial in it).
It might be a good idea to update our backup scripts to check for abnormally small dumps, or at least alert someone if the return code is not an expected value. It looks like the script author intended it to generate output on a failure, but I suspect the backticks in the script might be screwing that up, or maybe the alerts are getting ignored or dropped. This is in the cron file: MAILTO="infra-dbnotices@mozilla.com".
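A minimal sketch of the check suggested above, written in Python rather than whatever language the backup script actually uses. The function name `dump_succeeded` and the 1 KiB default threshold are assumptions for illustration; the only figures taken from this bug are the 331-byte dumps and the return code of 2.

```python
import os


def dump_succeeded(returncode: int, dump_path: str, min_bytes: int = 1024) -> bool:
    """Return True only if the dump command exited 0 and wrote a plausibly sized file.

    min_bytes is a hypothetical threshold; tune it per database.
    """
    if returncode != 0:
        # mysqldump exited non-zero (2 in this bug), so the file cannot be trusted.
        return False
    try:
        size = os.path.getsize(dump_path)
    except OSError:
        # A missing or unreadable dump file also counts as a failed backup.
        return False
    return size >= min_bytes
```

A wrapper could call this after each dump and page someone directly on failure instead of relying only on cron mail.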
It looks like the script often also backs up 'information_schema' databases... those should be excluded since there's no data in them.
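One way the exclusion could look, as a sketch. `SKIPPED_SCHEMAS` and `databases_to_dump` are hypothetical names, and skipping `performance_schema` as well is an added assumption that this bug does not mention.

```python
# Schemas that hold no user data. information_schema comes from this bug;
# performance_schema is an added assumption and may not exist on older servers.
SKIPPED_SCHEMAS = {"information_schema", "performance_schema"}


def databases_to_dump(all_databases):
    """Return the database names worth dumping, preserving input order."""
    return [name for name in all_databases if name.lower() not in SKIPPED_SCHEMAS]
```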
Fortunately, after applying some find-fu, I think this is the only database that's supposed to be dumping that isn't working.
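The same kind of sweep can be sketched in Python. This is not the find command that was actually run; `find_small_dumps`, the 1 KiB cutoff, and the `.sql.gz` suffix default are assumptions chosen to match the 331-byte files described above.

```python
import os


def find_small_dumps(root: str, max_bytes: int = 1024, suffix: str = ".sql.gz"):
    """Yield (path, size) for every dump under root no larger than max_bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(suffix):
                continue
            path = os.path.join(dirpath, filename)
            size = os.path.getsize(path)
            if size <= max_bytes:
                yield path, size
```

Grouping the results by database directory would also show roughly when each dump first went bad, as long as older dumps are still on disk.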
Comment 1 • 14 years ago
The mail with the error output was going where it was supposed to, but neither of us saw it due to bad mail filters :( I have a separate folder for the infra-dbnotices stuff, but my filters were in the wrong order and it was getting snagged by my generic cron mail filter, which goes into a box with too much mail to keep up with. I've since corrected mine.
It's been broken going back to at least June 7th. I have Thunderbird nuking cron mail over 30 days, so I don't know how much further back than that it's been broken.
Comment 2 • 14 years ago
fwiw, you can do a manual db dump using the same commands the backups usually use by running:
db-sqldump-oneoff {clustername} {dbname}
If it works, you'll get a dump in the current working directory with the cluster, dbname, and timestamp in the filename.
Comment 3 (Assignee) • 14 years ago
This is actually an OS issue, not a mysql issue per se. The problem is we have exceeded the maximum number of open files.
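The diagnosis follows from the `(errno: 24)` in the mysqldump output: on Linux, errno 24 is EMFILE, "Too many open files". A small sketch that decodes the number from such a message; `explain_errno` is a hypothetical helper, and the lookup reflects the errno table of the machine running the script, which is assumed to match the database host.

```python
import errno
import os
import re


def explain_errno(message: str):
    """Extract '(errno: N)' from a MySQL error string and name it, or return None."""
    match = re.search(r"\(errno: (\d+)\)", message)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numbers to symbolic names on the machine running this code.
    return code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)
```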
Comment 4 (Assignee) • 14 years ago
There are some workarounds for this in terms of backups, but I did notice that while flushing tables to get mysqld to release all file descriptors, another db ran into the same error. So this should be addressed by increasing the available file descriptors at the os level.
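For context, mysqld cannot open more descriptors than the operating system grants its process, so it helps to look at the OS limit before raising anything in MySQL. The sketch below only inspects the limits of the process running the script, not those of mysqld itself (on Linux those would be read from the mysqld process's entry under /proc); `nofile_limits` and `limit_allows` are hypothetical names.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_allows(requested: int, limit: int) -> bool:
    """True if an rlimit value permits `requested` descriptors."""
    return limit == resource.RLIM_INFINITY or requested <= limit
```

If the hard limit does not allow the number of descriptors wanted, the OS limit has to be raised first.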
Comment 5 (Reporter) • 14 years ago
Ping. This is blocking a different bug... need to know at least if we can restore a particular blog to its state on 5/29/2011. If we cannot, I can unblock that bug.
(Of course, getting working backups is a critical issue regardless.)
Comment 6 (Assignee) • 14 years ago
The open_files_limit in mysql on the backup host is set to 4156, whereas on the master it's set to 6000. I recommend increasing the value to 10,000, which should be sufficient for handling the backups. This needs to be set in my.cnf and can only be set at startup, so the server would need to be restarted. When is the best time to restart this instance?
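After the config change, the setting could be sanity-checked with a sketch like the one below. It assumes the option lives in a `[mysqld]` section, which this bug does not confirm for the c01 instance, and it does not follow `!include` directives. `read_open_files_limit` is a hypothetical helper; the value must be written without a thousands separator (10000, not 10,000).

```python
import configparser


def read_open_files_limit(cnf_text: str, section: str = "mysqld"):
    """Return open_files_limit from my.cnf-style text, or None if it is not set."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare flags such as skip-name-resolve
        strict=False,         # tolerate repeated sections and options
        interpolation=None,   # treat '%' in values literally
    )
    parser.read_string(cnf_text)
    if not parser.has_section(section):
        return None
    # MySQL accepts both the underscore and the dash spelling of option names.
    for key in ("open_files_limit", "open-files-limit"):
        if parser.has_option(section, key):
            value = parser.get(section, key)
            if value is not None:
                return int(value)
    return None
```

This reads only what is configured. The value the running server actually ended up with is reported by `SHOW VARIABLES LIKE 'open_files_limit'`.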
Comment 7 (Reporter) • 14 years ago
For the backup host, this is doable any time that mysqld instance isn't running backups. That should be in cron on that box somewhere... guessing it's daily or hourly mysqldumps. Should be easy to edit that one config file and restart that one instance.
Did you want to do this on the prod master as well? I don't see that it'd be particularly useful since it isn't breaking, but just covering the bases... we'd want to plan that out in more depth.
Comment 8 • 14 years ago
The backup system restarts the databases on a daily basis anyway, so on the backup server you could just change the config and leave it, and it'll automatically get picked up after the next raw backup runs.
Comment 9 (Reporter) • 14 years ago
(In reply to comment #0)
> We need to get this fixed, but also find out how long it's been broken for.
> I have another bug to restore data for a particular blog back to how it was
> on 5/29. This will likely mean going to tape, unless you can figure out a
> better way to isolate it.
I still need this answer.
Updated (Assignee) • 14 years ago
Assignee: server-ops-database → mpressman
Comment 10 (Reporter) • 14 years ago
Comment 9 is moot now.
Are the backups all fixed? If so we can close this.
Updated (Assignee) • 13 years ago
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated • 11 years ago
Product: mozilla.org → Data & BI Services Team