Closed Bug 589181 Opened 15 years ago Closed 15 years ago

amo01 database backups are failing

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: clouserw, Assigned: justdave)

Details

All the AMO database dumps on khan are empty: /data/backup-drop/mrdb04/mysql/addons_remora
Assignee: server-ops → tellis
They're supposed to be. They moved to cm-webdev01-master01 about a year ago.
Assignee: tellis → justdave
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
and they'd be under amo01 not mrdb04 :) but otherwise the same pathname over there.
Actually, this bug was filed because the remora DB at cm-webdev01-master01 doesn't have any data from this week; it's out of sync. I mentioned this to Wil and he filed this bug. So the stated cause was incorrect, but the actual bug is still there.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Summary: Database dumps are empty on khan → cm-webdev01-master01 is out of sync for about a week
Yeah, just the wrong host. The db dumps on cm-webdev01-master01 were missing.
OK, looking into it.
They look like they're there to me... cm-webdev01-master01:/data/backup-drop/amo01/mysql/addons_remora/
Status: REOPENED → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME
They are empty
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Ooohh... communication breakdown here... you have the files, but the files are empty. Gotcha. I'm seeing that too; I didn't check whether they had data in them, I just checked that they were there. These are pulled directly from the backups, which also means we have no backups of this specific database because the backups are busted :|
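The failure mode above (files present but zero bytes) is easy to miss with a plain "does the file exist" check. A minimal sketch of a spot-check for it, assuming the backup-drop layout mentioned in this bug (the helper name and path are illustrative, not an actual script from this infrastructure):

```shell
# Sketch: report dump files that exist but are empty (zero bytes).
# list_empty_dumps is a hypothetical helper, not a real tool here.
list_empty_dumps() {
    # -size -1c matches files smaller than one byte, i.e. truly empty.
    find "$1" -type f -size -1c -print
}

# Example, using the path from this bug:
# list_empty_dumps /data/backup-drop/amo01/mysql/addons_remora
```

Any output from this means a dump that would restore to nothing, exactly the situation reported above.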
Severity: major → critical
Summary: cm-webdev01-master01 is out of sync for about a week → amo01 database backups are failing
Trying to run the backup by hand gives me this: mysqldump: Got error: 1146: Table 'addons_remora.stats_firefoxcup' doesn't exist when using LOCK TABLES. Looks like the backup replica is corrupted; I'll queue up a re-sync from one of the other slaves.
According to the mail the cron job has been sending, it's been broken since August 12. Yay for noisy cron job output mailing lists that you can never find anything in :/ (jabba is working on fixing that)
I cloned slave03 to the backup server, and it still had the same error. I suspect slave03 is where the backup server was last cloned from and this wasn't caught before. Re-cloned from slave02, and that one appears to be NOT corrupted so far (yay). Production backup is running now, I'll push back to slave03 to replace the corrupted copy there after the backup completes.
Production backup successfully completed. It's rsyncing out both to dracula (for push to webdev) and to slave03 (to reload it since it was the source of the corruption) currently. You guys should have a replaced copy of today's backup with the full data sometime in the next 15 minutes or so.
ok, slave03 restore completed, I also verified the new backup did make it to cm-webdev01-master01, you guys should be good to go.
Status: REOPENED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Looks good. I think nagios is only checking that the files are there; maybe we should adjust it to make sure the filesize is larger than 500M.
Product: mozilla.org → mozilla.org Graveyard