Closed Bug 793004 Opened 12 years ago Closed 12 years ago

dev1.db.phx1.mozilla.com corrupt databases

Categories

(Data & BI Services Team :: DB: MySQL, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jd, Unassigned)

Details

Several databases on dev1.db.phx1.mozilla.com seem to be corrupt.

Sheeri worked on intranet_dev_allizom_org and intranet_allizom_org earlier and found corrupt tables that could not be removed.

After the fix I was able to load these databases again, but when I then tried to drop intranet_dev_allizom_org the drop failed with the same error (ERROR 2013 (HY000): Lost connection to MySQL server during query).

Later I tried to drop intranet_dev_allizom_org_directors, intranet_allizom_org_infrasec, and intranet_dev_allizom_org_infrasec; all three failed with the same error.

I was able to drop and recreate intranet_allizom_org_archived_forum without errors, so this does not seem to affect all of the databases on the server.

I need to be able to drop and recreate all of the intranet_* databases:
| intranet_allizom_org                    |
| intranet_allizom_org_archived_forum     |
| intranet_allizom_org_directors          |
| intranet_allizom_org_forum              |
| intranet_allizom_org_gsadev             |
| intranet_allizom_org_infrasec           |
| intranet_allizom_org_lilly              |
| intranet_allizom_org_metrics            |
| intranet_allizom_org_mozbeat            |
| intranet_allizom_org_p2                 |
| intranet_allizom_org_partners           |
| intranet_allizom_org_pto                |
| intranet_allizom_org_releng             |
| intranet_dev_allizom_org                |
| intranet_dev_allizom_org_archived_forum |
| intranet_dev_allizom_org_directors      |
| intranet_dev_allizom_org_forum          |
| intranet_dev_allizom_org_gsadev         |
| intranet_dev_allizom_org_infrasec       |
| intranet_dev_allizom_org_lilly          |
| intranet_dev_allizom_org_metrics        |
| intranet_dev_allizom_org_mozbeat        |
| intranet_dev_allizom_org_p2             |
| intranet_dev_allizom_org_partners       |
| intranet_dev_allizom_org_pto            |
| intranet_dev_allizom_org_releng         |

I did not test each of these individually as the issue seems to be larger than the specific databases.
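
For illustration only, the drop-and-recreate request above could be turned into a list of SQL statements along these lines. This is a minimal sketch, not what was actually run on dev1: the helper name recreate_statements and the sample names in the usage section are assumptions, and the statements are only printed, never executed.

```python
import re

# Accept only plain identifiers so the names can be backtick-quoted safely.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def recreate_statements(names, prefix="intranet_"):
    """Return DROP/CREATE statements for every name that starts with prefix.

    Hypothetical helper: it only builds SQL text and never talks to a server.
    """
    statements = []
    for name in names:
        if not name.startswith(prefix):
            continue  # leave databases outside the requested set alone
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unexpected database name: {name!r}")
        statements.append(f"DROP DATABASE IF EXISTS `{name}`;")
        statements.append(f"CREATE DATABASE `{name}`;")
    return statements


if __name__ == "__main__":
    # Sample input; in practice the names would come from SHOW DATABASES.
    sample = ["intranet_allizom_org", "intranet_dev_allizom_org", "blog_allizom_org"]
    for line in recreate_statements(sample):
        print(line)
```

Printing the statements first makes it possible to review them before piping anything into the mysql client, which seems prudent on a server where DROP DATABASE is already dropping the connection.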

Please let me know if I can provide any further useful information.

Thanks
dev1 and dev2 seem to have the same corruption problems; the backup, however, seems to have none. The best option looks like restoring from the backup, but before I do that, let's checksum:

[root@dev1.db.phx1 bin]# /usr/bin/pt-table-checksum --user checksum --password ****  --lock-wait-time=50 --quiet --chunk-size-limit=0  --no-check-plan --replicate percona.checksums --ignore-databases=drop_me
The checksum is still running, and it crashed the servers while it was checksumming blog_allizom_org (both dev1 and dev2, but not the backup).
OK, the checksum turned out OK:


mysql> select * from percona.checksums where this_crc!=master_crc\G
0 rows in set (0.04 sec)

mysql> select max(ts) from percona.checksums;
+---------------------+
| max(ts)             |
+---------------------+
| 2012-09-21 11:57:11 |
+---------------------+
1 row in set (0.02 sec)

mysql> select count(*) from percona.checksums;
+-----------------------+
| count(*)              |
+-----------------------+
|                 15891 |
+-----------------------+
1 row in set (0.05 sec)

So I will restore today's backup to dev2.
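
As a side note on the check above: the query compares only this_crc and master_crc, and in SQL a comparison against NULL is never true, so a chunk with a missing CRC or a differing row count would not show up. The pt-table-checksum documentation suggests a broader query, run on the replica. The sketch below quotes that query and applies the same predicate to rows already fetched as dictionaries; the function names and the sample rows in the usage note are hypothetical.

```python
# Broader difference query from the pt-table-checksum documentation.
DIFF_QUERY = """
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksums
WHERE (
      master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;
"""


def is_mismatch(row):
    """Apply the same predicate to one percona.checksums row (a dict)."""
    return (
        row["master_cnt"] != row["this_cnt"]
        or row["master_crc"] != row["this_crc"]
        # Redundant in Python, where None compares unequal to any string,
        # but kept so the expression mirrors the SQL clause by clause.
        or (row["master_crc"] is None) != (row["this_crc"] is None)
    )


def mismatched_tables(rows):
    """Return the sorted (db, tbl) pairs that have at least one differing chunk."""
    return sorted({(r["db"], r["tbl"]) for r in rows if is_mismatch(r)})
```

Usage, with made-up rows: a row whose counts and CRCs agree is ignored, while a row with master_crc set to None and this_crc set to "ab12" is reported, even though the simpler this_crc != master_crc query would skip it.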
The restore also had corruption, so over the weekend I set innodb_force_recovery to 6, exported the databases, reimported them into a fresh MySQL install, tested the result, and then restored it to dev1.

Things look good now, but there's still a mysql_old on dev1 just in case.
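
The exact export commands are not recorded in this bug. As a rough sketch of the export step only, one mysqldump invocation per database could be assembled like this; the output directory is a hypothetical path, credentials are omitted, and nothing is executed.

```python
import shlex


def build_dump_commands(databases, out_dir="/tmp/dumps"):
    """Return one mysqldump argument list per database.

    out_dir is a hypothetical location; connection options are left out.
    """
    commands = []
    for db in databases:
        commands.append([
            "mysqldump",
            "--databases", db,  # also emits CREATE DATABASE and USE statements
            f"--result-file={out_dir}/{db}.sql",
        ])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for argv in build_dump_commands(["intranet_allizom_org", "blog_allizom_org"]):
        print(shlex.join(argv))
```

Dumping one database per file keeps a failure on a single corrupt database from aborting the whole export, and each file can be loaded into the fresh install on its own.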
Status: NEW → ASSIGNED
Deleting mysql_old; this is all OK.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
This corrupted the backups, too. I recovered them from the servers.
Product: mozilla.org → Data & BI Services Team