Bug 793004 - dev1.db.phx1.mozilla.com corrupt databases
Status: RESOLVED FIXED
Product: Data & BI Services Team
Classification: Other
Component: DB: MySQL
Version: other
Platform: x86_64 Linux
Importance: -- normal
Target Milestone: ---
Assigned To: server-ops-database
QA Contact: Corey Shields [:cshields]
Blocks: 791455
Reported: 2012-09-20 14:57 PDT by Jason Crowe [:jd]
Modified: 2014-10-17 12:46 PDT


Description Jason Crowe [:jd] 2012-09-20 14:57:04 PDT
Several databases on dev1.db.phx1.mozilla.com appear to be corrupt.

Sheeri worked on intranet_dev_allizom_org and intranet_allizom_org earlier and found corrupt tables that could not be removed.

After the fix I was able to load these databases again, but dropping intranet_dev_allizom_org then failed with the same error (ERROR 2013 (HY000): Lost connection to MySQL server during query).

Later I tried to drop intranet_dev_allizom_org_directors, intranet_allizom_org_infrasec, and intranet_dev_allizom_org_infrasec; all three failed with the same error.

I was able to drop and recreate intranet_allizom_org_archived_forum without errors, so this does not seem to affect every database on the server.

I need to be able to drop and recreate all of the intranet_* databases:
| intranet_allizom_org                    |
| intranet_allizom_org_archived_forum     |
| intranet_allizom_org_directors          |
| intranet_allizom_org_forum              |
| intranet_allizom_org_gsadev             |
| intranet_allizom_org_infrasec           |
| intranet_allizom_org_lilly              |
| intranet_allizom_org_metrics            |
| intranet_allizom_org_mozbeat            |
| intranet_allizom_org_p2                 |
| intranet_allizom_org_partners           |
| intranet_allizom_org_pto                |
| intranet_allizom_org_releng             |
| intranet_dev_allizom_org                |
| intranet_dev_allizom_org_archived_forum |
| intranet_dev_allizom_org_directors      |
| intranet_dev_allizom_org_forum          |
| intranet_dev_allizom_org_gsadev         |
| intranet_dev_allizom_org_infrasec       |
| intranet_dev_allizom_org_lilly          |
| intranet_dev_allizom_org_metrics        |
| intranet_dev_allizom_org_mozbeat        |
| intranet_dev_allizom_org_p2             |
| intranet_dev_allizom_org_partners       |
| intranet_dev_allizom_org_pto            |
| intranet_dev_allizom_org_releng         |
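
A minimal sketch of generating the drop-and-recreate statements for the list above, assuming the database names are available as plain strings. The helper names `quote_ident` and `drop_and_recreate_sql` are hypothetical, and the statements are only built as text, not executed. On this server the DROP itself was what failed with ERROR 2013, so this would only be useful once the corruption has been dealt with.

```python
def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def drop_and_recreate_sql(databases):
    """Return a DROP and a CREATE statement for each database name, in order."""
    statements = []
    for name in databases:
        ident = quote_ident(name)
        statements.append(f"DROP DATABASE IF EXISTS {ident};")
        statements.append(f"CREATE DATABASE {ident};")
    return statements


if __name__ == "__main__":
    # Two names from the list above, used only as sample input.
    sample = ["intranet_allizom_org", "intranet_dev_allizom_org"]
    for line in drop_and_recreate_sql(sample):
        print(line)
```

The output could be reviewed and then piped to the mysql client. Generating the statements first, rather than running them directly, keeps a record of exactly what was attempted against each database.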

I did not test each of these individually as the issue seems to be larger than the specific databases.

Please let me know if I can provide any further useful information.

Thanks
Comment 1 Sheeri Cabral [:sheeri] 2012-09-21 07:10:27 PDT
dev1 and dev2 seemed to have the same problems with corruption; however, the backup seemed to have no such problems. The best option looks like restoring the backup, but before I do that, let's checksum:

[root@dev1.db.phx1 bin]# /usr/bin/pt-table-checksum --user checksum --password ****  --lock-wait-time=50 --quiet --chunk-size-limit=0  --no-check-plan --replicate percona.checksums --ignore-databases=drop_me
Comment 2 Sheeri Cabral [:sheeri] 2012-09-21 08:51:28 PDT
Checksum is still going, and it made the servers crash when it was trying to checksum blog_allizom_org (both dev1 and dev2, but not backup).
Comment 3 Sheeri Cabral [:sheeri] 2012-09-21 12:04:07 PDT
OK, the checksum turned out OK:


mysql> select * from percona.checksums where this_crc!=master_crc\G
0 rows in set (0.04 sec)

mysql> select max(ts) from percona.checksums;
+---------------------+
| max(ts)             |
+---------------------+
| 2012-09-21 11:57:11 |
+---------------------+
1 row in set (0.02 sec)

mysql> select count(*) from percona.checksums;
+-----------------------+
| count(*)              |
+-----------------------+
|                 15891 |
+-----------------------+
1 row in set (0.05 sec)

So I will restore today's backup to dev2.
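
For reference, a sketch of the same mismatch check as the first query above, written in Python over rows already fetched from percona.checksums. The function name `differing_chunks` and the one-dict-per-row shape are assumptions for illustration; the column names are those of the checksum table. One difference from the query: in SQL, `this_crc != master_crc` evaluates to NULL when either side is NULL, so such rows are not returned. The sketch treats a value missing on only one side as a mismatch and also compares the row counts (this_cnt and master_cnt).

```python
def differing_chunks(rows):
    """Return checksum rows whose replica values differ from the master's.

    Each row is a dict with this_crc, master_crc, this_cnt and master_cnt
    keys. None stands for SQL NULL.
    """
    diffs = []
    for row in rows:
        # In Python, None != "abc" is True, so a checksum that is NULL on
        # only one side counts as a difference here.
        crc_differs = row["this_crc"] != row["master_crc"]
        cnt_differs = row["this_cnt"] != row["master_cnt"]
        if crc_differs or cnt_differs:
            diffs.append(row)
    return diffs
```

An empty result from this function corresponds to the "0 rows" outcome above, with the extra NULL and row-count cases covered.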
Comment 4 Sheeri Cabral [:sheeri] 2012-09-24 06:34:17 PDT
The restore also had corruption, so over the weekend I set innodb_force_recovery to 6, exported the databases, and reimported to a fresh MySQL install, then tested and did a restore to dev1.

Things look good now, but there's still a mysql_old on dev1 just in case.
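
A rough sketch of the export step, under stated assumptions: the comment does not say which tool was used, so mysqldump is assumed here, and the helper name `dump_commands`, the output directory, and the decision to skip information_schema and performance_schema are all hypothetical. innodb_force_recovery itself is a server option set in the [mysqld] section of the option file before restarting mysqld. The code only builds the command lines and does not run them.

```python
# Schemas assumed to be excluded from the export (an assumption, not from the bug).
SKIPPED_SCHEMAS = {"information_schema", "performance_schema"}


def dump_commands(databases, out_dir="/tmp/export"):
    """Build one mysqldump argument list per database, sorted by name."""
    commands = []
    for name in sorted(databases):
        if name in SKIPPED_SCHEMAS:
            continue
        commands.append([
            "mysqldump",
            "--databases", name,  # emits CREATE DATABASE and USE in the dump
            f"--result-file={out_dir}/{name}.sql",
        ])
    return commands


if __name__ == "__main__":
    for argv in dump_commands(["intranet_allizom_org", "information_schema"]):
        print(" ".join(argv))
```

Dumping one file per database means a single unreadable table stops only that database's export, and each file can be reloaded into the fresh install independently.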
Comment 5 Sheeri Cabral [:sheeri] 2012-09-25 10:54:09 PDT
Deleting mysql_old; this is all OK.
Comment 6 Dustin J. Mitchell [:dustin] 2012-09-25 11:16:09 PDT
This corrupted the backups, too. I recovered them from the servers.
