Creating bug as per sheeri. MySQL keeps crashing; .... InnoDB: End of page dump 120827 10:06:24 InnoDB: Page checksum 1351490099 (32bit_calc: 35087108), prior-to-4.0.14-form checksum 4233694457 InnoDB: stored checksum 587866912, prior-to-4.0.14-form stored checksum 3396583943 InnoDB: Page lsn 1919027747 168455535, low 4 bytes of lsn at page end 4000231286 InnoDB: Page number (if stored to page already) 1718707310, InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 1949973072 InnoDB: Database page corruption on disk or a failed InnoDB: file read of page 7. ....
yes, the important part there is: 120827 10:06:03 InnoDB: Error: space id and page n:o stored in the page InnoDB: read in are 1949973072:1718707310, should be 1392:7! InnoDB: Database page corruption on disk or a failed InnoDB: file read of page 7. looks like corruption. This is a slave, and there are 2 other slaves that are OK, so it's likely a hardware thing. I'll see if I can recover.
Started with innodb_force_recovery=1, but it still crashed, worked my way up to where it didn't, which was innodb_force_recovery=4.... 120827 10:13:20 Percona XtraDB (http://www.percona.com) 1.0.17-13.4 started; log sequence number 14573800889 InnoDB: !!! innodb_force_recovery is set to 4 !!! 120827 10:13:20 [Note] Recovering after a crash using /var/lib/mysql/node37-bin 120827 10:13:20 [Note] Starting crash recovery... 120827 10:13:20 [Note] Crash recovery finished. 120827 10:13:21 [Note] Event Scheduler: Loaded 0 events 120827 10:13:21 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.1.63-rel13.4-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona Server (GPL), 13.4, Revision 443 120827 10:13:21 [Note] Slave SQL thread initialized, starting replication in log 'node36-bin.000001' at position 258927923, relay log '/var/lib/mysql/node37-relay-bin.000002' position: 258928069 120827 10:13:21 [Note] Slave I/O thread: connected to master 'firstname.lastname@example.org:3306',replication started in log 'node36-bin.000001' at position 260995688 InnoDB: A new raw disk partition was initialized or InnoDB: innodb_force_recovery is on: we do not allow InnoDB: database modifications by the user. Shut down InnoDB: mysqld and edit my.cnf so that newraw is replaced InnoDB: with raw, and innodb_force_... is removed. 120827 10:13:21 [ERROR] Slave SQL: Error 'Got error -1 from storage engine' on query. Default database: 'reps_dev_allizom_org'. Query: 'INSERT INTO `events_event` (`name`, `slug`, `start`, `end`, `timezone`, `venue`, `city`, `region`, `country`, `lat`, `lon`, `external_link`, `owner_id`, `planning_pad_url`, `estimated_attendance`, `description`, `extra_content`, `mozilla_event`, `hashtag`, `converted_visitors`, `swag_bug_id`, `budget_bug_id`) VALUES ('test66', 'test66', '2012-02-01 12:00:00', '2012-02-01 13:00:00', 'Africa/Abidjan', 'wdfw', 'rewe', 'werwer', 'Albania', 54.9776136706963, -29.53125, '', 9, 'http://etherpad.mozilla.org/remo-test66', 10, 'osdfnhosid hfoisd hfosdihf osdfih sdofhsd f', '', 0, '', 0, NULL, NULL)', Error_code: 1030 120827 10:13:21 [Warning] Slave: Got error -1 from storage engine Error_code: 1030 120827 10:13:21 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'node36-bin.000001' position 258927923
Most importantly, the crash recovery is OK, but it looks like the first SQL statement in the slave is causing the problems: 120827 10:13:16 [Note] Slave SQL thread initialized, starting replication in log 'node36-bin.000001' at position 258927923, relay log '/var/lib/mysql/node37-relay-bin.000002' position: 258928069 120827 10:13:16 [Note] Slave I/O thread: connected to master 'email@example.com:3306',replication started in log 'node36-bin.000001' at position 260995688 120827 10:13:16 InnoDB: Error: space id and page n:o stored in the page InnoDB: read in are 1949973072:1718707310, should be 1392:7! InnoDB: Database page corruption on disk or a failed InnoDB: file read of page 7. So we can export that table, and hopefully export/reimport of that table will work.
[firstname.lastname@example.org mysql]# du -sh reps_dev_allizom_org/ 27M reps_dev_allizom_org/ [email@example.com mysql]# mysqldump reps_dev_allizom_org > reps_dev_allizom_org.sql Then I restarted with innodb_force_recovery commented out, and skip_slave_start enabled. That started OK (yay!) But trying to drop the reps_dev_allizom_org database or using that database caused issues. seeing if I can do an individual table drop of events_event.
Dropping the table is also causing crashing. :( As is truncating, dropping the database, etc. Going to re-sync from the master. [13:25:52] <Aj> sheeri, yea seeing some errors; EXT4-fs error [13:26:04] <sheeri> *nod* [13:26:12] <sheeri> FWIW Linux node37.seamicro.phx1.mozilla.com 2.6.32-71.el6.i686 #1 SMP Wed Sep 1 01:26:34 EDT 2010 i686 i686 i386 GNU/Linux [13:26:21] <Aj> want me to reboot and run a fsck ? [13:26:52] <sheeri> sure, I have deleted the db files, I'll restore from backup [13:26:59] <sheeri> but I'm logged out now. [13:27:18] <Aj> ok, want me to update the kernel while at it ? [13:27:34] <sheeri> yes please :D [13:27:39] <sheeri> and why not do a yum update :D
As per sheeri, ran; yum -y update kernel: 2.6.32-71.el6.i686 -> 2.6.32-279.2.1.el6.i686 Then performed a fsck -y which fixed lots of file system errors. Host is now back online.
resynced from node36, all is happy now.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Product: mozilla.org → Data & BI Services Team
You need to log in before you can comment on or make changes to this bug.