Closed Bug 673531 Opened 13 years ago Closed 13 years ago

sync57.db.scl2.svc: failed drive

Categories

(Cloud Services :: Operations: Miscellaneous, task)

Hardware: x86
OS: macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlaz, Unassigned)

Details

13:07:21 < nagios-sjc1> [57] sync57.db.scl2.svc:mysql is CRITICAL: Can't connect to MySQL server on 10.14.106.14 (111)
13:08:28 < nagios-sjc1> [59] sync57.db.scl2.svc:load is WARNING: WARNING - load average: 19.61, 14.97, 10.65
13:12:17 < nagios-sjc1> sync57.db.scl2.svc:mysql is OK: Uptime: 524  Threads: 119  Questions: 1731  Slow queries: 0  Opens: 117  Flush tables: 1  Open tables: 102  Queries per second avg: 3.303
13:13:30 < nagios-sjc1> sync57.db.scl2.svc:load is OK: OK - load average: 6.45, 11.06, 10.33
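
For anyone re-checking this by hand, error (111) is "connection refused", i.e. mysqld was not accepting connections while it came back up. A minimal sketch of the manual check, with the host taken from the alert above and credentials omitted as an assumption:

    # errno 111 = connection refused; confirm mysqld is answering again
    mysqladmin -h 10.14.106.14 ping
    # prints the same status string the Nagios OK line above reports
    mysqladmin -h 10.14.106.14 status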

After some poking around, it looks like sdd has failed:

sd 6:0:0:0: [sdd] Unhandled sense code
sd 6:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 6:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        00 01 25 92 
sd 6:0:0:0: [sdd] Add. Sense: Unrecovered read error - auto reallocate failed
sd 6:0:0:0: [sdd] CDB: Read(10): 28 00 00 01 25 90 00 00 08 00
end_request: I/O error, dev sdd, sector 75154
ata6: EH complete
raid10:md125: read error corrected (8 sectors at 2064 on sdd1)
ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata6.00: irq_stat 0x40000001
ata6.00: failed command: READ DMA
ata6.00: cmd c8/00:08:98:25:01/00:00:00:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:9b:25:01/00:00:00:00:00/e0 Emask 0x9 (media error)
ata6.00: status: { DRDY ERR }
ata6.00: error: { UNC }
ata6.00: configured for UDMA/133
ata6: EH complete
raid10: sdd1: redirecting sector 283008 to another mirror
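
The medium errors can be confirmed from the drive itself with smartmontools, assuming smartctl is installed on the box; the device name is the one from the kernel log above:

    # overall health verdict plus the drive's own error log
    smartctl -H /dev/sdd
    smartctl -l error /dev/sdd
    # attributes that should line up with the "auto reallocate failed" above
    smartctl -A /dev/sdd | grep -i -E 'reallocated|pending|uncorrect'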

A spare has kicked in and the array is rebuilding. Will swap the failed drive on the next scl2 visit.
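
Rebuild progress can be watched from mdstat or mdadm; /dev/md125 is the array named in the kernel log above, the rest is just a sketch:

    # per-array state; a resync/recovery shows up as a progress bar
    cat /proc/mdstat
    # detailed view: member states, spare count, and "Rebuild Status : N% complete"
    mdadm --detail /dev/md125
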
Swapped out sdd, sde, and sdl and rebuilt the RAID-10 array on this box.
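
For the record, the swap on each member looks roughly like the sequence below; the exact device and partition names are an assumption, and the physical pull happens between the remove and the add:

    # mark the dying member failed and drop it from the array
    mdadm --manage /dev/md125 --fail /dev/sdd1
    mdadm --manage /dev/md125 --remove /dev/sdd1
    # after the physical swap, partition the new disk to match and re-add it;
    # md then resyncs onto the new member automatically
    mdadm --manage /dev/md125 --add /dev/sdd1
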
Looks like sdg has failed as well:

md128 : active raid10 sdg[7](F) sdh[6] sdi[5] sdj[4] sdk[3] sdd[2] sde[1] sdl[0]
      3907045632 blocks super 1.2 64K chunks 2 near-copies [8/7] [UUUUUUU_]

Going to swap sdg, then wipe and rebuild the RAID-10 array on the next scl2 visit.
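
A sketch of the wipe-and-rebuild, with the geometry read off the mdstat output above (RAID-10, 8 members, 64K chunk, 2 near copies); stopping the array and the exact member list are assumptions:

    # stop the degraded array, then clear old md metadata on every member
    mdadm --stop /dev/md128
    mdadm --zero-superblock /dev/sd[deghijkl]
    # recreate with the same geometry as before
    mdadm --create /dev/md128 --level=10 --raid-devices=8 \
          --chunk=64 --layout=n2 /dev/sd[deghijkl]
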
Summary: Failed drive in sync57.db.scl2.svc → sync57.db.scl2.svc: failed drive
Back in prod.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Operations: Hardware → Operations