sync57.db.scl2.svc: failed drive

RESOLVED FIXED

Status

Cloud Services
Operations
RESOLVED FIXED
6 years ago
2 years ago

People

(Reporter: jlaz, Unassigned)

Details

13:07:21 < nagios-sjc1> [57] sync57.db.scl2.svc:mysql is CRITICAL: Cant connect to MySQL server on 10.14.106.14 (111)
13:08:28 < nagios-sjc1> [59] sync57.db.scl2.svc:load is WARNING: WARNING - load average: 19.61, 14.97, 10.65
13:12:17 < nagios-sjc1> sync57.db.scl2.svc:mysql is OK: Uptime: 524  Threads: 119  Questions: 1731  Slow queries: 0  Opens: 117  Flush tables: 1  Open tables: 102  Queries per second avg: 3.303
13:13:30 < nagios-sjc1> sync57.db.scl2.svc:load is OK: OK - load average: 6.45, 11.06, 10.33

After some poking around, it looks like sdd has failed:

sd 6:0:0:0: [sdd] Unhandled sense code
sd 6:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 6:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        00 01 25 92 
sd 6:0:0:0: [sdd] Add. Sense: Unrecovered read error - auto reallocate failed
sd 6:0:0:0: [sdd] CDB: Read(10): 28 00 00 01 25 90 00 00 08 00
end_request: I/O error, dev sdd, sector 75154
ata6: EH complete
raid10:md125: read error corrected (8 sectors at 2064 on sdd1)
ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata6.00: irq_stat 0x40000001
ata6.00: failed command: READ DMA
ata6.00: cmd c8/00:08:98:25:01/00:00:00:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:9b:25:01/00:00:00:00:00/e0 Emask 0x9 (media error)
ata6.00: status: { DRDY ERR }
ata6.00: error: { UNC }
ata6.00: configured for UDMA/133
ata6: EH complete
raid10: sdd1: redirecting sector 283008 to another mirror

The spare has kicked in and the array is rebuilding. Will swap the drive on the next scl2 visit.
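
For reference, the descriptor sense bytes in the kernel log above can be decoded by hand to confirm what the kernel reported. The following is a minimal sketch, assuming the standard SCSI descriptor-format sense layout (response code 0x72, sense key in byte 1, ASC/ASCQ in bytes 2 and 3, and an Information descriptor carrying the failing LBA). The function name `decode_descriptor_sense` is made up for illustration and is not part of any tool used on this host.

```python
# Sense bytes copied verbatim from the kernel log in this bug.
SENSE_HEX = "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 00 01 25 92"


def decode_descriptor_sense(hex_string):
    """Decode descriptor-format SCSI sense data (hypothetical helper)."""
    data = bytes.fromhex(hex_string)
    if data[0] & 0x7F not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")

    result = {
        "sense_key": data[1] & 0x0F,  # 0x03 = Medium Error
        "asc": data[2],               # additional sense code
        "ascq": data[3],              # additional sense code qualifier
        "information": None,          # failing LBA, if present
    }

    # Byte 7 holds the length of the descriptor list that starts at byte 8.
    pos, end = 8, min(8 + data[7], len(data))
    while pos + 2 <= end:
        desc_type, desc_len = data[pos], data[pos + 1]
        # Descriptor type 0x00 is the Information descriptor: the 8-byte
        # big-endian value sits at offsets 4..11 of the descriptor.
        if desc_type == 0x00 and desc_len == 0x0A and pos + 12 <= end:
            result["information"] = int.from_bytes(data[pos + 4:pos + 12], "big")
        pos += 2 + desc_len
    return result


sense = decode_descriptor_sense(SENSE_HEX)
print(sense)
```

Sense key 0x03 with ASC/ASCQ 0x11/0x04 corresponds to the "Medium Error" and "Unrecovered read error - auto reallocate failed" lines, and the information field decodes to 75154, the sector named in the `end_request` I/O error.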
(Reporter)

Comment 1

6 years ago
Swapped out sdd, sde, and sdl and rebuilt the RAID-10 array on this box.
(Reporter)

Comment 2

6 years ago
Looks like sdg has failed as well:

md128 : active raid10 sdg[7](F) sdh[6] sdi[5] sdj[4] sdk[3] sdd[2] sde[1] sdl[0]
      3907045632 blocks super 1.2 64K chunks 2 near-copies [8/7] [UUUUUUU_]

Going to swap sdg, then wipe and rebuild the RAID-10 array on the next scl2 visit.
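
The mdstat output above can also be checked mechanically instead of by eye. This is a rough sketch, assuming the usual /proc/mdstat conventions in which a failed member is tagged `(F)` and the `[n/m]` pair gives expected versus active devices. The helper name `summarize_md` is hypothetical.

```python
import re

# The two mdstat lines pasted in this comment.
MDSTAT = """md128 : active raid10 sdg[7](F) sdh[6] sdi[5] sdj[4] sdk[3] sdd[2] sde[1] sdl[0]
      3907045632 blocks super 1.2 64K chunks 2 near-copies [8/7] [UUUUUUU_]"""


def summarize_md(text):
    """Report failed members and degraded state for one md array (hypothetical helper)."""
    # Members marked (F) have been flagged faulty by the md driver.
    failed = re.findall(r"(\w+)\[\d+\]\(F\)", text)
    # [8/7] means 8 devices expected, 7 currently active.
    counts = re.search(r"\[(\d+)/(\d+)\]", text)
    if counts is None:
        raise ValueError("no [expected/active] device counts found")
    expected, active = int(counts.group(1)), int(counts.group(2))
    return {
        "failed": failed,
        "expected": expected,
        "active": active,
        "degraded": active < expected,
    }


print(summarize_md(MDSTAT))
```

For the output pasted here, this reports sdg as the only failed member and the array as degraded with 7 of 8 devices active.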
Summary: Failed drive in sync57.db.scl2.svc → sync57.db.scl2.svc: failed drive
Back in prod.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Assignee)

Updated

2 years ago
Component: Operations: Hardware → Operations
Product: Cloud Services → Cloud Services