Bug 666943
Opened 14 years ago
Closed 14 years ago
Drive failure suspected on sync48.db.scl2.svc
Categories
(Cloud Services :: Operations: Miscellaneous, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jlaz, Assigned: jlaz)
Looks like /dev/sdf may be dead on sync48.db.scl2.svc.
We received a Nagios alert for high load at around 3 AM:
03:07:01 < nagios-sjc1> [95] sync48.db.scl2.svc:load is CRITICAL: CRITICAL - load average: 28.21, 14.22, 8.41
03:11:57 < nagios-sjc1> sync48.db.scl2.svc:load is OK: OK - load average: 6.79, 9.29, 7.83
Contents of /proc/mdstat show that md125 is running a check (all eight members still report as up, so it has not dropped a drive yet):
md125 : active raid10 sdd1[0] sdl1[8](S) sdk1[7] sdj1[6] sdi1[5] sdh1[4] sdg1[3] sdf1[2] sde1[1]
3907037184 blocks super 1.1 512K chunks 2 near-copies [8/8] [UUUUUUUU]
[==>..................] check = 12.3% (482911616/3907037184) finish=5761.3min speed=9904K/sec
bitmap: 4/30 pages [16KB], 65536KB chunk
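As a sanity check on the finish estimate above, the remaining time can be recomputed from the progress line itself. This is a minimal sketch, assuming the counters in parentheses are 1K blocks and the speed is in K/sec, as /proc/mdstat prints them; the function name `resync_eta_minutes` is hypothetical.

```python
import re


def resync_eta_minutes(progress_line: str) -> float:
    """Estimate the remaining minutes from a /proc/mdstat progress line."""
    # "(done/total)" counters, in 1K blocks.
    done, total = map(int, re.search(r"\((\d+)/(\d+)\)", progress_line).groups())
    # Current speed in K/sec.
    speed_k = int(re.search(r"speed=(\d+)K/sec", progress_line).group(1))
    return (total - done) / speed_k / 60


line = ("[==>..................] check = 12.3% (482911616/3907037184) "
        "finish=5761.3min speed=9904K/sec")
eta = resync_eta_minutes(line)
print(f"{eta:.0f} min (about {eta / 60 / 24:.1f} days)")
```

The result is roughly 5762 minutes, in line with the 5761.3 minutes the kernel reported. The small gap is expected because the printed speed is rounded. At this rate the check would need about four days to finish.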
A look at dmesg suggests that /dev/sdf is the failing drive:
ata6.00: error: { UNC }
ata6.00: configured for UDMA/133
ata6: EH complete
ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata6.00: irq_stat 0x40000001
ata6.00: failed command: READ DMA EXT
ata6.00: cmd 25/00:00:00:26:01/00:04:00:00:00/e0 tag 0 dma 524288 in
res 51/40:00:3e:27:01/00:00:00:00:00/e0 Emask 0x9 (media error)
ata6.00: status: { DRDY ERR }
ata6.00: error: { UNC }
ata6.00: configured for UDMA/133
sd 6:0:0:0: [sdf] Unhandled sense code
sd 6:0:0:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 6:0:0:0: [sdf] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
00 01 27 3e
sd 6:0:0:0: [sdf] Add. Sense: Unrecovered read error - auto reallocate failed
sd 6:0:0:0: [sdf] CDB: Read(10): 28 00 00 01 26 00 00 04 00 00
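The hex dumps above can be decoded to confirm that the reported bad sector falls inside the failed read. The sketch below assumes the standard READ(10) CDB layout (4-byte big-endian LBA at bytes 2-5, 2-byte transfer length at bytes 7-8) and descriptor-format sense data with an Information descriptor. The helper names `decode_read10` and `failing_lba` are hypothetical.

```python
def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (start LBA, block count) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    count = int.from_bytes(cdb[7:9], "big")
    return lba, count


def failing_lba(sense_hex: str) -> int:
    """Return the LBA held in the Information descriptor of the sense data."""
    sense = bytes.fromhex(sense_hex)
    if sense[0] & 0x7F not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    pos = 8                  # descriptors start after the 8-byte header
    end = 8 + sense[7]       # byte 7 holds the additional sense length
    while pos + 2 <= end:
        dtype, dlen = sense[pos], sense[pos + 1]
        # Type 0x00 is the Information descriptor; bit 7 of the next byte is VALID.
        if dtype == 0x00 and sense[pos + 2] & 0x80:
            return int.from_bytes(sense[pos + 4:pos + 12], "big")
        pos += 2 + dlen
    raise ValueError("no valid Information descriptor")


start, count = decode_read10("28 00 00 01 26 00 00 04 00 00")
bad = failing_lba("72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 00 01 27 3e")
print(start, count, bad, start <= bad < start + count)
# prints: 75264 1024 75582 True
```

The read covered 1024 blocks starting at LBA 75264. Assuming 512-byte sectors, that is the 524288 bytes shown in the `dma 524288 in` line. The failing LBA 75582 (0x01273e, also visible in the `res` line) lies inside that range.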
Comment 1 (Assignee) • 14 years ago
Drive swapped. The array is rebuilding onto the former hot spare (sdl1), and the replacement sdf has been added as the new spare.
[root@sync48.db.scl2.svc ~]# mdadm --manage /dev/md125 --add ${DEAD}1
mdadm: added /dev/sdf1
[root@sync48.db.scl2.svc ~]# cat /proc/mdstat
Personalities : [raid1] [raid10]
md125 : active raid10 sdf1[9](S) sdd1[0] sdl1[8] sdk1[7] sdj1[6] sdi1[5] sdh1[4] sdg1[3] sde1[1]
3907037184 blocks super 1.1 512K chunks 2 near-copies [8/7] [UU_UUUUU]
[>....................] recovery = 0.1% (1320000/976759296) finish=1655.4min speed=9819K/sec
bitmap: 4/30 pages [16KB], 65536KB chunk
md126 : active raid1 sda1[0] sdc1[2](S) sdb1[1]
102388 blocks super 1.0 [2/2] [UU]
md127 : active raid1 sda2[0] sdc2[2](S) sdb2[1]
958095228 blocks super 1.1 [2/2] [UU]
bitmap: 0/8 pages [0KB], 65536KB chunk
unused devices: <none>
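To read the md125 state out of the listing above without counting underscores by eye, the two leading lines of an array entry can be parsed. This is a minimal sketch. The function name `md_members` and the keys of the returned dictionary are hypothetical, and it only handles the line shapes shown in this bug.

```python
import re


def md_members(device_line: str, status_line: str) -> dict:
    """Summarise one array from its first two /proc/mdstat lines."""
    members, spares = [], []
    # Each device appears as name[role], optionally followed by flags such as (S).
    for name, flags in re.findall(r"(\w+)\[\d+\]((?:\([A-Z]\))*)", device_line):
        (spares if "(S)" in flags else members).append(name)
    # [n/m]: n devices expected in the array, m currently in sync.
    expected, working = map(int, re.search(r"\[(\d+)/(\d+)\]", status_line).groups())
    slots = re.search(r"\[([U_]+)\]", status_line).group(1)
    return {
        "members": sorted(members),
        "spares": spares,
        "expected": expected,
        "working": working,
        "missing_slots": [i for i, s in enumerate(slots) if s == "_"],
    }


state = md_members(
    "md125 : active raid10 sdf1[9](S) sdd1[0] sdl1[8] sdk1[7] sdj1[6] "
    "sdi1[5] sdh1[4] sdg1[3] sde1[1]",
    "3907037184 blocks super 1.1 512K chunks 2 near-copies [8/7] [UU_UUUUU]",
)
print(state["spares"], state["working"], state["missing_slots"])
# prints: ['sdf1'] 7 [2]
```

Slot 2 is the position the old sdf1 held before the swap. It stays marked `_` until the recovery onto sdl1 completes.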
Assignee: nobody → jlazaro
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated • 9 years ago
Component: Operations: Hardware → Operations