Closed Bug 671590 Opened 13 years ago Closed 13 years ago

Failing disks on sync47.db.scl2.svc

Categories

(Cloud Services :: Operations: Miscellaneous, task)

Hardware: x86
OS: macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlaz, Assigned: jlaz)

Details

07:46:56 < nagios-sjc1> [10] sync47.db.scl2.svc:load is WARNING: WARNING - load average: 18.79, 12.94, 10.02

dmesg shows errors on the disks attached to ATA ports 10 and 11 (ata10 and ata11):

ata10.00: status: { DRDY ERR }
ata10.00: error: { UNC }
ata10.00: failed command: READ FPDMA QUEUED
ata10.00: cmd 60/80:f0:00:91:01/00:00:00:00:00/40 tag 30 ncq 65536 in
         res 51/40:04:80:91:01/40:f4:00:00:00/40 Emask 0x9 (media error)

ata11.00: edma_err_cause=00000018 pp_flags=00000000, dev disconnect
ata11: SError: { RecovComm PHYRdyChg }
ata11.00: failed command: FLUSH CACHE EXT
ata11.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 80/00:04:10:08:00/00:00:00:00:00/a0 Emask 0x12 (ATA bus error)
ata11.00: status: { Busy }
ata11: hard resetting link
ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata11.00: configured for UDMA/133
ata11: EH complete
ata11: exception Emask 0x10 SAct 0x0 SErr 0x10002 action 0xe frozen
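
For triage, the two failure modes above (a media error on ata10, a bus error with link reset on ata11) can be pulled out of dmesg mechanically. The following is a minimal sketch, not a tool used in this bug: `summarize_ata_errors` and `SAMPLE` are hypothetical names, and it assumes the libata message layout shown above, where the `res ...` continuation line carries the `Emask 0x.. (reason)` summary.

```python
import re

# Lines such as "ata10.00: error: { UNC }" or "ata11: hard resetting link",
# optionally preceded by a kernel timestamp like "[12345.678]".
ATA_LINE = re.compile(r"^\s*(?:\[[\s\d.]+\]\s*)?(ata\d+)(?:\.\d+)?:")
# The parenthesised reason that follows Emask, e.g. "Emask 0x9 (media error)".
EMASK = re.compile(r"Emask\s+0x[0-9a-fA-F]+\s+\(([^)]+)\)")


def summarize_ata_errors(dmesg_text):
    """Return {port: set of Emask reasons} for libata lines in dmesg_text."""
    summary = {}
    current_port = None
    for line in dmesg_text.splitlines():
        port = ATA_LINE.match(line)
        if port:
            current_port = port.group(1)
            summary.setdefault(current_port, set())
        # "res ..." continuation lines have no ataN prefix, so the reason is
        # attributed to the most recently seen port.
        reason = EMASK.search(line)
        if reason and current_port is not None:
            summary[current_port].add(reason.group(1))
    return summary


# Excerpt of the dmesg output quoted in this bug.
SAMPLE = """\
ata10.00: failed command: READ FPDMA QUEUED
ata10.00: cmd 60/80:f0:00:91:01/00:00:00:00:00/40 tag 30 ncq 65536 in
         res 51/40:04:80:91:01/40:f4:00:00:00/40 Emask 0x9 (media error)
ata11.00: failed command: FLUSH CACHE EXT
ata11.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 80/00:04:10:08:00/00:00:00:00:00/a0 Emask 0x12 (ATA bus error)
ata11: exception Emask 0x10 SAct 0x0 SErr 0x10002 action 0xe frozen
"""

if __name__ == "__main__":
    for port, reasons in sorted(summarize_ata_errors(SAMPLE).items()):
        print(port, sorted(reasons))
```

The `exception Emask 0x10 ...` line has no parenthesised reason, so the sketch records the port but adds nothing for it.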

sdj (ata10) and sdk (ata11) are the likely suspects, going by the SCSI host numbers in their sysfs paths:

[root@sync47.db.scl2.svc ~]# find /sys -name sdj
/sys/devices/pci0000:00/0000:00:02.0/0000:01:00.3/0000:07:01.0/host10/target10:0:0/10:0:0:0/block/sdj
/sys/class/block/sdj
/sys/block/sdj
[root@sync47.db.scl2.svc ~]# find /sys -name sdk
/sys/devices/pci0000:00/0000:00:02.0/0000:01:00.3/0000:07:01.0/host11/target11:0:0/11:0:0:0/block/sdk
/sys/class/block/sdk
/sys/block/sdk
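
The host lookup above can also be scripted. This is a sketch under assumptions: `scsi_host_of` and `sysfs_path_for` are hypothetical helper names, and pairing `hostN` with `ataN` follows the reasoning in this bug rather than a general rule, since SCSI host and ATA port numbering do not have to line up on every kernel or controller.

```python
import os
import re

# The "hostN" path component that precedes "targetN:..." in a sysfs device path.
HOST_RE = re.compile(r"/host(\d+)/target")


def scsi_host_of(sysfs_path):
    """Return the SCSI host number embedded in a sysfs device path, or None."""
    match = HOST_RE.search(sysfs_path)
    return int(match.group(1)) if match else None


def sysfs_path_for(block_name):
    """Resolve /sys/block/<name> to its full device path (live system only)."""
    return os.path.realpath(os.path.join("/sys/block", block_name))


if __name__ == "__main__":
    # Paths copied from the find output in this bug.
    paths = {
        "sdj": "/sys/devices/pci0000:00/0000:00:02.0/0000:01:00.3/"
               "0000:07:01.0/host10/target10:0:0/10:0:0:0/block/sdj",
        "sdk": "/sys/devices/pci0000:00/0000:00:02.0/0000:01:00.3/"
               "0000:07:01.0/host11/target11:0:0/11:0:0:0/block/sdk",
    }
    for name, path in sorted(paths.items()):
        print(name, "-> host", scsi_host_of(path))
```

On the affected machine, `scsi_host_of(sysfs_path_for("sdj"))` would do the same lookup without `find`; the result should still be checked against the serial number before pulling a disk.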

Will swap on the next SCL2 visit.

RAID rebuild underway:

md125 : active raid10 sdj[9](S) sdd1[0] sdl1[8] sdk1[7] sdi1[5] sdh1[4] sdg1[3] sdf1[2] sde1[1]
      3907037184 blocks super 1.1 512K chunks 2 near-copies [8/7] [UUUUUU_U]
      [>....................]  recovery =  0.2% (2741248/976759296) finish=1383.9min speed=11729K/sec
      bitmap: 4/30 pages [16KB], 65536KB chunk
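
The `finish=` figure is roughly the remaining blocks divided by the current speed. A small sketch for re-deriving it from a recovery line follows; `parse_recovery` is a hypothetical helper, and it assumes the block counts are in the same 1K units as the `K/sec` speed, which matches the numbers above to within rounding.

```python
import re

# Matches the progress line of /proc/mdstat during a recovery or resync.
RECOVERY_RE = re.compile(
    r"(recovery|resync)\s*=\s*([\d.]+)%\s*\((\d+)/(\d+)\)\s*"
    r"finish=([\d.]+)min\s+speed=(\d+)K/sec"
)


def parse_recovery(line):
    """Parse an mdstat progress line; return None if it does not match."""
    match = RECOVERY_RE.search(line)
    if match is None:
        return None
    done, total = int(match.group(3)), int(match.group(4))
    speed_k = int(match.group(6))
    return {
        "action": match.group(1),
        "done_blocks": done,
        "total_blocks": total,
        "reported_finish_min": float(match.group(5)),
        "speed_k_per_sec": speed_k,
        # Remaining blocks / (blocks per second) / 60 = minutes remaining.
        "estimated_finish_min": (total - done) / speed_k / 60.0,
    }


if __name__ == "__main__":
    line = ("      [>....................]  recovery =  0.2% "
            "(2741248/976759296) finish=1383.9min speed=11729K/sec")
    info = parse_recovery(line)
    print("reported %.1f min, estimated %.1f min"
          % (info["reported_finish_min"], info["estimated_finish_min"]))
```

The estimate moves with the instantaneous speed, so it is only a rough guide for scheduling the next visit.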

sdj is now marked as a spare. sdk will need to be swapped on the next visit, after this rebuild completes.

sdk swapped, sdj rebuilding.

md125 : active raid10 sdk[10](S) sdj[9] sdd1[0] sdl1[8] sdi1[5] sdh1[4] sdg1[3] sdf1[2] sde1[1]
      3907037184 blocks super 1.1 512K chunks 2 near-copies [8/7] [UUUUUUU_]
      [>....................]  recovery =  0.6% (5992384/976759296) finish=581.2min speed=27832K/sec
      bitmap: 4/30 pages [16KB], 65536KB chunk
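
To see at a glance which members carry a flag and which array slot is still missing, the two mdstat lines can be parsed as below. This is a sketch only: `parse_members` and `missing_slots` are hypothetical names, and it relies on the usual mdstat conventions that `(S)` marks a spare and `_` in the status map marks a slot without an in-sync member.

```python
import re

# A member entry such as "sdk[10](S)" or "sdd1[0]".
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\](?:\(([A-Z])\))?")
# The "[8/7] [UUUUUUU_]" pair: expected/active counts and per-slot status map.
STATE_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_members(device_line):
    """Return a list of (device, number, flag) tuples; flag is '' if absent."""
    return [(name, int(num), flag)
            for name, num, flag in MEMBER_RE.findall(device_line)]


def missing_slots(status_line):
    """Return the 0-based slot indexes shown as '_' in the status map."""
    match = STATE_RE.search(status_line)
    if match is None:
        return []
    return [i for i, state in enumerate(match.group(3)) if state == "_"]


if __name__ == "__main__":
    # Lines copied from the second mdstat output in this bug.
    devices = ("md125 : active raid10 sdk[10](S) sdj[9] sdd1[0] sdl1[8] "
               "sdi1[5] sdh1[4] sdg1[3] sdf1[2] sde1[1]")
    status = ("      3907037184 blocks super 1.1 512K chunks "
              "2 near-copies [8/7] [UUUUUUU_]")
    spares = [name for name, _, flag in parse_members(devices) if flag == "S"]
    print("spares:", spares)
    print("missing slots:", missing_slots(status))
```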
Assignee: nobody → jlazaro
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Operations: Hardware → Operations