Closed Bug 673531 Opened 13 years ago Closed 13 years ago

sync57.db.scl2.svc: failed drive

Categories

(Cloud Services :: Operations: Miscellaneous, task)

Hardware: x86
OS: macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlaz, Unassigned)

Details

13:07:21 < nagios-sjc1> [57] sync57.db.scl2.svc:mysql is CRITICAL: Can't connect to MySQL server on 10.14.106.14 (111)
13:08:28 < nagios-sjc1> [59] sync57.db.scl2.svc:load is WARNING: WARNING - load average: 19.61, 14.97, 10.65
13:12:17 < nagios-sjc1> sync57.db.scl2.svc:mysql is OK: Uptime: 524  Threads: 119  Questions: 1731  Slow queries: 0  Opens: 117  Flush tables: 1  Open tables: 102  Queries per second avg: 3.303
13:13:30 < nagios-sjc1> sync57.db.scl2.svc:load is OK: OK - load average: 6.45, 11.06, 10.33
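
For anyone re-checking this by hand, error (111) is "connection refused", i.e. mysqld was not accepting connections while it came back up. A minimal sketch of the manual check, with the host taken from the alert above and credentials omitted as an assumption:

    # errno 111 = connection refused; confirm mysqld is answering again
    mysqladmin -h 10.14.106.14 ping
    # prints the same status string the Nagios OK line above reports
    mysqladmin -h 10.14.106.14 status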

After some poking around, it looks like sdd has failed:

sd 6:0:0:0: [sdd] Unhandled sense code
sd 6:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 6:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        00 01 25 92 
sd 6:0:0:0: [sdd] Add. Sense: Unrecovered read error - auto reallocate failed
sd 6:0:0:0: [sdd] CDB: Read(10): 28 00 00 01 25 90 00 00 08 00
end_request: I/O error, dev sdd, sector 75154
ata6: EH complete
raid10:md125: read error corrected (8 sectors at 2064 on sdd1)
ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata6.00: irq_stat 0x40000001
ata6.00: failed command: READ DMA
ata6.00: cmd c8/00:08:98:25:01/00:00:00:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:9b:25:01/00:00:00:00:00/e0 Emask 0x9 (media error)
ata6.00: status: { DRDY ERR }
ata6.00: error: { UNC }
ata6.00: configured for UDMA/133
ata6: EH complete
raid10: sdd1: redirecting sector 283008 to another mirror
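
The medium errors can be confirmed from the drive itself with smartmontools, assuming smartctl is installed on the box; the device name is the one from the kernel log above:

    # overall health verdict plus the drive's own error log
    smartctl -H /dev/sdd
    smartctl -l error /dev/sdd
    # attributes that should line up with the "auto reallocate failed" above
    smartctl -A /dev/sdd | grep -i -E 'reallocated|pending|uncorrect'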

A spare has kicked in and the array is rebuilding. Will swap the failed drive on the next scl2 visit.
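
Rebuild progress can be watched from mdstat or mdadm; /dev/md125 is the array named in the kernel log above, the rest is just a sketch:

    # per-array state; a resync/recovery shows up as a progress bar
    cat /proc/mdstat
    # detailed view: member states, spare count, and "Rebuild Status : N% complete"
    mdadm --detail /dev/md125
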
Swapped out sdd, sde, and sdl and rebuilt the RAID-10 array on this box.
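
For the record, the swap on each member looks roughly like the sequence below; the exact device and partition names are an assumption, and the physical pull happens between the remove and the add:

    # mark the dying member failed and drop it from the array
    mdadm --manage /dev/md125 --fail /dev/sdd1
    mdadm --manage /dev/md125 --remove /dev/sdd1
    # after the physical swap, partition the new disk to match and re-add it;
    # md then resyncs onto the new member automatically
    mdadm --manage /dev/md125 --add /dev/sdd1
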
Looks like sdg has failed as well:

md128 : active raid10 sdg[7](F) sdh[6] sdi[5] sdj[4] sdk[3] sdd[2] sde[1] sdl[0]
      3907045632 blocks super 1.2 64K chunks 2 near-copies [8/7] [UUUUUUU_]

Going to swap sdg, then wipe and rebuild the RAID-10 array on the next scl2 visit.
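
A sketch of the wipe-and-rebuild, with the geometry read off the mdstat output above (RAID-10, 8 members, 64K chunk, 2 near copies); stopping the array and the exact member list are assumptions:

    # stop the degraded array, then clear old md metadata on every member
    mdadm --stop /dev/md128
    mdadm --zero-superblock /dev/sd[deghijkl]
    # recreate with the same geometry as before
    mdadm --create /dev/md128 --level=10 --raid-devices=8 \
          --chunk=64 --layout=n2 /dev/sd[deghijkl]
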
Summary: Failed drive in sync57.db.scl2.svc → sync57.db.scl2.svc: failed drive
Back in prod.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Operations: Hardware → Operations