Closed Bug 787111 Opened 12 years ago Closed 12 years ago

Disk 1/3 failed on Atom seamicro.phx1

Categories

(mozilla.org Graveyard :: Server Operations, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mburns, Assigned: mburns)

Details

(Whiteboard: [phx1 visit])

node115.seamicro.phx1 -- I/O error seen on Vdisk @ port 0. Please record this message and contact SeaMicro support.

node179.seamicro.phx1 -- I/O error seen on Vdisk @ port 0. Please record this message and contact SeaMicro support

The hosts using this disk are as follows (node{3,19,35,...}.seamicro.phx1):

Disk1/3 is up
 Model: ST9500620NS, Revision: SN01, Serial: 9XF01WVV
 Id: 5000c5002d9ea4cb, Name: /dev/wd5c, Size: 465GB
 Server:   3, Vdisk:  0, Name: partition-1/3-00, Size: 14GB, Offset: 00000004GB
 Server:  19, Vdisk:  0, Name: partition-1/3-01, Size: 14GB, Offset: 00000018GB
 Server:  35, Vdisk:  0, Name: partition-1/3-02, Size: 14GB, Offset: 00000032GB
 Server:  51, Vdisk:  0, Name: partition-1/3-03, Size: 14GB, Offset: 00000046GB
 Server:  67, Vdisk:  0, Name: partition-1/3-04, Size: 14GB, Offset: 00000060GB
 Server:  83, Vdisk:  0, Name: partition-1/3-05, Size: 14GB, Offset: 00000074GB
 Server:  99, Vdisk:  0, Name: partition-1/3-06, Size: 14GB, Offset: 00000088GB
 Server: 115, Vdisk:  0, Name: partition-1/3-07, Size: 14GB, Offset: 00000102GB
 Server: 131, Vdisk:  0, Name: partition-1/3-08, Size: 14GB, Offset: 00000116GB
 Server: 147, Vdisk:  0, Name: partition-1/3-09, Size: 14GB, Offset: 00000130GB
 Server: 163, Vdisk:  0, Name: partition-1/3-10, Size: 14GB, Offset: 00000144GB
 Server: 179, Vdisk:  0, Name: partition-1/3-11, Size: 14GB, Offset: 00000158GB
 Server: 195, Vdisk:  0, Name: partition-1/3-12, Size: 14GB, Offset: 00000172GB
 Server: 211, Vdisk:  0, Name: partition-1/3-13, Size: 14GB, Offset: 00000186GB
 Server: 227, Vdisk:  0, Name: partition-1/3-14, Size: 14GB, Offset: 00000200GB
 Server: 243, Vdisk:  0, Name: partition-1/3-15, Size: 14GB, Offset: 00000214GB
 Server: 259, Vdisk:  0, Name: partition-1/3-16, Size: 14GB, Offset: 00000228GB
 Server: 275, Vdisk:  0, Name: partition-1/3-17, Size: 14GB, Offset: 00000242GB
 Server: 291, Vdisk:  0, Name: partition-1/3-18, Size: 14GB, Offset: 00000256GB
 Server: 307, Vdisk:  0, Name: partition-1/3-19, Size: 14GB, Offset: 00000270GB
 Server: 323, Vdisk:  0, Name: partition-1/3-20, Size: 14GB, Offset: 00000284GB
 Server: 339, Vdisk:  0, Name: partition-1/3-21, Size: 14GB, Offset: 00000298GB
 Server: 355, Vdisk:  0, Name: partition-1/3-22, Size: 14GB, Offset: 00000312GB
 Server: 371, Vdisk:  0, Name: partition-1/3-23, Size: 14GB, Offset: 00000326GB
 Server: 387, Vdisk:  0, Name: partition-1/3-24, Size: 14GB, Offset: 00000340GB
 Server: 403, Vdisk:  0, Name: partition-1/3-25, Size: 14GB, Offset: 00000354GB
 Server: 419, Vdisk:  0, Name: partition-1/3-26, Size: 14GB, Offset: 00000368GB
 Server: 435, Vdisk:  0, Name: partition-1/3-27, Size: 14GB, Offset: 00000382GB
 Server: 451, Vdisk:  0, Name: partition-1/3-28, Size: 14GB, Offset: 00000396GB
 Server: 467, Vdisk:  0, Name: partition-1/3-29, Size: 14GB, Offset: 00000410GB
 Server: 483, Vdisk:  0, Name: partition-1/3-30, Size: 14GB, Offset: 00000424GB
 Server: 499, Vdisk:  0, Name: partition-1/3-31, Size: 14GB, Offset: 00000438GB
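
For anyone who needs the full list of affected hosts, the listing above can be turned into hostnames mechanically. The following is a minimal sketch, not a SeaMicro tool: the function names, the regular expression, and the `domain` default are assumptions made for illustration, and only three sample lines from the listing are embedded. It also checks the regular layout visible in the output (server numbers step by 16 starting at 3, offsets step by the 14GB partition size starting at 4GB).

```python
import re

# Three sample lines copied from the disk listing above (not the full output).
LISTING = """\
 Server:   3, Vdisk:  0, Name: partition-1/3-00, Size: 14GB, Offset: 00000004GB
 Server:  19, Vdisk:  0, Name: partition-1/3-01, Size: 14GB, Offset: 00000018GB
 Server: 115, Vdisk:  0, Name: partition-1/3-07, Size: 14GB, Offset: 00000102GB
"""

# Hypothetical pattern written for this listing format only.
LINE_RE = re.compile(
    r"Server:\s*(?P<server>\d+),\s*Vdisk:\s*(?P<vdisk>\d+),\s*"
    r"Name:\s*(?P<name>[^,\s]+),\s*Size:\s*(?P<size>\d+)GB,\s*"
    r"Offset:\s*(?P<offset>\d+)GB"
)


def parse_listing(text):
    """Return one dict per 'Server: ...' line found in the text."""
    rows = []
    for match in LINE_RE.finditer(text):
        rows.append({
            "server": int(match["server"]),
            "vdisk": int(match["vdisk"]),
            "name": match["name"],
            "size_gb": int(match["size"]),
            "offset_gb": int(match["offset"]),
        })
    return rows


def hostnames(rows, domain="seamicro.phx1"):
    """Build node hostnames in the form used in this bug (nodeN.seamicro.phx1)."""
    return [f"node{row['server']}.{domain}" for row in rows]


def expected_layout(count=32, first_server=3, server_step=16,
                    first_offset_gb=4, size_gb=14):
    """(server, offset) pairs implied by the regular pattern in the listing."""
    return [(first_server + i * server_step, first_offset_gb + i * size_gb)
            for i in range(count)]


if __name__ == "__main__":
    rows = parse_listing(LISTING)
    for host in hostnames(rows):
        print(host)
    # Confirm every parsed line matches the regular layout.
    layout = set(expected_layout())
    assert all((r["server"], r["offset_gb"]) in layout for r in rows)
```

Pasting the complete `Disk1/3` output into `LISTING` would yield all 32 hostnames, which is handy when scheduling downtime or re-kickstarting the nodes on this disk.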
Blocks: 770708
colo-trip: --- → phx1
Vinh, could you copy Ashlee on the RMA process with SeaMicro? I'd like her to learn the process and help document it.
An RMA ticket has been submitted with SeaMicro (C-2444 Hard Drive RMA on Atom SeaMicro [ ref:_00DA0IH7f._500F0Bp3Qn:ref]). However, they want to perform some troubleshooting over Webex. Kicking the ticket over to Michael.

mburns - Once SeaMicro ships the hard drive out, feel free to assign the ticket back to DCOps.

Thanks,
Vinh Hua
Assignee: server-ops → mburns
Status: NEW → ASSIGNED
:mburns, I am heading to phx1 on 9/25. Is there an update regarding this bug?

Thanks,
Van
This is still alerting. Did it get looked at or is it waiting for another phx1 colo trip?
Burnsie, any updates here?
Thanks!
Severity: normal → major
Buuuuurnsie?
<nagios-phx1:#sysadmins> Mon 08:16:08 PDT [131] 
  management.seamicro.phx1.mozilla.com:Seamicro Admin is CRITICAL: CRITICAL: 
  Seamicro Disk Status 1 Failure. Return Value: 3
This is waiting on a PHX1 visit from DCOps, as the disk needs to be reset, or someone needs to confirm it isn't empty. I'll talk to them and get this resolved.
Assignee: mburns → server-ops
Severity: major → normal
Whiteboard: [phx1 visit]
Alerted again, downtimed again.
We have a trip booked for 10/24 and 10/25; we'll be able to address it then.
:mburns, there was a drive that had a red LED on it. I reseated the drive and it looks like there's a lot of activity on it as it is solid green right now. 

Drive SN# 9XF01WVV was the drive in question. Please let me know if this didn't resolve the issue. If it did, please go ahead and close it for me.

Van
Disk1/3 is up
 Model: ST9500620NS, Revision: SN01, Serial: 9XF01WVV
 Id: 5000c5002d9ea4cb, Name: /dev/wd7c, Size: 465GB

Preliminary diagnostics show that the disk is working nicely. I'm doing a test kickstart of one of the affected nodes to confirm.
Component: Server Operations: DCOps → Server Operations
QA Contact: dmoore → shyam
Assignee: server-ops → mburns
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard