Disk 1/3 failed on Atom seamicro.phx1

RESOLVED FIXED

Status

mozilla.org Graveyard
Server Operations
RESOLVED FIXED
5 years ago
3 years ago

People

(Reporter: mburns, Assigned: mburns)

Tracking

Details

(Whiteboard: [phx1 visit])

(Assignee)

Description

5 years ago
node115.seamicro.phx1 -- I/O error seen on Vdisk @ port 0. Please record this message and contact SeaMicro support.

node179.seamicro.phx1 -- I/O error seen on Vdisk @ port 0. Please record this message and contact SeaMicro support

The hosts using this disk are as follows (node{3,19,35,...}.seamicro.phx1):

Disk1/3 is up
 Model: ST9500620NS, Revision: SN01, Serial: 9XF01WVV
 Id: 5000c5002d9ea4cb, Name: /dev/wd5c, Size: 465GB
 Server:   3, Vdisk:  0, Name: partition-1/3-00, Size: 14GB, Offset: 00000004GB
 Server:  19, Vdisk:  0, Name: partition-1/3-01, Size: 14GB, Offset: 00000018GB
 Server:  35, Vdisk:  0, Name: partition-1/3-02, Size: 14GB, Offset: 00000032GB
 Server:  51, Vdisk:  0, Name: partition-1/3-03, Size: 14GB, Offset: 00000046GB
 Server:  67, Vdisk:  0, Name: partition-1/3-04, Size: 14GB, Offset: 00000060GB
 Server:  83, Vdisk:  0, Name: partition-1/3-05, Size: 14GB, Offset: 00000074GB
 Server:  99, Vdisk:  0, Name: partition-1/3-06, Size: 14GB, Offset: 00000088GB
 Server: 115, Vdisk:  0, Name: partition-1/3-07, Size: 14GB, Offset: 00000102GB
 Server: 131, Vdisk:  0, Name: partition-1/3-08, Size: 14GB, Offset: 00000116GB
 Server: 147, Vdisk:  0, Name: partition-1/3-09, Size: 14GB, Offset: 00000130GB
 Server: 163, Vdisk:  0, Name: partition-1/3-10, Size: 14GB, Offset: 00000144GB
 Server: 179, Vdisk:  0, Name: partition-1/3-11, Size: 14GB, Offset: 00000158GB
 Server: 195, Vdisk:  0, Name: partition-1/3-12, Size: 14GB, Offset: 00000172GB
 Server: 211, Vdisk:  0, Name: partition-1/3-13, Size: 14GB, Offset: 00000186GB
 Server: 227, Vdisk:  0, Name: partition-1/3-14, Size: 14GB, Offset: 00000200GB
 Server: 243, Vdisk:  0, Name: partition-1/3-15, Size: 14GB, Offset: 00000214GB
 Server: 259, Vdisk:  0, Name: partition-1/3-16, Size: 14GB, Offset: 00000228GB
 Server: 275, Vdisk:  0, Name: partition-1/3-17, Size: 14GB, Offset: 00000242GB
 Server: 291, Vdisk:  0, Name: partition-1/3-18, Size: 14GB, Offset: 00000256GB
 Server: 307, Vdisk:  0, Name: partition-1/3-19, Size: 14GB, Offset: 00000270GB
 Server: 323, Vdisk:  0, Name: partition-1/3-20, Size: 14GB, Offset: 00000284GB
 Server: 339, Vdisk:  0, Name: partition-1/3-21, Size: 14GB, Offset: 00000298GB
 Server: 355, Vdisk:  0, Name: partition-1/3-22, Size: 14GB, Offset: 00000312GB
 Server: 371, Vdisk:  0, Name: partition-1/3-23, Size: 14GB, Offset: 00000326GB
 Server: 387, Vdisk:  0, Name: partition-1/3-24, Size: 14GB, Offset: 00000340GB
 Server: 403, Vdisk:  0, Name: partition-1/3-25, Size: 14GB, Offset: 00000354GB
 Server: 419, Vdisk:  0, Name: partition-1/3-26, Size: 14GB, Offset: 00000368GB
 Server: 435, Vdisk:  0, Name: partition-1/3-27, Size: 14GB, Offset: 00000382GB
 Server: 451, Vdisk:  0, Name: partition-1/3-28, Size: 14GB, Offset: 00000396GB
 Server: 467, Vdisk:  0, Name: partition-1/3-29, Size: 14GB, Offset: 00000410GB
 Server: 483, Vdisk:  0, Name: partition-1/3-30, Size: 14GB, Offset: 00000424GB
 Server: 499, Vdisk:  0, Name: partition-1/3-31, Size: 14GB, Offset: 00000438GB
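
For anyone who needs the full list of hosts to downtime or re-kickstart, the listing above follows a regular pattern: server numbers start at 3 and step by 16, and each 14GB partition starts 14GB after the previous one, beginning at offset 4GB. The following is a minimal sketch that regenerates the list from that pattern. The constants are read off the listing, and the function name and dict keys are hypothetical, not part of any SeaMicro tooling.

```python
# Rebuild the Disk1/3 node list from the pattern seen in the listing.
# All constants are inferred from the pasted output (assumptions, not a SeaMicro API).
FIRST_SERVER = 3        # first server number in the listing
SERVER_STRIDE = 16      # gap between consecutive server numbers
PARTITION_GB = 14       # size of each vdisk partition
FIRST_OFFSET_GB = 4     # offset of partition-1/3-00
PARTITION_COUNT = 32    # partition-1/3-00 through partition-1/3-31
DOMAIN = "seamicro.phx1"


def affected_nodes():
    """Return one dict per node that has a vdisk on Disk1/3."""
    nodes = []
    for index in range(PARTITION_COUNT):
        server = FIRST_SERVER + SERVER_STRIDE * index
        nodes.append({
            "host": f"node{server}.{DOMAIN}",
            "partition": f"partition-1/3-{index:02d}",
            "offset_gb": FIRST_OFFSET_GB + PARTITION_GB * index,
        })
    return nodes


if __name__ == "__main__":
    for node in affected_nodes():
        print(node["host"], node["partition"], f'{node["offset_gb"]}GB')
```

The two nodes that reported I/O errors (node115 and node179) appear at indexes 7 and 11 of the generated list, matching partition-1/3-07 and partition-1/3-11 in the listing.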
(Assignee)

Updated

5 years ago
Blocks: 770708

Updated

5 years ago
colo-trip: --- → phx1
Vinh, could you copy Ashlee on the RMA process with Seamicro? I'd like her to learn the process and help document it.

Comment 2

5 years ago
An RMA ticket has been submitted with SeaMicro (C-2444 Hard Drive RMA on Atom SeaMicro [ ref:_00DA0IH7f._500F0Bp3Qn:ref]). However, they want to perform some troubleshooting over Webex first. Kicking the ticket over to Michael.

mburns - Once SeaMicro ships the hard drive out, feel free to assign the ticket back to DCOPs. 

Thanks,
Vinh Hua
Assignee: server-ops → mburns
Status: NEW → ASSIGNED

Comment 3

5 years ago
:mburns, I am heading to phx1 on 9/25. Is there an update regarding this bug?

Thanks,
Van

Updated

5 years ago
Duplicate of this bug: 795913
This is still alerting. Did it get looked at or is it waiting for another phx1 colo trip?
Burnsie, any updates here?
Thanks!
Severity: normal → major
Buuuuurnsie?
<nagios-phx1:#sysadmins> Mon 08:16:08 PDT [131] 
  management.seamicro.phx1.mozilla.com:Seamicro Admin is CRITICAL: CRITICAL: 
  Seamicro Disk Status 1 Failure. Return Value: 3
(Assignee)

Comment 8

5 years ago
This is waiting on a PHX1 visit from DCOps, as the disk needs to be reset or confirmed not to be empty. I'll talk to them and get this resolved.
(Assignee)

Updated

5 years ago
Assignee: mburns → server-ops
Severity: major → normal

Updated

5 years ago
Whiteboard: [phx1 visit]
Alerted again, downtimed again.
We have a trip booked for 10/24 and 10/25; we'll be able to address it then.

Comment 11

5 years ago
:mburns, there was a drive that had a red LED on it. I reseated the drive and it looks like there's a lot of activity on it as it is solid green right now. 

Drive SN# 9XF01WVV was the drive in question. Please let me know if this didn't resolve the issue. If it did, please go ahead and close it for me.

Van
(Assignee)

Comment 12

5 years ago
Disk1/3 is up
 Model: ST9500620NS, Revision: SN01, Serial: 9XF01WVV
 Id: 5000c5002d9ea4cb, Name: /dev/wd7c, Size: 465GB

Preliminary diagnostics show that the disk is working nicely. I'm doing a test kickstart of one of the affected nodes to confirm.

Updated

5 years ago
Component: Server Operations: DCOps → Server Operations
QA Contact: dmoore → shyam
Assignee: server-ops → mburns
(Assignee)

Updated

5 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard