Closed
Bug 1047936
Opened 10 years ago
Closed 9 years ago
Decom node5.testing.stage.metrics.scl3.mozilla.com
Categories
(Infrastructure & Operations :: DCOps, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nagiosapi, Unassigned)
References
()
Details
(Keywords: spring-cleaning)
Automated alert report from nagios1.private.scl3.mozilla.com:
Hostname: node5.testing.stage.metrics.scl3.mozilla.com
Service: Metrics Disk
State: CRITICAL
Output: NRPE: Unable to read output
Runbook: http://m.allizom.org/Metrics+Disk
Comment 1•10 years ago
I can't ssh to the box. I logged into this supermicro box via the OOB and it looks like we have a disk failure? Not sure if it can be salvaged?
Comment 2•10 years ago
DCOps, can you check out a disk failure on this host, please?
Assignee: nobody → server-ops-dcops
Component: Server Operations: MOC → Server Operations: DCOps
Updated•10 years ago
colo-trip: --- → scl3
Comment 3•10 years ago
This box needs a manual fsck, but I can't figure out the root password. Can someone PM me or point me in the right direction? I've tried the root passwords in the sysadmin gpg file with no luck.
Comment 4•10 years ago
:tmary, is this server still in use? None of the root passwords we tried worked, and the server is no longer booting up properly. Per SA, we can try to reimage it or decommission it, as it is no longer under warranty.
Flags: needinfo?(tmeyarivan)
Comment 5•10 years ago
(In reply to Van Le [:van] from comment #4)
> :tmary, is this server still in use? all the root passwords we tried didnt
> work and the server is no longer booting up properly. per SA, we can try to
> reimage it or decommission it as it is no longer under warranty.

Host should be reimaged if required

If it needs disk replacements, can existing disks on node6.testing.stage be used here? (node6 has been decommissioned)
Flags: needinfo?(tmeyarivan)
Comment 6•10 years ago
>Host should be reimaged if required
All 4 drives are detected, with no errors reported by the SATA controller. Over to MOC to kick the host. Per :rbryce, since none of the passwords worked, this host might not have been configured properly.
please let me know if you need further hands on.
Assignee: server-ops-dcops → server-ops
Component: Server Operations: DCOps → Server Operations
QA Contact: dmoore → shyam
Comment 7•10 years ago
node5.testing.stage.metrics.scl3.mozilla.com has been down for almost 2 weeks (See PING below) and probably not very functional for 3 months...

IPMI Log - CRITICAL 11-03-2014 08:22:33 92d 14h 43m 55s 3/3 CHECK_NRPE: Socket timeout after 30 seconds.
Metrics Disk - CRITICAL 11-03-2014 08:22:58 92d 15h 0m 59s 3/3 CHECK_NRPE: Socket timeout after 15 seconds.
PING - CRITICAL 11-03-2014 08:26:52 13d 9h 7m 2s 3/3 PING CRITICAL - Packet loss = 100%
Swap - CRITICAL 11-03-2014 08:26:37 92d 15h 1m 33s 3/3 CHECK_NRPE: Socket timeout after 15 seconds.
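
For anyone wanting to check the "almost 2 weeks" and "3 months" figures against the status durations quoted in this comment, here is a minimal sketch. It assumes the durations always follow the `Nd Nh Nm Ns` shape seen here; the helper name `parse_duration` is hypothetical and not part of Nagios or any plugin.

```python
import re
from datetime import timedelta

# Durations copied from the four status lines quoted in this comment.
DURATIONS = {
    "IPMI Log": "92d 14h 43m 55s",
    "Metrics Disk": "92d 15h 0m 59s",
    "PING": "13d 9h 7m 2s",
    "Swap": "92d 15h 1m 33s",
}

# Assumed format: days, hours, minutes, seconds, separated by single spaces.
_PATTERN = re.compile(r"(\d+)d (\d+)h (\d+)m (\d+)s")


def parse_duration(text: str) -> timedelta:
    """Convert an 'Nd Nh Nm Ns' string into a timedelta (hypothetical helper)."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = (int(group) for group in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    for service, raw in DURATIONS.items():
        print(f"{service}: {parse_duration(raw).days} full days in CRITICAL")
```

By this reading, PING has been failing for just under two weeks, while the three NRPE-based checks have been failing for a little over 92 days.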
Comment 8•10 years ago
According to tmary in last week's data team meeting, this server can be decom'd. Changing the subject to reflect.
Summary: Metrics Disk on node5.testing.stage.metrics.scl3.mozilla.com is CRITICAL: NRPE: Unable to read output → Decom node5.testing.stage.metrics.scl3.mozilla.com
Updated•9 years ago
Keywords: spring-cleaning
Whiteboard: [id=nagios1.private.scl3.mozilla.com:395039]
Comment 9•9 years ago
10.22.31.212 = node5.testing.stage.metrics.scl3
No NFS because historically it hasn't used any.
Assuming no netvault.
Pulled from nagios in change 99007.
Already powered off because of damage.

Waiting a week is quite possibly silly, but this is also an easy Friday decom as opposed to some of the more thought-requiring ones I'm doing, so, going through the motions of waiting by throwing it back on the pile for a while.
Group: mozilla-employee-confidential
Updated•9 years ago
colo-trip: scl3 → ---
Updated•9 years ago
Group: mozilla-employee-confidential
Component: Server Operations → MOC: Service Requests
Product: mozilla.org → Infrastructure & Operations
Updated•9 years ago
QA Contact: shyam → lypulong
Comment 10•9 years ago
Punting over to DCOps for physical decom.
Component: MOC: Service Requests → DCOps
QA Contact: lypulong
Updated•9 years ago
colo-trip: --- → scl3
Comment 11•9 years ago
Host has been decomm'd, inventory and DNS updated.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED