admin2a.private.tpe1 needs to be reimaged

RESOLVED FIXED

Status

Infrastructure & Operations
DCOps
RESOLVED FIXED
5 months ago
4 months ago

People

(Reporter: van, Assigned: van)

Attachments

(1 attachment)

(Assignee)

Description

5 months ago
Tue 21:54:41 UTC [5293] [Unknown] admin2a.private.tpe1.mozilla.com:HP RAID is UNKNOWN: RAID UNKNOWN - /usr/sbin/hpacucli did not execute properly :  Error: The specified device does not have any logical drives. (http://m.mozilla.org/HP+RAID)

checked the host and saw this:

[vle@admin2a.private.tpe1 ~]$ sudo hpacucli ctrl all show config

Dynamic Smart Array B140i in Slot 0b      ()


   Port Name: 1I

   Port Name: 2I

   Port Name: 3I

   Port Name: 4I

rebooted the host and checked the RAID config and BIOS settings; everything looks fine. the host complains of XFS errors on reboot, and /etc/fstab has a /sdb entry and an msdos partition, which doesn't match admin2b.private.tpe1. i'd like to rekick this host if possible.
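For reference, a minimal sketch of how the `hpacucli ctrl all show config` output above could be scanned for logical drives. This is not the actual Nagios check; the function names and the mapping to OK/CRITICAL/UNKNOWN are assumptions made for illustration, and the parsing only relies on the `logicaldrive N (size, RAID level, status)` line format seen in this bug.

```python
import re

# Hypothetical helper, not the real HP RAID Nagios plugin.
# Matches lines such as: "logicaldrive 1 (465.7 GB, RAID 1, OK)"
LOGICAL_RE = re.compile(r"^\s*logicaldrive\s+(\d+)\s+\((.+)\)\s*$")


def logical_drives(config_text):
    """Return (drive_number, status) pairs found in the config output."""
    drives = []
    for line in config_text.splitlines():
        match = LOGICAL_RE.match(line)
        if match:
            # The status is assumed to be the last comma-separated field.
            fields = [f.strip() for f in match.group(2).split(",")]
            drives.append((int(match.group(1)), fields[-1]))
    return drives


def raid_state(config_text):
    """Map the parsed output to a Nagios-style state (assumed mapping)."""
    drives = logical_drives(config_text)
    if not drives:
        # Controller is visible but reports no logical drives at all.
        return "UNKNOWN"
    if all(status == "OK" for _, status in drives):
        return "OK"
    return "CRITICAL"
```

Fed the output pasted above (ports only, no `logicaldrive` lines), `raid_state` would report UNKNOWN, which is consistent with the alert text.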
(Assignee)

Comment 1

5 months ago
:ashish, can we kickstart from tpe1?
Assignee: server-ops-dcops → vle
Flags: needinfo?(ashish)
(Assignee)

Comment 2

5 months ago
Created attachment 8890137 [details]
Screen Shot 2017-07-25 at 3.32.57 PM.png
The fs looks pretty trashed :/
Flags: needinfo?(ashish)
(Assignee)

Comment 4

5 months ago
host has been reimaged, puppetized, and back online.

there *might* be an issue with this host so we should keep it on a short leash. 

1) system lost raid for no reason
2) when trying to reimage, it couldn't detect a raid or any drives. had to change a config in bios, save, then revert. took a couple of tries.
3) 6+ hours to reimage, 2+ hours to puppetize with multiple failures. installing a simple package or two can take tens of minutes. not sure if this is because we're kickstarting and puppetizing from scl3, across the pacific ocean.

however, I ran the HP Insight Diagnostics and everything passed.
(Assignee)

Comment 5

4 months ago
the host has been online for over 3 days and the RAID doesn't show any issues, although i notice the constant load. going to close the bug; please reopen if there are any issues.


[vle@admin2a.private.tpe1 ~]$ uptime
 17:57:41 up 3 days, 12:54,  1 user,  load average: 8.28, 9.52, 9.91
[vle@admin2a.private.tpe1 ~]$ sudo hpacucli ctrl all show config

Dynamic Smart Array B140i in Slot 0b      ()


   Port Name: 1I

   Port Name: 2I
   array A (SATA, Unused Space: 0  MB)


      logicaldrive 1 (465.7 GB, RAID 1, OK)

      physicaldrive 1I:0:1 (port 1I:box 0:bay 1, SATA, 500 GB, OK)
      physicaldrive 1I:0:2 (port 1I:box 0:bay 2, SATA, 500 GB, OK)
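To put the load figures in the `uptime` line above in context, here is a small sketch that extracts the three load averages and normalizes them by core count. The function names are hypothetical, and the core count of admin2a is not stated in this bug, so it has to be supplied by whoever runs this.

```python
def parse_uptime_load(line):
    """Extract the 1, 5 and 15 minute load averages from an uptime line."""
    # Everything after "load average:" is a comma-separated list of floats.
    tail = line.rsplit("load average:", 1)[1]
    return tuple(float(value) for value in tail.split(","))


def load_per_cpu(load_avg, cpu_count):
    """Divide each load average by the number of cores (caller-supplied)."""
    return tuple(round(value / cpu_count, 2) for value in load_avg)
```

A per-core value that stays well above 1.0 would suggest the box is persistently saturated rather than briefly busy; on the host itself, `os.getloadavg()` and `os.cpu_count()` from the Python standard library give the same inputs without parsing.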
Status: NEW → RESOLVED
Last Resolved: 4 months ago
Resolution: --- → FIXED