Closed
Bug 1094087
Opened 11 years ago
Closed 11 years ago
LSI Raid on backup1.private.par1.mozilla.com is WARNING: WARNING: 0:0:RAID-6:16 drives:1.818TB:Optimal Drives:16 online(23 Errors)
Categories
(Infrastructure & Operations :: MOC: Problems, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nagiosapi, Unassigned)
Details
(Whiteboard: [id=nagios1.private.scl3.mozilla.com:460996])
Automated alert report from nagios1.private.scl3.mozilla.com:
Hostname: backup1.private.par1.mozilla.com
Service: LSI Raid
State: WARNING
Output: WARNING: 0:0:RAID-6:16 drives:1.818TB:Optimal Drives:16 online(23 Errors)
Runbook: http://m.allizom.org/LSI+Raid
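The alert output above carries the error count in a trailing "(N Errors)" suffix, which disappears once the check returns to OK. A minimal sketch of pulling that number out of a check line follows. The function name `raid_error_count` is hypothetical, and the output format is assumed only from the two alert lines quoted in this bug.

```shell
#!/bin/sh
# Sketch only: raid_error_count is an illustrative name, and the format is
# inferred from the alert lines in this bug, not from the check's source.

# Print the error count from an LSI Raid check output line.
# Prints 0 when the line has no "(N Errors)" suffix.
raid_error_count() {
    count=$(printf '%s\n' "$1" | sed -n 's/.*(\([0-9][0-9]*\) Errors).*/\1/p')
    printf '%s\n' "${count:-0}"
}
```

Passing the WARNING line from this report prints 23, and a line without the suffix prints 0.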
| Reporter | ||
Comment 1•11 years ago
Automated alert acknowledgement: (ashish)rebuilding drive bug 1094087
Status: NEW → ASSIGNED
Comment 2•11 years ago
Took drive offline and initiated a rebuild:
> Rebuild Progress on Device at Enclosure 17, Slot 1 Completed 0% in 2 Minutes.
Once rebuild completes, run the following command to put the drive online again:
> # cd /opt/MegaRAID/MegaCli
> # ./MegaCli64 -PDOnline -PhysDrv [17:1] -a0
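The wait-then-online step described above could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure that was actually used: the function names, the `MEGACLI` variable, and the 600-second polling interval are illustrative, and the script assumes that a missing "Rebuild Progress" line means the rebuild is no longer running. Only the two MegaCli64 invocations are taken from this bug.

```shell
#!/bin/sh
# Sketch only: names, the polling interval, and the "no progress line means
# the rebuild is no longer running" rule are assumptions, not from the bug.
MEGACLI=${MEGACLI:-/opt/MegaRAID/MegaCli/MegaCli64}

# Print the percentage from a "Rebuild Progress ... Completed N% ..." line,
# or nothing if the text contains no such line.
rebuild_percent() {
    printf '%s\n' "$1" |
        sed -n 's/.*Rebuild Progress.* Completed \([0-9][0-9]*\)%.*/\1/p'
}

# Poll the rebuild on [17:1], then bring the drive online.
wait_then_online() {
    drive='[17:1]'
    while :; do
        out=$("$MEGACLI" -PDRbld -ShowProg -PhysDrv "$drive" -a0)
        pct=$(rebuild_percent "$out")
        [ -z "$pct" ] && break    # assumed: rebuild is no longer running
        echo "rebuild at ${pct}%"
        sleep 600
    done
    "$MEGACLI" -PDOnline -PhysDrv "$drive" -a0
}
```

Nothing runs until `wait_then_online` is called as root on the host. Checking the drive state by hand before the final command is still advisable, since an interrupted rebuild also stops printing a progress line.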
Comment 3•11 years ago
Ashish, the server is beeping all the time. Since there is nothing on it, I just took it offline. Please let me know when we can work together on it. Thanks!
Comment 4•11 years ago
The machine has been downtimed until later in the day (when PAR staff are not in the office). The rebuild was interrupted, so it needs to be done again.
Comment 5•11 years ago
Rick, can you take a look at this on your shift? The PAR staff should be out, and I've asked Guiom to switch it on.
Flags: needinfo?(rbryce)
Comment 6•11 years ago
Hello Guillaume,
Can you turn on the server again?
The server will continue to beep until the RAID finishes rebuilding. We could silence the alarm if needed.
Comment 7•11 years ago
The server is back online:
$ uptime
17:51:40 up 56 min, 1 user, load average: 0.00, 0.01, 0.00
$ cd /opt/MegaRAID/MegaCli
$ sudo ./MegaCli64 -AdpSetProp AlarmSilence -a0 # this silences the beeping
# Rebuild status:
$ sudo ./MegaCli64 -PDRbld -ShowProg -PhysDrv [17:1] -a0
Rebuild Progress on Device at Enclosure 17, Slot 1 Completed 91% in 311 Minutes.
Exit Code: 0x00
The rebuild might be stuck, so I will check in again in ~1-2 hours for any progress.
We may need to "remove" the drive and restart the rebuild (it is possible that turning off the server interrupted the rebuild and left it in an irrecoverable state).
Flags: needinfo?(rbryce)
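Whether the rebuild is stuck can be judged by comparing two progress lines taken some time apart. A minimal sketch, assuming the "Completed N%" format shown in the output above; the function name `rebuild_stalled` and the rule "no increase between samples means stalled" are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: the function name and the "no advance between two samples"
# rule are illustrative assumptions.

# Succeed (exit 0) when the second progress line shows no advance in the
# completed percentage over the first. Fail if either line cannot be parsed.
rebuild_stalled() {
    before=$(printf '%s\n' "$1" | sed -n 's/.* Completed \([0-9][0-9]*\)%.*/\1/p')
    after=$(printf '%s\n' "$2" | sed -n 's/.* Completed \([0-9][0-9]*\)%.*/\1/p')
    [ -n "$before" ] && [ -n "$after" ] && [ "$after" -le "$before" ]
}
```

Saving the output of the `-ShowProg` command now and again at the next check, then passing both lines to `rebuild_stalled`, gives a yes/no answer without comparing the numbers by eye.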
| Reporter | ||
Comment 8•11 years ago
Automated alert acknowledgement: (rbryce)WIP
| Reporter | ||
Comment 9•11 years ago
Automated alert recovery:
Hostname: backup1.private.par1.mozilla.com
Service: LSI Raid
State: OK
Output: OK: 0:0:RAID-6:16 drives:1.818TB:Optimal Drives:16 online
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 11•11 years ago
This recurred, and repeating the rebuild did not clear the errors. I've filed Bug 1096415 for replacing the drive.
Comment 12•11 years ago
Ashish, I was hoping you could help with the bleeping. The Paris folks are complaining that they cannot use the community space close by.
Flags: needinfo?(ashish)
Comment 13•11 years ago
Alarm was silenced.
[12:22] < guiom> | Aj awesome the bleeping stopped.
Flags: needinfo?(ashish)
Comment 14•11 years ago
The following command, which has already been executed, should permanently disable ALL alarms (beeping):
$ sudo ./MegaCli64 -AdpSetProp AlarmDsbl -aALL
Adapter 0: Set alarm to Disabled success.
Exit Code: 0x00
| Assignee | ||
Updated•8 years ago
Component: MOC: Incidents → MOC: Problems