Closed
Bug 661420
Opened 13 years ago
Closed 13 years ago
Upgrade to RHEL 5.7 + upgrade HP Firmware to prevent hangs on the DL 360 machines
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dumitru, Assigned: bkero)
References
Details
This bug tracks the firmware upgrades for the HP RAID controllers that Nagios is alerting on. The current list is here: https://mana.mozilla.org/wiki/display/~dgherman@mozilla.com/HP+RAID+controllers+that+need+firmware+upgrades#
Flags: needs-downtime+
Reporter
Comment 1•13 years ago
Correction: https://mana.mozilla.org/wiki/display/SYSADMIN/HP+RAID+controllers+that+need+firmware+upgrades
Reporter
Comment 3•13 years ago
CC'ing Ravi on this to see when we can squeeze in a ringring reboot.
Comment 4•13 years ago
This is a blocker, since machines are hard-locking because of a combination of this and RHEL kernel weirdness. Assigning to the current on-call.
Assignee: dgherman → ashish
Severity: normal → blocker
Updated•13 years ago
Summary: HP RAID controllers that need firmware upgrades → Upgrade to RHEL 5.7 + upgrade HP Firmware to prevent hangs on the DL 360 machines
Comment 5•13 years ago
To summarize, the plan of action (POA) on the crashing machines is as follows:
1) Upgrade to RHEL 5.7, reboot.
2) Upgrade the HP firmware for the RAID controller.
Dave, correct me here if needed.
Ashish, 1) should be simple enough: either run yum -y update and reboot, or log in to boris as root, forward keys, run rebootrhel <hostname>, and let that script do the magic. Phong should have updated you with instructions for 2).
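As a rough illustration of the plan in this comment, here is a minimal Python sketch that only builds the ordered step list for one host and executes nothing. The function name `upgrade_plan`, the `use_rebootrhel` switch, and the firmware placeholder text are assumptions made for illustration. `yum -y update` and `rebootrhel <hostname>` are taken from the comment; what `rebootrhel` does internally and the actual firmware procedure are not described in this bug.

```python
def upgrade_plan(hostname, use_rebootrhel=False):
    """Return the ordered steps for one host as (where, what) pairs.

    Nothing is executed; this only lays out the order of operations.
    """
    if use_rebootrhel:
        # Second option from the comment: run the helper script from boris
        # as root, with SSH keys forwarded. Its internals are not described
        # in this bug.
        os_steps = [("boris", "rebootrhel " + hostname)]
    else:
        # First option from the comment: plain package update, then reboot.
        os_steps = [(hostname, "yum -y update"), (hostname, "reboot")]
    # Step 2 always follows the OS upgrade. The procedure is a placeholder
    # because the instructions live outside this bug.
    firmware_step = (
        hostname,
        "<HP RAID controller firmware upgrade, per separate instructions>",
    )
    return os_steps + [firmware_step]


for where, what in upgrade_plan("dm-peep01"):
    print(where + ": " + what)
```

Keeping the plan as data rather than running commands directly makes it easy to review the order per host before any downtime window starts.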
Comment 6•13 years ago
boris and dm-peep01 are upgraded. The remaining machines on the mana page are production hosts, and we'll have to schedule downtime with the corresponding teams.
Comment 7•13 years ago
pm-web02 and pm-web03 can be done one at a time, since they are part of a redundant cluster. The three socorro hosts on there are mostly used for dev work and could be done whenever, with a quick heads-up announcement in #breakpad. However, they are all running RHEL 6, so I'm not sure it is an issue. I'd still push to do the firmware upgrades though.
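To make the two points in this comment concrete, here is a hedged Python sketch. `needed_steps` decides which steps a host needs from its `/etc/redhat-release` line, and `rolling_upgrade` handles a redundant cluster one host at a time. Both function names, the step labels, and the `upgrade` / `is_healthy` callables are hypothetical. The sketch also assumes that only RHEL 5 hosts older than 5.7 need the OS upgrade, which the comment leaves open for the RHEL 6 hosts.

```python
import re


def needed_steps(release_line, target=(5, 7)):
    """Decide which steps a host needs from its /etc/redhat-release line."""
    match = re.search(r"release (\d+)\.(\d+)", release_line)
    if match is None:
        raise ValueError("unrecognized release line: " + release_line)
    version = (int(match.group(1)), int(match.group(2)))
    steps = []
    # Assumption: only RHEL 5 hosts older than the target need the OS
    # upgrade. Whether RHEL 6 hosts are affected is left open in the bug.
    if version[0] == target[0] and version < target:
        steps.append("os-upgrade")
    # The firmware upgrade is recommended regardless of the OS release.
    steps.append("firmware-upgrade")
    return steps


def rolling_upgrade(hosts, upgrade, is_healthy):
    """Upgrade hosts one at a time; stop if a host does not come back healthy."""
    finished = []
    for host in hosts:
        upgrade(host)
        if not is_healthy(host):
            raise RuntimeError(host + " unhealthy after upgrade; stopping")
        finished.append(host)
    return finished


# Example with stand-in callables (no real hosts are touched).
log = []
rolling_upgrade(["pm-web02", "pm-web03"], log.append, lambda host: True)
print(log)
print(needed_steps("Red Hat Enterprise Linux Server release 5.6 (Tikanga)"))
print(needed_steps("Red Hat Enterprise Linux Server release 6.0 (Santiago)"))
```

Stopping at the first unhealthy host keeps the other cluster member serving traffic, which is the reason for doing the pair one at a time.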
Comment 8•13 years ago
I mean the *5* socorro hosts :) In fact, now would be a good time to do those, since everyone who might use them is probably asleep.
Comment 9•13 years ago
Added a list of the controller firmware versions known so far to https://mana.mozilla.org/wiki/display/SYSADMIN/Hardware+Issues (scroll to the end).
Updated•13 years ago
Assignee: ashish → bkero
Comment 10•13 years ago
cc: coop so he can see details for the downtime announcement.
Comment 11•13 years ago
The downtime has been announced: http://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/036a7b39b6059a7d# As of now, we are a GO for the downtime tonight at 6pm PDT. If I hear otherwise, I'll update this bug. I'll be responsible for closing trees on the releng side.
Updated•13 years ago
Assignee
Comment 12•13 years ago
All machines have been taken care of except:
tm-c01-master01
dm-svn01
dm-svn02
dm-webtools04
ringring.mv
These machines are slated to be taken care of tonight during the scheduled downtime.
Assignee
Comment 13•13 years ago
All systems upgraded besides tm-c01-master01, which is a database for a whole lot of high-profile sites (also a single point of failure). We'll need to coordinate a downtime window for this bad boy.
Severity: blocker → major
Updated•13 years ago
Group: infra
Comment 15•13 years ago
(In reply to comment #13)
> All systems upgraded besides tm-c01-master01, which is a database for a
> whole lot of high-profile sites (also a single point of failure). We'll
> need to coordinate a downtime window for this bad boy.
tm-c01-master01 (and tm-c01-slave01) were brought up to date this morning during an outage on tm-c01-slave01.
Comment 16•13 years ago
That was the last one on the list.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard