Bug 1035897 (Closed): node1.peach.metrics.scl3.mozilla.com crashed
Opened 11 years ago; closed 11 years ago
Category: mozilla.org Graveyard :: Server Operations (task)
Tracking: not tracked
Status: RESOLVED FIXED
People: reporter ericz; unassigned
Whiteboard: :DCOps
node1.peach.metrics.scl3.mozilla.com crashed. We just upgraded the firmware, so please open a case with HP. Details of the crash are in /var/crash/127.0.0.1-2014-07-08-05:41:58.
Updated • 11 years ago
colo-trip: --- → scl3
Comment 1 • 11 years ago
HP states that our system ROM is over a year old and is probably the cause of the crash, since neither AHS nor the IML recorded the crashes. Please schedule downtime and update the ROM.
HP ProLiant System ROM 06/09/2013
[Tuesday, July 08, 2014 10:25 AM] -- Guru A says:
From the AHS report you shared, I do not see the NMI error reported in the IML logs or at the hardware level. However, since this server is reporting these NMI errors, I take it that the reported errors are not being recorded in the AHS report.
[Tuesday, July 08, 2014 11:20 AM] -- Guru A says:
Okay, not to worry we have a fix for this issue with the latest firmware update.
[Tuesday, July 08, 2014 11:32 AM] -- Guru A says:
Van, the server is currently running the 2013 version of the firmware and we need to update it to the latest version.
[Tuesday, July 08, 2014 11:45 AM] -- Guru A says:
Thank you for your time, Van. My apologies.
Please use the FTP site below to download the latest SPP (Service Pack for ProLiant) utility.
SFTP Access : sftp -o Port=2222 spp1406@ftp.usa.hp.com
(or equivalently: sftp -P 2222 spp1406@ftp.usa.hp.com)
HTTPS Access: https://ftp.usa.hp.com/hprc
FTP Access : ftp://spp1406:dy}QB4dv@ftp.usa.hp.com
Login: spp1406
Password: dy}QB4dv (NOTE: CASE-sensitive)
Updated • 11 years ago
Assignee: server-ops-dcops → server-ops
Component: Server Operations: DCOps → Server Operations
QA Contact: dmoore → shyam
Comment 2 • 11 years ago (Reporter)
Per https://bugzilla.mozilla.org/show_bug.cgi?id=1019834#c8 we no longer have access to those ROMs. Can you ask HP to link our various HP accounts to our service packs so we can download them? Pir's account would be the most important, as he's been kicking butt on firmware since 2013(TM). Also, if you want to yell at them for making firmware hard to get, that's cool.
Updated • 11 years ago (Reporter)
Flags: needinfo?(vle)
Comment 3 • 11 years ago
:ericz, what account(s)? I don't use any account to log in or talk to HP. You just file an RMA, and if the system's warranty is still active, you can proceed to the next screen, which lets you chat with a tech. I'm not sure I'm the right person to ask about accounts, as I've never needed an account with them.
Perhaps :dmoore or an SRE can advise. Is this something we can discuss with Rich when we order servers? Is there a list of accounts that need to be tied to a service pack?
https://h50203.www5.hp.com/WCLWeb/WCLEntry.aspx
Flags: needinfo?(vle)
Updated • 11 years ago
Whiteboard: :DCOps
Comment 4 • 11 years ago
node26.peach, already running the updated ROM, became unreachable again. The following messages were displayed on the console around that time:
"""
[Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
Uhhuh. NMI received for unknown reason 31 on CPU 0.
"""
Comment 5 • 11 years ago (Reporter)
Copied comment 4 about node26 to bug 1019301.
Comment 6 • 11 years ago
The new case ID is 4648872629; I referenced the old case ID 4648810838 with HP support.
HP first said the error is OK to ignore and that we just need to change a BIOS setting. After 2 hours with HP support, they will send a tech on site tomorrow to replace the motherboard and CPU. They still suggest we change the BIOS settings to stop the "BIOS has corrupted hw-PMU resources" messages.
http://h20566.www2.hp.com/portal/site/hpsc/template.PAGE/public/kb/docDisplay/?javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken&javax.portlet.prp_ba847bafb2a2d782fcbb0710b053ce01=wsrp-navigationalState%3DdocId%253Demr_na-c03265132-2%257CdocLocale%253D%257CcalledBy%253D&javax.portlet.tpst=ba847bafb2a2d782fcbb0710b053ce01&ac.admitted=1405448909169.876444892.199480143
Comment 7 • 11 years ago
Comment 6 was intended for the node26 troubleshooting, but it may also apply to node1.
Comment 8 • 11 years ago (Reporter)
node1 has been up for 44 days, and I believe we've regained the ability to push out firmware upgrades. Closing this.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated • 10 years ago
Product: mozilla.org → mozilla.org Graveyard