Closed Bug 1035897 Opened 11 years ago Closed 11 years ago

node1.peach.metrics.scl3.mozilla.com crashed

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ericz, Unassigned)

References

Details

(Whiteboard: :DCOps)

node1.peach.metrics.scl3.mozilla.com crashed. We just upgraded the firmware so please open a case with HP. Details of the crash are in /var/crash/127.0.0.1-2014-07-08-05:41:58.
colo-trip: --- → scl3
HP states our system ROM is over a year old and is probably the cause of the crash, as AHS and IML didn't record the crashes. Please downtime and update the ROM.
HP ProLiant System ROM 06/09/2013
[Tuesday, July 08, 2014 10:25 AM] -- Guru A says: From the AHS report shared, I do not see the NMI error reported in the IML logs or at the hardware level. However we are facing these NMI errors reported by this server, hence I hope that the reported errors are not recorded in the AHS report.
[Tuesday, July 08, 2014 11:20 AM] -- Guru A says: Okay, not to worry we have a fix for this issue with the latest firmware update.
[Tuesday, July 08, 2014 11:32 AM] -- Guru A says: Van, the server is currently running with the 2013 version of firmware and we need to update it to the latest.
[Tuesday, July 08, 2014 11:45 AM] -- Guru A says: Thank you for your time, Van. My apologies. Please use the below FTP site to download the latest SPP utility.
SFTP Access: sftp -o Port=2222 spp1406@ftp.usa.hp.com
             sftp -P 2222 spp1406@ftp.usa.hp.com
HTTPS Access: https://ftp.usa.hp.com/hprc
FTP Access: ftp://spp1406:dy}QB4dv@ftp.usa.hp.com
Login: spp1406
Password: dy}QB4dv (NOTE: CASE-sensitive)
Assignee: server-ops-dcops → server-ops
Component: Server Operations: DCOps → Server Operations
QA Contact: dmoore → shyam
As per https://bugzilla.mozilla.org/show_bug.cgi?id=1019834#c8 we no longer have access to those ROMs. Can you tell HP to link our various HP accounts with our service packs so we can download them? Pir's would be most important as he's been kicking butt on firmware since 2013(TM). Also, if you want to yell at them for making firmware hard to get, that's cool.
Flags: needinfo?(vle)
:ericz, what account(s)? I don't use any account to log in or talk to HP. You just file an RMA and, if the system's warranty is still active, you can proceed to the next screen, which lets you chat with a tech. I'm not sure I'm the right person to talk to about accounts as I've never had to have an account with them. Perhaps :dmoore or an SRE can advise. Is this something we can discuss with Rich when we order servers? Is there a list of accounts that need to be tied to a service pack? https://h50203.www5.hp.com/WCLWeb/WCLEntry.aspx
Flags: needinfo?(vle)
Whiteboard: :DCOps
node26.peach, with updated ROM, became unreachable again. The following messages were displayed on the console around that time:
"""
[Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
Uhhuh. NMI received for unknown reason 31 on CPU 0.
"""
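For anyone checking other peach nodes for the same symptom, here is a minimal sketch of a scanner for the two console messages quoted above. It is not a tool used in this bug: the function name `scan_console` and both regular expressions are assumptions modeled only on the two lines in this comment, and other kernel versions may word these messages differently.

```python
import re

# Patterns modeled on the two console lines quoted in this comment (assumption:
# other kernels or firmware may print these messages with different wording).
FIRMWARE_BUG = re.compile(r"\[Firmware Bug\]: (?P<detail>.+)")
UNKNOWN_NMI = re.compile(
    r"NMI received for unknown reason (?P<reason>[0-9a-fA-F]+) on CPU (?P<cpu>\d+)"
)


def scan_console(lines):
    """Return a list of (kind, detail) tuples, one per matching console line."""
    hits = []
    for line in lines:
        nmi = UNKNOWN_NMI.search(line)
        if nmi:
            hits.append(("nmi", f"reason={nmi.group('reason')} cpu={nmi.group('cpu')}"))
            continue
        fw = FIRMWARE_BUG.search(line)
        if fw:
            hits.append(("firmware_bug", fw.group("detail").strip()))
    return hits


if __name__ == "__main__":
    sample = [
        "[Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)",
        "Uhhuh. NMI received for unknown reason 31 on CPU 0.",
    ]
    for kind, detail in scan_console(sample):
        print(kind, detail)
```

Feeding it saved console output (for example, lines read from a captured serial log) would show whether a node logged the firmware warning, the unknown NMI, or both.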
Copied comment 4 about node26 to bug 1019301.
New case ID with HP support is 4648872629, which references old case ID 4648810838. HP says the error is OK to ignore; we just need to change a BIOS setting. After 2 hours on support with them, HP will send a tech on site tomorrow to replace the motherboard and CPU. They still suggest we change the BIOS settings to avoid getting the "BIOS has corrupted hw-PMU resources" messages. http://h20566.www2.hp.com/portal/site/hpsc/template.PAGE/public/kb/docDisplay/?javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken&javax.portlet.prp_ba847bafb2a2d782fcbb0710b053ce01=wsrp-navigationalState%3DdocId%253Demr_na-c03265132-2%257CdocLocale%253D%257CcalledBy%253D&javax.portlet.tpst=ba847bafb2a2d782fcbb0710b053ce01&ac.admitted=1405448909169.876444892.199480143
Comment 6 was intended for node26 troubleshooting but might also apply to node1.
node1 has been up for 44 days and I believe we've regained the ability to push out firmware upgrades, closing this.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard