HP Log on fuzzer-linux5.sec.scl3.mozilla.com is CRITICAL: CRITICAL 0016: POST Error: The system experienced an unexpected reboot. The Integrated Management Log (IML) may contain an entry indicating additional information about this reboot.

RESOLVED WONTFIX

Status

Product: Infrastructure & Operations
Component: MOC: Problems
Status: RESOLVED WONTFIX
Opened: 3 years ago
Updated: a year ago

People

(Reporter: MOC Nagios API, Unassigned)

Details

(Whiteboard: [id=nagios1.private.scl3.mozilla.com:483370], URL)

Description

3 years ago
Automated alert report from nagios1.private.scl3.mozilla.com:

Hostname: fuzzer-linux5.sec.scl3.mozilla.com
Service:  HP Log
State:    CRITICAL
Output:   CRITICAL 0016: POST Error: The system experienced an unexpected reboot. The Integrated Management Log (IML) may contain an entry indicating additional information about this reboot.

Runbook:  http://m.allizom.org/HP+Log
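
The alert body above is a short run of "Key: value" lines, so it can be split into fields for triage scripts. The following is a minimal sketch, assuming the layout seen in this one report; `parse_alert` and `SAMPLE` are hypothetical names, and the field set is not taken from any documented Nagios format.

```python
def parse_alert(text):
    """Parse 'Key: value' lines from an alert body into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so URLs and messages keep theirs.
        key, sep, value = line.partition(":")
        value = value.strip()
        # Skip blank lines and the banner line, which has no value part.
        if sep and value:
            fields[key.strip()] = value
    return fields


# Fields copied from the report above (Output line omitted for brevity).
SAMPLE = """Automated alert report from nagios1.private.scl3.mozilla.com:

Hostname: fuzzer-linux5.sec.scl3.mozilla.com
Service:  HP Log
State:    CRITICAL
Runbook:  http://m.allizom.org/HP+Log
"""

if __name__ == "__main__":
    alert = parse_alert(SAMPLE)
    print(alert["Hostname"], alert["Service"], alert["State"])
```

Splitting on the first colon only is what keeps the Runbook URL intact.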

Comment 1

3 years ago
	15	 Critical	CPU	12/03/2014 16:42	12/03/2014 16:42	1	Uncorrectable Machine Check Exception (Board 0, Processor 2, APIC ID 0x00000041, Bank 0x00000004, Status 0xBA000080'00020C0F, Address 0x00000000'00000000, Misc 0xC0050FFF'01000000)
	14	 Critical	CPU	12/03/2014 16:42	12/03/2014 16:42	1	Uncorrectable Machine Check Exception (Board 0, Processor 2, APIC ID 0x00000040, Bank 0x00000004, Status 0xBA000080'00020C0F, Address 0x00000000'00000000, Misc 0xC0040FFE'01000000)
	13	 Critical	CPU	12/03/2014 16:42	12/03/2014 16:42	1	Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000021, Bank 0x00000004, Status 0xBA000080'00020C0F, Address 0x00000000'00000000, Misc 0xC0040FFE'01000000)
	12	 Critical	CPU	12/03/2014 16:42	12/03/2014 16:42	1	Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000020, Bank 0x00000004, Status 0xBA000080'00020C0F, Address 0x00000000'00000000, Misc 0xC0040FFE'01000000)
Assignee: nobody → server-ops-dcops
Component: MOC: Incidents → DCOps
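
The Status value in the four IML entries above can be read bit by bit. The following is a minimal sketch, assuming the field is the raw 64-bit x86 machine-check status register (MCi_STATUS) printed as two 32-bit halves joined by an apostrophe. Only the top flag bits, whose positions are shared by Intel and AMD processors, are decoded; the lower fields are vendor-specific and are left uninterpreted. The function names are hypothetical.

```python
# Sketch: decode the architectural flag bits of an x86 MCi_STATUS value.
# Assumption: the IML "Status" field is the raw 64-bit register value.

FLAG_BITS = {
    "VAL": 63,    # register contains a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # the Misc register holds valid data
    "ADDRV": 58,  # the Address register holds valid data
    "PCC": 57,    # processor context may be corrupt
}


def parse_iml_hex(text):
    """Turn an IML-style value such as 0xBA000080'00020C0F into an int."""
    return int(text.replace("'", ""), 16)


def decode_status(text):
    """Return the flag bits and the raw low 16 bits of a status value."""
    value = parse_iml_hex(text)
    flags = {name: bool((value >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return {"flags": flags, "low16": value & 0xFFFF}


if __name__ == "__main__":
    decoded = decode_status("0xBA000080'00020C0F")
    for name, is_set in decoded["flags"].items():
        print(f"{name:5} = {int(is_set)}")
    print(f"low 16 bits = 0x{decoded['low16']:04X}")
```

For the logged value, VAL, UC, EN, MISCV and PCC are set and ADDRV is clear, which matches the all-zero Address and non-zero Misc fields in the same entries.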

Updated

3 years ago
colo-trip: --- → scl3

Updated

3 years ago
Duplicate of this bug: 1107095

Comment 3

3 years ago
HP Case ID is 4650088321


[Wednesday, December 03, 2014 2:54 PM] -- Venkatesh A says:
I understand the server is functioning with the older version of the BIOS and firmware, which may be a cause for the alert

[Wednesday, December 03, 2014 2:55 PM] -- Venkatesh A says:
Hence, may I know if we shall apply the BIOS update on the server and then monitor it?

[Wednesday, December 03, 2014 2:56 PM] -- Vinh Hua says:
What's the latest version?

[Wednesday, December 03, 2014 2:56 PM] -- Venkatesh A says:
I shall share you the version details now

[Wednesday, December 03, 2014 2:57 PM] -- Venkatesh A says:
May I know the version of the operating system installed on the server ?

[Wednesday, December 03, 2014 2:58 PM] -- Vinh Hua says:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04.5 LTS"
NAME="Ubuntu"
VERSION="12.04.5 LTS, Precise Pangolin"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu precise (12.04.5 LTS)"
VERSION_ID="12.04"

[Wednesday, December 03, 2014 2:59 PM] -- Venkatesh A says:
Okay, Vinh, I see the latest version of the BIOS update available for the server is :

Version - 2014.09.03 whose release date is 6 Oct 2014

[Wednesday, December 03, 2014 2:59 PM] -- Venkatesh A says:
However, you may use the HP Service Pack for ProLiant utility on the server to update the firmware and BIOS together to the latest versions

Comment 4

3 years ago
MOC - Can someone update the BIOS firmware on this host? I've dropped the file "SP68853.exe" into the /tmp folder on admin1.scl3.mozilla.com.
Assignee: server-ops-dcops → nobody
Component: DCOps → MOC: Incidents
:vinh: BIOS update done (and A28 updated in puppet). Machine rebooted.

Several of these fuzzer machines have Puppet broken, and since they run Ubuntu they're poorly supported, so updates don't often happen :/

Resolving this for now, we can reopen and bug HP if (when) it happens again.
Mass resolve. Clearing out this component.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → WONTFIX

Updated

a year ago
Component: MOC: Incidents → MOC: Problems
Product: Infrastructure & Operations → Infrastructure & Operations