node2.testing.stage.metrics.scl3.mozilla.com:IPMI Log is CRITICAL: CRITICAL - 4 -- 08/07/2012 -- 19:20:07 -- Memory -- Uncorrectable ECC -
It also has a bad power supply:
node2.testing.stage.metrics.scl3 ~]$ sudo ipmitool sel list
   1 | 05/07/2012 | 23:30:41 | OEM #0x02 |
   2 | 05/07/2012 | 23:30:44 | OEM #0x02 |
   3 | 08/07/2012 | 19:04:57 | Power Supply #0x17 | Failure detected | Asserted
   4 | 08/07/2012 | 19:20:07 | Memory | Uncorrectable ECC | Asserted
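For triage, the SEL listing above can be filtered down to the entries that are actually asserted. The following is only a rough sketch: it assumes the pipe-separated layout shown in this bug, and the names `SelEntry`, `parse_sel_line`, and `asserted_entries` are made up for illustration and are not part of ipmitool.

```python
from typing import List, NamedTuple, Optional


class SelEntry(NamedTuple):
    record_id: str
    date: str
    time: str
    sensor: str
    event: Optional[str]   # e.g. "Uncorrectable ECC"; None for bare OEM records
    state: Optional[str]   # e.g. "Asserted"; None when the field is absent


def parse_sel_line(line: str) -> Optional[SelEntry]:
    """Parse one pipe-separated SEL line; return None for anything else."""
    fields = [f.strip() for f in line.split("|")]
    # A trailing "|" (as on the OEM records) leaves an empty last field.
    while fields and fields[-1] == "":
        fields.pop()
    if len(fields) < 4:
        return None  # blank line, "SEL has no entries", shell prompt, etc.
    event = fields[4] if len(fields) > 4 else None
    state = fields[5] if len(fields) > 5 else None
    return SelEntry(fields[0], fields[1], fields[2], fields[3], event, state)


def asserted_entries(text: str) -> List[SelEntry]:
    """Return only the entries whose last field is 'Asserted'."""
    parsed = (parse_sel_line(line) for line in text.splitlines())
    return [e for e in parsed if e is not None and e.state == "Asserted"]


# Sample copied from the SEL output pasted in this bug.
SAMPLE = """\
   1 | 05/07/2012 | 23:30:41 | OEM #0x02 |
   2 | 05/07/2012 | 23:30:44 | OEM #0x02 |
   3 | 08/07/2012 | 19:04:57 | Power Supply #0x17 | Failure detected | Asserted
   4 | 08/07/2012 | 19:20:07 | Memory | Uncorrectable ECC | Asserted
"""

for entry in asserted_entries(SAMPLE):
    print(entry.record_id, entry.sensor, entry.event)
```

On the sample this keeps records 3 and 4 (the power supply and memory events) and drops the two OEM records, which carry no event or state field.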
Transferring to DC Ops to do the RMA process. IX Systems ticket submission is online at http://support.ixsystems.com/index.php?_m=tickets&_a=submit in case that is helpful (though I imagine you already knew that).
Assignee: eziegenhorn → server-ops
Component: Server Operations → Server Operations: DCOps
QA Contact: jdow → dmoore
:ericz, is there any more information you can provide us? There's more than one p/s in the chassis and multiple DIMMs in the board. I've tried logging in to the host and checking ipmitool but wasn't able to narrow it down. Hardware-wise, neither p/s has an amber light indicating hardware failure. Thanks, Van
No, it really doesn't give much information. I'd ask the vendor how to identify the bad DIMM. It's difficult in my experience. I'd also ask about the power supply, because this keeps happening: it alerts as bad and then looks OK.
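One possible way to narrow down a DIMM from the OS side, separate from asking the vendor, is to read the Linux EDAC error counters in sysfs. This is a sketch under assumptions: it only works if an EDAC driver for the board's memory controller is loaded and exposes the `mc*/csrow*` layout with `ce_count` and `ue_count` files, and the mapping from a csrow to a physical DIMM slot is board-specific, so the vendor would still need to confirm it. The function names `read_edac_counts` and `suspect_rows` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def read_edac_counts(
    root: str = "/sys/devices/system/edac/mc",
) -> Dict[str, Tuple[int, int]]:
    """Map 'mcN/csrowM' to (corrected, uncorrected) error counts."""
    counts: Dict[str, Tuple[int, int]] = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # EDAC not available on this host
    for csrow in sorted(base.glob("mc*/csrow*")):
        try:
            ce = int((csrow / "ce_count").read_text().strip())
            ue = int((csrow / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip rows with missing or unreadable counters
        counts[f"{csrow.parent.name}/{csrow.name}"] = (ce, ue)
    return counts


def suspect_rows(counts: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return the rows that have logged any corrected or uncorrected error."""
    return sorted(row for row, (ce, ue) in counts.items() if ce or ue)


if __name__ == "__main__":
    for row, (ce, ue) in read_edac_counts().items():
        print(row, "corrected:", ce, "uncorrected:", ue)
```

If every counter reads zero, or the directory is absent, this gives no more information than the SEL did.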
:ericz, the host is no longer showing the error messages. I'm not seeing any amber led indicating bad hardware. What do you suggest?
[firstname.lastname@example.org ~]$ sudo ipmitool sel elist
SEL has no entries
[email@example.com ~]$ sudo ipmitool sel list
SEL has no entries
Spoke to ericz and we're closing this as a false alarm: the system event logs are cleared, we're not seeing any amber hardware lights, and the logs don't point to any specific failing hardware component. We'll reopen if it alerts again.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WONTFIX
Product: mozilla.org → Infrastructure & Operations