node2.testing.stage.metrics.scl3 has a bad DIMM



Infrastructure & Operations
6 years ago
3 years ago


(Reporter: ericz, Unassigned)





6 years ago
Log is CRITICAL: CRITICAL -    4 -- 08/07/2012 -- 19:20:07 -- Memory -- Uncorrectable ECC -

Comment 1

6 years ago
It also has a bad power supply:

node2.testing.stage.metrics.scl3 ~]$ sudo ipmitool sel list
   1 | 05/07/2012 | 23:30:41 | OEM #0x02 | 
   2 | 05/07/2012 | 23:30:44 | OEM #0x02 | 
   3 | 08/07/2012 | 19:04:57 | Power Supply #0x17 | Failure detected | Asserted
   4 | 08/07/2012 | 19:20:07 | Memory | Uncorrectable ECC | Asserted
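
For anyone triaging similar alerts, here is a minimal sketch of how the `ipmitool sel list` output above could be parsed to pick out asserted hardware events. It assumes the pipe-separated layout and MM/DD/YYYY dates shown in this comment. The names `SelEntry`, `parse_sel_text`, `flag_hardware_events`, and `ALERT_KEYWORDS` are hypothetical and not part of ipmitool or the monitoring check.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional

# Hypothetical keyword list; tune it to the events your hardware reports.
ALERT_KEYWORDS = ("Failure detected", "Uncorrectable ECC")

SAMPLE_SEL = """\
   1 | 05/07/2012 | 23:30:41 | OEM #0x02 |
   2 | 05/07/2012 | 23:30:44 | OEM #0x02 |
   3 | 08/07/2012 | 19:04:57 | Power Supply #0x17 | Failure detected | Asserted
   4 | 08/07/2012 | 19:20:07 | Memory | Uncorrectable ECC | Asserted
"""


class SelEntry(NamedTuple):
    record_id: str       # kept as text, exactly as printed
    timestamp: datetime
    sensor: str
    event: str
    state: str


def parse_sel_line(line: str) -> Optional[SelEntry]:
    """Parse one SEL line; return None for anything that does not fit."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 4:
        return None  # e.g. "SEL has no entries"
    fields += [""] * (6 - len(fields))  # OEM records lack event/state columns
    try:
        # Assumption: dates are MM/DD/YYYY, as in the output above.
        ts = datetime.strptime(f"{fields[1]} {fields[2]}", "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return SelEntry(fields[0], ts, fields[3], fields[4], fields[5])


def parse_sel_text(text: str) -> List[SelEntry]:
    """Parse the full command output, skipping unparseable lines."""
    parsed = (parse_sel_line(line) for line in text.splitlines())
    return [entry for entry in parsed if entry is not None]


def flag_hardware_events(entries: List[SelEntry]) -> List[SelEntry]:
    """Keep asserted entries whose event text matches an alert keyword."""
    return [
        e for e in entries
        if e.state == "Asserted" and any(k in e.event for k in ALERT_KEYWORDS)
    ]


if __name__ == "__main__":
    for entry in flag_hardware_events(parse_sel_text(SAMPLE_SEL)):
        print(entry.record_id, entry.timestamp.isoformat(), entry.sensor, entry.event)
```

Note that the SEL only names the sensor class (`Memory`, `Power Supply #0x17`), so a parser like this can say that something was flagged but not which DIMM or power supply.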

Comment 2

6 years ago
Transferring to DC Ops to do the RMA process.  IX Systems ticket submission is available online, in case that is helpful (though I imagine you already knew that).
Assignee: eziegenhorn → server-ops
Component: Server Operations → Server Operations: DCOps
QA Contact: jdow → dmoore


6 years ago
colo-trip: --- → scl3

Comment 3

6 years ago
:ericz, is there any more information you can provide us? There's more than one p/s in the chassis and multiple DIMMs on the board. I've tried logging in to the host and checking ipmitool but wasn't able to narrow it down. Hardware-wise, neither p/s has an amber light indicating hardware failure.

Duplicate of this bug: 781938

Comment 5

6 years ago
No, it really doesn't give much information.  I'd ask the vendor how to identify the bad DIMM.  It's difficult in my experience.  I'd also ask about the power supply, because this keeps happening: it alerts as bad and then looks OK.
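
One thing that might help narrow it down, assuming the host runs Linux with an EDAC driver loaded for its memory controller (not confirmed in this bug): the kernel exposes per-row corrected and uncorrected error counters under `/sys/devices/system/edac/mc`. The sketch below reads them. `read_csrow_counts` and `suspect_rows` are hypothetical helper names, and mapping a csrow to a physical DIMM slot depends on the board, so the vendor would still need to confirm it.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Assumption: the legacy EDAC csrow sysfs layout is present on this host.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")

Row = Tuple[str, str, Optional[int], Optional[int]]  # (mc, csrow, ce, ue)


def _read_count(path: Path) -> Optional[int]:
    """Read one counter file; return None if it is missing or malformed."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def read_csrow_counts(root: Path = EDAC_ROOT) -> List[Row]:
    """Collect corrected (ce) and uncorrected (ue) counts per chip-select row."""
    rows = []
    for csrow in sorted(root.glob("mc*/csrow*")):
        ce = _read_count(csrow / "ce_count")
        ue = _read_count(csrow / "ue_count")
        rows.append((csrow.parent.name, csrow.name, ce, ue))
    return rows


def suspect_rows(rows: List[Row]) -> List[Row]:
    """Keep rows that have logged at least one error of either kind."""
    return [r for r in rows if (r[2] or 0) > 0 or (r[3] or 0) > 0]


if __name__ == "__main__":
    for mc, csrow, ce, ue in suspect_rows(read_csrow_counts()):
        print(f"{mc}/{csrow}: corrected={ce} uncorrected={ue}")
```

If the directory does not exist, `read_csrow_counts` returns an empty list, which only means EDAC is not available, not that the memory is healthy.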

Comment 6

6 years ago
:ericz, the host is no longer showing the error messages. I'm not seeing any amber LED indicating bad hardware. What do you suggest?

[vle@node2.testing.stage.metrics.scl3 ~]$ sudo ipmitool sel elist
SEL has no entries
[vle@node2.testing.stage.metrics.scl3 ~]$ sudo ipmitool sel list
SEL has no entries

Comment 7

6 years ago
Spoke to ericz, and we're closing this as a false alarm: the system event logs are cleared, we're not seeing any amber hardware lights, and the logs aren't showing any specific hardware component failing. We'll reopen if it alerts again.
Last Resolved: 6 years ago
Resolution: --- → WONTFIX


5 years ago
Assignee: server-ops → server-ops-dcops
Product: → Infrastructure & Operations