Closed Bug 472183 Opened 16 years ago Closed 16 years ago

RAM problem on bm-vmware13

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
Linux
task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: phong)

Details

(Whiteboard: HP case 3604400561)

Nagios said at 1:10 PST today
  [69] bm-vmware13.build:health is CRITICAL: CRITICAL - dimm module 6 @ cartridge 0 needs attention (dimm is degraded)
and 5 minutes later
  [72] bm-vmware13.build:hplog is WARNING: WARNING 0000: Corrected Memory Error threshold exceeded (System Memory, Memory Module 6)

Could we check what proportion of memory this affects ASAP?
Assignee: server-ops → phong
HP Case ID 3604400561
Is the hardware (or ESX) handling this gracefully, or are we in a situation where we might be producing bogus builds and confusing developers? Does the load balancing automatically notice the memory restriction and transition VMs to other hosts?
This is a hardware problem. I have a replacement for the bad DIMM on order. I will migrate all the VMs off that host and shut down the ESX server.
Whiteboard: HP case 3604400561
Bad DIMM replaced and system is back to normal.  Putting ESX host back into the build pool.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Thanks for the quick turnaround.
Just got this again:
  [43] bm-vmware13.build:hplog is WARNING: WARNING 0000: Corrected Memory Error threshold exceeded (System Memory, Memory Module 6)

:-(
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
That was an old alert that didn't get cleared.
Status: REOPENED → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard