Created attachment 635582: Screenshot of purple screen
Around 7:27 PM PT, esx11.private.scl3 purple screened. Screenshot attached. Please look into the cause and possible fixes. I've reset the server and will put it in maintenance mode once it returns.
Case 12189923506 opened.
Status: NEW → ASSIGNED
From VMware: This morning, at about the time of the PSOD, in the CIM log I found this...
ElementName = System Board 8 Memory: Uncorrectable ECC
CurrentState = Deassert
Caption = System Board 8 Memory: Uncorrectable ECC
Opening case with HP, 4640687506
HP shipped a new 2G DIMM to SCL3 last thing out on Friday, arriving today/Monday. They didn't mention which one of the existing DIMMs was bad, so I can't really tell dcops to go do anything with it. I've gotten the case put back in the tech queue to try to tell us what to replace... since I have a gut-suspicion they didn't work the logs I sent.
The motherboard got replaced on 28 June. 1.5 passes of memtest came back clean. Going to do more burning over the weekend.
After the mobo swap, ran the box on a CPU burn for just under 3 days. Power dropped out on the server overnight on the 3rd day, no indications of a failure in the logs though. Booted to ESX, ran at full CPU and 75% memory for a day, no issues. Having no leads for diagnosing the power glitch, calling it. Returned to service.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
esx11 dropped out last night. iLO log (times UTC):
07/08/2012 10:09 Server power removed.
Which parallels the dropout we had last week:
07/02/2012 03:30 Server power removed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Server was powered off and I left it off until this morning. Popped in a virtual CD and booted. It didn't boot the CD, but rather started straight into ESX. Powered off, and from there it wouldn't power up anymore. Opened case 4640958848, backreferencing the other cases and adding this new information.
van 13:49 hp tech says the machine is good now
van 13:49 can you confirm or do you need to confirm?
gcox 13:49 OK, lemme peek. Is there a story on what he found?
van 13:49 he says the system board just failed to the point where it wont even POST
van 13:50 he replaced the whole system board
gcox 13:50 Nice. That was one they brought out just the other week.
gcox 13:52 So, if he's swapped the board, cool. Once I can iLO in we'll set it up to do some burn-in tests and make sure they're not bad.
Couple of loops of memtest, day's worth of CPU burn. Added back into circulation. THIS TIME FOR SURE.
Status: REOPENED → RESOLVED
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations