Closed Bug 673972 Opened 8 years ago Closed 8 years ago
iX hardware repairs
Per bug 666411 I tried to get it booted and found that the HDD is dead. Needs to go back with iX next time they are onsite.
zandr, can you contact iX about getting this serviced?
Assignee: server-ops → zandr
We have a few machines that need to go back to iX now: w64-ix-slave02 (has had issues beforehand with powering off after trying to image), w64-ix-slave41 (possible bad HDD, unable to be imaged), and w32-ix-slave06 (bad HDD).
Summary: w32-ix-slave06 DOA → Dead iX machines
Assignee: zandr → server-ops-releng
Component: Server Operations → Server Operations: RelEng
QA Contact: mrz → zandr
Assignee: server-ops-releng → zandr
colo-trip: --- → scl1
List of machines with issues thus far: linux64-ix-slave14, w64-ix-slave02, w64-ix-slave41, and w32-ix-slave06.
Summary: Dead iX machines → iX hardware repairs
I added a note to all those hosts on slavealloc.
armen, I was able to fix w64-ix-slave41 and it can now go back into the pool.
(In reply to Matthew Larrain[:digipengi] from comment #5)
> armen, I was able to fix w64-ix-slave41 and it can now go back into the pool.
Thanks Matt. I will set it up in bug 684019.
linux64-ix-slave37 - SMART detects bad disk at boot SATA Port0 ST3250318AS
linux64-ix-slave16 - WARNING: These memory DIMMs are NOT supported!!!
These are part of the pile of hardware on Matt's desk that needs to be repaired. Zandr is introducing him to the iX rep today.
Matt, here are the details for these machines.
linux64-ix-slave14 A1-16186 4787 Machine Check Exception, CPU 2 (see bug 678907)
linux64-ix-slave16 A1-16188 4789 WARNING: These memory DIMMs are NOT supported!!!
linux64-ix-slave37 A1-16209 4810 SMART detects bad disk at boot SATA Port0 ST3250318AS
w64-ix-slave02 A1-16107 4708 has had issues before with powering off after trying to image
w32-ix-slave06 A1-16052 4600 Hard drive is not recognized
If the asset tags line up, can you send this data along to finney and see when they can pick them up?
Thank you, dustin, for this. I am checking the asset tags/serial #s for verification and will get this passed along to Matt at iX tonight :)
linux64-ix-slave14 A1-16186 4787 Machine Check Exception, CPU 2 (see bug 678907)
linux64-ix-slave16 A1-16188 4789 WARNING: These memory DIMMs are NOT supported!!!
linux64-ix-slave37 A1-16209 4810 SMART detects bad disk at boot SATA Port0 ST3250318AS
w64-ix-slave02 A1-16107 4708 has had issues before & powers off after imaging
w32-ix-slave06 A1-16052 4600 Hard drive is not recognized
w32-ix-slave35 A1-16098 4699 WARNING: These Memory DIMMs are NOT Supported!!!
w32-ix-slave03 A1-16049 4597 disk disappeared even from BIOS, now BSOD'ing
email sent to Matt at iX.
w32-ix-slave12.build.mtv1 is not answering to ping, not even the management interface. It needs some investigation and likely also needs to go out for repair.
All 6 of the machines will be picked up tomorrow from MTV1 at 3pm.
Status: NEW → ASSIGNED
There should be 7 or 8 (if you get to w32-ix-slave12) machines according to comment 15.
All machines have been given to iX for repair. Asset tags are as follows: 4787, 4789, 4810, 4708, 4600, 4699, 4597, and 4606.
Copy pasta from iX Systems:
Here are the actions we took:
A1-16186 #4787 - Replaced CPU & system board
A1-16209 #4810 - Replaced HDD
A1-16052 #4600 - Replaced HDD
A1-16058 #4606 - Repaired bad sector on HDD
A1-16098 #4699 - No Trouble Found
A1-16049 #4597 - No Trouble Found
A1-16188 #4789 - No Trouble Found
A1-16107 #4708 - No Trouble Found
Be sure to check the BIOS settings before redeployment as they may have changed for our testing or in the case of A1-16186 the board was replaced and the settings were set to defaults.
I am verifying why we had marked those "No Trouble Found" machines and will message iX with questions.
w32-ix-slave35 A1-16098 4699 WARNING: These Memory DIMMs are NOT Supported!!!
w32-ix-slave03 A1-16049 4597 disk disappeared even from BIOS, now BSOD'ing
linux64-ix-slave16 A1-16188 4789 WARNING: These memory DIMMs are NOT supported!!!
w64-ix-slave02 A1-16107 4708 has had issues before & powers off after imaging
These are the issues those machines had. Will contact iX about it.
I've seen other hosts occasionally have the "WARNING: These Memory DIMMs are NOT Supported!!!" error, and it goes away after a reboot. 4597 may have just needed a thump in the head or a hard power cycle to cure its flakiness. I'm surprised about w64-ix-slave02 A1-16107 4708, though. That seemed to be a true problem child.
Based on location, seamonkey should get w32-ix-slave06 (since it's in mtv) and three other machines. I recommend w64-ix-slave02 (based on the current status of w64), and linux64-ix-slave37 and linux64-ix-slave16 (based on the thought that we can spare more linux64 than w32 machines). Coop, do you have a preference?
I have contacted iX about the ones that didn't show an error. dustin has told me that the memory warning does sometimes go away after a reboot, so I had asked iX before to do rapid reboots to try to flag the issue. (Yes, I had sporadic errors as much as everyone else, and I can't expect them to care all too much if the boxes are showing green.) I have also told iX to deliver the machines to MTV1 with my name on them, and either Jake or I can get them racked and installed wherever they need to be deployed upon return.
These machines are back at my desk. Assigned to Jake to take them to SCL1 to rerack them.
(In reply to Amy Rich [:arich] [:arr] from comment #25)
> Based on location, seamonkey should get w32-ix-slave06 (since it's in mtv)
> and three other machines. I recommend w64-ix-slave02 (based on the current
> status of w64), and linux64-ix-slave37 and linux64-ix-slave16 (based on the
> thought that we can spare more linux64 than w32 machines).
>
> Coop, do you have a preference?
I don't have a particular preference here, but your choices make sense. Let's go with that.
This bug will be to re-rack/install:
linux64-ix-slave14 A1-16186 4787
w32-ix-slave35 A1-16098 4699
w32-ix-slave03 A1-16049 4597
Jake, can you rack these up in their previous locations, and make sure they come on and the mgmt interface is pingable? They should then be updated in the repairs spreadsheet and moved to the repaired section. Once that's done, please update this bug, and I'll do a fresh install on them.
Ah, I missed that we also sent w32-ix-slave12.build.mtv1 out for repair and got it back. Since that one also lived in mtv, we'll send linux64-ix-slave16 back to scl1 and keep w32-ix-slave12 for seamonkey instead.
Jake got these four racked and IPMI pingable. I've updated the spreadsheet.
Assignee: jwatkins → arich
linux64-ix-slave14 and linux64-ix-slave16 have been rebuilt with the linux64 image. w32-ix-slave03 and w32-ix-slave35 have been rebuilt with the w32 image from 2011-05 (see bug 683228). They are ready for name changes and customizations.
Assignee: arich → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
Summary: iX hardware repairs → returned iX hardware repairs
Let me summarize what happened in this bug:
* To be put back into the pool by releng:
** linux64-ix-slave14
** linux64-ix-slave16
** w32-ix-slave03 (re-imaged from old snapshot) - worked on bug 683228
** w32-ix-slave35 (re-imaged from old snapshot) - worked on bug 683228
* Given to seamonkey
** w32-ix-slave06
** w32-ix-slave12
** w64-ix-slave02
** linux64-ix-slave37
** <strike>linux64-ix-slave16</strike> - see comment 30
From looking at comment 21 it seems that all 8 slaves are summarized in here.
> A1-16186 #4787 - Replaced CPU & system board - linux64-ix-slave14
> A1-16209 #4810 - Replaced HDD - linux64-ix-slave37
> A1-16052 #4600 - Replaced HDD - w32-ix-slave06
> A1-16058 #4606 - Repaired bad sector on HDD -
> A1-16098 #4699 - No Trouble Found -
> A1-16049 #4597 - No Trouble Found -
> A1-16188 #4789 - No Trouble Found - linux64-ix-slave16
> A1-16107 #4708 - No Trouble Found - w64-ix-slave02
NOTE: I took the comment and tried to match hostnames. If anyone wants the remaining hostnames I assume they are in the spreadsheet.
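For anyone who wants to redo the hostname matching above mechanically, here is a minimal sketch in Python. It only uses the host/serial/asset-tag list and the iX action list already pasted in this bug; the names `HOSTS`, `ACTIONS`, and `match_repairs` are made up for illustration and are not part of any releng tooling. Any asset tag that has no hostname in the pasted list is left unmatched rather than guessed.

```python
# Hostname -> (serial, asset tag), copied from the machine list in this bug.
HOSTS = {
    "linux64-ix-slave14": ("A1-16186", "4787"),
    "linux64-ix-slave16": ("A1-16188", "4789"),
    "linux64-ix-slave37": ("A1-16209", "4810"),
    "w64-ix-slave02": ("A1-16107", "4708"),
    "w32-ix-slave06": ("A1-16052", "4600"),
    "w32-ix-slave35": ("A1-16098", "4699"),
    "w32-ix-slave03": ("A1-16049", "4597"),
}

# Asset tag -> action reported by iX, copied from their reply in this bug.
ACTIONS = {
    "4787": "Replaced CPU & system board",
    "4810": "Replaced HDD",
    "4600": "Replaced HDD",
    "4606": "Repaired bad sector on HDD",
    "4699": "No Trouble Found",
    "4597": "No Trouble Found",
    "4789": "No Trouble Found",
    "4708": "No Trouble Found",
}


def match_repairs(hosts, actions):
    """Return {asset_tag: (hostname or None, action)} for every reported action."""
    tag_to_host = {tag: host for host, (_serial, tag) in hosts.items()}
    return {tag: (tag_to_host.get(tag), action) for tag, action in actions.items()}


if __name__ == "__main__":
    for tag, (host, action) in match_repairs(HOSTS, ACTIONS).items():
        # Tags without a hostname in the pasted list are flagged, not guessed.
        print(f"#{tag}  {host or '(hostname not listed in this bug)'}  {action}")
```

The lookup is keyed on the asset tag because that is the one identifier that appears in both lists.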
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #33)
> * Given to seamonkey
> ** w32-ix-slave06
> ** w32-ix-slave12
> ** w64-ix-slave02
> ** linux64-ix-slave37
Shall we file a separate bug for removing these 4 from DNS/nagios/inventory? I would like to remove these hosts from releng configs.
DNS/nagios/inventory work filed as bug 703662
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering