Closed Bug 673972 Opened 8 years ago Closed 8 years ago

returned iX hardware repairs

Categories

(Release Engineering :: General, defect)

x86
Windows Server 2003
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mlarrain, Unassigned)

Details

Per bug 666411 I tried to get it booted and found that the HDD is dead.
Needs to go back with iX next time they are onsite.
OS: Mac OS X → Windows Server 2003
zandr, can you contact IX about getting this serviced?
Assignee: server-ops → zandr
We have a few machines that need to go back to iX now: w64-ix-slave02 (has had issues beforehand with powering off after trying to image), w64-ix-slave41 (possible bad HDD, unable to be imaged) and w32-ix-slave06 (bad HDD).
Summary: w32-ix-slave06 DOA → Dead iX machines
Assignee: zandr → server-ops-releng
Component: Server Operations → Server Operations: RelEng
QA Contact: mrz → zandr
Blocks: 670761
Blocks: 672973
Assignee: server-ops-releng → zandr
colo-trip: --- → scl1
List of machines with issues thus far:

linux64-ix-slave14, w64-ix-slave02, w64-ix-slave41 and w32-ix-slave06
Blocks: 678907
Alias: iX-repairs
Summary: Dead iX machines → iX hardware repairs
I added a note to all those hosts on slavealloc.
armen I was able to fix w64-ix-slave41 and it can now go back into the pool.
(In reply to Matthew Larrain[:digipengi] from comment #5)
> armen I was able to fix w64-ix-slave41 and it can now go back into the pool.

Thanks Matt. I will set it up in bug 684019.
linux64-ix-slave37 - SMART detects bad disk at boot SATA Port0 ST3250318AS
linux64-ix-slave16 - WARNING: These memory DIMMs are NOT supported!!!
Assignee: zandr → mlarrain
These are part of the pile of hardware on Matt's desk that needs to be repaired.  Zandr is introducing him to the iX rep today.
Duplicate of this bug: 678907
Matt, here are the details for these machines.

linux64-ix-slave14  A1-16186    4787    Machine Check Exception, CPU 2 (see bug 678907)
linux64-ix-slave16  A1-16188    4789    WARNING: These memory DIMMs are NOT supported!!!
linux64-ix-slave37  A1-16209    4810    SMART detects bad disk at boot SATA Port0 ST3250318AS
w64-ix-slave02      A1-16107    4708    has had issues beforehand with powering off after trying to image
w32-ix-slave06      A1-16052    4600    Hard drive is not recognized

If the asset tags line up, can you send this data along to finney and see when they can pick them up?
Thank you, dustin, for this. I am checking the asset tags/serial #'s for verification and will get this passed along to Matt at iX tonight :)
colo-trip: scl1 → ---
Two more to add here, from bug 682574 and bug 684374.
Assignee: mlarrain → dustin
Duplicate of this bug: 682574
Duplicate of this bug: 684374
linux64-ix-slave14  A1-16186    4787    Machine Check Exception, CPU 2 (see bug 678907)
linux64-ix-slave16  A1-16188    4789    WARNING: These memory DIMMs are NOT supported!!!
linux64-ix-slave37  A1-16209    4810    SMART detects bad disk at boot SATA Port0 ST3250318AS
w64-ix-slave02      A1-16107    4708    has had issues before & powers off after imaging
w32-ix-slave06      A1-16052    4600    Hard drive is not recognized
w32-ix-slave35      A1-16098    4699    WARNING: These Memory DIMMs are NOT Supported!!!
w32-ix-slave03      A1-16049    4597    disk disappeared even from BIOS, now BSOD'ing
email sent to Matt at iX.
w32-ix-slave12.build.mtv1 is not responding to ping, not even on the management interface.  It needs some investigation and likely also needs to go out for repair.
All 6 of the machines will be picked up tomorrow from MTV1 at 3pm.
Status: NEW → ASSIGNED
There should be 7 or 8 (if you get to w32-ix-slave12) machines according to comment 15.
All machines have been given to iX for repair. Asset tags are as follows:
4787 4789 4810 4708 4600 4699 4597 and 4606
Assignee: dustin → mlarrain
Copy-paste from iX Systems:

Here are the actions we took:

A1-16186 #4787 - Replaced CPU & system board
A1-16209 #4810 - Replaced HDD
A1-16052 #4600 - Replaced HDD
A1-16058 #4606 - Repaired bad sector on HDD
A1-16098 #4699 - No Trouble Found
A1-16049 #4597 - No Trouble Found
A1-16188 #4789 - No Trouble Found
A1-16107 #4708 - No Trouble Found

Be sure to check the BIOS settings before redeployment as they may have changed for our testing or in the case of A1-16186 the board was replaced and the settings were set to defaults.
I am verifying why we had marked those "No Trouble Found" machines and will message iX with questions.
w32-ix-slave35      A1-16098    4699    WARNING: These Memory DIMMs are NOT Supported!!!
w32-ix-slave03      A1-16049    4597    disk disappeared even from BIOS, now BSOD'ing
linux64-ix-slave16  A1-16188    4789    WARNING: These memory DIMMs are NOT supported!!!
w64-ix-slave02      A1-16107    4708    has had issues before & powers off after imaging

These are the issues those machines had. Will contact iX about it.
colo-trip: --- → scl1
I've seen other hosts occasionally have the "WARNING: These Memory DIMMs are NOT Supported!!!" error, and it goes away after a reboot.  

4597 may have just needed a thump in the head or a hard power cycle to cure its flakiness.  

I'm surprised about w64-ix-slave02      A1-16107    4708, though.  That seemed to be a true problem child.
Based on location, seamonkey should get w32-ix-slave06 (since it's in mtv) and three other machines.  I recommend w64-ix-slave02 (based on the current status of w64), and linux64-ix-slave37 and linux64-ix-slave16 (based on the thought that we can spare more linux64 than w32 machines).

Coop, do you have a preference?
I have contacted iX about the ones that didn't show an error. dustin has told me that the memory warning does sometimes go away after a reboot, so I had already asked iX to do rapid reboots to try to flag the issue. (Yes, I saw sporadic errors as much as everyone else, and I can't expect them to care all too much if the boxes are showing green.) I have also told iX to deliver the machines to MTV1 with my name on them, and either Jake or I can get them racked and installed wherever they need to be deployed upon return.
These machines are back at my desk. Assigned to Jake to take them to SCL1 to rerack them.
Assignee: mlarrain → jwatkins
(In reply to Amy Rich [:arich] [:arr] from comment #25)
> Based on location, seamonkey should get w32-ix-slave06 (since it's in mtv)
> and three other machines.  I recommend w64-ix-slave02 (based on the current
> status of w64), and linux64-ix-slave37 and linux64-ix-slave16 (based on the
> thought that we can spare more linux64 than w32 machines).
> 
> Coop, do you have a preference?

I don't have a particular preference here, but your choices make sense. Let's go with that.
This bug will be to re-rack/install:

linux64-ix-slave14  A1-16186    4787
w32-ix-slave35      A1-16098    4699
w32-ix-slave03      A1-16049    4597

Jake, can you rack these up in their previous locations, and make sure they come on and the mgmt interface is pingable?  They should then be updated in the repairs spreadsheet and moved to the repaired section.

Once that's done, please update this bug, and I'll do a fresh install on them.
Ah, I missed that we also sent w32-ix-slave12.build.mtv1 out for repair and got it back.  Since that one also lived in mtv, we'll send linux64-ix-slave16 back to scl1 and keep w32-ix-slave12 for seamonkey instead.
Jake got these four racked and IPMI pingable.

I've updated the spreadsheet.
Assignee: jwatkins → arich
linux64-ix-slave14 and linux64-ix-slave16 have been rebuilt with the linux64 image
w32-ix-slave03 and w32-ix-slave35 have been rebuilt with the w32 image from 2011-05 (see bug 683228).

They are ready for name changes and customizations.
Alias: iX-repairs
Assignee: arich → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
Summary: iX hardware repairs → returned iX hardware repairs
Let me summarize what happened in this bug:
* To be put back into the pool by releng:
** linux64-ix-slave14
** linux64-ix-slave16
** w32-ix-slave03 (re-imaged from old snapshot) - worked on bug 683228
** w32-ix-slave35 (re-imaged from old snapshot) - worked on bug 683228

* Given to seamonkey
** w32-ix-slave06
** w32-ix-slave12
** w64-ix-slave02
** linux64-ix-slave37
** <strike>linux64-ix-slave16</strike> - see comment 30

From looking at comment 21 it seems that all 8 slaves are summarized in here.

> A1-16186 #4787 - Replaced CPU & system board - linux64-ix-slave14 
> A1-16209 #4810 - Replaced HDD                - linux64-ix-slave37
> A1-16052 #4600 - Replaced HDD                - w32-ix-slave06
> A1-16058 #4606 - Repaired bad sector on HDD  -
> A1-16098 #4699 - No Trouble Found            -
> A1-16049 #4597 - No Trouble Found            -
> A1-16188 #4789 - No Trouble Found            - linux64-ix-slave16
> A1-16107 #4708 - No Trouble Found            - w64-ix-slave02

NOTE: I took the comment and tried to match hostnames. If anyone wants the remaining hostnames I assume they are in the spreadsheet.
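For anyone who wants to redo the hostname matching without the spreadsheet, here is a rough sketch that joins the iX repair report against the hostname/serial/asset-tag table posted earlier in this bug. The names `HOSTS`, `REPAIRS` and `match_hostnames` are made up for illustration; the data is copied from the comments above. Asset tag 4606 does not appear in that table, so the sketch leaves it unmatched rather than guessing.

```python
# Asset tag -> (hostname, serial), copied from the machine table in this bug.
HOSTS = {
    "4787": ("linux64-ix-slave14", "A1-16186"),
    "4789": ("linux64-ix-slave16", "A1-16188"),
    "4810": ("linux64-ix-slave37", "A1-16209"),
    "4708": ("w64-ix-slave02", "A1-16107"),
    "4600": ("w32-ix-slave06", "A1-16052"),
    "4699": ("w32-ix-slave35", "A1-16098"),
    "4597": ("w32-ix-slave03", "A1-16049"),
}

# (serial, asset tag, action), copied from the iX repair report.
REPAIRS = [
    ("A1-16186", "4787", "Replaced CPU & system board"),
    ("A1-16209", "4810", "Replaced HDD"),
    ("A1-16052", "4600", "Replaced HDD"),
    ("A1-16058", "4606", "Repaired bad sector on HDD"),
    ("A1-16098", "4699", "No Trouble Found"),
    ("A1-16049", "4597", "No Trouble Found"),
    ("A1-16188", "4789", "No Trouble Found"),
    ("A1-16107", "4708", "No Trouble Found"),
]


def match_hostnames(repairs, hosts):
    """Return (serial, tag, action, hostname) rows; hostname is None if unknown."""
    rows = []
    for serial, tag, action in repairs:
        hostname, known_serial = hosts.get(tag, (None, None))
        # Sanity check: the serial in the report should agree with our table.
        if hostname is not None and known_serial != serial:
            raise ValueError(f"serial mismatch for asset tag {tag}")
        rows.append((serial, tag, action, hostname))
    return rows


if __name__ == "__main__":
    for serial, tag, action, hostname in match_hostnames(REPAIRS, HOSTS):
        print(f"{serial} #{tag} - {action:<28} - {hostname or '(not in table)'}")
```

Keying on the asset tag and cross-checking the serial is just one way to do it; the serial alone would work as the key too, as long as both lists stay in sync.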
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #33)
> * Given to seamonkey
> ** w32-ix-slave06
> ** w32-ix-slave12
> ** w64-ix-slave02
> ** linux64-ix-slave37

Shall we file a separate bug for removing these 4 from DNS/nagios/inventory?
I would like to remove these hosts from releng configs.
DNS/nagios/inventory work filed as bug 703662
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering