Closed
Bug 672973
Opened 13 years ago
Closed 13 years ago
iX hardware issues in scl1 post heatsink/fan/RAM modifications
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: arich, Assigned: zandr)
Details
(Whiteboard: last few tracked in bug 673972)
The following iX machines in scl1 still exhibit issues after the heatsink/fan/RAM modifications and should be returned to iX for diagnosis and repair:

linux-ix-slave01
linux-ix-slave06
linux-ix-slave33
linux64-ix-slave26
linux64-ix-slave37
w32-ix-slave41

w32-ix-slave23 is already at iX for hardware issues.

The w64-ix machines have not been tested, since we are waiting on the Windows builder VLAN configuration to be complete before imaging those machines.

The following w32 hosts have also not been tested, since it has not been possible to break the boot cycle to get into the BIOS and change the boot order so that network is first (these will likely require crash carting):

w32-ix-slave01
w32-ix-slave03
w32-ix-slave26
w32-ix-slave29

The included spreadsheet URL will be used to track status.
Assignee
Updated•13 years ago
colo-trip: --- → scl1
Assignee
Comment 1•13 years ago
(In reply to comment #0)
> linux-ix-slave01
> linux-ix-slave06
> linux-ix-slave33
> linux64-ix-slave26
> linux64-ix-slave37

These machines have been pulled, stacked on the cart, and iX notified to pick them up.

> w32-ix-slave41

I think this might be an entirely blank drive. If there isn't even a partition table, DeployStudio won't see it. I'll investigate this tomorrow.

> w32-ix-slave01
> w32-ix-slave03
> w32-ix-slave26
> w32-ix-slave29

These four machines have been set to netboot in BIOS, and I've kicked off reimaging them with the win32-ix-ref-110527 image.

Spreadsheet updated.
Comment 3•13 years ago
Please add linux64-ix-slave21 to the list of machines to go to iX - it is hung with a machine check exception (4 bank 5 and lots of 0s).
Reporter
Comment 4•13 years ago
Please add linux64-ix-slave36 to the list of machines to check for an initial partition table. It's also rebooting into DeployStudio without finding a disk.
Assignee
Comment 5•13 years ago
w32-ix-slave41: wrote partition table, started imaging from w32-ix-ref-110527
linux64-ix-slave36: wrote partition table, started imaging from linux64-ix-ref-110527
linux64-ix-slave21: pulled and delivered to iX

I haven't put these updates in the spreadsheet yet.
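The bug doesn't record how the partition tables were written. As a hedged illustration of the "blank drive" problem from comment 1, the sketch below writes the minimal on-disk structure an imaging tool probes for: a 512-byte MBR with all four partition entries zeroed but carrying the 0x55AA boot signature. The file name `disk.img` is a stand-in for the real device node (e.g. /dev/sda on the actual slave), not anything from the bug.

```python
# Sketch only: illustrates what "no partition table" means for a
# tool like DeployStudio. A minimal (empty) MBR is 512 bytes whose
# four 16-byte partition entries are zeroed and which ends in the
# boot signature 0x55 0xAA.
IMG = "disk.img"  # hypothetical stand-in for the real disk device

mbr = bytearray(512)
mbr[510] = 0x55  # boot signature (little-endian word 0xAA55)
mbr[511] = 0xAA

with open(IMG, "wb") as f:
    f.write(mbr)

# Probe the way an imaging tool might: read sector 0 and check
# for the signature before trusting the partition entries.
with open(IMG, "rb") as f:
    sector0 = f.read(512)
print(sector0[510:512] == b"\x55\xaa")  # → True
```

An entirely blank drive fails this check (sector 0 is all zeros), which would match the symptom described in comment 1.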
Comment 6•13 years ago
(updates are in the spreadsheet now)
Reporter
Comment 7•13 years ago
w32-ix-slave41 also needs to go back to iX. It errors with "These memory DIMMs are not supported" and then gets into a reboot loop if you hit F1 to continue.
Reporter
Updated•13 years ago
Assignee: server-ops-releng → zandr
Comment 8•13 years ago
Here's the latest from iX on these machines:

Asset # 4620 / A1-16072 - This system is currently under further diagnosis at the moment.
4625 / A1-16077 - This system had a bad board, and we are now processing the RMA for a replacement.
4764 / A1-16163 - This system is under further diagnosis as well; however, we did discover a failed disk. The replacement has been made (with WD) and testing has resumed.
4799 / A1-16198 - This system had a failed disk as well, and the drive has been replaced (with WD).
4810 / A1-16209 - We are currently diagnosing this box but are looking at one of the new modules as the culprit. Regardless, we will be confirming so by tomorrow, I believe.
4794 / A1-16193 - This one was off the list, but it appears the drive may be the culprit. Should have confirmation on this by tomorrow as well.
4617 / A1-16069 - This is the repeat offender we've had here 3 times now. We're certain at this point the board is the culprit and are waiting on the replacement to arrive. ETA is currently 7/26.

We should get them back onsite early next week.
Assignee
Comment 9•13 years ago
(In reply to comment #7)
> w32-ix-slave41 also needs to go back to iX. It errors with "These memory
> DIMMs are not supported" and then gets into a reboot loop if you hit F1 to
> continue.

Reseated the memory, and it seems to come up just fine.
Reporter
Comment 10•13 years ago
w32-ix-slave41 still seems to be in a reboot loop. Please send it back to iX.
Assignee
Comment 11•13 years ago
4620, 4625, 4764, 4799, 4810, 4794, and 4617 (the machines from comment 8) are racked and powered.
Reporter
Comment 12•13 years ago
4625 (linux-ix-slave06) is not responding on its primary or IPMI interface.
Reporter
Comment 13•13 years ago
4810 (linux64-ix-slave37) is also unresponsive on both interfaces.
Reporter
Comment 14•13 years ago
linux64-ix-slave26 looks like it probably needs a partition table written to it before I can image it.
Reporter
Comment 15•13 years ago
The following hosts are now reimaged and ready for postimage/puppetization:

linux-ix-slave01-mgmt
linux-ix-slave06-mgmt
linux-ix-slave33-mgmt
linux64-ix-slave21-mgmt
linux64-ix-slave26-mgmt
linux64-ix-slave37-mgmt
w32-ix-slave23-mgmt
Assignee
Comment 16•13 years ago
Beyond the reboot loop, w32-ix-slave41 is also complaining of incompatible DIMMs. Will pull it for repair.
Comment 17•13 years ago
(In reply to Amy Rich [:arich] from comment #15)
> The following hosts are now reimaged and ready for postimage/puppetization:
>
> linux-ix-slave01-mgmt
> linux-ix-slave06-mgmt
> linux-ix-slave33-mgmt
> linux64-ix-slave21-mgmt
> linux64-ix-slave26-mgmt
> linux64-ix-slave37-mgmt
> w32-ix-slave23-mgmt

Per IRC w/ catlee: these have been done as part of bug#673436.
Assignee
Comment 18•13 years ago
w32-ix-slave41 (4705) pulled and will deliver to iX.
Assignee
Comment 19•13 years ago
So, I had some time to kill waiting for a mini to image and played with 4705 a bit. Flipping to IDE fixed the reboot loop, and I wasn't able to reproduce the memory problem. So, w32-ix-slave41 is back in service, awaiting postimage.
Comment 20•13 years ago
I think this got skipped in yesterday's meeting with IT; neither bear nor I have notes on this. What is the status?
Reporter
Comment 21•13 years ago
Matt mentioned that there are one or two hosts that need to go back to iX.
Comment 22•13 years ago
(In reply to Amy Rich [:arich] from comment #21)
> Matt mentioned that there are one or two hosts that need to go back to iX.

Is this bug 673972?
Reporter
Comment 23•13 years ago
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #22)
> (In reply to Amy Rich [:arich] from comment #21)
> > Matt mentioned that there are one or two hosts that need to go back to iX.
>
> Is this bug 673972?

I believe so, yes.
Reporter
Comment 24•13 years ago
The few machines that are out for repair at iX are now being tracked in bug 673972, so this bug is redundant at this point. Resolving this one in favor of the new bug.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations