Trying to kickstart this host takes a very long time, pointing to some sort of hardware issue. We've already tried replacing the cable and changing switch ports without effect. I asked hubear to run diagnostics on it to see if we can figure out what the issue is (network, disk, RAM, etc).
I went ahead and filed a ticket with iX Systems describing the problem above. As soon as I get a reply, I'll update the bug. They're pretty good at responding quickly. Ticket ID: DHG-111212
colo-trip: scl3 → ---
Whiteboard: [Waiting on response from iX Systems]
iX Systems suggested we try this before requesting a replacement: Remove the node for at least 20 seconds and place it back into the chassis. If you continue to experience issues afterward, please let us know and we will promptly schedule a pick-up (for repair/replacement).
We've already reseated it a couple of times, with no luck.
colo-trip: --- → scl3
Whiteboard: [Waiting on response from iX Systems] → [Picked up by iX Systems for repair]
Host is back from iX Systems with its motherboard replaced. The nic0 and mgmt MAC addresses have been updated in inventory. Please kickstart the host and close the bug if the issue is resolved.
Status: NEW → ASSIGNED
This machine still doesn't seem happy. I wonder if it's a bad disk, too? Or maybe something that's not seated properly?
:hubear, can you grab a drive from one of the 10 new iX systems that came in and swap it with the one in this host? Set the suspect drive aside so we can RMA it if necessary.
Hard drive swapped. Please kickstart the host and close the bug if the issue is resolved.
Now it doesn't even PXE boot. All I get after the initial power on screen is a blinking cursor.
I filed a ticket with iX Systems and let them know it's still acting up.
:arr iX Systems suggested we try this: A blank screen usually means there is an invalid boot block on the boot device. Please try booting the system and, while the BIOS screens are up, press F12 repeatedly. This will instruct the BIOS to PXE boot instead of following the standard boot path.
:ashlee, :hubear Can you two put a monitor on this host and perform some onsite debugging? We want to try reseating all the components and confirming we can actually reach the PXE boot screen before returning it to :arr. She can't do anything remotely until the host can reach that screen.
:dmoore The host is now on the PXE boot screen.
We'll try moving it to one of the chassis we received this week in order to troubleshoot the problem.
:arr, I swapped the location of talos-linux64-ix-057.test.releng.scl3.mozilla.com and talos-linux32-ix-100.test.releng.scl3.mozilla.com. Can you kickstart both and let me know of any issues?
This seems to be functioning normally now after some reseating.
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations