Power and rack 42 1U iX machines

RESOLVED FIXED

Status

mozilla.org Graveyard
Server Operations
8 years ago
3 years ago

People

(Reporter: joduinn, Assigned: jabba)

They're already ordered. Last I heard from mrz, these are due to arrive later this week. They would be racked in the 3rd floor server room.

After talking with jabba, I'm filing this bug to track getting switches, power, etc. set up in advance.
Assignee: server-ops → jdow
(Assignee)

Comment 1

8 years ago
Adding dmoore to CC. Derek, we'll need power and switches for 42 1U builders in Castro. I'm not sure what is needed for that, i.e. whether a switch is already available. If not, can we order one today?

I'll see how many patch cables and power cables we have.
I believe we've decided that 100Mbps uplinks are acceptable for the builders, which means we can use the spare 100Mbps switch from the 3rd floor server room.
Justin, any updates here? Do we know what day we're getting them yet?
Blocks: 588950
(Assignee)

Comment 4

8 years ago
The last I heard, it will probably be Monday. We've ordered PDUs, power cables and patch cables, but we're still waiting for an ETA on the PDUs. The physical racking and cabling will probably take a few days, but we'll make sure to get as many as we can online as soon as they come in.
Thanks for the quick update. I'm planning to have the ref images ready to go tomorrow, so we'll be able to start imaging any time after that.

Comment 6

8 years ago
I had lunch with iX. The machines are in their burn-in rack. Some number less than 42 will start showing up Friday, but allow some time to rack, wire and image them.
linux-ix-ref and win32-ix-ref are both ready for imaging. We'll be doing a split of 25 Windows machines and 17 Linux ones. The ref machines are currently on, but you're free to shut them down whenever you're ready to start. Please ping me if you have any questions.
(Assignee)

Comment 8

8 years ago
I need the hostnames that these will be named.
win32-ix-slave01 and up
linux-ix-slave01 and up
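
For reference, the naming scheme above combined with the 25 Windows / 17 Linux split mentioned earlier can be sketched as a short script. This is only an illustration: the function name `slave_hostnames` and the assumption of two-digit, zero-padded numbering starting at 01 are mine, not something confirmed in this bug.

```python
# Hypothetical helper: expands the "<prefix>-ix-slaveNN and up" scheme
# into explicit hostnames. Zero-padded two-digit numbering is assumed.
def slave_hostnames(prefix: str, count: int, start: int = 1) -> list[str]:
    return [f"{prefix}-ix-slave{n:02d}" for n in range(start, start + count)]


# 25 Windows and 17 Linux machines, per the split given in this bug.
windows_hosts = slave_hostnames("win32", 25)
linux_hosts = slave_hostnames("linux", 17)

if __name__ == "__main__":
    for host in windows_hosts + linux_hosts:
        print(host)
```

The resulting list could be pasted into whatever DNS, DHCP or monitoring configuration needs the names; the prefix is a parameter so a different spelling can be substituted.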
(Assignee)

Comment 10

8 years ago
Machines are racked and powered. Henry is installing patch cables right now, and bkero has DeployStudio pulling the ref images. So far everything is looking promising. Barring any issues, they should start to come online tomorrow morning.

1 server was dead on arrival and I have a ticket open with IX for it.
(Assignee)

Comment 11

8 years ago
Status:

We ran into a few issues with the linux image because GRUB did not boot after the image was deployed to the machines, so the linux hosts are not online yet. I think bkero has found a solution, so these should be up on Tuesday. In the worst case we'll have to manually install GRUB on them, but that should still be doable in a day.

As for the Windows hosts:
All Windows hosts w32-ix-slave{01-25} are finished and online with the following exceptions:

w32-ix-slave08 had a faulty LED, which led me to believe the host was actually dead. IX picked it up this morning, fixed it and will bring it back next week sometime.

w32-ix-slave06 has a faulty hard drive. IX will pick up this machine when they drop off slave08.

w32-ix-slave23 has about 20 minutes left to go and then needs its hostname changed, but I need to leave now to go home... Will try to finish it up tonight from home.
(Assignee)

Comment 12

8 years ago
w32-ix-slave23 is now online and all Windows slaves (except for 06 and 08) have been added to Nagios.

Still remaining: all linux slaves and w32-ix-slave06 and 08.
(Assignee)

Comment 13

8 years ago
The linux slaves are all online and in Nagios. Only two remaining are w32-ix-slave06 and 08, which are pending hardware fixes from IX systems.

Comment 14

8 years ago
Linux servers are all online. I turned off the IDE emulation mode (they defaulted to AHCI anyway). I ran into an issue booting: they just sat at boot saying "GRUB GRUB".

I fixed this by opening the IPMI console, booting an Ubuntu ISO (via Virtual Storage) into a 'rescue environment', and then running "grub" with the commands "root (hd0,0)" and "setup (hd0)".

After that, I ran "dd if=/dev/sda of=mbr bs=512 count=63".

I scp'd the 'mbr' file onto the DeployStudio server (pxe1.build.mozilla.org) into /Volumes/Deploy/Masters/PC/ImageName/mbr. That fixed booting on all the imaged servers, and I just did basic DeployStudio installs for all the linux nodes with no problem.
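
As a sanity check before copying a file like this to the imaging server, something along the following lines could confirm that the dump looks like what the dd command above should produce: 63 sectors of 512 bytes (32256 bytes in total), with the standard 0x55 0xAA boot signature in the last two bytes of the first sector. This is a rough sketch, not part of the procedure described in this comment; the function name `check_boot_blob` is made up for illustration.

```python
SECTOR_SIZE = 512    # matches bs=512 in the dd command
SECTOR_COUNT = 63    # matches count=63 in the dd command
BOOT_SIGNATURE = b"\x55\xaa"  # standard MBR signature at offsets 510-511


def check_boot_blob(data: bytes) -> list[str]:
    """Return a list of problems found in the dump (empty if it looks sane)."""
    problems = []
    expected_size = SECTOR_SIZE * SECTOR_COUNT
    if len(data) != expected_size:
        problems.append(f"expected {expected_size} bytes, got {len(data)}")
    if data[510:512] != BOOT_SIGNATURE:
        problems.append("missing 0x55AA boot signature at offset 510")
    return problems
```

For example, `check_boot_blob(open("mbr", "rb").read())` returns an empty list for a dump that passes both checks. It only inspects size and signature; it does not verify that the GRUB code in the file is itself correct.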
(Assignee)

Comment 15

8 years ago
w32-ix-slave08 has been returned from IX. I imaged it, and it is now online and in Nagios.

Still waiting for w32-ix-slave06.
(Assignee)

Comment 16

8 years ago
w32-ix-slave06 has been returned and imaged, and is now online and in Nagios.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard