Closed Bug 698603 Opened 14 years ago Closed 14 years ago

please image 70 more talos-r4-lion machines

Categories

(Infrastructure & Operations :: RelOps: General, task)

x86
macOS
task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jhford, Assigned: mlarrain)

References

Details

Please image 70 machines using the same reference image used to create talos-r4-lion-001 through 010. Please do the following:
- Deploy the talos-r4-lion-ref image used for 001-010 and name the machines talos-r4-lion-011.build.scl1.mozilla.com through talos-r4-lion-080.build.scl1.mozilla.com.
- Add CNAMEs in .build.mozilla.org for the above addresses.
- Add the above hosts to nagios, configured with the same checks as other talos-r4-lion machines.
Marking as critical so this gets triaged soon.
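For anyone scripting the naming and CNAME steps, here is a minimal sketch of how the 70 hostnames and their aliases could be generated. The host range and both domains come from the request above. The helper names (`lion_hosts`, `cname_lines`) and the zone-file-style line format are assumptions for illustration, not the tooling actually used for this bug.

```python
# Hypothetical helpers: the host range and domains are from the request above;
# the function names and record format are illustrative assumptions.

def lion_hosts(first=11, last=80):
    """Return (short name, FQDN in build.scl1.mozilla.com) pairs."""
    return [
        (f"talos-r4-lion-{n:03d}", f"talos-r4-lion-{n:03d}.build.scl1.mozilla.com")
        for n in range(first, last + 1)
    ]

def cname_lines(hosts, alias_zone="build.mozilla.org"):
    """Format one zone-file-style CNAME line per host (assumed layout)."""
    return [f"{short}.{alias_zone}. IN CNAME {fqdn}." for short, fqdn in hosts]

hosts = lion_hosts()
records = cname_lines(hosts)
print(len(records))   # number of machines requested
print(records[0])     # first alias line
```

The same host list could also feed a nagios host definition generator, though the check configuration is not specified in this bug.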
Depends on: 696507
We've already discussed the dongle requirements, but restating for clarity: these machines need to have a dongle installed to get a workable resolution.
Matt will work on this tomorrow after he meets our new consultant at the office and gets him set up with his badge and laptop.
Assignee: server-ops-releng → mlarrain
colo-trip: --- → scl1
DNS and DHCP are already done. They already have dongles. Here's the layout (from the front). If you can label these with the hostnames (including "talos-r4-", omitted below for brevity), that will avoid confusion down the road.
r102-1:
25: lion-025 lion-026
... ......
13: lion-001 lion-002
r102-2:
25: lion-071 lion-072
... ......
13: lion-027 lion-028
r102-3:
36: lion-075 lion-076
35: lion-073 lion-074
r102-4:
24: lion-079 lion-080
23: lion-077 lion-078
Note that lion-testing3 is becoming lion-075 and lion-ref is becoming lion-076.
Depends on: 671415
Sorry, this should read:
r102-2:
37: unused
36: unused
35: lion-071 lion-072
... ......
13: lion-027 lion-028
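When labeling or crash-carting, it can help to look up where a given machine sits. The sketch below encodes the corrected layout, assuming (as the endpoints imply, but the elided "..." rows do not state outright) that each rack unit holds two consecutively numbered machines and that units fill upward from the lowest listed unit. The table and the `locate` function are hypothetical names, not part of any existing inventory tool.

```python
# Corrected layout from the comments above.
# Assumption: every rack unit holds two consecutive machines, filling upward.
# Each entry: (rack, lowest unit, first host number, last host number)
RACKS = [
    ("r102-1", 13, 1, 26),
    ("r102-2", 13, 27, 72),
    ("r102-3", 35, 73, 76),
    ("r102-4", 23, 77, 80),
]

def locate(host_number):
    """Return (rack, unit) for talos-r4-lion-<host_number>."""
    for rack, first_unit, first_host, last_host in RACKS:
        if first_host <= host_number <= last_host:
            # Two machines per unit, so every second host moves up one unit.
            return rack, first_unit + (host_number - first_host) // 2
    raise ValueError(f"lion-{host_number:03d} is not in the layout")

for n in (17, 45, 70):
    rack, unit = locate(n)
    print(f"talos-r4-lion-{n:03d}: {rack} unit {unit}")
```

The listed endpoints (for example lion-071/072 in r102-2 unit 35) are reproduced by this mapping. Positions in the elided rows are inferred and should be confirmed against the physical labels.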
I see that most of these machines are imaged. There are a couple of issues that I am seeing:
- 009, 053, 054 are all not recognizing their dongles
- 017, 018, 029, 031, 035, 037, 044, 045, 070 are not responding to ping
What state are these machines in? If they have been imaged, please let me know what state they are in -- I want to make sure they aren't going to sleep or something like that. Thanks!
All were imaged except for the following, which failed to netboot: 029, 031, 035, 037, 045, 070. 073 and 078 showed up in DS as snow-046 and snow-001 respectively. All of these will need to be looked at on a crash cart, plus 017, 018 and 044.
073 and 078 had been booted previously and assigned different hostnames at that time. I have reinstalled these two.
I power cycled 044 and it's up now. I power cycled 017, and it did not come back up and will require crash carting. So that leaves the following machines which I can't do anything with remotely:
- 018 (which probably needs a power cycle)
- 029, 031, 035, 037, 045, 070 (need to be diagnosed and netbooted)
- 009, 053, 054 are all not recognizing their dongles
017 and 018 were actually just completely turned off; powered them on and able to ping.
029: network cable wasn't fully connected, installing.
031: power button on the chassis sucks; powered on from behind and is installing.
035: powered it on and installing.
Will continue tomorrow with 037, 045, 070, 009, 053, 054.
Since we're down to only two hosts that still need to be imaged, I've added all the slaves to nagios.
037: didn't have its cat5e plugged in, imaging now.
045: DOA, will need to go out for repair.
070: connected and netbooted, imaging now.
009: reseated dongle and now showing proper display size.
053: didn't even have a dongle, added one.
054: reseated dongle and now showing proper display size.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations