Closed Bug 791861 Opened 12 years ago Closed 12 years ago

coordinate move of tegras from MTV-4

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hwine, Assigned: hwine)

References

Details

(Whiteboard: [reit-tegra])

Attachments

(1 file)

We're going to move the tegras from the 4th floor (on dcops desktops) to their new space without a tree closure. This may require moving tegras in batches; each batch will be defined here.

If the work in bug 791854 is completed prior to Wed Sep 19 noon PT, we will be able to move all the tegras at once on or after Thur Sep 20, greatly simplifying things. If not, we'll need to do it in batches.

Each batch requires about 4 hours to move:
 - 2 hours to take out of service (releng)
 - 1-1/2 hours for physical move (dcops)
 - 1/2 hour to put back into service (releng)
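
The per-batch estimate above can be checked with a short sketch. The phase durations come from the list; the function names and any start time passed in are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta

# Phase durations in hours, taken from the per-batch estimate above.
PHASES = [
    ("take out of service (releng)", 2.0),
    ("physical move (dcops)", 1.5),
    ("put back into service (releng)", 0.5),
]


def batch_hours(phases=PHASES):
    """Total wall-clock hours for one batch, assuming phases run back to back."""
    return sum(hours for _, hours in phases)


def schedule(start, phases=PHASES):
    """Return (phase name, start, end) tuples for one batch beginning at `start`."""
    timeline = []
    current = start
    for name, hours in phases:
        end = current + timedelta(hours=hours)
        timeline.append((name, current, end))
        current = end
    return timeline
```

Two batches run one after the other would cost twice `batch_hours()`, which is why a single move is preferred.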

Current proposed time for the first (and hopefully only) move is Thursday, 20 Sep, morning. (ET releng folks will take units out of service so that PT dcops folks can do the move as their first task of the day.)
:hwine, We'd like to move all the tegras at once if possible. We have a cart on the 4th floor that will house the tegras. Once the tegras are taken out of rotation,

dcops will:
1) disconnect all cables and move existing switches to cart and secure
2) move all tegra boards and pdus to cart and secure
3) move pdus/cdus to cart and secure
4) roll cart down to allocated space in 3-mdf
5) attach network, power, and pre-run uplinks to switches and power on devices

Please let me know if that will work for you. We will probably use up the full 1.5 hours (please allocate more time if possible, although I hope we don't take that long) to make sure the Tegras aren't touching any metal or each other, and for any last-minute troubleshooting (since we know these boot up perfectly every time).

Regards,
Van
Van - yeah, whether bug 791854 closes on time is what will determine one batch or two. Believe me, we'd prefer one as well.

Does Thursday morning Sep 20 work for you? If so, what's your expected in-the-office time? (so we can have all the units offline by then).

And, yes, we might run over on getting all of the tegras back up - my hope is that enough will come up "okay" that we don't take a big hit on availability to developers. If that looks risky due to the nature of tegras & the rack solution, then we should split to two batches for safety.
I just had a thought I want to be extra explicit on. If any PDU hostnames/ports/etc. used for powercycling the tegras change during this move, please be explicit about which tegras were changed, since it will need [relatively simple] code changes on our side. It will also let you guys ensure inventory is correct (if it changes).
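
One way to make "which tegras were changed" explicit is to compare the PDU assignments recorded before and after the move. This is only a sketch: the dictionary layout (tegra name mapped to a PDU hostname and port) and the function name are assumptions for illustration, not the format of any existing releng or inventory file.

```python
def changed_assignments(before, after):
    """Return {tegra: (old, new)} for every tegra whose PDU host/port differs.

    `before` and `after` map a tegra name to a (pdu_hostname, port) tuple.
    This layout is hypothetical. A tegra missing from one side is reported
    with None on that side.
    """
    changes = {}
    for tegra in sorted(set(before) | set(after)):
        old = before.get(tegra)
        new = after.get(tegra)
        if old != new:
            changes[tegra] = (old, new)
    return changes
```

Only the tegras returned here would need their powercycle settings and inventory entries updated.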
Hal/Callek,

tegra-[319-350] reside on switch1.dcops.ops.mtv1 and pdu1.dcops.build.mtv1
tegra-[351-370] reside on switch2.dcops.ops.mtv1 and pdu2.dcops.build.mtv1

If we can't move them all at the same time, we'll probably need to move them in 2 batches because we're going to need to use the same pdus and switches. Hopefully this works.

thanks,
Van
I forgot these pdus aren't daisy chained.

tegra-[319-350] reside on switch1.dcops.ops.mtv1 and pdu[1-2].dcops.build.mtv1
tegra-[351-370] reside on switch2.dcops.ops.mtv1 and pdu[3-4].dcops.build.mtv1
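
The bracketed ranges above can be expanded into a per-tegra lookup, for example to cross-check inventory after the move. This is a minimal sketch: the helper names and the dictionary shape are assumptions, and since the comment does not say which of the two PDUs in a group powers a given tegra, both are recorded as candidates.

```python
import re

# Matches one "[lo-hi]" numeric range inside a hostname pattern.
RANGE_RE = re.compile(r"^(?P<prefix>.*)\[(?P<lo>\d+)-(?P<hi>\d+)\](?P<suffix>.*)$")


def expand(pattern):
    """Expand 'tegra-[319-350]' style patterns into a list of names."""
    m = RANGE_RE.match(pattern)
    if m is None:
        return [pattern]  # no range present, return the name unchanged
    lo, hi = int(m["lo"]), int(m["hi"])
    return [f"{m['prefix']}{n}{m['suffix']}" for n in range(lo, hi + 1)]


# Layout as stated in the corrected comment above.
LAYOUT = [
    ("tegra-[319-350]", "switch1.dcops.ops.mtv1", "pdu[1-2].dcops.build.mtv1"),
    ("tegra-[351-370]", "switch2.dcops.ops.mtv1", "pdu[3-4].dcops.build.mtv1"),
]


def build_inventory(layout=LAYOUT):
    """Map each tegra to its switch and its candidate PDUs."""
    inventory = {}
    for tegras, switch, pdus in layout:
        for tegra in expand(tegras):
            inventory[tegra] = {"switch": switch, "pdus": expand(pdus)}
    return inventory
```

With this layout, `build_inventory()` covers tegra-319 through tegra-370, and each switch group corresponds to one possible batch if the move has to be split.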
We do have spare switches and PDUs if we need to set up in advance to accommodate smaller batches, but that will mean changing switch and PDU information in inventory as part of the move.
(In reply to Hal Wine [:hwine] from comment #2)
> Van - yeah, if bug 791854 closes on time is what will determine one batch or
> two. Believe me, we'd prefer one as well.

Bug 791854 is done, so it will be suitable as one batch.

> Does Thursday morning Sep 20 work for you? If so, what's your expected
> in-the-office time? (so we can have all the units offline by then).

This is what works best for releng, but we can flex if need be. Can we get a signoff on this timeslot, and a specific "when do you expect to start working on this/being in the office" so we can plan our end appropriately?

> And, yes, we might run over on getting all of the tegras back up - my hope
> is that enough will come up "okay" that we don't take a big hit on
> availability to developers. If that looks risky due to the nature of tegras
> & the rack solution, then we should split to two batches for safety.

If we expect to be able to get this move accomplished in 4 hours absolute max of DCOps time, I think we can safely handle a single batch. We'll plan for 1.5-2 hours of DCOps time though, and only expand if it's an absolute necessity. How does that sound?
Callek,

Spoke to hwine offline regarding the time slot and decided we're shooting to start moving the tegras at 10:30am. Let me know if that works for you.

1.5-2 hours for dcops sounds good to me.

Thanks,
Van
Van & Callek - sounds like we have a plan - I'll leave it in your capable hands! Holler if you need any assistance.
(In reply to Van Le [:van] from comment #8)
> Callek,
> 
> Spoke to hwine offline regarding the time slot and decided we're shooting to
> start moving the tegras at 1030am. Let me know if that works for you.

SGTM, I'll assume 10:30am is PT unless explicitly said otherwise here ;-)
Here's the list of foopies <-> tegras taken from tegras.json this morning for all impacted tegras (for reference)
Assignee: nobody → hwine
Status: NEW → ASSIGNED
Starting graceful shutdown of tegras
Graceful shutdown initiated for all foopy client proxies of production foopies via links on http://mobile-dashboard1.build.mtv1.mozilla.com/tegras/

Unassigning - someone else will pick up next step in shutdown
Status: ASSIGNED → NEW
We have moved the tegras and they're all pingable. Please let me know if there are any issues.

Thanks,
Van
All tegras back into production (at least as they were prior to move, some recovery might be needed for some, but no more than would have been needed if this did not happen -- calling this done)
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Depends on: 793994
Depends on: 791717
Product: mozilla.org → Release Engineering
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard