Closed Bug 1030659 Opened 10 years ago Closed 10 years ago

please move console connection for seamicro-c1.r101-3.console.scl3.mozilla.com to inband.releng.scl3.mozilla.com

Categories

(Infrastructure & Operations :: DCOps, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: arich, Unassigned)

Details

We need the seamicro chassis dedicated to releng (seamicro-c1.r101-3.console.scl3.mozilla.com) to have its management interface on the inband network instead of the console network so that ipmi commands can be routed to it from specific machines on the GlobalVPN and the machine that does slave reboots. Assuming that the r101-3 portion of the hostname is correct, the new hostname should be seamicro-c1.r101-3.inband.releng.scl3.mozilla.com. This is going to take a fair bit of coordination since we need to move the physical network connection, update inventory, and update the chassis itself in order to make things work. I'm nearly positive that this will require someone from dcops to be physically at the console to change the IP information as it gets moved to a new vlan, and I'm not sure how experienced you folks are with that. I'd like to try to plan this out ahead of time so that we have minimal downtime for the console interface. If this also requires physically moving the chassis to be in a "releng rack," this will need to happen during a tree closing window.
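As a small illustration of the rename described above, the sketch below derives the inband hostname from the console one by swapping the domain suffix. The function name `inband_hostname` is hypothetical and not part of any Mozilla tooling; it only encodes the naming pattern stated in the comment, and assumes the rack portion (r101-3) stays unchanged.

```python
# Hypothetical helper: encodes only the naming pattern described in the comment.
CONSOLE_SUFFIX = ".console.scl3.mozilla.com"
INBAND_SUFFIX = ".inband.releng.scl3.mozilla.com"


def inband_hostname(console_hostname: str) -> str:
    """Swap the console network suffix for the releng inband suffix."""
    if not console_hostname.endswith(CONSOLE_SUFFIX):
        raise ValueError(f"not a console hostname: {console_hostname!r}")
    # Keep the host and rack labels, replace only the network suffix.
    return console_hostname[: -len(CONSOLE_SUFFIX)] + INBAND_SUFFIX


if __name__ == "__main__":
    print(inband_hostname("seamicro-c1.r101-3.console.scl3.mozilla.com"))
```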
We need to make sure the seamicro is trunked to VLAN 216 (system-vlan 216 is already set in the config). The relevant bit of the config can be changed by logging in and running:
    enable
    conf
    interface mgmteth
    ip address <new ip>
    ip default-gateway 10.26.16.1
    exit
    exit
    write mem
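For planning the change ahead of time, the command sequence in the comment above could be generated with the new address filled in, so it can be pasted over the serial console or reviewed beforehand. This is only a sketch: the function name `mgmt_move_commands` is hypothetical, the address 192.0.2.10 is a documentation placeholder rather than the real new IP, and the exact form the chassis expects after `ip address` (for example, whether a netmask is required) is not confirmed by this bug.

```python
# Hypothetical helper: reproduces the CLI lines from the comment, nothing more.
def mgmt_move_commands(new_ip: str, gateway: str = "10.26.16.1") -> list[str]:
    """Return the chassis CLI lines with <new ip> substituted."""
    new_ip = new_ip.strip()
    if not new_ip:
        raise ValueError("new_ip must not be empty")
    return [
        "enable",
        "conf",
        "interface mgmteth",
        f"ip address {new_ip}",            # address format is an assumption
        f"ip default-gateway {gateway}",   # gateway taken from the comment
        "exit",
        "exit",
        "write mem",
    ]


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder; substitute the address assigned in inventory.
    print("\n".join(mgmt_move_commands("192.0.2.10")))
```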
colo-trip: --- → scl3
We do have these chassis on serial console, as well, so all necessary IP configuration changes can be accomplished that way. I would prefer to eventually move this chassis to the releng racks. This would be an opportunity to address the power redundancy concerns in Bug 905758.
Ah, excellent, are there directions on how to use the serial connection? I've never seen or used that on these machines. Derek: it sounds like we should aim for moving the entire chassis once we get the occupancy permit for the new pod? Do we have an ETA on that? Right now the number of false positive bugs being filed is a pain in the butt for both dcops and releng, I think (there's a different bug to try to address that). Hal, Coop, Callek: do we need a tree closure to move the chassis (64 nodes, split evenly between try and build), or do we have sufficient capacity that we can have a non-tree-closing window to operate in? I would assume the latter since we JUST added these machines, but I don't know how bad off we were before that.
Flags: needinfo?(hwine)
Flags: needinfo?(dmoore)
Flags: needinfo?(coop)
Flags: needinfo?(bugspam.Callek)
i prepped a cable to switch1.r102.console.scl3.mozilla.net:0/0/8 and tagged it with the inband vlan. let me know when you guys want to switch over.
gave the non-routable hostname in last comment, should be switch1.r102-8.ops.releng.scl3.mozilla.net:0/0/8
I'm suspecting this is ok for non-tree closure, but haven't looked closely. I'll let coop/hal decide for certain.
Flags: needinfo?(bugspam.Callek)
(In reply to Amy Rich [:arich] [:arr] from comment #3)
> Hal, Coop, Callek: do we need a tree closure to move the chassis (64 nodes,
> split evenly between try and build), or do we have sufficient capacity that
> we can have a non-tree-closing window to operate in? I would assume the
> latter since we JUST added these machines, but I don't know how bad off we
> were before that.
I'm fine to do this outside of a tree-closing window *PROVIDED* there's some coordination with buildduty. Buildduty will need to disable the sm nodes a few hours before the planned change, and then bring them up again quickly afterwards. We can deal without the capacity for a few hours, but a day or more will get dicey. Van: if you can coordinate with buildduty (happens to be me this week, or :pmoore next week), we can get this done on your schedule. As noted, we just need a few hours of notice in advance.
Flags: needinfo?(hwine)
Flags: needinfo?(coop)
I definitely want to wait till I'm back in a timezone that's going to overlap with dcops, just in case things go wrong. I think we'll try to shoot for sometime next week or the week after. I suspect one of the gating factors here will be whether or not we have the occupancy permit by then (if dmoore would like us to move the chassis).
pod 100 won't be ready by then. construction is set to begin 7/9 and it'll take about 2 weeks for them to finish installing the cabinets and the cold air encapsulation unit. dcops will need ~2 days to complete the infrastructure cabling afterwards. :arr let me know when you're ready/back in the US and I can move that cable for you.
coop: we'd absolutely coordinate with buildduty and plan this out in advance with a timeline/schedule (like the scl1 moves) so that releng had time to disable jobs. dmoore/coop: do you want to wait till construction is finished and move it into the new pod so that we only need one outage? Unfortunately that timeframe is right when I'll be in the UK again, but we can work out some sort of window where we'll overlap.
Flags: needinfo?(coop)
(In reply to Amy Rich [:arich] [:arr] from comment #10)
> dmoore/coop: do you want to wait till construction is finished and move it
> into the new pod so that we only need one outage? Unfortunately that
> timeframe is right when I'll be in the UK again, but we can work out some
> sort of window where we'll overlap.
I don't know what that timeframe is. Provided it's not too long (i.e., longer than 2 weeks), I'm fine with waiting.
Flags: needinfo?(coop)
The seamicros haven't really proven adequate for performance and manageability, so we're likely not going to use them for production going forward.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(dmoore)
Resolution: --- → WONTFIX
Product: mozilla.org → Infrastructure & Operations