Closed
Bug 743828
Opened 13 years ago
Closed 13 years ago
migrate opsi hosts to scl3
Categories
(Infrastructure & Operations :: Virtualization, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: arich, Assigned: mburns)
The opsi vms need to migrate from sjc1 to srv.releng.scl3.mozilla.com.
We'll need:
* new IPs
* appropriate flows
* releng to add/fix references to opsi's fqdn
* the virtualization team to do the actual migration
* likely a tree closure or downtime in which to perform this work.
Reporter
Comment 1•13 years ago
production-opsi.srv.releng.scl3.mozilla.com 10.26.48.38
staging-opsi.srv.releng.scl3.mozilla.com 10.26.48.39
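Once the records are published, they can be checked against the addresses above. The following is a minimal sketch, not releng tooling: `check_records` is a hypothetical helper name, and it assumes only Python's standard `socket.gethostbyname`. The `EXPECTED` table restates the two addresses listed in this comment.

```python
import socket

# Addresses exactly as listed in this comment.
EXPECTED = {
    "production-opsi.srv.releng.scl3.mozilla.com": "10.26.48.38",
    "staging-opsi.srv.releng.scl3.mozilla.com": "10.26.48.39",
}


def check_records(expected, resolve=socket.gethostbyname):
    """Return {hostname: (expected_ip, resolved_ip_or_None)} for every mismatch.

    `resolve` is injectable so the check can be exercised without real DNS.
    """
    mismatches = {}
    for host, want in expected.items():
        try:
            got = resolve(host)
        except OSError:
            got = None  # the name did not resolve at all
        if got != want:
            mismatches[host] = (want, got)
    return mismatches


if __name__ == "__main__":
    for host, (want, got) in check_records(EXPECTED).items():
        print(f"{host}: expected {want}, got {got}")
```

An empty result means both names resolve to the addresses given here.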
Reporter
Updated•13 years ago
Assignee: server-ops-releng → dustin
Comment 2•13 years ago
Heh, google tells me OPSI stands for "Overwhelming post-splenectomy infection". Sounds about right.
Windows systems use a registry key to tell them which master to hit:
https://wiki.mozilla.org/ReleaseEngineering/OPSI#Check_which_master
and it uses an IP. I imagine the easiest way to edit that will be via REG.EXE and cssh?
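As a rough sketch of the REG.EXE approach, the helper below builds a `reg add` command line that could then be pushed out to each host. The registry key and value name here are placeholders, since the real ones are documented on the wiki page above and are not repeated in this bug. The `REG_SZ` type is likewise an assumption, and `build_reg_add` is a hypothetical helper name. The address used is the new production IP from comment 1.

```python
def build_reg_add(key, value_name, data, value_type="REG_SZ"):
    """Build the argument list for a `reg add` call that overwrites one value.

    /v names the value, /t its type, /d the data, and /f suppresses the
    overwrite confirmation prompt.
    """
    return ["reg", "add", key, "/v", value_name, "/t", value_type, "/d", data, "/f"]


# Placeholders only: substitute the key and value name from the wiki page.
HYPOTHETICAL_KEY = r"HKLM\SOFTWARE\ExampleVendor\ExampleClient"
HYPOTHETICAL_VALUE = "ExampleMasterAddress"

cmd = build_reg_add(HYPOTHETICAL_KEY, HYPOTHETICAL_VALUE, "10.26.48.38")

if __name__ == "__main__":
    # Print the command for review rather than running it.
    print(" ".join(cmd))
```

Printing the command first, rather than executing it, leaves room to review it before it is sent to many hosts at once.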
I think we can manage this without a tree closure as follows, assuming the vmware crew can wave some magic wands:
* snapshot or copy the existing hosts locally to sjc1 with limited downtime (a downtime for these hosts of, say, 30 minutes, will not significantly impact builds, as hosts will wait for them to come back up)
* bring them back up
* send the images to scl3 and set up the new machines at a leisurely pace
* bring up the new VMs
* start redirecting hosts to the new VMs
Dan, does that sound doable? When could we schedule this?
I don't think any new flows will be required, as the releng BU is currently any/any within itself and to the other build nets, right?
Comment 3•13 years ago
I don't know what this means:
"snapshot or copy the existing hosts locally to sjc1 with limited downtime"
Reporter
Comment 4•13 years ago
Dustin: that should be correct about all all for the releng flows.
Comment 5•13 years ago
Per Dan, we can clone the hosts inside of a 30-minute downtime. We could do this later in the day tomorrow (Thursday), with coordination from Buildduty. No tree-closure is required, as we can do this with, at worst, a slight increase in wait time (as hosts stall at the trying-to-run-OPSI stage).
Dan, what does your schedule look like? OK, that's the wrong question - I'm pretty sure it looks like two hours of work for each hour from now until eternity.
When should we plan to do this, and who should I look for to work the controls in vCenter?
Assignee: dustin → server-ops
Component: Server Operations: RelEng → Server Operations: Virtualization
QA Contact: arich → dparsons
Comment 6•13 years ago
Oh yeah, and this is on the OMG EVACUATE THIS WEEK list :(
Severity: normal → critical
Comment 7•13 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #6)
> Oh yeah, and this is on the OMG EVACUATE THIS WEEK list :(
Incorrect - the chassis this is in is *not* moving on Monday, so there's time.
Severity: critical → major
Updated•13 years ago
Assignee: server-ops → phong
Comment 8•13 years ago
What do we need to do here? If you know what is needed, one of the SREs can help with the migration.
Assignee: phong → dparsons
Comment 9•13 years ago
Phong:
We need to pick a time to take these sjc1 hosts down, clone them, then bring them back up, and start the migration of the clones to scl3.
Comment 10•13 years ago
Phong - I will help coordinate with RelEng for any downtime required to make this happen. Please let me know what timeframe you are looking at for this.
Comment 11•13 years ago
The plan outlined by Dustin in comment #9 works for us - we just need to have the down, clone, restart old, bring up new sequence done sooner rather than later so I can start the process of migrating the slaves to point to the new instance.
Assignee
Updated•13 years ago
Assignee: dparsons → mburns
Assignee
Comment 12•13 years ago
staging-opsi downed
cloned (as staging-opsi-NEW)
started the old staging-opsi
created https://inventory.mozilla.org/en-US/systems/show/6030/
Just need to migrate it to SCL3's ESX cluster and bring it up.
Comment 13•13 years ago
do we have the new staging opsi spun up?
Comment 14•13 years ago
is the staging opsi vm running?
Comment 15•13 years ago
There's a new flow required to do the migration - that request is pending. I don't recall the bug #.
Assignee
Comment 16•13 years ago
bug 746858 is tracking this unexpected missing flow.
Assignee
Comment 17•13 years ago
staging-opsi.srv.releng.scl3.mozilla.com is migrated and online.
Reporter
Comment 18•13 years ago
mburns: please coordinate with rail (buildduty) tomorrow to shut down, clone, and migrate the production opsi vm. The downtime starts at 9:00 pacific, but rail will give the all clear in #infra when work can commence.
Assignee
Comment 19•13 years ago
looking forward to it.
Assignee
Comment 20•13 years ago
[09:55:55] <mburns> bear: rail-buildduty: 64 bytes from production-opsi.srv.releng.scl3.mozilla.com (10.26.48.38): icmp_seq=1 ttl=61 time=4.90 ms :)
[09:56:05] <bear> \o/
[09:56:12] <rail-buildduty> whooo
[09:57:05] <bear> I can reach it and it looks like an opsi server
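For anyone scripting the same reachability check, a ping reply line like the one pasted above can be parsed as follows. This is a sketch that assumes the Linux-style `ping` reply format shown in the log; `parse_ping_reply` is a hypothetical helper name.

```python
import re

# Matches the reply format seen in the log:
# "64 bytes from <host> (<ip>): icmp_seq=N ttl=N time=N ms"
PING_REPLY = re.compile(
    r"(\d+) bytes from (\S+) \((\d+(?:\.\d+){3})\): "
    r"icmp_seq=(\d+) ttl=(\d+) time=([\d.]+) ms"
)


def parse_ping_reply(line):
    """Return the fields of one ping reply as a dict, or None if absent."""
    match = PING_REPLY.search(line)
    if match is None:
        return None
    size, host, ip, seq, ttl, rtt = match.groups()
    return {
        "bytes": int(size),
        "host": host,
        "ip": ip,
        "icmp_seq": int(seq),
        "ttl": int(ttl),
        "time_ms": float(rtt),
    }
```

Because the pattern is applied with `search`, it also works on a line that carries an IRC timestamp and nick prefix, as in the paste above.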
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Infrastructure & Operations