Closed Bug 491767 Opened 15 years ago Closed 15 years ago

Migrate some VM off equallogic before the firmware upgrade, then move them back

Categories

(mozilla.org Graveyard :: Server Operations, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Assigned: phong)

Details

(Whiteboard: needs-scheduling for moving them back)

From meetings and email threads this week, it looks like the following VMs need to be moved before next week's firmware upgrade on the EqualLogic arrays.

production-master 
try-master
try-win32-slave01
try-win32-slave02
try-win32-slave03
 

If it turns out there is space to migrate a couple of additional try slaves, let us know and we'll add them to the list. These additional slaves would make things easier for the sheriffs on the day, but are "nice to have", not "must have".
I thought that people were OK with no try server for the four hours of the eql outage.
VMs with storage requirements:

production-master      60GB
try-master             30GB
try-win32-slave01      60GB
try-win32-slave02      60GB
try-win32-slave03      60GB

geodns01               10GB
pm-ns03                10GB

About 300GB is needed, and a quick spot check suggests there's enough NetApp space to handle this.
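
(Illustrative aside: a quick back-of-the-envelope check of the figures above, as a minimal Python sketch. The NetApp free-space number is a made-up placeholder, not a measured value.)

# Rough capacity check for the migration (illustrative only).
vm_sizes_gb = {
    "production-master": 60,
    "try-master": 30,
    "try-win32-slave01": 60,
    "try-win32-slave02": 60,
    "try-win32-slave03": 60,
    "geodns01": 10,
    "pm-ns03": 10,
}

needed_gb = sum(vm_sizes_gb.values())   # 290GB, i.e. "about 300GB"
netapp_free_gb = 400                    # placeholder free space on the NetApp
print("need %dGB, free %dGB, fits: %s"
      % (needed_gb, netapp_free_gb, needed_gb <= netapp_free_gb))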
(In reply to comment #1)
> I thought that people were OK with no try server for the four hours of the eql
> outage.

Yep, but when I saw that moving just a few VMs would give us some (tiny) TryServer, and make a big difference to the sheriffs that day, it seemed worth trying for (bad pun, sorry). 

If we can get any TryServer going that's great. Worst case, we'll go ahead with the downtime taking down TryServer. Make sense?
(In reply to comment #4)
> From meetings and email threads this week, it looks like the following list of
> VMs need to be moved before next week's firmware upgrade on equal-logic arrays.
> 
> production-master 
> try-master
> try-win32-slave01
> try-win32-slave02
> try-win32-slave03
> 
mrz: is there room to migrate these also?

graphs.m.o
build.m.o
anonymous hg.m.o
graphs.m.o -> dm-graphs01   30GB
build.m.o -> dm-wwwbuild01  80GB

anonymous hg.m.o
* dm-vcview01   10GB
* dm-vcview02   10GB

So another 130GB.  That might be tight (the build NetApp has more space than the IT ones).  I'll see what we can do.
 
> anonymous hg.m.o 
> * dm-vcview01   10GB
> * dm-vcview02   10GB


These don't need to move - dm-vcview03 is physical hardware and can handle the load during the window.
(In reply to comment #2)
> VMs with storage requirements:
> 
> production-master      60GB
> try-master             30GB
> try-win32-slave01      60GB
> try-win32-slave02      60GB
> try-win32-slave03      60GB

Most of these look like they were migrated off in advance of the downtime. However, "try-master" is still on eq01-bm01... can you migrate that also?
Assignee: server-ops → phong
it looks like only the 10GB drive needs to be migrated off.  this should be a quick migration.
(In reply to comment #8)
> it looks like only the 10GB drive needs to be migrated off.  this should be a
> quick migration.

This turned out to hit an error (got any more deets, Phong?) and the disks ended up back on eql-bm01. The buildbot process then went a bit nuts CPU-load-wise, so I took the opportunity to shut down the machine and migrate the storage to d-sata-build-003. Two try runs that were interrupted, and two that occurred during the migration, were resubmitted.

All done here?
Probably, but I want to keep this around to track eventually moving the VMs back.  We grabbed a lot of temporary SAN space.
Whiteboard: post Fx3.5
Need to plan on moving this back...
Whiteboard: post Fx3.5 → needs-scheduling
(In reply to comment #11)
> Need to plan on moving this back...

Probably too late to include in our downtime tmrw (Thurs) morning. Could we do this in another downtime maybe next week?
RelEng, please let me know when we can schedule this, and toss it back when it can get into a window.
Assignee: phong → nobody
Component: Server Operations → Release Engineering
QA Contact: mrz → release
Re-assigning to joduinn for scheduling.
Assignee: nobody → joduinn
Summary: Migrate some VM off equallogic before the firmware upgrade → Migrate some VM off equallogic before the firmware upgrade, then move them back
Whiteboard: needs-scheduling → needs-scheduling for moving them back
We've got a downtime tomorrow and I was planning to do this during it. It's hard to decode exactly what needs to be done, though. Which VMs need to be moved? Which datastores do I move them to?
Assignee: joduinn → bhearsum
Status: NEW → ASSIGNED
production-master      60GB
try-master             30GB
try-win32-slave01      60GB
try-win32-slave02      60GB
try-win32-slave03      60GB

Size  Used  Avail  Use%  Datastore
499G  224G  275G   44%   /vmfs/volumes/eql02-bm03
499G  306G  193G   61%   /vmfs/volumes/eql01-bm12

Those 2 datastores have some free space.

on the IT Cluster:
graphs.m.o -> dm-graphs01   30GB
build.m.o -> dm-wwwbuild01  80GB
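
(Illustrative aside: if someone wanted to script this kind of storage move instead of clicking through the vSphere client, a minimal sketch with pyVmomi is below. The host, credentials, and the choice of try-master/eql02-bm03 as examples are placeholders, and this is not a record of how the migrations in this bug were actually performed.)

#!/usr/bin/env python
# Minimal sketch: check free space on a target datastore and relocate a VM's
# storage onto it via the vSphere API. Host, credentials and names below are
# placeholders; error handling and SSL setup are omitted.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    # Walk the inventory for the first managed object of this type and name.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.DestroyView()

vm = find_by_name(vim.VirtualMachine, "try-master")
ds = find_by_name(vim.Datastore, "eql02-bm03")

# Sanity-check free space on the target datastore before moving anything.
print("%s free: %.0f GB" % (ds.name, ds.summary.freeSpace / 1024.0 ** 3))

# Kick off the storage relocation; poll the returned task before disconnecting.
task = vm.RelocateVM_Task(spec=vim.vm.RelocateSpec(datastore=ds))

Disconnect(si)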
(In reply to comment #16)
> production-master      60GB
> try-master             30GB
> try-win32-slave01      60GB
> try-win32-slave02      60GB
> try-win32-slave03      60GB
> 

I migrated all of these in the downtime today.
Phong, is there anything else to do here?
Assignee: bhearsum → joduinn
When can we migrate the other 2?

on the IT Cluster:
graphs.m.o -> dm-graphs01   30GB
build.m.o -> dm-wwwbuild01  80GB
(In reply to comment #19)
> When can we migrate the other 2?
> 
> on the IT Cluster:
> graphs.m.o -> dm-graphs01   30GB
> build.m.o -> dm-wwwbuild01  80GB

Let's look to do this in a downtime next week. I'll let you know on Monday exactly when.
We're going to have downtime to do this tomorrow morning. Starting anytime between 5am and 7am is fine. Phong, whatever works best for you in the window sounds good to me.
Assignee: joduinn → phong
Component: Release Engineering → Server Operations
QA Contact: release → mrz
dm-graphs01 migrated.
dm-wwwbuild01 also migrated.  I think we are all done here.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard