Closed
Bug 501251
Opened 15 years ago
Closed 15 years ago
Test ESX upgrade on single host
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Assigned: phong)
We'd like to verify that our existing VMs will work fine during the period after the ESX hosts have been updated but before we have had a chance to update VMware tools. So please take one of the ESX hosts and put it in a separate cluster (so that DRS doesn't move machines around). After the ESX upgrade we can migrate moz2-win32-slave28 & 29, try-w32-slave19, moz2-linux-slave22 & 23, and try-linux-slave19 onto that host. RelEng will monitor those machines for a couple of days, and if all is well we can proceed with the other host upgrades.

We could also test that the new VMware tools don't introduce any problems. Incidentally, perhaps we can use this as a deployment strategy to keep machines with new VMware tools on updated ESX hosts.
Comment 1 • 15 years ago (Assignee)
I forgot to mention that we've already upgraded the IT - San Jose Production cluster. I don't think any of the VMs on those 4 hosts have the updated VMware tools. They have all been running fine since last week.
Comment 2 • 15 years ago (Assignee)
I just realized these are all on the new Intel cluster and can't be moved to bm-vmare08 without shutting them down first.
Comment 3 • 15 years ago (Reporter)
(In reply to comment #1)
> I forgot to mention that we've already upgraded the IT - San Jose Production
> cluster. I don't think any of the VMs on those 4 hosts have the updated
> VMware tools. They have all been running fine since last week.

How many CentOS 5 VMs do you have there? How about Windows?

(In reply to comment #2)
> I just realized these are all on the new Intel cluster and can't be moved to
> bm-vmare08 without shutting them down first.

I chose them at random really. Feel free to pick any two moz2-win32-slaveN machines, one try-w32-slaveN, and the same for Linux.
Comment 4 • 15 years ago (Assignee)
(In reply to comment #3)
> (In reply to comment #1)
> > I forgot to mention that we've already upgraded the IT - San Jose Production
> > cluster. I don't think any of the VMs on those 4 hosts have the updated
> > VMware tools. They have all been running fine since last week.
>
> How many CentOS 5 VMs do you have there? How about Windows?

1 - Win2k3 and the rest are RHEL

> (In reply to comment #2)
> > I just realized these are all on the new Intel cluster and can't be moved to
> > bm-vmare08 without shutting them down first.
>
> I chose them at random really. Feel free to pick any two moz2-win32-slaveN
> machines, one try-w32-slaveN, and the same for Linux.

moz2-linux-slave14/17, moz2-win32-slave20/22, and try-win32-slave09 are moved over. With just those 5 VMs, it's almost at 100% of the memory. Is that good enough for now? If not, I can move another ESX host over and move a few more VMs.
Comment 5 • 15 years ago (Reporter)
That'll be close enough, we'll watch it to see if anything comes up.
Comment 6 • 15 years ago (Assignee)
can we call this successful and upgrade the rest of them now?
Comment 7 • 15 years ago (Reporter)
Can't see any regressions for the build slaves running with old tools on the updated host, just known orange on unit tests. Now going ahead with updating the tools and verifying that.
Comment 8 • 15 years ago (Reporter)
Tried moz2-linux-slave14 first and hit some snags. I'm putting this here as a doc for RelEng mostly.

The automatic tools upgrade completed successfully according to VI, but it seemed not to restart the tools ("not running"). Starting it as root gave no errors, but since it needed a network restart too I decided to test it in the reboot case. There were problems unmounting /builds on shutdown; something was making the kernel think it was busy (buildbot was already shut down). There was a storage problem on the console when I first came to it, so perhaps that's related.

On reboot I discovered /etc/fstab had been changed. The main problem is that /builds is not present, but we also lost the noatime option on / and the Puppet header. This is
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1006718&sliceId=1&docTypeID=DT_KB_1_1&dialogID=24732296&stateId=1%200%2024734213
summary: vmware tools upgrade has a bug that wipes out any mods you made
workaround: backup /etc/fstab before upgrade and then restore
comment: OMGWTFBBQ!!11!!!

Restoring Puppet's backup from June 30 (/etc/fstab.old.0) would do, although it would be good to remove this section
  # Beginning of the block added by the VMware software
  .host:/ /mnt/hgfs vmhgfs defaults,ttl=5 0 0
  # End of the block added by the VMware software
which causes a spurious error about mounting local file systems on boot (and vmware tools also removed it). So I did that.

moz2-linux-slave14 is back in the prod pool.
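The kind of damage described above (a missing /builds mount, a lost noatime option) could be caught mechanically by comparing the fstab from before the tools upgrade with the one found afterwards. The following is only a rough sketch, not something that was run as part of this bug; the function names are hypothetical, and the device names in the usage note are invented.

```python
def parse_fstab(text):
    """Return {mount_point: set_of_options} for the non-comment lines of an fstab."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # too short to be a usable fstab entry
        mount_point, options = fields[1], fields[3]
        entries[mount_point] = set(options.split(","))
    return entries


def fstab_regressions(before, after):
    """List mounts and mount options present in `before` but missing from `after`."""
    old, new = parse_fstab(before), parse_fstab(after)
    problems = []
    for mount_point, options in sorted(old.items()):
        if mount_point not in new:
            problems.append(f"missing mount: {mount_point}")
            continue
        for opt in sorted(options - new[mount_point]):
            problems.append(f"{mount_point}: lost option {opt}")
    return problems
```

Given a saved copy with `/ ... defaults,noatime` and a `/builds` entry, and an upgraded file that has only `/ ... defaults`, `fstab_regressions` would report the lost `noatime` option and the missing `/builds` mount. It does not look at comment lines, so a lost Puppet header would not be flagged.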
Comment 9 • 15 years ago (Reporter)
moz2-linux-slave17 done at 16:35 PDT. Steps:
1, login as root, cd /etc, cp fstab fstab.bak
2, Using VI, do automatic upgrade of vmware tools
3, back as root, cp fstab.bak fstab, edit fstab to remove the hgfs lines in comment #8
4, reboot
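The manual edit in step 3 could be scripted. Below is a minimal sketch, assuming the block markers are exactly the two comment lines quoted in comment #8; `strip_vmware_block` is a hypothetical helper name, and nothing like it was used for the upgrades recorded here.

```python
BEGIN = "# Beginning of the block added by the VMware software"
END = "# End of the block added by the VMware software"


def strip_vmware_block(fstab_text):
    """Return fstab_text without the VMware-added block, markers included."""
    kept = []
    inside = False
    for line in fstab_text.splitlines():
        stripped = line.strip()
        if stripped == BEGIN:
            inside = True   # start dropping lines
            continue
        if stripped == END:
            inside = False  # resume keeping lines after the end marker
            continue
        if not inside:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

One way to use it would be to read /etc/fstab.bak after the upgrade, pass its contents through `strip_vmware_block`, and write the result to /etc/fstab before rebooting. Steps 1, 2 and 4 are unchanged.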
Comment 10 • 15 years ago (Reporter)
moz2-win32-slave20 done at 17:14 PDT
moz2-win32-slave22 done at 17:18 PDT
try-win32-slave09 done at 17:22 PDT

Notes
1, Both moz2 windows slaves think they have DNS like win32-slave20.uib.local rather than win32-slave20.build.mozilla.org.
2, When VI says "Completed" for a win32 tools upgrade, it means tools are installed and a reboot has been initiated, rather than it's back up after the reboot
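Because of note 2, anything scripted around a win32 tools upgrade would need to wait for the machine to go down and come back, rather than trusting "Completed". A rough sketch of such a wait follows; the function names, the TCP probe, and the choice of port are all assumptions for illustration, not something used in this bug.

```python
import socket
import time


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (assumed liveness check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_reboot(is_up, timeout=600.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Wait until is_up() has been seen False and then True again.

    Returns True once the machine is back after going down, or False if the
    timeout expires first.
    """
    deadline = clock() + timeout
    seen_down = False
    while clock() < deadline:
        if not is_up():
            seen_down = True   # the reboot has actually started
        elif seen_down:
            return True        # the machine is reachable again after being down
        sleep(interval)
    return False
```

For example, `wait_for_reboot(lambda: tcp_probe("moz2-win32-slave20", 3389))` would poll a port assumed to be open on the slave. If the machine reboots faster than the polling interval, the down state may never be observed and the call will time out, so the interval should be short relative to the expected reboot time.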
Comment 11 • 15 years ago (Reporter)
Can't see any regression from the upgraded host/upgraded tools scenario, after excluding the known causes of orange and red builds. bhearsum/catlee, do you have any comments before Phong goes ahead with upgrading the world ?
Comment 12 • 15 years ago
Sounds OK to me to go ahead. The Linux stuff sucks, so we'll have to be careful when we do the tools installs.
Comment 13 • 15 years ago (Assignee)
I am starting the upgrade now.
Comment 14 • 15 years ago (Assignee)
Can we close this bug or move it over to release for your internal tracking of the tools upgrade?
Comment 15 • 15 years ago (Reporter)
Thanks for indulging us, Phong. Updates done so far:
* moz2-linux-slave14, 17
* moz2-win32-slave03, 20, 22
* try-win32-slave09
* production-master, possibly staging-master
Linux to be done per comment #9, windows can be done automatically. Buildbot shutdown in each case.
Assignee: phong → nobody
Component: Server Operations: Tinderbox Maintenance → Release Engineering
QA Contact: mrz → release
Summary: Test ESX upgrade on single host → Upgrade VMWare tools on RelEng VMs
Comment 16 • 15 years ago (Reporter)
Actually, the deps on this mean we should make a new bug.
Assignee: nobody → phong
Status: NEW → RESOLVED
Closed: 15 years ago
Component: Release Engineering → Server Operations: Tinderbox Maintenance
QA Contact: release → mrz
Resolution: --- → FIXED
Summary: Upgrade VMWare tools on RelEng VMs → Test ESX upgrade on single host
Comment 17 • 15 years ago (Reporter)
Bug 503392 for the tools updates on the slaves.
Updated • 11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations