Closed
Bug 501222
Opened 15 years ago
Closed 15 years ago
Bump up RAM and update VMware tools on linux slaves
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: bhearsum)
References
Details
Currently our linux build slaves have 768 MB of RAM and 512 MB of swap. We're hitting swap when linking some of the larger libraries, and we suspect that we're actually running out of memory in some cases, especially on Try. Can we bump the RAM on all our linux slaves up to 2 GB?
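One quick way to check the suspicion above (a sketch, not part of the original report) is to read swap usage straight from the kernel on a slave while a large link is running; SwapFree dropping well below SwapTotal confirms the box is dipping into swap.

```shell
# Print total/free memory and swap from /proc/meminfo (Linux only).
# On a 768 MB slave, SwapFree falling during a big link means the
# linker's working set has pushed the machine into swap.
grep -E '^(MemTotal|MemFree|SwapTotal|SwapFree):' /proc/meminfo
```

Sampling this repeatedly during a build (e.g. in a watch loop) would also show whether the 512 MB of swap ever runs out entirely, which would explain the suspected out-of-memory failures on Try.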
Updated•15 years ago
Assignee: server-ops → phong
Comment 1•15 years ago
This will require the VMs to be shut down. This will also put additional load on our ESX hosts to allocate more memory for all the VMs.
Reporter
Comment 2•15 years ago
(In reply to comment #1)
> This will require the VMs to be shut down. This will also put additional load
> on our ESX hosts to allocate more memory for all the VMs.

Do we have enough RAM in the ESX hosts to cover this? We have at least 44 linux slaves right now that would need this additional RAM, giving a total increase of 55 GB on the ESX hosts. What happens if we overcommit?
Comment 4•15 years ago
(In reply to comment #2)
> Do we have enough RAM in the ESX hosts to cover this? We have at least 44
> linux slaves right now that would need this additional RAM, giving a total
> increase of 55 GB on the ESX hosts.

Per discussion with phong in last week's group meeting, we have 70+ GB of RAM available, so we can increase RAM like this without fear of overcommitting.

Note: Phong wanted to wait until after the ESX upgrades completed before doing this for all the linux VMs. However, he was fine with us doing this for a few staging linux VMs as soon as we want, if we want to do some testing in staging first.
Comment 5•15 years ago
The staging slave moz2-linux-slave17.b.m.o now has 2 GB RAM; it has been rebooted and looks like it is processing jobs just fine. I'll let it run over the weekend before declaring it safe.
Assignee
Comment 6•15 years ago
(In reply to comment #5)
> The staging slave moz2-linux-slave17.b.m.o now has 2 GB RAM; it has been
> rebooted and looks like it is processing jobs just fine. I'll let it run over
> the weekend before declaring it safe.

I don't see any errors on this slave related to memory, so let's go ahead and do the rest. Phong, I think it's actually easier for us to pull out slaves one by one and do them, so I'm moving this bug back to RelEng.
Assignee: phong → nobody
Component: Server Operations: Tinderbox Maintenance → Release Engineering
QA Contact: mrz → release
Assignee
Comment 7•15 years ago
Got moz2-linux-slave02 today.
Comment 8•15 years ago
moz2-linux-slave03
moz2-linux-slave04
moz2-linux-slave17

...all now have 2 GB RAM and VMware tools installed. I also went back to moz2-linux-slave02, verified it has 2 GB RAM, and then installed VMware tools on it (unclear whether VMware tools was there already or not).
Summary: Bump up RAM on linux slaves → Bump up RAM and install VMware tools on linux slaves
Comment 9•15 years ago
From bug#503392, comment#0:

Linux, like this:
1. Login as root: cd /etc; cp fstab fstab.bak
2. Using VI, do an automatic upgrade of VMware tools
3. Back as root: cp fstab.bak fstab, then edit fstab to remove these three lines:
   # Beginning of the block added by the VMware software
   .host:/ /mnt/hgfs vmhgfs defaults,ttl=5 0 0
   # End of the block added by the VMware software
4. Reboot

Buildbot was shut down in each case. I think we can roll this out gradually to the slaves, but we will have to schedule downtimes for masters.
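The fstab cleanup in step 3 can also be scripted rather than done by hand in an editor. The following is a sketch under assumptions (GNU sed, and the exact marker comments quoted above); it runs against a temporary demo copy so it is safe to try anywhere, not against the real /etc/fstab.

```shell
# Build a demo fstab containing the VMware-added hgfs block quoted above.
cat > /tmp/fstab.demo <<'EOF'
/dev/sda1 / ext3 defaults 1 1
# Beginning of the block added by the VMware software
.host:/ /mnt/hgfs vmhgfs defaults,ttl=5 0 0
# End of the block added by the VMware software
EOF

# Delete everything from the beginning marker through the end marker,
# inclusive (GNU sed in-place edit; on a real slave the target would
# be /etc/fstab, after restoring fstab.bak as in step 3).
sed -i '/^# Beginning of the block added by the VMware software/,/^# End of the block added by the VMware software/d' /tmp/fstab.demo

cat /tmp/fstab.demo
```

On a real slave you would restore the backup first (cp fstab.bak fstab) and then run the same sed address-range delete on /etc/fstab before rebooting.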
Updated•15 years ago
Summary: Bump up RAM and install VMware tools on linux slaves → Bump up RAM and update VMware tools on linux slaves
Comment 10•15 years ago
While you're doing this, can you move some of them to the INTEL2 cluster? I've added bm-vmware08 to that cluster.
Comment 11•15 years ago
Armen noticed that the staging slaves were AWOL. I suspect they didn't get the reboot step after the VMware tools upgrade; that's what happened on moz2-linux-slave02, which was also not responding. Each machine was not running vmware-tools and the network was down, which is typical after doing the tools upgrade. A reboot fixed them up.
Comment 12•15 years ago
VMware tools done on sm-try-master. Left it at 1 GB RAM.
Assignee
Comment 13•15 years ago
I'm going to try and finish these up today.
Assignee: nobody → bhearsum
Status: NEW → ASSIGNED
Assignee
Comment 14•15 years ago
RAM and VMware tools upgrades are done on moz2-linux-slave01 through 25 and try-linux-slave01 through 19. I still need to go through the other VMs and do tools upgrades.
Assignee
Comment 15•15 years ago
moz2-linux64-slave01 is done too.
Assignee
Comment 16•15 years ago
We're going to do the rest of these updates in the downtime tomorrow.
Assignee
Comment 17•15 years ago
Only machines left to do are:
production-1.9-master
qm-buildbot01
qm-rhel02
staging-1.9-master
staging-master
staging-try-master
talos-master
talos-staging-master
cruncher
production-opsi
production-prometheus-vm
production-puppet
prometheus-vm
staging-opsi
staging-puppet
staging-stage
Linux ref platform
Assignee
Comment 18•15 years ago
(In reply to comment #17)
> Only machines left to do are:
> production-1.9-master
> qm-buildbot01
> staging-1.9-master
> staging-master
> staging-try-master
> talos-staging-master
> cruncher
> production-prometheus-vm
> production-puppet
> Linux ref platform
> prometheus-vm
> staging-puppet
> staging-stage

The following VMs need downtime to do the tools upgrade:
qm-rhel02
talos-master

production and staging opsi still need VMware tools, too, but they gave me an error when I tried to do the install: "A general system error occurred: Internal error". This might be because they were cloned from a Virtual Appliance? I'm not sure.
Whiteboard: still to do: talos-master, qm-rhel02, production-opsi, staging-opsi
Comment 19•15 years ago
(In reply to comment #18)
> production and staging opsi still need VMware tools, too, but they gave me an
> error when I tried to do the install: "A general system error occurred:
> Internal error". This might be because they were cloned from a Virtual
> Appliance? I'm not sure.

Phong: any idea what might be causing this?
Assignee
Comment 20•15 years ago
I pinged Phong about this on Friday, actually, and he confirmed my theory about it happening because they were cloned from a Virtual Appliance. I'm not entirely certain what to do with them at this point.
Comment 21•15 years ago
(In reply to comment #18)
> The following VMs need downtime to do the tools upgrade:
> qm-rhel02
> talos-master

I've done these two while recovering from today's air-con outage.
Comment 22•15 years ago
(In reply to comment #21)
> I've done these two while recovering from today's air-con outage.

Nice, thanks Nick!

bhearsum, any chance we can upgrade production-opsi without a downtime?
Whiteboard: still to do: talos-master, qm-rhel02, production-opsi, staging-opsi → still to do: production-opsi, staging-opsi
Assignee
Comment 23•15 years ago
(In reply to comment #22)
> bhearsum, any chance we can upgrade production-opsi without a downtime?

Sure... the problem with staging-opsi and production-opsi, though, is that VMware tools refuses to install on them, per comment 18.
Updated•15 years ago
Whiteboard: still to do: production-opsi, staging-opsi → still to do: vmware tools on production-opsi, staging-opsi
Assignee
Comment 24•15 years ago
It's confusing to have this bug open still. I filed bug 511442 to track getting VMware tools installed on the remaining two machines. The rest of this bug is FIXED.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: still to do: vmware tools on production-opsi, staging-opsi
Assignee
Comment 25•15 years ago
I finally managed to get VMware tools installed on an OPSI server. Seems that having the Operating System set to 'Other' causes VMware to barf. Changing it to Linux -> Other 32-bit let me mount the VMware tools CD and do the install by hand.
Updated•11 years ago
Product: mozilla.org → Release Engineering