Closed Bug 501222 Opened 13 years ago Closed 13 years ago

Bump up RAM and update VMware tools on linux slaves

Categories: Release Engineering :: General, defect
Platform: All / Other
Priority: Not set
Severity: minor

Tracking: (Not tracked)
Status: RESOLVED FIXED

People

(Reporter: catlee, Assigned: bhearsum)

References

Details

Currently our linux build slaves have 768 MB of RAM and 512 MB of swap.

We're hitting swap when linking some of the larger libraries, and we suspect that we're actually running out of memory in some cases, especially on Try.

Can we bump the RAM on all our linux slaves up to 2 GB?
Assignee: server-ops → phong
This will require the VMs to be shut down. This will also put additional load on our ESX hosts, which have to allocate more memory for all the VMs.
(In reply to comment #1)
> This will require the VMs to be shutdown.  This will also put additional load
> on our ESX hosts to allocated more memory for all the VMs.

Do we have enough RAM in the ESX hosts to cover this?  We have at least 44 linux slaves right now that would need this additional RAM, giving a total increase of 55 GB on the ESX hosts.

What happens if we overcommit?
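For reference, the 55 GB figure works out as a back-of-envelope calculation (the 768 MB current and 2 GB target allocations are from this bug; the per-slave delta is just arithmetic):

```python
# Rough check of the total RAM increase across the ESX hosts.
slaves = 44             # linux slaves needing the bump (from this bug)
current_mb = 768        # current allocation per slave
target_mb = 2 * 1024    # proposed allocation per slave

extra_gb = slaves * (target_mb - current_mb) / 1024
print(extra_gb)         # 55.0
```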
Duplicate of this bug: 503038
(In reply to comment #2)
> (In reply to comment #1)
> > This will require the VMs to be shutdown.  This will also put additional load
> > on our ESX hosts to allocated more memory for all the VMs.
> 
> Do we have enough RAM in the ESX hosts to cover this?  We have at least 44
> linux slaves right now that would need this additional RAM, giving a total
> increase of 55 GB on the ESX hosts.

Per discussion with phong in last week's group meeting, we have 70+ GB RAM available, so we can increase RAM like this without fear of overcommitting.

Note: Phong wanted to wait until after the ESX upgrades completed before doing this for all the linux VMs. However, he was fine with us doing this for a few staging linux VMs right away if we want to do some testing in staging first.
Blocks: 500699
The staging slave moz2-linux-slave17.b.m.o now has 2 GB RAM, rebooted, and looks like it is processing jobs just fine. I'll let it run over the weekend before declaring it safe.
(In reply to comment #5)
> The staging slave moz2-linux-slave17.b.m.o now has 2GB RAM, rebooted and looks
> like it is processing jobs just fine. I'll leave it run over the weekend before
> declaring it safe.

I don't see any errors on this slave related to memory, let's go ahead and do the rest.
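For anyone double-checking a slave after the bump, a quick sanity check from inside the guest (a generic Linux check, not a command from this bug) is to read the total memory visible to the OS:

```shell
# Total memory visible to the OS, in MB; expect roughly 2000 after the 2 GB bump.
awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo
```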

Phong, I think it's actually easier for us to pull out slaves one by one and do them, so I'm moving this bug back to RelEng.
Assignee: phong → nobody
Component: Server Operations: Tinderbox Maintenance → Release Engineering
QA Contact: mrz → release
Got moz2-linux-slave02 today.
moz2-linux-slave03
moz2-linux-slave04
moz2-linux-slave17
...all now have 2 GB RAM and VMware tools installed.

Also, went back to moz2-linux-slave02, verified it has 2 GB RAM, and then installed VMware tools on it (unclear whether VMware tools was already there or not).
Summary: Bump up RAM on linux slaves → Bump up RAM and install VMware tools on linux slaves
From bug#503392, comment#0:

Linux like this
1, login as root, cd /etc, cp fstab fstab.bak
2, Using the VI client, do automatic upgrade of VMware tools
3, back as root, cp fstab.bak fstab, edit fstab to remove these three lines
 # Beginning of the block added by the VMware software
 .host:/ /mnt/hgfs       vmhgfs  defaults,ttl=5  0       0
 # End of the block added by the VMware software
4, reboot
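Step 3 above can be sketched as a one-liner. This demonstrates the edit against a sample fstab in a temp file; on a real slave the target would be /etc/fstab, backed up first as in step 1:

```shell
# Build a sample fstab containing the block the VMware tools installer adds.
FSTAB=$(mktemp)
cat > "$FSTAB" <<'EOF'
/dev/sda1 / ext3 defaults 1 1
# Beginning of the block added by the VMware software
.host:/ /mnt/hgfs       vmhgfs  defaults,ttl=5  0       0
# End of the block added by the VMware software
EOF

# Delete everything between (and including) the two marker comments.
sed -i '/Beginning of the block added by the VMware/,/End of the block added by the VMware/d' "$FSTAB"
cat "$FSTAB"
```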

Buildbot was shut down in each case. I think we can roll this out gradually to the slaves, but will have to schedule downtimes for masters.
Summary: Bump up RAM and install VMware tools on linux slaves → Bump up RAM and update VMware tools on linux slaves
While you're doing this, can you move some of them to the INTEL2 cluster? I've added bm-vmware08 to that cluster.
Armen noticed that the staging slaves were AWOL. I suspect they didn't get the reboot step after the VMware tools upgrade. That's what happened on moz2-linux-slave02, which was also not responding. Each machine was not running vmware-tools and the network was down, which is typical after the tools upgrade. A reboot fixed them up.
VMware tools done on sm-try-master. Left it at 1 GB RAM.
I'm going to try and finish these up today.
Assignee: nobody → bhearsum
Status: NEW → ASSIGNED
RAM and VMware tools upgrades are done on moz2-linux-slave01 -> 25 and try-linux-slave01 -> 19. I still need to go through the other VMs and do tools upgrades.
moz2-linux64-slave01 is done too.
We're going to do the rest of these updates in the downtime tomorrow.
Only machines left to do are:
production-1.9-master
qm-buildbot01
qm-rhel02
staging-1.9-master
staging-master
staging-try-master
talos-master
talos-staging-master
cruncher
production-opsi
production-prometheus-vm
production-puppet
prometheus-vm
staging-opsi
staging-puppet
staging-stage
Linux ref platform
(In reply to comment #17)
> Only machines left to do are:
> production-1.9-master
> qm-buildbot01
> staging-1.9-master
> staging-master
> staging-try-master
> talos-staging-master
> cruncher
> production-prometheus-vm
> production-puppet
> Linux ref platform

> prometheus-vm
> staging-puppet
> staging-stage

The following VMs need downtime to do the tools upgrade:
qm-rhel02
talos-master

production and staging opsi still need VMware tools, too, but they gave me an error when I tried to do the install, "A general system error occurred: Internal error". This might be because they were cloned from a Virtual Appliance? I'm not sure.
Whiteboard: still to do: talos-master, qm-rhel02, production-opsi, staging-opsi
(In reply to comment #18)
> (In reply to comment #17)
> production and staging opsi still need VMware tools, too, but they gave me an
> error when I tried to do the install, "A general system error occurred:
> Internal error". This might be because they were cloned from a Virtual
> Appliance? I'm not sure.

Phong: any idea what might be causing this?
I pinged Phong about this on Friday, actually, and he confirmed my theory about it happening because they were cloned from a Virtual Appliance. I'm not entirely certain what to do with them at this point.
(In reply to comment #18)
> The following VMs need downtime to do the tools uprgade:
> qm-rhel02
> talos-master

I've done these two while recovering from today's air-con outage.
(In reply to comment #21)
> (In reply to comment #18)
> > The following VMs need downtime to do the tools uprgade:
> > qm-rhel02
> > talos-master
> 
> I've done these two while recovering from today's air-con outage.

Nice, thanks Nick! 

bhearsum, any chance we can upgrade production-opsi without a downtime?
Whiteboard: still to do: talos-master, qm-rhel02, production-opsi, staging-opsi → still to do: production-opsi, staging-opsi
(In reply to comment #22)
> (In reply to comment #21)
> > (In reply to comment #18)
> > > The following VMs need downtime to do the tools uprgade:
> > > qm-rhel02
> > > talos-master
> > 
> > I've done these two while recovering from today's air-con outage.
> 
> Nice, thanks Nick! 
> 
> bhearsum, any chance we can upgrade production-opsi without a downtime?

Sure... the problem with staging and production-opsi is that VMware tools refuses to install on them, though, per comment 18.
Whiteboard: still to do: production-opsi, staging-opsi → still to do: vmware tools on production-opsi, staging-opsi
It's confusing to have this bug open still. I filed bug 511442 (in the future) to track getting vmware tools installed on the remaining two machines. The rest of this bug is FIXED.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: still to do: vmware tools on production-opsi, staging-opsi
I finally managed to get VMware tools installed on an OPSI server. Seems that having the Operating System set to 'Other' causes VMware to barf. Changing it to Linux -> Other 32-bit let me mount the VMware tools CD and do the install by hand.
Product: mozilla.org → Release Engineering