tinderbox.bugzilla.lan (192.168.99.10 on cg-vmware01) is the tinderbox client for the Bugzilla tests. I recently set up our Selenium tests to run on that server, and with everything else that's going on, things are now so slow that the QA tests actually time out when they try to run, meaning that we can't get accurate test results. The problem is both the disk and the CPU. Adding another CPU to the system (and possibly allocating it more RAM) would probably help. If that doesn't help, it'll just need faster disk access, if that's in any way possible.
Phong is our vmware guy, so passing this to him.
Are the vmware guest tools running?
I think so:
ps -Af | grep vm
root 1735 1 0 Nov26 ? 00:09:13 [vmmemctl]
root 1770 1 0 Nov26 ? 00:55:36 /usr/sbin/vmware-guestd --background /var/run/vmware-guestd.pid
This ESX host only has 4GB of RAM total. We will need to add more to this host. This will require downtime to add RAM.
Really? I could have sworn it had 8GB. Is it maybe only addressing 4GB because the host OS is 32-bit?
The ESX host only has 4GB of physical memory and more than 70% of it is in use. The tinderbox.bugzilla.lan VM is only configured for 1GB RAM.
Max: Can I take this cluster down tomorrow afternoon to add more RAM?
(In reply to comment #7)
> Max: Can I take this cluster down tomorrow afternoon to add more RAM?
Sure, that would be fine. If I'm on IRC (mkanat) just let me know before it goes down.
(In reply to comment #7)
> Max: Can I take this cluster down tomorrow afternoon to add more RAM?
Which cluster is that? All of cg-vmware01? That affects way more than just Bugzilla, so please make sure you get all necessary parties involved.
> Which cluster is that? All of cg-vmware01? That affects way more than just
> Bugzilla, so please make sure you get all necessary parties involved.
I wouldn't say "way more" but Reed's right - I forget who owns cg-ecmascript01 and cg-centos01, but I suspect you could easily get a window from those owners or pause the VMs before taking the host down.
displayName = "cg-ecmascript01"
displayName = "landfill"
displayName = "tinderbox.bugzilla"
displayName = "cg-centos01"
displayName = "windows.bugzilla"
displayName = "oracle.bugzilla"
I own cg-centos01, fyi. ;)
Who are the owners of the remaining VMs?
(In reply to comment #12)
> Who are the owners of the remaining VMs?
"cg-ecmascript01" -- Dave Herman (firstname.lastname@example.org) [however, I think this VM isn't used anymore... should ask]
"landfill" -- Bugzilla Project
"tinderbox.bugzilla" -- Bugzilla Project
"cg-centos01" -- reed
"windows.bugzilla" -- Bugzilla Project
"oracle.bugzilla" -- Bugzilla Project
Re: cg-ecmascript01, IIRC we didn't end up using this, but I'd double check with Brendan before deleting. Thx
And I have to check with graydon, my memory is failing me. Probably that means we never used this. /be
nope, not used.
are we ready for me to take these down? I'll also delete cg-ecmascript01.
(In reply to comment #17)
> are we ready for me to take these down? I'll also delete cg-ecmascript01.
ok from me for cg-centos01
(In reply to comment #8)
> Sure, that would be fine. If I'm on IRC (mkanat) just let me know before it
> goes down.
mkanat is not on IRC, so let's go.
I was able to add 1 GB of RAM to the ESX host. I also bumped tinderbox.bugzilla RAM up to 1536 MB.
Great! So, we're not hitting the swap anymore, but the machine is still too slow for the QA tests to pass. Any chance of allocating it another CPU? If that doesn't do it, then it will need to be moved to its own machine (which is funny, since it's one of the whole reasons we got cg-vmware01).
I can add a second CPU for this VM, but it will require a quick shutdown to make the change.
(In reply to comment #22)
> I can add a second CPU for this VM, but it will require a quick shutdown to
> make the change.
That's fine, you can shut it down whenever.
second CPU added.
Please reopen if you run into more issues.
The machine is still too slow for all of our tests to run and pass. (Also, unfortunately, I don't have a good way to run them in order instead of in parallel.) The machine is currently swapping actively; it's using about 1GB of swap. I suspect the main limiter is the disk, though. Ideally we'd have an additional machine for certain disk-heavy tests.
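For anyone who wants to re-check the swap figure on the guest, here is a minimal sketch of one way to read it. It was not run as part of this bug; it assumes a Linux guest that exposes SwapTotal and SwapFree in /proc/meminfo, and the function name swap_usage_kb is made up for illustration.

```python
def swap_usage_kb(meminfo_text):
    """Return (swap_total_kb, swap_used_kb) from /proc/meminfo-style text.

    Assumes lines of the form "Key:   12345 kB"; missing keys count as 0.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    return total, total - free
```

On the guest itself, passing in the contents of /proc/meminfo (for example `swap_usage_kb(open("/proc/meminfo").read())`) gives the total and used swap in kB, which can be compared against the roughly 1GB mentioned above.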
This ESX host has 4GB RAM and is using nearly 70% of it right now. It doesn't look CPU constrained at all. Disk I/O doesn't look strained either. My gut feeling is that the ESX host is memory constrained. I'd recommend a query to Community Giving about upgrading either the entire ESX host or getting additional memory.
The server upgrade is being tracked elsewhere. Can I close this?