tinderbox.bugzilla.lan is so slow that our Bugzilla QA tests can't pass

RESOLVED FIXED

Status

mozilla.org Graveyard
Server Operations: Projects
RESOLVED FIXED
9 years ago
3 years ago

People

(Reporter: Max Kanat-Alexander, Assigned: phong)

Tracking

Bug Flags:
needs-downtime -

Details

(Reporter)

Description

9 years ago
tinderbox.bugzilla.lan (192.168.99.10 on cg-vmware01) is the tinderbox client for the Bugzilla tests. I recently set up our Selenium tests to run on that server, and with everything else that's going on, things are now so slow that the QA tests actually time out when they try to run, meaning that we can't get accurate test results.

The problem is both the disk and the CPU. Adding another CPU to the system (and possibly allocating it more RAM) would probably help.

If that doesn't help, it'll just need faster disk access, if that's in any way possible.
Phong is our vmware guy, so passing this to him.
Assignee: server-ops → phong
Are the vmware guest tools running?
(Reporter)

Comment 3

9 years ago
I think so:

ps -Af | grep vm
root      1735     1  0 Nov26 ?        00:09:13 [vmmemctl]
root      1770     1  0 Nov26 ?        00:55:36 /usr/sbin/vmware-guestd --background /var/run/vmware-guestd.pid
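For anyone who wants to script this check rather than read the output by eye, here is a minimal Python sketch that scans `ps -Af`-style output for the two processes shown above. The function name `guest_tools_processes` and the list of process names are assumptions for illustration, taken only from this output; other VMware Tools releases may name their daemons differently.

```python
# Process names taken from the ps output in this comment. Treat the list as an
# assumption: other VMware Tools releases may use different daemon names.
KNOWN_NAMES = ("vmmemctl", "vmware-guestd")


def guest_tools_processes(ps_output):
    """Return the known guest-tools process names found in `ps -Af` output."""
    found = []
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        command = fields[7]
        for name in KNOWN_NAMES:
            if name in command and name not in found:
                found.append(name)
    return found


sample = """\
root      1735     1  0 Nov26 ?        00:09:13 [vmmemctl]
root      1770     1  0 Nov26 ?        00:55:36 /usr/sbin/vmware-guestd --background /var/run/vmware-guestd.pid
"""
print(guest_tools_processes(sample))
```

An empty list would suggest the guest tools are not running, at least under the names assumed here.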
(Assignee)

Comment 4

9 years ago
This ESX host only has 4GB of RAM total.  We will need to add more to this host.  This will require downtime to add RAM.
(Assignee)

Updated

9 years ago
Flags: needs-downtime+
Flags: colo-trip+
(Reporter)

Comment 5

9 years ago
Really? I could have sworn it had 8GB. Is it maybe only addressing 4GB because the host OS is 32-bit?

Comment 6

9 years ago
The ESX host only has 4GB of physical memory and more than 70% of it is in use.

The tinderbox.bugzilla.lan VM is only configured for 1GB RAM.
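To make the headroom concrete, here is a rough back-of-the-envelope calculation in Python using only the figures quoted above. Since usage is "more than 70%", the free figure is an upper bound. Doubling the VM's RAM is a hypothetical scenario chosen for illustration, not something proposed in this bug, and the helper name `free_host_ram_mb` is made up.

```python
HOST_RAM_MB = 4 * 1024   # 4GB of physical memory on the ESX host
USED_PERCENT = 70        # "more than 70%" in use, so real usage is higher
VM_RAM_MB = 1024         # RAM configured for tinderbox.bugzilla.lan


def free_host_ram_mb(total_mb, used_percent):
    """Upper bound on free host RAM in MB, using integer arithmetic."""
    return total_mb - (total_mb * used_percent) // 100


free_mb = free_host_ram_mb(HOST_RAM_MB, USED_PERCENT)
print(free_mb)               # at most this much host RAM is free
print(free_mb - VM_RAM_MB)   # left over if the VM's RAM were doubled (hypothetical)
```

At 70% usage that is at most about 1.2GB free on the host, so giving this one VM another 1GB would leave very little for the other guests.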
(Assignee)

Comment 7

9 years ago
Max: Can I take this cluster down tomorrow afternoon to add more RAM?
(Reporter)

Comment 8

9 years ago
(In reply to comment #7)
> Max: Can I take this cluster down tomorrow afternoon to add more RAM?

  Sure, that would be fine. If I'm on IRC (mkanat) just let me know before it goes down.
(In reply to comment #7)
> Max: Can I take this cluster down tomorrow afternoon to add more RAM?

Which cluster is that? All of cg-vmware01? That affects way more than just Bugzilla, so please make sure you get all necessary parties involved.
> Which cluster is that? All of cg-vmware01? That affects way more than just
> Bugzilla, so please make sure you get all necessary parties involved.

I wouldn't say "way more" but Reed's right - I forget who owns cg-ecmascript01 and cg-centos01 but I suspect you could easily get a window from those owners or pause the VMs before taking the host down.

displayName = "cg-ecmascript01"
displayName = "landfill"
displayName = "tinderbox.bugzilla"
displayName = "cg-centos01"
displayName = "windows.bugzilla"
displayName = "oracle.bugzilla"
I own cg-centos01, fyi. ;)
(Assignee)

Comment 12

9 years ago
Who are the owners of the remaining VM's?
(In reply to comment #12)
> Who are the owners of the remaining VM's?

"cg-ecmascript01" -- Dave Herman (dherman@ccs.neu.edu) [however, I think this VM isn't used anymore... should ask]

"landfill" -- Bugzilla Project

"tinderbox.bugzilla" -- Bugzilla Project

"cg-centos01" -- reed

"windows.bugzilla" -- Bugzilla Project

"oracle.bugzilla" -- Bugzilla Project

Comment 14

9 years ago
Re: cg-ecmascript01, IIRC we didn't end up using this, but I'd double check with Brendan before deleting. Thx
And I have to check with graydon, my memory is failing me. Probably that means we never used this.

/be
nope, not used.
(Assignee)

Comment 17

9 years ago
are we ready for me to take these down?  I'll also delete cg-ecmascript01.
(In reply to comment #17)
> are we ready for me to take these down?  I'll also delete cg-ecmascript01.

ok from me for cg-centos01

Comment 19

9 years ago
(In reply to comment #8)
>   Sure, that would be fine. If I'm on IRC (mkanat) just let me know before it
> goes down.

mkanat is not on IRC, so let's go.
(Assignee)

Comment 20

9 years ago
I was able to add 1 GB of RAM to the ESX host.  I also bumped tinderbox.bugzilla RAM up to 1536 MB.
(Reporter)

Comment 21

9 years ago
Great! So, we're not hitting the swap anymore, but the machine is still too slow for the QA tests to pass.

Any chance of allocating it another CPU? If that doesn't do it, then it will need to be moved to its own machine (which is funny, since hosting it was one of the main reasons we got cg-vmware01).
(Assignee)

Comment 22

9 years ago
I can add a second CPU for this VM, but it will require a quick shutdown to make the change.
(Reporter)

Comment 23

9 years ago
(In reply to comment #22)
> I can add a second CPU for this VM, but it will require a quick shutdown to
> make the change.

  That's fine, you can shut it down whenever.
(Assignee)

Comment 24

9 years ago
second CPU added.
(Assignee)

Comment 25

9 years ago
Please reopen if you run into more issues.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
(Reporter)

Comment 26

9 years ago
The machine is still too slow for all of our tests to run and pass. (Also, unfortunately, I don't have a good way to run them in order instead of in parallel.)

The machine is currently swapping actively--it's using about 1GB of swap.

I suspect the main limiter is the disk, though. Ideally we'd have an additional machine for certain disk-heavy tests.
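For reference, a swap figure like the one above can be read from `/proc/meminfo` on a Linux guest. The following is a minimal Python sketch, assuming the standard `SwapTotal` and `SwapFree` fields; the function name `swap_used_kb` is made up for illustration, and the sample values are hypothetical, not measurements from tinderbox.bugzilla.lan.

```python
def swap_used_kb(meminfo_text):
    """Return swap in use (kB) from /proc/meminfo-style text, or None if absent."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    if "SwapTotal" not in values or "SwapFree" not in values:
        return None
    return values["SwapTotal"] - values["SwapFree"]


# Hypothetical sample values, not taken from the real machine.
sample = """\
MemTotal:        1572864 kB
SwapTotal:       2097152 kB
SwapFree:        1048576 kB
"""
print(swap_used_kb(sample))
```

On the machine itself the input would come from reading `/proc/meminfo`, for example `swap_used_kb(open("/proc/meminfo").read())`.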
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This ESX host has 4GB RAM and is using nearly 70% of it right now.  It doesn't at all look CPU constrained.  Disk I/O doesn't look strained either.

My gut feeling is that the ESX host is memory constrained.  I'd recommend a query to Community Giving about upgrading either the entire ESX host or getting additional memory.
Component: Server Operations → Server Operations: Projects

Updated

9 years ago
Depends on: 509679
The server upgrade is being tracked elsewhere, can I close this?
Flags: needs-downtime-
Flags: needs-downtime+
Flags: colo-trip-
Flags: colo-trip+
(Reporter)

Comment 29

9 years ago
Sure.

Updated

8 years ago
Status: REOPENED → RESOLVED
Last Resolved: 9 years ago → 8 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard