479146 - set fixed resource allocations for unit test VMs

Mike Shaver (:shaver -- probably not reading bugmail closely)

Reporter

Description

15 years ago

In order to reduce the variability in timeout and timing-sensitive test failures, I would like us to set fixed resource limits on the VMs running the unit tests.  We should lock them each to 1/5th of a machine or whatever, both max and min, so that our timeout problems will at least become more consistent, and isolated from load on other guests.
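To make "1/5th of a machine, both max and min" concrete, here is a minimal Python sketch. It assumes CPU is expressed in MHz and uses a hypothetical 20000 MHz host. It derives a per-VM allocation whose reservation (min) equals its limit (max). `CpuAllocation` and `fixed_slice` are illustrative names, not part of any ESX API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuAllocation:
    reservation_mhz: int  # "min": guaranteed even when the host is contended
    limit_mhz: int        # "max": never exceeded, even when the host is idle


def fixed_slice(host_mhz: int, slices: int) -> CpuAllocation:
    """Allocate 1/slices of a host to one VM, with min == max."""
    if slices <= 0:
        raise ValueError("slices must be positive")
    per_vm = host_mhz // slices  # round down so slices * per_vm <= host_mhz
    return CpuAllocation(reservation_mhz=per_vm, limit_mhz=per_vm)
```

For example, `fixed_slice(20000, 5)` yields a 4000 MHz reservation and a 4000 MHz limit. Because the two bounds are equal, the VM should see the same CPU capacity regardless of what its neighbours are doing.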

I'm not sure how much control over I/O capacity we have in our current config, but ESX gives some knobs for cases dealing with local disk (and I think iSCSI); obviously the other side of the SAN is harder to apportion as crisply.

Nick Thomas [:nthomas] (UTC+12)

Comment 1

15 years ago

The current pool-of-slaves implementation means that any slave can be given a unit test job, so all slaves would need a fixed resource allocation (which somewhat reduces the flexibility of having a pool in the first place). 

Can we dynamically add an allocation? If so, the build process could signal to the ESX host that it was starting unit tests and needed guaranteed resources, and give them back afterwards.
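The dynamic approach proposed here could look roughly like the following Python sketch. `apply` is a hypothetical hook standing in for whatever call would actually reconfigure the VM on its ESX host; no real vSphere API is shown. The VM name and MHz values are also hypothetical. The fixed allocation is requested before the tests start and released in a `finally` block, so resources are handed back even when a test run fails.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass(frozen=True)
class CpuAllocation:
    reservation_mhz: int  # guaranteed minimum
    limit_mhz: int        # hard maximum


# Hypothetical hook: whatever actually reconfigures the VM on its ESX host.
ApplyFn = Callable[[str, CpuAllocation], None]


@contextmanager
def guaranteed_resources(
    vm: str, during: CpuAllocation, idle: CpuAllocation, apply: ApplyFn
) -> Iterator[None]:
    """Pin a fixed allocation for the duration of a unit-test run."""
    apply(vm, during)  # request min == max before the tests start
    try:
        yield
    finally:
        apply(vm, idle)  # give the resources back, even if the run fails
```

A build step would wrap the test run in `with guaranteed_resources("slave01", during, idle, apply): ...`, so the slave keeps its pool flexibility while idle.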

Nick Thomas [:nthomas] (UTC+12)

Comment 2

15 years ago

I'm assuming you're talking about the Firefox, Firefox3.1, and Tracemonkey trees. Dedicated VMs are used for Firefox3.0.

Mike Shaver (:shaver -- probably not reading bugmail closely)

Reporter

Comment 3

15 years ago

We don't want a unit test VM to get more than its allocation either, because that leads to "false" passes that turn into failures when things are more contended.  I don't care about idle cycles nearly as much as I care about consistent results, but as long as nothing ever gets more or less than it should, we can do whatever we want with the allocations. :)

matthew zeier [:mrz]

Updated

15 years ago

Assignee: server-ops → phong

Phong Tran [:phong]

Comment 4

15 years ago

Setting up reservations for the VMs has caused some problems.  Since not all of the ESX hosts have identical CPU and RAM, these VMs have trouble migrating between ESX hosts.  What we should do instead is give the important VMs higher priority for resources.  We should also set up rules that will keep these VMs on separate ESX hosts.
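One plausible reading of the migration problem can be sketched in Python under simplifying assumptions. A VM with a reservation can only be placed on a host whose unreserved CPU still covers that reservation. On a cluster of mixed hosts, a reservation sized for a large host may therefore rule out the smaller ones. Host names and capacities below are hypothetical, and RAM and the real DRS/vMotion placement checks are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    name: str
    cpu_mhz: int       # total CPU capacity of the host
    reserved_mhz: int  # sum of reservations of VMs already placed there


def eligible_hosts(vm_reservation_mhz: int, hosts: List[Host]) -> List[str]:
    """Hosts whose unreserved CPU can still honour the VM's reservation."""
    return [
        h.name
        for h in hosts
        if h.reserved_mhz + vm_reservation_mhz <= h.cpu_mhz
    ]
```

With `Host("big-host", 20000, 12000)` and `Host("small-host", 12000, 9000)`, a VM reserving 4000 MHz can only go to `big-host`, while one reserving 2000 MHz fits on either.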

Mike Shaver (:shaver -- probably not reading bugmail closely)

Reporter

Comment 5

15 years ago

I don't understand how giving them higher priority solves this problem.  Having too many resources available is as bad as having too little, because it's variance that causes the unreliability.

bhearsum@mozilla.com (:bhearsum)

Comment 6

15 years ago

Phong, as an aside, can you give us a complete list of the VMs we have reservations on?

Phong Tran [:phong]

Comment 7

15 years ago

high share priority:
production master
qm-rhel02
qm-buildbot01
try-master

reservations limit CPU:
staging-prometheus-vm
production-prometheus-vm
fx-win32-1.9-slave2
production-prometheus-vm02
staging-try-master
tb-linux-tbox
fx-linux-1.9-slave08
moz2-linux-slave06
fx-linux-1.9-slave2
fxdbug-linux-tbox
bm-l10n-centos5-01
fx-linux-1.9-slave1
moz2-linux-slave15
bm-centos5-unittest-01
fx-linux-1.9-slave03/04/07/09
moz2-linuxnonsse-slave01
moz2-linux-slave(01-19)
murali-experiment
staging-1.9-master
staging-master
test-linslave
test-mgmt
try-linux-slave(01-05)
xr-linux-tbox

bhearsum@mozilla.com (:bhearsum)

Comment 8

15 years ago

(In reply to comment #7)
> reservations limit CPU:

Phong, what does this actually mean? Is this an upper limit on CPU usage, or a lower limit?

Phong Tran [:phong]

Comment 9

•

15 years ago

That means those VMs are set with an upper limit on CPU usage.  I think it should be the opposite.

Mike Shaver (:shaver -- probably not reading bugmail closely)

Reporter

Comment 10

•

15 years ago

I'm pretty sure we want both high and low limits, and we want them to be identical.

Phong Tran [:phong]

Comment 11

•

15 years ago

We replaced the old AMD cluster with 4 new Intel blades.  This gives us more resources to share amongst the virtual machines.

Ideally we would let VMware DRS handle resource allocations for the VMs.  If we set a lower limit, then those cycles will be reserved for the VM even if it's idle and not using it.  When we set an upper limit, the VM won't be able to use more than what is allocated even if there are more resources available.
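The trade-off debated in this thread can be illustrated with a toy allocator in Python. This is a simplified model, not VMware's actual scheduler. Each VM first receives its reservation, capped at its demand and limit. Spare capacity is then divided in proportion to shares, never exceeding a VM's limit. For simplicity, a VM whose demand is below its reservation only consumes its demand in this model. The scenario compares a unit-test VM with reservation == limit against one relying only on high shares, each next to a neighbour that is either idle or busy. All names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VmSpec:
    demand_mhz: float                # CPU the guest would use if unconstrained
    reservation_mhz: float = 0.0     # lower limit ("min")
    limit_mhz: float = float("inf")  # upper limit ("max")
    shares: int = 1000               # relative priority for spare capacity (> 0)


def allocate(host_mhz: float, vms: Dict[str, VmSpec]) -> Dict[str, float]:
    """Toy model of how much CPU each VM receives on one host.

    Assumes the reservations already fit on the host (admission control).
    """
    # No VM can receive more than it demands or more than its limit.
    caps = {n: min(v.demand_mhz, v.limit_mhz) for n, v in vms.items()}
    # Step 1: honour reservations first.
    alloc = {n: min(v.reservation_mhz, caps[n]) for n, v in vms.items()}
    spare = host_mhz - sum(alloc.values())
    # Step 2: hand out spare capacity in proportion to shares ("water-filling"),
    # redistributing whatever capped VMs cannot use in the next round.
    active = {n for n in vms if alloc[n] < caps[n]}
    while spare > 1e-6 and active:
        total_shares = sum(vms[n].shares for n in active)
        handed_out = 0.0
        for n in list(active):
            offer = spare * vms[n].shares / total_shares
            take = min(offer, caps[n] - alloc[n])
            alloc[n] += take
            handed_out += take
            if caps[n] - alloc[n] <= 1e-6:
                active.discard(n)  # this VM is at its demand or limit
        spare -= handed_out
    return alloc
```

On a 10000 MHz host, a unit-test VM demanding 8000 MHz with reservation == limit == 2000 MHz gets 2000 MHz whether its neighbour is idle or demanding 9000 MHz. The same VM with only `shares=2000` gets 8000 MHz next to an idle neighbour but about 6667 MHz next to the busy one. That is the variance the fixed min == max setting is meant to remove, at the cost of never bursting into idle capacity.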

Mike Shaver (:shaver -- probably not reading bugmail closely)

Reporter

Comment 12

•

15 years ago

(In reply to comment #11)
> If we
> set a lower limit, then those cycles will be reserved for the VM even if it's
> idle and not using it.  When we set an upper limit, the VM won't be able to use
> more than what is allocated even if there are more resources available.

Yes, that's exactly the point of what I'm asking for.  I want to make sure that if a unit test is running on a VM, it always gets the same CPU resources -- never extra, never fewer.  Otherwise, anything that has a timing element will vary according to what happens to be running next to it.

bhearsum@mozilla.com (:bhearsum)

Comment 13

•

15 years ago

(In reply to comment #12)
> (In reply to comment #11)
> > If we
> > set a lower limit, then those cycles will be reserved for the VM even if it's
> > idle and not using it.  When we set an upper limit, the VM won't be able to use
> > more than what is allocated even if there are more resources available.
> 
> Yes, that's exactly the point of what I'm asking for.  I want to make sure that
> if a unit test is running on a VM, it always gets the same CPU resources --
> never extra, never fewer.  Otherwise, anything that has a timing element will
> vary according to what happens to be running next to it.

We don't have dedicated unittest machines. This will make the machines unable to burst during periods when there are free resources. Do you really want that?

Mike Shaver (:shaver -- probably not reading bugmail closely)

Reporter

Comment 14

•

15 years ago

Well, I want there to be as little timing variance in our unit test machines as possible.  I don't care if the unit test VMs' resources are used if there is no unit test being run, but when they're running I want them to always run with the same (fixed) parameters.  Fixed/permanent min=max limits are one way to address that, but if you have others (such as using the VMware guest APIs to change the reservations when the unit test runs start and end) then go for it.

How often are they idle?

matthew zeier [:mrz]

Comment 15

•

15 years ago

I might be disconnected but I thought unit tests were on the Mac Minis and not at all on the VMs.  Am I wrong?

matthew zeier [:mrz]

Comment 16

•

15 years ago

Stalled on some input - which VMs are doing unit tests that need this?

Chris AtLee [:catlee]

Updated

•

15 years ago

Assignee: phong → joduinn

matthew zeier [:mrz]

Updated

•

15 years ago

Component: Server Operations → Release Engineering

QA Contact: mrz → release

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Assignee

Comment 17

•

14 years ago

Fixing this in bug 548768, as a side effect of moving unittests from VMs to minis. Running unittests on identical hardware, one process per machine, should eliminate hardware and VM contention as sources of variability in timeouts and timing-sensitive test failures.

I'll close this as a DUP, because that seems closest; we are fixing the underlying problem in another bug, just differently from how it was originally requested here.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → DUPLICATE

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: mozilla.org → Release Engineering

Categories

(Release Engineering :: General, defect)

Tracking

(Not tracked)

People

(Reporter: shaver, Assigned: joduinn)

Security

(public)
