Closed
Bug 479146
Opened 15 years ago
Closed 14 years ago
set fixed resource allocations for unit test VMs
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 548768
People
(Reporter: shaver, Assigned: joduinn)
Details
In order to reduce the variability in timeout and timing-sensitive test failures, I would like us to set fixed resource limits on the VMs running the unit tests. We should lock them each to 1/5th of a machine or whatever, both max and min, so that our timeout problems will at least become more consistent, and isolated from load on other guests. I'm not sure how much control over I/O capacity we have in our current config, but ESX gives some knobs for cases dealing with local disk (and I think iSCSI); obviously the other side of the SAN is harder to apportion as crisply.
Comment 1•15 years ago
The current pool-of-slaves implementation means that any slave can be given a unit test job, so all slaves would need a fixed resource allocation (which somewhat reduces the flexibility of having a pool in the first place). Can we dynamically add an allocation? If so, the build process could signal to the ESX host that it was starting unit tests and needed guaranteed resources, and give them back afterwards.
Comment 2•15 years ago
I'm assuming you're talking about the Firefox, Firefox3.1, and Tracemonkey trees. Dedicated VMs are used for Firefox3.0.
Reporter
Comment 3•15 years ago
We don't want a unit test VM to get more than its allocation either, because that leads to "false" passes that turn into failures when things are more contended. I don't care about idle cycles nearly as much as I care about consistent results, but as long as nothing ever gets more or less than it should, we can do whatever we want with the allocations. :)
Updated•15 years ago
Assignee: server-ops → phong
Comment 4•15 years ago
Setting up reservations for the VMs has caused some problems. Since not all of the ESX hosts have identical CPU and RAM, these VMs have trouble migrating between ESX hosts. What we should do instead is give the important VMs higher priority for resources. We should also set up rules that keep these VMs on separate ESX hosts.
Reporter
Comment 5•15 years ago
I don't understand how giving them higher priority solves this problem. Having too many resources available is as bad as having too little, because it's variance that causes the unreliability.
Comment 6•15 years ago
Phong, as an aside, can you give us a complete list of the VMs we have reservations on?
Comment 7•15 years ago
high share priority:
production master qm-rhel02 qm-buildbot01 try-master

reservations limit CPU:
staging-prometheus-vm production-prometheus-vm fx-win32-1.9-slave2 production-prometheus-vm02 staging-try-master tb-linux-tbox fx-linux-1.9-slave08 moz2-linux-slave06 fx-linux-1.9-slave2 fxdbug-linux-tbox bm-l10n-centos5-01 fx-linux-1.9-slave1 moz2-linux-slave15 bm-centos5-unittest-01 fx-linux-1.9-slave03/04/07/09 moz2-linuxnonsse-slave01 moz2-linux-slave(01-19) murali-experiment staging-1.9-master staging-master test-linslave test-mgmt try-linux-slave(01-05) xr-linux-tbox
Comment 8•15 years ago
(In reply to comment #7)
> reservations limit CPU:

Phong, what does this actually mean? Is this an upper limit on CPU usage, or a lower limit?
Comment 9•15 years ago
That means those VMs are set with an upper limit on CPU usage. I think it should be the opposite.
Reporter
Comment 10•15 years ago
I'm pretty sure we want both high and low limits, and we want them to be identical.
Comment 11•15 years ago
We replaced the old AMD cluster with 4 new Intel blades. This gives us more resources to share amongst the virtual machines. Ideally we would let VMware DRS handle resource allocations for the VMs. If we set a lower limit, then those cycles will be reserved for the VM even if it's idle and not using them. When we set an upper limit, the VM won't be able to use more than what is allocated even if there are more resources available.
Reporter
Comment 12•15 years ago
(In reply to comment #11)
> If we
> set a lower limit, then those cycle will be reserve for the VM even if it's
> idle and not using it. When we set an upper limit, the VM won't be able to use
> more than what is allocated even if there is more resources available.

Yes, that's exactly the point of what I'm asking for. I want to make sure that if a unit test is running on a VM, it always gets the same CPU resources -- never extra, never fewer. Otherwise, anything that has a timing element will vary according to what happens to be running next to it.
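The trade-off discussed in comments 11 and 12 can be sketched with a deliberately simplified model. It assumes CPU is expressed in MHz (as vSphere does for CPU reservations and limits) and ignores shares, DRS, and I/O entirely; the function name and parameters are illustrative, not any VMware API. A reservation acts as a floor the VM can always draw on, a limit acts as a ceiling, and anything in between depends on how much unreserved capacity other guests happen to leave free.

```python
def effective_cpu_mhz(demand_mhz, unreserved_free_mhz, reservation_mhz, limit_mhz):
    """Toy model of the CPU a VM receives under a reservation (floor) and a
    limit (ceiling). Not VMware's scheduler: shares and DRS are ignored."""
    if reservation_mhz > limit_mhz:
        raise ValueError("reservation cannot exceed limit")
    # The reserved MHz are always available to this VM; beyond that it can
    # only pick up capacity that no other guest is currently using.
    available = reservation_mhz + unreserved_free_mhz
    # The VM never gets more than it asks for, and never more than its limit.
    return min(demand_mhz, limit_mhz, available)


# A unit test run that could use 2000 MHz, on a busy host (500 MHz free)
# and on a quiet host (3000 MHz free).
for free in (500, 3000):
    unpinned = effective_cpu_mhz(2000, free, 0, float("inf"))
    pinned = effective_cpu_mhz(2000, free, 1000, 1000)  # min == max
    print(unpinned, pinned)
# prints "500 1000", then "2000 1000"
```

With no reservation and no limit, the same test sees four times as much CPU on the quiet host as on the busy one; with reservation equal to limit it sees the same 1000 MHz either way. That is the consistency comment 12 asks for, paid for with the idle reserved cycles and lost bursting that comments 11 and 13 point out.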
Comment 13•15 years ago
(In reply to comment #12)
> (In reply to comment #11)
> > If we
> > set a lower limit, then those cycle will be reserve for the VM even if it's
> > idle and not using it. When we set an upper limit, the VM won't be able to use
> > more than what is allocated even if there is more resources available.
>
> Yes, that's exactly the point of what I'm asking for. I want to make sure that
> if a unit test is running on a VM, it always gets the same CPU resources --
> never extra, never fewer. Otherwise, anything that has a timing element will
> vary according to what happens to be running next to it.

We don't have dedicated unittest machines. This will make the machines unable to burst during periods when there are free resources. Do you really want that?
Reporter
Comment 14•15 years ago
Well, I want there to be as little timing variance in our unit test machines as possible. I don't care if the unit test VMs' resources are used if there is no unit test being run, but when they're running I want them to always run with the same (fixed) parameters. Fixed/permanent min=max limits are one way to address that, but if you have others (such as using the VMware guest APIs to change the reservations when the unit test runs start and end) then go for it. How often are they idle?
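The dynamic alternative raised in comments 1 and 14 (pin the allocation while tests run, release it afterwards) can be sketched as a context manager. This is only an illustration of the control flow: `FakeHost`, `get_allocation`, `set_allocation`, `CpuAllocation`, and `pinned_cpu` are all hypothetical names standing in for whatever hypervisor interface would actually be used, not VMware's API.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class CpuAllocation:
    reservation_mhz: float
    limit_mhz: float  # float("inf") stands in for "unlimited"


class FakeHost:
    """Hypothetical stand-in for a hypervisor API; just records allocations."""

    def __init__(self):
        self.allocations = {}

    def get_allocation(self, vm):
        return self.allocations[vm]

    def set_allocation(self, vm, allocation):
        self.allocations[vm] = allocation


@contextmanager
def pinned_cpu(host, vm, mhz):
    """Pin `vm` to exactly `mhz` (reservation == limit) for the duration of a
    test run, then restore its previous allocation even if the run fails."""
    previous = host.get_allocation(vm)
    host.set_allocation(vm, CpuAllocation(reservation_mhz=mhz, limit_mhz=mhz))
    try:
        yield
    finally:
        host.set_allocation(vm, previous)


host = FakeHost()
host.set_allocation("slave01", CpuAllocation(0, float("inf")))
with pinned_cpu(host, "slave01", 1000):
    print(host.get_allocation("slave01"))  # pinned: 1000 MHz floor and ceiling
print(host.get_allocation("slave01"))  # back to no reservation, no limit
```

Restoring in a `finally` clause matters here: a test run that crashes or times out should not leave a pooled slave permanently pinned, which would bring back the loss of flexibility that comment 1 worries about.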
Comment 15•15 years ago
I might be disconnected but I thought unit tests were on the Mac Minis and not at all on the VMs. Am I wrong?
Comment 16•15 years ago
Stalled on some input - which VMs are doing unit tests that need this?
Updated•15 years ago
Assignee: phong → joduinn
Updated•15 years ago
Component: Server Operations → Release Engineering
QA Contact: mrz → release
Assignee
Comment 17•14 years ago
Fixing this in bug 548768, as a side effect of moving unittests from VMs to minis. Running unittests on identical hardware, one process per machine, should remove hardware and VM contention as sources of variability in timeouts and timing-sensitive test failures. I'll close this as DUP, because that seems closest; we are fixing the underlying problem in another bug, just differently from how it was originally asked here.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
Updated•11 years ago
Product: mozilla.org → Release Engineering