Closed Bug 392788 Opened 18 years ago Closed 18 years ago

Intermittent reftest failures on "qm-centos5-01" Tinderbox

Categories

(Release Engineering :: General, defect)

Platform: x86 Linux
Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: MatsPalmgren_bugz, Assigned: mrz)


Details

I looked at 4 logs; 3 had these errors:

REFTEST UNEXPECTED FAIL (LOADING): file:///builds/slave/trunk_centos5/mozilla/layout/reftests/bugs/28811-2a.html
REFTEST UNEXPECTED FAIL: file:///builds/slave/trunk_centos5/mozilla/layout/reftests/bugs/28811-2b.html

1 had these errors:

REFTEST UNEXPECTED FAIL (LOADING): file:///builds/slave/trunk_centos5/mozilla/layout/reftests/bugs/382600-1.html
REFTEST UNEXPECTED FAIL: file:///builds/slave/trunk_centos5/mozilla/layout/reftests/bugs/384576-1.html

Since there is no exception for Orange on this Tinderbox, this bug blocks me from doing checkins, as far as I understand it.

http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox
Summary: Intermittent reftest failures on "centos5-01" Tinderbox → Intermittent reftest failures on "qm-centos5-01" Tinderbox
Apparently people are checking in anyway...
Severity: blocker → major
bug 381765 can't be the problem because that's Mac-only.
I made a checkin yesterday to increase the reftest load timeout from 10s to 30s, and that seems to have mostly fixed the problem. The question is, what the heck is causing these simple pages to take more than 10s to load? Is something weird going on with the VM?
It's entirely possible, if not likely. We still see intermittent problems on qm-winxp01 related to slow VM performance. I think there are around 8 VMs running on that vhost, so there's a lot of competition for the hardware. If we continue to see problems we can try to move qm-centos5-01 to physical hardware.
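A note on what the LOADING failures above mean mechanically (a simplified sketch, not the actual reftest harness code; the function and constant names are made up): the harness arms a per-page load timer, and if the page hasn't finished loading before the timer fires, the test is reported as a LOADING failure instead of being rendered and compared - which is why a starved VM can turn trivially simple pages into oranges.

```python
import time

# Hypothetical sketch of a per-test load timeout; not the real reftest harness.
LOAD_TIMEOUT_SECONDS = 30  # raised from 10s, since slow VMs were tripping the old value


def wait_for_load(page_url, is_loaded, timeout=LOAD_TIMEOUT_SECONDS):
    """Poll until the page reports it has loaded, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_loaded(page_url):
            return True
        time.sleep(0.1)
    # On a starved VM even a trivial page can miss the deadline, which shows up as
    # "REFTEST UNEXPECTED FAIL (LOADING): <url>" in the Tinderbox log.
    print(f"REFTEST UNEXPECTED FAIL (LOADING): {page_url}")
    return False
```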
Blocks: 380595
The VM is apparently still not entirely its usual self. Just before I checked in a mochitest (bug 380595), the box was already orange due to a strange netwerk unittest failure, and afterwards it failed one of my mochitests with no logical explanation that I could find. I've backed out the test for now, but it'd be good to figure out what's up with the box and/or how it could be fixed.
(In reply to comment #5)
> If we continue to see problems we can try to move qm-centos5-01 to physical
> hardware.

What about another ESX host as a first try? As mentioned in bug 394051, there will be another QA ESX host coming online, probably this week. This would be a lot easier than trying to build a new host (which would mean a reinstall and some effort on someone's part).
Not sure. How many machines are we going to be running on the new host? These machines seem to be really sensitive to hardware availability. We could try it, but I'm worried about running into the same problem down the road as we add more machines to the VM host.
Justin had a good point in the other bug - if this is perf stuff it should be a separate box. He mentioned setting up a mini - will that be fine, and can these two be combined?
Any progress on setting up another machine?
We've set up lots of new machines, just not this one. :) Is this reftest still failing intermittently? Matt, is the new vmhost ready? If so, we can clone this machine or move it over.
Which esx server do you mean? qm-vmware01 and qm-vmware02 are the two QA ESX servers.
(In reply to comment #11)
> We've set up lots of new machines, just not this one. :)
>
> Is this reftest still failing intermittently?
<snip>

Not that I know of. However, I had to comment out the most important of the tests for bug 380595 *again*, back in December, because it intermittently failed on this box.
Matthew: I'm not sure who lives on which server. Are they fairly balanced, or does one have more cycles available than the other? If so, I'd like qm-centos5-01 moved to the less-occupied box. I think this is going to be a temporary solution at best as we fill up both of these servers. Gijs: I feel your pain.
qm-vmware02 has capacity. Who's doing the clone, me or you?
I don't think they let me play with clones. That'll have to be you. Let me know when you're ready to do it and I'll take the machine down.
qm-centos5-01 is on dev-vmware01, probably because it was made before qm-vmware02 existed. The right place is really qm-vmware01, but it's short on disk space - will set up iSCSI and hot-clone.
Assignee: nobody → mrz
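For anyone following the mechanics: the disk-copy half of a clone like this on ESX comes down to copying the guest's VMDK onto the target datastore. The sketch below is illustrative only - the paths and datastore names are invented, and a true hot clone of a running VM would normally go through VirtualCenter or a snapshot rather than a bare vmkfstools copy of a live disk.

```python
import subprocess

# Illustrative only: these datastore paths are hypothetical, not the real layout.
SRC = "/vmfs/volumes/dev-vmware01-local/qm-centos5-01/qm-centos5-01.vmdk"
DST = "/vmfs/volumes/iscsi-san/qm-centos5-01/qm-centos5-01.vmdk"

# vmkfstools -i makes a full copy of a virtual disk on the ESX host.
# For a hot clone the source VM would first be snapshotted (or the copy driven
# through VirtualCenter) so the disk being copied is quiescent.
subprocess.run(["vmkfstools", "-i", SRC, DST], check=True)
```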
This got buried. I'll have to shift a whole bunch of VMs around to make room on qm-vmware01 and I don't know if that will have any performance benefit over where it is now. I can move it to qm-vmware02 but that requires downtime.
If it's more work than setting up a dedicated box, why don't we do that instead? The whole plan originally was just to move this (or set up a new instance of it) on dedicated hardware, but it looks like that's more of a pain than you originally thought.
What's the action plan then?
I'll file a bug (or reopen an existing one) to order hardware with specs. Do we have standard-issue server-grade hardware that can run Linux with a reasonable graphics card?
Rob - can I take a downtime hit on qm-centos5-01 to move it to the SAN and onto an unloaded ESX host? I also want to up the memory and CPU.
yup, do it up.
I should have asked first: how long will it take? And when would you like to do it, so I can give some heads-up?
The tree will need to be closed for any downtime to qm-centos5-01, so doing it sometime out of normal hours would be nice.
We have changes in bug 393413 that need to land too and will also require minimal downtime. Let's coordinate the two.
(In reply to comment #25)
> The tree will need to be closed for any downtime to qm-centos5-01, so doing it
> sometime out of normal hours would be nice.

Oh, in that case, let me do a hot clone and let you know when that's done. From your perspective it'll be a reboot, so it should be quick-ish. Let's plan on Thursday morning around 10am to do the "reboot"?
(In reply to comment #27)
> Let's plan on Thursday morning around 10am to do the "reboot"?

Works for me.
Me too. We'll meet up with you then.
Moved this morning. The original VM was paused and I'll leave it there for a while before removing it. The new image has twice the memory and a second virtual CPU.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
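The resource bump maps onto two guest settings in the VM's .vmx file. The snippet below is purely illustrative - the values and formatting are hypothetical, not qm-centos5-01's actual configuration.

```python
# Hypothetical .vmx settings behind "twice the memory and a second virtual CPU";
# the numbers are illustrative, not the real config of this VM.
settings = {
    "memsize": "1024",   # guest memory in MB
    "numvcpus": "2",     # second virtual CPU (dropped back to 1 later in this bug)
}

# Print the key = "value" lines in the form they take in a VMware .vmx file.
for key, value in settings.items():
    print(f'{key} = "{value}"')
```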
This isn't working. It might be the additional CPU or something else, but we're having a bunch of inexplicable problems. Could we revert to the original image?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Dropped the extra CPU last night.
Status: REOPENED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Component: Testing → Release Engineering
Product: Core → mozilla.org
QA Contact: testing → release
Version: Trunk → other
Product: mozilla.org → Release Engineering