Closed
Bug 438871
Opened 17 years ago
Closed 12 years ago
***Do not use any more** some tests fail intermittently
Categories
(Testing :: General, defect)
Testing
General
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: rcampbell, Unassigned)
References
(Depends on 45 open bugs)
Details
(Whiteboard: [not used for new bugs, see comment 26 for replacement searches/whines])
The following machines are still failing regularly after the netapp fixes and reorganization:
qm-centos5-01
qm-centos5-02*
qm-centos5-03
qm-centos5-04
qm-win2k3-moz2-01
qm-win2k3-pgo01
* - qm-centos5-02 is to be reimaged due to corruption - see bug 438664.
Could we verify that these machines are not on an overburdened ESX host and have plenty of fast storage available? They should also have at least 1GB of RAM each. Hopefully that'll improve performance.
Any resource allocation you can give them on the ESX hosts to improve performance would also be appreciated.
Reporter
Comment 1•17 years ago
Add qm-centos5-moz2-01 to that list.
Comment 2•17 years ago
Rob - you can check most of that on your own. If the datastore has the word "sata" in it, it's on a SATA array. If it has "fcal" it's on the fibre channel array.
Clicking on the ESX host's Summary tab will show you CPU and memory usage. Keep in mind that the BL460s have 8 cores and 16GB RAM - all the others have 8GB and at least two cores. I don't think any host is overtasked.
Even if that were the case, the VMs would just be slow, not fail. Are you sure it's a performance issue?
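The datastore naming rule described in this comment can be sketched as a small check. This is only an illustration: the helper name `classify_datastore` and the sample datastore names are made up, and the only facts taken from the comment are the "sata" and "fcal" substrings.

```python
def classify_datastore(name: str) -> str:
    """Guess the storage array type from a datastore name (hypothetical helper)."""
    lowered = name.lower()
    if "sata" in lowered:
        return "SATA array"
    if "fcal" in lowered:
        return "fibre channel array"
    # Names with neither substring (e.g. the SAS/EqualLogic case) are not covered by the rule.
    return "unknown"


# Sample names are invented for illustration only.
for datastore in ("example-sata-01", "example-fcal-02", "example-other"):
    print(datastore, "->", classify_datastore(datastore))
```

Anything the rule does not cover is reported as "unknown" rather than guessed.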
Updated•17 years ago
Assignee: server-ops → mrz
Comment 3•17 years ago
(In reply to comment #0)
> The following machines are still failing regularly after the netapp fixes and
> reorganization:
>
qm-centos5-01 - SATA
qm-centos5-02 - FCAL
qm-centos5-03 - SATA
qm-centos5-04 - SAS (EqualLogic)
qm-win2k3-moz2-01 - FCAL
qm-win2k3-pgo01 - SATA & vmware tools are out of date
Not clear on what action you want me to take. Migrate to faster storage?
Pass back when ready.
Assignee: mrz → nobody
Component: Server Operations → Release Engineering
QA Contact: justin → release
Comment 4•17 years ago
The issue here is not storage: all of these machines are having problems, and they are on faster and slower storage alike. Let's not move things around for nothing.
I think what is needed is to check that the VMware hosts are not overloaded (none of them are) and to check the memory size for all of them.
Updated•17 years ago
Summary: unittest vms still failing sporadically → some tests fail intermittently on unittest VMs
Comment 5•17 years ago
Current RAM allocated for each VM. Is this enough?
qm-centos5-01 - 512mb ram - SATA disk
qm-centos5-02 - 512mb ram - FCAL
qm-centos5-03 - 512mb ram - SATA
qm-centos5-04 - 512mb ram - SAS (EqualLogic)
qm-win2k3-moz2-01 - 2gb ram - FCAL
qm-win2k3-pgo01 - 2gb ram - SATA & vmware tools are out of date
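As a rough sketch, the allocations listed above can be compared against the 1GB minimum requested in the original report. The helper name `below_minimum` is hypothetical, and the conversion to megabytes (1gb = 1024 MB, 2gb = 2048 MB) is an assumption.

```python
MIN_RAM_MB = 1024  # "at least 1GB of RAM each", from the original report (assumed 1024 MB)

# RAM per VM as listed in this comment, converted to MB (conversion is an assumption).
allocated_mb = {
    "qm-centos5-01": 512,
    "qm-centos5-02": 512,
    "qm-centos5-03": 512,
    "qm-centos5-04": 512,
    "qm-win2k3-moz2-01": 2048,
    "qm-win2k3-pgo01": 2048,
}


def below_minimum(vms, minimum_mb=MIN_RAM_MB):
    """Return the sorted names of VMs whose allocation is under the minimum."""
    return sorted(name for name, ram in vms.items() if ram < minimum_mb)


print(below_minimum(allocated_mb))
```

With these figures, the four CentOS VMs fall below the minimum and the two Windows VMs do not.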
Comment 6•17 years ago
(In reply to comment #5)
> qm-centos5-01 - 512mb ram
I used to run Fx on a machine with 512mb at work - it would intermittently switch between taking a second to open a window and taking 15 or 20 seconds. Not the sort of thing I'd try running unit tests on. I gave it 1gb and it turned painless.
Is qm-centos5-01 aka qm-centos5-moz2-01? It seems to be failing two out of three times, which means the tree is effectively closed for three or four hours out of every five or six.
Reporter
Comment 7•17 years ago
These should all have 1GB minimum. I thought at one point the Linux machines were bumped up to that, but that requirement may have been lost along the way.
(In reply to comment #6)
> Is qm-centos5-01 aka qm-centos5-moz2-01? It seems to be failing two out of
> three times, which means the tree is effectively closed for three or four hours
> out of every five or six.
No, they're separate machines.
Updated•17 years ago
Priority: -- → P3
Comment 8•17 years ago
(In reply to comment #7)
> These should all have 1GB minimum. I thought at one point the Linux machines
> were bumped up to that, but that requirement may have been lost along the way.
Ah! That's too low.
OK, I've now bumped the RAM allocated for each VM:
qm-centos5-01 - 1gb ram - SATA disk
qm-centos5-02 - 1gb ram - FCAL
qm-centos5-03 - 1gb ram - SATA
qm-centos5-04 - 1gb ram - SAS (EqualLogic)
qm-win2k3-moz2-01 - 2gb ram - FCAL
qm-win2k3-pgo01 - 2gb ram - SATA & vmware tools are out of date
Let's see if that makes things better.
Priority: P3 → --
Updated•17 years ago
Assignee: nobody → lukasblakk
Priority: -- → P2
Comment 9•17 years ago
Has the RAM helped or are we still having intermittent failures?
Reporter
Comment 10•17 years ago
They're still failing sporadically. Looking at cvs trunk right now, there are 8 separate test suite failures on the two Linux machines, seemingly unrelated to checkins. There are more on mozilla-central across the board. There was even one failure on the Mac mini, though it appeared to be a valid exception.
Reporter
Comment 11•17 years ago
The two Linux machines mentioned above are qm-centos5-01 and 02. qm-centos5-03 has only had one orange cycle during the last 12 hours.
Comment 13•17 years ago
Did a quick investigation of the reliability of qm-moz2mini01.
It seems that about 1 in 8 builds (once or twice a day) hits a failure like:
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla2/1212405798.1212411791.13207.gz
Which shows a cascade of multiple failures. This first appeared on 6/2 around 3am. The first builds on this machine were around 5/21.
There were other failures, but they were either clearly due to a code checkin or not nearly as common. Sayre mentioned on IRC that test failures like this were previously caught with valgrind as latent memory issues.
Comment 14•17 years ago
Note that the failure is most often preceded with something like:
*** 17828 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdatadeletedataexceedslength.html...
*** 17829 INFO PASS | characterdataDeleteDataExceedsLengthAssert
*** 17831 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdatadeletedatagetlengthanddata.html...
*** 17832 INFO PASS | data
*** 17833 INFO PASS | length
*** 17835 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdatadeletedatamiddle.html...
*** 17836 INFO PASS | characterdataDeleteDataMiddleAssert
*** 17838 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdatagetdata.html...
*** 17839 INFO PASS | characterdataGetDataAssert
*** 17841 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdatagetlength.html...
*** 17842 INFO PASS | characterdataGetLengthAssert
*** 17844 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdataindexsizeerrdeletedatacountnegative.html...
*** 17845 INFO expected error in todo testcase | throws_INDEX_SIZE_ERR
*** 17846 INFO TODO | test marked todo should fail somewhere |
*** 17848 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdataindexsizeerrdeletedataoffsetgreater.html...
*** 17849 INFO PASS | throw_INDEX_SIZE_ERR
*** 17851 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdataindexsizeerrdeletedataoffsetnegative.html...
*** 17852 INFO PASS | throws_INDEX_SIZE_ERR
*** 17854 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdataindexsizeerrinsertdataoffsetgreater.html...
*** 17855 INFO PASS | throw_INDEX_SIZE_ERR
*** 17857 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdataindexsizeerrinsertdataoffsetnegative.html...
*** 17858 INFO PASS | throws_INDEX_SIZE_ERR
*** 17860 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdataindexsizeerrreplacedatacountnegative.html...
*** 17861 INFO expected error in todo testcase | throws_INDEX_SIZE_ERR
*** 17862 INFO TODO | test marked todo should fail somewhere |
*** 17864 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdataindexsizeerrreplacedataoffsetgreater.html...
*** 17865 INFO PASS | throw_INDEX_SIZE_ERR
*** 17867 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdataindexsizeerrreplacedataoffsetnegative.html...
NEXT ERROR *** 17868 ERROR FAIL | Unable to restore focus, expect failures and timeouts. |
**
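To find this pattern in other logs, a minimal sketch along these lines could report which test was running when the first `ERROR FAIL` line appears. The function name `first_failure` and the two regular expressions are assumptions based only on the line shapes in the excerpt above; real tinderbox logs may differ.

```python
import re

# Patterns inferred from the excerpt above; they are assumptions, not a log format spec.
RUNNING = re.compile(r"INFO Running (\S+?)\.\.\.")
FAILURE = re.compile(r"ERROR FAIL \| (.*?) \|")


def first_failure(log_lines):
    """Return (last test started, failure message) for the first ERROR FAIL line, or None."""
    current_test = None
    for line in log_lines:
        started = RUNNING.search(line)
        if started:
            current_test = started.group(1)
            continue
        failed = FAILURE.search(line)
        if failed:
            return current_test, failed.group(1)
    return None


# The last three lines of the excerpt, copied as sample input.
sample = [
    "*** 17865 INFO PASS | throw_INDEX_SIZE_ERR",
    "*** 17867 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_characterdataindexsizeerrreplacedataoffsetnegative.html...",
    "NEXT ERROR *** 17868 ERROR FAIL | Unable to restore focus, expect failures and timeouts. |",
]
print(first_failure(sample))
```

Running this over several failing logs would show whether the focus failure always follows the same test or moves around.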
Comment 15•17 years ago
Note for the record: there was concern about overheating. This is a mini running at up to 70C, and according to Intel specs the CPU can take up to 100C, so I don't think that is the cause.
Updated•17 years ago
Summary: some tests fail intermittently on unittest VMs → [Tracking bug] some tests fail intermittently on unittest VMs
Comment 17•17 years ago
We should probably go back through the fixed dep bugs here and land them all in CVS as well.
Comment 18•17 years ago
Gonna take this bug to remind me to backport patches to 1.9 as I said in comment 17.
Assignee: lukasblakk → ted.mielczarek
Comment 19•16 years ago
I recall watching the tree show rather odd timeout behaviours, and the closest thing to a reason was that I was watching a currently running stdio log directly on the buildbot waterfall. I looked at the code pushing data from the slave to the master, too, and that protocol seemed rather chatty. Thus...
Did we have a look at how much of our tubes we're using? For one push, for a push on central and 1.9.1, or with tracemonkey or nightlies even?
Comment 20•16 years ago
Adding bug 481414 for intermittent fails in scriptaculous
Depends on: 481414
Comment 21•16 years ago
Adding bug 481487 - unexplained crash in mochitests at /content/a11y/accessible/test_textboxes.html
Depends on: 481487
Updated•16 years ago
Alias: randomorange
Comment 22•16 years ago
Adding all the bugs which have [orange] in their whiteboard but do not block this bug.
Depends on: 474915
Updated•16 years ago
Depends on: 483917
Depends on: 446197
Updated•16 years ago
Depends on: 490041
Depends on: 490062
Depends on: 491735
Updated•16 years ago
Depends on: 500063
Depends on: 504466
Depends on: 501960
Depends on: 503623
Depends on: 505217
Depends on: 505708
Depends on: 505718
Depends on: 505752
Depends on: 506038
Depends on: 507015
Depends on: 507698
Depends on: 510220
Depends on: 510219
Depends on: 510592
Depends on: 512296
Depends on: 523558
No longer depends on: 524014
Depends on: 525739
Updated•15 years ago
Summary: [Tracking bug] some tests fail intermittently on unittest VMs → [Tracking bug] some tests fail intermittently
Depends on: 528765
Depends on: 529338
Depends on: 529343
Updated•15 years ago
Depends on: 529837
Depends on: 529898
Depends on: 530225
Depends on: 530810
Depends on: 530906
Depends on: 531590
Depends on: 534243
Depends on: 534247
Depends on: 534277
Depends on: 534372
Depends on: 473841
Depends on: 536585
Depends on: 536587
Depends on: 530007
Depends on: 535585
Depends on: 537454
Depends on: 538364
Depends on: 539247
Depends on: 541852
Depends on: 541853
Depends on: 542078
Depends on: 542550
Depends on: 543228
Depends on: 544537
Comment 23•15 years ago
This bug was originally opened to track issues with the NetApp storage. We migrated off that storage long ago, and I don't see why this bug is still open, with random dependencies added and removed constantly. I think we should just close it, since the reason it was opened has long been fixed.
Comment 24•15 years ago
The bug is serving a useful purpose, so I'm not going to close it. I'll move it out of RelEng, since it definitely doesn't belong there anymore.
Assignee: ted.mielczarek → nobody
Component: Release Engineering → General
Product: mozilla.org → Testing
QA Contact: release → general
Version: other → Trunk
Depends on: 547613
Depends on: 560929
No longer depends on: 562000
Depends on: 563994
Depends on: 563997
Depends on: 564249
Depends on: 565437
Depends on: 566395
Depends on: 566398
Updated•15 years ago
Depends on: 570905
Depends on: 573524
Depends on: 574542
Depends on: 582821
Blocks: 582831
Depends on: 583361
Depends on: 583423
Depends on: 583449
Depends on: 583599
Depends on: 583598
Depends on: 583554
Updated•15 years ago
Depends on: 585695
Depends on: 586422
Updated•15 years ago
Depends on: 595368
Depends on: 595372
Depends on: 586295
Depends on: 595413
Depends on: 595417
Depends on: 596603
Depends on: 597742
Updated•14 years ago
Depends on: 602727
Depends on: 605392
Updated•14 years ago
Depends on: 612625
Updated•14 years ago
Depends on: 618041
Depends on: 618233
Depends on: 618926
Depends on: 626100
Depends on: 626103
Depends on: 626119
Updated•14 years ago
Depends on: 636278
Depends on: 636790
Depends on: 636793
Depends on: 653943