Closed
Bug 808671
Opened 12 years ago
Closed 12 years ago
Testruns on Linux nodes are getting aborted due to unknown reasons
Categories
(Mozilla QA Graveyard :: Mozmill Automation, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: whimboo, Assigned: whimboo)
So far this has only been seen on our Linux 64bit machine [mm-ub-1204-64-1 (10.250.73.246)]. Under /tmp, the binary, the profile, and the mozmill-tests are not getting removed. I am not sure yet under which conditions that happens, but we have to fix it ASAP. I will look into this issue in a bit.
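As a stopgap for the leftover files described above, a cleanup pass over the temp directory could look roughly like the following. This is only a sketch and not part of mozmill-automation: the function name and the age threshold are hypothetical, the `.binary` and `.mozmill-tests` suffixes are taken from the paths in the console logs of this bug, and the `.profile` suffix is an assumption.

```python
import shutil
import time
from pathlib import Path

# ".binary" and ".mozmill-tests" appear in the logs of this bug;
# ".profile" is an assumed name for the leftover profile directory.
STALE_SUFFIXES = (".binary", ".profile", ".mozmill-tests")


def remove_stale_testrun_dirs(tmp_root, max_age_seconds, now=None):
    """Delete tmp*<suffix> directories under tmp_root older than max_age_seconds.

    Returns the sorted names of the directories that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in list(Path(tmp_root).iterdir()):
        if not entry.is_dir() or not entry.name.startswith("tmp"):
            continue
        if not entry.name.endswith(STALE_SUFFIXES):
            continue
        if now - entry.stat().st_mtime < max_age_seconds:
            continue  # recent enough to belong to a testrun that is still running
        shutil.rmtree(entry, ignore_errors=True)
        removed.append(entry.name)
    return sorted(removed)
```

The age check is there so that a cleanup job does not delete the directories of a testrun that is currently in progress on the same node.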
Assignee
Comment 1•12 years ago
It looks like this problem appears whenever Jenkins itself has to cancel a testrun. On the Linux 64bit machine this happens for the functional testrun, especially in the testAddons_enableDisableExtension/test2.js test. It hangs and Mozmill doesn't kill the application. Not sure why yet.
http://10.250.73.243:8080/job/mozilla-central_functional/1582/console
In some cases the run continues but fails here:
TEST-START | /tmp/tmpB8obr2.mozmill-tests/tests/functional/restartTests/testRestartChangeArchitecture/test3.js | setupModule
WARNING | test3.js::setupModule | (SKIP) Architecture changes only supported on OSX 10.6
TEST-START | /tmp/tmpB8obr2.mozmill-tests/tests/functional/restartTests/testRestartChangeArchitecture/test3.js | tBuild timed out (after 60 minutes). Marking the build as aborted.
Assignee
Comment 2•12 years ago
TEST-START | /tmp/tmp7Whhgc.mozmill-tests/tests/functional/restartTests/testAddons_enableDisableExtension/test2.js | testDisableExtension
TEST-PASS | /tmp/tmp7Whhgc.mozmill-tests/tests/functional/restartTests/testAddons_enableDisableExtension/test2.js | test2.js::testDisableExtenBuild timed out (after 60 minutes). Marking the build as aborted.
Build was aborted
Recording test results
No emails were triggered.
Finished: ABORTED
I wish we would send out emails for aborted runs. Dave, would that be possible? Can we get this filed as a mozmill-ci issue?
Assignee
Updated•12 years ago
Summary: Out of disk space due to testrun files are not getting removed → Out of disk space on Linux 64 node because testrun files are not getting removed when Jenkins aborts a testrun
Assignee
Comment 3•12 years ago
The next task here is to figure out why Mozmill is not able to shut down the browser in this situation. Most likely I will have to file a Mozmill bug for it.
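For reference, the behaviour expected from a harness in this situation can be sketched as below: wait for the application up to a deadline, then ask it to terminate, and force-kill it if it still does not exit. This is a generic illustration built on Python's standard `subprocess` module, not Mozmill's actual shutdown code; the function name and the `grace` parameter are hypothetical.

```python
import subprocess


def run_with_deadline(cmd, timeout, grace=5.0):
    """Run cmd; if it outlives timeout, terminate it, then kill it after grace seconds.

    Returns a (returncode, timed_out) tuple.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        proc.terminate()  # polite request first (SIGTERM on POSIX)
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # force the exit (SIGKILL on POSIX)
            proc.wait()  # reap the process so no zombie is left behind
        return proc.returncode, True
```

If a step like this completes, the process is gone before Jenkins reaches its own 60 minute build timeout, and the harness still gets a chance to clean up its temporary files.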
Comment 4•12 years ago
(In reply to Henrik Skupin (:whimboo) from comment #2)
> I wish we would send out emails for aborted runs. Dave, would that be
> possible? Can we get this filed as a mozmill-ci issue?
Not only possible, but relatively easy. I've raised https://github.com/mozilla/mozmill-ci/issues/182 for this.
Assignee
Updated•12 years ago
Flags: needinfo?(hskupin)
Assignee
Updated•12 years ago
Flags: needinfo?(hskupin)
Assignee
Comment 6•12 years ago
So it's not a general issue with Linux64 but only affects this specific node. In a local VM it works as expected.
Comment 7•12 years ago
Today we had some failures on the Fallback update test: "Disconnect Error: Application unexpectedly closed".
Jenkins console output:
http://10.250.73.243:8080/job/mozilla-central_update/1589/console
http://10.250.73.243:8080/job/mozilla-central_update/1590/console
http://10.250.73.243:8080/job/mozilla-central_update/1591/console
Assignee
Comment 8•12 years ago
Those errors are not related to this bug but to bug 808548. I have reopened the other one.
Assignee
Comment 9•12 years ago
This seems to happen across Linux nodes, so not only the 64 bit one is affected. We got a couple of those reports this morning:
32 bit: http://10.250.73.243:8080/job/mozilla-aurora_functional/1637/
64 bit: http://10.250.73.243:8080/job/mozilla-aurora_functional/1636/
Mainly we fail in 'testAddons_enableDisableExtension/test2.js' but also in 'testAddons_RestartlessExtensionWorksAfterRestart'. Both cause a hang in all of the cases and Mozmill is not able to shut down the browser.
Andrea, please file individual bugs for each of the cases under mozmill-tests and mark them dependent on this bug. Thanks!
Component: Mozmill Automation → Mozmill
Product: Mozilla QA → Testing
Hardware: x86_64 → All
Summary: Out of disk space on Linux 64 node because testrun files are not getting removed when Jenkins aborts a testrun → Testruns on Linux nodes are getting aborted due to Mozmill not being able to shutdown the application after the global timeout
Whiteboard: [mozmill-1.5.20?][mozmill-2.0?]
Comment 10•12 years ago
This morning an endurance testrun got aborted, but no test was run:
http://10.250.73.243:8080/job/mozilla-aurora_endurance/1644/
updating to branch default
401 files updated, 0 files merged, 0 files removed, 0 files unresolved
*** Installing 2012-11-13-04-20-14-mozilla-aurora-firefox-18.0a2.fr.win32.installer.exe => c:\docume~1\mozilla\locals~1\temp\tmpakaznl.binary\
*** Application: Firefox 18.0a2
*** Updating to branch 'mozilla-aurora'
pulling from mozmill-tests
searching for changes
no changes found
37 files updated, 0 files merged, 0 files removed, 0 files unresolved
Build timed out (after 60 minutes). Marking the build as aborted.
Build was aborted
Recording test results
No test report files were found. Configuration error?
Email was triggered for: Aborted
Sending email for trigger: Aborted
Sending email to: mozmill-ci@mozilla.org
Finished: ABORTED
Assignee
Comment 11•12 years ago
Oh wow! So that's not related to any type of testrun but seems to be a general issue with the VM or the Jenkins master<->slave connection. Dave, have you ever seen something like that?
Comment 12•12 years ago
I have not. I wouldn't expect a master/slave issue to cause a build to hang though. Has anyone witnessed this issue occurring? I wonder what is present during this time.
Comment 13•12 years ago
This may be related to bug 797389. Alex is going to demonstrate a hang to me now.
Assignee
Comment 14•12 years ago
(In reply to Dave Hunt (:davehunt) from comment #13)
> This may be related to bug 797389. Alex is going to demonstrate a hang to me
> now.
I don't think so. Two of the referenced tests do not make use of a user shutdown. When it happens the browser hangs. Not sure what else I should look for. Any ideas?
Comment 15•12 years ago
Since yesterday afternoon, we have had about 10 aborted testruns. Here are the links for the restart tests:
* http://10.250.73.243:8080/job/mozilla-central_functional/1834/
* http://10.250.73.243:8080/job/mozilla-aurora_functional/1680/
* http://10.250.73.243:8080/job/mozilla-central_functional/1826/
* http://10.250.73.243:8080/job/mozilla-aurora_functional/1673/
* http://10.250.73.243:8080/job/mozilla-aurora_functional/1663/
What I see now is that it also happened on the non-restart tests, failing at testPreferences/testPreferredLanguage.js:
* http://10.250.73.243:8080/job/mozilla-central_functional/1835/
* http://10.250.73.243:8080/job/mozilla-central_functional/1833/
* http://10.250.73.243:8080/job/mozilla-central_functional/1807/
* http://10.250.73.243:8080/job/mozilla-central_functional/1806
* http://10.250.73.243:8080/job/mozilla-central_functional/1801/
This is the most detailed error:
TEST-START | /tmp/tmpE9GcwZ.mozmill-tests/tests/functional/testPreferences/testPreferredLanguage.js | setupModule
TEST-PASS | /tmp/tmpE9GcwZ.mozmill-tests/tests/functional/testPreferences/testPreferredLanguage.js | testPreferredLanguage.js::setupModule
TEST-START | /tmp/tmpE9GcwZ.mozmill-tests/tests/functional/testPreferences/testPreferredLanguage.js | testSetLanguages
TEST-PASS | /tmp/tmpE9GcwZ.mozmill-tests/tests/functional/testPreferences/testPreferredLanguage.js | testPreferredLanguage.js::testSetLanguages
TEST-START | /tmp/tmpE9GcwZ.mozmill-tests/tests/functional/testPreferences/teNOTE: child process received `Goodbye', closing down
WARNING: waitpid failed pid:31116 errno:10: file /builds/slave/m-cen-lnx64-ntly/build/ipc/chromium/src/base/process_util_posix.cc, line 260
WARNING: waitpid failed pid:31116 errno:10: file /builds/slave/m-cen-lnx64-ntly/build/ipc/chromium/src/base/process_util_posix.cc, line 260
WARNING: Failed to deliver SIGKILL to 31116!(3).: file /builds/slave/m-cen-lnx64-ntly/build/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc, line 118
Build timed out (after 60 minutes). Marking the build as aborted.
Build was aborted
Recording test results
Email was triggered for: Aborted
Sending email for trigger: Aborted
Sending email to: mozmill-ci@mozilla.org
Finished: ABORTED
I will file a separate bug for it.
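When going through many aborted console logs like the one above, a small helper can point at the test that was started but never reported a result. The following is a rough sketch, assuming tests run strictly one after another and that every result line starts with `TEST-` as in the logs quoted in this bug; the function name is hypothetical and this is not an existing mozmill-ci tool.

```python
def last_unfinished_start(log_lines):
    """Return the last TEST-START line that no later TEST-* result line follows.

    Returns None if the most recently started test reported a result,
    or if no test was started at all.
    """
    last_start = None
    for line in log_lines:
        if line.startswith("TEST-START"):
            last_start = line
        elif line.startswith("TEST-"):
            # A result line such as TEST-PASS closes the started test.
            last_start = None
    return last_start
```

Because the line is returned unchanged, a start line that was cut off by the Jenkins timeout message is still reported, which is usually the interesting one.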
Assignee
Comment 16•12 years ago
Looks like Mozmill isn't involved at all here. So moving back to automation for now. I have restarted the box and will watch it today and do some trial runs. If it still happens I hope to be able to find the application in such a state.
Component: Mozmill → Mozmill Automation
Product: Testing → Mozilla QA
Whiteboard: [mozmill-1.5.20?][mozmill-2.0?]
Assignee
Comment 17•12 years ago
When Firefox is running tests and the process halts, the application is not frozen and doesn't hang in any way. It just sits around and does nothing. I will try to nail this down. This case could possibly be related to the userShutdown issue.
Assignee
Comment 18•12 years ago
From what we have seen, the machines are very slow to respond. It's somewhat similar to what we previously discovered with VMware Fusion: the host runs out of memory and the VM can no longer function properly. A manual run of 'purge' always fixed the problem for us. I'm going to make use of the Linux VMs on qa-set again. I hope that will fix the problem until the new ESX cluster can be used.
Assignee
Comment 19•12 years ago
I haven't swapped the machines yet but updated the Ubuntu 12.04 64bit VM in ESX with the latest software. That also upgraded Java to 7.x. For now we do not see those aborts anymore. Not sure if it is because of the Java upgrade or the restart. I will do the same for the 32bit machine and watch the results over the next day or two.
Updated•12 years ago
Assignee
Comment 20•12 years ago
This bug doesn't depend on bug 814430. It's independent. I have also updated the Ubuntu 32bit machine, and so far I can only see aborts due to bug 814430. If it stays that way we should be done here.
No longer depends on: 814430
Summary: Testruns on Linux nodes are getting aborted due to Mozmill not being able to shutdown the application after the global timeout → Testruns on Linux nodes are getting aborted due to unknown reasons
Assignee
Comment 21•12 years ago
I call this done. There are no aborts anymore other than the known ones from the functional testrun, which are covered by bug 814430.
http://10.250.73.243:8080/computer/mm-ub-1204-32-1/builds
http://10.250.73.243:8080/computer/mm-ub-1204-64-1/builds
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: Mozilla QA → Mozilla QA Graveyard