Closed
Bug 1455953
Opened 7 years ago
Closed 6 years ago
Intermittent Aborting task - max run time exceeded! after talos hang after download of target.common.tests.zip took very long
Categories
(Testing :: Talos, defect)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: CosminS, Unassigned)
Details
(Keywords: intermittent-failure)
Attachments
(1 file)
140.56 KB, image/jpeg
11:52:31 INFO - Processing c:\users\task_1524395807\build\tests\mozbase\mozprocess
11:52:43 INFO - Processing c:\users\task_1524395807\build\tests\mozbase\mozprofile
11:52:57 INFO - Processing c:\users\task_1524395807\build\tests\mozbase\mozrunner
11:53:06 INFO - Processing c:\users\task_1524395807\build\tests\mozbase\mozscreenshot
11:53:12 INFO - Processing c:\users\task_1524395807\build\tests\mozbase\moztest
11:53:20 INFO - Processing c:\users\task_1524395807\build\tests\mozbase\mozversion
11:53:27 INFO - Installing collected packages: mozterm, manifestparser, mozcrash, mozdebug, mozdevice, mozfile, mozhttpd, mozinfo, mozInstall, mozleak, mozlog, moznetwork, mozprocess, mozprofile, mozrunner, mozscreenshot, moztest, mozversion
11:53:27 INFO - Running setup.py install for mozterm: started
11:53:40 INFO - Running setup.py install for mozterm: finished with status 'done'
11:53:41 INFO - Running setup.py install for manifestparser: started
11:53:51 INFO - Running setup.py install for manifestparser: finished with status 'done'
11:53:52 INFO - Running setup.py install for mozcrash: started
11:54:01 INFO - Running setup.py install for mozcrash: finished with status 'done'
11:54:02 INFO - Running setup.py install for mozdebug: started
11:54:11 INFO - Running setup.py install for mozdebug: finished with status 'done'
11:54:11 INFO - Running setup.py install for mozdevice: started
11:54:26 INFO - Running setup.py install for mozdevice: finished with status 'done'
11:54:26 INFO - Running setup.py install for mozfile: started
11:54:37 INFO - Running setup.py install for mozfile: finished with status 'done'
11:54:37 INFO - Running setup.py install for mozhttpd: started
11:54:48 INFO - Running setup.py install for mozhttpd: finished with status 'done'
11:54:49 INFO - Running setup.py install for mozinfo: started
11:55:00 INFO - Running setup.py install for mozinfo: finished with status 'done'
11:55:00 INFO - Running setup.py install for mozInstall: started
[taskcluster 2018-04-22T11:55:00.802Z] Exit Code: 0
[taskcluster 2018-04-22T11:55:00.802Z] User Time: 0s
[taskcluster 2018-04-22T11:55:00.802Z] Kernel Time: 0s
[taskcluster 2018-04-22T11:55:00.802Z] Wall Time: 24m57.0562737s
[taskcluster 2018-04-22T11:55:00.802Z] Peak Memory: 6041600
[taskcluster 2018-04-22T11:55:00.802Z] Result: IDLENESS_LIMIT_EXCEEDED
[taskcluster 2018-04-22T11:55:00.803Z] === Task Finished ===
[taskcluster 2018-04-22T11:55:00.803Z] Task Duration: 24m57.2678403s
Comment 1•7 years ago
This is hitting talos across the board.
Examples:
https://treeherder.mozilla.org/logviewer.html#?job_id=175009545&repo=mozilla-central&lineNumber=125
11:30:05 INFO - Downloading and extracting to C:\Users\task_1524395807\build\tests these dirs * from https://queue.taskcluster.net/v1/task/UyOJifjARxOX6VLJdJWiTg/artifacts/public/build/target.common.tests.zip
11:30:05 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/UyOJifjARxOX6VLJdJWiTg/artifacts/public/build/target.common.tests.zip'}, attempt #1
11:30:05 INFO - Fetch https://queue.taskcluster.net/v1/task/UyOJifjARxOX6VLJdJWiTg/artifacts/public/build/target.common.tests.zip into memory
11:30:06 INFO - Content-Length response header: 40057399
11:30:06 INFO - Bytes received: 40057399
11:39:46 INFO - Downloading and extracting to C:\Users\task_1524395807\build\tests these dirs * from https://queue.taskcluster.net/v1/task/UyOJifjARxOX6VLJdJWiTg/artifacts/public/build/target.talos.tests.zip
https://treeherder.mozilla.org/logviewer.html#?job_id=174978720&repo=mozilla-central
https://treeherder.mozilla.org/logviewer.html#?job_id=175023685&repo=try (central-as-beta simulation)
https://treeherder.mozilla.org/logviewer.html#?job_id=175006261&repo=mozilla-central&lineNumber=116
Flags: needinfo?(rwood)
Summary: Intermittent talos (O) IDLENESS_LIMIT_EXCEEDED → Intermittent Aborting task - max run time exceeded! after talos hang after download of target.common.tests.zip took very long
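For context on the step that hangs: the mozharness download above retries fetch_url_into_memory and compares the Content-Length header against the bytes actually received before extracting. The following is only a rough sketch of that retry-and-verify pattern in Python (function names and defaults are illustrative, not the real mozharness code):

import time
import urllib.request

def fetch_url_into_memory(url):
    # Fetch the artifact into memory and check its size against Content-Length.
    with urllib.request.urlopen(url) as response:
        expected = int(response.headers.get("Content-Length", -1))
        data = response.read()
    print("Content-Length response header: %d" % expected)
    print("Bytes received: %d" % len(data))
    if expected >= 0 and len(data) != expected:
        raise IOError("Truncated download: got %d of %d bytes" % (len(data), expected))
    return data

def retry(func, attempts=3, sleep_time=30, **kwargs):
    # Call func with kwargs, retrying on failure; a rough analogue of the mozharness retry wrapper.
    for attempt in range(1, attempts + 1):
        try:
            print("retry: Calling %s, attempt #%d" % (func.__name__, attempt))
            return func(**kwargs)
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(sleep_time)

# e.g. retry(fetch_url_into_memory, url="https://queue.taskcluster.net/v1/task/UyOJifjARxOX6VLJdJWiTg/artifacts/public/build/target.common.tests.zip")

In the failing logs the byte count matches Content-Length, so the download itself completes; the long delay appears to come after that, during the extract/install phase.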
Comment hidden (Intermittent Failures Robot)
Comment 3•7 years ago
I believe most of these issues are on Windows moonshot machines; this is infra related, as we are timing out or not getting the specific resources we are looking for.
:markco, do you have backend logs about networking issues with the moonshots?
:aryx, how can we get a list of the problems? Specifically, I want to look at all the failures and see if specific machine names show up.
Flags: needinfo?(rwood) → needinfo?(mcornmesser)
Updated•7 years ago
Keywords: intermittent-failure
Comment 4•7 years ago
All classifications for T-W1064-244: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1455953
Comment 5•7 years ago
Is this isolated to T-W1064-244?
That machine appears to have hit a state where generic-worker runs into a privilege issue when terminating processes. I have quarantined the machine, so it will stop picking up tasks.
Apr 22 08:27:49 T-W1064-MS-244.mdc1.mozilla.com generic-worker: time="2018-04-22T15:27:48Z" level=error msg="Error terminating process 1368: Access is denied." #015
Apr 22 13:23:35 T-W1064-MS-244.mdc1.mozilla.com generic-worker: time="2018-04-22T20:23:34Z" level=error msg="Error terminating process 1344: Access is denied." #015
Apr 22 19:20:22 T-W1064-MS-244.mdc1.mozilla.com generic-worker: time="2018-04-23T02:20:21Z" level=error msg="Error terminating process 1360: Access is denied." #015
Apr 23 02:16:03 T-W1064-MS-244.mdc1.mozilla.com generic-worker: time="2018-04-23T09:16:01Z" level=error msg="Error terminating process 1340: Access is denied." #015
Apr 23 03:54:38 T-W1064-MS-244.mdc1.mozilla.com generic-worker: time="2018-04-23T10:54:37Z" level=error msg="Error terminating process 1352: Access is denied." #015
I will dive in and see if I can find a root cause.
Flags: needinfo?(mcornmesser)
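To reproduce that "Access is denied" state from a shell on the worker (outside generic-worker, which is written in Go), a quick Python check with psutil can attempt to terminate the PID from the log and report whether the same privilege error comes back. This is only a hedged diagnostic sketch; psutil is an extra dependency, not something generic-worker uses:

import psutil

def try_terminate(pid):
    # Attempt to terminate a PID and report whether we hit the same privilege wall.
    try:
        proc = psutil.Process(pid)
        print("Terminating %s (pid %d) owned by %s" % (proc.name(), pid, proc.username()))
        proc.terminate()
        proc.wait(timeout=10)
        print("Terminated cleanly")
    except psutil.AccessDenied:
        print("Error terminating process %d: Access is denied." % pid)
    except psutil.NoSuchProcess:
        print("Process %d is already gone" % pid)

# e.g. try_terminate(1368), the PID from the first log line above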
Comment 6•7 years ago
I was doing some testing and unquarantined the node. The test gets to here and just hangs.
I have not found anything interesting in the logs yet.
Comment 7•7 years ago
If I click on the window, the test resumes, closes, opens a new window, and hangs again.
Comment 8•7 years ago
I wonder if this is running the same generic-worker version as the other machines? I suspect it is the same and the error is different, but that behavior is what I saw with the 10.6* generic-worker on reftests when we were testing the upgraded agent as a solution to the disk space and log file issues.
Comment 9•7 years ago
Just verified the version is 8.3.0.
Comment 10•7 years ago
Maybe we should reinstall the machine? It would be nice to figure out if there is a system resource hang. Maybe multiple processes are running that we didn't clean up from an earlier job or something?
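As an illustration of the "leftover processes from an earlier job" theory only, a hedged sketch like the following (psutil again, with a hypothetical helper name) would list processes still owned by stale task_* users on one of these Windows workers; actual cleanup is generic-worker's responsibility, not a script like this:

import psutil

def leftover_task_processes(current_task_user):
    # List processes owned by task_* users other than the current task user.
    leftovers = []
    for proc in psutil.process_iter(attrs=["pid", "name", "username"]):
        user = (proc.info["username"] or "").lower()
        # Task users look like task_1524395807 (possibly DOMAIN\task_...) on these workers.
        if "task_" in user and current_task_user.lower() not in user:
            leftovers.append(proc.info)
    return leftovers

# e.g. leftover_task_processes("task_1524395807")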
Comment 11•7 years ago
I have kicked off a reinstall and set up a Papertrail alert that will email relops if another machine hits this state.
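The exact Papertrail search behind that alert is not recorded in this bug; purely as an illustration, a pattern like the following would match the generic-worker lines quoted in comment 5 (a hypothetical regex, not the real alert configuration):

import re

# Matches e.g. "Error terminating process 1368: Access is denied."
TERMINATE_DENIED = re.compile(r"Error terminating process (\d+): Access is denied\.")

def match_pid(line):
    m = TERMINATE_DENIED.search(line)
    return int(m.group(1)) if m else None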
Comment hidden (Intermittent Failures Robot)
Updated•6 years ago
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE