Intermittent Android 7.0 x86-64 opt [taskcluster:error] Task timeout after 2700/5400 seconds. Force killing container.
Categories
(Firefox Build System :: Task Configuration, defect, P5)
Tracking
(firefox76 unaffected, firefox77 unaffected, firefox78 wontfix)
Tracking | Status | |
---|---|---|
firefox76 | --- | unaffected |
firefox77 | --- | unaffected |
firefox78 | --- | wontfix |
People
(Reporter: intermittent-bug-filer, Assigned: gbrown)
References
(Regression)
Details
(Keywords: intermittent-failure, regression)
Filed by: nerli [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=301420455&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/RWHImX9DT920WGk4eScQqg/runs/0/artifacts/public/logs/live_backing.log
[task 2020-05-08T15:51:22.812Z] 15:51:22 INFO - wait for org.mozilla.geckoview.test complete; top activity=com.android.launcher3
[task 2020-05-08T15:51:22.920Z] 15:51:22 INFO - remoteautomation.py | Application ran for: 0:00:27.934177
[task 2020-05-08T15:51:23.458Z] 15:51:23 INFO - Stopping web server
[task 2020-05-08T15:51:23.463Z] 15:51:23 INFO - Stopping web socket server
[task 2020-05-08T15:51:23.487Z] 15:51:23 INFO - Stopping ssltunnel
[task 2020-05-08T15:51:23.509Z] 15:51:23 INFO - leakcheck | refcount logging is off, so leaks can't be detected!
[task 2020-05-08T15:51:23.509Z] 15:51:23 INFO - runtests.py | Running tests: end.
[task 2020-05-08T15:51:24.799Z] 15:51:24 INFO - Buffered messages finished
[task 2020-05-08T15:51:24.799Z] 15:51:24 INFO - Running manifest: parser/htmlparser/tests/mochitest/mochitest.ini
[task 2020-05-08T15:51:25.373Z] 15:51:25 INFO - runtests.py | Failed to copy /builds/worker/workspace/build/tests/mochitest/hyphenation to profile
[task 2020-05-08T15:51:25.484Z] 15:51:25 INFO - pk12util: PKCS12 IMPORT SUCCESSFUL
[task 2020-05-08T15:51:25.594Z] 15:51:25 INFO - MochitestServer : launching [u'/builds/worker/workspace/build/hostutils/host-utils-77.0a1.en-US.linux-x86_64/xpcshell', '-g', '/builds/worker/workspace/build/hostutils/host-utils-77.0a1.en-US.linux-x86_64', '-f', '/builds/worker/workspace/build/hostutils/host-utils-77.0a1.en-US.linux-x86_64/components/httpd.js', '-e', "const _PROFILE_PATH = '/tmp/tmp7eqAGE.mozrunner'; const _SERVER_PORT = '8854'; const _SERVER_ADDR = '10.0.2.2'; const _TEST_PREFIX = undefined; const _DISPLAY_RESULTS = false;", '-f', '/builds/worker/workspace/build/tests/mochitest/server.js']
[task 2020-05-08T15:51:25.594Z] 15:51:25 INFO - runtests.py | Server pid: 9193
[task 2020-05-08T15:51:25.599Z] 15:51:25 INFO - runtests.py | Websocket server pid: 9196
[task 2020-05-08T15:51:25.606Z] 15:51:25 INFO - runtests.py | SSL tunnel pid: 9199
[task 2020-05-08T15:51:26.271Z] 15:51:26 INFO - runtests.py | Running with scheme: http
[task 2020-05-08T15:51:26.271Z] 15:51:26 INFO - runtests.py | Running with e10s: True
[task 2020-05-08T15:51:26.271Z] 15:51:26 INFO - runtests.py | Running with fission: False
[task 2020-05-08T15:51:26.271Z] 15:51:26 INFO - runtests.py | Running with serviceworker_e10s: True
[task 2020-05-08T15:51:26.271Z] 15:51:26 INFO - runtests.py | Running with socketprocess_e10s: False
[task 2020-05-08T15:51:26.271Z] 15:51:26 INFO - runtests.py | Running tests: start.
[task 2020-05-08T15:51:26.595Z] 15:51:26 INFO - adb Granting important runtime permissions to org.mozilla.geckoview.test
[task 2020-05-08T15:51:28.442Z] 15:51:28 INFO - adb launch_application: am start -W -n org.mozilla.geckoview.test/org.mozilla.geckoview.test.TestRunnerActivity -a android.intent.action.MAIN --es env9 MOZ_UPLOAD_DIR=/data/local/tmp/tests/mozlog --es env8 R_LOG_DESTINATION=stderr --es args "-no-remote -profile /data/local/tmp/tests/profile//" --es env3 R_LOG_VERBOSE=1 --es env2 XPCOM_DEBUG_BREAK=stack --es env1 MOZ_WEBRENDER=0 --es env0 MOZ_CRASHREPORTER=1 --es env7 MOZ_CRASHREPORTER_SHUTDOWN=1 --es env6 MOZ_IN_AUTOMATION=1 --es env5 MOZ_DISABLE_NONLOCAL_CONNECTIONS=1 --es env4 MOZ_HIDE_RESULTS_TABLE=1 --ez use_multiprocess True --es env11 R_LOG_LEVEL=6 --es env10 MOZ_CRASHREPORTER_NO_REPORT=1 -d "http://mochi.test:8888/tests?autorun=1&closeWhenDone=1&logFile=%2Fdata%2Flocal%2Ftmp%2Ftests%2Flogs%2Fmochitest.log&fileLevel=INFO&consoleLevel=INFO&hideResultsTable=1&manifestFile=tests.json&dumpOutputDirectory=%2Fdata%2Flocal%2Ftmp%2Ftests"
[task 2020-05-08T15:51:29.259Z] 15:51:29 INFO - remoteautomation.py | Application pid: 14236
[task 2020-05-08T15:51:31.333Z] 15:51:31 INFO - 3668 INFO SimpleTest START
[task 2020-05-08T15:51:31.334Z] 15:51:31 INFO - 3669 INFO TEST-START | parser/htmlparser/tests/mochitest/test_bug102699.html
[task 2020-05-08T15:51:31.335Z] 15:51:31 INFO - 3670 INFO TEST-OK | parser/htmlparser/tests/mochitest/test_bug102699.html | took 1113ms
[task 2020-05-08T15:51:31.335Z] 15:51:31 INFO - 3671 INFO TEST-START | parser/htmlparser/tests/mochitest/test_bug1104732.html
[taskcluster:error] Task timeout after 5400 seconds. Force killing container.
[taskcluster 2020-05-08 15:51:38.817Z] === Task Finished ===
[taskcluster 2020-05-08 15:51:38.818Z] Unsuccessful task run with exit code: -1 completed in 5403.46 seconds
Updated•5 years ago
|
Comment 1•5 years ago
|
||
There was a discussion about this failures on Firefox CI channel on Riot. We also have some other reftest occurrences like:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=301417701&repo=autoland&lineNumber=2479
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=301417725&repo=autoland&lineNumber=1094
Assignee | ||
Comment 2•5 years ago
|
||
These task timeouts show very long android-sdk-linux downloads.
android-sdk-linux downloads changed from tooltool to toolchain in bug 1624649, on 2020-05-06.
There was also a change in hosts, from packet.net to aws bare metal, bug 1635877, on 2020-05-07/08.
wcosta is trying to determine which change is responsible for these timeouts.
Assignee | ||
Comment 3•5 years ago
|
||
Task timeouts are often reported in bug 1411358. Note another complicating factor in analyzing those failures: There were recent (2020-05-03/04) android-em failures there from another cause, addressed in bug 1634230.
Comment 4•5 years ago
•
|
||
I did a couple of try pushes to packet.net to confirm if it only happens on AWS:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=828a47a8bda6a7217aeff9e1ae77c5fee8706bcd
https://treeherder.mozilla.org/#/jobs?repo=try&revision=97e1464b3431ca69044f5e3c635b0ff7769608fe
Assignee | ||
Comment 5•5 years ago
|
||
Spot checking earlier android-em tasks on packet.net with toolchain fetches of android-sdk-linux I see good performance:
[fetches 2020-05-08T04:18:58.541Z] http://taskcluster/api/queue/v1/task/dttL8yxRRridMMMoTyfacQ/artifacts/project/gecko/android-sdk/android-sdk-linux.tar.xz resolved to 592960992 bytes with sha256 288057ca2a3a06edcb2daa05d8fa0f972295651d13d515d6f958dc7bee2f4958 in 31.204s
[fetches 2020-05-08T04:03:45.809Z] http://taskcluster/api/queue/v1/task/dttL8yxRRridMMMoTyfacQ/artifacts/project/gecko/android-sdk/android-sdk-linux.tar.xz resolved to 592960992 bytes with sha256 288057ca2a3a06edcb2daa05d8fa0f972295651d13d515d6f958dc7bee2f4958 in 34.987s
[fetches 2020-05-08T04:03:37.218Z] http://taskcluster/api/queue/v1/task/dttL8yxRRridMMMoTyfacQ/artifacts/project/gecko/android-sdk/android-sdk-linux.tar.xz resolved to 592960992 bytes with sha256 288057ca2a3a06edcb2daa05d8fa0f972295651d13d515d6f958dc7bee2f4958 in 26.148s
Assignee | ||
Comment 6•5 years ago
|
||
...but spot checking recent android-em tasks on aws with toolchain fetches also typically shows good performance, sometimes better:
[fetches 2020-05-08T16:26:37.773Z] http://taskcluster/api/queue/v1/task/dttL8yxRRridMMMoTyfacQ/artifacts/project/gecko/android-sdk/android-sdk-linux.tar.xz resolved to 592960992 bytes with sha256 288057ca2a3a06edcb2daa05d8fa0f972295651d13d515d6f958dc7bee2f4958 in 11.648s
Comment 7•5 years ago
•
|
||
As we discussed in the channel, packet.net runs 4 tasks per worker, while AWS we run 32 (r5.metal) or 15 (m5.metal) tasks in parallel. In a perfect storm where all tasks require the android-sdk-linux
artifact, and they all start more or less the same time, it might be possible the instance suffers from a network bandwidth shortage, causing slow downloads of artifacts.
Comment 8•5 years ago
•
|
||
The following two logs show hangs of almost an hour. Both had this line before (both at 14:38):
Removing /builds/worker/fetches/fix-stacks.tar.xz
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=301417701&repo=autoland&lineNumber=2479
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=301417725&repo=autoland&lineNumber=41
Assignee | ||
Comment 9•5 years ago
|
||
(In reply to Wander Lairson Costa from comment #4)
I made a couple of try pushes to packet.net to confirm if it only happens on AWS:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=828a47a8bda6a7217aeff9e1ae77c5fee8706bcd
https://treeherder.mozilla.org/#/jobs?repo=try&revision=97e1464b3431ca69044f5e3c635b0ff7769608fe
Thanks Wander. I don't see any task timeouts there.
Comment 10•5 years ago
|
||
I did a bunch of try pushes to baremetal, no sign of timeouts either:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ac22c19603dadf6c560b4fba8c1154bd2f1f8b52
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d7e3aa95b80634ddabb4e172f22e202d10476bd9
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca2fea3ffe21e9697c691b04b26d28878ceb887c
Comment hidden (Intermittent Failures Robot) |
Comment 12•5 years ago
|
||
Set release status flags based on info from the regressing bug 1624649
Updated•5 years ago
|
Assignee | ||
Comment 13•5 years ago
|
||
No recent failures. Let's fold this into the main timeout bug and revisit in that bug if this condition arises again.
Comment hidden (Intermittent Failures Robot) |
Updated•4 years ago
|
Updated•4 years ago
|
Description
•