Closed
Bug 1440995
Opened 7 years ago
Closed 7 years ago
Intermittent talos OS X Aborting task - max run time exceeded! followed by Exit Code: -1
Categories
(Testing :: Talos, defect)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 1445580
People
(Reporter: apavel, Unassigned)
References
Details
(Whiteboard: [stockwell infra])
Revision: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=bfe62272d2a21f9d10d45e5aaa680f5c735604ae
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=164190404&repo=mozilla-central
02:59:47 INFO - TEST-INFO | started process 2511 (/Users/cltbld/tasks/task_1519524388/build/application/Firefox Nightly.app/Contents/MacOS/firefox -foreground -profile /var/folders/9g/1jk_l9qn7pn68yzgn2443qzc00000w/T/tmpQmZUGh/profile http://localhost:49225/startup_test/tresize/addon/content/tresize-test.html?gecko_profile_entries=1000000&gecko_profile_threads=GeckoMain%2CCompositor&gecko_profile_dir=%2Fvar%2Ffolders%2F9g%2F1jk_l9qn7pn68yzgn2443qzc00000w%2FT%2Ftmp1WLxYm&gecko_profile_interval=2 http://localhost:49225/startup_test/tresize/addon/content/tresize-test.html?gecko_profile_entries=1000000&gecko_profile_threads=GeckoMain%2CCompositor&gecko_profile_dir=%2Fvar%2Ffolders%2F9g%2F1jk_l9qn7pn68yzgn2443qzc00000w%2FT%2Ftmp1WLxYm&gecko_profile_interval=2)
02:59:47 INFO - PID 2511 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath
02:59:48 INFO - PID 2511 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath
02:59:49 INFO - PID 2511 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath
02:59:49 INFO - PID 2511 | 2018-02-25 02:59:49.392 plugin-container[2512:33809] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x9d3b, name = 'com.apple.tsm.portname'
02:59:49 INFO - PID 2511 | See /usr/include/servers/bootstrap_defs.h for the error codes.
02:59:49 INFO - PID 2511 | 2018-02-25 02:59:49.421 plugin-container[2513:33831] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x924b, name = 'com.apple.tsm.portname'
02:59:49 INFO - PID 2511 | See /usr/include/servers/bootstrap_defs.h for the error codes.
02:59:50 INFO - PID 2511 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath
02:59:50 INFO - PID 2511 | 2018-02-25 02:59:50.758 plugin-container[2514:33941] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x963f, name = 'com.apple.tsm.portname'
02:59:50 INFO - PID 2511 | See /usr/include/servers/bootstrap_defs.h for the error codes.
03:00:03 INFO - PID 2511 | 2018-02-25 03:00:02.999 firefox[2511:33804] Persistent UI failed to open file file:///Users/cltbld/Library/Saved%20Application%20State/org.mozilla.nightly.savedState/window_1.data: No such file or directory (2)
03:00:05 INFO - PID 2511 | __start_report28.62__end_report
03:00:05 INFO - PID 2511 | __startTimestamp1519556405340__endTimestamp
03:00:06 INFO - PID 2511 | *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping
03:00:06 INFO - TEST-INFO | 2511: exit 0
[taskcluster 2018-02-25T11:00:17.346Z] Aborting task - max run time exceeded!
[taskcluster 2018-02-25T11:00:17.350Z] Exit Code: -1
[taskcluster 2018-02-25T11:00:17.350Z] === Task Finished ===
[taskcluster 2018-02-25T11:00:17.350Z] Task Duration: 14m58.809909888s
[taskcluster 2018-02-25T11:00:17.980Z] Uploading artifact public/logs/localconfig.json from file logs/localconfig.json with content encoding "gzip", mime type "application/json" and expiry 2019-02-25T10:03:27.189Z
[taskcluster 2018-02-25T11:00:18.439Z] Uploading artifact public/logs/talos_critical.log from file logs/talos_critical.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z
[taskcluster 2018-02-25T11:00:18.797Z] Uploading artifact public/logs/talos_error.log from file logs/talos_error.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z
[taskcluster 2018-02-25T11:00:19.176Z] Uploading artifact public/logs/talos_fatal.log from file logs/talos_fatal.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z
[taskcluster 2018-02-25T11:00:19.569Z] Uploading artifact public/logs/talos_info.log from file logs/talos_info.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z
[taskcluster 2018-02-25T11:00:19.993Z] Uploading artifact public/logs/talos_raw.log from file logs/talos_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z
[taskcluster 2018-02-25T11:00:20.539Z] Uploading artifact public/logs/talos_warning.log from file logs/talos_warning.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z
[taskcluster 2018-02-25T11:00:20.908Z] Uploading artifact public/test_info/chromez-e10s_errorsummary.log from file build/blobber_upload_dir/chromez-e10s_errorsummary.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z
[taskcluster 2018-02-25T11:00:21.295Z] Uploading artifact public/test_info/chromez-e10s_raw.log from file build/blobber_upload_dir/chromez-e10s_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2019-02-25T10:03:27.189Z
[taskcluster 2018-02-25T11:00:21.697Z] Uploading artifact public/test_info/missingsymbols.zip from file build/blobber_upload_dir/missingsymbols.zip with content encoding "", mime type "application/zip" and expiry 2019-02-25T10:03:27.189Z
[taskcluster 2018-02-25T11:00:22.578Z] Uploading artifact public/test_info/profile_about_preferences_basic.zip from file build/blobber_upload_dir/profile_about_preferences_basic.zip with content encoding "", mime type "application/zip" and expiry 2019-02-25T10:03:27.189Z
[taskcluster 2018-02-25T11:00:23.241Z] Uploading artifact public/test_info/profile_tresize.zip from file build/blobber_upload_dir/profile_tresize.zip with content encoding "", mime type "application/zip" and expiry 2019-02-25T10:03:27.189Z
[taskcluster 2018-02-25T11:00:23.574Z] Task not successful due to following exception(s):
[taskcluster 2018-02-25T11:00:23.574Z] Exception 1)
[taskcluster 2018-02-25T11:00:23.574Z] signal: killed
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 3•7 years ago
|
||
Over the last 7 days this bug has 30 failures. These happen on OS X 10.10 and macosx64-nightly
Here is the most recent log example: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=165946894
Flags: needinfo?(rwood)
Updated•7 years ago
|
Whiteboard: [stockwell needswork]
Comment 4•7 years ago
|
||
these are all on profiling jobs- I suspect there is something broken there- possibly a quick fix.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 7•7 years ago
|
||
Since 11.03 this fail spikes up and now there are 75 failures in the last 3 days.
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1440995&startday=2018-03-11&endday=2018-03-13&tree=all
Most of them are on OS X 10.10 and only 6 on macosx64-nightly. There are 2 on Linux but those are starred wrong
:rwood, can you please take a look?
Comment hidden (Intermittent Failures Robot) |
Comment 9•7 years ago
|
||
this seems to be mostly related to the time to download artifact [1]:
3:14:41 INFO - u'xpcshell': [u'target.common.tests.zip', u'target.xpcshell.tests.zip']}
23:14:41 INFO - Downloading packages: [u'target.common.tests.zip', u'target.mochitest.tests.zip'] for test suite categories: ['mochitest']
23:14:41 INFO - Downloading and extracting to /Users/cltbld/tasks/task_1520920450/build/tests these dirs * from https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip
23:14:41 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip'}, attempt #1
23:14:41 INFO - Fetch https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip into memory
23:15:01 INFO - Content-Length response header: 36551587
23:15:01 INFO - Bytes received: 36551587
23:15:18 INFO - Downloading and extracting to /Users/cltbld/tasks/task_1520920450/build/tests these dirs * from https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.mochitest.tests.zip
23:15:18 INFO - retry: Calling fetch_url_into_memory with args: (), kwargs: {'url': u'https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.mochitest.tests.zip'}, attempt #1
23:15:18 INFO - Fetch https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.mochitest.tests.zip into memory
23:52:34 INFO - Content-Length response header: 78551031
23:52:34 INFO - Bytes received: 78551031
23:52:46 INFO - Downloading https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.dmg to /Users/cltbld/tasks/task_1520920450/installer.dmg
23:52:46 INFO - retry: Calling _download_file with args: (), kwargs: {'url': 'https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.dmg', 'file_name': '/Users/cltbld/tasks/task_1520920450/installer.dmg'}, attempt #1
23:53:00 INFO - Downloaded 73040179 bytes.
23:53:00 INFO - Setting buildbot property build_url to https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.dmg
23:53:00 INFO - mkdir: /Users/cltbld/tasks/task_1520920450/properties
23:53:00 INFO - Writing buildbot properties ['build_url'] to /Users/cltbld/tasks/task_1520920450/properties/build_url
23:53:00 INFO - Writing to file /Users/cltbld/tasks/task_1520920450/properties/build_url
that is almost 30 minutes to download target.common.tests.zip which is 78MB.
In other cases [2], [3] we time out before finishing the download of a .zip or python package. Here is a snippet from [4]:
15:04:59 INFO - Collecting simplejson
15:06:15 INFO - Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x1098fbd10>: Failed to establish a new connection: [Errno 60] Operation timed out',)': /pub
15:07:30 INFO - Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x1098fb1d0>: Failed to establish a new connection: [Errno 60] Operation timed out',)': /pub
15:08:46 INFO - Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x109a1d090>: Failed to establish a new connection: [Errno 60] Operation timed out',)': /pub
I am tempted to make this a duplicate of bug 1443130, needinfo :dragrom as that is the next step on bug 1443130 as well.
[1] https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167596727
[2] https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167596731
[3] https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167793330
[4] https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=167775711
Flags: needinfo?(rwood) → needinfo?(dcrisan)
Comment 10•7 years ago
|
||
Tested on t-yosemite-r7-423.test.releng.mdc1.mozilla.com:
[root@t-yosemite-r7-423.test.releng.mdc1.mozilla.com ~]# wget https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip; rm target.common.tests.zip
--2018-03-14 05:24:24-- https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip
Resolving queue.taskcluster.net... 50.16.228.78, 107.22.197.53, 50.17.218.87
Connecting to queue.taskcluster.net|50.16.228.78|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://taskcluster-artifacts.net/aLf3dLO8T2eepBczSorNug/0/public/build/target.common.tests.zip [following]
--2018-03-14 05:24:25-- https://taskcluster-artifacts.net/aLf3dLO8T2eepBczSorNug/0/public/build/target.common.tests.zip
Resolving taskcluster-artifacts.net... 54.192.212.56
Connecting to taskcluster-artifacts.net|54.192.212.56|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 36551587 (35M) [application/zip]
Saving to: 'target.common.tests.zip'
100%[==================================================================================================================================================================>] 36,551,587 6.30MB/s in 11s
2018-03-14 05:24:37 (3.26 MB/s) - 'target.common.tests.zip' saved [36551587/36551587]
[root@t-yosemite-r7-423.test.releng.mdc1.mozilla.com ~]# wget https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip; rm target.common.tests.zip
--2018-03-14 05:24:43-- https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.common.tests.zip
Resolving queue.taskcluster.net... 50.16.228.78, 107.22.197.53, 50.17.218.87
Connecting to queue.taskcluster.net|50.16.228.78|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://taskcluster-artifacts.net/aLf3dLO8T2eepBczSorNug/0/public/build/target.common.tests.zip [following]
--2018-03-14 05:24:44-- https://taskcluster-artifacts.net/aLf3dLO8T2eepBczSorNug/0/public/build/target.common.tests.zip
Resolving taskcluster-artifacts.net... 54.192.212.56
Connecting to taskcluster-artifacts.net|54.192.212.56|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 36551587 (35M) [application/zip]
Saving to: 'target.common.tests.zip'
100%[==================================================================================================================================================================>] 36,551,587 5.32MB/s in 7.7s
2018-03-14 05:24:52 (4.53 MB/s) - 'target.common.tests.zip' saved [36551587/36551587]
[root@t-yosemite-r7-423.test.releng.mdc1.mozilla.com ~]# wget https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.mochitest.tests.zip; rm target.mochitest.tests.zip
--2018-03-14 05:26:14-- https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.mochitest.tests.zip
Resolving queue.taskcluster.net... 50.16.228.78, 107.22.197.53, 50.17.218.87
Connecting to queue.taskcluster.net|50.16.228.78|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://taskcluster-artifacts.net/aLf3dLO8T2eepBczSorNug/0/public/build/target.mochitest.tests.zip [following]
--2018-03-14 05:26:15-- https://taskcluster-artifacts.net/aLf3dLO8T2eepBczSorNug/0/public/build/target.mochitest.tests.zip
Resolving taskcluster-artifacts.net... 54.192.212.56
Connecting to taskcluster-artifacts.net|54.192.212.56|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 78551031 (75M) [application/zip]
Saving to: 'target.mochitest.tests.zip'
100%[==================================================================================================================================================================>] 78,551,031 5.71MB/s in 18s
2018-03-14 05:26:35 (4.16 MB/s) - 'target.mochitest.tests.zip' saved [78551031/78551031]
I also used curl instead of wget:
[root@t-yosemite-r7-423.test.releng.mdc1.mozilla.com ~]# curl -L -O https://queue.taskcluster.net/v1/task/aLf3dLO8T2eepBczSorNug/artifacts/public/build/target.mochitest.tests.zip; rm target.mochitest.tests.zip
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 124 100 124 0 0 104 0 0:00:01 0:00:01 --:--:-- 104
100 74.9M 100 74.9M 0 0 4813k 0 0:00:15 0:00:15 --:--:-- 6135k
looks like the average speed is around 4,5 MB/s
If I run wget or curl on individual hosts or on the max 3 hosts in the same time, the download speed looks OK.(I tested 3 downloads in the same time from 3 different hosts). Will be interesting to see the network load if more workers will try to download the artifacts in the same time.....
:dividehex Who can give us a network load for the last 24 hours? I suspect a network bottleneck, when many worker try to download the artifacts in the same time....
Flags: needinfo?(dcrisan) → needinfo?(jwatkins)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(jwatkins)
Resolution: --- → DUPLICATE
Whiteboard: [stockwell disable-recommended] → [stockwell infra]
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•