Intermittent partials concurrent.futures._base.TimeoutError
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
People
(Reporter: nataliaCs, Assigned: sfraser)
References
Details
(Keywords: leave-open)
Attachments
(1 file)
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=172849870&repo=mozilla-central
2018-04-10 12:05:33,157 - INFO - Downloading https://hg.mozilla.org/mozilla-central/raw-file/default/tools/update-packaging/unwrap_full_update.pl to /tmp/tmp1eqix4v3/unwrap_full_update.pl
2018-04-10 12:06:46,980 - INFO - Downloading https://hg.mozilla.org/mozilla-central/raw-file/default/tools/update-packaging/unwrap_full_update.pl to /tmp/tmp1eqix4v3/unwrap_full_update.pl
2018-04-10 12:08:12,510 - INFO - Downloading https://hg.mozilla.org/mozilla-central/raw-file/default/tools/update-packaging/unwrap_full_update.pl to /tmp/tmp1eqix4v3/unwrap_full_update.pl
2018-04-10 12:10:09,551 - INFO - Downloading https://hg.mozilla.org/mozilla-central/raw-file/default/tools/update-packaging/unwrap_full_update.pl to /tmp/tmp1eqix4v3/unwrap_full_update.pl
2018-04-10 12:12:49,807 - INFO - Downloading https://hg.mozilla.org/mozilla-central/raw-file/default/tools/update-packaging/unwrap_full_update.pl to /tmp/tmp1eqix4v3/unwrap_full_update.pl
2018-04-10 12:13:50,865 - WARNING - retry_async: <function download at 0x7f21cc158bf8>: too many retries!
Traceback (most recent call last):
File "/home/worker/bin/funsize.py", line 504, in <module>
main()
File "/home/worker/bin/funsize.py", line 478, in main
manifest = loop.run_until_complete(async_main(args, signing_certs))
File "/usr/lib/python3.5/asyncio/base_events.py", line 387, in run_until_complete
return future.result()
File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
raise self._exception
File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
result = coro.throw(exc)
File "/home/worker/bin/funsize.py", line 405, in async_main
await workenv.setup()
File "/home/worker/bin/funsize.py", line 245, in setup
await retry_download(url, dest=self.paths[filename], mode=0o755)
File "/home/worker/bin/funsize.py", line 104, in retry_download
kwargs=kwargs
File "/usr/local/lib/python3.5/dist-packages/scriptworker/utils.py", line 252, in retry_async
return await func(*args, **kwargs)
File "/home/worker/bin/funsize.py", line 113, in download
async with session.get(url, timeout=60) as resp:
File "/usr/local/lib/python3.5/dist-packages/aiohttp/client.py", line 690, in __aenter__
self._resp = yield from self._coro
File "/usr/local/lib/python3.5/dist-packages/aiohttp/client.py", line 341, in _request
break
File "/usr/local/lib/python3.5/dist-packages/aiohttp/helpers.py", line 727, in __exit__
raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError
[taskcluster 2018-04-10 12:13:51.172Z] === Task Finished ===
[taskcluster 2018-04-10 12:13:51.236Z] Artifact "public/build/gu-IN/target.partial-2.mar" not found at "/home/worker/artifacts/target.partial-2.mar"
[taskcluster 2018-04-10 12:13:51.312Z] Artifact "public/build/gu-IN/target.partial-1.mar" not found at "/home/worker/artifacts/target.partial-1.mar"
[taskcluster 2018-04-10 12:13:51.368Z] Artifact "public/build/gu-IN/manifest.json" not found at "/home/worker/artifacts/manifest.json"
[taskcluster 2018-04-10 12:13:51.424Z] Artifact "public/build/gu-IN/target.partial-4.mar" not found at "/home/worker/artifacts/target.partial-4.mar"
[taskcluster 2018-04-10 12:13:51.488Z] Artifact "public/build/gu-IN/target.partial-3.mar" not found at "/home/worker/artifacts/target.partial-3.mar"
[taskcluster 2018-04-10 12:13:52.011Z] Unsuccessful task run with exit code: 1 completed in 527.45 seconds
Comment 4•4 years ago
This started failing again with this push: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=0df854d34c01bafb3c88d595f8464ed381b95d03&filter-resultStatus=busted&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable&filter-searchStr=nightly&selectedJob=180154518
Comment 6 (Assignee)•4 years ago
These seem to be legitimate errors. The timing is spread out a bit, so I'm not convinced it's the number of concurrent tasks. I'll add more retry catching, but it feels like a band-aid over an underlying issue.
Comment 7•4 years ago
Attempt to get more information about download timeouts, and also retry the partial generation if download timeouts happen too often.
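As an illustration of the kind of change described here, the following is a minimal sketch of a download retry loop that logs every timeout with its attempt number and elapsed time. It is not the code from the attached patch. The names `download_with_retries` and `fetch`, and the `attempts`, `timeout`, and `delay` parameters, are hypothetical and chosen only for this example.

```python
import asyncio
import logging
import time

log = logging.getLogger("partials-sketch")  # hypothetical logger name


async def download_with_retries(fetch, url, attempts=5, timeout=60, delay=0):
    """Await fetch(url) under a per-attempt timeout, logging each timeout.

    `fetch` is any coroutine function taking a URL (an assumption for this
    sketch). The last timeout is re-raised so the caller can decide whether
    to retry the whole partial generation.
    """
    for attempt in range(1, attempts + 1):
        started = time.monotonic()
        try:
            return await asyncio.wait_for(fetch(url), timeout=timeout)
        except asyncio.TimeoutError:
            elapsed = time.monotonic() - started
            log.warning(
                "Timeout on attempt %d/%d for %s after %.1fs",
                attempt, attempts, url, elapsed,
            )
            if attempt == attempts:
                raise
            await asyncio.sleep(delay)
```

Logging the elapsed time per attempt makes it easier to tell a server that never answers from one that is merely slow, which is the extra information the comment above asks for.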
Comment 8•4 years ago
Comment on attachment 8981797
Bug 1452927 Improve logging and retries for partials r=mtabara
Mihai Tabara [:mtabara]⌚️GMT has approved the revision.
https://phabricator.services.mozilla.com/D1467
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d9afdede8e0f
Improve logging and retries for partials r=mtabara
Comment 10•4 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/d9afdede8e0f
Comment 12 (Reporter)•3 years ago
Beetmover occurrence: https://tools.taskcluster.net/groups/eeyI-m9lS8yCOvYPdgX6hA/tasks/HXYUD_-WTPiKpP-JDAIzWA/runs/0/logs/public%2Flogs%2Fchain_of_trust.log
2019-09-12T11:46:21 DEBUG - Reclaim task response:
{'credentials': '{********}',
'runId': 0,
'status': {'deadline': '2019-09-13T10:06:13.077Z',
'expires': '2020-09-11T10:06:13.077Z',
'provisionerId': 'scriptworker-k8s',
'retriesLeft': 5,
'runs': [{'reasonCreated': 'scheduled',
'runId': 0,
'scheduled': '2019-09-12T11:40:09.834Z',
'started': '2019-09-12T11:41:19.627Z',
'state': 'running',
'takenUntil': '2019-09-12T12:06:21.099Z',
'workerGroup': 'gecko-3-beetmover',
'workerId': 'gecko-3-beetmover-zp6byvbxt9krrnqsaxsb'}],
'schedulerId': 'gecko-level-3',
'state': 'running',
'taskGroupId': 'eeyI-m9lS8yCOvYPdgX6hA',
'taskId': 'HXYUD_-WTPiKpP-JDAIzWA',
'workerType': 'gecko-3-beetmover'},
'takenUntil': '2019-09-12T12:06:21.099Z',
'workerGroup': 'gecko-3-beetmover',
'workerId': 'gecko-3-beetmover-zp6byvbxt9krrnqsaxsb'}
2019-09-12T11:46:21 DEBUG - waiting 300 seconds before reclaiming...
2019-09-12T11:46:31 ERROR - SCRIPTWORKER_UNEXPECTED_EXCEPTION task
Traceback (most recent call last):
File "/app/lib/python3.7/site-packages/scriptworker/worker.py", line 55, in do_run_task
await run_cancellable(verify_chain_of_trust(chain))
File "/app/lib/python3.7/site-packages/scriptworker/worker.py", line 158, in _run_cancellable
result = await self.future
File "/app/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 2063, in verify_chain_of_trust
task_count = await verify_task_types(chain)
File "/app/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 1813, in verify_task_types
await valid_task_types[task_type](chain, obj)
File "/app/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 1712, in verify_parent_task
await verify_parent_task_definition(chain, link)
File "/app/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 1572, in verify_parent_task_definition
chain, parent_link, decision_link, tasks_for
File "/app/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 1534, in get_jsone_context_and_template
tmpl = await get_in_tree_template(decision_link)
File "/app/lib/python3.7/site-packages/scriptworker/cot/verify.py", line 1377, in get_in_tree_template
context.config["work_dir"], "{}_taskcluster.yml".format(link.name)
File "/app/lib/python3.7/site-packages/scriptworker/utils.py", line 633, in load_json_or_yaml_from_url
retry_exceptions=(DownloadError, aiohttp.ClientError),
File "/app/lib/python3.7/site-packages/scriptworker/utils.py", line 261, in retry_async
return await func(*args, **kwargs)
File "/app/lib/python3.7/site-packages/scriptworker/utils.py", line 565, in download_file
chunk = await resp.content.read(chunk_size)
File "/app/lib/python3.7/site-packages/aiohttp/streams.py", line 369, in read
await self._wait('read')
File "/app/lib/python3.7/site-packages/aiohttp/streams.py", line 297, in _wait
await waiter
File "/app/lib/python3.7/site-packages/aiohttp/helpers.py", line 585, in __exit__
raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError
2019-09-12T11:46:31 DEBUG - "/app/artifacts/public/logs/chain_of_trust.log" is encoded with "None" and has mime/type "text/plain"
2019-09-12T11:46:31 INFO - "/app/artifacts/public/logs/chain_of_trust.log" can be gzip'd. Compressing...
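One detail visible in the traceback above: `retry_async` is called with `retry_exceptions=(DownloadError, aiohttp.ClientError)`, while the exception that ends the task is `asyncio.TimeoutError`. If the retry helper only retries the listed exception types (an assumption about its behaviour, not something this log confirms), a read timeout would propagate on the first attempt. The sketch below shows that general pattern with a hypothetical `retry_only` helper and a stand-in `DownloadError` class; neither is scriptworker's actual code.

```python
import asyncio


class DownloadError(Exception):
    """Stand-in for a download error type; hypothetical, for this sketch only."""


async def retry_only(func, retry_exceptions, attempts=3):
    """Retry func() only when it raises one of retry_exceptions.

    Any other exception type leaves the loop immediately, on the first
    attempt, without being retried.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except retry_exceptions:
            if attempt == attempts:
                raise


def make_timing_out_download():
    """Return a coroutine function that always times out, plus its call log."""
    calls = []

    async def download():
        calls.append(1)
        raise asyncio.TimeoutError

    return download, calls
```

With `retry_exceptions=(DownloadError,)` the timing-out download runs once and the timeout escapes; adding `asyncio.TimeoutError` to the tuple makes the helper use all of its attempts before giving up.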
Comment 14 (Assignee)•3 years ago
I think the beetmover error is going to be unrelated, since it's different code, and they just moved beetmover to GCP. I'll see what we can dig up.
Comment 25•2 years ago
Also seen this failure in a build signing task on macosx:
https://firefox-ci-tc.services.mozilla.com/tasks/SQgM7ZutQBG9kRN949ChXA/runs/0/logs/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FSQgM7ZutQBG9kRN949ChXA%2Fruns%2F0%2Fartifacts%2Fpublic%2Flogs%2Fchain_of_trust.log#L286
Simon, should we file a new bug for this, or is this bug fine for keeping track of it?
Comment 26 (Assignee)•2 years ago
I think that'll be a separate infrastructure issue, and worth raising a new bug about. Perhaps ping :aki.
Thanks!