Open Bug 1452927 Opened 2 years ago Updated 4 months ago

Intermittent partials concurrent.futures._base.TimeoutError


(Release Engineering :: General, defect)



(Reporter: nataliaCs, Assigned: sfraser)



(Keywords: leave-open)


(1 file)

Failure log:

2018-04-10 12:05:33,157 - INFO - Downloading to /tmp/tmp1eqix4v3/
2018-04-10 12:06:46,980 - INFO - Downloading to /tmp/tmp1eqix4v3/
2018-04-10 12:08:12,510 - INFO - Downloading to /tmp/tmp1eqix4v3/
2018-04-10 12:10:09,551 - INFO - Downloading to /tmp/tmp1eqix4v3/
2018-04-10 12:12:49,807 - INFO - Downloading to /tmp/tmp1eqix4v3/
2018-04-10 12:13:50,865 - WARNING - retry_async: <function download at 0x7f21cc158bf8>: too many retries!
Traceback (most recent call last):
  File "/home/worker/bin/", line 504, in <module>
  File "/home/worker/bin/", line 478, in main
    manifest = loop.run_until_complete(async_main(args, signing_certs))
  File "/usr/lib/python3.5/asyncio/", line 387, in run_until_complete
    return future.result()
  File "/usr/lib/python3.5/asyncio/", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/", line 241, in _step
    result = coro.throw(exc)
  File "/home/worker/bin/", line 405, in async_main
    await workenv.setup()
  File "/home/worker/bin/", line 245, in setup
    await retry_download(url, dest=self.paths[filename], mode=0o755)
  File "/home/worker/bin/", line 104, in retry_download
  File "/usr/local/lib/python3.5/dist-packages/scriptworker/", line 252, in retry_async
    return await func(*args, **kwargs)
  File "/home/worker/bin/", line 113, in download
    async with session.get(url, timeout=60) as resp:
  File "/usr/local/lib/python3.5/dist-packages/aiohttp/", line 690, in __aenter__
    self._resp = yield from self._coro
  File "/usr/local/lib/python3.5/dist-packages/aiohttp/", line 341, in _request
  File "/usr/local/lib/python3.5/dist-packages/aiohttp/", line 727, in __exit__
    raise asyncio.TimeoutError from None
[taskcluster 2018-04-10 12:13:51.172Z] === Task Finished ===
[taskcluster 2018-04-10 12:13:51.236Z] Artifact "public/build/gu-IN/target.partial-2.mar" not found at "/home/worker/artifacts/target.partial-2.mar"
[taskcluster 2018-04-10 12:13:51.312Z] Artifact "public/build/gu-IN/target.partial-1.mar" not found at "/home/worker/artifacts/target.partial-1.mar"
[taskcluster 2018-04-10 12:13:51.368Z] Artifact "public/build/gu-IN/manifest.json" not found at "/home/worker/artifacts/manifest.json"
[taskcluster 2018-04-10 12:13:51.424Z] Artifact "public/build/gu-IN/target.partial-4.mar" not found at "/home/worker/artifacts/target.partial-4.mar"
[taskcluster 2018-04-10 12:13:51.488Z] Artifact "public/build/gu-IN/target.partial-3.mar" not found at "/home/worker/artifacts/target.partial-3.mar"
[taskcluster 2018-04-10 12:13:52.011Z] Unsuccessful task run with exit code: 1 completed in 527.45 seconds
See Also: → 1430600
Assignee: nobody → sfraser
These seem to be legitimate errors. The timing is spread out a bit, so I'm not convinced it's the number of concurrent tasks. I'll add more retry catching, but it feels like a band-aid over an underlying issue.
Attempt to get more information about download timeouts, and also retry the partial generation if download timeouts happen too often.
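A minimal sketch of the kind of retry-with-logging described above, assuming a plain asyncio coroutine as the download step. The names `retry_on_timeout` and `flaky_download` are hypothetical and exist only for illustration; this is not the `retry_async` from scriptworker, nor the patch that landed.

```python
import asyncio
import logging

log = logging.getLogger(__name__)


async def retry_on_timeout(func, *args, attempts=5, delay=0.0, **kwargs):
    """Call an async function, retrying when it raises asyncio.TimeoutError.

    Hypothetical helper for illustration only.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await func(*args, **kwargs)
        except asyncio.TimeoutError:
            # Log every timeout so the frequency is visible in the task log.
            log.warning("Timeout on attempt %d/%d for %s", attempt, attempts, args)
            if attempt == attempts:
                raise
            # Linear backoff; with delay=0.0 this only yields to the event loop.
            await asyncio.sleep(delay * attempt)


async def flaky_download(url, state):
    """Stand-in for a real download: times out twice, then succeeds."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise asyncio.TimeoutError
    return "downloaded {}".format(url)


def demo():
    state = {"calls": 0}
    result = asyncio.run(
        retry_on_timeout(flaky_download, "https://example.invalid/file.mar", state)
    )
    return result, state["calls"]
```

Calling `demo()` runs the stand-in download until it succeeds on the third attempt. Logging the attempt number on each timeout is what would show whether timeouts cluster in time or are spread out.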
Comment on attachment 8981797 [details]
Bug 1452927 Improve logging and retries for partials r=mtabara

Mihai Tabara [:mtabara]⌚️GMT has approved the revision.
Attachment #8981797 - Flags: review+
Pushed by
Improve logging and retries for partials r=mtabara
Keywords: leave-open

Beetmover occurrence:

2019-09-12T11:46:21 DEBUG - Reclaim task response:
{'credentials': '{********}',
'runId': 0,
'status': {'deadline': '2019-09-13T10:06:13.077Z',
'expires': '2020-09-11T10:06:13.077Z',
'provisionerId': 'scriptworker-k8s',
'retriesLeft': 5,
'runs': [{'reasonCreated': 'scheduled',
'runId': 0,
'scheduled': '2019-09-12T11:40:09.834Z',
'started': '2019-09-12T11:41:19.627Z',
'state': 'running',
'takenUntil': '2019-09-12T12:06:21.099Z',
'workerGroup': 'gecko-3-beetmover',
'workerId': 'gecko-3-beetmover-zp6byvbxt9krrnqsaxsb'}],
'schedulerId': 'gecko-level-3',
'state': 'running',
'taskGroupId': 'eeyI-m9lS8yCOvYPdgX6hA',
'taskId': 'HXYUD_-WTPiKpP-JDAIzWA',
'workerType': 'gecko-3-beetmover'},
'takenUntil': '2019-09-12T12:06:21.099Z',
'workerGroup': 'gecko-3-beetmover',
'workerId': 'gecko-3-beetmover-zp6byvbxt9krrnqsaxsb'}
2019-09-12T11:46:21 DEBUG - waiting 300 seconds before reclaiming...
Traceback (most recent call last):
File "/app/lib/python3.7/site-packages/scriptworker/", line 55, in do_run_task
await run_cancellable(verify_chain_of_trust(chain))
File "/app/lib/python3.7/site-packages/scriptworker/", line 158, in _run_cancellable
result = await self.future
File "/app/lib/python3.7/site-packages/scriptworker/cot/", line 2063, in verify_chain_of_trust
task_count = await verify_task_types(chain)
File "/app/lib/python3.7/site-packages/scriptworker/cot/", line 1813, in verify_task_types
await valid_task_types[task_type](chain, obj)
File "/app/lib/python3.7/site-packages/scriptworker/cot/", line 1712, in verify_parent_task
await verify_parent_task_definition(chain, link)
File "/app/lib/python3.7/site-packages/scriptworker/cot/", line 1572, in verify_parent_task_definition
chain, parent_link, decision_link, tasks_for
File "/app/lib/python3.7/site-packages/scriptworker/cot/", line 1534, in get_jsone_context_and_template
tmpl = await get_in_tree_template(decision_link)
File "/app/lib/python3.7/site-packages/scriptworker/cot/", line 1377, in get_in_tree_template
context.config["work_dir"], "{}_taskcluster.yml".format(
File "/app/lib/python3.7/site-packages/scriptworker/", line 633, in load_json_or_yaml_from_url
retry_exceptions=(DownloadError, aiohttp.ClientError),
File "/app/lib/python3.7/site-packages/scriptworker/", line 261, in retry_async
return await func(*args, **kwargs)
File "/app/lib/python3.7/site-packages/scriptworker/", line 565, in download_file
chunk = await
File "/app/lib/python3.7/site-packages/aiohttp/", line 369, in read
await self._wait('read')
File "/app/lib/python3.7/site-packages/aiohttp/", line 297, in _wait
await waiter
File "/app/lib/python3.7/site-packages/aiohttp/", line 585, in __exit__
raise asyncio.TimeoutError from None
2019-09-12T11:46:31 DEBUG - "/app/artifacts/public/logs/chain_of_trust.log" is encoded with "None" and has mime/type "text/plain"
2019-09-12T11:46:31 INFO - "/app/artifacts/public/logs/chain_of_trust.log" can be gzip'd. Compressing...
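One detail visible in the traceback above: the `retry_exceptions` tuple lists `DownloadError` and `aiohttp.ClientError`, while the exception raised is `asyncio.TimeoutError`. The log does not confirm whether that mattered here, but the sketch below shows how a retry loop behaves when the raised exception type is not in its tuple. The exception classes are stand-ins defined locally (not the real scriptworker or aiohttp classes), and `retry` and `count_attempts` are hypothetical helpers.

```python
import asyncio


class DownloadError(Exception):
    """Stand-in for scriptworker's DownloadError (assumption: unrelated to timeouts)."""


class ClientError(Exception):
    """Stand-in for aiohttp.ClientError."""


async def retry(func, retry_exceptions, attempts=3):
    """Retry func only for exception types listed in retry_exceptions."""
    calls = 0
    while True:
        calls += 1
        try:
            return await func()
        except retry_exceptions:
            if calls >= attempts:
                raise


async def count_attempts(retry_exceptions):
    """Return how many times a function that always times out gets called."""
    attempts_made = 0

    async def always_times_out():
        nonlocal attempts_made
        attempts_made += 1
        raise asyncio.TimeoutError

    try:
        await retry(always_times_out, retry_exceptions)
    except asyncio.TimeoutError:
        pass
    return attempts_made


# Timeout is not in the tuple: the first failure propagates immediately.
narrow = asyncio.run(count_attempts((DownloadError, ClientError)))
# Timeout is in the tuple: all three attempts are used.
wide = asyncio.run(
    count_attempts((DownloadError, ClientError, asyncio.TimeoutError))
)
```

With the narrower tuple the function is called once; with `asyncio.TimeoutError` included it is called three times before the error is re-raised.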

I think the beetmover error is going to be unrelated, since it's different code, and they just moved beetmover to GCP. I'll see what we can dig up.

I think that'll be a separate infrastructure issue, and worth raising a new bug about. Perhaps ping :aki.

Flags: needinfo?(sfraser)

Filed Bug 1624643. Thank you!

See Also: → 1624643
Duplicate of this bug: 1624643