Closed Bug 1648664 Opened 4 years ago Closed 4 years ago

busted partials Intermittent asyncio.exceptions.TimeoutError

Categories

(Release Engineering :: Release Automation: Other, defect)

Tracking

(firefox79 fixed, firefox80 fixed)

RESOLVED FIXED

People

(Reporter: CosminS, Assigned: sfraser)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Log: https://treeherder.mozilla.org/logviewer.html#?job_id=307608509&repo=mozilla-central

Raw log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/VDEMwPkgSMSUfggbdPZTOw/runs/0/artifacts/public/logs/live_backing.log

2020-06-26 00:48:02,662 - DEBUG - Bytes downloaded for https://archive.mozilla.org/pub/firefox/nightly/2020/06/2020-06-25-09-44-52-mozilla-central-l10n/firefox-79.0a1.ca-valencia.win64-aarch64.complete.mar: 58738222
2020-06-26 00:48:25,484 - DEBUG - Bytes downloaded for https://archive.mozilla.org/pub/firefox/nightly/2020/06/2020-06-25-09-44-52-mozilla-central-l10n/firefox-79.0a1.ca-valencia.win64-aarch64.complete.mar: 62932526
2020-06-26 00:48:50,914 - WARNING - retry_async: download: too many retries!
Traceback (most recent call last):
  File "/home/worker/bin/funsize.py", line 466, in <module>
    main()
  File "/home/worker/bin/funsize.py", line 455, in main
    manifest = loop.run_until_complete(async_main(args, signing_cert))
  File "/usr/lib/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "/home/worker/bin/funsize.py", line 400, in async_main
    downloads = await download_and_verify_mars(
  File "/home/worker/bin/funsize.py", line 249, in download_and_verify_mars
    await asyncio.gather(*tasks)
  File "/home/worker/bin/funsize.py", line 118, in retry_download
    await retry_async(
  File "/usr/local/lib/python3.8/dist-packages/scriptworker/utils.py", line 262, in retry_async
    _check_number_of_attempts(attempt, attempts, func, "retry_async")
  File "/usr/local/lib/python3.8/dist-packages/scriptworker/utils.py", line 259, in retry_async
    return await func(*args, **kwargs)
  File "/home/worker/bin/funsize.py", line 155, in download
    chunk = await resp.content.read(chunk_size)
  File "/usr/local/lib/python3.8/dist-packages/aiohttp/streams.py", line 368, in read
    await self._wait('read')
  File "/usr/local/lib/python3.8/dist-packages/aiohttp/streams.py", line 296, in _wait
    await waiter
  File "/usr/local/lib/python3.8/dist-packages/aiohttp/helpers.py", line 596, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError
[taskcluster 2020-06-26 00:48:51.266Z] === Task Finished ===
[taskcluster 2020-06-26 00:48:51.341Z] Artifact "public/build/ca-valencia/target.partial-1.mar" not found at "/home/worker/artifacts/target.partial-1.mar"
[taskcluster 2020-06-26 00:48:51.397Z] Artifact "public/build/ca-valencia/manifest.json" not found at "/home/worker/artifacts/manifest.json"
[taskcluster 2020-06-26 00:48:51.461Z] Artifact "public/build/ca-valencia/target.partial-4.mar" not found at "/home/worker/artifacts/target.partial-4.mar"
[taskcluster 2020-06-26 00:48:51.521Z] Artifact "public/build/ca-valencia/target.partial-2.mar" not found at "/home/worker/artifacts/target.partial-2.mar"
[taskcluster 2020-06-26 00:48:51.577Z] Artifact "public/build/ca-valencia/target.partial-3.mar" not found at "/home/worker/artifacts/target.partial-3.mar"
[taskcluster 2020-06-26 00:48:51.793Z] Unsuccessful task run with exit code: 1 completed in 811.258 seconds

Summary: [scriptworker] Intermittent asyncio.exceptions.TimeoutError → busted partials Intermittent asyncio.exceptions.TimeoutError

a) In the busted run we were still downloading, just slowly, and
b) the errors went away in run 1,

so I'm fairly certain this is infra-related. However, looking at funsize.py, I see that we're using a semaphore to limit how many MARs we generate concurrently. I'm wondering if we should also use a semaphore to limit concurrent downloads.

Simon, do you think a download semaphore might improve things here?
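For illustration only, here is a minimal sketch of what a download semaphore could look like with asyncio and aiohttp. The names (MAX_CONCURRENT_DOWNLOADS, download_one, download_all, and the url/destination pairs) are hypothetical and not taken from funsize.py, which structures its retries and verification differently.

import asyncio

import aiohttp

# Hypothetical cap on simultaneous downloads; the value actually used in
# funsize.py may differ.
MAX_CONCURRENT_DOWNLOADS = 2


async def download_one(session, semaphore, url, dest, chunk_size=4 * 1024 * 1024):
    """Stream one URL to disk while holding the semaphore."""
    async with semaphore:  # at most MAX_CONCURRENT_DOWNLOADS read at once
        async with session.get(url) as resp:
            resp.raise_for_status()
            with open(dest, "wb") as fh:
                while True:
                    chunk = await resp.content.read(chunk_size)
                    if not chunk:
                        break
                    fh.write(chunk)


async def download_all(pairs):
    """pairs is an iterable of (url, destination path) tuples."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_DOWNLOADS)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *(download_one(session, semaphore, url, dest) for url, dest in pairs)
        )

The gather() call still schedules every download up front, but the semaphore ensures only a couple of them read from the network at any moment, so each active transfer gets a larger share of the instance's bandwidth.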

Flags: needinfo?(aki) → needinfo?(sfraser)

Worth a try. I'm not sure whether the instance type these containers run on has changed recently, giving them a new network profile, so I'm happy to try it out. The semaphore for MAR generation is unlikely to be limiting anything right now unless the instance is overloaded; it's there as a safety net.

I wonder if the timeout on the download is too short as well.
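As a reference point, this is roughly how the timeout could be loosened via aiohttp's ClientTimeout. The values are assumptions for illustration, not what funsize.py sets; total caps the whole request (aiohttp defaults to 5 minutes) and sock_read bounds how long a single chunk read may stall.

import asyncio

import aiohttp

# Assumed values for illustration only: disable the overall cap and allow
# each individual socket read up to two minutes.
TIMEOUT = aiohttp.ClientTimeout(total=None, sock_read=120)


async def fetch(url):
    """Download a URL into memory using the relaxed timeout."""
    async with aiohttp.ClientSession(timeout=TIMEOUT) as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.read()


if __name__ == "__main__":
    body = asyncio.run(fetch("https://example.com/"))
    print(len(body))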

Flags: needinfo?(sfraser)
Assignee: nobody → sfraser
Status: NEW → ASSIGNED
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4a832f0c89db
Reduce download concurrency in partials r=aki
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

Shall this get uplifted?

Flags: needinfo?(sfraser)

Yes.

Flags: needinfo?(sfraser)