Closed Bug 1889955 Opened 11 months ago Closed 11 months ago

High frequency hg.mo infra issues are causing many tasks to fail because of connection time out - in general this failure line appears in treeherder: abort: reached maximum number of network attempts; giving up

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1889836

People

(Reporter: imoraru, Unassigned)

Details

This issue started ~16-17 hours ago.

[taskcluster 2024-04-05 14:12:56.631Z] using cache "gecko-level-3-tooltool-cache-v3-91dffdb6efcae13c131e" -> /builds/worker/tooltool-cache
[taskcluster 2024-04-05 14:12:57.052Z] Image 'public/image.tar.zst' from task 'FfTk6oGbTHqIvvoX4JQ1Jg' loaded.  Using image ID sha256:f39e23d1eea2a88b937a4979322eb5a39688de0c19ebdc0fd650bbaa2d4b58dc.
[taskcluster 2024-04-05 14:12:57.062Z] === Task Starting ===
[setup 2024-04-05T14:12:57.313Z] run-task started in /builds/worker
[cache 2024-04-05T14:12:57.314Z] cache /builds/worker/checkouts exists; requirements: gid=1000 uid=1000 version=1
[cache 2024-04-05T14:12:57.314Z] cache /builds/worker/tooltool-cache exists; requirements: gid=1000 uid=1000 version=1
[volume 2024-04-05T14:12:57.314Z] volume /builds/worker/checkouts is a cache
[volume 2024-04-05T14:12:57.314Z] volume /builds/worker/tooltool-cache is a cache
[volume 2024-04-05T14:12:57.314Z] changing ownership of volume /builds/worker/workspace to 1000:1000
[setup 2024-04-05T14:12:57.315Z] running as worker:worker
[vcs 2024-04-05T14:12:57.315Z] fetching hgmointernal config from http://taskcluster/secrets/v1/secret/project/taskcluster/gecko/hgmointernal
[vcs 2024-04-05T14:12:57.354Z] region google/us-west1 not yet supported; using public hg.mozilla.org service
[vcs 2024-04-05T14:12:57.354Z] executing ['hg', 'robustcheckout', '--sharebase', '/builds/worker/checkouts/hg-store', '--purge', '--upstream', 'https://hg.mozilla.org/mozilla-unified', '--revision', '5869a1ed97bddcc45e1369e143f36108de5dcc42', 'https://hg.mozilla.org/integration/autoland', '/builds/worker/checkouts/gecko']
[vcs 2024-04-05T14:12:57.466Z] (using Mercurial 6.4.3)
[vcs 2024-04-05T14:12:57.466Z] ensuring https://hg.mozilla.org/integration/autoland@5869a1ed97bddcc45e1369e143f36108de5dcc42 is available at /builds/worker/checkouts/gecko
[vcs 2024-04-05T14:15:08.261Z] socket error: [Errno 110] Connection timed out
[vcs 2024-04-05T14:15:08.261Z] (retrying after network failure on attempt 1 of 3)
[vcs 2024-04-05T14:15:08.261Z] (waiting 3.14s before retry)
[vcs 2024-04-05T14:15:11.399Z] ensuring https://hg.mozilla.org/integration/autoland@5869a1ed97bddcc45e1369e143f36108de5dcc42 is available at /builds/worker/checkouts/gecko
[vcs 2024-04-05T14:17:21.380Z] socket error: [Errno 110] Connection timed out
[vcs 2024-04-05T14:17:21.380Z] (retrying after network failure on attempt 2 of 3)
[vcs 2024-04-05T14:17:21.380Z] (waiting 5.65s before retry)
[vcs 2024-04-05T14:17:27.031Z] ensuring https://hg.mozilla.org/integration/autoland@5869a1ed97bddcc45e1369e143f36108de5dcc42 is available at /builds/worker/checkouts/gecko
[vcs 2024-04-05T14:19:36.548Z] socket error: [Errno 110] Connection timed out
[vcs 2024-04-05T14:19:36.548Z] PERFHERDER_DATA: {"framework": {"name": "vcs"}, "suites": [{"extraOptions": ["projects/970387039909/machineTypes/c2-standard-16"], "hgVersion": "6.4.3", "lowerIsBetter": true, "name": "overall", "serverUrl": "hg.mozilla.org", "shouldAlert": false, "subtests": [], "value": 399.082316160202}, {"extraOptions": ["projects/970387039909/machineTypes/c2-standard-16"], "hgVersion": "6.4.3", "lowerIsBetter": true, "name": "overall_nopull", "serverUrl": "hg.mozilla.org", "shouldAlert": false, "subtests": [], "value": 399.082316160202}, {"extraOptions": ["projects/970387039909/machineTypes/c2-standard-16"], "hgVersion": "6.4.3", "lowerIsBetter": true, "name": "overall_nopull_fullcheckout", "serverUrl": "hg.mozilla.org", "shouldAlert": false, "subtests": [], "value": 399.082316160202}, {"extraOptions": ["projects/970387039909/machineTypes/c2-standard-16"], "hgVersion": "6.4.3", "lowerIsBetter": true, "name": "overall_nopull_populatedwdir", "serverUrl": "hg.mozilla.org", "shouldAlert": false, "subtests": [], "value": 399.082316160202}]}
[vcs 2024-04-05T14:19:36.549Z] abort: reached maximum number of network attempts; giving up
[vcs 2024-04-05T14:19:36.549Z] 
[taskcluster 2024-04-05 14:19:36.823Z] === Task Finished ===
[taskcluster 2024-04-05 14:19:36.827Z] Artifact "public/logs" not found at "/builds/worker/logs/": (HTTP code 404) no such container - Could not find the file /builds/worker/logs/ in container 6d62302b9505188821392c3b6c1c9fa73c1c43d98d72c9532236312b9ca3d083 
[taskcluster 2024-04-05 14:19:36.829Z] Artifact "public/build" not found at "/builds/worker/artifacts/": (HTTP code 404) no such container - Could not find the file /builds/worker/artifacts/ in container 6d62302b9505188821392c3b6c1c9fa73c1c43d98d72c9532236312b9ca3d083 
[taskcluster 2024-04-05 14:19:36.830Z] Artifact "public/cidata/sccache-stats.json" not found at "/builds/worker/cidata/sccache-stats.json": (HTTP code 404) no such container - Could not find the file /builds/worker/cidata/sccache-stats.json in container 6d62302b9505188821392c3b6c1c9fa73c1c43d98d72c9532236312b9ca3d083 
[taskcluster 2024-04-05 14:19:36.832Z] Artifact "public/cidata/sccache.log" not found at "/builds/worker/cidata/sccache.log": (HTTP code 404) no such container - Could not find the file /builds/worker/cidata/sccache.log in container 6d62302b9505188821392c3b6c1c9fa73c1c43d98d72c9532236312b9ca3d083 
[taskcluster 2024-04-05 14:19:36.834Z] Artifact "public/cidata/target.crashreporter-symbols-full.tar.zst" not found at "/builds/worker/cidata/target.crashreporter-symbols-full.tar.zst": (HTTP code 404) no such container - Could not find the file /builds/worker/cidata/target.crashreporter-symbols-full.tar.zst in container 6d62302b9505188821392c3b6c1c9fa73c1c43d98d72c9532236312b9ca3d083 
[taskcluster 2024-04-05 14:19:36.872Z] Unsuccessful task run with exit code: 255 completed in 400.241 seconds
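
To make the timing in the log above easier to read, here is a minimal sketch that measures how long each checkout attempt waited before the socket error. It is not part of run-task or robustcheckout; the function names (`parse_ts`, `attempt_durations`) and the regular expression are assumptions made for illustration, and the sample lines are copied from the `[vcs ...]` entries above.

```python
import re
from datetime import datetime

# Matches "[vcs <timestamp>] <message>" lines as they appear in the log above.
# This pattern is an assumption based on that one log, not a documented format.
VCS_LINE = re.compile(r"^\[vcs (\S+)\] (.*)$")


def parse_ts(text):
    """Parse a timestamp such as 2024-04-05T14:12:57.466Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")


def attempt_durations(lines):
    """Return seconds between each 'ensuring ...' line and the next socket error."""
    durations = []
    started = None
    for line in lines:
        match = VCS_LINE.match(line)
        if not match:
            continue  # skip [taskcluster], [setup], [cache] and other lines
        stamp, message = parse_ts(match.group(1)), match.group(2)
        if message.startswith("ensuring "):
            started = stamp
        elif message.startswith("socket error:") and started is not None:
            durations.append((stamp - started).total_seconds())
            started = None
    return durations


# The "ensuring" message from the log, reused for all three attempts.
ENSURING = (
    "ensuring https://hg.mozilla.org/integration/autoland"
    "@5869a1ed97bddcc45e1369e143f36108de5dcc42 "
    "is available at /builds/worker/checkouts/gecko"
)

SAMPLE = [
    f"[vcs 2024-04-05T14:12:57.466Z] {ENSURING}",
    "[vcs 2024-04-05T14:15:08.261Z] socket error: [Errno 110] Connection timed out",
    f"[vcs 2024-04-05T14:15:11.399Z] {ENSURING}",
    "[vcs 2024-04-05T14:17:21.380Z] socket error: [Errno 110] Connection timed out",
    f"[vcs 2024-04-05T14:17:27.031Z] {ENSURING}",
    "[vcs 2024-04-05T14:19:36.548Z] socket error: [Errno 110] Connection timed out",
]

for number, seconds in enumerate(attempt_durations(SAMPLE), start=1):
    print(f"attempt {number}: {seconds:.3f}s before the socket error")
```

For this log, each of the three attempts waits roughly 130 seconds before timing out, which accounts for nearly all of the 400-second task run.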

For a better overview of the failure rate, please also check the OrangeFactor data in bug 1451080.
Thank you!

See Also: → 1451080, 1889836
Summary: High frequency hg.mo infra issues are causing many tasks to fail because of connection time out - in general this failure line appears in treeherder is: abort: reached maximum number of network attempts; giving up → High frequency hg.mo infra issues are causing many tasks to fail because of connection time out - in general this failure line appears in treeherder: abort: reached maximum number of network attempts; giving up

It's worth noting that this issue has been occurring even when hg is having no issues with load on the server side. Yesterday evening we observed that the server was quick to respond to requests from outside CI, but was failing to connect at all from inside CI. There may be some other issues with networking at play here.
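
One way to compare the two vantage points described above is to time a plain TCP connection to the server from a machine outside CI and from a CI worker. The following is only a sketch under assumptions: the `probe` helper, its return shape, and the choice of port 443 are illustrative and are not tooling used by the hg.mozilla.org or Taskcluster teams.

```python
import socket
import time


def probe(host, port=443, timeout=10.0, connect=socket.create_connection):
    """Time one TCP connection attempt and report whether it succeeded.

    `connect` is injectable so the function can be exercised without
    touching the network; by default it is socket.create_connection.
    """
    start = time.monotonic()
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError as exc:
        # Covers timeouts (such as Errno 110), refused connections, DNS failures.
        return {"ok": False, "seconds": time.monotonic() - start, "error": str(exc)}
    conn.close()
    return {"ok": True, "seconds": time.monotonic() - start, "error": None}


# Example call (performs a real connection attempt, so it is left commented out):
# print(probe("hg.mozilla.org"))
```

If the same call connects quickly from outside CI but times out from a worker, that would point toward the network path rather than server load, in line with the observation above.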

Flags: needinfo?(sheehan)

(In reply to Connor Sheehan [:sheehan] from comment #2)

> It's worth noting that this issue has been occurring even when hg is having no issues with load on the server side. Yesterday evening we observed that the server was quick to respond to requests from outside CI, but was failing to connect at all from inside CI. There may be some other issues with networking at play here.

For example, bug 1889764 has been occurring as well, which is not related to Mercurial.

Status: NEW → RESOLVED
Closed: 11 months ago
Duplicate of bug: 1889836
Resolution: --- → DUPLICATE