Open
Bug 1420643
Opened 7 years ago
Updated 7 years ago
Intermittent gecko decision tasks failing with ReadTimeout: HTTPSConnectionPool(host='hg.mozilla.org', port=443): Read timed out. (read timeout=5)
Categories
(Developer Services :: Mercurial: hg.mozilla.org, defect)
Tracking
(Not tracked)
NEW
People
(Reporter: aryx, Unassigned)
References
Details
+++ This bug was initially created as a clone of Bug #1417719 +++
https://treeherder.mozilla.org/logviewer.html#?job_id=147647781&repo=mozilla-inbound
[task 2017-11-26T00:00:22.910648Z] Generating target task graph
[task 2017-11-26T00:00:22.917192Z] Adding 0 tasks with `always_target` attribute
[task 2017-11-26T00:00:22.933443Z] Generating optimized task graph
[task 2017-11-26T00:00:22.937789Z] Querying version control for metadata: https://hg.mozilla.org/integration/mozilla-inbound/json-automationrelevance/3e14872b31a7b1b207605d09b78fbaaf21f1bba7
[task 2017-11-26T00:00:22.937877Z] attempt 1/2
[task 2017-11-26T00:00:22.938124Z] retry: calling get_automationrelevance, attempt #1
[task 2017-11-26T00:00:22.945783Z] Starting new HTTPS connection (1): hg.mozilla.org
[task 2017-11-26T00:00:28.039379Z] retry: Caught exception:
[task 2017-11-26T00:00:28.039528Z] sleeping for 10.00s (attempt 1/2)
[task 2017-11-26T00:00:38.049886Z] attempt 2/2
[task 2017-11-26T00:00:38.050028Z] retry: calling get_automationrelevance, attempt #2
[task 2017-11-26T00:00:38.051405Z] Starting new HTTPS connection (1): hg.mozilla.org
[task 2017-11-26T00:00:43.081313Z] retry: Caught exception:
[task 2017-11-26T00:00:43.081414Z] retry: Giving up on get_automationrelevance
[task 2017-11-26T00:00:43.082160Z] Traceback (most recent call last):
[task 2017-11-26T00:00:43.082345Z] File "/builds/worker/checkouts/gecko/taskcluster/mach_commands.py", line 165, in taskgraph_decision
[task 2017-11-26T00:00:43.082792Z] return taskgraph.decision.taskgraph_decision(options)
[task 2017-11-26T00:00:43.082852Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/decision.py", line 128, in taskgraph_decision
[task 2017-11-26T00:00:43.082901Z] write_artifact('task-graph.json', tgg.morphed_task_graph.to_json())
[task 2017-11-26T00:00:43.083119Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 213, in morphed_task_graph
[task 2017-11-26T00:00:43.083161Z] return self._run_until('morphed_task_graph')
[task 2017-11-26T00:00:43.083240Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 319, in _run_until
[task 2017-11-26T00:00:43.083416Z] k, v = self._run.next()
[task 2017-11-26T00:00:43.083464Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 306, in _run
[task 2017-11-26T00:00:43.083520Z] existing_tasks=existing_tasks)
[task 2017-11-26T00:00:43.083569Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/optimize.py", line 55, in optimize_task_graph
[task 2017-11-26T00:00:43.083595Z] do_not_optimize=do_not_optimize)
[task 2017-11-26T00:00:43.083644Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/optimize.py", line 126, in remove_tasks
[task 2017-11-26T00:00:43.083678Z] if opt.should_remove_task(task, params, arg):
[task 2017-11-26T00:00:43.083881Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/optimize.py", line 383, in should_remove_task
[task 2017-11-26T00:00:43.083961Z] scheduled = self.scheduled_by_push(params['head_repository'], params['head_rev'])
[task 2017-11-26T00:00:43.084018Z] File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/util.py", line 944, in method_call
[task 2017-11-26T00:00:43.084400Z] cache[args] = self.func(instance, *args)
[task 2017-11-26T00:00:43.084483Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/optimize.py", line 365, in scheduled_by_push
[task 2017-11-26T00:00:43.084764Z] changed_files = files_changed.get_changed_files(repository, revision)
[task 2017-11-26T00:00:43.084828Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/files_changed.py", line 33, in get_changed_files
[task 2017-11-26T00:00:43.084870Z] contents = retry(get_automationrelevance, attempts=2, sleeptime=10)
[task 2017-11-26T00:00:43.084914Z] File "/builds/worker/checkouts/gecko/third_party/python/redo/redo/__init__.py", line 162, in retry
[task 2017-11-26T00:00:43.084938Z] return action(*args, **kwargs)
[task 2017-11-26T00:00:43.085203Z] File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/files_changed.py", line 31, in get_automationrelevance
[task 2017-11-26T00:00:43.085272Z] response = requests.get(url, timeout=5)
[task 2017-11-26T00:00:43.085321Z] File "/builds/worker/checkouts/gecko/third_party/python/requests/requests/api.py", line 67, in get
[task 2017-11-26T00:00:43.085359Z] return request('get', url, params=params, **kwargs)
[task 2017-11-26T00:00:43.085512Z] File "/builds/worker/checkouts/gecko/third_party/python/requests/requests/api.py", line 53, in request
[task 2017-11-26T00:00:43.085549Z] return session.request(method=method, url=url, **kwargs)
[task 2017-11-26T00:00:43.085594Z] File "/builds/worker/checkouts/gecko/third_party/python/requests/requests/sessions.py", line 468, in request
[task 2017-11-26T00:00:43.085621Z] resp = self.send(prep, **send_kwargs)
[task 2017-11-26T00:00:43.085666Z] File "/builds/worker/checkouts/gecko/third_party/python/requests/requests/sessions.py", line 576, in send
[task 2017-11-26T00:00:43.085691Z] r = adapter.send(request, **kwargs)
[task 2017-11-26T00:00:43.085736Z] File "/builds/worker/checkouts/gecko/third_party/python/requests/requests/adapters.py", line 449, in send
[task 2017-11-26T00:00:43.085763Z] raise ReadTimeout(e, request=request)
[task 2017-11-26T00:00:43.085806Z] ReadTimeout: HTTPSConnectionPool(host='hg.mozilla.org', port=443): Read timed out. (read timeout=5)
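For reference, the failing call is the automationrelevance fetch in taskcluster/taskgraph/files_changed.py: a requests.get with a 5-second read timeout, wrapped in redo's retry helper with two attempts and a 10-second sleep, exactly as the "attempt 1/2 ... sleeping for 10.00s" lines above show. A minimal sketch of that pattern follows; the URL construction and response handling are simplified for illustration and not copied from the source.

    import requests
    from redo import retry

    def get_changed_files(repository, revision):
        # Endpoint seen in the log above; repository is e.g.
        # https://hg.mozilla.org/integration/mozilla-inbound
        url = '{}/json-automationrelevance/{}'.format(repository.rstrip('/'), revision)

        def get_automationrelevance():
            # 5-second read timeout: a slow response from hg.mozilla.org
            # raises requests.exceptions.ReadTimeout.
            response = requests.get(url, timeout=5)
            response.raise_for_status()
            return response.json()

        # redo.retry catches the exception, sleeps 10s, and tries once more;
        # after the second failure it re-raises and the decision task fails.
        contents = retry(get_automationrelevance, attempts=2, sleeptime=10)

        # Response structure assumed here for illustration: each changeset
        # entry carries a 'files' list.
        return {f for c in contents.get('changesets', []) for f in c.get('files', [])}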
Comment hidden (Intermittent Failures Robot)
Comment 2•7 years ago
I've hit this at least three times in the last two days...
Greg, do you know anything about this setup, or who would?
Flags: needinfo?(gps)
Comment 3•7 years ago
The code is already retrying the connection in case of a timeout. That's good.
99% of these failures should be intermittent, and the retry should pave over them.
But the fact that you are hitting this so reliably is a bit concerning.
Could you please link to logs where this is happening, so I can correlate with the server logs and see what's going on?
Flags: needinfo?(gps) → needinfo?(bzbarsky)
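(Editorial aside, not part of the bug discussion: both knobs gps refers to live in the caller sketched above, so a hypothetical way to tolerate a slow server would be to widen them, shown here purely as an illustration and not as a patch proposed in this bug.)

    # Hypothetical adjustment, not an actual change from this bug: give a slow
    # hg.mozilla.org more headroom before the decision task gives up.
    response = requests.get(url, timeout=30)                              # up from timeout=5
    contents = retry(get_automationrelevance, attempts=4, sleeptime=10)   # up from attempts=2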
Comment 4•7 years ago
For what it's worth, it hasn't happened to me since comment 2.
Here are the logs I see from groveling through my try pushes:
https://taskcluster-artifacts.net/ADzbEzCZSEaHWtkr114_kQ/0/public/logs/live_backing.log
https://taskcluster-artifacts.net/bj_UqMb0TGaEA79dZ6Ka-Q/0/public/logs/live_backing.log
https://taskcluster-artifacts.net/eYrHpgvYSZ68E_Zfw5yN6w/0/public/logs/live_backing.log
I have 7 total "Task failed" and "Task exception" mails for "Gecko Decision Task", but they don't clearly link back to the push, so I can't tell whether they were caused by me canceling before the D task was done, by including a nonexistent platform (which trychooser will happily do for you), or by something else. Those three above are red D jobs on Treeherder that were definitely not canceled and not the "unknown platform" thing.
Thank you for looking into this!
Flags: needinfo?(bzbarsky)