Open Bug 1356787 Opened 8 years ago Updated 8 years ago

Mercurial operations can time out when network is down

Categories

(Developer Services :: Mercurial: robustcheckout, defect)

defect
Not set
major

Tracking

(Not tracked)

People

(Reporter: aryx, Unassigned)

Details

Gecko decision Task fails frequently on Try.

https://treeherder.mozilla.org/logviewer.html#?job_id=91864551&repo=try has a Taskcluster queue internal server error.

There is also https://treeherder.mozilla.org/logviewer.html#?job_id=91852952&repo=try

[taskcluster 2017-04-15 07:52:49.183Z] === Task Starting ===
[setup 2017-04-15T07:52:49.389417Z] run-task started
[setup 2017-04-15T07:52:49.391381Z] running as worker:worker
[vcs 2017-04-15T07:52:49.391454Z] executing ['hg', 'robustcheckout', '--sharebase', '/home/worker/checkouts/hg-store', '--purge', '--upstream', 'https://hg.mozilla.org/mozilla-unified', '--revision', '582ed58ec4c90356494c8c0595260e60f9d0f2dd', 'https://hg.mozilla.org/try/', '/home/worker/checkouts/gecko']
[vcs 2017-04-15T07:52:49.450581Z] ensuring https://hg.mozilla.org/try/@582ed58ec4c90356494c8c0595260e60f9d0f2dd is available at /home/worker/checkouts/gecko
[vcs 2017-04-15T07:52:49.450731Z] (cloning from upstream repo https://hg.mozilla.org/mozilla-unified)
[taskcluster:error] Task timeout after 1800 seconds. Force killing container.

Slow cloning? I saw 5 job failures related to slow cloning on integration branches yesterday, but not in Gecko decision task.

In https://treeherder.mozilla.org/logviewer.html#?job_id=91869426&repo=try it even fails to get a lock on checkout.
Flags: needinfo?(gps)
Component: General → Mercurial: robustcheckout
Product: Taskcluster → Developer Services
Summary: gecko decision task fails frequently on Try → gecko decision task fails frequently
Other errors in that 2nd log like

"Failed during proxy request: Put https://queue.taskcluster.net/v1/task/Vfj9qRCnQja2njk0WVW2gQ: dial tcp: lookup queue.taskcluster.net on 172.31.0.2:53: read udp 172.17.0.2:53603->172.31.0.2:53: i/o timeout"

seem to indicate there is some kind of network failure. I would fully expect hg operations to time out as well.

I am a bit surprised that Mercurial isn't timing out here. Perhaps Mercurial doesn't set a network timeout by default or we don't have one configured. We should definitely fix this.

While this is a legitimate bug, I don't believe we see this issue enough to make fixing it a priority.
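
To illustrate the missing timeout discussed above: a minimal sketch, in Python, of bounding a hung VCS command with a process-level timeout so the failure surfaces well before the 1800 second task limit. This is not how run-task or robustcheckout is implemented. The helper name `run_with_timeout` and the 600 second budget are assumptions made for illustration, and a process-level limit is only a stand-in for a network timeout inside Mercurial itself.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_seconds):
    """Run a command and return (exit_code, timed_out).

    Hypothetical helper: subprocess.run() kills the child process and
    raises TimeoutExpired if it has not finished within timeout_seconds.
    """
    try:
        completed = subprocess.run(args, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None, True
    return completed.returncode, False


if __name__ == "__main__":
    # Command taken from the log above; the 600 second budget is an
    # assumed value, not something configured in the original task.
    hg_args = [
        "hg", "robustcheckout",
        "--sharebase", "/home/worker/checkouts/hg-store",
        "--purge",
        "--upstream", "https://hg.mozilla.org/mozilla-unified",
        "--revision", "582ed58ec4c90356494c8c0595260e60f9d0f2dd",
        "https://hg.mozilla.org/try/",
        "/home/worker/checkouts/gecko",
    ]
    exit_code, timed_out = run_with_timeout(hg_args, 600)
    if timed_out:
        print("hg did not finish within the time budget", file=sys.stderr)
        sys.exit(1)
    sys.exit(exit_code)
```

A wrapper like this only reports the hang sooner; it does not distinguish a dead network from a genuinely slow clone, which is why a timeout on the network operations themselves would be the more precise fix.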
Flags: needinfo?(gps)
Summary: gecko decision task fails frequently → Mercurial operations can time out when network is down