Open
Bug 1356787
Opened 8 years ago
Updated 8 years ago
Mercurial operations can time out when network is down
Categories
(Developer Services :: Mercurial: robustcheckout, defect)
Tracking
(Not tracked)
NEW
People
(Reporter: aryx, Unassigned)
Details
Gecko decision Task fails frequently on Try.
https://treeherder.mozilla.org/logviewer.html#?job_id=91864551&repo=try has a Taskcluster queue internal server error.
There is also https://treeherder.mozilla.org/logviewer.html#?job_id=91852952&repo=try
[taskcluster 2017-04-15 07:52:49.183Z] === Task Starting ===
[setup 2017-04-15T07:52:49.389417Z] run-task started
[setup 2017-04-15T07:52:49.391381Z] running as worker:worker
[vcs 2017-04-15T07:52:49.391454Z] executing ['hg', 'robustcheckout', '--sharebase', '/home/worker/checkouts/hg-store', '--purge', '--upstream', 'https://hg.mozilla.org/mozilla-unified', '--revision', '582ed58ec4c90356494c8c0595260e60f9d0f2dd', 'https://hg.mozilla.org/try/', '/home/worker/checkouts/gecko']
[vcs 2017-04-15T07:52:49.450581Z] ensuring https://hg.mozilla.org/try/@582ed58ec4c90356494c8c0595260e60f9d0f2dd is available at /home/worker/checkouts/gecko
[vcs 2017-04-15T07:52:49.450731Z] (cloning from upstream repo https://hg.mozilla.org/mozilla-unified)
[taskcluster:error] Task timeout after 1800 seconds. Force killing container.
Slow cloning? I saw 5 job failures related to slow cloning on integration branches yesterday, but none in the Gecko decision task.
In https://treeherder.mozilla.org/logviewer.html#?job_id=91869426&repo=try it even fails to get a lock on checkout.
Flags: needinfo?(gps)
Comment 1•8 years ago
Has now also hit autoland: https://treeherder.mozilla.org/logviewer.html#?job_id=91875324&repo=autoland
Component: General → Mercurial: robustcheckout
Product: Taskcluster → Developer Services
Summary: gecko decision task fails frequently on Try → gecko decision task fails frequently
Comment 2•8 years ago
Other errors in that second log, such as "Failed during proxy request: Put https://queue.taskcluster.net/v1/task/Vfj9qRCnQja2njk0WVW2gQ: dial tcp: lookup queue.taskcluster.net on 172.31.0.2:53: read udp 172.17.0.2:53603->172.31.0.2:53: i/o timeout", suggest some kind of network failure. I would fully expect hg operations to time out as well.
I am a bit surprised that Mercurial isn't timing out here. Perhaps Mercurial doesn't set a network timeout by default or we don't have one configured. We should definitely fix this.
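For illustration only, here is a minimal Python sketch of the general behavior being speculated about: a socket read blocks indefinitely unless a timeout is set explicitly. It is not Mercurial's or robustcheckout's code. The helper names `recv_with_timeout` and `demo` are hypothetical, and the silent local server merely stands in for an unresponsive network peer.

```python
import socket


def recv_with_timeout(host, port, timeout_seconds):
    """Connect and wait for data, giving up after timeout_seconds.

    Hypothetical helper for illustration; returns None on timeout.
    """
    # create_connection applies the timeout to the connect and to
    # every later blocking operation on the returned socket.
    with socket.create_connection((host, port), timeout=timeout_seconds) as conn:
        try:
            return conn.recv(1024)
        except socket.timeout:
            # Without an explicit timeout this recv would block forever.
            return None


def demo():
    """Read from a local server that never sends anything."""
    # The server listens but never accepts or writes, which stands in
    # for a peer that has gone silent (an assumption for this sketch).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    host, port = server.getsockname()
    try:
        return recv_with_timeout(host, port, 0.2)
    finally:
        server.close()


if __name__ == "__main__":
    print("default socket timeout:", socket.getdefaulttimeout())
    print("result of read against silent server:", demo())
```

If the hang in the log above has the same cause, an explicit timeout at the network layer would let the checkout fail quickly instead of running until the 1800-second task timeout.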
While this is a legitimate bug, I don't believe we see this issue enough to make fixing it a priority.
Flags: needinfo?(gps)
Updated•8 years ago
Summary: gecko decision task fails frequently → Mercurial operations can time out when network is down