Closed Bug 1610998 Opened 4 years ago Closed 4 years ago

build-macosx64-noopt/debug intermittently exceeds its max-run-time


(Taskcluster :: General, task)




(Reporter: gbrown, Unassigned)


(Blocks 1 open bug)


I only looked at the first two, but they spent most of their allocated hour ... cloning from Mercurial and downloading fetches. The build itself takes less than 20 minutes. This is not the first time I've seen this kind of thing. For some reason, networking and/or I/O sucks badly on some workers.

Component: Task Configuration → General
Product: Firefox Build System → Taskcluster

Agreed, cloning from Mercurial and downloading fetches are unusually slow in these cases. We have discussed this before: I have had some success avoiding intermittent task failures by simply allowing an extra 30 minutes of max-run-time, but that approach has been resisted and called out as a hack. I would like to see some sort of solution, as this type of intermittent failure shows up at least several times each week. In addition to the wasted machine time, these failures can "hide" more serious task timeouts, like test and product hangs.

So this could be anything from a cloud instance with bad I/O or a bad network connection to an issue with falling back to some slower option in either the hg client or the hg server.

If it's the former, I'm not sure there's much to do but notice and terminate the instance. I don't think we see enough of these to be able to characterize a "sick" instance accurately and quickly. And a check that takes, say, 10 minutes at worker startup to decide whether the instance is sick would end up being phenomenally expensive and would add a great deal of end-to-end (E2E) time.

If it's the latter, then one thing you could do is wrap the hg command to time out after, say, ten minutes, and retry a few times within the task before failing.
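
A rough sketch of what such a wrapper might look like, in Python. This is not what the task scripts currently do: the function name `run_with_retries`, its parameters, and the defaults (ten-minute timeout, three attempts) are all made up for illustration, and it assumes the hg command can be run as a plain subprocess.

```python
import subprocess
import time


def run_with_retries(cmd, timeout_s=600, attempts=3, backoff_s=5):
    """Run cmd, killing it after timeout_s seconds; retry on timeout or failure.

    Hypothetical helper. Returns the 1-based number of the attempt that
    succeeded, or raises RuntimeError once every attempt has failed.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if the command is still running after timeout_s seconds.
            subprocess.run(cmd, check=True, timeout=timeout_s)
            return attempt
        except subprocess.TimeoutExpired:
            last_error = f"attempt {attempt} timed out after {timeout_s}s"
        except subprocess.CalledProcessError as exc:
            last_error = f"attempt {attempt} exited with status {exc.returncode}"
        if attempt < attempts:
            # Brief pause before retrying, in case the slowness is transient.
            time.sleep(backoff_s)
    raise RuntimeError(f"{cmd[0]} failed after {attempts} attempts: {last_error}")
```

A call would look something like `run_with_retries(["hg", "clone", url, dest])`, with `url` and `dest` supplied by the task. Two caveats: the timeout only kills the direct child process, and a clone killed partway through may leave a partial destination directory, so a real wrapper would probably need to clean up between attempts.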

I don't see this happening any more.

Closed: 4 years ago
Resolution: --- → WORKSFORME