Closed Bug 1221014 Opened 9 years ago Closed 8 years ago

Intermittent [taskcluster:error] Task was aborted because states could not be created successfully. Error: connect EHOSTUNREACH

Categories: Taskcluster :: General
Type: defect
Priority: Not set
Severity: normal

Tracking: Not tracked

RESOLVED FIXED

People

(Reporter: cbook, Unassigned)


Details

(Keywords: intermittent-failure)

https://treeherder.mozilla.org/logviewer.html#?job_id=2587365&repo=mozilla-central

[taskcluster:error] Task was aborted because states could not be created successfully. Error: connect EHOSTUNREACH
     at /home/ubuntu/docker_worker/lib/states.js:63:15
     at runMicrotasksCallback (node.js:337:7)
     at process._tickDomainCallback (node.js:381:11)

 [taskcluster] Unsuccessful task run with exit code: -1 completed in 181.471 seconds
The only feature enabled is "balrogVPNProxy" (see https://tools.taskcluster.net/task-inspector/#NRq1o6ECSxS1Cdm7-0NL6w/), so the problem most likely lies with that feature.
We should probably update the docker worker error message to say which state(s) could not be created, and which features they relate to...
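The suggestion above could look roughly like the following sketch. Everything here is hypothetical: the `createStates` function, the feature objects, and their names are illustrative stand-ins, not docker-worker's actual API. The idea is simply to try each state individually so the aborting error can name the feature(s) that failed instead of a bare EHOSTUNREACH:

```javascript
// Hypothetical sketch (not docker-worker's real internals): collect
// per-feature failures so the task log names WHICH state could not be
// created, rather than only surfacing the underlying socket error.

function createStates(features) {
  const created = [];
  const failed = [];
  for (const feature of features) {
    try {
      // e.g. start the balrogVPNProxy side container for this task
      created.push(feature.init());
    } catch (err) {
      failed.push({ name: feature.name, message: err.message });
    }
  }
  if (failed.length) {
    const detail = failed
      .map((f) => `${f.name} (${f.message})`)
      .join(', ');
    throw new Error(
      'Task was aborted because states could not be created ' +
      `successfully for: ${detail}`
    );
  }
  return created;
}

// Illustrative usage: a healthy feature plus one that fails the way
// the balrog VPN proxy did while it was disabled.
const features = [
  { name: 'taskLog', init: () => 'taskLog ready' },
  {
    name: 'balrogVPNProxy',
    init: () => { throw new Error('connect EHOSTUNREACH'); },
  },
];

try {
  createStates(features);
} catch (err) {
  console.log(err.message);
}
```

With this shape, the log line would carry "balrogVPNProxy (connect EHOSTUNREACH)" instead of leaving the sheriff to guess which enabled feature was at fault.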
Cause of this is from bug 1220252 where the balrog vpn proxy was disabled.  Unfortunately the tasks have not been unscheduled.
(In reply to Greg Arndt [:garndt] from comment #3)
> Cause of this is from bug 1220252 where the balrog vpn proxy was disabled. 
> Unfortunately the tasks have not been unscheduled.

Can we unschedule these tasks, since this causes a lot of "failures" on Treeherder?
(In reply to Pete Moore [:pmoore][:pete] from comment #2)
> We should probably update the docker worker error message to say which
> state(s) could not be created, and which features they relate to...

Created bug 1221491 for this.
See Also: → 1221491
Looking good - all the above failures were on 3 November, and today is 9 November, so at least 5 days with no reports. Let's leave this open until the end of the month, and then close it if no more failures...
I am not sure why we would see any further failures with this error. It happened specifically because the proxy was disabled, and it has since been re-enabled. I don't have a strong opinion about keeping this open, but I don't expect it to happen again.
Looking at the last 14 jobs that have been starred, two had this error signature. The other starred errors I see are:

HTTPError: 400 Client Error: BAD REQUEST
ssl.SSLError: ('The read operation timed out',)
2015-11-25 14:22:19,875 - INFO - retry: Giving up on <function <lambda> at 0x7f26f571ca28>
[taskcluster:error] Task was aborted because states could not be created successfully. Error: Gateway Time-out
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED