Intermittent [taskcluster:error] Task was aborted because states could not be created successfully. Error: connect EHOSTUNREACH

RESOLVED FIXED

People

(Reporter: cbook, Unassigned)

Tracking

({intermittent-failure})

Description

3 years ago
https://treeherder.mozilla.org/logviewer.html#?job_id=2587365&repo=mozilla-central

[taskcluster:error] Task was aborted because states could not be created successfully. Error: connect EHOSTUNREACH
     at /home/ubuntu/docker_worker/lib/states.js:63:15
     at runMicrotasksCallback (node.js:337:7)
     at process._tickDomainCallback (node.js:381:11)

 [taskcluster] Unsuccessful task run with exit code: -1 completed in 181.471 seconds
The only feature enabled is "balrogVPNProxy" (see https://tools.taskcluster.net/task-inspector/#NRq1o6ECSxS1Cdm7-0NL6w/), so the problem most likely lies with that feature.
We should probably update the docker worker error message to say which state(s) could not be created, and which features they relate to...
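To illustrate the kind of error message suggested above, here is a minimal sketch, not the actual docker-worker code. It assumes each state object exposes a `featureName` and an async `create()` hook; those names, `createStates`, and the `exampleFeature` entry are all hypothetical and only serve the illustration.

```javascript
// Hypothetical sketch: these names are assumptions, not docker-worker's real API.
// Each state is assumed to look like { featureName: string, create: async function }.

async function createStates(states) {
  // Run every create hook and collect all outcomes instead of stopping
  // at the first rejection, so every failing feature can be reported.
  const results = await Promise.allSettled(
    states.map((state) => Promise.resolve().then(() => state.create()))
  );

  const failures = [];
  results.forEach((result, i) => {
    if (result.status === 'rejected') {
      const reason = result.reason;
      failures.push({
        feature: states[i].featureName,
        message: reason instanceof Error ? reason.message : String(reason),
      });
    }
  });

  if (failures.length > 0) {
    // Name each failing feature next to its underlying error.
    const detail = failures
      .map((f) => `${f.feature}: ${f.message}`)
      .join('; ');
    const err = new Error(
      'Task was aborted because states could not be created successfully. ' +
        `Failed features: ${detail}`
    );
    err.failures = failures;
    throw err;
  }
}

// Usage with made-up states; "exampleFeature" is purely illustrative.
const demoStates = [
  {
    featureName: 'balrogVPNProxy',
    create: async () => {
      throw new Error('connect EHOSTUNREACH');
    },
  },
  { featureName: 'exampleFeature', create: async () => {} },
];

createStates(demoStates).catch((err) => console.error(err.message));
```

With a message along these lines, the log would name the failing feature directly instead of requiring someone to open the task definition to see which features were enabled.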
Cause of this is from bug 1220252 where the balrog vpn proxy was disabled.  Unfortunately the tasks have not been unscheduled.
(Reporter)

Comment 4

3 years ago
(In reply to Greg Arndt [:garndt] from comment #3)
> Cause of this is from bug 1220252 where the balrog vpn proxy was disabled. 
> Unfortunately the tasks have not been unscheduled.

Can we unschedule this task, since it causes a lot of "failures" on Treeherder?
Comment hidden (Intermittent Failures Robot)
(In reply to Pete Moore [:pmoore][:pete] from comment #2)
> We should probably update the docker worker error message to say which
> state(s) could not be created, and which features they relate to...

Created bug 1221491 for this.
See Also: → bug 1221491
Looking good - all the above failures were on 3 November, and today is 9 November, so at least 5 days with no reports. Let's leave this open until the end of the month, and then close it if no more failures...
I am not sure why we would have any more failures, at least with this error. It happened specifically because the proxy was disabled, and it has since been re-enabled. I don't have a strong opinion about keeping this open, but I don't expect it to happen again.
Looking at the last 14 jobs that have been starred, it seems two had this error signature. Other errors that I see starred are:
HTTPError: 400 Client Error: BAD REQUEST 
ssl.SSLError: ('The read operation timed out',) 
2015-11-25 14:22:19,875 - INFO - retry: Giving up on <function <lambda> at 0x7f26f571ca28> 
[taskcluster:error] Task was aborted because states could not be created successfully. Error: Gateway Time-out
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED