Closed
Bug 1345071
Opened 8 years ago
Closed 8 years ago
CLOSED TREES for Taskcluster Problems on integration and aurora trees
Categories
(Taskcluster :: General, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: cbook, Unassigned)
References
Details
Taskcluster seems to be running into problems currently.
We first encountered problems in nightlies with Bug 1345066, and now:
a) Decision tasks on autoland and inbound are started twice
b) live log URLs go to a local IP like https://gzbsawyaaaavvkdmqrgveamgolg26yxji4tqrw3y5zvkrb5e.taskcluster-worker.net:32805/log/bOXwwPYrTwGDU4R-LClhVQ
jhford also mentioned issues he is seeing, so something is going on - closing the integration + try trees to investigate all of this.
Reporter
Comment 1•8 years ago
Seems we have stabilized, so reopening the closed trees, but leaving this bug open for investigation.
Severity: blocker → normal
Reporter
Comment 2•8 years ago
From jhford (posting here since he is having Bugzilla problems):
The issue seems to be an SSL problem with the uploads; the failure occurred while doing the uploads. The SSL cert for the queue works in Firefox, and I verified that cloud-mirror has a valid cert even though it is not involved in the upload chain at all.
The error messages that I was seeing:
>
> [taskcluster:error] Error uploading "public/build/host/bin/mar" artifact. timeout of 30000ms exceeded
>
> [taskcluster:error] Error uploading "public/build/target.cppunittest.tests.zip" artifact. timeout of 30000ms exceeded
>
> [taskcluster:error] Error calling 'stopped' for artifactHandler : Encountered error when uploading artifact(s)
> [taskcluster 2017-03-07 11:09:02.665Z] Unsuccessful task run with exit code: -1 completed in 2141.379 seconds
Without more information in the debugging logs for the job, I can't tell exactly what timed out; I suspect it was the S3 endpoint. Given that it seems to be working again, I think this was a momentary outage that wasn't big enough to trigger a warning on the AWS status site.
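For illustration, here is a minimal sketch (not the actual docker-worker code; the function name and signed URL parameter are assumptions) of how a 30-second per-artifact upload deadline produces "timeout of 30000ms exceeded" style errors when the storage endpoint stalls:

// Sketch only, assuming a Node 18+ worker uploading an artifact to a
// pre-signed PUT URL. If the PUT does not complete within 30 000 ms it is
// aborted, which surfaces as the upload errors quoted above.
import { readFile } from "node:fs/promises";

async function uploadArtifact(signedPutUrl: string, path: string): Promise<void> {
  const body = await readFile(path);
  const controller = new AbortController();
  // Abort the request after 30 seconds, the timeout seen in the log.
  const timer = setTimeout(() => controller.abort(), 30_000);
  try {
    const res = await fetch(signedPutUrl, {
      method: "PUT",
      body,
      signal: controller.signal,
    });
    if (!res.ok) {
      throw new Error(`Upload of ${path} failed with HTTP ${res.status}`);
    }
  } finally {
    clearTimeout(timer);
  }
}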
Comment 3•8 years ago
(In reply to Carsten Book [:Tomcat] from comment #0)
> Taskcluster seems to be running into problems currently.
>
> We first encountered problems in nightlies with Bug 1345066, and now:
>
> a) Decision tasks on autoland and inbound are started twice
They are being restarted due to failure, I think (at least from the rev in that bug).
> b) live log urls go to a local ip like
> https://gzbsawyaaaavvkdmqrgveamgolg26yxji4tqrw3y5zvkrb5e.taskcluster-worker.
> net:32805/log/bOXwwPYrTwGDU4R-LClhVQ
This is normal for running tasks.
Looking at the decision task failures:
[task 2017-03-07T08:50:05.897192Z] "PUT /queue/v1/task/dvkrOhASSAK0RYfeS6s7PQ HTTP/1.1" 200 378
[task 2017-03-07T09:04:30.756152Z] "PUT /queue/v1/task/Y2iohALXQBudfZttuNIVVQ HTTP/1.1" 400 10310
and that 400 was due, basically, to the PUT taking too long (more than 15 minutes). This operation does not depend on S3, but it does depend on the network between the Heroku ELBs and the workers. The result is the same, though: something failed in one of our providers' systems, and it is now resolved.
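For illustration, a minimal sketch (assumptions: a Node-style client, the path pattern taken from the log lines above, and a client-side deadline just under the 15-minute limit; this is not the decision task or queue implementation) of the createTask PUT that came back with the 400:

// Sketch only: create a task by PUTting its definition to the queue, as in
// "PUT /queue/v1/task/Y2iohALXQBudfZttuNIVVQ" above. If the request body is
// still trickling in after ~15 minutes, the service side gives up and
// answers 400, as happened here.
async function createTask(queueBaseUrl: string, taskId: string, taskDef: object): Promise<void> {
  const controller = new AbortController();
  // Fail locally a little before the assumed server-side limit so the caller
  // sees a clear client-side timeout instead of an opaque 400.
  const timer = setTimeout(() => controller.abort(), 14 * 60 * 1000);
  try {
    const res = await fetch(`${queueBaseUrl}/queue/v1/task/${taskId}`, {
      method: "PUT",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(taskDef),
      signal: controller.signal,
    });
    if (!res.ok) {
      throw new Error(`Queue rejected task ${taskId}: HTTP ${res.status} ${await res.text()}`);
    }
  } finally {
    clearTimeout(timer);
  }
}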
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)