No instances for platforms with pending

RESOLVED FIXED

Status

Taskcluster
General

People

(Reporter: dustin, Assigned: jhford)

Attachments

(1 attachment)

(Reporter)

Description

a year ago
Created attachment 8783978 [details]
Screenshot from 2016-08-23 10-33-06.png

We have 12 pending tasks for taskcluster-images and no instances. Build and test are accumulating large pending counts, too.
(Reporter)

Comment 1

a year ago
Aug 23 09:33:57 taskcluster-aws-provisioner2 app/provisioner.1: 14:33:57.400Z  INFO aws-provisioner: got pending tasks count (workerType=taskcluster-images, pendingTasks=12) 
Aug 23 09:33:57 taskcluster-aws-provisioner2 app/provisioner.1: 14:33:57.401Z  INFO aws-provisioner: desired change is within limits (runningCapacity=0, pendingCapacity=0, pendingTasks=12, idealChange=12, actualChange=12) 
Aug 23 09:33:57 taskcluster-aws-provisioner2 app/provisioner.1: 14:33:57.401Z  INFO aws-provisioner: determined change (workerType=taskcluster-images, change=12) 

Yet, according to the logs, it doesn't even attempt to start any instances.
Assignee: nobody → jhford
Comment 2

This is fallout from code that I landed a little earlier. I've rolled Heroku back to last night's copy to restore service, and I'll retry the landing tomorrow.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Comment 3

Once you discover what went wrong with the code that landed earlier, could you update this bug with the root cause?
(Reporter)

Comment 4

a year ago
Let's keep this open at least until the trees are re-opened, and preferably until the root cause is documented.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 5

Trees reopened after the gecko decision tasks started running.

Comment 6

Comment 5 was meant to go into bug 1297451.

Comment 7

Root cause:
https://github.com/taskcluster/aws-provisioner/pull/114/commits/c772d4d1aaff5d1a3127c924b7558fae7973a085#diff-a28f5109b445a26ce3277f34a1ef34f6L152

I switched from `Array.indexOf(x) === -1` to `_.includes(x)`, but it should have been `!_.includes(x)`. With the negation missing, the provisioner tried to spawn only *invalid* worker types instead of only valid ones, which is why no instances were started for the worker types with pending tasks.
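
For illustration only, here is a minimal sketch of how dropping the negation inverts the check. It is not the provisioner's actual code: the worker type lists and variable names are hypothetical, and the native `Array.prototype.includes` stands in for lodash's `_.includes` so the snippet runs without dependencies.

```javascript
// Hypothetical data; these lists are illustrative, not taken from aws-provisioner.
const invalidWorkerTypes = ["broken-type"];
const workerTypes = ["taskcluster-images", "broken-type"];

// Original form: keep a worker type only when it is NOT in the invalid list.
const original = workerTypes.filter(
  (wt) => invalidWorkerTypes.indexOf(wt) === -1
);

// Buggy rewrite: the "=== -1" was replaced without a negation,
// so only the invalid worker types pass the filter.
const buggy = workerTypes.filter((wt) => invalidWorkerTypes.includes(wt));

// Intended rewrite: negate the membership test to match the original.
const fixed = workerTypes.filter((wt) => !invalidWorkerTypes.includes(wt));

console.log(original); // [ 'taskcluster-images' ]
console.log(buggy);    // [ 'broken-type' ]
console.log(fixed);    // [ 'taskcluster-images' ]
```

Under these assumptions, the buggy form selects exactly the complement of what the original selected, which would match the symptom of a computed change of 12 with no spawn attempts.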

Dustin, is the current tree closure this bug? Bug 1297451 seems to have happened after this one, so if 1297451 is a second tree closure that started after this one, the trees must have reopened between this bug and 1297451.

Can we mark this FIXED?
(Reporter)

Comment 8

a year ago
Thanks!  Yes, I'm not sure what time the trees re-opened for this bug, but it was shortly after comment 4, and you're right that bug 1297451 is a separate, subsequent event.
Status: REOPENED → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED