Bug 1519849 Comment 0 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

The `claimWork` endpoint should hold requests for 20s, since this endpoint implements long polling.

Sometimes this isn't happening.

I've been able to reproduce this locally.

```
$ date; echo; echo '{"workerGroup": "mdc1", "workerId": "t-yosemite-r7-380", "tasks": 1}' | curl -v --header 'Content-Type: application/json' --request POST --data @- http://localhost:8080/queue/v1/claim-work/releng-hardware/gecko-t-osx-1010-beta; echo; echo; date
Mon 14 Jan 2019 15:50:57 CET

Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> POST /queue/v1/claim-work/releng-hardware/gecko-t-osx-1010-beta HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.54.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 68
> 
* upload completely sent off: 68 out of 68 bytes
< HTTP/1.1 200 OK
< Access-Control-Allow-Headers: X-Requested-With,Content-Type,Authorization,Accept,Origin,Cache-Control
< Access-Control-Allow-Methods: OPTIONS,GET,HEAD,POST,PUT,DELETE,TRACE,CONNECT
< Access-Control-Allow-Origin: *
< Access-Control-Max-Age: 900
< Access-Control-Request-Method: *
< Cache-Control: no-store no-cache must-revalidate
< Connection: keep-alive
< Content-Length: 17
< Content-Security-Policy: report-uri /__cspreport__;default-src 'none';frame-ancestors 'none';
< Content-Type: application/json; charset=utf-8
< Date: Mon, 14 Jan 2019 14:50:57 GMT
< Etag: W/"11-IIixY78GOTb7jUPZdbQZ7mNo6gk"
< Server: Cowboy
< Via: 1.1 vegur
< X-Content-Type-Options: nosniff
< X-For-Request-Id: 130321b2-6e93-499c-ad03-832e2e8eb7d9
< X-Taskcluster-Endpoint: https://queue.taskcluster.net/v1/claim-work/releng-hardware/gecko-t-osx-1010-beta
< X-Taskcluster-Proxy-Perm-Clientid: mozilla-ldap/pmoore@mozilla.com/dev
< X-Taskcluster-Proxy-Revision: 
< X-Taskcluster-Proxy-Version: 5.0.1
< 
{
  "tasks": []
* Connection #0 to host localhost left intact
}

Mon 14 Jan 2019 15:50:57 CET
```

Here we see this request returns a 200 HTTP response with zero tasks in less than a second.
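
To make the early return easier to spot than by comparing two `date` outputs by eye, the timing check could be wrapped in a small helper. This is only a sketch: `check_long_poll` is a hypothetical function name, not part of any Taskcluster tooling, and the 20s threshold is taken from the expected hold time described above. It runs whatever command it is given and reports whether that command came back sooner than the threshold.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Taskcluster): run a command and report
# whether it returned before the expected long-poll hold time.
#
# Usage: check_long_poll <min_seconds> <command> [args...]
check_long_poll() {
  local min_seconds="$1"
  shift
  local start end elapsed
  start=$(date +%s)
  # Discard the command's output; only the elapsed time matters here.
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  elapsed=$((end - start))
  if [ "$elapsed" -lt "$min_seconds" ]; then
    echo "EARLY: returned after ${elapsed}s (expected >= ${min_seconds}s)"
    return 1
  fi
  echo "OK: held for ${elapsed}s"
}

# Example invocation against the same local proxy URL as above:
#   check_long_poll 20 curl --silent \
#     --header 'Content-Type: application/json' \
#     --data '{"workerGroup": "mdc1", "workerId": "t-yosemite-r7-380", "tasks": 1}' \
#     http://localhost:8080/queue/v1/claim-work/releng-hardware/gecko-t-osx-1010-beta
```

Timing uses whole seconds from `date +%s`, so the measurement can be off by up to a second. That is coarse, but enough to tell a sub-second response from a 20s hold.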

Note, this is especially weird (_wrong_) because the Queue claims there are no workers of this worker type:

```
$ curl 'http://localhost:8080/queue/v1/provisioners/releng-hardware/worker-types/gecko-t-osx-1010-beta/workers'
{
  "workers": []
}
```

Even if that worker were quarantined, it should still show up in the list of workers, since we have just called the `claimWork` endpoint for it.

So something is going wrong in the Queue.


Note - both of the above requests were made via the `taskcluster-proxy` with sufficient scopes.
