Closed Bug 1474953 Opened 7 years ago Closed 5 years ago

a workers' recent tasks may not exist

Categories

(Taskcluster :: UI, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: dhouse, Assigned: hassan)

Details

(Whiteboard: taskcluster-web)

I have a worker that is not getting tasks when it requests them (and there are tasks in the queue). Please remove/reset/clear this worker's definition in Taskcluster so that it does not have a missing task ID in recentTasks.

Viewed from the worker explorer, https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/mdc1/t-yosemite-r7-415, this error is displayed:

```
Error ResourceNotFound
eXaoNqntTqirwYAIuPYzcQ does not correspond to a task that exists.
Are you sure this task has already been submitted?
```

Here is the current worker state from the queue API:

```
{
  "workerType": "gecko-t-osx-1010",
  "provisionerId": "releng-hardware",
  "workerId": "t-yosemite-r7-415",
  "workerGroup": "mdc1",
  "recentTasks": [
    { "taskId": "HuE1-zbqQsWfcnHqr6ZZaQ", "runId": 0 },
    { "taskId": "O4h25IxNQF6oNwHJTtEgRw", "runId": 0 },
    { "taskId": "LxvEBScLRICJyX9hcL-FIw", "runId": 0 },
    { "taskId": "Z2oqfWayS4GMAk7o780hZg", "runId": 0 },
    { "taskId": "eXaoNqntTqirwYAIuPYzcQ", "runId": 0 },
    { "taskId": "FBcNC4elRm-OFIQaMc2G4w", "runId": 0 },
    { "taskId": "dGhh_iXdTu6rrfKfMeRXjw", "runId": 0 },
    { "taskId": "HfDBWMjDQESoxI8k2E0TxQ", "runId": 0 },
    { "taskId": "WU-P-dBkSCGuqbGy2nzLxQ", "runId": 0 },
    { "taskId": "aFNOVf4tRsmczcovtMjgAQ", "runId": 0 },
    { "taskId": "aDii_Q6STnK61ZzgiY674w", "runId": 0 },
    { "taskId": "Y8l6yREoSDG91Kj5Soth-Q", "runId": 0 },
    { "taskId": "TTBLg2LGS5S2-LBK8CcxbQ", "runId": 0 },
    { "taskId": "Pto0whIXQTyEKkiNBu4T9A", "runId": 0 },
    { "taskId": "CBnIRFlQSkerkeA4sUv0Cg", "runId": 0 },
    { "taskId": "MSbGeKb9RqOZWwJn1rUf6A", "runId": 0 },
    { "taskId": "Eif3Usc9SE2eoCrWHrnWnA", "runId": 0 },
    { "taskId": "X6Hxk3TkSiC_-I_TLcRuow", "runId": 0 },
    { "taskId": "BZCmtql-Tm2h_iDJyaL_vw", "runId": 0 },
    { "taskId": "JaMpPUB2TqaNqdW_tqX26w", "runId": 0 }
  ],
  "expires": "2018-07-12T15:47:27.437Z",
  "firstClaim": "2018-05-10T13:33:11.530Z",
  [...]
]
```
That's just a tools bug -- that error should be swallowed and the UI should omit the row or show it as missing. Presumably that task has expired. The only state stored in the queue for a worker is this sort of advisory stuff used for debugging via the UI -- none of it affects claiming. So something else is wrong with the claiming process.
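A minimal Python sketch of the behaviour suggested here, not the taskcluster-tools or taskcluster-web code. It assumes a hypothetical `fetch_status(task_id)` callable that raises a hypothetical `TaskNotFound` exception when the queue answers 404, and assumes the status response has the shape `{"status": {"state": ...}}`. Each recentTasks entry is looked up on its own, so one expired task becomes a "missing" row instead of an error for the whole worker page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


class TaskNotFound(Exception):
    """Hypothetical error for a queue 404 (ResourceNotFound) on a task lookup."""


@dataclass
class RecentTaskRow:
    task_id: str
    run_id: int
    state: Optional[str]  # None marks a task the queue no longer knows about


def recent_task_rows(
    recent_tasks: List[Dict],
    fetch_status: Callable[[str], Dict],
) -> List[RecentTaskRow]:
    """Build one display row per recentTasks entry, tolerating missing tasks."""
    rows: List[RecentTaskRow] = []
    for entry in recent_tasks:
        task_id, run_id = entry["taskId"], entry["runId"]
        try:
            status = fetch_status(task_id)
        except TaskNotFound:
            # Swallow the per-task 404 and keep the row, marked as missing,
            # instead of failing the whole worker view.
            rows.append(RecentTaskRow(task_id, run_id, None))
            continue
        # Assumed response shape: {"status": {"state": ...}}
        rows.append(RecentTaskRow(task_id, run_id, status["status"]["state"]))
    return rows


def render(rows: List[RecentTaskRow]) -> List[str]:
    """Format rows as text; missing tasks are labelled rather than omitted."""
    return [f"{r.task_id} run {r.run_id}: {r.state or 'missing (expired?)'}" for r in rows]
```

Showing the row as missing (rather than dropping it) keeps the recentTasks count visible, which is useful when debugging a worker like this one.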
Component: Operations → Tools
Summary: Invalid task id. Please clear/remove worker definition. → a workers' recent tasks may not exist
I'm okay with the tasks not existing as long as that is not breaking the claiming process. From the worker logs, it looks like generic-worker is requesting tasks and getting nothing (despite tasks being in the queue). What might cause that?
We looked in the logs and were unable to answer the question in comment 2. But let's leave this open to fix the UI issue.
This worker is taking tasks now. It "reclaimed" a task soon after I created this bug and has been running tasks since then, but the Tools UI still shows the error. I expect that once the bad taskId "scrolls" out of the recent tasks list, the UI will display this worker correctly.
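The expectation that the bad taskId will eventually scroll out holds if `recentTasks` behaves as a bounded, first-in-first-out list. This sketch assumes a limit of 20 (both recentTasks dumps in this bug contain exactly 20 entries) and assumes new claims are appended at the end; neither detail is confirmed elsewhere in this bug, and the function names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict

# Assumption: inferred from the two 20-entry recentTasks dumps in this bug.
RECENT_TASKS_LIMIT = 20


def new_recent_tasks() -> Deque[Dict]:
    """A bounded FIFO: once full, each append evicts the oldest entry."""
    return deque(maxlen=RECENT_TASKS_LIMIT)


def record_claim(recent: Deque[Dict], task_id: str, run_id: int) -> None:
    """Record a claimed task; assumed ordering is newest at the end."""
    recent.append({"taskId": task_id, "runId": run_id})
```

Under this model, a bad entry disappears after at most 20 further claims; if the real cap differs, only the constant changes.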
I see this problem happened again for another worker: https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/mdc1/t-yosemite-r7-357

```
Error ResourceNotFound
eTZxpnbWRxWqU825BrrRkQ does not correspond to a task that exists.
Are you sure this task exists?

{
  "method": "status",
  "params": {
    "taskId": "eTZxpnbWRxWqU825BrrRkQ"
  },
  "payload": {},
  "time": "2018-07-26T15:14:37.308Z"
}
```

```
$ wget -O - https://queue.taskcluster.net/v1/task/eTZxpnbWRxWqU825BrrRkQ/status
--2018-07-26 09:20:47-- https://queue.taskcluster.net/v1/task/eTZxpnbWRxWqU825BrrRkQ/status
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving queue.taskcluster.net (queue.taskcluster.net)... 184.72.216.59, 50.19.109.135, 50.16.233.7
Connecting to queue.taskcluster.net (queue.taskcluster.net)|184.72.216.59|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2018-07-26 09:20:48 ERROR 404: Not Found.
```

```
$ wget -q -O - https://queue.taskcluster.net/v1/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/mdc1/t-yosemite-r7-357
{
  "workerType": "gecko-t-osx-1010",
  "provisionerId": "releng-hardware",
  "workerId": "t-yosemite-r7-357",
  "workerGroup": "mdc1",
  "recentTasks": [
    { "taskId": "eTZxpnbWRxWqU825BrrRkQ", "runId": 0 },
    { "taskId": "TvfbKwmvSj29Q-kHS9kiyQ", "runId": 0 },
    { "taskId": "DqRilpRuSyeGQgUQTkKPCQ", "runId": 0 },
    { "taskId": "BsNAhcmhQjuCldEOlqSRkg", "runId": 0 },
    { "taskId": "SdiZaW9_T9mh816wP5_JZQ", "runId": 0 },
    { "taskId": "aRtdkQUJSAu2Ff2g59f73Q", "runId": 0 },
    { "taskId": "aelzKWQkS52xSg8HyTZKbQ", "runId": 0 },
    { "taskId": "fwRJXdPXT3GQKG0nUsg1PA", "runId": 0 },
    { "taskId": "aRbAnwa7Q8OWhegJWQ0pPQ", "runId": 0 },
    { "taskId": "PSUmmEg2Te-ifiFi8B7jGA", "runId": 0 },
    { "taskId": "GKFj3kjORwSbGkrJ-8PSGw", "runId": 0 },
    { "taskId": "Vj3wO7H6Q5-R7Je9IuYPsw", "runId": 0 },
    { "taskId": "FJchuXl4TV-OnVCmtTJycA", "runId": 0 },
    { "taskId": "FG6NKFhzTYeCW2CRqQ43nA", "runId": 0 },
    { "taskId": "JJ0DETYPQRiKik94R-KvTg", "runId": 0 },
    { "taskId": "Pg1tJeoMTPioZnVNKUSrMA", "runId": 0 },
    { "taskId": "RAZRPgoZQ-qHf8H9LfBSLg", "runId": 0 },
    { "taskId": "QEFHAK_zRsidfTgLsGB1Ww", "runId": 0 },
    { "taskId": "J2hVj6GGQwqXKxbk8MkZIg", "runId": 0 },
    { "taskId": "SJas7EHxTP6PlxPb_r9nmw", "runId": 0 }
  ],
  "expires": "2018-06-13T13:17:02.366Z",
  "firstClaim": "2017-10-18T18:27:29.898Z",
  "quarantineUntil": "3018-06-12T13:29:07.000Z",
  "actions": [
    {
      "name": "ping",
      "title": "ping",
      "context": "worker",
      "url": "https://roller1.srv.releng.mdc1.mozilla.com:443/api/v1/workers/<workerId>/jobs?provisioner_id=<provisionerId>&worker_type=<workerType>&worker_group=<workerGroup>&task_name=ping",
      "method": "POST",
      "description": "ping server"
    },
    {
      "name": "reboot",
      "title": "reboot",
      "context": "worker",
      "url": "https://roller1.srv.releng.mdc1.mozilla.com:443/api/v1/workers/<workerId>/jobs?provisioner_id=<provisionerId>&worker_type=<workerType>&worker_group=<workerGroup>&task_name=reboot",
      "method": "POST",
      "description": "reboot hardware"
    }
  ]
}
```
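The manual `wget` check above can be repeated for every entry in a worker's recentTasks. A minimal Python sketch using only the standard library, assuming the legacy `https://queue.taskcluster.net/v1` root used in this bug (that deployment may no longer respond); the helper names are hypothetical, and the HTTP getter is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Root URL taken from the wget commands in this bug; it may no longer be served.
QUEUE_ROOT = "https://queue.taskcluster.net/v1"


def status_url(task_id: str, root: str = QUEUE_ROOT) -> str:
    """Same endpoint as the wget status check above."""
    return f"{root}/task/{task_id}/status"


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET of url (4xx/5xx returned, not raised)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def missing_task_ids(
    worker: Dict,
    get_status: Callable[[str], int] = http_status,
) -> List[str]:
    """List recentTasks taskIds whose status endpoint answers 404."""
    return [
        entry["taskId"]
        for entry in worker.get("recentTasks", [])
        if get_status(status_url(entry["taskId"])) == 404
    ]
```

To use it, pass the parsed JSON from the worker URL above (for example `json.load(urllib.request.urlopen(url))`); for t-yosemite-r7-357 it would be expected to flag eTZxpnbWRxWqU825BrrRkQ, matching the 404 shown by wget.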
Per earlier discussion, this is a UI bug, and it's not fixed yet so not surprising it still occurs :)
This has been implemented in taskcluster-tools but still needs to be done in taskcluster-web. I can take care of the rest. Thank you Dave for the pull request :)
Assignee: nobody → helfi92
Whiteboard: taskcluster-web
Component: Tools → UI and Tools

Dave: is this still happening in the new UI?

Flags: needinfo?(dhouse)

Closing this sounds good.

I think this was a problem for CIDuty when they could not view workers in the UI because the task data had been dropped (and the UI would fail the entire worker view/page when it could not load the tasks).

Flags: needinfo?(dhouse)
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME