Closed
Bug 1475711
Opened 7 years ago
Closed 6 years ago
tasks in queue, but worker does not get tasks
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: dhouse, Unassigned)
References
Details
Sometimes a worker is running and requesting tasks, but it does not get tasks despite there being tasks in the queue.
Please record details when this happens. We need to know at what time the queue was checked, which worker was not taking work, and that the worker was polling for tasks.
Once we have some examples recorded, we can report the issue to the Taskcluster team for investigation.
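To keep such records consistent, the three facts requested above could be captured in a small structure. This is only a sketch: the class name, field names, and example values are hypothetical and do not come from Taskcluster or generic-worker.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass
class IdleWorkerObservation:
    """One recorded sighting of a worker that is not getting tasks.

    All field names here are hypothetical, chosen for illustration.
    """
    checked_at: datetime       # when the queue was checked
    worker_id: str             # which worker was not taking work
    pending_tasks: int         # pending count seen at that time
    worker_was_polling: bool   # whether logs showed the worker polling
    notes: str = ""

    def to_json(self) -> str:
        # Normalize the timestamp to UTC so records from different
        # time zones can be compared directly.
        record = asdict(self)
        record["checked_at"] = (
            self.checked_at.astimezone(timezone.utc).isoformat()
        )
        return json.dumps(record, sort_keys=True)


# Example with made-up values (not taken from a real incident).
observation = IdleWorkerObservation(
    checked_at=datetime(2018, 1, 1, 6, 0, tzinfo=timezone.utc),
    worker_id="example-worker-001",
    pending_tasks=10,
    worker_was_polling=True,
)
print(observation.to_json())
```

Storing the timestamp in UTC avoids ambiguity when observations are reported from different local time zones.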
Comment 1•7 years ago
I've just checked the Windows moonshots, as we have 1500 pending tasks. The current time is 9:00 am UTC+3.
The following workers were lazy, meaning they had not taken a task for at least 60 minutes despite the queue mentioned above:
T-W1064-MS-063 T-W1064-MS-066
T-W1064-MS-068 T-W1064-MS-069
T-W1064-MS-075 T-W1064-MS-081
T-W1064-MS-089 T-W1064-MS-106
T-W1064-MS-107 T-W1064-MS-110
T-W1064-MS-115 T-W1064-MS-123
T-W1064-MS-127 T-W1064-MS-129
T-W1064-MS-134 T-W1064-MS-157
T-W1064-MS-161 T-W1064-MS-162
T-W1064-MS-164 T-W1064-MS-165
T-W1064-MS-167 T-W1064-MS-168
T-W1064-MS-169 T-W1064-MS-220
T-W1064-MS-256 T-W1064-MS-264
T-W1064-MS-265 T-W1064-MS-294
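The "lazy" check used above (no task taken for at least 60 minutes while tasks are pending) can be sketched as a small filter. This is an illustration under assumptions: the function name and its inputs are hypothetical, and how the last-task timestamps are obtained is not specified in this bug.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Mapping


def find_lazy_workers(
    last_task_taken: Mapping[str, datetime],
    now: datetime,
    pending_tasks: int,
    threshold: timedelta = timedelta(minutes=60),
) -> List[str]:
    """Return workers idle for at least `threshold` while tasks are pending.

    `last_task_taken` maps a worker name to the time it last took a task.
    Where that data comes from is an assumption left to the caller.
    """
    # With an empty queue, an idle worker is expected, not lazy.
    if pending_tasks <= 0:
        return []
    return sorted(
        worker
        for worker, taken_at in last_task_taken.items()
        if now - taken_at >= threshold
    )


# Example with made-up worker names and times.
now = datetime(2018, 1, 1, 6, 0, tzinfo=timezone.utc)
last_seen = {
    "example-worker-001": now - timedelta(minutes=30),
    "example-worker-002": now - timedelta(minutes=120),
}
print(find_lazy_workers(last_seen, now, pending_tasks=1500))
```

Checking the pending count first keeps workers from being flagged during periods when there is simply no work to claim.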
Comment 2•7 years ago
I rebooted all of them, and a little over half are still lazy. I've checked the Papertrail logs for the ones that didn't recover. To help dhouse and the Taskcluster team, I'll leave a few of the machines untouched so they can troubleshoot the problem themselves. I'll go ahead and re-image the rest.
Untouched workers after reboot:
T-W1064-MS-066 - Simple worker issue: doesn't pick up tasks after reboot.
https://papertrailapp.com/systems/1737001651/events
T-W1064-MS-068 - Hasn't reported to Papertrail since July 13th.
https://papertrailapp.com/systems/1670078631/events
T-W1064-MS-089 - Seems to be looking for jobs to take. From Papertrail: generic-worker: Checking for C:\dsc\task-claim-state.valid file... #015
https://papertrailapp.com/systems/1644165731/events
T-W1064-MS-106 - An example of a worker becoming lazy after its last task completed as an exception.
https://papertrailapp.com/systems/1730304451/events
I want to check the logs to see whether this problem has recurred. I have also heard of it happening on Windows workers, and it may be an issue that affects specific workers for periods of time.
Flags: needinfo?(dhouse)
We do not have the historic data for the taskcluster queues to easily check this right now.
If we see this happen, we can record it and investigate more.
Flags: needinfo?(dhouse)
Comment 5•6 years ago
Hey Dave, can this be closed? I don't think this issue has been occurring lately.
Flags: needinfo?(dhouse)
(In reply to Zsolt Fay [:zfay] from comment #5)
> Hey Dave, can this be closed? I don't think this issue has been occurring lately.
Sure. I'll close it out. Thanks!
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(dhouse)
Resolution: --- → WORKSFORME
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard