Bug 1493467
Opened 7 years ago
Closed 7 years ago
[MDC1] t-w1064-ms-036 not picking up tasks
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: zfay, Assigned: markco)
Not exactly sure what is happening. The worker is not appearing in TC; I tried to re-image it 2 days ago and it hasn't recovered. The Papertrail logs show the following for generic-worker:
Sep 22 16:11:16 T-W1064-MS-036.mdc1.mozilla.com generic-worker: Checking for C:\dsc\task-claim-state.valid file... #015
Sep 22 16:11:17 T-W1064-MS-036.mdc1.mozilla.com generic-worker: Checking for C:\dsc\task-claim-state.valid file... #015
Sep 22 16:11:18 T-W1064-MS-036.mdc1.mozilla.com generic-worker: Checking for C:\dsc\task-claim-state.valid file... #015
Comment 1 (Reporter) • 7 years ago
@pmoore I cc'ed you here as well to keep you in the loop. Not sure yet what the issue is or how much it has to do with generic-worker.
Updated (Assignee) • 7 years ago
Assignee: relops → mcornmesser
Comment 2 (Assignee) • 7 years ago
This indicates that OCC either did not complete or never ran.
rundsc.ps1 creates the C:\dsc\task-claim-state.valid file here:
https://github.com/mozilla-releng/OpenCloudConfig/blob/fb0c0f3dd021cee4c02c82feb3714ce38d0c2a82/userdata/rundsc.ps1#L1394
And the generic-worker wrapper script looks for it here:
https://github.com/mozilla-releng/OpenCloudConfig/blob/fb0c0f3dd021cee4c02c82feb3714ce38d0c2a82/userdata/Configuration/GenericWorker/run-hw-generic-worker-10-and-reboot.bat#L43
I am going to kick off a new reimage. If it fails, we have the time of this comment as a marker for where to start looking in the logs.
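The handshake described in this comment, where rundsc.ps1 writes a flag file and the wrapper waits for it, can be sketched as a simple polling loop. This is a minimal illustration in Python, not the OCC code (the real wrapper is a batch script). The function name `wait_for_sentinel`, the poll interval, and the optional timeout are all assumptions added for the example.

```python
import os
import time


def wait_for_sentinel(path, poll_seconds=1.0, timeout_seconds=None, log=print):
    """Poll until `path` exists.

    Returns True once the file is present, or False if `timeout_seconds`
    elapses first. With no timeout the loop runs until the file appears.
    """
    start = time.monotonic()
    while not os.path.exists(path):
        # Mirrors the message seen in the logs; the wording is copied from there.
        log(f"Checking for {path} file...")
        if timeout_seconds is not None and time.monotonic() - start >= timeout_seconds:
            return False
        time.sleep(poll_seconds)
    return True
```

If the flag file is never written, a loop like this without a timeout keeps logging the same line indefinitely. That matches the repeated "Checking for" entries in the description.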
Comment 3 (Assignee) • 7 years ago
The install has completed and the worker is now looking for a task:
Sep 23 15:03:03 T-W1064-MS-036.mdc1.mozilla.com generic-worker: 2018/09/23 22:03:02 Disk available: 43561398272 bytes#015
Sep 23 15:03:23 T-W1064-MS-036.mdc1.mozilla.com generic-worker: 2018/09/23 22:03:23 No task claimed. Idle for 1m12.1304384s (will exit if no task claimed in 1h58m47.8695616s).#015
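The two durations in the "No task claimed" line can be added together to recover the worker's idle budget. The sketch below is a hypothetical helper, not part of generic-worker. It parses the Go-style durations (only the h, m, and s units that appear in this log) and sums them. For the line above, the total comes to about 7200 seconds, i.e. 2 hours.

```python
import re

UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}
DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)([hms])")
IDLE_LINE = re.compile(
    r"No task claimed\. Idle for ([^\s)]+) "
    r"\(will exit if no task claimed in ([^\s)]+)\)"
)


def parse_duration(text):
    """Convert a duration such as '1h58m47.8695616s' to seconds."""
    parts = DURATION_PART.findall(text)
    # Reject anything the pattern did not fully consume (e.g. 'ms' units).
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(float(num) * UNIT_SECONDS[unit] for num, unit in parts)


def idle_budget(line):
    """Return (idle, remaining, total) in seconds, or None if the line does not match."""
    match = IDLE_LINE.search(line)
    if match is None:
        return None
    idle = parse_duration(match.group(1))
    remaining = parse_duration(match.group(2))
    return idle, remaining, idle + remaining
```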
Comment 4 (Assignee) • 7 years ago
It looks like it is behaving now:
https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-win10-64-hw/workers/mdc1/T-W1064-MS-036
Updated (Assignee) • 7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 5 • 7 years ago
Re-opened the ticket because the machine hasn't taken any jobs for more than 8 hours.
I've rebooted it. I will check it again later.
https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-win10-64-hw/workers/mdc1/T-W1064-MS-036
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 6 (Assignee) • 7 years ago
This node is up and running now.
It looks like it spent 12 hours trying to recover:
https://papertrailapp.com/groups/1141234/events?q=ms-036%20AND%20Generic-worker.exe%20has%20not%20started%20within%20the%20expected%20time&focus=988477445416062984
Then recovered here:
https://papertrailapp.com/systems/2115494792/events?focus=988477156034240528&selected=988477156034240528
I will need to catch a node while it is still looping in order to troubleshoot this issue.
Closing this bug for now.
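One way to catch a node while it is still looping would be to scan exported log lines for repeated flag-file checks. The sketch below is only an illustration under stated assumptions. It assumes syslog-style lines shaped like the ones in the description (month, day, time, host, message), and the helper names and the threshold of 3 are hypothetical choices, not an existing RelOps tool.

```python
from collections import Counter

# Message fragment taken from the log lines in the bug description.
MARKER = "Checking for C:\\dsc\\task-claim-state.valid file"


def count_sentinel_waits(lines):
    """Count, per host, how many lines report waiting on the flag file."""
    counts = Counter()
    for line in lines:
        # Fields: month, day, time, host, remainder of the message.
        fields = line.split(None, 4)
        if len(fields) == 5 and MARKER in fields[4]:
            counts[fields[3]] += 1
    return counts


def looping_hosts(lines, threshold=3):
    """Return the sorted hosts that logged the wait message at least `threshold` times."""
    counts = count_sentinel_waits(lines)
    return sorted(host for host, seen in counts.items() if seen >= threshold)
```

Run against a recent log window, a helper like this would list hosts that are still waiting on the flag file, which is the state that needs to be inspected live.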
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED