Closed
Bug 1498226
Opened 6 years ago
Closed 6 years ago
Windows machines are not taking jobs
Categories
(Infrastructure & Operations :: RelOps: OpenCloudConfig, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: noemi_erli, Assigned: grenade)
Details
Trees are closed because of this
Push with pending tasks: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=pending,testfailed,busted,exception&classifiedState=unclassified&group_state=expanded&searchStr=windows&revision=6c4930dac581c9d9acc581d66047d3a0d520e965&selectedJob=204753706
Artifact downloaded with 0 bytes: https://treeherder.mozilla.org/logviewer.html#?job_id=204780368&repo=autoland&lineNumber=460-462
Comment 1 (Assignee) • 6 years ago
i made an occ change today affecting the windows hosts file (bug 1497308, https://github.com/mozilla-releng/OpenCloudConfig/commit/82d15f355275a5b78427ebaacc55a37014f5512c).
if these errors are caused by a failure to communicate with taskcluster proxy (i need help from someone who knows), then my change could be responsible.
Comment 2 (Assignee) • 6 years ago
i think it's safe to say the hosts file change did not cause the backlog, so comment 1 can be ignored. i tested just now with this task:
https://tools.taskcluster.net/groups/FZTzsiTSQ82mzd5qWdIcgQ/tasks/FZTzsiTSQ82mzd5qWdIcgQ/runs/0/logs/public%2Flogs%2Flive.log
Comment 3 (Assignee) • 6 years ago
papertrail logs indicate that instances are rebooting after generic-worker (gw) creates the task user, but the occ lock file has not yet been deleted at the time of the first reboot. i think this is fallout from another change of mine (bug 1479889, https://github.com/mozilla-releng/OpenCloudConfig/commit/a5ac4e7).
i think that while we were still formatting the z: drive, the machine stayed up long enough to delete the lock file. now that we've removed the drive formatting, a race condition exists where the machine reboots with the lock file intact.
i have added a simple patch to the gw wrapper script: https://github.com/mozilla-releng/OpenCloudConfig/commit/4c854ad
this should get the workers taking jobs again.
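For readers unfamiliar with this kind of race, one common way to close it is to hold off the reboot until the lock file is gone. The sketch below illustrates that idea only; it is not a reproduction of the patch in commit 4c854ad, and the function names, the lock file path, the polling approach, and the timeout values are all hypothetical.

```python
import os
import time


def wait_for_lock_release(lock_path, timeout_s=300.0, poll_s=1.0):
    """Return True once lock_path no longer exists, False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while os.path.exists(lock_path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True


def reboot_when_safe(lock_path, reboot, timeout_s=300.0, poll_s=1.0):
    """Call reboot() only after the lock file has been removed.

    `reboot` is any zero-argument callable (hypothetical; on a real worker
    it would trigger the actual restart). Returns True if reboot() was
    called, False if the lock was still present when the timeout expired.
    """
    if wait_for_lock_release(lock_path, timeout_s, poll_s):
        reboot()
        return True
    return False
```

With a guard of this shape, a machine that is still holding the lock simply delays its reboot instead of coming back up with a stale lock file. What to do when the timeout expires (log and retry, or force the reboot) is a separate decision that this sketch leaves to the caller.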
Updated (Assignee) • 6 years ago
Assignee: nobody → rthijssen
Status: NEW → ASSIGNED
Component: Operations → Relops: OpenCloudConfig
Product: Taskcluster → Infrastructure & Operations
QA Contact: rthijssen
Comment 4 (Assignee) • 6 years ago
patch appears to have worked. instances are taking work again.
Updated • 6 years ago
Summary: Windows machines are not taking jobs - artifacts are not being build → Windows machines are not taking jobs
Comment 5 • 6 years ago
Trees reopened at 15:23 UTC, except autoland, which is waiting for build and test coverage.
Comment 6 • 6 years ago
Autoland reopened at 18:02 UTC.
Updated (Assignee) • 6 years ago
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED