Closed Bug 1498226 Opened 6 years ago Closed 6 years ago

Windows machines are not taking jobs

Categories

(Infrastructure & Operations :: RelOps: OpenCloudConfig, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: noemi_erli, Assigned: grenade)

Details

i made an occ change today affecting the windows hosts file (bug 1497308, https://github.com/mozilla-releng/OpenCloudConfig/commit/82d15f355275a5b78427ebaacc55a37014f5512c). if these errors are caused by a failure to communicate with taskcluster proxy (i need help from someone who knows), then my change could be responsible.
i think it's safe to say it was not the hosts file change that caused the backlog, so comment 1 can be ignored. i tested with this task just now: https://tools.taskcluster.net/groups/FZTzsiTSQ82mzd5qWdIcgQ/tasks/FZTzsiTSQ82mzd5qWdIcgQ/runs/0/logs/public%2Flogs%2Flive.log
papertrail logs indicate that instances are rebooting after generic-worker (gw) creates the task user, but the occ lock file has not yet been deleted at the time of that first reboot. i think this is fallout from another change of mine (bug 1479889, https://github.com/mozilla-releng/OpenCloudConfig/commit/a5ac4e7). while we were still formatting the z: drive, the machine stayed up long enough to delete the lock file. now that we've removed the drive formatting, a race condition exists where the machine reboots with the lock file intact. i have put a simple patch in the gw wrapper script: https://github.com/mozilla-releng/OpenCloudConfig/commit/4c854ad. this should get the workers taking jobs again.
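for anyone following along, the race described above can be closed by making the reboot path wait until the lock file is gone. this is only a minimal sketch of that idea in python, not the contents of the actual patch (the real wrapper is a separate script in the occ repo); the function name, the lock path in the usage note, and the timeout values are all hypothetical.

```python
import os
import time


def wait_for_lock_release(lock_path, timeout_s=300.0, poll_s=1.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Block until lock_path no longer exists, or until timeout_s elapses.

    Returns True if the lock file disappeared, False if we timed out.
    The names and defaults here are illustrative assumptions only.
    """
    deadline = clock() + timeout_s
    while os.path.exists(lock_path):
        if clock() >= deadline:
            # lock still held: the caller decides whether to reboot anyway
            return False
        sleep(poll_s)
    return True
```

a wrapper would call something like `wait_for_lock_release(r"C:\path\to\occ.lock")` (placeholder path) before allowing the first reboot, so the machine never restarts with the lock file intact. polling with a timeout is a deliberate choice here: it cannot hang forever if the lock is never removed.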
Assignee: nobody → rthijssen
Status: NEW → ASSIGNED
Component: Operations → Relops: OpenCloudConfig
Product: Taskcluster → Infrastructure & Operations
QA Contact: rthijssen
patch appears to have worked. instances are taking work again.
Summary: Windows machines are not taking jobs - artifacts are not being build → Windows machines are not taking jobs
Trees reopened at 15:23 UTC, except autoland, which is waiting for build and test coverage.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED