Closed
Bug 1424383
Opened 7 years ago
Closed 7 years ago
Workers should terminate sooner after worker definition change
Categories
(Taskcluster :: Workers, enhancement)
Taskcluster
Workers
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gps, Unassigned)
References
(Blocks 1 open bug)
Details
Currently, if you update a worker definition, old workers linger, and it takes hours, possibly days, for all workers in a pool to be refreshed and pick up the new worker definition.
Last week, we updated AMIs for bug 1291940 and bug 1415725. The initial AMIs were buggy in multiple ways. However, it took ~24 hours for us to notice some of the failures, because old workers were still processing tasks and the percentage of workers running the new AMIs was initially very small.
When we deploy something, it is better to have meaningful results on the success of that deployment sooner rather than later.
This bug is a request to have workers terminate after their worker definition changes. That is, once a worker definition is modified, existing workers should refuse to process any new tasks. This would ensure that any worker definition change results in a) all new tasks running on the new worker configuration immediately, and b) the worker pool refresh taking no longer than the longest execution time of a running task.
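A minimal sketch of the requested behavior, assuming a worker that runs one task at a time: the worker remembers the definition version it was launched with and compares it with the pool's current version before every claim. The names `run_worker`, `current_version` and `execute`, and the version strings, are hypothetical illustrations, not docker-worker or generic-worker APIs.

```python
from typing import Callable, List


def run_worker(boot_version: str,
               current_version: Callable[[], str],
               queue: List[str],
               execute: Callable[[str], None]) -> List[str]:
    """Claim and run tasks until the queue is empty or the definition changes.

    boot_version: definition version the worker was launched with.
    current_version: hypothetical lookup of the pool's current version.
    execute: runs one task to completion (one task at a time).
    Returns the tasks this worker ran; unclaimed tasks stay in `queue`.
    """
    ran = []
    while queue:
        # Check *before* claiming: a stale worker never starts new work,
        # but the task it was already running has been allowed to finish.
        if current_version() != boot_version:
            break
        task = queue.pop(0)
        execute(task)
        ran.append(task)
    return ran


# Demo: the definition is updated while task-2 is running.
pool = {"version": "v1"}  # hypothetical pool state


def execute(task: str) -> None:
    if task == "task-2":
        pool["version"] = "v2"  # simulate an operator changing the definition


queue = ["task-1", "task-2", "task-3", "task-4"]
ran = run_worker("v1", lambda: pool["version"], queue, execute)
print(ran)    # ['task-1', 'task-2']
print(queue)  # ['task-3', 'task-4'], left for workers on the new definition
```

Checking before each claim, rather than only on a timer, is what gives bound (b) in this model: a stale worker exits as soon as the task it is already running finishes.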
Reporter
Comment 1•7 years ago
I'm going to nominate this for the stability effort. Comment #0 should be self-explanatory as to why I think it is important for platform stability.
Blocks: tc-stability
Comment 2•7 years ago
Note: generic-worker workers check every 30 minutes for new AMIs and self-terminate if there are any.
This was implemented in bug 1298010 and rolled out in generic-worker 6.1.0 (see https://bugzilla.mozilla.org/show_bug.cgi?id=1298010#c16). The code changes are here: https://github.com/taskcluster/generic-worker/pull/27/files
We might want to use the same mechanism for docker-worker.
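The polling approach described in comment 2 could look roughly like the sketch below. This is not the code from the linked pull request; it is a hypothetical Python illustration in which `DefinitionWatcher` and the `latest_ami` callback stand in for whatever lookup the worker actually performs, and a fake clock is injected so the 30-minute interval can be demonstrated without waiting.

```python
import time
from typing import Callable, Optional

CHECK_INTERVAL_SECONDS = 30 * 60  # the 30-minute interval from comment 2


class DefinitionWatcher:
    """Re-check the pool's AMI at most once per interval (hypothetical sketch)."""

    def __init__(self, boot_ami: str, latest_ami: Callable[[], str],
                 interval: float = CHECK_INTERVAL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.boot_ami = boot_ami      # AMI this worker booted from
        self.latest_ami = latest_ami  # hypothetical lookup of the pool's AMI
        self.interval = interval
        self.clock = clock
        self._last_check: Optional[float] = None
        self.stale = False

    def should_terminate(self) -> bool:
        """True once a newer AMI has been seen; stays True after that."""
        if self.stale:
            return True
        now = self.clock()
        due = self._last_check is None or now - self._last_check >= self.interval
        if due:
            self._last_check = now
            self.stale = self.latest_ami() != self.boot_ami
        return self.stale


# Demo with a fake clock (seconds) instead of waiting in real time.
now = [0.0]
pool = {"ami": "ami-old"}
watcher = DefinitionWatcher("ami-old", lambda: pool["ami"], clock=lambda: now[0])
print(watcher.should_terminate())  # False: first check, AMI unchanged
pool["ami"] = "ami-new"            # a new AMI is rolled out
now[0] = 10 * 60
print(watcher.should_terminate())  # False: last check was only 10 minutes ago
now[0] = 30 * 60
print(watcher.should_terminate())  # True: interval elapsed, change detected
```

With polling, a worker may not notice a change for up to one interval, and if it only acts on the result between tasks, it can take up to about one interval plus its longest task to retire.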
Comment 3•7 years ago
Workers now last 15 minutes without picking a job before shutdown.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Reporter
Comment 4•7 years ago
(In reply to Wander Lairson Costa [:wcosta] from comment #3)
> Workers now last 15 minutes without picking a job before shutdown.
That is true and is a good start. However, a busy worker never sits idle long enough to hit that timeout, so it can keep claiming tasks and stay alive for hours or days after a configuration change.
The original request/issue is still valid. I would prefer to see all workers behave like generic-worker and self-terminate after a worker configuration change, so there is an upper bound on the time between a configuration change and tasks running on the new configuration. So I encourage you to reopen this issue, or resolve it as WONTFIX (since docker-worker's days are apparently numbered).
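To illustrate the point in comment 4, here is a simplified, hypothetical model of a single worker under the 15-minute idle timeout from comment 3, optionally combined with a check-before-claim rule. The function `worker_exit_minute` and the workload (an 8-minute task arriving every 10 minutes for 48 hours) are made up for illustration, not measured from any real pool.

```python
from typing import List, Optional, Tuple

IDLE_TIMEOUT_MIN = 15  # idle shutdown described in comment 3


def worker_exit_minute(tasks: List[Tuple[int, int]],
                       changed_at: Optional[int] = None) -> int:
    """Minute at which a single worker shuts down.

    tasks: (arrival_minute, duration_minutes) pairs in arrival order, all
        handled by this one worker (a hypothetical, simplified workload).
    changed_at: if given, the minute the worker definition changed; the
        worker then refuses its next claim (the behavior requested here).
    """
    free_at = 0
    for arrival, duration in tasks:
        if arrival - free_at >= IDLE_TIMEOUT_MIN:
            return free_at + IDLE_TIMEOUT_MIN  # idle timeout fires first
        claim_at = max(free_at, arrival)
        if changed_at is not None and claim_at >= changed_at:
            return claim_at  # stale definition: refuse new work and exit
        free_at = claim_at + duration
    return free_at + IDLE_TIMEOUT_MIN  # queue drained, then idle timeout


# An 8-minute task arrives every 10 minutes for 48 hours.
busy = [(i * 10, 8) for i in range(288)]
print(worker_exit_minute(busy))                 # 2893 (about 48 hours)
print(worker_exit_minute(busy, changed_at=60))  # 60
```

Because the gap between tasks never reaches 15 minutes, the idle timeout alone keeps this worker alive for the whole two days; the definition check retires it at its first claim after the change.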
Assignee
Updated•6 years ago
Component: Docker-Worker → Workers