Closed
Bug 1142630
Opened 9 years ago
Closed 6 years ago
docker-worker: Report exception on any runs in progress when a worker crashes.
Categories
(Taskcluster :: Workers, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: jlal, Unassigned)
Details
(Whiteboard: [docker-worker])
Workers do crash. When they do, we wait for the queue to do the reclaim magic. This works, but it is very slow (20 min!). Let's implement a logging system that lets us do this:
-> When claiming a task, append a line to the log (with run ID, etc.).
-> When a task finishes, append a line to the log.
-> When we crash: on boot, check for the log. If it is present, look for any incomplete tasks. If incomplete tasks are found, verify that each is still running under this worker's ID and the run ID it had at the time of the crash, then report an exception for that run.
Appending to the log can be done safely (appends are atomic within certain size limits). The lines can be JSON. Example: { state: 'running', taskId: .., runId: ... }
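A minimal Node.js sketch of the proposed JSON-lines log, assuming a hypothetical log path under `os.tmpdir()` and a hypothetical `'completed'` state for finished runs (the description only shows `'running'`). The helper names `recordClaim`, `recordFinish` and `findIncompleteRuns` are illustrative, not part of docker-worker. The sketch replays the log on boot and returns runs that were claimed but never finished; checking them against the queue and reporting the exception is left to the caller.

```javascript
'use strict';
// Sketch of the append-only run log described above.
// Hypothetical: the log path, the helper names and the 'completed' state.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const LOG_PATH = path.join(os.tmpdir(), 'docker-worker-runs.log');

// Append one JSON object per line. appendFileSync opens the file with the
// 'a' flag (O_APPEND), so each write lands at the current end of the file.
function appendEntry(entry, logPath = LOG_PATH) {
  fs.appendFileSync(logPath, JSON.stringify(entry) + '\n');
}

function recordClaim(taskId, runId, logPath) {
  appendEntry({ state: 'running', taskId, runId }, logPath);
}

function recordFinish(taskId, runId, logPath) {
  appendEntry({ state: 'completed', taskId, runId }, logPath);
}

// On boot: replay the log and return every run that was claimed but never
// marked finished. A line that fails to parse (e.g. a partial last line
// left by a crash mid-write) is skipped instead of aborting recovery.
function findIncompleteRuns(logPath = LOG_PATH) {
  if (!fs.existsSync(logPath)) return [];
  const open = new Map();
  for (const line of fs.readFileSync(logPath, 'utf8').split('\n')) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue;
    }
    const key = `${entry.taskId}/${entry.runId}`;
    if (entry.state === 'running') {
      open.set(key, { taskId: entry.taskId, runId: entry.runId });
    } else {
      open.delete(key);
    }
  }
  return [...open.values()];
}
```

On boot, a worker would pass each returned `{ taskId, runId }` to its own verification step (confirming the run is still held by this worker) before reporting an exception; that step is not shown because its exact queue calls are outside this bug. The log also grows without bound, so a real version would need to truncate or rotate it once all runs in it are finished.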
Comment 1•9 years ago
I think an actual log is overkill. Whenever a task is claimed or resolved, just do: fs.writeFileSync('/var/docker-worker/running-tasks.json', JSON.stringify([ { taskId: '...', runId: '...' } ])); Because fs.writeFileSync is synchronous, nothing else in docker-worker can run (and crash it) while the write is in progress. I'm not worried about being killed by I/O issues; that's extremely rare. Doing atomic writes right is hard, and this is a small file. If you want it to be atomic, write to a temporary file first and then mv it over the real one. But IMO it's acceptable to write a small local file like this synchronously.
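A minimal sketch of the "write the whole file, then mv it into place" variant, assuming a hypothetical path under `os.tmpdir()` instead of `/var/docker-worker/running-tasks.json` so it runs without root. The helpers `writeRunningTasks`, `readRunningTasks`, `onClaim` and `onResolve` are illustrative names, not docker-worker APIs. `fs.renameSync` plays the role of the `mv` step.

```javascript
'use strict';
// Sketch: persist the list of in-flight runs on every claim/resolve.
// Hypothetical: the file path and all helper names below.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const STATE_PATH = path.join(os.tmpdir(), 'running-tasks.json');

// Write to a temp file in the same directory, then rename it over the real
// file. On POSIX, rename(2) replaces the target atomically when both paths
// are on the same filesystem, so a reader sees the old list or the new one,
// never a half-written file.
function writeRunningTasks(tasks, statePath = STATE_PATH) {
  const tmpPath = `${statePath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(tasks));
  fs.renameSync(tmpPath, statePath);
}

// On boot: anything still listed was in progress when the worker died.
function readRunningTasks(statePath = STATE_PATH) {
  try {
    return JSON.parse(fs.readFileSync(statePath, 'utf8'));
  } catch (err) {
    if (err.code === 'ENOENT') return []; // no file yet: nothing was running
    throw err;
  }
}

// In-memory list of runs this worker currently holds.
const running = [];

function onClaim(taskId, runId, statePath) {
  running.push({ taskId, runId });
  writeRunningTasks(running, statePath);
}

function onResolve(taskId, runId, statePath) {
  const i = running.findIndex((t) => t.taskId === taskId && t.runId === runId);
  if (i !== -1) running.splice(i, 1);
  writeRunningTasks(running, statePath);
}
```

This protects against the worker process dying mid-write. Surviving a sudden power loss on the host would additionally require an fsync of the temp file (and its directory) before and after the rename, which the sketch does not do.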
Updated•9 years ago
Component: TaskCluster → Docker-Worker
Product: Testing → Taskcluster
Updated•8 years ago
Whiteboard: [docker-worker]
Updated•8 years ago
Component: Docker-Worker → Worker
Updated•6 years ago
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Updated•5 years ago
Component: Worker → Workers