Add a report-error capability to taskcluster-worker-runner and generic-worker
Categories
(Taskcluster :: Workers, task)
Tracking
(Not tracked)
People
(Reporter: dustin, Unassigned, Mentored)
References
Details
Bug 1558346 adds a workerManager.reportWorkerError
API method. Let's set up tc-worker-runner to support calling that API method in response to protocol messages from the worker and in response to errors starting the worker.
Then modify generic-worker to support sending such messages when it encounters errors.
Comment 1•6 years ago
•
|
||
Since we decided that it should be possible for generic-worker to run independently of tc-worker-runner (i.e. tc-worker-runner is an optional deployment component), I wanted to check that you mean generic-worker should call workerManager.reportWorkerError
directly rather than reporting failures to tc-worker-runner, and tc-worker-runner then forwarding the worker error on to workerManager.
If that's what you meant, then all is ok. :-)
Rationale: I see workerManager.reportWorkerError
as a straight out replacement to the current sentry reporting in generic-worker, that avoids tying us to a particular crash reporting vendor (sentry), albeit with the acknowledged and accepted caveat that we will need to duplicate/maintain some features that existed already in sentry. tc-worker-runner should also report bootstrapping failures to workerManager, but environments without tc-worker-runner should still be able to report worker errors.
Reporter | ||
Comment 2•6 years ago
|
||
g-w can still run without tc-worker-runner, but by simply displaying the error to stderr in that case (or whatever it already does). It would use the presence of a protocol feature to determine when it can use the worker-runner reporting approach, and when there's no worker-runner then that feature is implicitly not available and it just reports as usual.
I don't think this is a sentry replacement -- as you mentioned, this is about bootstrapping failures that a worker-pool admin might want to know about (e.g., config parameter missing), rather than crashes.
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Comment 3•6 years ago
|
||
This isn't blocking the tc-cloudops rollout, but is still a heckin' good thing to implement.
Updated•5 years ago
|
Comment 4•5 years ago
|
||
Moved this to https://github.com/taskcluster/taskcluster/issues/2887
Description
•