Closed Bug 1245837 Opened 7 years ago Closed 6 years ago
python worker lib for signing+updates
* This worker will be based on https://github.com/mozilla/signingworker
* The main difference for my needs will be the callback. If we have a way to inject a callback via config or python, we can genericize the overall worker.
* There's some enthusiasm over py35 + asyncio, which would make this dependent on bug 1245835.
From irc -- part of the motivation for this is to have a worker and image that isn't governed by scopes for use with AUS/balrog and signing.
I like the idea of a generic worker in python. I assume it's basically a worker library you load and give config + a handle_task function, and done. For worker <-> queue interaction, things have changed since signing-worker was written. Using azure queues is more reliable, and the temporary credentials returned from claimTask reduce the authority a worker can leak. I'll happily discuss this with you when you get to this point. There are docs: http://docs.taskcluster.net/queue/worker-interaction But they don't outline the temp creds stuff...
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #2)
> I like the idea of a generic worker in python. I assume it's basically a
> worker library you load and give config + a handle_task function, and done.

That's the impression I got. I'll know more when I dig into signingworker, but we're under the impression that the only thing that would change for a balrogworker is the handle_task and config, so let's tear out the common code now.

> For worker <-> queue interaction, things have changed since signing-worker
> was written. Using azure queues is more reliable, and temporary credentials
> returned from claimTask reduce the authority a worker can leak.
>
> I'll happily discuss this with you when you get to this point. There are docs:
> http://docs.taskcluster.net/queue/worker-interaction

Thanks!

> But they don't outline the temp creds stuff...

OK.
There seems to be confusion around the python-generic-worker name, and :dustin suggested python-worker-lib. I'm happy to rename if preferred.
Summary: python generic worker → python worker lib for signing+updates
Summing up some discussion here:

* The new go generic worker may work for us in the future. It would potentially allow us to install or omit an engine and a number of plugins to configure its behavior, and we could then shell out to python.
* However, it may not be ready in time for the balrog/signing work, and there is some disagreement over whether the go generic worker is the right approach here.

I'm going to continue with the python generic worker lib for now, in the genericworker branch. The existing tests pass in python 3.5 now.

http://docs.taskcluster.net/workers/taskcluster-worker/
https://github.com/escapewindow/python-generic-worker/tree/genericworker
Steps for testing a worker, populating as I go https://gist.github.com/escapewindow/8483872156a88b55d6638c6afa6ec867
Also, summing up the discussion in SF today:

* I will be ditching the pulse portion of the python generic worker, because it's not necessary.
* I will be following this workflow: http://docs.taskcluster.net/queue/worker-interaction/
* I need to call reclaimTask() periodically while the task is running. This is a heartbeat that tells TC we're still alive.
https://gist.github.com/escapewindow/931580c08de2a95550e262e32af73875

This uses asyncio to run a periodic function in the background while a script runs... I'll use this pattern for reclaimTask(). I don't think I actually need to pass context to periodic(); the periodic calls stop when the loop.run_until_complete() call completes... though I might need it again if/when I start using run_forever().
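The gist itself isn't inlined here; a minimal sketch of the same pattern follows. The names `periodic`, `heartbeat`, and `run_script` are illustrative stand-ins for the real reclaimTask() and task script, not scriptworker's actual API:

```python
import asyncio

async def periodic(interval, coro_fn):
    """Call coro_fn() every `interval` seconds until cancelled."""
    while True:
        await asyncio.sleep(interval)
        await coro_fn()

async def heartbeat():
    # Stand-in for the real reclaimTask() call.
    print("reclaiming task...")

async def run_script():
    # Stand-in for running the actual task script.
    await asyncio.sleep(0.35)
    return 0

async def main():
    # Schedule the heartbeat in the background, then run the script.
    beat = asyncio.ensure_future(periodic(0.1, heartbeat))
    try:
        status = await run_script()
    finally:
        # The heartbeat stops once the script finishes.
        beat.cancel()
        try:
            await beat
        except asyncio.CancelledError:
            pass
    return status

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
status = loop.run_until_complete(main())
loop.close()
print("exit status:", status)
```

Cancelling the background task explicitly (rather than letting run_until_complete() drop it) avoids "task was destroyed but it is pending" warnings.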
I've got an extremely basic worker written now. I'm claiming the task (with the Azure dance), then running the task with reclaimTask() in the background. At the end I call reportCompleted() or reportFailed(), depending on the exit status of the script. Later I'll add exception exit statuses as well.

Still to do:

* Split the single file into multiple files -- a single file was convenient, but it's starting to get unwieldy.
* Improve config (allow for simple argparse?)
* Retries
* Logfile for the worker, with rotation
* Upload artifacts
* Tests!
  * 100% unit test coverage
  * integration tests
  * automated task submission
* Docs!

Then I'll move on to the signing script/library.

Thinking about renaming this project 'relengworker', since the uses we have for it are releng-related. Also thinking about moving to a clean repo, instead of a fork of the signingworker repo. WIP is still in https://github.com/escapewindow/python-generic-worker/tree/genericworker
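A rough sketch of that claim/run/report flow: the queue method names mirror the taskcluster queue API, but `claim_run_report`, `run_task`, and `FakeQueue` are illustrative stand-ins, not the worker's actual code.

```python
import asyncio
import sys

async def run_task(command):
    """Run the task script as a subprocess and return its exit code."""
    proc = await asyncio.create_subprocess_exec(*command)
    return await proc.wait()

async def claim_run_report(queue, task_id, run_id, command):
    """Claim a task, run it, and report the result based on exit status."""
    queue.claimTask(task_id, run_id)
    status = await run_task(command)
    if status == 0:
        queue.reportCompleted(task_id, run_id)
    else:
        queue.reportFailed(task_id, run_id)
    return status

class FakeQueue:
    """Stand-in for the taskcluster queue client, for demonstration only."""
    def __init__(self):
        self.reported = None
    def claimTask(self, task_id, run_id):
        pass
    def reportCompleted(self, task_id, run_id):
        self.reported = "completed"
    def reportFailed(self, task_id, run_id):
        self.reported = "failed"

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
queue = FakeQueue()
status = loop.run_until_complete(
    claim_run_report(queue, "fake-task-id", 0, [sys.executable, "-c", "pass"]))
loop.close()
```

The strict worker/task split means the subprocess is the whole interface: the worker only needs the script's exit status to decide what to report.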
This is awesome! It looks like this will always use a subprocess to run the actual task. In that case, the language isn't particularly important. And the term "generic-worker" is already taken. So maybe "script-worker"?
(In reply to Dustin J. Mitchell [:dustin] from comment #10)
> This is awesome!

Thanks!

> It looks like this will always use a subprocess to run the actual task. In
> that case, the language isn't particularly important.

Yes. That's the biggest reason behind this design. If we do decide that the taskcluster go worker is the future, we can run the exact same signing/balrog script from there. It also has the nice properties of minimal worker configuration for the task specifics, and a strict separation between worker and task.

> And the term
> "generic-worker" is already taken. So maybe "script-worker"?

Sure.
https://github.com/escapewindow/scriptworker is the most current wip.
I am now uploading artifacts to s3! Currently just the logs, because I'm not putting anything in the artifact_dir yet, but in theory everything in the artifact_dir will get uploaded too.

To do:

* Do I want a max_timeout and output_timeout? The former is handled by taskcluster with its `expires` datetime. However, neither kills the running task on the worker. I could stop reclaiming the task if I pass either of these marks, which would lead to the task being claimed by another worker. I could also look into sigkilling the tasks, though in scriptharness I had to turn to `multiprocessing` to get that to work. I may end up leaving this for later, relying only on taskcluster's `expires`. Not sure.
* Retries! The taskcluster client handles retries for me, but I am using a few bare aiohttp calls without retries. I'd like to add them in.
* Tests!
* Docs! I have docstrings, which will help here. I'd like a base set of docs in rtd.
* Review!
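For the retry item, a sketch of the backoff logic one might wrap around those bare aiohttp calls; `retry_async` and its defaults are illustrative, not scriptworker's actual API, and the flaky coroutine is a stdlib-only stand-in for a network call:

```python
import asyncio
import random

async def retry_async(coro_fn, attempts=5, base_delay=0.1, retry_on=(Exception,)):
    """Retry an async callable with jittered exponential backoff.
    Raises the last exception if all attempts fail."""
    for attempt in range(1, attempts + 1):
        try:
            return await coro_fn()
        except retry_on:
            if attempt == attempts:
                raise
            # Exponential backoff with up to 2x random jitter.
            delay = base_delay * 2 ** (attempt - 1) * (1 + random.random())
            await asyncio.sleep(delay)

# Demo: a coroutine that fails twice with a transient error, then succeeds.
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient failure")
    return "ok"

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
result = loop.run_until_complete(retry_async(flaky, base_delay=0.01))
loop.close()
```

Restricting `retry_on` to transient error types (e.g. connection or server errors) keeps permanent failures from being retried pointlessly.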
Ready for 0.1.0 review: https://github.com/escapewindow/scriptworker/issues/1
0.1.0 !

https://github.com/escapewindow/scriptworker/issues/1
https://github.com/escapewindow/scriptworker/releases/tag/0.1.0
https://pypi.python.org/pypi/scriptworker/0.1 (looks like I'll want a README or README.rst for that, as opposed to the README.md I currently have)

I will likely still need to make further changes as we roll out signing scriptworkers in TC, but I think this is far enough along to call this piece done.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED