python worker lib for signing+updates

RESOLVED FIXED

Status

Taskcluster
Worker
a year ago
a year ago

People

(Reporter: aki, Assigned: aki)

(Assignee)

Description

a year ago
* This worker will be based on https://github.com/mozilla/signingworker
* The main difference for my needs will be the callback. If we have a way to inject a callback via config or Python, we can genericize the overall worker.
* There's some enthusiasm for py35 + asyncio, which would make this dependent on bug 1245835.
(Assignee)

Updated

a year ago
Blocks: 1244181
From IRC: part of the motivation for this is to have a worker and image that isn't governed by scopes, for use with AUS/Balrog and signing.
Jonas Finnemann Jensen (:jonasfj)

Comment 2

I like the idea of a generic worker in python. I assume it's basically a worker-library you load and give config + a handle_task function and done.

For worker <-> queue interaction, things have changed since signing-worker was written. Using azure queues is more reliable, and temporary credentials returned from claimTask reduce the authority a worker can leak.
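A rough sketch of the Azure side of that interaction. It assumes (not confirmed in this bug) that the worker polls an Azure Storage queue whose messages follow Azure's Get Messages XML format, and that each `MessageText` is base64-encoded JSON carrying `taskId` and `runId`. `parse_azure_messages` is a hypothetical helper name.

```python
import base64
import json
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_azure_messages(xml_body: str) -> List[Dict[str, object]]:
    """Extract taskId/runId plus the handles needed to delete each message."""
    messages = []
    root = ET.fromstring(xml_body)
    for msg in root.findall("QueueMessage"):
        # Assumption: MessageText is base64-encoded JSON like {"taskId": ..., "runId": ...}.
        raw = base64.b64decode(msg.findtext("MessageText"))
        info = json.loads(raw.decode("utf-8"))
        messages.append({
            "taskId": info["taskId"],
            "runId": info["runId"],
            # Azure needs both of these to delete the message once it's handled.
            "messageId": msg.findtext("MessageId"),
            "popReceipt": msg.findtext("PopReceipt"),
        })
    return messages
```

The worker would then call claimTask() for each entry and delete the message using its `messageId` and `popReceipt`, so the same message isn't picked up again.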

I'll happily discuss this with you when you get to this point. There are docs:
http://docs.taskcluster.net/queue/worker-interaction

But it doesn't outline the temp creds stuff...
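A minimal sketch of using those temporary credentials. It assumes the claimTask response carries a `credentials` object with `clientId`, `accessToken`, and `certificate`. The helper name is hypothetical, and no taskcluster client is actually constructed here.

```python
from typing import Any, Dict


def temp_client_options(claim_response: Dict[str, Any]) -> Dict[str, Any]:
    """Build client options from the temporary credentials in a claimTask response."""
    creds = claim_response["credentials"]
    # Only the claim's temporary credentials are read here, so the worker's
    # long-lived credentials are not handed to the task-handling code.
    return {
        "credentials": {
            "clientId": creds["clientId"],
            "accessToken": creds["accessToken"],
            "certificate": creds["certificate"],
        }
    }


claim = {
    "takenUntil": "2016-05-01T12:20:00.000Z",
    "credentials": {"clientId": "temp-client", "accessToken": "secret", "certificate": "{}"},
}
options = temp_client_options(claim)
```

The resulting dict would be passed as options to a taskcluster client (e.g. `taskcluster.Queue(options)` in the Python client) for the reclaim and report calls. Because these credentials are temporary, they would presumably be refreshed from each reclaim response.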
(Assignee)

Comment 3

a year ago
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #2)
> I like the idea of a generic worker in python. I assume it's basically a
> worker-library you load and give config + a handle_task function and done.

That's the impression I got.  I'll know more when I dig into signingworker, but we're under the impression the only thing that would change for a balrogworker would be the handle_task and config, so let's tear out the common code now.
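A minimal sketch of that split, assuming a hypothetical `WorkerLib` class and callback signature (neither comes from signingworker). The shared library owns the queue handling, and each worker type only supplies a config and a `handle_task` coroutine.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical callback signature: (worker config, task definition) -> exit status.
HandleTask = Callable[[Dict[str, Any], Dict[str, Any]], Awaitable[int]]


class WorkerLib:
    """Hypothetical worker library: queue handling is shared, task logic is injected."""

    def __init__(self, config: Dict[str, Any], handle_task: HandleTask) -> None:
        self.config = config
        self.handle_task = handle_task

    async def run_one(self, task: Dict[str, Any]) -> int:
        # A real worker would claim the task before this call and report the
        # result afterwards; only the injected callback differs per worker type.
        return await self.handle_task(self.config, task)


async def handle_signing_task(config: Dict[str, Any], task: Dict[str, Any]) -> int:
    # Hypothetical signing callback: fail if the task has no payload.
    return 0 if task.get("payload") else 1


signing_worker = WorkerLib({"worker_type": "signing"}, handle_signing_task)
status = asyncio.run(signing_worker.run_one({"payload": {"input": "example"}}))
```

A balrogworker would reuse `WorkerLib` unchanged and pass its own callback and config.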

> For worker <-> queue interaction, things have changed since signing-worker
> was written. Using azure queues is more reliable, and temporary credentials
> returned from claimTask reduce the authority a worker can leak.
> 
> I'll happily discuss this with you when you get to this point. There are docs:
> http://docs.taskcluster.net/queue/worker-interaction

Thanks!

> But it doesn't outline the temp creds stuff...

ok.
(Assignee)

Comment 4

a year ago
There seems to be confusion around the python-generic-worker name, and :dustin suggested python-worker-lib.  I'm happy to rename if preferred.
(Assignee)

Updated

a year ago
Summary: python generic worker → python worker lib for signing+updates
(Assignee)

Comment 5

a year ago
Summing up some discussion here:

* The new go generic worker [1] may work for us in the future. It will potentially let us install or omit an engine and a number of plugins to tailor its behavior to our needs; we could then shell out to python.

* However, this may not be ready in time for the balrog/signing work, and there is some disagreement about whether the go generic worker is the right approach here.

I'm going to continue on with the python generic worker lib for now, in the genericworker branch [2].
The existing tests pass in python3.5 now.

[1] http://docs.taskcluster.net/workers/taskcluster-worker/
[2] https://github.com/escapewindow/python-generic-worker/tree/genericworker
(Assignee)

Comment 6

a year ago
Steps for testing a worker, populating as I go
https://gist.github.com/escapewindow/8483872156a88b55d6638c6afa6ec867
(Assignee)

Comment 7

a year ago
Also, summing up the discussion in SF today:

* I will be ditching the Pulse portion of the python generic worker, because it's not necessary.
* I will be following this workflow: http://docs.taskcluster.net/queue/worker-interaction/
 * I need to reclaimTask() periodically while the task is running.  This is a heartbeat that tells TC that we're still alive.
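One way to pace that heartbeat, sketched under two assumptions: the claim's `takenUntil` is an ISO 8601 UTC timestamp with milliseconds, and reclaiming a fixed margin early is safe. The function name and the 60-second margin are hypothetical.

```python
from datetime import datetime, timezone


def seconds_until_reclaim(taken_until: str, now: datetime, margin: float = 60.0) -> float:
    """Return how long to wait before the next reclaimTask() call."""
    # Assumption: takenUntil looks like "2016-05-01T12:20:00.000Z".
    expires = datetime.strptime(taken_until, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    remaining = (expires - now).total_seconds()
    # Reclaim `margin` seconds before the claim lapses; never return a negative wait.
    return max(0.0, remaining - margin)


now = datetime(2016, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
wait = seconds_until_reclaim("2016-05-01T12:20:00.000Z", now)
```

If each reclaimTask() response carries a fresh `takenUntil`, as claimTask's does, the worker would recompute the wait after every reclaim.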
(Assignee)

Comment 8

a year ago
https://gist.github.com/escapewindow/931580c08de2a95550e262e32af73875

This uses asyncio to run a periodic function in the background while a script runs... I'll use this pattern for reclaimTask().

I don't think I actually need to be passing context to periodic(); the periodic calls stop when the loop.run_until_complete() call completes... though I might need it again if/when I start using run_forever().
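The same pattern, sketched without the gist's code: a hypothetical `periodic()` coroutine runs a heartbeat in the background while a subprocess runs, and is cancelled as soon as the subprocess exits. `fake_reclaim` stands in for reclaimTask(), and `asyncio.run()` plays the role of `loop.run_until_complete()`.

```python
import asyncio
import sys
from typing import Awaitable, Callable, List


async def periodic(interval: float, func: Callable[[], Awaitable[None]]) -> None:
    """Await func() every `interval` seconds until this coroutine is cancelled."""
    while True:
        await asyncio.sleep(interval)
        await func()


async def run_with_heartbeat(cmd: List[str], interval: float,
                             heartbeat: Callable[[], Awaitable[None]]) -> int:
    """Run cmd as a subprocess, calling heartbeat() periodically until it exits."""
    proc = await asyncio.create_subprocess_exec(*cmd)
    beat = asyncio.ensure_future(periodic(interval, heartbeat))
    try:
        return await proc.wait()
    finally:
        # Stop the heartbeat as soon as the script exits, whatever its status.
        beat.cancel()
        try:
            await beat
        except asyncio.CancelledError:
            pass


reclaims: List[str] = []


async def fake_reclaim() -> None:
    # Stand-in for the real reclaimTask() call.
    reclaims.append("reclaim")


status = asyncio.run(run_with_heartbeat(
    [sys.executable, "-c", "import time; time.sleep(0.35)"], 0.1, fake_reclaim))
```

Cancelling the background task explicitly means the heartbeat stops with the script even under run_forever(), without passing context to periodic().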
(Assignee)

Comment 9

a year ago
I've got an extremely basic worker written now.
I'm claiming the task (with the Azure dance), then running the task with reclaimTask() in the background.
At the end I'm reportCompleted() or reportFailed() depending on the exit status of the script.  Later I'll add exception exit statuses as well.
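A minimal sketch of that last step, assuming a synchronous queue client exposing `reportCompleted(taskId, runId)` and `reportFailed(taskId, runId)` (an asyncio client would need these awaited). `report_result` and `RecordingQueue` are hypothetical; the latter only records calls.

```python
from typing import Any, List, Tuple


def report_result(queue: Any, task_id: str, run_id: int, exit_status: int) -> str:
    """Report the task outcome to the queue based on the script's exit status."""
    if exit_status == 0:
        queue.reportCompleted(task_id, run_id)
        return "completed"
    queue.reportFailed(task_id, run_id)
    return "failed"


class RecordingQueue:
    """Stand-in for a taskcluster Queue client; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, int]] = []

    def reportCompleted(self, task_id: str, run_id: int) -> None:
        self.calls.append(("reportCompleted", task_id, run_id))

    def reportFailed(self, task_id: str, run_id: int) -> None:
        self.calls.append(("reportFailed", task_id, run_id))


queue = RecordingQueue()
outcomes = [report_result(queue, "abc", 0, 0), report_result(queue, "abc", 0, 2)]
```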

I have to:

* split the single file into multiple files; it was one file for convenience, but it's starting to get unwieldy.
* improve config (allow for simple argparse?)
* retries
* logfile for the worker, with rotation
* upload artifacts
* tests!
 * 100% unit test coverage
 * integration tests
  * automated task submission
* docs!
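For the "logfile for the worker, with rotation" item, the standard library's `RotatingFileHandler` may be enough. A minimal sketch; the logger name, file name, and size limits are hypothetical choices.

```python
import logging
import logging.handlers
import os
import tempfile


def setup_worker_log(log_dir: str, max_bytes: int = 10 * 1024 * 1024,
                     backup_count: int = 5) -> logging.Logger:
    """Log to log_dir/worker.log, rotating at max_bytes and keeping backup_count old files."""
    os.makedirs(log_dir, exist_ok=True)
    handler = logging.handlers.RotatingFileHandler(
        os.path.join(log_dir, "worker.log"), maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    log = logging.getLogger("worker")  # hypothetical logger name
    log.setLevel(logging.INFO)
    log.addHandler(handler)  # note: calling this twice would add a second handler
    return log


log_dir = tempfile.mkdtemp()
log = setup_worker_log(log_dir, max_bytes=1024, backup_count=2)
for i in range(100):
    log.info("processing line %d", i)
```

With `backup_count=2`, the handler keeps `worker.log`, `worker.log.1`, and `worker.log.2`, and deletes anything older.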

Then moving on to the signing script/library.

Thinking about renaming this project 'relengworker' since the uses we have for it are releng-related.
Also thinking about moving to a clean repo, instead of a fork of the signingworker repo.

WIP is still in https://github.com/escapewindow/python-generic-worker/tree/genericworker

Dustin J. Mitchell [:dustin]

Comment 10

This is awesome!

It looks like this will always use a subprocess to run the actual task.  In that case, the language isn't particularly important.  And the term "generic-worker" is already taken.  So maybe "script-worker"?
(Assignee)

Comment 11

a year ago
(In reply to Dustin J. Mitchell [:dustin] from comment #10)
> This is awesome!

Thanks!

> It looks like this will always use a subprocess to run the actual task.  In
> that case, the language isn't particularly important.

Yes.  That's the biggest reason behind this design.  If we do decide that the taskcluster go worker is the future, then we can run the exact same signing/balrog script from there.

It also has the nice properties of minimal worker configuration for the task specifics, and a strict separation between worker and task.
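A sketch of what that separation could look like. It assumes a hypothetical convention where the worker writes the task definition to `task.json` in a work directory and passes only that path to the script, whose exit status is the result. It uses a blocking `subprocess.run` for brevity, where the real worker would use asyncio.

```python
import json
import os
import subprocess
import sys
import tempfile
from typing import Any, Dict, List


def run_task_script(script_cmd: List[str], task: Dict[str, Any], work_dir: str) -> int:
    """Write the task definition to disk and run the script against it."""
    os.makedirs(work_dir, exist_ok=True)
    task_path = os.path.join(work_dir, "task.json")  # hypothetical file name
    with open(task_path, "w") as f:
        json.dump(task, f)
    # The script receives only the path on its command line; it knows nothing
    # about claiming, reclaiming, or reporting.
    result = subprocess.run(list(script_cmd) + [task_path])
    return result.returncode


work_dir = tempfile.mkdtemp()
# Hypothetical task script: exit 0 if the task has a non-empty payload, else 1.
checker = [sys.executable, "-c",
           "import json, sys; t = json.load(open(sys.argv[1])); sys.exit(0 if t['payload'] else 1)"]
status = run_task_script(checker, {"payload": {"input": "example"}}, work_dir)
```

Because the contract is just "a file in, an exit status out", the same script could run unchanged under another worker implementation.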

> And the term
> "generic-worker" is already taken.  So maybe "script-worker"?

Sure.
(Assignee)

Comment 12

a year ago
https://github.com/escapewindow/scriptworker is the most current wip.
(Assignee)

Comment 13

a year ago
I am now uploading artifacts to S3! Currently just the logs, because I'm not putting anything in the artifact_dir yet, but in theory everything in the artifact_dir will get uploaded too.
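A sketch of the "everything in the artifact_dir" part: walk the directory and build an artifact name and content type for each file. The `public/` prefix and the helper name are assumptions. Each entry would then go through the queue's createArtifact call and an upload to the returned S3 URL, neither of which is shown here.

```python
import mimetypes
import os
import tempfile
from typing import List, Tuple


def collect_artifacts(artifact_dir: str, prefix: str = "public") -> List[Tuple[str, str, str]]:
    """Return (local_path, artifact_name, content_type) for every file under artifact_dir."""
    artifacts = []
    for root, _dirs, files in os.walk(artifact_dir):
        for name in files:
            path = os.path.join(root, name)
            # Artifact names use forward slashes regardless of the local OS.
            rel = os.path.relpath(path, artifact_dir).replace(os.sep, "/")
            content_type, _encoding = mimetypes.guess_type(path)
            artifacts.append((path, f"{prefix}/{rel}", content_type or "application/octet-stream"))
    return sorted(artifacts, key=lambda a: a[1])


demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, "logs"))
for rel in ("logs/live.txt", "result.json"):
    with open(os.path.join(demo_dir, rel), "w") as f:
        f.write("x")
arts = collect_artifacts(demo_dir)
```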

To do:

* Do I want a max_timeout, output_timeout?  The former is handled by taskcluster with its `expires` datetime.  However, neither kills the running task on the worker.  I could stop reclaiming the task if I pass either of these marks, which would lead to the task being claimed by another worker.  I could also look into sigkilling the tasks, though in scriptharness I had to turn to `multiprocessing` to get that to work.

I may end up leaving this for later, relying only on taskcluster's `expires`.  Not sure.
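A minimal asyncio sketch of enforcing a `max_timeout` on the worker side. `run_with_timeout` is hypothetical and it does not cover `output_timeout`. `proc.kill()` only reaches the direct child; killing grandchildren too would need something like starting the child in its own session and signalling the whole process group on POSIX.

```python
import asyncio
import sys
from typing import List, Optional


async def run_with_timeout(cmd: List[str], max_timeout: float) -> Optional[int]:
    """Run cmd; kill it and return None if it runs longer than max_timeout seconds."""
    proc = await asyncio.create_subprocess_exec(*cmd)
    try:
        return await asyncio.wait_for(proc.wait(), timeout=max_timeout)
    except asyncio.TimeoutError:
        # kill() sends SIGKILL on POSIX; it only reaches the direct child process.
        proc.kill()
        await proc.wait()  # reap the killed process
        return None


fast = asyncio.run(run_with_timeout([sys.executable, "-c", "pass"], 10))
slow = asyncio.run(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 0.5))
```

A `None` result could then be reported as a failure (or, later, an exception) instead of letting the claim lapse.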

* Retries!  taskcluster client handles retries for me, but I am using a few bare aiohttp calls without retries.  I'd like to add them in.

* Tests!

* Docs! I have docstrings, which will help here. I'd like a base set of docs on Read the Docs.

* Review!
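For the retries item, a sketch of a generic async retry wrapper with capped exponential backoff and jitter. The name and defaults are hypothetical. For the bare aiohttp calls, `retry_exceptions` might be `(aiohttp.ClientError, asyncio.TimeoutError)`; the example below uses `ConnectionError` so it runs without aiohttp.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(func: Callable[[], Awaitable[T]], attempts: int = 5,
                      sleeptime: float = 1.0, max_sleeptime: float = 30.0,
                      retry_exceptions: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Await func(), retrying with capped exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except retry_exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_sleeptime, sleeptime * 2 ** (attempt - 1))
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))


calls = []


async def flaky() -> str:
    # Fails twice with a transient error, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient")
    return "ok"


result = asyncio.run(retry_async(flaky, sleeptime=0.01, retry_exceptions=(ConnectionError,)))
```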
(Assignee)

Comment 14

a year ago
Ready for 0.1.0 review: https://github.com/escapewindow/scriptworker/issues/1
(Assignee)

Comment 15

a year ago
0.1.0 !

https://github.com/escapewindow/scriptworker/issues/1
https://github.com/escapewindow/scriptworker/releases/tag/0.1.0
https://pypi.python.org/pypi/scriptworker/0.1 (looks like I'll want a README or README.rst for that, as opposed to the README.md I currently have)

I will likely still need to make further changes as we roll out signing scriptworkers in TC, but I think this is far enough along to call this piece done.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED

Updated

a year ago
Blocks: 1277682