Closed Bug 1401995 Opened 3 years ago Closed 3 years ago

Migrate funsize to python3


(Release Engineering :: General, enhancement)

Not set


(firefox59 fixed)

Tracking Status
firefox59 --- fixed


(Reporter: sfraser, Assigned: sfraser, Mentored)


(Blocks 1 open bug)


(Keywords: good-first-bug)


(2 files)

Funsize still uses python2, we should be able to get some run time gains by swapping to python3 and asynchronously downloading the MARs, and running the external shell commands.
Blocks: funsize
The update generator is in-tree:

The Makefile and README have container build and local running examples

scripts/ needs updating to Python3. Asynchronous downloads and external calls should reduce the run time, too.

At the moment, funsize does the following:

For each partial to build:
    for that partial's from_mar and to_mar:
         download the mar
         call an external unwrapper
         call clamscan 

We could instead find the set of MARs to download, and run the download/unwrap/clamscan calls in async functions, allowing them to run concurrently. Then once all are present, run each external asynchonously, too.
Mentor: sfraser
Keywords: good-first-bug
Would love to take this on if you could assign it to me -- can you give me an idea on how to get started?
Flags: needinfo?(sfraser)
Hi nikshepsvn, I apologise - I did the work for this just before Christmas, and I'd forgotten to take the bug. There's still improvement that can be done afterwards, but it'd be a bit more open ended, so it's up to you whether you'd want to look at this component, or we can work through something else. Perhaps we can find something similar to work on?
Flags: needinfo?(sfraser)
Assignee: nobody → sfraser
Comment on attachment 8939537 [details]
Bug 1401995 Update funsize to use async, to reduce task time

It looks good to me overall. I'm concerned about `redo` and `resp`. Feel free to ask me for another round of review, if the changes are too heavy.

::: taskcluster/docker/funsize-update-generator/
(Diff revision 1)
>  # -> bucket of tc-gp-private-1d-us-east-1, path of releng/mbsdiff-cache/
>  # Trailing slash is important, due to prefix permissions in S3.
>  S3_BUCKET_AND_PATH=$(jq -r '.scopes[] | select(contains ("auth:aws-s3"))' /home/worker/task.json | awk -F: '{print $4}')
>  # Will be empty if there's no scope for AWS S3.
> -if [ -n "${S3_BUCKET_AND_PATH}" ]; then
> +if [ -n "${S3_BUCKET_AND_PATH}" ] && getent hosts taskcluster

TIL `getent`

::: taskcluster/docker/funsize-update-generator/scripts/
(Diff revision 1)
>      r.raise_for_status()
>      return r.json().get('secret', {})
>  @redo.retriable()
> -def download(url, dest, mode=None):
> +async def download(url, dest, mode=None):  # noqa: E999

IIUC, redo won't work anymore on this async function

::: taskcluster/docker/funsize-update-generator/scripts/
(Diff revision 1)
> +                        break
> -            fd.write(chunk)
> +                    fd.write(chunk)
> -            bytes_downloaded += len(chunk)
> +                    bytes_downloaded += len(chunk)
>      log.debug('Downloaded %s bytes', bytes_downloaded)
> -    if 'content-length' in r.headers:
> +    if 'content-length' in resp.headers:

`resp` shoud be undefined in this scope. I wonder if that works in the logs[1], because the ContextManager is still open in a different async context.


::: taskcluster/docker/funsize-update-generator/scripts/
(Diff revision 1)
> +    if not stderr:
> +        stderr = ""
> +    if not stdout:
> +        stdout = ""
> +
> +    for line in stdout.splitlines() + stderr.splitlines():

Should we log stderr in `log.warn` or `log.error` instead of `log.debug`?

::: taskcluster/docker/funsize-update-generator/scripts/
(Diff revision 1)
> +    await run_command(cmd, cwd=dest_dir, env=env, label=dest_dir)
> +
>  def find_file(directory, filename):
>      log.debug("Searching for %s in %s", filename, directory)
> -    for root, dirs, files in os.walk(directory):
> +    for root, _, files in os.walk(directory):


::: taskcluster/docker/funsize-update-generator/scripts/
(Diff revision 1)
> -    def setup(self):
> -        self.download_unwrap()
> -        self.download_martools()
> +    async def setup(self):
> +        await self.download_unwrap()
> +        await self.download_martools()
> +
> +    async def clone(self, workenv):

Nit: I don't think this function needs to be async.

::: taskcluster/docker/funsize-update-generator/scripts/
(Diff revision 1)
> -    manifest = []
> -    for e in task["extra"]["funsize"]["partials"]:
> -        for mar in (e["from_mar"], e["to_mar"]):
> -            verify_allowed_url(mar)
> +        verify_allowed_url(mar)
> -        work_env = WorkEnv()
> +    # work_env = WorkEnv()

Nit: Commented out code.

::: taskcluster/docker/funsize-update-generator/scripts/
(Diff revision 1)
> +
>      manifest_file = os.path.join(args.artifacts_dir, "manifest.json")
>      with open(manifest_file, "w") as fp:
>          json.dump(manifest, fp, indent=2, sort_keys=True)
> +    print(json.dumps(manifest, indent=2, sort_keys=True))

Nit: Is it a leftover of a debugging sessions? If not, let's `log` it with a message, instead.
Attachment #8939537 - Flags: review?(jlorenzo) → review+
Pushed by
Update funsize to use async, to reduce task time r=jlorenzo
Backout by
Backed out 1 changesets for multiple linting failures in r=backout on a CLOSED TREE
Pushed by
Update funsize to use async, to reduce task time r=jlorenzo
Closed: 3 years ago
Resolution: --- → FIXED
See Also: → 1430535
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.