Status

enhancement
RESOLVED FIXED
2 years ago
Last year

People

(Reporter: sfraser, Assigned: sfraser, Mentored)

Tracking

(Blocks 1 bug, {good-first-bug})

unspecified
Dependency tree / graph

Firefox Tracking Flags

(firefox59 fixed)

Details

Attachments

(2 attachments)

Funsize still uses python2, we should be able to get some run time gains by swapping to python3 and asynchronously downloading the MARs, and running the external shell commands.
Blocks: funsize
The update generator is in-tree:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/docker/funsize-update-generator

The Makefile and README have container build and local running examples

scripts/funsize.py needs updating to Python3. Asynchronous downloads and external calls should reduce the run time, too.

At the moment, funsize does the following:

For each partial to build:
    for that partial's from_mar and to_mar:
         download the mar
         call an external unwrapper
         call clamscan 

We could instead find the set of MARs to download, and run the download/unwrap/clamscan calls in async functions, allowing them to run concurrently. Then once all are present, run each external make_incremental_update.sh asynchonously, too.
Mentor: sfraser
Keywords: good-first-bug

Comment 2

2 years ago
Would love to take this on if you could assign it to me -- can you give me an idea on how to get started?
Flags: needinfo?(sfraser)
Hi nikshepsvn, I apologise - I did the work for this just before Christmas, and I'd forgotten to take the bug. There's still improvement that can be done afterwards, but it'd be a bit more open ended, so it's up to you whether you'd want to look at this component, or we can work through something else. Perhaps we can find something similar to work on?
Flags: needinfo?(sfraser)
Assignee: nobody → sfraser
Comment hidden (mozreview-request)

Comment 6

2 years ago
mozreview-review
Comment on attachment 8939537 [details]
Bug 1401995 Update funsize to use async, to reduce task time

https://reviewboard.mozilla.org/r/209862/#review215420

It looks good to me overall. I'm concerned about `redo` and `resp`. Feel free to ask me for another round of review, if the changes are too heavy.

::: taskcluster/docker/funsize-update-generator/runme.sh:21
(Diff revision 1)
>  # -> bucket of tc-gp-private-1d-us-east-1, path of releng/mbsdiff-cache/
>  # Trailing slash is important, due to prefix permissions in S3.
>  S3_BUCKET_AND_PATH=$(jq -r '.scopes[] | select(contains ("auth:aws-s3"))' /home/worker/task.json | awk -F: '{print $4}')
>  
>  # Will be empty if there's no scope for AWS S3.
> -if [ -n "${S3_BUCKET_AND_PATH}" ]; then
> +if [ -n "${S3_BUCKET_AND_PATH}" ] && getent hosts taskcluster

TIL `getent`

::: taskcluster/docker/funsize-update-generator/scripts/funsize.py:83
(Diff revision 1)
>      r.raise_for_status()
>      return r.json().get('secret', {})
>  
>  
>  @redo.retriable()
> -def download(url, dest, mode=None):
> +async def download(url, dest, mode=None):  # noqa: E999

IIUC, redo won't work anymore on this async function https://github.com/mozilla-releng/redo/blob/6fdb1474b4b4f960cbc1a75f1c701f9951f8d5b8/redo/__init__.py#L162

::: taskcluster/docker/funsize-update-generator/scripts/funsize.py:99
(Diff revision 1)
> +                        break
> -            fd.write(chunk)
> +                    fd.write(chunk)
> -            bytes_downloaded += len(chunk)
> +                    bytes_downloaded += len(chunk)
>  
>      log.debug('Downloaded %s bytes', bytes_downloaded)
> -    if 'content-length' in r.headers:
> +    if 'content-length' in resp.headers:

`resp` shoud be undefined in this scope. I wonder if that works in the logs[1], because the ContextManager is still open in a different async context.

[1] https://github.com/mozilla-releng/redo/blob/6fdb1474b4b4f960cbc1a75f1c701f9951f8d5b8/redo/__init__.py#L162

::: taskcluster/docker/funsize-update-generator/scripts/funsize.py:128
(Diff revision 1)
> +    if not stderr:
> +        stderr = ""
> +    if not stdout:
> +        stdout = ""
> +
> +    for line in stdout.splitlines() + stderr.splitlines():

Should we log stderr in `log.warn` or `log.error` instead of `log.debug`?

::: taskcluster/docker/funsize-update-generator/scripts/funsize.py:150
(Diff revision 1)
> +    await run_command(cmd, cwd=dest_dir, env=env, label=dest_dir)
> +
>  
>  def find_file(directory, filename):
>      log.debug("Searching for %s in %s", filename, directory)
> -    for root, dirs, files in os.walk(directory):
> +    for root, _, files in os.walk(directory):

clean_up++!

::: taskcluster/docker/funsize-update-generator/scripts/funsize.py:206
(Diff revision 1)
>  
> -    def setup(self):
> -        self.download_unwrap()
> -        self.download_martools()
> +    async def setup(self):
> +        await self.download_unwrap()
> +        await self.download_martools()
> +
> +    async def clone(self, workenv):

Nit: I don't think this function needs to be async.

::: taskcluster/docker/funsize-update-generator/scripts/funsize.py:257
(Diff revision 1)
> -    manifest = []
> -    for e in task["extra"]["funsize"]["partials"]:
> -        for mar in (e["from_mar"], e["to_mar"]):
> -            verify_allowed_url(mar)
> +        verify_allowed_url(mar)
>  
> -        work_env = WorkEnv()
> +    # work_env = WorkEnv()

Nit: Commented out code.

::: taskcluster/docker/funsize-update-generator/scripts/funsize.py:466
(Diff revision 1)
> +
>      manifest_file = os.path.join(args.artifacts_dir, "manifest.json")
>      with open(manifest_file, "w") as fp:
>          json.dump(manifest, fp, indent=2, sort_keys=True)
>  
> +    print(json.dumps(manifest, indent=2, sort_keys=True))

Nit: Is it a leftover of a debugging sessions? If not, let's `log` it with a message, instead.
Attachment #8939537 - Flags: review?(jlorenzo) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 10

2 years ago
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f2015e2842ca
Update funsize to use async, to reduce task time r=jlorenzo

Comment 11

2 years ago
Backout by ccoroiu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7565fa6f3481
Backed out 1 changesets for multiple linting failures in funsize.py r=backout on a CLOSED TREE
Comment hidden (mozreview-request)

Comment 13

2 years ago
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a98402f998c4
Update funsize to use async, to reduce task time r=jlorenzo

Comment 14

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a98402f998c4
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Blocks: 1430600
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.