create a client that queries relengapi archiver endpoint

RESOLVED FIXED

Status

Release Engineering
Applications: MozharnessCore
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: jlund, Assigned: jlund)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 3 obsolete attachments)

(Assignee)

Description

3 years ago
this should effectively replace all the places where we call repository_manifest.py[1] and any services we clone/pull mozharness now.

The client:
    1) obtains a valid s3 url for archive via relengapi's archiver endpoint
    2) downloads and extracts the archive locally
    3) merges contents of extracted archive into destination, overwriting duplicate paths (this is so we do not overwrite the work dir normally inside ('mozharness/build')
    4) cleans up archive and leftover extracted paths

though somewhat cumbersome, my hope is that it can serve as a 1 line script call for all places we currently deploy mozharness acting as a single source for interacting with relengapi

[1] http://hg.mozilla.org/build/tools/file/1bc622c35872/buildfarm/utils/repository_manifest.py
(Assignee)

Comment 1

3 years ago
Created attachment 8622920 [details] [diff] [review]
150616_archiver_client-tools.patch

dustin, I've asked for r? of you simply because you are reviewing the relengapi endpoint associated with this client. Though, I can easily pass it on to someone else if you prefer. :)
Attachment #8622920 - Flags: review?(dustin)
Comment on attachment 8622920 [details] [diff] [review]
150616_archiver_client-tools.patch

Review of attachment 8622920 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, but I'd like to see
 * errors handled with exceptions
 * extract the tarball in-place
 * don't read the entire tarball into memory

::: buildfarm/utils/archiver_client.py
@@ +29,5 @@
> +# When an infra error happens we want to turn purple and
> +# let sheriffs determine if re-triggering is needed
> +INFRA_CODE = 3
> +
> +EXIT_CODE = FAILURE_CODE

Unused

@@ +35,5 @@
> +ENDPOINT_CONFIG = {
> +    'mozharness': {
> +        'url_format': "archiver/{endpoint}/{rev}?repo={repo}&region={region}&suffix={suffix}",
> +        'archive_root': "{repo}-{rev}",
> +        'archive_subdir': "testing/mozharness",

Can you explain what these mean in a comment?  It's not immediately clear from the code

@@ +107,5 @@
> +                      "archiver logs for errors. Task status: {}".format(task_result['status']))
> +    else:
> +        log.error("Archiver's task could not be resolved. Check archiver logs for errors. Task "
> +                  "status: {}".format(task_result['status']))
> +    return response

Hm, and this returns None to signal an error..

@@ +158,5 @@
> +def download_and_extract_archive(response, archive_name):
> +    log.info("downloading archive: {}".format(archive_name))
> +    try:
> +        with open(archive_name, 'wb') as archivef:
> +            archivef.write(response.read())

This loads the entire response into memory, then writes it back out.  That's probably not a big deal for mozharness, which is about 60M, but could lead to an unhappy surprise if this gets re-used for a different, larger repo archive.  The fix is pretty easy -- just use `shutil.copyfileobj`

In fact, since you only traverse the tarfile once (and thus don't need to seek in the file), you can actually hook it right up to Tarfile:

tar = tarfile.open(fileobj=archivef, mode='r|gz')

@@ +166,5 @@
> +
> +    log.info("extracting downloaded archive: {}".format(archive_name))
> +    try:
> +        with tarfile.open(archive_name) as tar:
> +            tar.extractall()

Why extract the entire tarball, then move it?  Especially on Windows systems, creating lots of files is pretty slow, so this will waste build time.

The `tarfile` module is a little weird, and doesn't let you specify a path to extract a file to a path that does not contain the filename from the tarball.  I think you could work around this with something like (assuming archive_subdir is /-terminated)

  for member in tar:
    if not member.name.startswith(archive_subdir):
      continue
    member.name = member.name.replace(archive_subdir, '')
    tar.extract(member, extracted_destination)

Another option is to use `tar.makefile`, which does exactly what you want, and is documented in the source file, but not in the documentation:
  https://hg.python.org/cpython/file/2.7/Lib/tarfile.py#l2220

Either one would get you the merging functionality you want (at least, as I understand the requirement)

@@ +221,5 @@
> +    except IOError, e:
> +        log.exception("IOError: {}".format(e))
> +        return INFRA_CODE
> +    except shutil.Error as e:
> +        log.exception("shutil.Error: {}".format(e))

`log.exception` will include the exception after the log message, so there's no need to format it into the message as well

@@ +249,5 @@
> +    response = get_url_response(url, options)
> +
> +    return_code = download_and_extract_archive(response, archive_path)
> +
> +    if return_code == SUCCESS_CODE:

You could avoid nesting here by inverting the check:

  if return_code != SUCCESS_CODE:
    return return_code

although it would be far more Pythonic to use exceptions for error handling, rather than integer error codes.

@@ +305,5 @@
> +
> +    if not ENDPOINT_CONFIG.get(endpoint):
> +        log.error("endpoint argument is unknown. "
> +                  "Given: '{}', Valid: {}".format(endpoint, str(ENDPOINT_CONFIG.keys())))
> +        exit(FAILURE_CODE)

This stanza should probably move up with the "if not len(args) == 1" bit in `options_args`
Attachment #8622920 - Flags: review?(dustin) → feedback+
(Assignee)

Updated

3 years ago
Assignee: nobody → jlund
(Assignee)

Comment 3

3 years ago
Created attachment 8626620 [details] [diff] [review]
150626_archiver_client-tools.patch

interdiff from previous patch: http://people.mozilla.org/~jlund/150626_archiver_client-tools-interdiff.patch

addresses issues in https://bugzilla.mozilla.org/show_bug.cgi?id=1175047#c2

adheres to new endpoint in https://github.com/mozilla/build-relengapi/pull/279
Attachment #8626620 - Flags: review?(dustin)
Comment on attachment 8626620 [details] [diff] [review]
150626_archiver_client-tools.patch

Review of attachment 8626620 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good - just a few minor bits

::: buildfarm/utils/archiver_client.py
@@ +92,5 @@
> +    if task_result.get('state') == "SUCCESS":
> +        if task_result.get('s3_urls'):
> +            # grab a s3 url using the preferred region if available
> +            s3_url = task_result['s3_urls'].get(options.region, task_result['s3_urls'].values()[0])
> +            print s3_url

debug print

@@ +132,5 @@
> +                log.debug("got a bad response. response code: {}".format(response.code))
> +
> +        except (urllib2.HTTPError, urllib2.URLError) as e:
> +            if num == options.max_retries - 1:
> +                log.exception(e)

No need to even pass `e` to `log.exception` - it will use the current exception (from `sys.exc_info()`).  As written, this might fail since `e` is not a string.

@@ +158,5 @@
> +    # make sure our extracted root path has a trailing slash
> +    extract_root = os.path.join(extract_root, '')
> +
> +    try:
> +        with tarfile.open(fileobj=response, mode='r|gz') as tar:

woo!
Attachment #8626620 - Flags: review?(dustin) → review+
(Assignee)

Comment 5

3 years ago
Created attachment 8627339 [details] [diff] [review]
150629_archiver_client-tools-final.patch

carrying dustin's r+

addressing comments. interdiff:
http://people.mozilla.org/~jlund/150629_archiver_client-tools-interdiff.patch
Attachment #8627339 - Flags: review+
(Assignee)

Updated

3 years ago
Attachment #8626620 - Attachment is obsolete: true
(Assignee)

Updated

3 years ago
Attachment #8622920 - Attachment is obsolete: true
(Assignee)

Comment 6

3 years ago
Comment on attachment 8627339 [details] [diff] [review]
150629_archiver_client-tools-final.patch

pushed to prod:

    https://hg.mozilla.org/build/tools/rev/99d3763dd131

with one more line change:

diff --git a/buildfarm/utils/archiver_client.py b/buildfarm/utils/archiver_client.py
index 67fac98..b92b7da 100755
--- a/buildfarm/utils/archiver_client.py
+++ b/buildfarm/utils/archiver_client.py
@@ -132,8 +132,7 @@ def get_url_response(url, options):

         except (urllib2.HTTPError, urllib2.URLError) as e:
             if num == options.max_retries - 1:
-                log.exception("Could not get a valid response from archiver endpoint. "
-                              "Status response code: {}".format(response.code))
+                log.exception("Could not get a valid response from archiver endpoint.")
                 exit(INFRA_CODE)
         num += 1
(Assignee)

Comment 7

3 years ago
landed on tools for a while. used along with relengapi endpoint here: https://bugzilla.mozilla.org/show_bug.cgi?id=1154795#c18

looks like this is all done for now. Thank you for all the reviews
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
(Assignee)

Comment 8

3 years ago
Created attachment 8628993 [details] [diff] [review]
150702_archiver_client-tools.patch

turns out that this client didn't work so well in buildbot staging across all our platforms (*shocking*).

1) first issue was the python interpreter I was using is 2.6. I believe we have 2.7 across all slaves and I could have explicitly set the path but I figured there is no harm in a small script like this being 2.6 compliant and I'm sure I'll run into other factories that don't always use 2.7
    a) so it wasn't a fan of str.format().
    b) or my with statement for opening the archive

2) for bonus fun, os.path.join() on windows will try to use back slashes. tarfile's member paths on the other hand use posix style forward slashes. I'd imagine that's how the paths look within the tar archive so when I go to extract members that match, startswith() gets confused.
Attachment #8628993 - Flags: review?(dustin)
(Assignee)

Updated

3 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #8628993 - Flags: review?(dustin) → review+
(Assignee)

Comment 9

3 years ago
I will come back and remove some of the really loud unneeded debug lines once this is used across all trees
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago3 years ago
Resolution: --- → FIXED
(Assignee)

Comment 10

3 years ago
Created attachment 8633813 [details] [diff] [review]
adds more debug lines to api responses and removes tar unpacking debug lines
Attachment #8633813 - Flags: review?(dustin)
Comment on attachment 8633813 [details] [diff] [review]
adds more debug lines to api responses and removes tar unpacking debug lines

Review of attachment 8633813 [details] [diff] [review]:
-----------------------------------------------------------------

::: buildfarm/utils/archiver_client.py
@@ +101,4 @@
>              exit(FAILURE_CODE)
>      else:
>          log.error("Archiver's task could not be resolved. Check archiver logs for errors. Task "
> +                  "status: %s Task state:", task_result['status'], task_result['state'])

You removed the % subsitution here but left a %s in
Attachment #8633813 - Flags: review?(dustin) → review-
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 12

3 years ago
Created attachment 8634413 [details] [diff] [review]
150715_archiver_client-tools.patch

doh, silly mistake. fixed

also truncates rev from client side to a 12 char limit as per relengapi PR comment
Attachment #8633813 - Attachment is obsolete: true
Attachment #8634413 - Flags: review?(dustin)
Attachment #8634413 - Flags: review?(dustin) → review+
(Assignee)

Updated

3 years ago
Attachment #8634413 - Flags: checked-in+
(Assignee)

Updated

3 years ago
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.