Closed Bug 1175047 Opened 9 years ago Closed 9 years ago

create a client that queries relengapi archiver endpoint

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlund, Assigned: jlund)

References

Details

Attachments

(3 files, 3 obsolete files)

this should effectively replace all the places where we call repository_manifest.py[1] and any services we clone/pull mozharness now. The client: 1) obtains a valid s3 url for archive via relengapi's archiver endpoint 2) downloads and extracts the archive locally 3) merges contents of extracted archive into destination, overwriting duplicate paths (this is so we do not overwrite the work dir normally inside ('mozharness/build') 4) cleans up archive and leftover extracted paths though somewhat cumbersome, my hope is that it can serve as a 1 line script call for all places we currently deploy mozharness acting as a single source for interacting with relengapi [1] http://hg.mozilla.org/build/tools/file/1bc622c35872/buildfarm/utils/repository_manifest.py
dustin, I've asked for r? of you simply because you are reviewing the relengapi endpoint associated with this client. Though, I can easily pass it on to someone else if you prefer. :)
Attachment #8622920 - Flags: review?(dustin)
Comment on attachment 8622920 [details] [diff] [review] 150616_archiver_client-tools.patch Review of attachment 8622920 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, but I'd like to see * errors handled with exceptions * extract the tarball in-place * don't read the entire tarball into memory ::: buildfarm/utils/archiver_client.py @@ +29,5 @@ > +# When an infra error happens we want to turn purple and > +# let sheriffs determine if re-triggering is needed > +INFRA_CODE = 3 > + > +EXIT_CODE = FAILURE_CODE Unused @@ +35,5 @@ > +ENDPOINT_CONFIG = { > + 'mozharness': { > + 'url_format': "archiver/{endpoint}/{rev}?repo={repo}&region={region}&suffix={suffix}", > + 'archive_root': "{repo}-{rev}", > + 'archive_subdir': "testing/mozharness", Can you explain what these mean in a comment? It's not immediately clear from the code @@ +107,5 @@ > + "archiver logs for errors. Task status: {}".format(task_result['status'])) > + else: > + log.error("Archiver's task could not be resolved. Check archiver logs for errors. Task " > + "status: {}".format(task_result['status'])) > + return response Hm, and this returns None to signal an error.. @@ +158,5 @@ > +def download_and_extract_archive(response, archive_name): > + log.info("downloading archive: {}".format(archive_name)) > + try: > + with open(archive_name, 'wb') as archivef: > + archivef.write(response.read()) This loads the entire response into memory, then writes it back out. That's probably not a big deal for mozharness, which is about 60M, but could lead to an unhappy surprise if this gets re-used for a different, larger repo archive. The fix is pretty easy -- just use `shutil.copyfileobj` In fact, since you only traverse the tarfile once (and thus don't need to seek in the file), you can actually hook it right up to Tarfile: tar = tarfile.open(fileobj=archivef, mode='r|gz') @@ +166,5 @@ > + > + log.info("extracting downloaded archive: {}".format(archive_name)) > + try: > + with tarfile.open(archive_name) as tar: > + tar.extractall() Why extract the entire tarball, then move it? Especially on Windows systems, creating lots of files is pretty slow, so this will waste build time. The `tarfile` module is a little weird, and doesn't let you specify a path to extract a file to a path that does not contain the filename from the tarball. I think you could work around this with something like (assuming archive_subdir is /-terminated) for member in tar: if not member.name.startswith(archive_subdir): continue member.name = member.name.replace(archive_subdir, '') tar.extract(member, extracted_destination) Another option is to use `tar.makefile`, which does exactly what you want, and is documented in the source file, but not in the documentation: https://hg.python.org/cpython/file/2.7/Lib/tarfile.py#l2220 Either one would get you the merging functionality you want (at least, as I understand the requirement) @@ +221,5 @@ > + except IOError, e: > + log.exception("IOError: {}".format(e)) > + return INFRA_CODE > + except shutil.Error as e: > + log.exception("shutil.Error: {}".format(e)) `log.exception` will include the exception after the log message, so there's no need to format it into the message as well @@ +249,5 @@ > + response = get_url_response(url, options) > + > + return_code = download_and_extract_archive(response, archive_path) > + > + if return_code == SUCCESS_CODE: You could avoid nesting here by inverting the check: if return_code != SUCCESS_CODE: return return_code although it would be far more Pythonic to use exceptions for error handling, rather than integer error codes. @@ +305,5 @@ > + > + if not ENDPOINT_CONFIG.get(endpoint): > + log.error("endpoint argument is unknown. " > + "Given: '{}', Valid: {}".format(endpoint, str(ENDPOINT_CONFIG.keys()))) > + exit(FAILURE_CODE) This stanza should probably move up with the "if not len(args) == 1" bit in `options_args`
Attachment #8622920 - Flags: review?(dustin) → feedback+
Assignee: nobody → jlund
Comment on attachment 8626620 [details] [diff] [review] 150626_archiver_client-tools.patch Review of attachment 8626620 [details] [diff] [review]: ----------------------------------------------------------------- Looks good - just a few minor bits ::: buildfarm/utils/archiver_client.py @@ +92,5 @@ > + if task_result.get('state') == "SUCCESS": > + if task_result.get('s3_urls'): > + # grab a s3 url using the preferred region if available > + s3_url = task_result['s3_urls'].get(options.region, task_result['s3_urls'].values()[0]) > + print s3_url debug print @@ +132,5 @@ > + log.debug("got a bad response. response code: {}".format(response.code)) > + > + except (urllib2.HTTPError, urllib2.URLError) as e: > + if num == options.max_retries - 1: > + log.exception(e) No need to even pass `e` to `log.exception` - it will use the current exception (from `sys.exc_info()`). As written, this might fail since `e` is not a string. @@ +158,5 @@ > + # make sure our extracted root path has a trailing slash > + extract_root = os.path.join(extract_root, '') > + > + try: > + with tarfile.open(fileobj=response, mode='r|gz') as tar: woo!
Attachment #8626620 - Flags: review?(dustin) → review+
Attachment #8626620 - Attachment is obsolete: true
Attachment #8622920 - Attachment is obsolete: true
Comment on attachment 8627339 [details] [diff] [review] 150629_archiver_client-tools-final.patch pushed to prod: https://hg.mozilla.org/build/tools/rev/99d3763dd131 with one more line change: diff --git a/buildfarm/utils/archiver_client.py b/buildfarm/utils/archiver_client.py index 67fac98..b92b7da 100755 --- a/buildfarm/utils/archiver_client.py +++ b/buildfarm/utils/archiver_client.py @@ -132,8 +132,7 @@ def get_url_response(url, options): except (urllib2.HTTPError, urllib2.URLError) as e: if num == options.max_retries - 1: - log.exception("Could not get a valid response from archiver endpoint. " - "Status response code: {}".format(response.code)) + log.exception("Could not get a valid response from archiver endpoint.") exit(INFRA_CODE) num += 1
landed on tools for a while. used along with relengapi endpoint here: https://bugzilla.mozilla.org/show_bug.cgi?id=1154795#c18 looks like this is all done for now. Thank you for all the reviews
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
turns out that this client didn't work so well in buildbot staging across all our platforms (*shocking*). 1) first issue was the python interpreter I was using is 2.6. I believe we have 2.7 across all slaves and I could have explicitly set the path but I figured there is no harm in a small script like this being 2.6 compliant and I'm sure I'll run into other factories that don't always use 2.7 a) so it wasn't a fan of str.format(). b) or my with statement for opening the archive 2) for bonus fun, os.path.join() on windows will try to use back slashes. tarfile's member paths on the other hand use posix style forward slashes. I'd imagine that's how the paths look within the tar archive so when I go to extract members that match, startswith() gets confused.
Attachment #8628993 - Flags: review?(dustin)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #8628993 - Flags: review?(dustin) → review+
I will come back and remove some of the really loud unneeded debug lines once this is used across all trees
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Comment on attachment 8633813 [details] [diff] [review] adds more debug lines to api responses and removes tar unpacking debug lines Review of attachment 8633813 [details] [diff] [review]: ----------------------------------------------------------------- ::: buildfarm/utils/archiver_client.py @@ +101,4 @@ > exit(FAILURE_CODE) > else: > log.error("Archiver's task could not be resolved. Check archiver logs for errors. Task " > + "status: %s Task state:", task_result['status'], task_result['state']) You removed the % subsitution here but left a %s in
Attachment #8633813 - Flags: review?(dustin) → review-
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
doh, silly mistake. fixed also truncates rev from client side to a 12 char limit as per relengapi PR comment
Attachment #8633813 - Attachment is obsolete: true
Attachment #8634413 - Flags: review?(dustin)
Attachment #8634413 - Flags: review?(dustin) → review+
Attachment #8634413 - Flags: checked-in+
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: