Closed Bug 858635 Opened 11 years ago Closed 11 years ago

Use system-wide tooltool cache

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rail, Assigned: sbruno)

References

Details

Attachments

(7 files, 3 obsolete files)

13.29 KB, patch
rail
: review+
Details | Diff | Splinter Review
1.39 KB, patch
rail
: review+
Details | Diff | Splinter Review
12.19 KB, patch
sbruno
: review+
Details | Diff | Splinter Review
704 bytes, patch
sbruno
: review+
Details | Diff | Splinter Review
12.50 KB, patch
Details | Diff | Splinter Review
12.50 KB, patch
rail
: review+
Details | Diff | Splinter Review
12.26 KB, patch
Details | Diff | Splinter Review
It would be great to have system-wide tooltool cache, so instead of downloading to pwd it woud download somewhere else (/builds/tooltool-cache).

A couple of ideas:

* it should support --cache optional argument (download to PWD if it's not specified)
* clobberer should clean up old files (tooltool may "touch" the files it used)
Assignee: nobody → sbruno
I implemented a caching mechanism on top of the changes for Bug 768123.

The changes are currently tracked here: https://github.com/simone-mozilla/tooltool/commits/bug_858635.

If a cache folder is specified in the command line via the newly added -c parameter, tooltool checks whether the requested file is in the local cache: if it is, there is no need to connect to any remote service for fetching (file is retrieved from cache); if it is not present in the cache, tooltool tries to fetch from remote resources and the cache is also updated with the newly added files. This is somehow similar to how Maven manages a local artifacts repository.

Example usage and resulting output:

Simones-MacBook-Pro:tooltool sbruno$ cat manifest.tt
[
{
    "size": 47,
    "digest": "2005a41fe97a5e00997063705f39d42b6a43b1cf7ba306cbc7b1513de34cdcd050fc6326efa2107f19ba0cc67914745dbf13154fa748010a93cf072481ef4aaa",
    "algorithm": "sha512",
    "filename": "test.file"
}
]
Simones-MacBook-Pro:tooltool sbruno$ python  tooltool.py fetch --url http://tooltool.pub.build.mozilla.org/temp-sm-stuff -c ~/cache ; echo $?
CRITICAL - Folder "/Users/sbruno/cache" does not exist and cannot therefore be used as a local cache
1
Simones-MacBook-Pro:tooltool sbruno$ mkdir ~/cache
Simones-MacBook-Pro:tooltool sbruno$ python  tooltool.py fetch --url http://tooltool.pub.build.mozilla.org/temp-sm-stuff -c ~/cache ; echo $?
INFO - File test.file not present in local cache folder /Users/sbruno/cache
INFO - Attempting to fetch from 'http://tooltool.pub.build.mozilla.org/temp-sm-stuff'...
INFO - Success! File test.file fetched from http://tooltool.pub.build.mozilla.org/temp-sm-stuff
INFO - Updating local cache /Users/sbruno/cache..
INFO - Local cache /Users/sbruno/cache updated with test.file
0
Simones-MacBook-Pro:tooltool sbruno$ rm test.file
Simones-MacBook-Pro:tooltool sbruno$ python  tooltool.py fetch --url http://tooltool.pub.build.mozilla.org/temp-sm-stuff -c ~/cache ; echo $?
INFO - File test.file retrieved from local cache /Users/sbruno/cache
0
Attachment #740371 - Flags: review?(rail)
Depends on: 768123
Comment on attachment 740371 [details] [diff] [review]
Patch to add caching mechanism on top of changes for bug_768123

Review of attachment 740371 [details] [diff] [review]:
-----------------------------------------------------------------

The approach looks good to me. See my comments inline.

Additionally, you'll need to implement a mechanism for purging old files, so they don't eat disk space. See how it's implemented for hg shares: http://hg.mozilla.org/build/tools/file/b0c230b5b218/buildfarm/maintenance/purge_builds.py#l135.

::: tooltool.py
@@ +405,5 @@
> +    # case 2: file already in cache
> +    if cache_folder:
> +        try:
> +            shutil.copy(os.path.join(cache_folder, file_record.digest), os.path.join(os.getcwd(), file_record.filename))
> +            log.info("File %s retrieved from local cache %s" % (file_record.filename, cache_folder))

I liked that you don't check if file exists before copying. It's just a useless check in this case because it doesn't saves you from try/except due to possible race condition.

You still need to check "if file_record.validate()" here to make sure that the file retrieved from cache is valid.

The file in cache needed to be "touched" (os.utime) to mark its last access time. Since atime may be disabled on some file systems, you'll need to modify file's mtime. You may want to add a function, because it'll be used in case#3 as well.

@@ +450,5 @@
> +    # I am managing a cache and a new file has just been retrieved from a remote location
> +    if cache_folder and fetched:
> +        log.info("Updating local cache %s.." % cache_folder)
> +        try:
> +            shutil.copy(os.path.join(os.getcwd(), file_record.filename), os.path.join(cache_folder, file_record.digest))

make sure to "touch" the file in the cache here as well.

@@ +506,5 @@
> +
> +    #better to check early and exit in case the provided cache folder does not exist
> +    if options.get('cache_folder'):
> +        if (not os.path.exists(options.get('cache_folder')) or not os.path.isdir(options.get('cache_folder'))):
> +            log.critical('Folder "%s" does not exist and cannot therefore be used as a local cache' % options.get('cache_folder'))

I'd try to create the folder first and make it safe (0700), then fail. Otherwise we'll need to create the folder on all machines using puppet/whatever_windows_uses.
Attachment #740371 - Flags: review?(rail) → review-
Thanks Rail!

Only one note: I did not add a validation when the file is retrieved from the cache because there's already a validation for all files (however retrieved) around line 485. I'll probably refactor slightly the validations anyway so that they are not performed in different parts of the code.
According to the comments added previously to this bug, the following changes have been implemented:
- file validation has been slightly refactored
- cache folder is now created when the path provided as a parameter does not point to an existing folder
- a brand new purge command has been added to manage cache cleanup, based on two parameters:
  1)-s, --size: allows to specify how much free space is desired
  2) --max-age: all files older (according to mtime) than the specified number of days will be deleted
- if none of the mentioned two parameters is passed to the purge command, all files in the cache will be removed

Changes are tracked in https://github.com/simone-mozilla/tooltool.git (branch bug_858635); this patch is supposed to be applied on top of the patch for bug_768123 and it's been generated with command:

git diff -U8 3bb9a01b97b0c487c73e3a9702a32d5f6f65111b e47f430edfad31355da9b9cc4fa38226020726b8 > patch.diff
Attachment #740371 - Attachment is obsolete: true
Attachment #743576 - Flags: review?(rail)
Comment on attachment 743576 [details] [diff] [review]
Adding cache and related purge mechanism

Review of attachment 743576 [details] [diff] [review]:
-----------------------------------------------------------------

In overall it looks fine. See my comments regarding the "purge" part below.

::: test.sh
@@ +47,5 @@
>      fi
>  }
>  
>  mkdir -p testdir && cd testdir
> +tt="../tooltool.py --url $TEST_BASE_URL -v "

a nit, remove the trailing space in quotes

::: tooltool.py
@@ +380,5 @@
>  
>  
> +# used to modify mtime in cached files
> +# mtime is used by the purge command
> +def touch(f):

Can you move the comments under the function definition into a docstring?

@@ +595,5 @@
>      if cmd == 'validate':
>          return validate_manifest(options['manifest'])
>      elif cmd == 'add':
>          return add_files(options['manifest'], options['algorithm'], cmd_args)
> +    elif cmd == 'purge':

The "purge" part should be moved to purge_builds.py for various reasons:
* No every build uses tooltool, but they should be able to purge files
* a lot of (possible) duplication of code
Attachment #743576 - Flags: review?(rail) → review-
Thanks rail for the review. My favorite part of course was: "In overall it looks fine." :-)

My misunderstanding - I thought you provided the purge_builds.py script just as an example to look at, and I implemented everything within tooltool.

I will port the purge logic to purge_builds.py and upload a new patch.
After a quick chat with Rail and John Hopkins, we decided to keep the purge logic within tooltool. Changes will be added to the purge_builds.py script as well to trigger the tooltool purge where appropriate.
This new patch also incorporates the two extra cleanup suggestions made by Rail.

https://github.com/simone-mozilla/tooltool/commits/bug_858635

git diff -U8 3bb9a01b97b0c487c73e3a9702a32d5f6f65111b 4914214fa6b2f1b82feb2f4a5d8d42d6999e652c  > patch.diff
Attachment #743576 - Attachment is obsolete: true
Attachment #752888 - Flags: review?(rail)
Comment on attachment 752888 [details] [diff] [review]
Adding cache and related purge mechanism

Review of attachment 752888 [details] [diff] [review]:
-----------------------------------------------------------------

We talked about this in person last week and decided to drop max_age feature to simplify the logic. Additionally, there are my comments for the current implementation. Feel free to request a review whenever the new version is ready.

::: tooltool.py
@@ +416,5 @@
> +            log.info("File %s retrieved from local cache %s" % (file_record.filename, cache_folder))
> +            touch(os.path.join(cache_folder, file_record.digest))
> +            return True
> +        except IOError:
> +            log.info("File %s not present in local cache folder %s" % (file_record.filename, cache_folder))

touch() may raise an exception, what doesn't mean you need to refetch the file. Can you try/except in touch(), warn and not fail? In this case "return True" above would work as expected.

@@ +452,5 @@
>              log.info("failed to write to '%s'" % file_record.filename, exc_info=True)
> +
> +    # I am managing a cache and a new file has just been retrieved from a remote location
> +    if cache_folder and fetched:
> +        log.info("Updating local cache %s.." % cache_folder)

a nit: 3 dots

@@ +528,5 @@
> +    If both parameters are 0, a full cleanup is performed.
> +    No recursive deletion of files in subfolder is performed."""
> +
> +    gigs *= 1024 * 1024 * 1024
> +    threshold = time.time() - max_age * 24 * 60 * 60 * 1.

Can you use "1.0" instead of "1." if you need a float here?

@@ +547,5 @@
> +
> +    enough_free_space_yet = False
> +    found_youngest_file_still_older_than_max_age_yet = False
> +    # since 'files' has been ordered, iteration is from old to new files
> +    while files:

this loop have to check all files in the directory, but you can break early if you, for example, set gigs and you reached needed free space.
Attachment #752888 - Flags: review?(rail)
This changes enables tooltool purge when two environment variables are set: TOOLTOOL_HOME (the folder containing tooltool.py) and TOOLTOOL_CACHE (the tooltool cache folder to be cleaned up).
It dynamically imports the purge method from tooltool (instead of spawning a new process from python to invoke the python interpreter to run tooltool as a stand-alone script).
The validity of this solution is based on the assumption that no changes will be implemented in the signature of the tooltool purge method, which seems to me a reasonable assumption. In case the signature will change in future, this code will need to be changed accordingly (A dependency on tooltool interface changes would exist also in case of invocation as stand alone command line tool, though).
Attachment #758506 - Flags: review?(rail)
Attachment #758509 - Flags: review?(rail)
Additional quick note: the purge_builds.py changes have been tested running the introduced line as a separate test script after setting TOOLTOOL_HOME and TOOLTOOL_CACHE env variables.
This is a patch on top of your patch with some changes. If you accept it, count it as r+ for your patch. Comments for the changes I made incoming.
Attachment #758672 - Flags: review?(sbruno)
Comment on attachment 758672 [details] [diff] [review]
Patch for "Patch including fixes based on rail's review - part I: tooltool.py"

Review of attachment 758672 [details] [diff] [review]:
-----------------------------------------------------------------

::: tooltool.py
@@ -28,5 @@
>  import hashlib
>  import urllib2
>  import shutil
>  import sys
> -import time

Not used

@@ +382,5 @@
>      """Used to modify mtime in cached files;
>      mtime is used by the purge command"""
>      try:
>          os.utime(f, None)
> +    except OSError:

except: would catch KeyboardInterrup (kill).

@@ +415,5 @@
>      if cache_folder:
>          try:
> +            shutil.copy(os.path.join(cache_folder, file_record.digest),
> +                        os.path.join(os.getcwd(), file_record.filename))
> +            log.info("File %s retrieved from local cache %s" %

PEP8!

@@ +462,5 @@
>          log.info("Updating local cache %s..." % cache_folder)
>          try:
> +            if not os.path.exists(cache_folder):
> +                log.info("Creating cache in %s..." % cache_folder)
> +                os.makedirs(cache_folder, 0700)

this is the only place in the code which relies on cache folder existence, so I moved it here to improve readability.

@@ +502,5 @@
>      # Even if we get the file, lets ensure that it matches what the
>      # manifest specified
>      for localfile in fetched_files:
>          if not localfile.validate():
> +            failed_files.append(localfile.filename)

Old standing typo!

@@ +515,3 @@
>  
> +def freespace(p):
> +    "Returns the number of bytes free under directory `p`"

This makes pylint and other tools happier.

@@ +537,5 @@
>      """If gigs is non 0, it deletes files in `folder` until `gigs` GB are free, starting from older files.
>      If gigs is 0, a full purge will be performed.
>      No recursive deletion of files in subfolder is performed."""
>  
> +    full_purge = bool(gigs == 0)

full_purge is easier to read than "not gigs" in conditions

@@ +542,5 @@
>      gigs *= 1024 * 1024 * 1024
>  
> +    if not full_purge and freespace(folder) >= gigs:
> +        log.info("No need to cleanup")
> +        return

short cut return because listdir is expensive.

@@ +562,4 @@
>  
> +    # iterate files sorted by mtime
> +    for _, f in sorted(files):
> +        log.info("removing %s to free up space" % f)

I simplified the rest and unified the loops. You don't need to pop() in this case and alter the initial structure.
Comment on attachment 758509 [details] [diff] [review]
Patch including fixes based on rail's review - part II: purge_builds.py

Review of attachment 758509 [details] [diff] [review]:
-----------------------------------------------------------------

::: buildfarm/maintenance/purge_builds.py
@@ +250,5 @@
> +        import imp
> +        try:
> +            tooltool = imp.load_source('tooltool', os.path.join(os.environ['TOOLTOOL_HOME'], "tooltool.py"))
> +            tooltool.purge(os.environ['TOOLTOOL_CACHE'], options.size)
> +        except:

Can you add a refined list of exceptions to be handled?
Comment on attachment 758672 [details] [diff] [review]
Patch for "Patch including fixes based on rail's review - part I: tooltool.py"

Review of attachment 758672 [details] [diff] [review]:
-----------------------------------------------------------------

I like it! Especially the refactoring of the loops: your solution is much more elegant than mine (and efficient, since it does not listdir in case no purge is needed)!
Attachment #758672 - Flags: review?(sbruno) → review+
Hi Rail,
Regarding the explicit list of exceptions raised in the code fragment added in purge_builds.py, I could not find in the python docs the list of exceptions raised by either imp.load_source and os.path.join (tooltool.purge does not raise explicitly any kind of exceptions).
How do you recommend to proceed?
imp may raise tons of possible exceptions from IOError to SyntaxError. If you don't want to stop builds if purge doesn't work it'd be safe to catch Exception and dump the stack.
Comment on attachment 758506 [details] [diff] [review]
Patch including fixes based on rail's review - part I: tooltool.py

r+ with the attachment 758672 [details] [diff] [review] applied
Attachment #758506 - Flags: review?(rail) → review+
Comment on attachment 758509 [details] [diff] [review]
Patch including fixes based on rail's review - part II: purge_builds.py

Review of attachment 758509 [details] [diff] [review]:
-----------------------------------------------------------------

::: buildfarm/maintenance/purge_builds.py
@@ +250,5 @@
> +        import imp
> +        try:
> +            tooltool = imp.load_source('tooltool', os.path.join(os.environ['TOOLTOOL_HOME'], "tooltool.py"))
> +            tooltool.purge(os.environ['TOOLTOOL_CACHE'], options.size)
> +        except:

r+ if you use "except Exception:"
Attachment #758509 - Flags: review?(rail) → review+
I'll land the tooltool patches in a bit. We'll need to land the modified tooltool to 3 puppet repos:

http://hg.mozilla.org/build/puppet/
http://hg.mozilla.org/build/puppet-manifests/
http://hg.mozilla.org/users/dmitchell_mozilla.com/puppet320/
This removes a temporary file after unittests.
Attachment #759731 - Flags: review?(sbruno)
Comment on attachment 758506 [details] [diff] [review]
Patch including fixes based on rail's review - part I: tooltool.py

https://github.com/mozilla/build-tooltool/commit/b3ddf7fec1f6a08ef651a35909fc234d549fc22a
Attachment #758506 - Flags: checked-in+
Comment on attachment 758672 [details] [diff] [review]
Patch for "Patch including fixes based on rail's review - part I: tooltool.py"

https://github.com/mozilla/build-tooltool/commit/3c6f496dce2334bbe494fb2e5d492e0857606632
Attachment #758672 - Flags: checked-in+
Attachment #759731 - Flags: review?(sbruno) → review+
Blocks: 887301
Product: mozilla.org → Release Engineering
Attachment #760972 - Flags: review?(rail)
Attachment #760973 - Flags: review?(rail)
Attachment #760974 - Flags: review?(rail)
Hi Rail,

The content of the last patches should be identical to what you already reviewed; the changes have been ported to the version of tooltool living in the puppet repositories.
Comment on attachment 760972 [details] [diff] [review]
Patch for build/puppet-manifests repo

This repo is obsolete, no need for this patch anymore.
Attachment #760972 - Flags: review?(rail)
Comment on attachment 760974 [details] [diff] [review]
Patch for Dustin's puppet320 repo

The same here.
Attachment #760974 - Flags: review?(rail)
Attachment #760973 - Flags: review?(rail) → review+
An extra note: in order to clean up the cache, two extra environment variables need to be set up where the purge_build.py script is used.
The changes required to enable these environment variables are not encompassed by the puppet changes uploaded here.
Blocks: 920485
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
oh, I see the update in bug 920485, never mind
Component: Tools → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: