Closed Bug 1019528 Opened 10 years ago Closed 10 years ago

mozregression should be able to use a local archive of builds

Categories

(Testing :: mozregression, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Gijs, Assigned: samdgarrett)

Details

Attachments

(1 file)

49 bytes, text/x-github-pull-request
wlach
: review-
wlach
: review-
wlach
: review-
parkouss
: review+
wlach
: review-
parkouss
: review+
Details | Review
I've often thought, I have crazy amounts of spare hard disk space. Disk space is cheap. On the other hand, the time I spend waiting for every 80mb OS X dmg to download (under a minute because I have a decent broadband connection - but not everyone does, and it still ends up being a fair amount of waiting) would be better spent doing other stuff. And when I'm done bisecting, all the builds go to never-never land.

It would be nice if I could designate a directory and a certain amount of space to be used by mozregression to store builds, and to speed up initial bisection. It wouldn't need to proactively download all of the builds on the ftp server, but it could keep a backed-off (ie fewer builds the further back we go) archive of some to save at least a few downloads, and therefore save time.
We should be able to do this fairly easily by taking advantage of a requests caching library for python like CacheControl: http://cachecontrol.readthedocs.org/

We should limit the size of the cache somehow so it doesn't go totally out of control (maybe 1 Gig, configurable?). We'd need to add functionality to do this to the cachecontrol library (it doesn't currently support it), but it should be relatively straightforward:

https://github.com/ionrock/cachecontrol/blob/master/cachecontrol/caches/file_cache.py
Hi William,

I've implemented the max_bytes option in the FileCache class here: https://github.com/ionrock/cachecontrol/pull/38.

Once the cachecontrol maintainer and I get something merged in I will update this branch: https://github.com/sinemetu1/mozregression/tree/bug1019528-cache-builds to use the max_bytes parameter. This branch is currently working without the max_bytes parameter.
Assignee: nobody → samdgarrett
Hey Sam, it looks like CacheControl's author is suggesting doing a custom override of the class inside mozregression. Do you want to take a look at approaching things that way? This feature would be really nice to have, but maybe most especially for us developers as it would make development go faster.
Flags: needinfo?(samdgarrett)
Yes I will see if I can get this implemented and done this weekend.
Flags: needinfo?(samdgarrett)
Attachment #8532664 - Flags: review?(wlachance)
Comment on attachment 8532664 [details] [review]
Caching http requests with cachecontrol functionality

This all looks good to me, though since it requires an explicit command line to work (AFAICT?) it might not get that broad use to me. I think we should do a followup to make it so that we have a cache directory in the user's "home" directory by default.
Attachment #8532664 - Flags: review?(wlachance) → review+
Comment on attachment 8532664 [details] [review]
Caching http requests with cachecontrol functionality

Hey William,

Can you re-review? I had to rebase after all of the refactorings and want to make sure I'm not missing anything.

Thanks.
Attachment #8532664 - Flags: review?(wlachance)
I find it quite intrusive. What do you think of using a singleton like pattern ? Maybe something like (in utils.py for example):

CACHE_SESSION = None

def set_http_cache_session(http_cache_dir=None, ...):
   # define CACHE_SESSION

def get_http_session():
   # it seems like the api is similar to request
   return CACHE_SESSION or requests

and in client code replace requests calls with get_http_session(). set_http_cache_session would be called in main, and there will be less session and http_cache_dir passing around. What do you think ?
Comment on attachment 8532664 [details] [review]
Caching http requests with cachecontrol functionality

Unfortunately with this patch applied I get breakage when I get to the point of bisecting inbound:

(mozregression)wlach@eideticker:~/src/mozregression$ mozregression -g 2014-12-05 -b 2014-12-12
... (every build selected as bad)
 1:15.72 LOG: MainThread Bisector INFO Getting inbound builds between 17de0f463944 and 035a951fc24a
Traceback (most recent call last):
  File "/home/wlach/src/mozregression/bin/mozregression", line 9, in <module>
    load_entry_point('mozregression==0.30', 'console_scripts', 'mozregression')()
  File "/home/wlach/src/mozregression/mozregression/regression.py", line 501, in cli
    app()
  File "/home/wlach/src/mozregression/mozregression/regression.py", line 310, in bisect_nightlies
    self.bisect_nightlies()
  File "/home/wlach/src/mozregression/mozregression/regression.py", line 310, in bisect_nightlies
    self.bisect_nightlies()
  File "/home/wlach/src/mozregression/mozregression/regression.py", line 260, in bisect_nightlies
    self.bisect_inbound()
  File "/home/wlach/src/mozregression/mozregression/regression.py", line 150, in bisect_inbound
    range=60*60*12)
  File "/home/wlach/src/mozregression/mozregression/inboundfinder.py", line 67, in get_build_infos
    pushlogs_finder = self._create_pushlog_finder(start_rev, end_rev)
TypeError: _create_pushlog_finder() takes exactly 4 arguments (3 given)

I think Julien's idea is also be worth considering as it would probably simplify things a bit.
Attachment #8532664 - Flags: review?(wlachance) → review-
(In reply to Julien Pagès from comment #8)
> I find it quite intrusive. What do you think of using a singleton

This was my initial thought for adding this functionality. I'll get a pull-request up with this pattern.
Comment on attachment 8532664 [details] [review]
Caching http requests with cachecontrol functionality

Hey guys,

I updated to a singleton. I also added a test and separated out the preparation for bisecting to make it a little more testable. What's in the PR now isn't the best test in the world but I guess it's slightly better.

One thing is that I had a hard time mocking out the create_launcher method. I ended up having to declare the import differently here: https://github.com/mozilla/mozregression/pull/143/files#diff-fa686b102516e50aa7df8250dd9a10a6L21 which worked. I'm not sure why it wasn't picking up the mocking before that.

Let me know if there are any issues.
Attachment #8532664 - Flags: review?(wlachance)
Attachment #8532664 - Flags: review?(j.parkouss)
Comment on attachment 8532664 [details] [review]
Caching http requests with cachecontrol functionality

Hey Sam,

First, thanks for the update! It looks definitely better to me. :) There is just some little things that I noted on the pull request.

About the mocking of the create_launcher, I think I can explain it:

In python a module have a dict to store its variables. The result is that when you do:

from mozregression.launchers import create_launcher

inside regression.py you are creating a new variable named create_launcher in the module.

So you have two variables (two variables referencing the same thing):

mozregression.launchers.create_launcher
AND
mozregression.regression.create_launcher

Now if you try to mock "mozregression.launchers.create_launcher", mozregression.regression.create_launcher is not impacted.

The solution is to patch "mozregression.regression.create_launcher" instead.

I hope this is useful;


Another thing, but this not directly related to the patch. Why do we want to use the cache by default ? Shoud'nt we activate it with a command line flag ? Or at least provide a way to deactivate it ?

Also I am thinking that it may be interesting to not use the cache for binaries download (after all, there is the --persist flag already). I think that we could provide strategies for the cache, for example:

 - caching everything (current patch)
 - caching everything but downloaded files
 - ...

But that may definitely done in another patches, if that sounds interesting for you also.

Thanks again Sam for updating the code - I know that it must have took some time (with recent refactoring among other things) but I'm really happy that we have a cleaner implementation for this. :)
Attachment #8532664 - Flags: review?(j.parkouss) → review+
Cool.

I rebased and squashed the commits, fixed the import issue, and added some tests. PR should be good to merge in now.
Hey Sam,

I have just noted a few nits, after that I do agree for the merge - even if I would have add a command option line to activate the cache on demand, not by default.  :) But that's my opinion on that.
Comment on attachment 8532664 [details] [review]
Caching http requests with cachecontrol functionality

I'm getting this error when I try run mozregression with the PR integrated:

(mozregression)wlach@eideticker:~/src/mozregression$ mozregression -g 2014-12-05 -b 2014-12-12 --inbound-branch fx-team
Traceback (most recent call last):
  File "/home/wlach/src/mozregression/bin/mozregression", line 9, in <module>
    load_entry_point('mozregression==0.30', 'console_scripts', 'mozregression')()
  File "/home/wlach/src/mozregression/mozregression/regression.py", line 473, in cli
    options.http_cache_dir, utils.one_gigabyte,
NameError: global name 'utils' is not defined

Also, it's not entirely clear to me what the behaviour is when --http_cache_dir (really should be --http-cache-dir) is not provided on the command line? I think we *should* actually make some kind of cache a default for mozregression. To answer Julien's concern, the cache has a bit of a different function than persist (which keeps extracted copies around for later running) -- this is simply a way to make sure that future downloads go faster.
Attachment #8532664 - Flags: review?(wlachance) → review-
Comment on attachment 8532664 [details] [review]
Caching http requests with cachecontrol functionality

If we don't provide --http-cache-dir it uses the default CacheControl, which is "...a threadsafe in memory dictionary for storage." See: http://cachecontrol.readthedocs.org/en/latest/#quick-start So this is a cache that goes away when mozregression stops running.

@William: I've added tests that run basically everything in our cli() methods except the last lines which do the actual work. Hopefully this will help us later on and we can extend these tests pretty easily.
Attachment #8532664 - Flags: review?(wlachance)
(In reply to Sam Garrett from comment #16)
> If we don't provide --http-cache-dir it uses the default CacheControl, which
> is "...a threadsafe in memory dictionary for storage." See:
> http://cachecontrol.readthedocs.org/en/latest/#quick-start So this is a
> cache that goes away when mozregression stops running.

Nice! I like this default. I did not know that the cache could be used in memory only, that's great. Plus we will probably be able to remove some code to cache month urls in NightlyUrlBuilder (https://github.com/mozilla/mozregression/blob/master/mozregression/build_data.py#L330).
Comment on attachment 8532664 [details] [review]
Caching http requests with cachecontrol functionality

Hmm, am I doing something wrong? I'm still getting an error with the test case I mentioned above, a different one this time:

(mozregression)wlach@eideticker:~/src/mozregression$ mozregression -g 2014-12-05 -b 2014-12-12 --inbound-branch fx-team --http-cache-dir cache
...
Traceback (most recent call last):
  File "/home/wlach/src/mozregression/bin/mozregression", line 9, in <module>
    load_entry_point('mozregression==0.30', 'console_scripts', 'mozregression')()
  File "/home/wlach/src/mozregression/mozregression/regression.py", line 530, in cli
    app()
  File "/home/wlach/src/mozregression/mozregression/regression.py", line 320, in bisect_nightlies
    self.bisect_nightlies()
  File "/home/wlach/src/mozregression/mozregression/regression.py", line 320, in bisect_nightlies
    self.bisect_nightlies()
  File "/home/wlach/src/mozregression/mozregression/regression.py", line 263, in bisect_nightlies
    self.bisect_inbound()
  File "/home/wlach/src/mozregression/mozregression/regression.py", line 202, in bisect_inbound
    if len(inbound_revisions) > 1 and verdict in ('g', 'b'):
TypeError: object of type 'NoneType' has no len()

The rest of the changes look good. I think I would prefer a persistent cache by default, but we can deal with that in a followup once we resolve this. :)
Attachment #8532664 - Flags: review?(wlachance)
Attachment #8532664 - Flags: review-
Attachment #8532664 - Flags: review+
Hi William,

I'm unable to reproduce that error. I removed the print statement. Can you reverify that this is an actual bug? I'm running on a Mac so maybe that is causing mine to not fail.
Flags: needinfo?(wlachance)
Flags: needinfo?(wlachance)
Attachment #8532664 - Flags: review?(j.parkouss)
Comment on attachment 8532664 [details] [review]
Caching http requests with cachecontrol functionality

This works fine for me! I suppose that William must have tested with an old version of the branch.

I would have preferred only changes related to the cache in your commit - to keep things simple - but that's fine. :) I will merge this in.
Attachment #8532664 - Flags: review?(j.parkouss) → review+
Merged.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Well, I was too fast testing/merging this. It seems that only the first inbound step is working, then I have the same error as William described in comment 18.

Sorry for that, I will fix it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Well, I think I fixed it:

https://github.com/mozilla/mozregression/pull/152

Unfortunately I had to remove some tests. I think we will rewrite them with bug 1111314 anyway.

Well, let's hope this time I close this for good.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: