Closed
Bug 1019528
Opened 10 years ago
Closed 10 years ago
mozregression should be able to use a local archive of builds
Categories
(Testing :: mozregression, defect)
Testing
mozregression
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Gijs, Assigned: samdgarrett)
Details
Attachments
(1 file)
I've often thought, I have crazy amounts of spare hard disk space. Disk space is cheap. On the other hand, the time I spend waiting for every 80mb OS X dmg to download (under a minute because I have a decent broadband connection - but not everyone does, and it still ends up being a fair amount of waiting) would be better spent doing other stuff. And when I'm done bisecting, all the builds go to never-never land. It would be nice if I could designate a directory and a certain amount of space to be used by mozregression to store builds, and to speed up initial bisection. It wouldn't need to proactively download all of the builds on the ftp server, but it could keep a backed-off (ie fewer builds the further back we go) archive of some to save at least a few downloads, and therefore save time.
Comment 1•10 years ago
|
||
We should be able to do this fairly easily by taking advantage of a requests caching library for python like CacheControl: http://cachecontrol.readthedocs.org/ We should limit the size of the cache somehow so it doesn't go totally out of control (maybe 1 Gig, configurable?). We'd need to add functionality to do this to the cachecontrol library (it doesn't currently support it), but it should be relatively straightforward: https://github.com/ionrock/cachecontrol/blob/master/cachecontrol/caches/file_cache.py
Assignee | ||
Comment 2•10 years ago
|
||
Hi William, I've implemented the max_bytes option in the FileCache class here: https://github.com/ionrock/cachecontrol/pull/38. Once the cachecontrol maintainer and I get something merged in I will update this branch: https://github.com/sinemetu1/mozregression/tree/bug1019528-cache-builds to use the max_bytes parameter. This branch is currently working without the max_bytes parameter.
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → samdgarrett
Comment 3•10 years ago
|
||
Hey Sam, it looks like CacheControl's author is suggesting doing a custom override of the class inside mozregression. Do you want to take a look at approaching things that way? This feature would be really nice to have, but maybe most especially for us developers as it would make development go faster.
Flags: needinfo?(samdgarrett)
Assignee | ||
Comment 4•10 years ago
|
||
Yes I will see if I can get this implemented and done this weekend.
Flags: needinfo?(samdgarrett)
Assignee | ||
Comment 5•10 years ago
|
||
Attachment #8532664 -
Flags: review?(wlachance)
Comment 6•10 years ago
|
||
Comment on attachment 8532664 [details] [review] Caching http requests with cachecontrol functionality This all looks good to me, though since it requires an explicit command line to work (AFAICT?) it might not get that broad use to me. I think we should do a followup to make it so that we have a cache directory in the user's "home" directory by default.
Attachment #8532664 -
Flags: review?(wlachance) → review+
Assignee | ||
Comment 7•10 years ago
|
||
Comment on attachment 8532664 [details] [review] Caching http requests with cachecontrol functionality Hey William, Can you re-review? I had to rebase after all of the refactorings and want to make sure I'm not missing anything. Thanks.
Attachment #8532664 -
Flags: review?(wlachance)
Comment 8•10 years ago
|
||
I find it quite intrusive. What do you think of using a singleton like pattern ? Maybe something like (in utils.py for example): CACHE_SESSION = None def set_http_cache_session(http_cache_dir=None, ...): # define CACHE_SESSION def get_http_session(): # it seems like the api is similar to request return CACHE_SESSION or requests and in client code replace requests calls with get_http_session(). set_http_cache_session would be called in main, and there will be less session and http_cache_dir passing around. What do you think ?
Comment 9•10 years ago
|
||
Comment on attachment 8532664 [details] [review] Caching http requests with cachecontrol functionality Unfortunately with this patch applied I get breakage when I get to the point of bisecting inbound: (mozregression)wlach@eideticker:~/src/mozregression$ mozregression -g 2014-12-05 -b 2014-12-12 ... (every build selected as bad) 1:15.72 LOG: MainThread Bisector INFO Getting inbound builds between 17de0f463944 and 035a951fc24a Traceback (most recent call last): File "/home/wlach/src/mozregression/bin/mozregression", line 9, in <module> load_entry_point('mozregression==0.30', 'console_scripts', 'mozregression')() File "/home/wlach/src/mozregression/mozregression/regression.py", line 501, in cli app() File "/home/wlach/src/mozregression/mozregression/regression.py", line 310, in bisect_nightlies self.bisect_nightlies() File "/home/wlach/src/mozregression/mozregression/regression.py", line 310, in bisect_nightlies self.bisect_nightlies() File "/home/wlach/src/mozregression/mozregression/regression.py", line 260, in bisect_nightlies self.bisect_inbound() File "/home/wlach/src/mozregression/mozregression/regression.py", line 150, in bisect_inbound range=60*60*12) File "/home/wlach/src/mozregression/mozregression/inboundfinder.py", line 67, in get_build_infos pushlogs_finder = self._create_pushlog_finder(start_rev, end_rev) TypeError: _create_pushlog_finder() takes exactly 4 arguments (3 given) I think Julien's idea is also be worth considering as it would probably simplify things a bit.
Attachment #8532664 -
Flags: review?(wlachance) → review-
Assignee | ||
Comment 10•10 years ago
|
||
(In reply to Julien Pagès from comment #8) > I find it quite intrusive. What do you think of using a singleton This was my initial thought for adding this functionality. I'll get a pull-request up with this pattern.
Assignee | ||
Comment 11•10 years ago
|
||
Comment on attachment 8532664 [details] [review] Caching http requests with cachecontrol functionality Hey guys, I updated to a singleton. I also added a test and separated out the preparation for bisecting to make it a little more testable. What's in the PR now isn't the best test in the world but I guess it's slightly better. One thing is that I had a hard time mocking out the create_launcher method. I ended up having to declare the import differently here: https://github.com/mozilla/mozregression/pull/143/files#diff-fa686b102516e50aa7df8250dd9a10a6L21 which worked. I'm not sure why it wasn't picking up the mocking before that. Let me know if there are any issues.
Attachment #8532664 -
Flags: review?(wlachance)
Attachment #8532664 -
Flags: review?(j.parkouss)
Comment 12•10 years ago
|
||
Comment on attachment 8532664 [details] [review] Caching http requests with cachecontrol functionality Hey Sam, First, thanks for the update! It looks definitely better to me. :) There is just some little things that I noted on the pull request. About the mocking of the create_launcher, I think I can explain it: In python a module have a dict to store its variables. The result is that when you do: from mozregression.launchers import create_launcher inside regression.py you are creating a new variable named create_launcher in the module. So you have two variables (two variables referencing the same thing): mozregression.launchers.create_launcher AND mozregression.regression.create_launcher Now if you try to mock "mozregression.launchers.create_launcher", mozregression.regression.create_launcher is not impacted. The solution is to patch "mozregression.regression.create_launcher" instead. I hope this is useful; Another thing, but this not directly related to the patch. Why do we want to use the cache by default ? Shoud'nt we activate it with a command line flag ? Or at least provide a way to deactivate it ? Also I am thinking that it may be interesting to not use the cache for binaries download (after all, there is the --persist flag already). I think that we could provide strategies for the cache, for example: - caching everything (current patch) - caching everything but downloaded files - ... But that may definitely done in another patches, if that sounds interesting for you also. Thanks again Sam for updating the code - I know that it must have took some time (with recent refactoring among other things) but I'm really happy that we have a cleaner implementation for this. :)
Attachment #8532664 -
Flags: review?(j.parkouss) → review+
Assignee | ||
Comment 13•10 years ago
|
||
Cool. I rebased and squashed the commits, fixed the import issue, and added some tests. PR should be good to merge in now.
Comment 14•10 years ago
|
||
Hey Sam, I have just noted a few nits, after that I do agree for the merge - even if I would have add a command option line to activate the cache on demand, not by default. :) But that's my opinion on that.
Comment 15•10 years ago
|
||
Comment on attachment 8532664 [details] [review] Caching http requests with cachecontrol functionality I'm getting this error when I try run mozregression with the PR integrated: (mozregression)wlach@eideticker:~/src/mozregression$ mozregression -g 2014-12-05 -b 2014-12-12 --inbound-branch fx-team Traceback (most recent call last): File "/home/wlach/src/mozregression/bin/mozregression", line 9, in <module> load_entry_point('mozregression==0.30', 'console_scripts', 'mozregression')() File "/home/wlach/src/mozregression/mozregression/regression.py", line 473, in cli options.http_cache_dir, utils.one_gigabyte, NameError: global name 'utils' is not defined Also, it's not entirely clear to me what the behaviour is when --http_cache_dir (really should be --http-cache-dir) is not provided on the command line? I think we *should* actually make some kind of cache a default for mozregression. To answer Julien's concern, the cache has a bit of a different function than persist (which keeps extracted copies around for later running) -- this is simply a way to make sure that future downloads go faster.
Attachment #8532664 -
Flags: review?(wlachance) → review-
Assignee | ||
Comment 16•10 years ago
|
||
Comment on attachment 8532664 [details] [review] Caching http requests with cachecontrol functionality If we don't provide --http-cache-dir it uses the default CacheControl, which is "...a threadsafe in memory dictionary for storage." See: http://cachecontrol.readthedocs.org/en/latest/#quick-start So this is a cache that goes away when mozregression stops running. @William: I've added tests that run basically everything in our cli() methods except the last lines which do the actual work. Hopefully this will help us later on and we can extend these tests pretty easily.
Attachment #8532664 -
Flags: review?(wlachance)
Comment 17•10 years ago
|
||
(In reply to Sam Garrett from comment #16) > If we don't provide --http-cache-dir it uses the default CacheControl, which > is "...a threadsafe in memory dictionary for storage." See: > http://cachecontrol.readthedocs.org/en/latest/#quick-start So this is a > cache that goes away when mozregression stops running. Nice! I like this default. I did not know that the cache could be used in memory only, that's great. Plus we will probably be able to remove some code to cache month urls in NightlyUrlBuilder (https://github.com/mozilla/mozregression/blob/master/mozregression/build_data.py#L330).
Comment 18•10 years ago
|
||
Comment on attachment 8532664 [details] [review] Caching http requests with cachecontrol functionality Hmm, am I doing something wrong? I'm still getting an error with the test case I mentioned above, a different one this time: (mozregression)wlach@eideticker:~/src/mozregression$ mozregression -g 2014-12-05 -b 2014-12-12 --inbound-branch fx-team --http-cache-dir cache ... Traceback (most recent call last): File "/home/wlach/src/mozregression/bin/mozregression", line 9, in <module> load_entry_point('mozregression==0.30', 'console_scripts', 'mozregression')() File "/home/wlach/src/mozregression/mozregression/regression.py", line 530, in cli app() File "/home/wlach/src/mozregression/mozregression/regression.py", line 320, in bisect_nightlies self.bisect_nightlies() File "/home/wlach/src/mozregression/mozregression/regression.py", line 320, in bisect_nightlies self.bisect_nightlies() File "/home/wlach/src/mozregression/mozregression/regression.py", line 263, in bisect_nightlies self.bisect_inbound() File "/home/wlach/src/mozregression/mozregression/regression.py", line 202, in bisect_inbound if len(inbound_revisions) > 1 and verdict in ('g', 'b'): TypeError: object of type 'NoneType' has no len() The rest of the changes look good. I think I would prefer a persistent cache by default, but we can deal with that in a followup once we resolve this. :)
Attachment #8532664 -
Flags: review?(wlachance)
Attachment #8532664 -
Flags: review-
Attachment #8532664 -
Flags: review+
Assignee | ||
Comment 19•10 years ago
|
||
Hi William, I'm unable to reproduce that error. I removed the print statement. Can you reverify that this is an actual bug? I'm running on a Mac so maybe that is causing mine to not fail.
Flags: needinfo?(wlachance)
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(wlachance)
Attachment #8532664 -
Flags: review?(j.parkouss)
Comment 20•10 years ago
|
||
Comment on attachment 8532664 [details] [review] Caching http requests with cachecontrol functionality This works fine for me! I suppose that William must have tested with an old version of the branch. I would have preferred only changes related to the cache in your commit - to keep things simple - but that's fine. :) I will merge this in.
Attachment #8532664 -
Flags: review?(j.parkouss) → review+
Comment 22•10 years ago
|
||
Well, I was too fast testing/merging this. It seems that only the first inbound step is working, then I have the same error as William described in comment 18. Sorry for that, I will fix it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 23•10 years ago
|
||
Well, I think I fixed it: https://github.com/mozilla/mozregression/pull/152 Unfortunately I had to remove some tests. I think we will rewrite them with bug 1111314 anyway. Well, let's hope this time I close this for good.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•