Closed Bug 1003793 Opened 12 years ago Closed 12 years ago

[b2ghaystack] Use multiple processes to speed up retrieval of valid builds

Categories

(Testing :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: davehunt, Assigned: parkouss)

Details

(Whiteboard: [mentor=davehunt][lang=py][good first bug])

Attachments

(1 file, 2 obsolete files)

Once we have a list of timestamps within the desired range, we should use multiple processes to speed up determining if these are valid builds. At the moment we check these in serial, as shown here: https://github.com/davehunt/b2ghaystack/blob/dd1eb8a5407b54efea8adf885d1b388d6a9178c8/b2ghaystack/b2ghaystack.py#L66 which is pretty slow. See the documentation for the multiprocessing package: https://docs.python.org/2/library/multiprocessing.html
I suspect you could do this with threads as well and it would be fine. If you don't mind pulling in an extra package concurrent.futures is pretty awesome (in the stdlib in Python 3.4, but a packport to 2.x exists): http://pythonhosted.org//futures/ It allows using a process or thread pool and easily submitting jobs to it.
Hi, I'd like to work on this. I have two questions: * the b2ghaystack command line (with --dry-run) need 4 positional arguments. It seems to me that device_name and job_name could be anything in my case, but I can't find what I can use for bad_rev and good_rev - I tried mercurial changeset, but that's not working. * multprocessing or concurrent.futures ? mutiprocessing is also very easy to use with a pool of workers [1]. [1] https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers
(In reply to j.parkouss from comment #2) > Hi, I'd like to work on this. I have two questions: That's great, thanks for commenting! > * the b2ghaystack command line (with --dry-run) need 4 positional arguments. > It seems to me that device_name and job_name could be anything in my case, > but I can't find what I can use for bad_rev and good_rev - I tried mercurial > changeset, but that's not working. Well, device_name is used to generate the URL for the builds, so that's important (see below). The job_name can be whatever you want, especially when running with --dry-run. The revisions are Mercurial changesets, you should be able to find some that work using Datazilla. Pick two datapoints and copy/paste the Gecko revision. See https://datazilla.mozilla.org/b2g/?branch=master&device=hamachi&range=7 The issue you will most likely have is that the builds b2ghaystack tries to check are behind authentication. We might first need to overcome this somehow, otherwise you may not be able to test your changes. You could perhaps try locally changing the URL at https://github.com/davehunt/b2ghaystack/blob/7a0b7d5b7fa1fe9c783d4e0fb96f09f946eaf133/b2ghaystack/b2ghaystack.py#L59 to http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/%s-%s%s/ and using a device name that's available on our public FTP. Maybe 'hamachi' for example. > * multprocessing or concurrent.futures ? mutiprocessing is also very easy to > use with a pool of workers [1]. I have no preference here. If you'd like to try both and profile them, I'd pick whichever gives us the biggest speedup. :)
With your instructions, I was able to test the application - It needed some local tweaks as the public ftp does not seems to have the same naming conventions (timestamps in the folder names for example). I choosed to use concurrent.futures, as I finally think it is more convenient here (for downloading). The speedup results, for the given command: time b2ghaystack --dry-run hamachi titi d7c07694f339 51b7091471a9 --------> 154 revisions found Getting builds from: http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/mozilla-central-hamachi/ --------> 147 builds found --------> 117 builds within range --------> 49 builds matching revisions ... without patch: real 1m56.281s user 0m2.150s sys 0m0.147s with patch: real 0m10.241s user 0m1.817s sys 0m0.293s with patch, and the "-w5" command argument (to limit thread workers): real 0m27.950s user 0m2.000s sys 0m0.250s
Attachment #8428322 - Flags: review?(dave.hunt)
I was too fast ! I corrected my first patch and made it simpler by using a default max_workers value of 100 and simply use it - I think concurrent.futures is smart enough to not start a thread if there is less jobs to be done than the value of max_workers. Sorry for the inconvenience ;)
Attachment #8428322 - Attachment is obsolete: true
Attachment #8428322 - Flags: review?(dave.hunt)
Attachment #8428333 - Flags: review?(dave.hunt)
Comment on attachment 8428333 [details] [diff] [review] Use multiple processes to speed up retrieval of valid builds Review of attachment 8428333 [details] [diff] [review]: ----------------------------------------------------------------- This looks great, and works really well! I've mentioned a few nits to be cleaned up, once these are done please request review again and I'll land the patch. Could you please provide your name and e-mail address as you'd like them to appear in the git history? You can get these using |git config user.name| and |git config user.email|. Alternatively, you can apply the patch as a commit locally and then create a new patch from this using |git show --patch > bug1003793.patch|, which should then contain author details. ::: b2ghaystack/b2ghaystack.py @@ +12,4 @@ > import jenkins > import requests > > +from concurrent import futures Nit: Can you move this line up to be in accordance with PEP8? In fact, please could you run a PEP8 checker over your changes? See http://legacy.python.org/dev/peps/pep-0008/ and https://pypi.python.org/pypi/pep8 @@ +19,5 @@ > + for link in url_links(build_url, '^sources\.xml$', auth): > + sources_url = '%s/%s' % (build_url, link) > + r = requests.get(sources_url, auth=auth) > + search = re.search('<project .* path="gecko" remote="hgmozillaorg"' > + ' revision="(\w{12})"/>', r.text) Nit: E127 continuation line over-indented for visual indent (PEP8) @@ +32,2 @@ > > def url_links(url, regex=None, auth=None): Nit: E302 expected 2 blank lines, found 1 (PEP8) @@ +104,5 @@ > + sys.exit("Retrieving valid builds from %r generated an" > + " exception: %s" % (futures_results[future], > + future.exception())) > + all_builds.extend(future.result()) > + Nit: W293 blank line contains whitespace (PEP8) @@ +181,5 @@ > + dest='max_workers', > + default=100, > + type=int, > + help="maximum number of threads in parallel to speed up download" > + " process (default: %(default)s)") Do you think this is necessary? I think 100 is a reasonable default, but not sure if there would be a need to reduce or increase this from the command line.
Attachment #8428333 - Flags: review?(dave.hunt) → review+
I removed the command line option - I was not sure about this, and you convinced me. Probably a confusing and maybe useless command line option. The code is now pep8 compliant. Also, thank you for the hints with git! I added my name, email and a commit message with the patch.
Attachment #8428333 - Attachment is obsolete: true
Attachment #8429372 - Flags: review?(dave.hunt)
Landed in: https://github.com/mozilla/b2ghaystack/commit/9d5c4c2463af41fa050ad90c9184ba853e1cdadd If you have a github account then it may be using a different e-mail address, as it's not recognised you. You can also add this e-mail address to an existing github account to connect it to the commit. Let me know if you're interested in doing more work with b2ghaystack. There are a couple of things that might be useful, such as supporting the public builds and desktop B2G builds. Some of the functionality could be upstreamed to mozregression, and there's also bug 992401.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Many thanks for your contribution! :)
Comment on attachment 8429372 [details] [diff] [review] Use multiple processes to speed up retrieval of valid builds Review of attachment 8429372 [details] [diff] [review]: ----------------------------------------------------------------- Oops, forgot to set the review flag. :)
Attachment #8429372 - Flags: review?(dave.hunt) → review+
Assignee: nobody → j.parkouss
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: