Closed Bug 943871 Opened 11 years ago Closed 11 years ago

Autophone - use http instead of ftp for build urls

Categories

(Testing Graveyard :: Autophone, defect)

x86_64
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bc, Assigned: bc)

References

Details

Attachments

(1 file)

While investigating the archived inbound builds for Autophone regression triage, I ran into a problem with our use of ftp urls for builds.

The archived inbound builds are only available over http: e.g.:

http://inbound-archive.pub.build.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android/

Apart from ftp's perhaps simpler logic, I think http is the better choice, as it is 1) more readily available and 2) less likely to be constrained by network flow restrictions.

My first thoughts on the approach would be to:

a) use something like BeautifulSoup to parse the http responses and make minimal changes to the overall logic.

b) use something like mozdownload, though it doesn't appear to support fennec or the inbound archive.

c) use mozregression, which has added support for the inbound archive but doesn't appear to support fennec/arm/x86.

I'm leaning towards a) at least initially. Suggestions / Comments solicited.
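A minimal sketch of what option a) could look like, using the stdlib HTMLParser as a dependency-free stand-in for BeautifulSoup: extract the links from an HTTP directory index and keep only the timestamped nightly directories. The HTML listing and the directory name below are hand-made examples, not real archive data, and the regex is illustrative rather than Autophone's actual matching logic.

```python
import re
from html.parser import HTMLParser

class IndexParser(HTMLParser):
    """Collect the href of every <a> tag in a directory listing."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.hrefs.extend(v for k, v in attrs if k == 'href')

# Stand-in for the body of an http response from the archive server.
listing = """
<a href="../">Parent Directory</a>
<a href="2013-12-01-03-02-03-mozilla-central-android/">dir</a>
<a href="README.txt">file</a>
"""

parser = IndexParser()
parser.feed(listing)
# Keep only directories whose names start with a build timestamp.
build_dirs = [h for h in parser.hrefs
              if re.match(r'\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}-', h)]
print(build_dirs)  # → ['2013-12-01-03-02-03-mozilla-central-android/']
```

The same loop works against a real listing by feeding in the body of an http GET on the directory url instead of the canned string.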
Blocks: 943873
Tough call.  If mozdownload does most of what you need, I'd be tempted to expand that.  The most annoying part will be parsing the responses, so I'd see if mozdownload mostly handles that now.
mozdownload is very general but appears to be very tightly coupled to downloading desktop-type builds from ftp.mozilla.org for the platform where it is running. The parsing is quite different in that it uses HTMLParser to collect links and then greps them for different file patterns. I think expanding mozdownload for our use case would be a non-trivial undertaking.
Two things:

(1) It should be fairly easy to extend mozregression to support fennec/arm. I can give guidance on this if necessary.
(2) It's definitely annoying that we have duplicated build-fetching logic between mozdownload and mozregression, and the lack of Android support in mozdownload is one of the main things blocking consolidation. If you have the cycles, I don't think working on that would be a waste.
I'm primarily focused at the moment on getting autophone in shape to do regression testing so I can identify specific revisions which cause issues. The Autophone use cases don't seem to fit well with the existing uses in mozregression or mozdownload. I'm just afraid of getting tied up working on mozregression or mozdownload when my real goal is making sure actionable bugs are filed for autophone.
Mm, yeah. Normally I would say that improving our general tools is the way to go, but it's true that we're really focusing on getting actionable items out of autophone right now, so let's take the easiest route to get that done. Just keep an eye on what you're doing from a high level to make sure you don't end up duplicating too much; even better would be writing it in such a way that it could be easily applied to mozdownload or mozregression. I still want to generalize the build cache and potentially put it into mozdownload some day.
I spent a little time yesterday looking at all the repos that are available. I was thinking that supporting access to builds and related files by date, buildid, or revision, for an arbitrary platform and cpu, regardless of the repo name and the site's base url, would make a general-purpose component that could be easily reused outside of autophone.

mozregression's use of the hg json-pushes api was very interesting as well, and supporting fetching builds by revision would be very helpful in autophone.
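To sketch the json-pushes idea: hg.mozilla.org can return, as JSON, a map of push ids to pushes (each with its changesets, date, and user), which lets you resolve a revision to the push that landed it. The payload below is a hand-made example of that format; the push id, changeset hash, and email are invented, and the lookup helper is hypothetical rather than mozregression's actual code.

```python
import json
from datetime import datetime, timezone

# Hand-made example of a json-pushes response, e.g. from a url like
# https://hg.mozilla.org/integration/mozilla-inbound/json-pushes?changeset=<rev>
sample = json.loads("""
{
  "30123": {
    "changesets": ["8fb4a21b1a6ae51e1c4b34e0ea4b86e0502b0dfc"],
    "date": 1386000000,
    "user": "someone@example.com"
  }
}
""")

def push_for_revision(pushes, revision):
    """Return (push_id, push) for the push whose changesets include revision."""
    for push_id, push in pushes.items():
        if any(cset.startswith(revision) for cset in push['changesets']):
            return push_id, push
    return None

# A short revision prefix is enough to match the full changeset hash.
push_id, push = push_for_revision(sample, '8fb4a21b')
print(push_id, datetime.fromtimestamp(push['date'], tz=timezone.utc).isoformat())
```

Given the push date, picking the matching build directory reduces to the same date-range search the nightly code already does.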
I spent a bit more time on this and tried some first stabs at writing some code for the general case of supporting arbitrary base urls and the general set of patterns in use for build directories and file names. It felt as if someone was going to jump out from behind a rock and shout "You are in a maze of twisty little passages, all alike". I'm going to just focus on the immediate need for Autophone for this bug but will continue trying to come up with a more general approach that we can use everywhere we need to get builds.
I used httplib2, beautifulsoup4, renamed some things to be more descriptive and removed the unused build_date_from_url|build_date. I also added a bunch of debug log messages and changed the default log level to INFO so we don't see them normally.

I tested with the following:

python trigger_runs.py 20131201030203
python trigger_runs.py 2013-11-10 2013-11-12
python trigger_runs.py 2013-12-01
python trigger_runs.py latest
python trigger_runs.py --build-location=tinderbox 20131202034749
python trigger_runs.py --build-location=tinderbox 2013-11-10 2013-11-12
python trigger_runs.py --build-location=tinderbox 2013-12-01
python trigger_runs.py --build-location=tinderbox latest
Attachment #8341156 - Flags: review?(mcote)
Comment on attachment 8341156 [details] [diff] [review]
bug-943871-http-builds.patch

Review of attachment 8341156 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, cleaner than the ftp implementation!  Just one nit and one suggestion.

::: builds.py
@@ +32,4 @@
>          self.nightly_dirnames = [(re.compile('(.*)-%s-android(-armv6|-x86)?$'
>                                               % repo)) for repo in repos]
>  
> +    def search_directories(self, start_time, end_time):

This function name kind of sounds like it's *doing* the search... maybe dirs_to_search() or something like that.

@@ +198,5 @@
> +                        directory_href, httplib.responses[builddir_response.status]))
> +                    continue
> +
> +                builddir_soup = BeautifulSoup(builddir_content)
> +                for build_link in builddir_soup.findAll('a'):

I think you could refactor the link-fetching code to avoid duplication between the list of builds and the build files, maybe using a generator.
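One way to read that suggestion: a single generator that takes a url and yields the hrefs of its links, so the build-directory pass and the per-directory file pass share one fetch-and-parse routine. This is a sketch under assumptions — the fetch callable is a stub standing in for the httplib2 request in the patch, and the listings dict fakes a two-level directory tree with invented names.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.hrefs.extend(v for k, v in attrs if k == 'href')

def url_links(url, fetch):
    """Yield the href of each link in the directory listing at url."""
    parser = LinkCollector()
    parser.feed(fetch(url))
    for href in parser.hrefs:
        yield href

# Stub fetch: canned listings for a build directory and its contents.
listings = {
    'builds/': '<a href="2013-12-01-03-02-03/">dir</a>',
    'builds/2013-12-01-03-02-03/': '<a href="fennec-28.0a1.apk">apk</a>',
}

# Both traversal levels now go through the same url_links generator.
files = [f for d in url_links('builds/', listings.get)
         for f in url_links('builds/' + d, listings.get)]
print(files)  # → ['fennec-28.0a1.apk']
```

The real version would keep the status-check-and-continue error handling from the patch inside url_links, so neither caller has to repeat it.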
Attachment #8341156 - Flags: review?(mcote) → review+
(In reply to Mark Côté ( :mcote ) from comment #9)
> Comment on attachment 8341156 [details] [diff] [review]
> bug-943871-http-builds.patch
> 
> Review of attachment 8341156 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good, cleaner than the ftp implementation!  Just one nit and one
> suggestion.
> 
> ::: builds.py
> @@ +32,4 @@
> >          self.nightly_dirnames = [(re.compile('(.*)-%s-android(-armv6|-x86)?$'
> >                                               % repo)) for repo in repos]
> >  
> > +    def search_directories(self, start_time, end_time):
> 
> This function name kind of sounds like it's *doing* the search... maybe
> dirs_to_search() or something like that.
> 

Yeah, I struggled over that while trying to keep it readable and short. How about get_search_directories?

> @@ +198,5 @@
> > +                        directory_href, httplib.responses[builddir_response.status]))
> > +                    continue
> > +
> > +                builddir_soup = BeautifulSoup(builddir_content)
> > +                for build_link in builddir_soup.findAll('a'):
> 
> I think you could refactor the link-fetching code to avoid duplication
> between the list of builds and the build files, maybe using a generator.

I don't understand.

directory_soup.findAll('a') is the list of potential build directories. We skip the directories that do not have a build_time, then make a separate request to each remaining build directory to get its file list via builddir_soup.findAll('a'). I don't see how to redo this with generators, or what the benefit would be.
https://github.com/mozilla/autophone/commit/ac8391aacc45cfb1d14c5720a2e0e81854b71567

I went ahead and changed it to get_search_directories but left the other as is for now. I pushed so I can keep going with the inboundarchive and revision changes, I'll leave this open until we decide on the link-fetching code refactoring.
Apparently I didn't submit my comment last time... I was saying that lines 175-181 and lines 195-202 in the patched version are essentially the same, aside from the URL. They could be refactored to take a URL and then yield the results of findAll().
Mark and I talked, and I now understand what he meant. I'm going to use mozregression's approach for fetching the contents of the directories, with a distinct function like urlLinks that achieves the desired goal. Marking this fixed.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
oops, forgot to mention the changes will be in bug 945390
Product: Testing → Testing Graveyard