Closed Bug 528064 Opened 11 years ago Closed 9 years ago

Script to download builds from the Mozilla server

Categories

(Mozilla QA Graveyard :: Mozmill Automation, defect, P1)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: whimboo)


Details

Attachments

(3 files, 7 obsolete files)

Before we start the software update tests we have to download a couple of builds from the FTP server. It's kind of annoying to navigate through the FTP folders manually. A Python script could help us here, which downloads the given build and saves it in the given folder. That way we can automate the process a bit more.

The following options would be necessary:
* version number (e.g. 3.5, 3.5.3, or 3.6b1)
* platform (linux-i686, mac, and win32)
* locale (e.g. en-US, or de)
* target folder (relative or absolute)

Given that information the build can be downloaded from the FTP server. Here an example URL:

ftp://ftp.mozilla.org/pub/mozilla.org/firefox/releases/3.6b1/win32/de/

When saving this file locally I propose to use the following naming convention:

firefox-%version%-%locale%.%ext% (e.g. firefox-3.5.5-en-US.dmg)
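As a rough illustration (the helper names and the platform-to-extension mapping are assumptions, not part of this proposal), the options could map onto the URL and local file name like this:

```python
# Hypothetical sketch: map the proposed options onto the release URL
# and the local file naming convention above. The platform-to-extension
# mapping is an assumption for illustration.
BASE_URL = 'ftp://ftp.mozilla.org/pub/mozilla.org/firefox/releases'

EXTENSIONS = {'linux-i686': 'tar.bz2', 'mac': 'dmg', 'win32': 'exe'}

def build_url(version, platform, locale):
    """Directory URL for a given release build."""
    return '%s/%s/%s/%s/' % (BASE_URL, version, platform, locale)

def local_name(version, platform, locale):
    """Local file name: firefox-%version%-%locale%.%ext%"""
    return 'firefox-%s-%s.%s' % (version, locale, EXTENSIONS[platform])
```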

Would be great if this script could also be used to download nightly builds in the future. So we should create a new Python module under mozmill-tests/scripts/lib/.

Any objections?
As pointed out by Tracy, Bob has created a similar script in bash:
http://test.bclary.com/bin/download.sh

It takes a URL as its parameter, while we want to have three parameters which will be used to create the URL automatically.

Shriram, would you still be interested to create this Python script?
Sure. I took a look at it and it sounds pretty straightforward.
I will have to review it one more time before writing up the script.
I ended up having to do this for my regression range finder, in python, so this code might help some:

http://code.google.com/p/mozregression/source/browse/trunk/runnightly.py
Assigning to Shriram, who will work on it.
Assignee: nobody → kshriram18
Status: NEW → ASSIGNED
Attached patch WorkInProgress (obsolete) — Splinter Review
I apologize for the big delay. 
I had to deal with some school stuff in the past two weeks.
I shall continue to work on it and finalize the patch in the next two to three days.
If someone has more time to work on it immediately, please let me know.
Attached patch WIP (obsolete) — Splinter Review
Attachment #419151 - Attachment is obsolete: true
As of now the most recent patch crashes at line 90.
This is probably because of line 87.
I shall continue to work on it in the next few days.
Shriram, I have taken a look at your WIPs and have some comments:

* Try to use the OptionParser (http://docs.python.org/library/optparse.html) to parse the command line options. It makes it much easier.

* Can you please make use of a downloader class we can stick into the scripts/lib folder? The executable download script itself should only have to use that class. That way we can integrate it into other scripts too.

* The platform version should also be a command line option as given in comment 0. Only the target folder is an argument.

* Just concentrate your work on releases. Downloading nightly builds is not our primary goal for now.

* The local file should be named as given in comment 0 too.

If something is unclear to you or when you need help please ping me on IRC as usual. Thanks!
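A minimal sketch of the OptionParser suggestion, assuming the option names from comment 0 (everything here is illustrative, not the actual patch):

```python
from optparse import OptionParser

# Illustrative only: option names follow the proposal in comment 0.
parser = OptionParser(usage='usage: %prog [options] folder')
parser.add_option('-v', '--version', dest='version',
                  help='Version of the build, e.g. 3.5.3 or 3.6b1')
parser.add_option('-p', '--platform', dest='platform',
                  choices=['linux-i686', 'mac', 'win32'],
                  help='Platform of the build')
parser.add_option('-l', '--locale', dest='locale', default='en-US',
                  help='Locale of the build, default: en-US')

# Example invocation: download Firefox 3.6b1 for win32 into "downloads"
(options, args) = parser.parse_args(['-v', '3.6b1', '-p', 'win32', 'downloads'])
```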
Shriram, can you please give us an update? If you will not have time please tell us so I can take over that work. We really need it in the near future.
I shall give an update in the next 12 hours.
I started work on the second patch, and had to spend some more time on python docs.
I shall make few more changes, test a little bit and upload the second patch.
Shriram, I can offer some more days for you. But if we do not have anything by Feb 8th I have to take over the work.
Henrik,
I will have a chance to upload a patch in the next few hours.
Sorry but time flies by. I have to take over the work for that script. Will work on it the next days.
Assignee: kshriram18 → hskupin
Sorry about this Henrik. I had network connectivity problems due to a trojan (spoofed attacker) for the last week and a half. I resolved it with the Internet Service Provider's assistance. I plan to resume work tomorrow and upload it before the weekend. Let me know if you could grant me that time.
It would be the last extension of time. So yes, if you have something by end of this week, please go on.
Blocks: 562352
Blocks: 562639
No longer blocks: 562352
Attached file temporary script (obsolete) —
That's a temporary script I have hacked together quickly to help us in downloading the builds.
Assignee: hskupin → nobody
Attachment #443646 - Attachment mime type: text/x-python-script → text/plain
This is a mass change. Every comment has "assigned-to-new" in it.

I didn't look through the bugs, so I'm sorry if I change a bug which shouldn't be changed. But I guess these bugs are just bugs that were once assigned and people forgot to change the Status back when unassigning.
Status: ASSIGNED → NEW
The script should be part of our automation repository:
http://hg.mozilla.org/qa/mozmill-automation/
Whiteboard: [mozmill-automation]
Component: Mozmill → Mozmill Tests
QA Contact: mozmill → mozmilltests
Whiteboard: [mozmill-automation] → [shared module]
Summary: [shared module] Script to download builds from the FTP site → Script to download builds from the FTP site
Blocks: 596828
Priority: -- → P2
I'd be happy to modify http://hg.mozilla.org/automation/getlatest-tinderbox/ to fulfill these requirements, if desired.  It doesn't download the files, just gets the URLs.  The base url would have to become a parameter (which it should, anyways) but most of the innards, despite the name, have nothing to do with tinderbox.  You'd probably also have to make mozmill-automation a python package to sensibly depend on it.

Or if you want to roll your own or copy+paste code, then that's up to you.
I would be happy to take some basic stuff from your code to build up the url for all the different kinds of builds we have to download. I will investigate next month.
(In reply to comment #0)

Since the initial comment some things have changed and we need a revised proposal for what we want to have. Let's extend the proposal from comment 0:

> * version number (e.g. 3.5, 3.5.3, or 3.6b1)
> * platform (linux-i686, mac, and win32)
> * locale (e.g. en-US, or de)
> * target folder (relative or absolute)

* version number (i.e. 3.5, 3.5.3, or 4.0bx)
* type of build (release, release candidate build %x%, nightly)
* platform (linux, mac, and win)
* processor (32 bit or 64 bit) [exception is OS X with a combined build]
* locale (i.e. en-US, or de)
* target folder (relative or absolute)

By default we should set the type, platform, processor, and locale to values which match the system specifications. That way we can keep the number of options which have to be specified as small as possible.
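A sketch of how such system-matching defaults might be detected with the standard library (the mapping and the key names are assumptions):

```python
import locale
import platform
import struct

# Sketch (all mappings are assumptions): derive option defaults from
# the running system so most options can be omitted on the command line.
def system_defaults():
    platforms = {'Linux': 'linux', 'Darwin': 'mac', 'Windows': 'win'}
    return {
        'type': 'release',
        'platform': platforms.get(platform.system(), 'linux'),
        # the pointer size tells us if this Python runs as 32 or 64 bit
        'processor': '%dbit' % (struct.calcsize('P') * 8),
        'locale': locale.getlocale()[0] or 'en-US',
    }

defaults = system_defaults()
```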

Anthony, Al, and Geo, am I missing something which is important for our automated release testing? If not I will use the given proposal to implement the download script. I will check the proposed solutions and available download scripts before starting with our version. I don't think an existing version fits all of our requirements.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Priority: P2 → P1
Whiteboard: [shared module] → [mozmill-automation]
Can we add "application" to the list of requirements as well please?
(In reply to comment #24)
> Can we add "application" to the list of requirements as well please?

Sure. I hope you have the same naming scheme for folders and files? I haven't checked that yet in detail for Thunderbird.
(In reply to comment #25)
> (In reply to comment #24)
> > Can we add "application" to the list of requirements as well please?
> 
> Sure. I hope you have the same naming scheme for folders and files? I haven't
> checked that yet in detail for Thunderbird.

All our build automation is based on the same as Firefox. So they should be the same!
This fulfills my requirements.
Move of Mozmill related project bugs to newly created components. You can
filter out those emails by using "Mozmill-Tests-to-MozillaQA" as criteria.
Component: Mozmill Tests → Mozmill Automation
Product: Testing → Mozilla QA
QA Contact: mozmill-tests → mozmill-automation
Whiteboard: [mozmill-automation]
Also we would need an --url option which will override all the other options, and which can be used when you already have the url to the build to download.

I will try to get started on this script mid/end of this week.
Something we will have to take care of is the renaming of the folders as announced in dev.tree-management:

http://groups.google.com/group/mozilla.dev.tree-management/browse_thread/thread/4343accb10cb9986#
Summary: Script to download builds from the FTP site → Script to download builds from the Mozilla server
Attached patch WIP v1 (obsolete) — Splinter Review
Except for the --url option, which we probably don't need, this WIP makes it possible to download release and candidate builds for Firefox and Thunderbird. The next step would be to also support nightly builds.

Geo, would be great if you could test the patch. Right now it relies on lxml and I'm not really happy about using it because it needs to be compiled. What do you think?
Attachment #419400 - Attachment is obsolete: true
Attachment #443646 - Attachment is obsolete: true
Attachment #503519 - Flags: feedback?(gmealer)
Perhaps a stupid question, but why aren't we just connecting via FTP? FTP is structured so you don't need to parse, maintains cwd, etc.

The python FTP lib is what I used in (the quick/dirty) http://hg.mozilla.org/users/clegnitto_mozilla.com/pulseshims/file/default/ftp_shim.py and it was trivial to download the nightly builds (and attach their build id in the same object by reading the text file next to them)
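For reference, the direct-FTP approach could look roughly like this with the stdlib ftplib; the server path layout is assumed from the release URLs discussed in this bug:

```python
from ftplib import FTP

def release_path(version, platform, locale):
    """Server-side directory of a release build (assumed layout)."""
    return 'pub/mozilla.org/firefox/releases/%s/%s/%s' % (version, platform,
                                                          locale)

def download_build(version, platform, locale, filename, target,
                   host='ftp.mozilla.org'):
    """Fetch a single build via anonymous FTP (untested sketch)."""
    ftp = FTP(host)
    ftp.login()  # anonymous login
    ftp.cwd(release_path(version, platform, locale))
    with open(target, 'wb') as f:
        # retrbinary streams the remote file in chunks to the callback
        ftp.retrbinary('RETR %s' % filename, f.write)
    ftp.quit()
```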
Christian, this script will be available and probably be used in the crowd extension. So I don't want to download builds from our FTP server. We should hit mirrors as much as possible. Not sure about the FTP network, but it's not mirrored, right? But either way I should not use stage.mozilla.org in the current version of my script.
The requirement for a compiled library will hose you for the crowd extension anyway, I think. Would mechanize or another "web automation" library like that work instead of lxml, maybe without a compilation step?

Also, I really wouldn't worry about network impact until we have such adoption on the crowd extension that it actually starts causing some sort of issue. The ftp site is open to the public, after all.

For in-house use we probably don't want to hit mirrors anyway. If we hit a bad one it'll make the test launch super, super slow. We already have that potential in the update itself, let's not double the chances in the environment setup.

All said, I'll give it a shot, but I'd consider using straight ftp for our current requirement.
Ah, right. Hitting ftp directly is not mirrored, great point.
Comment on attachment 503519 [details] [diff] [review]
WIP v1

Appears to work well, though I didn't try an exhaustive cross of all options (and really concentrated on firefox).

Minor nitpick, it doesn't like having no arguments supplied--should throw help text. Same for leaving off any mandatory options (-p, for example).
Attachment #503519 - Flags: feedback?(gmealer) → feedback+
Oh, also your help text is caught in between revisions. At the top you say dest is a non-flag argument, in the text and code itself it's -d. That should be fixed.
(In reply to comment #34)
> The requirement for a compiled library will hose you for the crowd extension
> anyway, I think. Would mechanize or another "web automation" library like that
> work instead of lxml, maybe without a compilation step?

I know. That's why I'm searching for alternatives. If you know something please feel free to point me to that tool. In all simplicity we could just use a regex to get all that information from the HTML document.

> For in-house use we probably don't want to hit mirrors anyway. If we hit a bad
> one it'll make the test launch super, super slow. We already have that
> potential in the update itself, let's not double the chances in the environment
> setup.

Not an argument for me. It's even a chance for us to hit slow mirrors before we start update testing at all. It would even be an improvement. Besides that, the FTP server is hardly slow compared to most of our mirrors. I myself don't see speeds higher than 370kB/s.
(In reply to comment #37)
> Oh, also your help text is caught in between revisions. At the top you say dest
> is a non-flag argument, in the text and code itself it's -d. That should be
> fixed.

Sorry, but I don't get it. You can specify -d XXX or --destination XXX. Where is your issue?
(In reply to comment #39)
> (In reply to comment #37)
> > Oh, also your help text is caught in between revisions. At the top you say dest
> > is a non-flag argument, in the text and code itself it's -d. That should be
> > fixed.
> 
> Sorry, but I don't get it. You can specify -d XXX or --destination XXX. Where
> is your issue?

 usage = 'usage: %prog [options] destination'

You don't really take destination as the final arg, you take it as a flagged option. Should just be %prog [options].

On that point, and I realize there are counter-examples, CLI design usually dictates that flags should always be optional, and anything mandatory should always be an ordered argument after the flags.

So I would have gone for %prog [options] build platform

Just a suggestion though. The help text one is a bigger deal.
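The suggested CLI shape, with mandatory values as ordered positional arguments and parser.error() for bad input, might look like this (a sketch; the option and argument names are assumptions):

```python
from optparse import OptionParser

# Sketch of the suggested CLI shape: mandatory values become ordered
# positional arguments; option and argument names are assumptions.
def parse_arguments(argv):
    parser = OptionParser(usage='usage: %prog [options] build platform')
    parser.add_option('-l', '--locale', dest='locale', default='en-US')
    (options, args) = parser.parse_args(argv)
    if len(args) != 2:
        # parser.error() prints the usage text and exits with status 2
        parser.error('build and platform are required arguments')
    return options, args

(options, (build, platform)) = parse_arguments(['-l', 'de', '3.6b1', 'win32'])
```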
Attached patch Patch v1 (obsolete) — Splinter Review
This is an updated version of the last WIP patch and should contain all the necessary items for the first version. I have changed the following:

* Removed the dependency on lxml. Instead the HTMLParser is used now, which is a default package.
* Refactored the class structure to make handling different kinds of builds more flexible.
* The destination folder is now an argument and not an option.

Handling of nightly builds will be a follow-up work we can't do at the moment.
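For illustration, collecting links from an HTML directory index with the stdlib HTMLParser could look like this (the class name and markup are assumptions, not the actual patch code):

```python
from html.parser import HTMLParser  # module named 'HTMLParser' on Python 2

# Illustration of the lxml replacement: collect the href targets from
# an HTML directory index with the stdlib parser (names are assumptions).
class DirectoryParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.entries = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    # strip the trailing slash of sub-directory links
                    self.entries.append(value.rstrip('/'))

parser = DirectoryParser()
parser.feed('<html><body><a href="3.6b1/">3.6b1/</a>'
            '<a href="latest/">latest/</a></body></html>')
```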
Attachment #503519 - Attachment is obsolete: true
Attachment #505102 - Flags: review?(jhammel)
Attached patch Patch v1.1 (obsolete) — Splinter Review
The active_url in the parser now also gets trailing slashes stripped.
Attachment #505102 - Attachment is obsolete: true
Attachment #506281 - Flags: review?(jhammel)
Attachment #505102 - Flags: review?(jhammel)
Attached patch Patch v1.2 (obsolete) — Splinter Review
Moved the Scraper classes into their own module, now that we also want to access the classes from other scripts. That makes it much easier to use.

And as a side-note: Any packaging solution will be decided later on.
Attachment #506281 - Attachment is obsolete: true
Attachment #506355 - Flags: review?(jhammel)
Attachment #506281 - Flags: review?(jhammel)
We have the same need for many of our automation utilities.  But we're taking a much different approach.  For our stuff, we're using pulse and moving away from directory scraping since it is rather brittle.  If you're interested, I can get you the code we're creating.  The one thing I'm not certain of is whether or not Thunderbird's build messages are in pulse, so that is one requirement you have that I don't know if our approach will satisfy, but IMHO, if Thunderbird's buildbot isn't talking to pulse, that's something we ought to fix.
Comment on attachment 506355 [details] [diff] [review]
Patch v1.2

+base_path = os.path.dirname(os.path.abspath(__file__))
+sys.path.append(os.path.join(base_path, 'libs'))

You should never append to sys.path in this manner. Please use a standard python way of doing this.

+    usage = 'usage: %prog [options] folder'

Maybe directory vs. folder? The latter is non-canonical.

+    parser = OptionParser(usage=usage)

Should have a description= , preferably from the docstring

+    parser.add_option('--application', '-a',
+                      dest='application',
+                      choices=scraper.APPLICATIONS,
+                      default=scraper.APPLICATIONS[0],
+                      metavar='APPLICATION',
+                      help='The name of the application to download, default: "firefox"')

Should either be default='firefox', or help='The name..., default: "%s"' % APPLICATIONS[0]

+    # Check required options
+    if not options.version:
+        raise Exception("The version of the application to download has not been specified.")
+    if not options.platform:
+        raise Exception("The platform of the application to download has not been specified.")

     probably better off using parser.error()
     Also, in general, required arguments should be positional arguments, instead of the one positional argument being optional. ("Required options" is not internally consistent wording)


+    # Check for destination folder
+    destination = os.path.abspath(args[0]) if args else os.getcwd()

Should probably warn/error if there is more than one argument since you don't use them

+    # Instantiate scraper and download the build
+    if options.type in 'release':

should be '==' instead of 'in'

+    elif options.type in 'candidate':

should be '==' instead of 'in'; also, its probably better to have a more control-flowy way of getting the appropriate class:


scraper_args = {'application': options.application,
                'platform': options.platform, 
                'version': options.version}
# could also do scraper_args = dict([(i, getattr(options, i)) for i in 'application', 'platform', 'version'])
scrapers = {'release': ReleaseScraper,
            'candidate': ReleaseCandidateScraper}
scraper_options = { 'release': {'build': options.build}}
kwargs = scraper_args.copy()
kwargs.update(scraper_options.get(options.type, {}))
build = scrapers[options.type](**kwargs)


+    build.download(destination)

Probably better off having scraper.download return the file-like object; then you can do with it what you want.  This way, you can use scraper as an API.


+#!/usr/bin/env python

Is the shebang usable?  Is this script executable?  If not, remove

+# ***** END LICENSE BLOCK *****

Should probably have a docstring that explains what this file does.  Should probably pass said docstring as the description argument to the parser

+# Constants
+BASE_URL = 'http://stage.mozilla.org/pub/mozilla.org';

No need for the semicolon

+APPLICATIONS = ['firefox', 'thunderbird']
+BUILD_TYPES = ['release', 'candidate', 'nightly']

Unless I misunderstand the intentions of the code, BUILD_TYPES is a bijection with respect to 
string <-> Scraper class; in this case, why not make a dict?  Following scraper declarations:

BUILD_TYPES = {'release': ReleaseScraper,
               'candidate': ReleaseCandidateScraper,
               'nightly': NightlyScraper }

+class AbstractException(Exception):
+    """Exception for an abstract class"""

Probably should be "Not implemented Exception for an abstract base class". I didn't understand what this was on first read. (Personally, I usually use NotImplementedError for such things).

+        req = urllib.urlopen(url)

Should probably use urllib2 instead of urllib so you can get reasonable timeouts

+        name = data.strip('/').replace(' ', '%20')

Should probably use urllib.quote()

+class GenericScraper(object):
+    """Generic class to download an application from the Mozilla server"""

Maybe should be renamed MozillaScraper or similar.  It's not that generic.

+    def download(self, destination=os.getcwd()):
+        """Download the specified file"""
+
+        destination = os.path.abspath(destination)

Is there any reason to use abspath here?  Also you should probably note in the docstring that destination is supposed to be a directory.  Again, I'd advise just returning the file-like object of what is downloaded vs. having the scraper determine where to put the contents.


+        if not os.path.exists(destination):
+            raise NotFoundException('Destination not found', destination)

         Why does the destination have to exist?  Can't you do os.makedirs if its supposed to be a directory?
         Also, sanity checking should be os.path.isdir()?

+            print 'Downloading "%s" to "%s"' % (url, target)

Ideally shouldn't print from this class.

+            urllib.urlretrieve(url, target)

To match the comment ("Download the first matched directory entry"), this should break; otherwise, multiple entries will be downloaded to the same destination.  And again, for timeout reasons, you should probably use urllib2


+class NightlyScraper(GenericScraper):
+    """Class to download a nightly build from the Mozilla server"""
+
+    def __init__(self, branch='mozilla-central', debug=False, *args, **kwargs):
+        GenericScraper.__init__(self, *args, **kwargs)
+        self.branch = branch
+        self.debug = debug

raise NotImplementedError since its not done.
Attachment #506355 - Flags: review?(jhammel) → review-
(In reply to comment #44)
> We have the same need for our many of our automation utilities.  But we're
> taking a much different approach.  For our stuff, we're using pulse and moving
> away from directory scraping since it is rather brittle.  If you're interested,

Our use cases are a bit different, as already said a lot of comments earlier. We also want to download builds (releases, release candidates, nightlies) from any release in the past. I believe Pulse can't manage that. Further, we have to hit the mirrors to be able to download the builds as fast as possible. Your builds are probably 100% on internal FTP servers; ours are not.
(In reply to comment #45)
>      probably better off using parser.error()
>      Also, in general, required arguments should be positional arguments,
> instead of the one positional argument being optional. ("Required options" is
> not internally consistent wording)

Both options will be optional in the future once the INI file support has landed. It's something I have to work on next.

> +    # Check for destination folder
> +    destination = os.path.abspath(args[0]) if args else os.getcwd()
> 
> Should probably warn/error if there is more than one argument since you don't
> use them

I moved this to an option, which is what it really is. It's not required.


> scraper_args = {'application': options.application,
>                 'platform': options.platform, 
>                 'version': options.version}
> # could also do scraper_args = dict([(i, getattr(options, i)) for i in
> 'application', 'platform', 'version'])
> scrapers = {'release': ReleaseScraper,
>             'candidate': ReleaseCandidateScraper}
> scraper_options = { 'release': {'build': options.build}}
> kwargs = scraper_args.copy()
> kwargs.update(scraper_options.get(options.type, {}))
> build = scrapers[options.type](**kwargs)

That's great! I love it!
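Written out runnably, the dispatch pattern looks like this; the stub classes stand in for the real Scraper classes, and note that dict.update() returns None, so copy() and update() need to be separate statements:

```python
# Runnable sketch of the reviewed dispatch pattern. The stub classes
# stand in for the real Scraper classes; note that dict.update()
# returns None, so copy() and update() have to be separate statements.
class ReleaseScraper(object):
    def __init__(self, application, platform, version):
        self.info = (application, platform, version)

class ReleaseCandidateScraper(ReleaseScraper):
    def __init__(self, application, platform, version, build=None):
        ReleaseScraper.__init__(self, application, platform, version)
        self.build = build

scraper_args = {'application': 'firefox', 'platform': 'win32',
                'version': '4.0b8'}
scrapers = {'release': ReleaseScraper,
            'candidate': ReleaseCandidateScraper}
# extra options only some scrapers take (candidate builds have a number)
scraper_options = {'candidate': {'build': '1'}}

build_type = 'candidate'
kwargs = scraper_args.copy()
kwargs.update(scraper_options.get(build_type, {}))
build = scrapers[build_type](**kwargs)
```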

> +#!/usr/bin/env python
> 
> Is the shebang usable?  Is this script executable?  If not, remove

No, it's not. Just a left-over from moving that code to its own module.

> BUILD_TYPES = {'release': ReleaseScraper,
>                'candidate': ReleaseCandidateScraper,
>                'nightly': NightlyScraper }

Yeah, makes sense. It should even be moved to the script itself. It's nothing which should be part of this library.

> +class AbstractException(Exception):
> +    """Exception for an abstract class"""
> 
> Probably should be "Not implemented Exception for an abstract base class". I
> didn't understand what this was on first read. (Personally, I usually use
> NotImplementedError for such things).

Good call.

> +        req = urllib.urlopen(url)
> 
> Should probably use urllib2 instead of urllib so you can get reasonable
> timeouts

Timeouts are not supported on Python 2.4, which I have to stay compatible with, so I will stay with urllib. It also makes downloading files a lot easier.

> +        name = data.strip('/').replace(' ', '%20')
> 
> Should probably use urllib.quote()

Indeed.

> +class GenericScraper(object):
> +    """Generic class to download an application from the Mozilla server"""
> 
> Maybe should be renamed MozillaScraper or similar.  Its not that generic

done.

> +    def download(self, destination=os.getcwd()):
> +        """Download the specified file"""
> +
> +        destination = os.path.abspath(destination)
> 
> Is there any reason to use abspath here?  Also you should probably note in the
> docstring that destination is supposed
> to be a directory.  Again, I'd advise just returning the file-like object of
> what is downloaded vs. having the scraper determine where to put th contents

I have refactored that code which makes it more granular to use. It should fit your expectations.

> +        if not os.path.exists(destination):
> +            raise NotFoundException('Destination not found', destination)
> 
>          Why does the destination have to exist?  Can't you do os.makedirs if
> its supposed to be a directory?
>          Also, sanity checking should be os.path.isdir()?

Definitely better approach.

> +            print 'Downloading "%s" to "%s"' % (url, target)
> 
> Ideally shouldn't print from this class.

Moved out to the script.
Attached patch Patch v2Splinter Review
Updated script according to review comments. I really like the improvements! Thanks for the review Jeff!
Attachment #506355 - Attachment is obsolete: true
Attachment #506621 - Flags: review?(jhammel)
Blocks: 628655
Blocks: 628659
Comment on attachment 506621 [details] [diff] [review]
Patch v2

Since you're using re.match, which matches the whole expression, you probably don't need the '^' or '$' in your regexes.  Other than that, much better!
Attachment #506621 - Flags: review?(jhammel) → review+
See also https://bugzilla.mozilla.org/show_bug.cgi?id=628685 for consolidating this with other work
See Also: → 628685
(In reply to comment #49)
> Since you're using re.match, which matches the whole expression, you probably
> don't need the '^' or '$' in your regexes.  Other than that, much better!

No I cannot. At least not in the current state. Files like "firefox-4.0b8.tar.bz2.asc" would cause us to fetch the wrong file. I could get rid of the prefix but the trailing '$' is important here. I would rather leave both in.
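A small demonstration of the point: re.match() anchors only at the start of the string, so without the trailing '$' the detached signature file matches too:

```python
import re

# Without the trailing '$' the detached signature file also matches,
# because re.match() only anchors at the beginning of the string.
pattern_loose = r'firefox-.*\.tar\.bz2'
pattern_exact = r'firefox-.*\.tar\.bz2$'

files = ['firefox-4.0b8.tar.bz2', 'firefox-4.0b8.tar.bz2.asc']

loose = [f for f in files if re.match(pattern_loose, f)]
exact = [f for f in files if re.match(pattern_exact, f)]
```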
Ah, you're right. re.match anchors at the start of the string, but as long as the beginning matches it doesn't impose an exact match.  My mistake.
Agreed with Jeff on IRC on it. I left both in the regex for now.

Landed as:
http://hg.mozilla.org/qa/mozmill-automation/rev/43e5ad800f96
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
I should have taken a deeper look at all the options we specify on the command line. I missed adding the parameter for the locale. Patch upcoming.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #507084 - Flags: review?(jhammel)
Attachment #507084 - Flags: review?(jhammel) → review+
Landed as:
http://hg.mozilla.org/qa/mozmill-automation/rev/0f55a53a9e67
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment on attachment 507303 [details] [diff] [review]
Don't hard-code the application

Changes look fine. :D
Attachment #507303 - Flags: review?(gmealer) → review+
Landed as:
http://hg.mozilla.org/qa/mozmill-automation/rev/5c1b517e42b5

Hopefully the last one. Anything else will get filed as separate bugs.
Depends on: 653524
Blocks: 677250
Blocks: 679746
Blocks: 736098
Product: Mozilla QA → Mozilla QA Graveyard