Closed Bug 677250 Opened 13 years ago Closed 12 years ago

Add support to download script to get tinderbox builds

Categories

(Mozilla QA Graveyard :: Mozmill Automation, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: david.guo)

References

Details

Attachments

(1 file, 5 obsolete files)

We should enhance the download script so we are also able to download tinderbox or hourly builds. It probably needs some refactoring of the current code before.
Download of hourly builds are separate and shouldn't be bundled with tinderbox builds. Updating summary.
Summary: Add support to download script to get tinderbox and hourly builds → Add support to download script to get tinderbox builds
Blocks: 709931
Attached patch Tinderbox Scraper Patch v1 (obsolete) — Splinter Review
Assignee: nobody → david.guo
Status: NEW → ASSIGNED
Attachment #609643 - Flags: review?(hskupin)
Comment on attachment 609643 [details] [diff] [review]
Tinderbox Scraper Patch v1

>+    if  not options.type == "daily" \
>+        and not options.type == "tinderbox" \
>+        and not options.version:

Please change it to: not options.type in ['daily', 'tinderbox'].

> PLATFORM_FRAGMENTS = {'linux': 'linux-i686',
>                       'linux64': 'linux-x86_64',
>                       'mac': 'mac',
>+                      'macosx': 'mac',
>                       'mac64': 'mac64',
>+                      'macosx64': 'mac',
>                       'win32': 'win32',
>                       'win64': 'win64-x86_64'}

This should be vice versa. The key is our internal platform name and the value is used on the FTP server. I can see that this is causing problems now and probably needs a refactoring. As I can see you have added it to the platform_regex method of the TinderBox scraper. What about making it a class member in general and let sub classes override values?

>+        if date is not None:
>+            self.date = datetime.strptime(date, '%Y-%m-%d')

Is the timestamp shown on the FTP site the same as the date entry in the second column? If yet it would be great to fetch the right tinderbox build:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-macosx64-debug/

>+        self.builds, self.build_index = self.get_build_info(self.date, self.build_index)
>+        self.timestamp = datetime.fromtimestamp(int(self.builds[self.build_index]))

If get_build_info returns an empty array because the specified date doesn't match an available build it will fail.

>+        self.timestamp = self.timestamp.strftime('%Y-%m-%d')

You should use a temporary variable here and not assign a intermediate value to self.timestamp.

>+    def get_build_info(self, date=None, build_index=None):
>+        url = '%s/tinderbox-builds/%s-%s/' % (self.base_url, self.branch, self.platform)
>+
>+        print 'Retrieving list of builds from %s' % url
>+        parser = DirectoryParser(url)
>+        regex = r'^\d{10}$'

I wouldn't limit it to exactly 10 digits. Leave it open to just be digits.

>+    def binary_regex(self):
>+        """Return the regex for the binary"""
>+
>+        regex_base_name = r'^%(APP)s-.*\.%(LOCALE)s\.%(PLATFORM)s'
[..]
>+        return regex % {'APP': self.application,
>+                        'LOCALE': self.locale,
>+                        'PLATFORM': self.platform_regex}

We do not have localized tinderbox builds. So we should raise an exception already in __init__?

I haven't tested it yet but will do once we have figured out remaining items and got those fixed. But looks good so far.
Attachment #609643 - Flags: review?(hskupin) → review-
Thanks for the review!

> What about making it a class member in general and let sub classes override values?

Do you want this to be done immediately? It might not apply cleanly as there are uses that depend on this being a global. 

E.g.. in download.py:
> choices=scraper.PLATFORM_FRAGMENTS.keys(),

> Is the timestamp shown on the FTP site the same as the date entry in the second
> column?

Not always; I think the timestamp is the time of the build locally as opposed to when it was uploaded/modified. It might be more consistent that way, what do you think?

> If get_build_info returns an empty array because the specified date doesn't match > an available build it will fail.

It shouldn't return an empty array and it should fail gracefully in get_build_info

> We do not have localized tinderbox builds. So we should raise an exception
> already in __init__?

Can you clarify what you mean by this?
(In reply to David Guo [:davidg] from comment #4)
> > What about making it a class member in general and let sub classes override values?
> 
> Do you want this to be done immediately? It might not apply cleanly as there
> are uses that depend on this being a global. 
> 
> E.g.. in download.py:
> > choices=scraper.PLATFORM_FRAGMENTS.keys(),

Good call. Probably leave it for now.

> Not always; I think the timestamp is the time of the build locally as
> opposed to when it was uploaded/modified. It might be more consistent that
> way, what do you think?

You could translate it to a readable timestamp and compare to the build id. Then we would know if it is the time of build or upload.

> > If get_build_info returns an empty array because the specified date doesn't match > an available build it will fail.
> 
> It shouldn't return an empty array and it should fail gracefully in
> get_build_info

Where would it fail gracefully? I don't see it. When a date is given and no build matches, the filter() call will return an empty array.

> > We do not have localized tinderbox builds. So we should raise an exception
> > already in __init__?
> 
> Can you clarify what you mean by this?

You are trying to use LOCALE which we should not allow for this type of build because it will never be available.
> You could translate it to a readable timestamp and compare to the build id. Then we
> would know if it is the time of build or upload.

The timestamp is the time of build.

> Where would it fail gracefully? I don't see it. When a date is given and no build
> matches, the filter() call will return an empty array.

True, the if not parser.entries should be deferred after the filter. Good catch.
(In reply to David Guo [:davidg] from comment #6)
> > would know if it is the time of build or upload.
> 
> The timestamp is the time of build.

That's perfect! We can't get anything better!
Attached patch Tinderbox Scraper Patch v2 (obsolete) — Splinter Review
Attachment #609643 - Attachment is obsolete: true
Attachment #610490 - Flags: review?(hskupin)
Comment on attachment 610490 [details] [diff] [review]
Tinderbox Scraper Patch v2

>+                       'tinderbox': {
>+                           'branch': options.branch,
>+                           'build_number': options.build_number,
>+                           'debug_build': options.debug_build,
>+                           'date': options.date}

Please keep the alphabetical order and put debug_build at the end.

>+    def __init__(self, branch='mozilla-central', date=None, build_number=None,
>+                 debug_build=False, *args, **kwargs):

Same here. Please put build_number before the date.

>+        if self.locale != 'en-US':
>+            sys.exit('The locale ' + self.locale +  ' is not supported')

We should say that only en-US is supported, which is also optional to specify.

>+        if date is not None:
>+            try:
>+                self.date = datetime.strptime(date, '%Y-%m-%d')

It's not enough to only define the day, month, or year. It would be the whole ISO string so that we can clearly identify a build by its build time. This way is probably only used by automation.

>+    def get_build_info(self, build_index=None):
>+        url = '%(BASE)s/tinderbox-builds/%(BRANCH)s-%(PLATFORM)s%(SUFFIX)s/' % {
>+                'BASE': self.base_url,
>+                'BRANCH':self.branch, 
>+                'PLATFORM': self.platform_regex,
>+                'SUFFIX': '-debug' if self.debug_build else ''}

Please check the DailyScraper and make use of something like the self.monthly_build_list_regex.

>+        # If timestamp is given, retrieve just that build
>+        if self.timestamp is not None:
>+            regex = '^' + self.timestamp + '$'
>+        else:
>+            regex = r'^\d+$'

You can do: regex = r'^' + self.timestamp + '$' if self.timestamp else r'^\d+$'. Also please don't forget the regex prefix 'r' for the first string. But...

>+        # If date is given, retrieve the subset of builds on that date
>+        if self.date is not None:
>+            parser.entries = filter(self.date_matches, parser.entries)

Why would we have to handle date and timestamp separately? Both should really be the kind of information. So please keep date and create a filter regex where the timestamp is used.

>+    def path_regex(self):
>+        """Return the regex for the path"""
>+
>+        regex = 'tinderbox-builds/%(BRANCH)s-%(PLATFORM)s%(SUFFIX)s/%(BUILD)s'
>+
>+        return regex % {'BRANCH': self.branch,
>+                        'PLATFORM': self.platform_regex,
>+                        'SUFFIX': '-debug' if self.debug_build else '',
>+                        'BUILD': self.builds[self.build_index]}

Please see my comment above to use a regex property so we don't have the same definition on several places.

>+        regex_base_name = r'^%(APP)s-.*\.%(LOCALE)s'
>+        regex_suffix = {'linux': r'\.linux-i686\.tar\.bz2$',
>+                        'linux64': r'\.linux-x86_64\.tar\.bz2$',
>+                        'mac': r'\.mac\.dmg$',
>+                        'mac64': r'\.mac\.dmg$',
>+                        'win32': r'.*\.win32\.installer\.exe$',
>+                        'win64': r'.*\.win64-x86_64\.installer\.exe$'}

Please try to follow how the DailyScraper works here. There is no need to define other behavior. Also regex_suffix is only for the file extension and not the basename.
Attachment #610490 - Flags: review?(hskupin) → review-
> It's not enough to only define the day, month, or year. It would be the whole ISO
> string so that we can clearly identify a build by its build time. This way is
> probably only used by automation.

It seems impractical to specify a build by second-precision when the timestamp does the same thing, especially since you have to decode the original timestamp just to be precise. Or maybe you have a different use case in mind?

> Also please don't forget the regex prefix 'r' for the first string.

It's actually the raw string prefix but I omitted it because there were no escape characters or backslashes in the string.

> Why would we have to handle date and timestamp separately? Both should really be
> the kind of information. So please keep date and create a filter regex where the
> timestamp is used.


Not sure what you mean by this.
Talked with David on IRC and we agreed on that the following topics:

* keep timestamp and date but add a comment for the different use case

Issues:
* There is a timezone issue which let us download the wrong build
* We have to support l10n builds (sorry was my fault)
* A wrong date option like '03-03-2012' will cause a non-meaning message
* If no builds in a folder have been found say so and not that the folder cannot be found

Also I will do:
* Raise a new bug to make the platform detection code better because on OS X 10.6 and 10.7 we should default to mac64
* Raise a bug against RelEng so the correct dates will be shown on the FTP site.
Attached patch Tinderbox Scraper Patch v3 (obsolete) — Splinter Review
Attachment #610490 - Attachment is obsolete: true
Attachment #610841 - Flags: review?(hskupin)
Attached patch Tinderbox Scraper Patch v4 (obsolete) — Splinter Review
Attachment #610841 - Attachment is obsolete: true
Attachment #610841 - Flags: review?(hskupin)
Attachment #610842 - Flags: review?(hskupin)
Comment on attachment 610842 [details] [diff] [review]
Tinderbox Scraper Patch v4

>+    def __init__(self, branch='mozilla-central', build_number=None, date=None,
>+                 debug_build=False, *args, **kwargs):
>+        MozillaScraper.__init__(self, *args, **kwargs)
[..]
>+        self.locale_build = self.locale != 'en-US'
>+        self.build_suffix = ''





>+        self.timezone = PacificTimezone();

We should add a comment why we have to do that. I will make this.


>+        if date is not None:
>+            if re.compile(self.date_validation_regex).match(date) is None:
>+                raise ValueError(date + ' does not match the accepted formats: '
>+                                        '%Y-%m-%d date or an UNIX timestamp')
>+
>+            try:
>+                self.date = datetime.strptime(date, '%Y-%m-%d')
>+            except:
>+                self.date = datetime.fromtimestamp(float(date), self.timezone)
>+                self.timestamp = date
>+        else:
>+            self.date = None

That's a bit too complicated. Why not simply retrieving the date/timestamp vice versa?

>+        # Set suffix for the build folder
>+        if self.locale_build:
>+            self.build_suffix = 'l10n'
>+            url = '/'.join([self.base_url, self.build_list_regex])
[..]
>+            if debug_build:
>+                self.build_suffix = '-debug'

This code has to be part of the regex code. Otherwise we will loose track of even more complex URLs.

So I have modified that code a bit.


>+    def date_validation_regex(self):
>+    def build_list_regex(self):

You didn't obey the alphabetical order of properties and methods. I have corrected those.

>+    def build_list_regex(self):
>+        regex = 'tinderbox-builds/%(BRANCH)s-%(PLATFORM)s%(SUFFIX)s'
>+
>+        return regex % {'BRANCH': self.branch,
>+                        'PLATFORM': '' if self.locale_build else self.platform_regex,
>+                        'SUFFIX': self.build_suffix}

I have moved the appropriate code in here for L10N and debug builds.

>+class PacificTimezone(tzinfo):

For now we can keep it but we probably should move this to a separate module later.

I have updated the patch while commenting on the bug and will upload a new version right away.
Attachment #610842 - Flags: review?(hskupin) → review-
Attached patch Tinderbox Scraper Patch v5 (obsolete) — Splinter Review
Since no-one from the team is around lets ask Clint for review on this. I have tested all possible ways and it looks good. The only thing we should take care of later is the handling of the correct macosx or macosx64 bits. It's different for en-US and other locales. At the moment it doesn't block us but once our CI should test default tinderbox builds it will break.
Attachment #610842 - Attachment is obsolete: true
Attachment #610927 - Flags: review?(ctalbert)
Forgot to removing some whitespace changes and debug code.
Attachment #610927 - Attachment is obsolete: true
Attachment #610927 - Flags: review?(ctalbert)
Attachment #610928 - Flags: review?(ctalbert)
Comment on attachment 610928 [details] [diff] [review]
Tinderbox Scraper Patch v5.1

This looks good. It's very fragile - if we change how we name builds, where we put them, if we change the US timezone rules, then this code breaks, and it breaks without any great error messages (in some cases, like the timezone stuff the code has no ability to know that it broke).

Ensure that you have good error messages in place when you are using this library in your other code and expect things to fail and ensure they fail noisily when you're using this code.

But the code itself looks fine, and I understand why we have to write it, so r+.  Just code defensively in the code that consumes this.
Attachment #610928 - Flags: review?(ctalbert) → review+
As discussed on IRC we will have to make sure to combine our tools with already existing ones and put those under mozbase. This module is one of those.

Landed as:
http://hg.mozilla.org/qa/mozmill-automation/rev/58aadf50a02a
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
As a note please see bug 671450 which will change the timestamps to a human readable format. David, would you mind already implementing this feature so we can make use of right away and don't have broken automation scripts?
Sure! I'll make sure to keep it up to date. Do you know when they will be switching formats?
Product: Mozilla QA → Mozilla QA Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: