Closed Bug 468731 Opened 16 years ago Closed 15 years ago

talos testing of builds using sendchange

Categories

(Release Engineering :: General, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: anodelman, Assigned: catlee)

References

Details

Attachments

(8 files, 7 obsolete files)

2.35 KB, patch
bhearsum
: review+
catlee
: checked-in+
Details | Diff | Splinter Review
6.34 KB, patch
bhearsum
: review+
catlee
: checked-in+
Details | Diff | Splinter Review
8.42 KB, patch
bhearsum
: review+
anodelman
: review+
catlee
: checked-in+
Details | Diff | Splinter Review
1.28 KB, patch
bhearsum
: review+
catlee
: checked-in+
Details | Diff | Splinter Review
3.46 KB, patch
bhearsum
: review+
anodelman
: review+
Details | Diff | Splinter Review
1.16 KB, patch
bhearsum
: review+
catlee
: checked-in+
Details | Diff | Splinter Review
1.43 KB, patch
bhearsum
: review+
catlee
: checked-in+
Details | Diff | Splinter Review
15.93 KB, patch
anodelman
: review+
catlee
: checked-in+
Details | Diff | Splinter Review
With the ftpPoller (put in use to help fix bug 457885) it should be possible to test an arbitrary build by using a sendchange to the appropriate builder.  This should be tested and documentation written up.
From looking over the code, it seems like a more reasonable approach would be through the forcebuild code.  This is already used by l10n to force building of specific locales and we could adapt the code to talos' requirements.

bhearsum, what do you think?
Summary: force talos testing of builds using sendchange → force talos testing of builds using forcebuild
If you want the actual 'Force Build' and 'Rebuild' buttons to work you must remove all code that pulls in data from Change objects. Builds triggered through Schedulers often have these (and always do, in the Talos case) and that's where we pull the URL and filename from. 'Force Build' and 'Rebuild' do indeed use the forcebuild code, but there is currently no way to pass in any additional data - so a step like MozillaWget will fail when trying to pull the filePath out. I'm not entirely sure what the right approach for this is - perhaps passing in the filePath to TalsoFactory so it can pass it to MozillaWget instead of pulling it from the Change.

On the other hand, if you only want sendchange to work....you may not even need any code changes. Try something like:
buildbot sendchange --username="http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/fx-linux-tbox-mozilla1.9.0/1229432940/" --master=localhost:9988 --branch=HEAD_LINUX -m"forced build" firefox-3.0.6pre.en-US.linux-i686.tar.bz2
Priority: -- → P3
I think we can do this with sendchange.
Assignee: anodelman → catlee
Blocks: 457885
Depends on: 477890
My work so far on sendchange support for Talos.

In talos-staging, we don't need the ftppollers any more, so they get removed, and replaced with a single PBChangeSource.

perfrunner.py now expects that the full URL is passed as the filename in the change object, and it derives the filename from the URL.

In mozilla2-staging we add which talos master to send sendchange requests to, and then pass that in to the NightlyBuildFactory.
if talosMasters has been set, we do a buildbot sendchange to each master after successfully uploading a build.

This relies on changes to post_upload.py to print out the URL where the build can be downloaded from.
Needed to add d:\mozilla-build\python25\scripts to $PATH to get this working on windows.
This uses a new build step called SendChangeStep to do the sendchange directly from the master.

The step uses twisted deferreds, so it won't block in the case where the remote master isn't accessible.
Attachment #362101 - Attachment is obsolete: true
Attachment #364162 - Flags: review?(bhearsum)
The changes to mozilla2-staging merely add the staging talos master to the list of talos masters to notify when new builds are completed.

The changes to talos-staging are more involved:
- The complete URL to the build is stored as the first (and only) entry in the changes.files property.

- ftppoller.py is updated to store the complete URL only in the files property, and to not use the links property

- perfrunner.py determines the filename from the URL with os.path.basenamae

- A new argument, 'workdirBase' is added to the TalosFactory, which represents where the talos data should be placed in relation to the build directory.

- master.cfg contains the configuration for the new slaves, as well as the new builders and schedulers to handle having the slaves doing tests on multiple branches.
Attachment #362099 - Attachment is obsolete: true
Attachment #364172 - Flags: review?(bhearsum)
Attachment #364172 - Flags: review?(anodelman)
We have to print out to stderr since upload.py in the source tree eats stdout from post_upload.py.
Attachment #364174 - Flags: review?(bhearsum)
Summary: force talos testing of builds using forcebuild → talos testing of builds using sendchange
Attachment #364162 - Flags: review?(bhearsum) → review-
Comment on attachment 364162 [details] [diff] [review]
do buildbot sendchange after a successful build

>diff -r 625b8ed5603b process/factory.py

> class NightlyBuildFactory(MercurialBuildFactory):
>+    def __init__(self, talosMasters=None, **kwargs):
>+        if talosMasters is None:
>+            self.talosMasters = []

Why not just set [] as the default?

>-        self.addStep(ShellCommand,
>+        def get_url(rc, stdout, stderr):
>+            m = re.search("^(http://.*?\.(tar\.bz2|dmg|zip))", "\n".join([stdout, stderr]), re.M)
>+            if m:
>+                return {'packageUrl': m.group(1)}
>+            return {}
>+
>+        self.addStep(SetProperty,
>          command=['make', 'upload'],
>          env=uploadEnv,
>-         workdir='build/%s' % self.objdir
>+         workdir='build/%s' % self.objdir,
>+         extract_fn = get_url,
>         )
> 
>+        talosBranch = "%s-%s" % (self.branchName, self.platform)
>+        for m in self.talosMasters:
>+            self.addStep(SendChangeStep(
>+                master = m,
>+                branch = talosBranch,
>+                files = [WithProperties('%(packageUrl)s')],
>+                user = "sendchange")
>+             )
>+

style nit: please use the same formatting as the rest of the steps (no space on either side of the '=', one space indent for parameters).

>+class SendChangeStep(BuildStep):
>+    flunkOnFailure = True

I think warnOnFailure is more appropriate here. This step failing is bad, but it doesn't indicate a problem with compilation or one of the post compilation tests. Can you use warnOnFailure instead?

One overall thing I noticed is that it seems to be possible for a sendchange to get done that has no files in it. If for some reason SetProperty() doesn't find a match it will quietly return an empty dict, and it appears SendChangeStep will happily pass that along. It doesn't look like the Talos side of things copes with it either:
> +         self.fileURL = self.changes[-1].files[0]

would cause an exception if files is empty. I could be missing something here, though.

This looks great overall, though, and would be a fantastic improvement over our current system.

r- because of the flunkOnFailure and the empty files. Would like the style nits fixed too.
Comment on attachment 364174 [details] [diff] [review]
Print out URL of build after upload

>diff -r 2ba74001ce81 stage/post_upload.py
>--- a/stage/post_upload.py	Tue Feb 24 12:10:02 2009 -0500
>+++ b/stage/post_upload.py	Wed Feb 25 17:08:06 2009 -0500
>@@ -4,16 +4,17 @@
> # followed by a list of filenames.
> 
> import sys, os, os.path, shutil
> from optparse import OptionParser
> from time import mktime, strptime
> 
> NIGHTLY_PATH = "/home/ftp/pub/%(product)s/nightly"
> TINDERBOX_BUILDS_PATH = "/home/ftp/pub/%(product)s/tinderbox-builds/%(tinderbox_builds_dir)s"
>+TINDERBOX_URL_PATH = "http://staging-stage.build.mozilla.org/pub/mozilla.org/%(product)s/tinderbox-builds/%(tinderbox_builds_dir)s"

Obviously this has to change to stage.mozilla.org later on. This looks fine. Please make sure to update the checkouts on stage.m.o and staging-stage.b.m.o when you land this.
Attachment #364174 - Flags: review?(bhearsum) → review+
Attachment #364162 - Attachment is obsolete: true
Attachment #364182 - Flags: review?(bhearsum)
(In reply to comment #10)
> (From update of attachment 364162 [details] [diff] [review])
> >diff -r 625b8ed5603b process/factory.py
> 
> > class NightlyBuildFactory(MercurialBuildFactory):
> >+    def __init__(self, talosMasters=None, **kwargs):
> >+        if talosMasters is None:
> >+            self.talosMasters = []
> 
> Why not just set [] as the default?

Using [] (or any mutable object) as the default is risky.  See http://docs.python.org/reference/compound_stmts.html#function-definitions

> >+        talosBranch = "%s-%s" % (self.branchName, self.platform)
> >+        for m in self.talosMasters:
> >+            self.addStep(SendChangeStep(
> >+                master = m,
> >+                branch = talosBranch,
> >+                files = [WithProperties('%(packageUrl)s')],
> >+                user = "sendchange")
> >+             )
> >+
> 
> style nit: please use the same formatting as the rest of the steps (no space on
> either side of the '=', one space indent for parameters).

Fixed

> >+class SendChangeStep(BuildStep):
> >+    flunkOnFailure = True
> 
> I think warnOnFailure is more appropriate here. This step failing is bad, but
> it doesn't indicate a problem with compilation or one of the post compilation
> tests. Can you use warnOnFailure instead?

Fixed

> One overall thing I noticed is that it seems to be possible for a sendchange to
> get done that has no files in it. If for some reason SetProperty() doesn't find
> a match it will quietly return an empty dict, and it appears SendChangeStep
> will happily pass that along. It doesn't look like the Talos side of things
> copes with it either:
> > +         self.fileURL = self.changes[-1].files[0]
> 
> would cause an exception if files is empty. I could be missing something here,
> though.

I think we'll get an exception in the SendChangeStep when it tries to render the property that doesn't exist.  Not ideal either, but I don't think talos would receive an empty list of files.

Maybe SendChangeStep could catch the error and report failed instead of raising an exception?
(In reply to comment #13)

> Maybe SendChangeStep could catch the error and report failed instead of raising
> an exception?

That sounds good.
Attachment #364182 - Attachment is obsolete: true
Attachment #364182 - Flags: review?(bhearsum)
Attachment #364320 - Flags: review?(bhearsum)
Attachment #364320 - Flags: review?(bhearsum) → review+
Attachment #364172 - Flags: review?(bhearsum) → review+
Comment on attachment 364172 [details] [diff] [review]
changes to buildbot-configs for talos sendchange support

Everything looks in order here to me...
Comment on attachment 364174 [details] [diff] [review]
Print out URL of build after upload

changeset:   232:eb58ab4a33a8

Updated the TINDERBOX_URL_PATH to the http://stage.mozilla.org url, and added a commented-out line with the staging URL.
Attachment #364174 - Flags: checked‑in+
Comment on attachment 364172 [details] [diff] [review]
changes to buildbot-configs for talos sendchange support

Doesn't this mix in a bunch of the pool of slaves code?  Other than that I'm fine with it.
same as before, minus the talos pool of slaves stuff
Attachment #364172 - Attachment is obsolete: true
Attachment #364582 - Flags: review?(bhearsum)
Attachment #364172 - Flags: review?(anodelman)
Attachment #364582 - Flags: review?(anodelman)
Attachment #364582 - Flags: review?(anodelman) → review+
Attachment #364582 - Flags: review?(bhearsum) → review+
Comment on attachment 364582 [details] [diff] [review]
changes to buildbot-configs for talos sendchange support

changeset:   968:8eff7e0d4026
Attachment #364582 - Flags: checked‑in+
Comment on attachment 364320 [details] [diff] [review]
do buildbot sendchange after a successful build

changeset:   207:92c831b902c3
Attachment #364320 - Flags: checked‑in+
Attachment #365288 - Flags: review?(bhearsum) → review+
Comment on attachment 365288 [details] [diff] [review]
Do buildbot sendchange from production to staging talos

Checked in with one minor change: a new port number on qm-buildbot01.mozilla.org

changeset:   993:59b331539b33
Attachment #365288 - Flags: checked‑in+
This depends on 458243 is as much as we don't want the talos linux slaves trying to test the 64-bit linux builds.
Depends on: 458243
Change to use a list of talos masters, since we're going to be doing that on production soon.

Stop using ftppoller on talos staging.  The branch renames are required so that the branch parameter sent via the sendchange lands in the proper scheduler on talos.
Attachment #367096 - Flags: review?(bhearsum)
Attachment #367096 - Flags: review?(anodelman)
Comment on attachment 367096 [details] [diff] [review]
sendchange for talos staging

Looks good to me.
Attachment #367096 - Flags: review?(bhearsum) → review+
Comment on attachment 367096 [details] [diff] [review]
sendchange for talos staging

Belated r+ from me as well.
Attachment #367096 - Flags: review?(anodelman) → review+
Attachment #367632 - Flags: review?(bhearsum) → review+
Comment on attachment 367632 [details] [diff] [review]
Do sendchange to both staging talos masters from production-master

changeset:   1016:7ffc26cf6b6d
Attachment #367632 - Flags: checked‑in+
One of the major technical aspects of this bug has been fixed - getting sendchange to work to test builds with Talos.  This will work fine for RelEng folks who have access to the build network.

There has been some discussion about creating a web interface to allow developers to do testing of arbitrary builds as well.

If we allow this, where should we send the test results for those tests?  Should they go to the same graph server, and be reported under the appropriate branch for the build?  Or should they get reported to a different branch, to keep the production data clean?  Or do the results need to be reported to graph server at all?
Depends on: 483991
This is the final step in getting sendchange working for the automated builds.  production-master will do sendchange to qm-rhel02 to notify of new builds that are available.

The ftp poller is disabled for everything except Firefox 3.0 builds.

The scheduler names for mozilla-central, mozilla-1.9.1 and tracemonkey branches are updated to match what will be sent from production-master.

The treeStableTimer for those branches is reduced to 0 since we're guaranteed that the files are in place by the time we get the sendchange notification.
Attachment #368017 - Flags: review?(bhearsum)
Comment on attachment 368017 [details] [diff] [review]
Do sendchange from production-master to qm-rhel02

Looks fine to me.

All these sendchange patches remind me that it would be nice to get perfrunner.py stuff into buildbotcustom, though.
Attachment #368017 - Flags: review?(bhearsum) → review+
Attachment #368017 - Flags: review?(anodelman)
Attachment #368017 - Attachment is obsolete: true
Attachment #368304 - Flags: review?(bhearsum)
Attachment #368017 - Flags: review?(anodelman)
Attachment #368304 - Flags: review?(anodelman)
Attachment #368304 - Flags: review?(bhearsum) → review+
Comment on attachment 368303 [details] [diff] [review]
Specify if sendchange failure should generate a build warning or not

Looks fine to me.
Attachment #368303 - Flags: review?(bhearsum) → review+
This includes support for having the build lie about its start time.
Attachment #368304 - Attachment is obsolete: true
Attachment #368326 - Flags: review?(anodelman)
Attachment #368304 - Flags: review?(anodelman)
Comment on attachment 368326 [details] [diff] [review]
Do sendchange from production-master to qm-rhel02

Looks good.
Attachment #368326 - Flags: review?(anodelman) → review+
Comment on attachment 368303 [details] [diff] [review]
Specify if sendchange failure should generate a build warning or not

changeset:   226:8f28fce52b4d
Attachment #368303 - Flags: checked‑in+
Comment on attachment 368326 [details] [diff] [review]
Do sendchange from production-master to qm-rhel02

changeset:   1030:6dea85195978
Attachment #368326 - Flags: checked‑in+
changeset:   1032:4f05e8a34fc5

was also checked in to fix some bustage.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Component: Release Engineering: Talos → Release Engineering
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: