Closed Bug 631597 Opened 13 years ago Closed 12 years ago

Prototype the "Canary" system for Thunderbird

Product/Component: Mozilla Messaging Graveyard :: Release Engineering
Type: defect
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED WONTFIX
Reporter: standard8
Assigned: jhopkins
Attachments: 1 file, 1 obsolete file

As per the recent discussions, I'd like to get the canary system prototyped for trunk (I'd only ever expect it to run on trunk).

The general requirements are here:

https://wiki.mozilla.org/Thunderbird/Infrastructure/Canary_System

I'd like us to start prototyping it. So, following the requirements:

- Use the existing Thunderbird tree as the canary
- Have a database/webservice/something storing the data
- Use a wrapper around client.py for pulling the "valid" changesets
- If possible, set up a "good" tree with a few builders, perhaps on just one platform
Note: we may be able to adapt some of the code that Firefox has for only building nightlies with an all-green changeset.
Last known good revisions are being published here!

 http://build.mozillamessaging.com/buildbot/production/known-good-revisions/
Assignee: nobody → john.hopkins
What rules have you got for known good revisions?

Could we combine the latest comm-central and mozilla-central revisions into one file?
The combined comm-central + mozilla-central revisions are here:

 http://build.mozillamessaging.com/buildbot/production/known-good-revisions/history.json

Here's how the known-good revisions are computed:

- produce a list of [buildid, comm-central rev, mozilla-central rev, platform] from the comm-central-*/ build texts at:

 ftp://ftp.mozilla.org/pub/thunderbird/tinderbox-builds/*/*.txt

Those are imported into a buildbot database table.

Join the above data with successful mozmill/xpcshell builds, resulting in a list of builds that passed building and testing.

Sort by buildset ID from newest->oldest.  Save the top result's mozilla-central rev into:

 http://build.mozillamessaging.com/buildbot/production/known-good-revisions/mozilla-central.txt

and the top result's comm-central rev into:

 http://build.mozillamessaging.com/buildbot/production/known-good-revisions/comm-central.txt

and all results (limited to 20) into:

 http://build.mozillamessaging.com/buildbot/production/known-good-revisions/history.json

One caveat: if someone builds against an old revision, that will become the last known good change.  It's an uncommon case that MoCo have chosen to live with and we've followed suit.
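As a rough illustration of the computation described above (not the actual buildbot implementation), the join and sort could be sketched in Python as follows. The `Build` record, the `buildset_id` field name, and the sample revisions are made up for the example; the real data lives in a buildbot database table whose schema isn't shown in this bug.

```python
from typing import NamedTuple


class Build(NamedTuple):
    # Hypothetical record; field names are assumptions, not the real schema.
    buildset_id: int
    comm_rev: str
    mozilla_rev: str
    platform: str


def known_good_history(builds, passed_test_buildsets, limit=20):
    """Return builds whose buildset also passed testing, newest first."""
    # "Join" the build list with the set of buildsets that passed tests.
    good = [b for b in builds if b.buildset_id in passed_test_buildsets]
    # Sort by buildset ID from newest to oldest.
    good.sort(key=lambda b: b.buildset_id, reverse=True)
    # Keep at most `limit` entries, as is done for history.json.
    return good[:limit]


# Made-up sample data.
builds = [
    Build(101, "aaaaaaaaaaaa", "111111111111", "linux"),
    Build(102, "bbbbbbbbbbbb", "222222222222", "linux"),
    Build(103, "cccccccccccc", "333333333333", "macosx"),
]
passed = {101, 102}  # buildset 103 built but did not pass tests

history = known_good_history(builds, passed)
latest = history[0]  # its revs would go into comm-central.txt / mozilla-central.txt
```

Note that this sketch shares the caveat mentioned above: a newer buildset that happens to build an older revision still sorts first.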
(In reply to comment #4)
> The combined comm-central + mozilla-central revisions are here:
>
> http://build.mozillamessaging.com/buildbot/production/known-good-revisions/history.json

When we write the functions that pull these revisions, I expect we'll find that we want the two revisions in one file. The reasoning is that we don't normally need the full history (that file will get big), but we'll still probably want just one network access.

> Join the above data with successful mozmill/xpcshell builds, resulting in a
> list of builds that passed building and testing.

Was this fully implemented when the existing data started being generated? Windows xpcshell is basically permanent orange at the moment, so there shouldn't really be any data in the table.

> One caveat: if someone builds against an old revision, that will become the
> last known good change.  It's an uncommon case that MoCo have chosen to live
> with and we've followed suit.

I've seen that happening on tinderbox occasionally, where one dep build takes longer than the subsequent one.

Would we be able to detect that case and note it? For now we can just see how it goes.
Mark, I've attached a patch to add a --known-good option to client.py.  Please review and, if you're happy with it, land it in comm-central.

TODO: once this support is added, I'll tweak buildbot to pass in the --known-good argument to client.py when building comm-central.
Attachment #514081 - Flags: review?(bugzilla)
Mark, if you could add 'next steps' that would be helpful.  Thanks :)
Comment on attachment 514081 [details] [diff] [review]
Add support for --known-good to client.py

I think in general this is fine. However:

>+def get_last_known_good_mozilla_rev():
>+    kg_url = "http://build.mozillamessaging.com/buildbot/production/known-good-revisions/mozilla-central.txt"
>+    rev = urllib2.urlopen(kg_url).read().strip()
>+    if re.search(r'^[a-f0-9]+$', rev) and len(rev) == 12:
>+        return rev
>+    else:
>+        sys.exit("Error: invalid contents fetched from %s: '%s'" % (kg_url, rev))
>+

I think we should also handle IOError gracefully, since that is what will be raised if the script can't fetch the URL.

I was pondering whether sys.exit is the right thing to do here. I guess it is, since this behaviour is enabled by an explicit argument and hence isn't the default route through the code.

So I think we can land this once the IOError handling is improved.
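For illustration only, here is one way the requested error handling might look. This is a sketch, not the patch that was attached: it is written for Python 3 (where `urllib2.urlopen` became `urllib.request.urlopen` and `IOError` is an alias of `OSError`), whereas the quoted patch targets Python 2. The `fetch` parameter and the 30-second timeout are assumptions added so the function can be exercised without network access.

```python
import re
import sys
import urllib.request

KG_URL = ("http://build.mozillamessaging.com/buildbot/production/"
          "known-good-revisions/mozilla-central.txt")


def _default_fetch(url):
    # The timeout value is an assumption; the quoted patch does not set one.
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("ascii", "replace")


def get_last_known_good_mozilla_rev(fetch=_default_fetch, kg_url=KG_URL):
    """Fetch and validate the last known good mozilla-central revision."""
    try:
        rev = fetch(kg_url).strip()
    except IOError as e:
        # URLError and socket timeouts are subclasses of OSError/IOError.
        sys.exit("Error: could not fetch %s: %s" % (kg_url, e))
    # Same check as the quoted patch: exactly 12 lowercase hex characters.
    if re.fullmatch(r"[a-f0-9]{12}", rev):
        return rev
    sys.exit("Error: invalid contents fetched from %s: '%s'" % (kg_url, rev))
```

Exiting on a fetch failure keeps the behaviour consistent with the invalid-contents case, which seems reasonable given that the option has to be requested explicitly.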
Attachment #514081 - Flags: review?(mbanner) → review-
Attached patch: updated patch
Attachment #514081 - Attachment is obsolete: true
Attachment #540964 - Flags: review?(mbanner)
Comment on attachment 540964 [details] [diff] [review]
updated patch

>     if not options.skip_mozilla:
>+        if options.known_good and options.mozilla_rev is None:
>+            options.mozilla_rev = get_last_known_good_mozilla_rev()
>+            print "Setting mozilla_rev to '%s'" % options.mozilla_rev

I think we should also have a print statement saying that we are fetching the current known good revision, just in case it takes a long time, as it did for me just now.
Attachment #540964 - Flags: review?(mbanner) → review+
Proposed next steps:

- Land the patch.
- Set up a test tree which will build with the known good option.
- Implement a notification system of some kind, maybe using pulse, or maybe email, that will alert people if the current known good changeset is more than, say, one day old. We'll probably want to fine tune that number, but it will give us something to start with.

You may want to split those into separate bugs as well.
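To make the notification idea a little more concrete, here is a minimal sketch of the staleness check it would need. How the age of the known good changeset would be obtained, and how the alert would be delivered (pulse, email, or something else), are left open in this bug, so the function names and the message text below are purely hypothetical. The one-day threshold is the starting value suggested above.

```python
from datetime import datetime, timedelta, timezone

# Starting threshold suggested in the comment above; expected to be tuned.
MAX_AGE = timedelta(days=1)


def is_stale(known_good_time, now, max_age=MAX_AGE):
    """True if the known good changeset is older than max_age."""
    return now - known_good_time > max_age


def staleness_message(rev, known_good_time, now, max_age=MAX_AGE):
    """Return an alert string, or None if the changeset is recent enough."""
    if not is_stale(known_good_time, now, max_age):
        return None
    age = now - known_good_time
    return "Known good changeset %s is %s old (limit %s)" % (rev, age, max_age)


# Made-up timestamps for demonstration.
now = datetime(2011, 7, 3, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(hours=6)
old = now - timedelta(days=2)
```

A caller would send the returned message through whichever channel is chosen and do nothing when the result is None.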
gozer: this is waiting on a staging proxy server
Blocks: 685170
We have a staging proxy server now.

Next step: create a new set of builders.

standard8 says: I think for now just create a ThunderbirdTested tree and we'll just drop it later when we switch.
standard8: you mentioned you found one issue with the 'tested' builds on production.  Can you elaborate?
Depends on: 686719
Issues I've noticed at the moment:

- http://build.mozillamessaging.com/buildbot/production/known-good-revisions/mozilla-central-age.txt seems to be empty most of the time.
- If mozilla-central busts comm-central, and then we fix comm-central, then comm-central-tested is busted from the bustage fix until the next green cycle, which can be a long time.
-- I'm not quite sure of the best way to fix this at the moment. I need to watch it a bit more, and I think we need to try to fix some of the random oranges.
If/when we decide to take the canary system further, we can reopen this bug.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX