Closed Bug 710312 Opened 13 years ago Closed 12 years ago

enable releases to be run on more than one master

Categories

(Release Engineering :: Release Automation: Other, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

References

Details

Attachments

(3 files, 1 obsolete file)

Once bugs 710221 and 708656 are fixed, we no longer have technical reasons why we can't do releases across multiple masters. We should consider enabling this, because we'd have access to a larger pool of build machines.

There are a couple of downsides to this:
* Setting reserved slaves doesn't work as well. We could compensate by setting a smaller reserved pool on each master instead of a large one on a single master, but maybe we don't need reserved slaves at all, given access to a larger pool of machines?
* All masters would need to be reconfiged before starting a release instead of just one. We could still land directly on the production branch, though.
* We'd lose the ability to find the full status of the release by looking at just one master's WebStatus.
(In reply to Ben Hearsum [:bhearsum] from comment #0)
> * We'd lose the ability to find the full status of the release by looking at
> just one master's WebStatus.

We could possibly build something in self-serve to mitigate this, which would be pretty important for finding things. The emails only give links to builders in the case of failures at the moment.
Good idea. It probably wouldn't be hard to add log links for successful builds, too, though we wouldn't want links to buildbot masters being sent to release-drivers.
found in triage.
Component: Release Engineering → Release Engineering: Automation
QA Contact: release → catlee
Bulk move of bugs to Release Automation component.
Component: Release Engineering: Automation (General) → Release Engineering: Automation (Release Automation)
Getting this done will get rid of some of the manual steps in releases and make it less likely that we're short of slaves when doing a release. I don't think the reporting issue is a big deal at this point because we have e-mail that links to failures when they happen. reserved_slaves won't work anymore, but access to a bigger slave pool in general might make up for that? Need to analyze that more, still.

We talked about this on Tuesday and we *think* it should be pretty easy to do at this point. Roughly:
* Move schedulers to scheduler master
* Enable release branches on all build masters

I'm going to give it a try in staging sometime in the next few weeks.
Assignee: nobody → bhearsum
Pretty straightforward - copy the release object loop into scheduler_master.cfg; limit build masters to builders/status/Triggerable; put the other objects in the scheduler. I've left in master-specific release branch enabling/disabling in case we want to, e.g., not run releases on AWS. With this patch, we can't go back to a release being fully on a build master (the scheduling would still be on the scheduler), but we can limit the build/status parts to one master by changing production-masters.json.
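The split described above can be sketched as a filter over the release objects. This is an illustrative stand-in only: the `split_release_objects` function and the `kind` field are hypothetical, not the actual buildbot-configs code.

```python
# Hypothetical sketch of the multimaster split: build masters keep
# builders, status plugins, and Triggerable schedulers; the scheduler
# master keeps the remaining scheduling objects.

def split_release_objects(release_objects, master_type):
    build_master_kinds = ('builder', 'status', 'triggerable')
    if master_type == 'build':
        return [o for o in release_objects if o['kind'] in build_master_kinds]
    # scheduler master: everything that doesn't live on a build master
    return [o for o in release_objects if o['kind'] not in build_master_kinds]

objects = [
    {'kind': 'builder', 'name': 'release-mozilla-beta-win32_build'},
    {'kind': 'triggerable', 'name': 'release-mozilla-beta-win32_repack'},
    {'kind': 'scheduler', 'name': 'release-mozilla-beta-firefox_reset_schedulers'},
    {'kind': 'status', 'name': 'release-email-status'},
]

print([o['name'] for o in split_release_objects(objects, 'scheduler')])
# → ['release-mozilla-beta-firefox_reset_schedulers']
```

The key property is that every object lands on exactly one side of the split, so a reconfig of either master type loads a consistent subset.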
Attachment #666546 - Flags: feedback?(rail)
Attachment #666546 - Flags: feedback?(catlee)
I need to finish my staging run and run this by the group before enabling anything, but this is the patch that should do it.

Probably should land between 17.0b1 and 17.0b2.
Attachment #666548 - Flags: review?(catlee)
(In reply to Ben Hearsum [:bhearsum] from comment #8)
> Probably should land between 17.0b1 and 17.0b2.

There were no objections to this plan in the meeting today. As long as there are no review issues or bugs found, this is the plan of record.
Comment on attachment 666546 [details] [diff] [review]
buildbot-configs to enable multimaster releases

My staging run had no issues related to scheduling (I had various issues within specific builders, but I don't believe they're relevant to this bug). You can check out results on these two masters:
http://dev-master01.build.scl1.mozilla.com:8019/builders
http://dev-master01.build.scl1.mozilla.com:8021/builders
Attachment #666546 - Flags: review?(rail)
Attachment #666546 - Flags: review?(catlee)
Attachment #666546 - Flags: feedback?(rail)
Attachment #666546 - Flags: feedback?(catlee)
Attachment #666548 - Flags: review?(catlee) → review+
Attachment #666546 - Flags: review?(catlee) → review+
Attachment #666546 - Flags: review?(rail) → review+
Per IRL conversation, we're only going to enable releases on 3 masters for now, to make it easier if we have epic release problems again. I locked all of the mw32-ix slaves to bm30/32.
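The slave locking mentioned above could be expressed roughly as follows. The `MASTER_LOCKS` mapping and the function are hypothetical; the real allocation lives in production-masters.json and the slave allocator, with different field names.

```python
# Illustrative only: locking a slave class to specific masters.
# Per the comment above, mw32-ix slaves are locked to bm30/bm32.

MASTER_LOCKS = {
    'mw32-ix': ['bm30', 'bm32'],
}

def allowed_masters(slave_name, all_masters):
    """Return the masters a slave may attach to; unlocked slaves go anywhere."""
    for prefix, masters in MASTER_LOCKS.items():
        if slave_name.startswith(prefix):
            return [m for m in all_masters if m in masters]
    return list(all_masters)

print(allowed_masters('mw32-ix-slave04', ['bm12', 'bm30', 'bm32']))
# → ['bm30', 'bm32']
```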
Attachment #666548 - Attachment is obsolete: true
Attachment #671428 - Flags: review?(catlee)
Note to self: don't forget to update the docs.
Attachment #671428 - Flags: review?(catlee) → review+
Attachment #666546 - Flags: checked-in+
Attachment #671428 - Flags: checked-in+
This will add it to the test scheduler too, but it's pretty harmless.
Attachment #671428 - Attachment is obsolete: true
Attachment #671487 - Flags: review?(catlee)
Attachment #671487 - Flags: review?(catlee) → review+
Comment on attachment 671487 [details] [diff] [review]
add BuildSlaves.py to scheduler masters

Updated master-puppet1 with this. Had to deploy by hand because of bug 777742.
Attachment #671487 - Flags: checked-in+
Attachment #671428 - Attachment is obsolete: false
Comment on attachment 666546 [details] [diff] [review]
buildbot-configs to enable multimaster releases

Had to back this out
Attachment #666546 - Flags: checked-in+ → checked-in-
Attachment #671428 - Flags: checked-in+ → checked-in-
Attachment #671428 - Flags: checked-in- → checked-in+
Comment on attachment 666546 [details] [diff] [review]
buildbot-configs to enable multimaster releases

Had to add this to passwords.py on the build scheduler to make it work:
secrets = {
    'nightly-signing': [
        ('fake', 'fake', 'fake'),
    ],
    'dep-signing': [
        ('fake', 'fake', 'fake'),
    ],
    'release-signing': [
        ('fake', 'fake', 'fake'),
    ],
}

Note that scheduler masters aren't managed by Puppet, so there's nothing to update there. It's a crappy hack; I filed bug 802153 to fix it.
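A minimal sketch of why the placeholder entries are needed: config loading can require every signing category to be present in passwords.py, even on a master that never signs anything. This check is hypothetical, not the actual buildbot code.

```python
# Hypothetical validation that forces 'fake' signing secrets onto the
# scheduler master: all categories must exist for the config to load.

REQUIRED_SIGNING_CATEGORIES = ('nightly-signing', 'dep-signing', 'release-signing')

def check_secrets(secrets):
    missing = [c for c in REQUIRED_SIGNING_CATEGORIES if c not in secrets]
    if missing:
        raise KeyError('passwords.py missing signing secrets: %s' % ', '.join(missing))
    return True

fake = ('fake', 'fake', 'fake')
secrets = {c: [fake] for c in REQUIRED_SIGNING_CATEGORIES}
print(check_secrets(secrets))
# → True
```

Under this reading, the fake tuples exist only to satisfy the presence check; nothing on the scheduler master ever uses their values.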
Attachment #666546 - Flags: checked-in- → checked-in+
Things are looking good with this landing. I inspected the schedulers through the manhole on the build scheduler master and found the following (excess removed for the sake of brevity):
>>> for s in sorted(master.scheduler_manager.namedServices.keys()):
...   if s.startswith('rel'):
...     print s
... 
release-comm-beta-almost-ready-for-release
...
release-comm-beta-thunderbird_reset_schedulers
...
release-comm-beta-win32_repack_complete
release-comm-esr10-almost-ready-for-release
...
release-comm-esr10-thunderbird_reset_schedulers
...
release-comm-esr10-win32_repack_complete
release-comm-release-almost-ready-for-release
...
release-comm-release-thunderbird_reset_schedulers
...
release-comm-release-win32_repack_complete
release-mozilla-beta-almost-ready-for-release
...
release-mozilla-beta-fennec_reset_schedulers
...
release-mozilla-beta-firefox_reset_schedulers
...
release-mozilla-beta-xulrunner_push_to_mirrors
release-mozilla-esr10-almost-ready-for-release
...
release-mozilla-esr10-firefox_reset_schedulers
...
release-mozilla-esr10-win32_repack_complete
release-mozilla-release-almost-ready-for-release
...
release-mozilla-release-fennec_reset_schedulers
...
release-mozilla-release-firefox_reset_schedulers
...
release-mozilla-release-xulrunner_push_to_mirrors


And I see all of the builders for all releases on bm12, 30, and 32, as expected. bm13, 25, 34, and 35 have no release builders, also as expected.

Calling this FIXED as it's now in production. The real test will be 17.0b2.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updating the summary to match what was actually done - we can file a new bug when we're ready to enable on all masters. It's trivial to do at this point.
Summary: do release jobs on all build masters instead of one → enable releases to be run on more than one master
Product: mozilla.org → Release Engineering