Bug 867583 (Closed) · Opened 11 years ago · Closed 11 years ago

Create enough new panda and tegra buildbot masters to replace kvm-based masters

Categories

(Release Engineering :: General, defect)

Hardware: x86_64
OS: Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rail, Unassigned)

References

Details

Attachments

(5 files)

+++ This bug was initially created as a clone of Bug #849002 +++

+++ This bug was initially created as a clone of Bug #847932 +++

We are killing off the buildbot masters running on KVM - see bug 846332 for details.

We need to stand up replacement panda and tegra buildbot masters in AWS, split evenly between us-east-1 and us-west-2.

It would be great if someone could grab this bug and verify that the documentation at https://wiki.mozilla.org/ReleaseEngineering/AWS_Master_Setup is accurate and clear.
Based on current slavealloc & production-masters.json, there are 5 panda and 4 tegra buildbot masters. Allocating:

buildbot-master90.srv.releng.use1.mozilla.com    bm90-tests1-panda
buildbot-master91.srv.releng.usw2.mozilla.com    bm91-tests1-panda
buildbot-master92.srv.releng.use1.mozilla.com    bm92-tests1-panda
buildbot-master93.srv.releng.usw2.mozilla.com    bm93-tests1-panda
buildbot-master94.srv.releng.use1.mozilla.com    bm94-tests1-panda
buildbot-master95.srv.releng.usw2.mozilla.com    bm95-tests1-tegra
buildbot-master96.srv.releng.use1.mozilla.com    bm96-tests1-tegra
buildbot-master97.srv.releng.usw2.mozilla.com    bm97-tests1-tegra
buildbot-master98.srv.releng.use1.mozilla.com    bm98-tests1-tegra
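
(For reference, a count like the one above can be pulled straight out of production-masters.json; the sketch below assumes each entry has a "name" field containing the pool name, which is a guess at the file's schema, not something confirmed in this bug.)

# Sketch: count panda and tegra masters in production-masters.json.
# Assumes each entry's "name" contains the pool name (schema is a guess).
for pool in panda tegra; do
  printf '%s: ' "$pool"
  jq --arg p "$pool" '[.[] | select(.name | contains($p))] | length' production-masters.json
done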
Attachment #748085 - Flags: review?(rail)
Attachment #748085 - Flags: review?(rail) → review+
snippets generated by your script.

Question: do I land the puppet changes before or after I create the AWS instances?
Attachment #748099 - Flags: review?(rail)
Comment on attachment 748099 [details] [diff] [review]
puppet additions for AWS masters

(In reply to Hal Wine [:hwine] from comment #2)
> Created attachment 748099 [details] [diff] [review]
> puppet additions for AWS masters
> 
> snippets generated by your script.
> 
> Question: do I land the puppet changes before or after I create the AWS
> instances?

You need to land this in advance and wait until it's pulled by the puppet masters (every 30 mins IIRC).
Attachment #748099 - Flags: review?(rail) → review+
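
(One way to confirm the pull has happened before creating the instances; this is a sketch only, and the puppet master hostname and repo path are placeholders, not real releng values.)

# Poll a puppet master until its checked-out revision matches the one we landed.
# Hostname and repo path below are placeholders.
WANT=$(hg -R ~/puppet identify -i)
until ssh puppetmaster.example.releng.mozilla.com \
    "hg -R /etc/puppet/production identify -i" | grep -q "$WANT"; do
  echo "change not pulled yet, sleeping 5 minutes"
  sleep 300
done
echo "puppet masters have the change"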
For posterity: the CSV used to add systems to inventory. Note: allocation can't be set at this time (bug 871554); allocation was added manually.
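
(The attachment itself isn't inlined in this bug; illustratively, it would have looked something like the snippet below. The column headers are guesses at what the inventory importer expects; the host data matches comment 10 below.)

# Illustrative reconstruction only; the header names are guesses.
cat > inventory.csv <<'EOF'
hostname,ip_address,mac_address
buildbot-master90.srv.releng.use1.mozilla.com,10.134.49.133,0e:44:11:aa:0e:c0
buildbot-master91.srv.releng.usw2.mozilla.com,10.132.48.136,02:83:3f:25:35:bc
EOF
# ...remaining masters (bm92-bm98) follow the same pattern.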
I wasn't sure from the docs whether I was changing the correct fields in the sample. (These snippets were generated by a modified version of your script.)

Are these correct?
Attachment #748831 - Flags: feedback?(rail)
Comment on attachment 748831 [details]
bash to create new AWS masters

lgtm
Attachment #748831 - Flags: feedback?(rail) → feedback+
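
(This attachment isn't inlined either; below is a minimal sketch of the same idea. The region is derived from the hostname, while the AMI, instance type, and key pair are placeholders rather than real releng configuration.)

# Sketch: create one EC2 instance per master, region chosen from the hostname.
# AMI, instance type, and key name are placeholders.
set -e
for host in \
    buildbot-master90.srv.releng.use1.mozilla.com \
    buildbot-master91.srv.releng.usw2.mozilla.com; do
  case "$host" in
    *.use1.*) region=us-east-1 ;;
    *.usw2.*) region=us-west-2 ;;
  esac
  aws ec2 run-instances --region "$region" \
    --image-id ami-00000000 --instance-type m1.large \
    --key-name placeholder-key \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=${host%%.*}}]"
done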
Assignee: nobody → hwine
Status: NEW → ASSIGNED
Attachment #762985 - Flags: review?(rail)
Attachment #762985 - Flags: review?(rail) → review+
All new masters are in inventory; bm90 created and adapter updated. Docs updated for that.

Next: create the rest of the masters, then file the DNS/ACL/Nagios bugs.
All hosts with MAC & IP:
10.134.49.133 0e:44:11:aa:0e:c0 buildbot-master90.srv.releng.use1.mozilla.com
10.132.48.136 02:83:3f:25:35:bc buildbot-master91.srv.releng.usw2.mozilla.com
10.134.49.214 0e:44:11:b1:77:ef buildbot-master92.srv.releng.use1.mozilla.com
10.132.50.156 02:83:3f:1a:6e:92 buildbot-master93.srv.releng.usw2.mozilla.com
10.134.48.86  0e:44:11:b0:50:b3 buildbot-master94.srv.releng.use1.mozilla.com
10.132.49.197 02:83:3f:04:e8:57 buildbot-master95.srv.releng.usw2.mozilla.com
10.134.48.53  0e:44:11:8b:42:1c buildbot-master96.srv.releng.use1.mozilla.com
10.132.50.51  02:83:3f:0c:42:fd buildbot-master97.srv.releng.usw2.mozilla.com
10.134.48.51  0e:44:11:93:ca:56 buildbot-master98.srv.releng.use1.mozilla.com
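
(That list is most of what the follow-up DNS bug needs. Assuming it's saved as hosts.txt in the ip/mac/fqdn layout above, a one-liner like this emits the forward A records:)

# Turn the ip/mac/fqdn list into BIND A records.
awk '{ printf "%-50s IN A  %s\n", $3".", $1 }' hosts.txt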
Depends on: 884129
Depends on: 871537
Depends on: 888539
Product: mozilla.org → Release Engineering
Releasing while on PTO.
Assignee: hwine → nobody
Status: ASSIGNED → NEW
My understanding from bug 884472 is that these masters are designed to replace other, unmanaged masters. If that's the case, then this is doubly urgent: we need both to get off KVM and to bring all hosts under PuppetAgain management.
Due to the current problems with cross-colo connections and timeouts, we're going to avoid making the problem worse by setting up yet more masters in EC2 (for now).

We still need to get off those KVM masters. Callek will work with relops to determine how much VMware VM capacity we have left once the current batch of in-house masters (bug 927129) is running.
Summary: Create enough panda and tegra buildbot masters in ec2 to replace kvm-based masters → Create enough new panda and tegra buildbot masters to replace kvm-based masters
Blocks: 864364
Was the answer that came out of yesterday's meeting that we need 4 more buildbot masters total, and then we will be able to decomm the 5 in scl1 and 4 in mtv1? I'd like to get those vms created so we can hand them off to you.
(In reply to Amy Rich [:arich] [:arr] from comment #16)
> Was the answer that came out of yesterday's meeting that we need 4 more
> buildbot masters total, and then we will be able to decomm the 5 in scl1 and
> 4 in mtv1? I'd like to get those vms created so we can hand them off to you.

Yes, right now we need 4 new, replacement, mobile test masters.

Longer-term we may need more in-house, non-mobile test masters.
Depends on: 940659
In-house masters are up.  Once those are built, is this good to close?
We're tracking setup work in bug 942206.
Blocks: 942206
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: General Automation → General