Closed Bug 927129 Opened 7 years ago Closed 6 years ago

Setup in-house buildbot masters for builders

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
Linux
task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: coop, Assigned: Callek)

(Whiteboard: [capacity][buildduty])

Attachments

(4 files)

To work around the network disconnects we are seeing (e.g., bug 710942) when connecting our build slaves to masters in AWS US East, we should stand up the build masters created in bug 867593 and get them into production.

I'm hoping that 2 build masters will be enough for the Mac, Windows, and very small subset of Linux builders we have.

If we need to choose which build slaves to attach to these masters for capacity reasons, we should start with Windows build slaves, then Mac, then Linux.
Added the first of 2 masters to slavealloc, with r=jhopkins over IRC.

mysql> describe masters;
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| masterid  | int(11)      | NO   | PRI | NULL    | auto_increment |
| nickname  | varchar(128) | NO   | UNI | NULL    |                |
| fqdn      | text         | NO   |     | NULL    |                |
| http_port | int(11)      | NO   |     | NULL    |                |
| pb_port   | int(11)      | NO   |     | NULL    |                |
| dcid      | int(11)      | NO   | MUL | NULL    |                |
| poolid    | int(11)      | NO   | MUL | NULL    |                |
| enabled   | tinyint(1)   | NO   |     | NULL    |                |
| notes     | text         | YES  |     | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+
9 rows in set (0.30 sec)

mysql> select * from masters WHERE nickname="bm57-build1" OR nickname="bm60-try1" OR nickname="bm30-build1" OR nickname="bm31-try1";
+----------+-------------+-----------------------------------------------+-----------+---------+------+--------+---------+------------+
| masterid | nickname    | fqdn                                          | http_port | pb_port | dcid | poolid | enabled | notes      |
+----------+-------------+-----------------------------------------------+-----------+---------+------+--------+---------+------------+
|      110 | bm30-build1 | buildbot-master30.srv.releng.scl3.mozilla.com |      8001 |    9001 |   10 |     24 |       0 | bug 864364 |
|      111 | bm31-try1   | buildbot-master31.srv.releng.scl3.mozilla.com |      8101 |    9101 |   10 |     25 |       0 | bug 864364 |
|      181 | bm57-build1 | buildbot-master57.srv.releng.use1.mozilla.com |      8001 |    9001 |   21 |     37 |       1 |            |
|      187 | bm60-try1   | buildbot-master60.srv.releng.usw2.mozilla.com |      8101 |    9101 |   21 |     43 |       1 | NULL       |
+----------+-------------+-----------------------------------------------+-----------+---------+------+--------+---------+------------+
4 rows in set (0.04 sec)
mysql> INSERT INTO masters VALUES (NULL, "bm82-build1", "buildbot-master82.srv.releng.scl3.mozilla.com", 8001, 9001, 10, 24, 0, NULL),(NULL, "bm83-try1", "buildbot-master83.srv.releng.scl3.mozilla.com", 8101, 9101, 10, 25, 0, NULL);

Query OK, 2 rows affected (0.17 sec)
Records: 2  Duplicates: 0  Warnings: 0
Attached patch: add bm82 and 83
There is no particular rhyme or reason as to which master is which type, but this matches the SQL I ran for slavealloc.
Attachment #817997 - Flags: review?(coop)
Comment on attachment 817997
add bm82 and 83

lgtm
Attachment #817997 - Flags: review?(coop) → review+
Depends on: 927566
Landed puppet change to allow these masters to sign: https://hg.mozilla.org/build/puppet/rev/3ba0cbeea092
OK, enabled the masters in slavealloc, along with a tools checkin: http://hg.mozilla.org/build/tools/rev/9f24221111b1

Then did SQL:

mysql> select name, dcid, trustid FROM slaves WHERE poolid=37 AND name LIKE "w64%";
+-----------------+------+---------+
| name            | dcid | trustid |
+-----------------+------+---------+
| w64-ix-slave06  |    7 |       5 |
| w64-ix-slave09  |    7 |       5 |
| w64-ix-slave10  |    7 |       5 |
| w64-ix-slave101 |    7 |       5 |
| w64-ix-slave103 |    7 |       5 |
| w64-ix-slave105 |    7 |       5 |
| w64-ix-slave107 |    7 |       5 |
| w64-ix-slave109 |    7 |       5 |
| w64-ix-slave111 |    7 |       5 |
| w64-ix-slave113 |    7 |       5 |
| w64-ix-slave115 |    7 |       5 |
| w64-ix-slave117 |    7 |       5 |
| w64-ix-slave119 |    7 |       5 |
| w64-ix-slave121 |    7 |       5 |
| w64-ix-slave123 |    7 |       5 |
| w64-ix-slave125 |    7 |       5 |
| w64-ix-slave127 |    7 |       5 |
| w64-ix-slave129 |    7 |       5 |
| w64-ix-slave13  |    7 |       5 |
| w64-ix-slave131 |    7 |       5 |
| w64-ix-slave133 |    7 |       5 |
| w64-ix-slave135 |    7 |       5 |
| w64-ix-slave137 |    7 |       5 |
| w64-ix-slave139 |    7 |       5 |
| w64-ix-slave141 |    7 |       5 |
| w64-ix-slave143 |    7 |       5 |
| w64-ix-slave145 |    7 |       5 |
| w64-ix-slave147 |    7 |       5 |
| w64-ix-slave149 |    7 |       5 |
| w64-ix-slave15  |    7 |       5 |
| w64-ix-slave151 |    7 |       5 |
| w64-ix-slave153 |    7 |       5 |
| w64-ix-slave155 |    7 |       5 |
| w64-ix-slave157 |    7 |       5 |
| w64-ix-slave17  |    7 |       5 |
| w64-ix-slave18  |    7 |       5 |
| w64-ix-slave20  |    7 |       5 |
| w64-ix-slave23  |    7 |       5 |
| w64-ix-slave42  |    7 |       5 |
| w64-ix-slave76  |    7 |       5 |
| w64-ix-slave78  |    7 |       5 |
| w64-ix-slave81  |    7 |       5 |
| w64-ix-slave83  |    7 |       5 |
| w64-ix-slave87  |    7 |       5 |
| w64-ix-slave89  |    7 |       5 |
| w64-ix-slave91  |    7 |       5 |
| w64-ix-slave93  |    7 |       5 |
| w64-ix-slave95  |    7 |       5 |
| w64-ix-slave97  |    7 |       5 |
| w64-ix-slave99  |    7 |       5 |
+-----------------+------+---------+
50 rows in set (0.00 sec)

mysql> UPDATE slaves SET poolid=24 WHERE poolid=37 AND name LIKE "w64%";
Query OK, 50 rows affected (0.05 sec)
Rows matched: 50  Changed: 50  Warnings: 0

mysql> select name, dcid, trustid FROM slaves WHERE poolid=43 AND name LIKE "w64%";
+----------------+------+---------+
| name           | dcid | trustid |
+----------------+------+---------+
| w64-ix-slave40 |    7 |       4 |
| w64-ix-slave41 |    7 |       4 |
| w64-ix-slave44 |    7 |       4 |
| w64-ix-slave45 |    7 |       4 |
| w64-ix-slave46 |    7 |       4 |
| w64-ix-slave47 |    7 |       4 |
| w64-ix-slave48 |    7 |       4 |
| w64-ix-slave49 |    7 |       4 |
| w64-ix-slave50 |    7 |       4 |
| w64-ix-slave51 |    7 |       4 |
| w64-ix-slave52 |    7 |       4 |
| w64-ix-slave53 |    7 |       4 |
| w64-ix-slave54 |    7 |       4 |
| w64-ix-slave55 |    7 |       4 |
| w64-ix-slave56 |    7 |       4 |
| w64-ix-slave57 |    7 |       4 |
| w64-ix-slave58 |    7 |       4 |
| w64-ix-slave59 |    7 |       4 |
| w64-ix-slave60 |    7 |       4 |
| w64-ix-slave61 |    7 |       4 |
| w64-ix-slave62 |    7 |       4 |
| w64-ix-slave63 |    7 |       4 |
| w64-ix-slave64 |    7 |       4 |
| w64-ix-slave65 |    7 |       4 |
| w64-ix-slave66 |    7 |       4 |
| w64-ix-slave67 |    7 |       4 |
| w64-ix-slave68 |    7 |       4 |
| w64-ix-slave69 |    7 |       4 |
| w64-ix-slave70 |    7 |       4 |
| w64-ix-slave71 |    7 |       4 |
| w64-ix-slave72 |    7 |       4 |
| w64-ix-slave73 |    7 |       4 |
| w64-ix-slave74 |    7 |       4 |
+----------------+------+---------+
33 rows in set (0.00 sec)

mysql> UPDATE slaves SET poolid=25 WHERE poolid=43 AND name LIKE "w64%";
Query OK, 33 rows affected (0.01 sec)
Rows matched: 33  Changed: 33  Warnings: 0


(((The above was meant to be commented yesterday)))

(((The following was done just now)))

Adding the mac builders attached to us-east now...

mysql> select name, dcid, trustid FROM slaves WHERE poolid=37 AND name LIKE "bld-lion%";
+-----------------+------+---------+
| name            | dcid | trustid |
+-----------------+------+---------+
| bld-lion-r5-001 |   10 |       5 |
| bld-lion-r5-002 |   10 |       5 |
| bld-lion-r5-003 |   10 |       5 |
| bld-lion-r5-004 |   10 |       5 |
| bld-lion-r5-005 |   10 |       5 |
| bld-lion-r5-006 |   10 |       5 |
| bld-lion-r5-007 |   10 |       5 |
| bld-lion-r5-008 |   10 |       5 |
| bld-lion-r5-009 |   10 |       5 |
| bld-lion-r5-010 |   10 |       5 |
| bld-lion-r5-011 |   10 |       5 |
| bld-lion-r5-012 |   10 |       5 |
| bld-lion-r5-013 |   10 |       5 |
| bld-lion-r5-014 |   10 |       5 |
| bld-lion-r5-015 |   10 |       5 |
| bld-lion-r5-041 |   10 |       5 |
| bld-lion-r5-042 |   10 |       5 |
| bld-lion-r5-044 |   10 |       5 |
| bld-lion-r5-045 |   10 |       5 |
| bld-lion-r5-046 |   10 |       5 |
| bld-lion-r5-047 |   10 |       5 |
| bld-lion-r5-048 |   10 |       5 |
| bld-lion-r5-049 |   10 |       5 |
| bld-lion-r5-050 |   10 |       5 |
| bld-lion-r5-051 |   10 |       5 |
| bld-lion-r5-052 |   10 |      15 |
| bld-lion-r5-053 |   10 |       5 |
| bld-lion-r5-054 |   10 |       5 |
| bld-lion-r5-055 |   10 |       5 |
| bld-lion-r5-056 |   10 |       5 |
| bld-lion-r5-057 |   10 |       5 |
| bld-lion-r5-058 |   10 |       5 |
| bld-lion-r5-059 |   10 |       5 |
+-----------------+------+---------+
33 rows in set (0.00 sec)

mysql> UPDATE slaves SET poolid=24 WHERE poolid=37 AND name LIKE "bld-lion%";
Query OK, 33 rows affected (0.01 sec)
Rows matched: 33  Changed: 33  Warnings: 0
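
For anyone repeating these pool moves, here is a minimal sketch of the same SELECT-then-UPDATE pattern with a guard on the row count. It is only an illustration: it uses an in-memory SQLite database standing in for the slavealloc MySQL database, and `demo_db` and `move_slaves` are hypothetical helper names, not part of slavealloc. Only the `slaves` columns used above (`name`, `poolid`) are modeled.

```python
import sqlite3


def demo_db():
    """Build a tiny in-memory stand-in for the slavealloc `slaves` table."""
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE slaves (name TEXT PRIMARY KEY, poolid INTEGER)")
        conn.executemany(
            "INSERT INTO slaves (name, poolid) VALUES (?, ?)",
            [
                ("w64-ix-slave06", 37),
                ("w64-ix-slave09", 37),
                ("bld-lion-r5-001", 37),
                ("w64-ix-slave40", 43),
            ],
        )
    return conn


def move_slaves(conn, from_pool, to_pool, name_prefix, expected):
    """Move slaves whose name starts with name_prefix between pools.

    `expected` is the row count seen in the preceding SELECT. If the
    UPDATE touches a different number of rows, the change is rolled back.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE slaves SET poolid = ? WHERE poolid = ? AND name LIKE ?",
            (to_pool, from_pool, name_prefix + "%"),
        )
        if cur.rowcount != expected:
            raise RuntimeError(
                f"expected {expected} rows, changed {cur.rowcount}; rolled back"
            )
    return cur.rowcount


def pool_of(conn, name):
    """Return the poolid currently assigned to one slave."""
    return conn.execute(
        "SELECT poolid FROM slaves WHERE name = ?", (name,)
    ).fetchone()[0]
```

Calling `move_slaves(conn, 37, 24, "w64", 2)` on the demo data moves the two w64 build slaves and leaves the lion slave in pool 37; passing a wrong `expected` value leaves the table unchanged.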
Depends on: 933108
I've downtimed the nagios alerts for bm84 through to bm88 until 11-04-2013 10:20 PST. Please extend that, or add them to puppet as appropriate.
Attached patch: masters 82-87
So, for numbers on why this many of each master:

mysql> select name,dcid,poolid from slaves where poolid in (39,26,43) and dcid in (5,7,10,13);
...
71 rows in set (0.02 sec)

mysql> select name,dcid,poolid from slaves where poolid in (37,27,41,59,12) and dcid in (5,7,10,13);
...
217 rows in set (0.02 sec)

From prior chats, the target is ~75 slaves per build master, which means 3 non-try masters and 1 more try master.
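
The arithmetic behind those numbers, as a quick sketch. The 75-per-master figure and the two slave counts come from the queries above; the function name `masters_needed` is just for illustration.

```python
import math

SLAVES_PER_MASTER = 75  # rough target from prior chats, not a hard limit


def masters_needed(slave_count, per_master=SLAVES_PER_MASTER):
    """Round up: a partially filled master still has to exist."""
    return math.ceil(slave_count / per_master)


build_masters = masters_needed(217)  # non-try pools -> 3
try_masters = masters_needed(71)     # try pools -> 1
```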

for added info:

mysql> select * from pools where poolid in (37,27,41,59,12,39,26,43);
+--------+--------------------------+
| poolid | name                     |
+--------+--------------------------+
|     12 | build-scl1               |
|     26 | try-aws-us-west-1        |
|     27 | build-aws-us-west-1      |
|     37 | build-aws-us-east-1      |
|     39 | try-aws-us-east-1        |
|     41 | build-aws-us-west-2      |
|     43 | try-aws-us-west-2        |
|     59 | build-aws-us-west-2-rev2 |
+--------+--------------------------+
8 rows in set (0.01 sec)

mysql> select * from datacenters where dcid in (5,7,10,13);
+------+------+
| dcid | name |
+------+------+
|    5 | mtv1 |
|    7 | scl1 |
|   10 | scl3 |
|   13 | sjc1 |
+------+------+
4 rows in set (0.00 sec)
Attachment #826770 - Flags: review?(jhopkins)
Attachment #826775 - Flags: review?(jhopkins)
Attachment #826775 - Flags: review?(jhopkins) → review+
Attachment #826770 - Flags: review?(jhopkins) → review+
Puppetized:
 http://hg.mozilla.org/build/puppet/rev/13f251e661d6
 http://hg.mozilla.org/build/puppet/rev/fc9fcc37d7d6

Enabled the masters in production-masters:
 https://hg.mozilla.org/build/tools/rev/535d6fd19a1f

Updated SQL:

mysql> insert into masters (nickname,fqdn,http_port,pb_port,dcid,poolid,enabled) VALUES ("bm84-build1","buildbot-master84.srv.releng.scl3.mozilla.com",8001,9001,10,24,0),("bm85-build1","buildbot-master85.srv.releng.scl3.mozilla.com",8001,9001,10,24,0),("bm86-build1","buildbot-master86.srv.releng.scl3.mozilla.com",8001,9001,10,24,0),("bm87-try1","buildbot-master87.srv.releng.scl3.mozilla.com",8101,9101,10,25,0);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from masters where nickname like "bm8%";
+----------+--------------------+-----------------------------------------------+-----------+---------+------+--------+---------+------------+
| masterid | nickname           | fqdn                                          | http_port | pb_port | dcid | poolid | enabled | notes      |
+----------+--------------------+-----------------------------------------------+-----------+---------+------+--------+---------+------------+
|      227 | bm80-tests1-macosx | buildbot-master80.srv.releng.usw2.mozilla.com |      8201 |    9201 |   21 |     28 |       1 |            |
|      259 | bm82-build1        | buildbot-master82.srv.releng.scl3.mozilla.com |      8001 |    9001 |   10 |     24 |       1 |            |
|      261 | bm83-try1          | buildbot-master83.srv.releng.scl3.mozilla.com |      8101 |    9101 |   10 |     25 |       1 |            |
|      265 | bm84-build1        | buildbot-master84.srv.releng.scl3.mozilla.com |      8001 |    9001 |   10 |     24 |       0 | NULL       |
|      267 | bm85-build1        | buildbot-master85.srv.releng.scl3.mozilla.com |      8001 |    9001 |   10 |     24 |       0 | NULL       |
|      269 | bm86-build1        | buildbot-master86.srv.releng.scl3.mozilla.com |      8001 |    9001 |   10 |     24 |       0 | NULL       |
|      271 | bm87-try1          | buildbot-master87.srv.releng.scl3.mozilla.com |      8101 |    9101 |   10 |     25 |       0 | NULL       |
|      263 | bm89-tests1-panda  | buildbot-master89.srv.releng.scl3.mozilla.com |      8201 |    9201 |   10 |     55 |       0 | bug 892691 |
+----------+--------------------+-----------------------------------------------+-----------+---------+------+--------+---------+------------+
8 rows in set (0.01 sec)
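
The four rows in the INSERT above follow a visible pattern (ports and poolid depend on whether the master is build or try), so they could be generated rather than typed by hand. A minimal sketch, with the caveat that the `KINDS` mapping and `master_row` are hypothetical names, and the values are simply read off the masters table in this bug rather than taken from any slavealloc config:

```python
# Conventions as they appear in the masters table in this bug.
KINDS = {
    "build": {"http_port": 8001, "pb_port": 9001, "poolid": 24},
    "try": {"http_port": 8101, "pb_port": 9101, "poolid": 25},
}
SCL3_DCID = 10  # dcid for scl3 per the datacenters table


def master_row(number, kind):
    """Return (nickname, fqdn, http_port, pb_port, dcid, poolid, enabled)."""
    cfg = KINDS[kind]
    return (
        f"bm{number}-{kind}1",
        f"buildbot-master{number}.srv.releng.scl3.mozilla.com",
        cfg["http_port"],
        cfg["pb_port"],
        SCL3_DCID,
        cfg["poolid"],
        0,  # inserted disabled; enabled separately once the master is ready
    )


rows = [master_row(n, "build") for n in (84, 85, 86)] + [master_row(87, "try")]
```

Each tuple lines up with the explicit column list used in the INSERT, so it could be passed to a parameterized `executemany` instead of being pasted into the mysql prompt.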

Rebooted the masters (to let buildbot and such start at machine startup). Of course, in this process I *accidentally* rebooted bm82 and 83, which will certainly disco-burn a good chunk of Windows builds, try and non-try. Informed sheriffs of that.
OK, I made another snafu with this as well...

I accidentally put (very, very briefly) all try builders in this list into the production build pool for buildbot. I pulled them back out very quickly though, as this .txt log of my SQL history will show.
ToDo: Verify this bug is indeed done.
Flags: needinfo?(bugspam.Callek)
All the rev1 and rev2 w64-ix-slave machines are pointed at in-house masters now.
buildbot-master88 was created at the same time as 81 thru 89 (bug 867593 comment #13) but isn't set up yet.
All in-house build machines are pointed at in-house masters, except Windows 110 and 03, which are locked to an AWS host. I poked John Hopkins about those and am closing this out.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(bugspam.Callek)
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard