Closed
Bug 1455741
Opened 6 years ago
Closed 6 years ago
Decommission buildbot machines buildbot-master74, master76, master94
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: apop, Assigned: zfay)
Details
The following buildbot master machines need to be decommissioned:
buildbot-master76.bb.releng.use1.mozilla.com
buildbot-master94.bb.releng.use1.mozilla.com
buildbot-master74.bb.releng.usw2.mozilla.com
Updated•6 years ago
Assignee: nobody → zfay
Comment 1•6 years ago
sanity check slavealloc entry is removed
sanity check buildbot db entry is removed
sanity check nagios monitoring is removed
anything from: https://wiki.mozilla.org/ReleaseEngineering/How_To/Decommission_buildbot_masters
terminate aws instance
Comment 2•6 years ago
We've verified every step up to the deletion of the master from the DB. We connected to relengwebadm.private.scl3 and then used this command:

mysql -u buildslaves -p -h devtools-rw-vip.db.scl3.mozilla.com -D buildslaves

When we ran:

DELETE from masters WHERE masterid = 215;

it returned the following error:

ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`buildslaves`.`slaves`, CONSTRAINT `slaves_ibfk_10` FOREIGN KEY (`current_masterid`) REFERENCES `masters` (`masterid`))

Is there anything we are missing? From our initial debugging, we believe that the masters we tried to delete are still linked to slaves through current_masterid, which causes the failure. After debugging, we ran the following:

mysql> select * from slaves where current_masterid = '215';

which returned:

+---------+---------------------+----------+--------+---------+-----------+------+---------+-------+--------+---------------+-----------------+---------+------------------+-------+--------------+
| slaveid | name                | distroid | bitsid | speedid | purposeid | dcid | trustid | envid | poolid | basedir       | locked_masterid | enabled | current_masterid | notes | custom_tplid |
+---------+---------------------+----------+--------+---------+-----------+------+---------+-------+--------+---------------+-----------------+---------+------------------+-------+--------------+
|    2591 | bld-linux64-ec2-301 |       15 |      2 |       9 |         5 |   21 |       5 |     2 |     41 | /builds/slave |            NULL |       1 |              215 |       |         NULL |
|    2593 | bld-linux64-ec2-302 |       15 |      2 |       9 |         5 |   21 |       5 |     2 |     41 | /builds/slave |            NULL |       1 |              215 |       |         NULL |
|    2633 | bld-linux64-ec2-312 |       15 |      2 |       9 |         5 |   21 |       5 |     2 |     41 | /builds/slave |            NULL |       1 |              215 | NULL  |         NULL |
|    2643 | bld-linux64-ec2-317 |       15 |      2 |       9 |         5 |   21 |       5 |     2 |     41 | /builds/slave |            NULL |       1 |              215 | NULL  |         NULL |
|    2645 | bld-linux64-ec2-318 |       15 |      2 |       9 |         5 |   21 |       5 |     2 |     41 | /builds/slave |            NULL |       1 |              215 | NULL  |         NULL |
+---------+---------------------+----------+--------+---------+-----------+------+---------+-------+--------+---------------+-----------------+---------+------------------+-------+--------------+
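The error in this comment can be reproduced in miniature. The following is a minimal sketch, not the real buildslaves schema: it uses Python's built-in sqlite3 as a stand-in for MySQL, keeps only the columns involved in the constraint, and the `nickname` column and its value are hypothetical. It shows that the DELETE is refused while any slave row still points at the master, and that the same SELECT used above lists the blocking rows.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the masters/slaves tables (simplified)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE masters (masterid INTEGER PRIMARY KEY, nickname TEXT)"
    )
    conn.execute(
        "CREATE TABLE slaves ("
        " slaveid INTEGER PRIMARY KEY,"
        " name TEXT,"
        " current_masterid INTEGER REFERENCES masters(masterid))"
    )
    # 'old-master' is a made-up label; the ids and names come from the comment.
    conn.execute("INSERT INTO masters VALUES (215, 'old-master')")
    conn.executemany(
        "INSERT INTO slaves VALUES (?, ?, ?)",
        [(2591, "bld-linux64-ec2-301", 215), (2593, "bld-linux64-ec2-302", 215)],
    )
    conn.commit()
    return conn


def blocking_slaves(conn, masterid):
    """Return the names of slaves whose current_masterid still references the master."""
    rows = conn.execute(
        "SELECT name FROM slaves WHERE current_masterid = ? ORDER BY slaveid",
        (masterid,),
    )
    return [name for (name,) in rows]


def try_delete_master(conn, masterid):
    """Attempt the DELETE; return False if the foreign key constraint blocks it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM masters WHERE masterid = ?", (masterid,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling `try_delete_master(build_demo_db(), 215)` returns False until the rows reported by `blocking_slaves` have been pointed at another master.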
Flags: needinfo?(jlund)
Comment 3•6 years ago
As per :fubar's and :catlee's suggestions, we have reached the following decision:
1) We are moving the existing slaves that are attached to one of the masters to be decommissioned to new ones.
2) Move the slaves to a new master that is in the same Region and Pool as the masters listed above.
3) Move 2 slaves per old master to the new master, give them some time, and see how they act. If nothing bad happens, finish the transfer Monday.

@fubar / @catlee : Do you guys have any concerns with our suggestions?

So we came up with the following "battle plan":

OLD Master: buildbot-master94.bb.releng.use1.mozilla.com
NEW Master: buildbot-master77.bb.releng.use1.mozilla.com
For the following slaves:
bld-linux64-ec2-301
bld-linux64-ec2-302
bld-linux64-ec2-312
bld-linux64-ec2-317
bld-linux64-ec2-318

OLD Master: buildbot-master74.bb.releng.usw2.mozilla.com
NEW Master: buildbot-master73.bb.releng.usw2.mozilla.com
For the following slaves:
bld-linux64-ec2-301
bld-linux64-ec2-302
bld-linux64-ec2-312
bld-linux64-ec2-317
bld-linux64-ec2-318

OLD Master: buildbot-master76.bb.releng.use1.mozilla.com
NEW Master: buildbot-master75.bb.releng.use1.mozilla.com
For the following slaves:
try-linux64-spot-004
try-linux64-spot-005
try-linux64-spot-007
try-linux64-spot-010
b-2008-ec2-0004
y-2008-spot-010
y-2008-spot-011
y-2008-spot-012
y-2008-spot-013
y-2008-spot-014
y-2008-spot-015
y-2008-spot-016
y-2008-spot-017
y-2008-spot-018
y-2008-spot-019
y-2008-spot-022
y-2008-spot-028
y-2008-spot-029
y-2008-spot-030
y-2008-spot-033
y-2008-spot-035
y-2008-spot-037
y-2008-spot-040
y-2008-spot-041
y-2008-spot-043
y-2008-spot-046
y-2008-spot-047
y-2008-spot-048
y-2008-spot-050
y-2008-spot-051
y-2008-spot-052
y-2008-spot-053
y-2008-spot-055
y-2008-spot-061
y-2008-spot-062
y-2008-spot-065
y-2008-spot-067
y-2008-spot-072
y-2008-spot-077
y-2008-ec2-0003
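Step 3 of the plan (move 2 slaves per old master first, then the rest) can be sketched as a small helper. This is only an illustration: the `PLAN` structure and the function names are hypothetical, the master names are shortened, and the long slave list for buildbot-master76 is cut to its first four entries.

```python
# Hypothetical structure: old master -> (new master, slaves to move).
# Names come from the plan above; the master76 list is shortened here.
PLAN = {
    "buildbot-master94": (
        "buildbot-master77",
        [
            "bld-linux64-ec2-301",
            "bld-linux64-ec2-302",
            "bld-linux64-ec2-312",
            "bld-linux64-ec2-317",
            "bld-linux64-ec2-318",
        ],
    ),
    "buildbot-master76": (
        "buildbot-master75",
        [
            "try-linux64-spot-004",
            "try-linux64-spot-005",
            "try-linux64-spot-007",
            "try-linux64-spot-010",
        ],
    ),
}


def split_canary(slaves, canary_size=2):
    """Split a slave list into a small first batch and the remainder."""
    return list(slaves[:canary_size]), list(slaves[canary_size:])


def canary_plan(plan, canary_size=2):
    """For each old master, decide which slaves move first and which wait."""
    staged = {}
    for old_master, (new_master, slaves) in plan.items():
        canary, rest = split_canary(slaves, canary_size)
        staged[old_master] = {
            "new_master": new_master,
            "canary": canary,
            "rest": rest,
        }
    return staged
```

The `canary` batch is moved and watched first; the `rest` batch only follows once those slaves are seen taking jobs on the new master.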
Flags: needinfo?(klibby)
Flags: needinfo?(jlund)
Flags: needinfo?(catlee)
Comment 4•6 years ago
(In reply to Danut Labici [:dlabici] from comment #3)
> As per :fubar's and :catlee's suggestions, we have reached the following
> decision:
> 1) We are moving the existing slaves, that are attached to one of the
> masters to be decommissioned to new ones.
> 2) Move the slaves to a new master that is in the same Region and Pool as
> the masters listed above.
> 3) Move 2 slaves per old master to new master and give them some time and
> see how they act. If nothing bad happens, finish the transfer Monday.
>
> @fubar / @catlee : Do you guys have any concerns with our suggestions?
>
> So we came up with the following "battle plan"
>
> OLD Master: buildbot-master94.bb.releng.use1.mozilla.com
> NEW Master: buildbot-master77.bb.releng.use1.mozilla.com
> For the following slaves:
> bld-linux64-ec2-301
> bld-linux64-ec2-302
> bld-linux64-ec2-312
> bld-linux64-ec2-317
> bld-linux64-ec2-318

+1

> OLD Master: buildbot-master74.bb.releng.usw2.mozilla.com
> NEW Master: buildbot-master73.bb.releng.usw2.mozilla.com
> For the following slaves:
> bld-linux64-ec2-301
> bld-linux64-ec2-302
> bld-linux64-ec2-312
> bld-linux64-ec2-317
> bld-linux64-ec2-318

+1

> OLD Master: buildbot-master76.bb.releng.use1.mozilla.com
> NEW Master: buildbot-master75.bb.releng.use1.mozilla.com
> For the following slaves:
> try-linux64-spot-004
> try-linux64-spot-005
> try-linux64-spot-007
> try-linux64-spot-010

+1

> b-2008-ec2-0004

this should be moved to buildbot-master77

> y-2008-spot-010
> y-2008-spot-011
> y-2008-spot-012
> y-2008-spot-013
> y-2008-spot-014
> y-2008-spot-015
> y-2008-spot-016
> y-2008-spot-017
> y-2008-spot-018
> y-2008-spot-019
> y-2008-spot-022
> y-2008-spot-028
> y-2008-spot-029
> y-2008-spot-030
> y-2008-spot-033
> y-2008-spot-035
> y-2008-spot-037
> y-2008-spot-040
> y-2008-spot-041
> y-2008-spot-043
> y-2008-spot-046
> y-2008-spot-047
> y-2008-spot-048
> y-2008-spot-050
> y-2008-spot-051
> y-2008-spot-052
> y-2008-spot-053
> y-2008-spot-055
> y-2008-spot-061
> y-2008-spot-062
> y-2008-spot-065
> y-2008-spot-067
> y-2008-spot-072
> y-2008-spot-077
> y-2008-ec2-0003

+1
Flags: needinfo?(catlee)
Comment 6•6 years ago
I've done the following:

OLD Master: buildbot-master94.bb.releng.use1.mozilla.com
NEW Master: buildbot-master77.bb.releng.use1.mozilla.com
For the following slaves:
bld-linux64-ec2-301 (mysql> update slaves SET current_masterid = '221' WHERE name = 'bld-linux64-ec2-301';)
bld-linux64-ec2-302 (mysql> update slaves SET current_masterid = '221' WHERE name = 'bld-linux64-ec2-302';)

OLD Master: buildbot-master76.bb.releng.use1.mozilla.com
NEW Master: buildbot-master75.bb.releng.use1.mozilla.com
For the following slaves:
try-linux64-spot-004 (mysql> update slaves set current_masterid = '217' where name = 'try-linux64-spot-005';)
try-linux64-spot-005 (mysql> update slaves set current_masterid = '217' where name = 'try-linux64-spot-004';)

Basically, I've moved the first 2 slaves from one master to another in the database to see how it goes. If everything goes smoothly, we'll proceed with moving the rest.
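The per-slave UPDATE statements above could also be issued from a short script, which derives each statement from the slave name instead of typing it by hand. This is a sketch under assumptions: sqlite3 stands in for the MySQL session, the demo table keeps only two columns, and `demo_connection` and `reassign_slaves` are hypothetical helper names. The ids 219 and 217 are the ones the thread uses for buildbot-master76 and buildbot-master75.

```python
import sqlite3


def demo_connection():
    """In-memory stand-in for the slaves table, reduced to two columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE slaves (name TEXT PRIMARY KEY, current_masterid INTEGER)"
    )
    conn.executemany(
        "INSERT INTO slaves VALUES (?, ?)",
        [("try-linux64-spot-004", 219), ("try-linux64-spot-005", 219)],
    )
    conn.commit()
    return conn


def reassign_slaves(conn, names, new_masterid):
    """Point each named slave at new_masterid.

    Returns the names that did not match exactly one row, so a typo in a
    slave name is reported instead of silently updating nothing.
    """
    missing = []
    with conn:  # one transaction for the whole batch
        for name in names:
            cur = conn.execute(
                "UPDATE slaves SET current_masterid = ? WHERE name = ?",
                (new_masterid, name),
            )
            if cur.rowcount != 1:
                missing.append(name)
    return missing
```

Checking `rowcount` per statement is the design choice here: an UPDATE that matches no rows succeeds without error, so the script has to look for that case itself.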
Comment 7•6 years ago
UPDATE: After initial checks, it seems that the machines we tested the move with took jobs and were also reassigned to a new master. As far as we know, this is exactly the expected outcome. We'll proceed with the other slaves.
Comment 8•6 years ago
UPDATE: I've done the job and b-2008-ec2-0004 went straight to buildbot-master77. Leaving this bug open for the final checks. Buildduty team: feel free to do the checks and close the bug after making sure that every slave has been allocated to a new buildbot master and is taking new jobs.
Comment 9•6 years ago
Hello. After I checked the DB, I noticed these:

OLD Master: buildbot-master94.bb.releng.use1.mozilla.com
current_masterid == 313

mysql> select * from slaves where current_masterid = 313;
+---------+---------------------+----------+--------+---------+-----------+------+---------+-------+--------+---------------+-----------------+---------+------------------+-------+--------------+
| slaveid | name                | distroid | bitsid | speedid | purposeid | dcid | trustid | envid | poolid | basedir       | locked_masterid | enabled | current_masterid | notes | custom_tplid |
+---------+---------------------+----------+--------+---------+-----------+------+---------+-------+--------+---------------+-----------------+---------+------------------+-------+--------------+
|   10527 | bld-linux64-ec2-003 |       15 |      2 |       9 |         5 |   19 |       5 |     2 |     37 | /builds/slave |            NULL |       0 |              313 | NULL  |         NULL |
|   10539 | bld-linux64-ec2-009 |       15 |      2 |       9 |         5 |   19 |       5 |     2 |     37 | /builds/slave |            NULL |       0 |              313 | NULL  |         NULL |
|   10541 | bld-linux64-ec2-010 |       15 |      2 |       9 |         5 |   19 |       5 |     2 |     37 | /builds/slave |            NULL |       0 |              313 | NULL  |         NULL |
|   10549 | bld-linux64-ec2-014 |       15 |      2 |       9 |         5 |   19 |       5 |     2 |     37 | /builds/slave |            NULL |       0 |              313 | NULL  |         NULL |
+---------+---------------------+----------+--------+---------+-----------+------+---------+-------+--------+---------------+-----------------+---------+------------------+-------+--------------+
4 rows in set (0,00 sec)

Should we move these slaves to the NEW Master: buildbot-master77.bb.releng.use1.mozilla.com?

OLD Master: buildbot-master74.bb.releng.usw2.mozilla.com
current_masterid == 215

mysql> select * from slaves where current_masterid = 215;
Empty set (0,00 sec)

Could we delete this master from the buildslaves DB now?

OLD Master: buildbot-master76.bb.releng.use1.mozilla.com
current_masterid == 219

mysql> select * from slaves where current_masterid = 219;
Empty set (0,00 sec)

Could we delete this master from the buildslaves DB now?

Also, the slaves below appear under the NEW Master: buildbot-master77.bb.releng.use1.mozilla.com
bld-linux64-ec2-301 DC = us-west2
bld-linux64-ec2-302 DC = us-west2
bld-linux64-ec2-312 DC = us-west2
bld-linux64-ec2-317 DC = us-west2
bld-linux64-ec2-318 DC = us-west2

I've checked in Slave Allocator and these slaves are in us-west-2. Should we move these slaves to the NEW Master: buildbot-master73.bb.releng.usw2.mozilla.com?
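The region question raised in this comment (us-west-2 slaves showing up under a use1 master) is the kind of check that can be scripted. The sketch below is illustrative only: the mapping from the `use1`/`usw2` hostname labels to region names is an assumption, not something stated in this bug, and both function names are hypothetical.

```python
# Assumed mapping from the short label in a master hostname to a region
# name; this is a guess at the naming convention, not taken from the bug.
REGION_LABELS = {"use1": "us-east-1", "usw2": "us-west-2"}


def master_region(hostname):
    """Return the region implied by a master hostname, or None if unknown."""
    for label in hostname.split("."):
        if label in REGION_LABELS:
            return REGION_LABELS[label]
    return None


def region_mismatches(slave_regions, master_hostname):
    """List slaves whose region differs from the region of their master."""
    expected = master_region(master_hostname)
    return sorted(
        name for name, region in slave_regions.items() if region != expected
    )
```

With the five slaves from this comment recorded as us-west-2, the check flags all of them against buildbot-master77.bb.releng.use1.mozilla.com and none against buildbot-master73.bb.releng.usw2.mozilla.com.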
Flags: needinfo?(klibby)
Flags: needinfo?(jlund)
Flags: needinfo?(catlee)
Comment 10•6 years ago
Slaves moving masters (after we changed them) is an expected outcome; the fact that they have new (auto-assigned) masters tells us that everything works without any issues. I have moved the remaining slaves, re-enabled them in slavealloc, and finished the decommission of the 3 masters.

mysql> DELETE from masters WHERE masterid = 313;
Query OK, 1 row affected (0,01 sec)

mysql> DELETE from masters WHERE masterid = 215;
Query OK, 1 row affected (0,00 sec)

mysql> DELETE from masters WHERE masterid = 219;
Query OK, 1 row affected (0,00 sec)

Closing this bug for now, as every step has been done and everything seems to work as expected.
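The order used here (move the remaining slaves, then delete the master row) can be sketched as one transaction, so that a failure part-way leaves both tables untouched. This is a simplification and an assumption on several counts: sqlite3 stands in for MySQL, the schema is reduced to the columns involved, all remaining slaves go to a single new master, and `make_demo_db` and `decommission_master` are hypothetical names. The ids 313 and 221 are the ones the thread uses for buildbot-master94 and buildbot-master77.

```python
import sqlite3


def make_demo_db():
    """In-memory stand-in with one old master (313) and one new master (221)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce FKs
    conn.execute("CREATE TABLE masters (masterid INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE slaves ("
        " name TEXT PRIMARY KEY,"
        " current_masterid INTEGER REFERENCES masters(masterid))"
    )
    conn.executemany("INSERT INTO masters VALUES (?)", [(313,), (221,)])
    conn.executemany(
        "INSERT INTO slaves VALUES (?, 313)",
        [
            ("bld-linux64-ec2-003",),
            ("bld-linux64-ec2-009",),
            ("bld-linux64-ec2-010",),
            ("bld-linux64-ec2-014",),
        ],
    )
    conn.commit()
    return conn


def decommission_master(conn, old_masterid, new_masterid):
    """Move every remaining slave to new_masterid, then delete the old master.

    Both statements run in one transaction; any error rolls back both.
    Returns the number of slaves that were moved.
    """
    with conn:
        moved = conn.execute(
            "UPDATE slaves SET current_masterid = ? WHERE current_masterid = ?",
            (new_masterid, old_masterid),
        ).rowcount
        deleted = conn.execute(
            "DELETE FROM masters WHERE masterid = ?", (old_masterid,)
        ).rowcount
        if deleted != 1:
            raise LookupError(f"no master with masterid {old_masterid}")
    return moved
```

Because the slaves are reassigned before the DELETE runs, the foreign key on current_masterid no longer blocks the deletion.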
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(klibby)
Flags: needinfo?(jlund)
Flags: needinfo?(catlee)
Resolution: --- → FIXED
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard