Releng Config/Automation support of staging run of scl1->scl3 move

RESOLVED FIXED

Status

task
5 years ago
a year ago

People

(Reporter: Callek, Assigned: Callek)

Attachments

(4 attachments, 1 obsolete attachment)

(Assignee)

Description

5 years ago
On 2014-03-18 12:43 , Amy Rich wrote:
> Here are the machines I would like to move from scl1 -> scl3 this week:
>
> b-linux64-hp-001.build.scl1.mozilla.com -> b-linux64-hp-0001.try.releng.scl3.mozilla.com (scl3-releng-vlan264)
> bld-linux64-ix-027.build.scl1.mozilla.com -> b-linux64-ix-0001.build.releng.scl3.mozilla.com (scl3-releng-vlan252)
> w64-ix-slave03.winbuild.scl1.mozilla.com -> b-2008-ix-0018.wintry.releng.scl3.mozilla.com (scl3-releng-vlan244)
>
> Hal, can those be removed from the pool and new configs set up in slavealloc/buildbot to accommodate them on the other side?  If so, I'll open up a bug to make this happen.  If these machines are not feasible, please suggest machines from the same pools.
>

...

On 2014-03-18 13:25 , Amy Rich wrote:
> I have one more to add, since I just ran a preliminary image of an r4 mini in scl3 and it seemed to succeed:
>
> talos-r4-snow-001.build.scl1.mozilla.com -> t-snow-r4-0001.test.releng.scl3.mozilla.com (scl3-releng-vlan256)
(Assignee)

Comment 1

5 years ago
(In reply to Justin Wood (:Callek) from comment #0)
> > w64-ix-slave03.winbuild.scl1.mozilla.com -> b-2008-ix-0018.wintry.releng.scl3.mozilla.com (scl3-releng-vlan244)

We'll use the currently-in-production w64-ix-slave07 instead (since 03 is currently on loan).

So that's:

w64-ix-slave07.winbuild.scl1.mozilla.com -> b-2008-ix-0066.winbuild.releng.scl3.mozilla.com (scl3-releng-vlan236)

If you really needed to test the wintry not winbuild VLAN, let me know and I'll grab you a try one as well.
(Assignee)

Comment 2

5 years ago
(In reply to Justin Wood (:Callek) from comment #1)
Ignore that whole comment...

w64-ix-slave31.winbuild.scl1.mozilla.com --> b-2008-ix-0019.wintry.releng.scl3.mozilla.com (scl3-releng-vlan244)
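
For reference, the final old-to-new mapping from comment 0 and this comment can be written down as data. This is only a sketch: the `MOVES` dict and the helper names are illustrative and not part of slavealloc or any releng tool; the hostnames and VLANs are copied from the comments above.

```python
# Old scl1 FQDN -> (new scl3 FQDN, VLAN), as listed in comment 0 and comment 2.
# MOVES, short_name and short_name_pairs are hypothetical names for illustration.
MOVES = {
    "b-linux64-hp-001.build.scl1.mozilla.com":
        ("b-linux64-hp-0001.try.releng.scl3.mozilla.com", "scl3-releng-vlan264"),
    "bld-linux64-ix-027.build.scl1.mozilla.com":
        ("b-linux64-ix-0001.build.releng.scl3.mozilla.com", "scl3-releng-vlan252"),
    "w64-ix-slave31.winbuild.scl1.mozilla.com":
        ("b-2008-ix-0019.wintry.releng.scl3.mozilla.com", "scl3-releng-vlan244"),
    "talos-r4-snow-001.build.scl1.mozilla.com":
        ("t-snow-r4-0001.test.releng.scl3.mozilla.com", "scl3-releng-vlan256"),
}


def short_name(fqdn):
    """Return the host part of an FQDN (the text before the first dot)."""
    return fqdn.split(".", 1)[0]


def short_name_pairs(moves):
    """Map old short names to new short names, dropping domain and VLAN."""
    return {short_name(old): short_name(new) for old, (new, _vlan) in moves.items()}


if __name__ == "__main__":
    for old, new in short_name_pairs(MOVES).items():
        print(f"{old} -> {new}")
```

The short names are the ones that appear in the database statements later in this bug.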
(Assignee)

Comment 3

5 years ago
All hosts have now been disabled in slavealloc; still waiting for bld-linux64-ix-027 to drain its last job.
(Assignee)

Comment 4

5 years ago
Armen, r? on the following SQL:

mysql> update slaves SET dcid=(select dcid from datacenters where name="scl3") AND name="b-2008-ix-0019" WHERE name="w64-ix-slave31";

update slaves SET dcid=(select dcid from datacenters where name="scl3") AND name="b-linux64-hp-0001" WHERE name="b-linux64-hp-001";

update slaves SET dcid=(select dcid from datacenters where name="scl3") AND name="b-linux64-ix-0001" WHERE name="bld-linux64-ix-027";

update slaves SET dcid=(select dcid from datacenters where name="scl3") AND name="t-snow-r4-0001" WHERE name="talos-r4-snow-001";
Flags: needinfo?(armenzg)
(Assignee)

Comment 5

5 years ago
Armen gave me r+ on IRC, but when I tried to apply it I got:

ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`buildslaves`.`slaves`, CONSTRAINT `slaves_ibfk_6` FOREIGN KEY (`dcid`) REFERENCES `datacenters` (`dcid`))
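
A likely explanation for the error: in a SET clause, `AND` is a boolean operator rather than a separator, so the statement assigns the truth value of `(subquery) AND name="..."` to dcid. For the row being updated that value is 0, and presumably no datacenter has dcid 0, hence the foreign key failure. The sketch below reproduces the effect with Python's sqlite3 module standing in for MySQL. The two-table schema and the dcid values 1 and 2 are assumptions for illustration, not the real buildslaves schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE datacenters (dcid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE slaves (
        name TEXT PRIMARY KEY,
        dcid INTEGER NOT NULL REFERENCES datacenters (dcid)
    );
    INSERT INTO datacenters (dcid, name) VALUES (1, 'scl1'), (2, 'scl3');
    INSERT INTO slaves (name, dcid) VALUES ('w64-ix-slave31', 1);
""")

# What the original SET clause computes for the row being updated:
# (2 AND (name = 'b-2008-ix-0019')), a boolean rather than a datacenter id.
assigned = conn.execute("""
    SELECT (SELECT dcid FROM datacenters WHERE name = 'scl3')
           AND name = 'b-2008-ix-0019'
    FROM slaves WHERE name = 'w64-ix-slave31'
""").fetchone()[0]
print("value that would be assigned to dcid:", assigned)

# An UPDATE in the original shape is rejected by the foreign key check,
# because no datacenters row has that dcid.
try:
    conn.execute("""
        UPDATE slaves
        SET dcid = (SELECT dcid FROM datacenters WHERE name = 'scl3')
                   AND name = 'b-2008-ix-0019'
        WHERE name = 'w64-ix-slave31'
    """)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("update rejected:", rejected)
```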


Popping into #db, I got help from :cyborgshadow, which led me to use the following commands:

mysql> Update slaves s JOIN datacenters dc on dc.name='scl3' set s.dcid = dc.dcid, s.name='b-2008-ix-0019' where s.name = 'w64-ix-slave31';
Query OK, 1 row affected, 1 warning (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> Update slaves s JOIN datacenters dc on dc.name='scl3' set s.dcid = dc.dcid, s.name='b-linux64-hp-0001' where s.name = 'b-linux64-hp-001';
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> Update slaves s JOIN datacenters dc on dc.name='scl3' set s.dcid = dc.dcid, s.name='b-linux64-ix-0001' where s.name = 'bld-linux64-ix-027';
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> Update slaves s JOIN datacenters dc on dc.name='scl3' set s.dcid = dc.dcid, s.name='t-snow-r4-0001' where s.name = 'talos-r4-snow-001';
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
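
The working statements separate the two assignments with a comma instead of `AND`. Below is a minimal sketch of the same rename-and-move, using Python's sqlite3 as a stand-in for MySQL. SQLite has no `UPDATE ... JOIN` form, so this version uses a scalar subquery for dcid; that is a swapped-in technique, not what was run above. The schema, the dcid values, and the function names are assumptions for illustration.

```python
import sqlite3

# (old short name, new short name), taken from the statements above.
RENAMES = [
    ("w64-ix-slave31", "b-2008-ix-0019"),
    ("b-linux64-hp-001", "b-linux64-hp-0001"),
    ("bld-linux64-ix-027", "b-linux64-ix-0001"),
    ("talos-r4-snow-001", "t-snow-r4-0001"),
]


def make_db():
    """Build a toy in-memory copy of the two tables (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE datacenters (dcid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE slaves (
            name TEXT PRIMARY KEY,
            dcid INTEGER NOT NULL REFERENCES datacenters (dcid)
        );
        INSERT INTO datacenters (dcid, name) VALUES (1, 'scl1'), (2, 'scl3');
    """)
    conn.executemany(
        "INSERT INTO slaves (name, dcid) VALUES (?, 1)",
        [(old,) for old, _new in RENAMES],
    )
    conn.commit()
    return conn


def move_to_scl3(conn, renames):
    """Rename each slave and point it at scl3; all-or-nothing."""
    changed = 0
    with conn:  # commits on success, rolls back if anything raises
        for old, new in renames:
            cur = conn.execute(
                "UPDATE slaves "
                "SET dcid = (SELECT dcid FROM datacenters WHERE name = 'scl3'), "
                "    name = ? "
                "WHERE name = ?",
                (new, old),
            )
            if cur.rowcount != 1:
                raise RuntimeError(f"expected exactly one row for {old}")
            changed += cur.rowcount
    return changed


if __name__ == "__main__":
    db = make_db()
    print("rows changed:", move_to_scl3(db, RENAMES))
```

Checking that each statement touches exactly one row mirrors the "Rows matched: 1  Changed: 1" lines in the session above.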
Flags: needinfo?(armenzg)
(Assignee)

Comment 6

5 years ago
Posted patch [buildbot-configs] batch-0 (obsolete)
This makes the config changes for the pools from which we're moving a single host this week.

At a glance, it looks like some hosts are missing from this list.

I based these values on https://docs.google.com/a/mozilla.com/spreadsheets/d/1vq3dcMGvwbW1sug0lipYj2-P_SRovf8q-_pEOxmvhFg/edit#gid=947240444

Specifically, on the following sheets there:

"talos-r4-snow", "linux build", "windows try", "linux try - thunderbird and fuzzing"

All other sheets were not referenced for this patch.

If any of those 4 sheets change, I will need to know. I also want to avoid changing the mappings at this stage.
Attachment #8393781 - Flags: review?(hwine)
(Assignee)

Comment 7

5 years ago
needinfo to :arr on c#6. She says she'll look into the missing hosts tomorrow. I'm also flagging her because I need to know if those sheets change, and especially if we're changing the mappings of any machines currently specified in those sheets.
Flags: needinfo?(arich)
(Assignee)

Comment 8

5 years ago
See c#6 for patch details.

I initially attached a patch for the wrong repo by accident.
Attachment #8393781 - Attachment is obsolete: true
Attachment #8393781 - Flags: review?(hwine)
Attachment #8393785 - Flags: review?(hwine)
(Assignee)

Comment 9

5 years ago
Attachment #8393816 - Flags: review?(armenzg)

Updated

5 years ago
Attachment #8393816 - Flags: review?(armenzg) → review+

Updated

5 years ago
Attachment #8393785 - Flags: review?(hwine) → review+
(Assignee)

Comment 10

5 years ago
To document a snippet of a conversation I just had:

[12:04:00]	Callek	jmaher: this is re: Bug 985482 --- (and Bug 985625) --- the physical machine is not changing, so choose 1 --- (a) update graphserver DB to say the new slave name instead of the old slave name -- or (b) insert a new row into graph server to accommodate.
....
[12:39:00]	jmaher	Callek: I would vote for adding new machine names to the database


So that's the plan of record.
changes to https://docs.google.com/a/mozilla.com/spreadsheets/d/1vq3dcMGvwbW1sug0lipYj2-P_SRovf8q-_pEOxmvhFg after data validation of the etherpad:

the second w64-ix-slave144 has been removed from the "windows build - esr and b2g18" tab, replaced with w64-ix-slave156, and put in the windows build tab.

The following have been added to the wintry pool (windows try tab) and renamed:

w64-ix-slave32.winbuild.scl1.mozilla.com
w64-ix-slave33.winbuild.scl1.mozilla.com
w64-ix-slave34.winbuild.scl1.mozilla.com
w64-ix-slave35.winbuild.scl1.mozilla.com
w64-ix-slave36.winbuild.scl1.mozilla.com
w64-ix-slave37.winbuild.scl1.mozilla.com
w64-ix-slave38.winbuild.scl1.mozilla.com
w64-ix-slave39.winbuild.scl1.mozilla.com
w64-ix-slave40.winbuild.scl1.mozilla.com
w64-ix-slave04.winbuild.scl1.mozilla.com
w64-ix-slave05.winbuild.scl1.mozilla.com
w64-ix-slave157.winbuild.scl1.mozilla.com


added the following to "linux try - thunderbird and fuzzing":

b-linux64-ix-049.build.scl1.mozilla.com
b-linux64-ix-050.build.scl1.mozilla.com
Flags: needinfo?(arich)
You should be able to test these two machines now:

b-linux64-hp-0001.try.releng.scl3.mozilla.com
b-linux64-ix-0001.build.releng.scl3.mozilla.com

We're still working out some issues with the deployment for b-2008-ix-0019.wintry.releng.scl3.mozilla.com, and we're waiting for the attachment of an EDID for t-snow-r4-0001.test.releng.scl3.mozilla.com.
t-snow-r4-0001.test.releng.scl3.mozilla.com is now available for testing as well.
b-2008-ix-0019.wintry.releng.scl3.mozilla.com is now being installed.
something checked-in went into production :)
(Assignee)

Comment 16

5 years ago
Made the graph server changes (for the machines in this bug only):

mysql> select * from machines WHERE name IN ("b-linux64-hp-001", "bld-linux64-ix-027", "w64-ix-slave31", "talos-r4-snow-001");
+------+-------+---------------+-----------+-------------------+-----------+------------+
| id   | os_id | is_throttling | cpu_speed | name              | is_active | date_added |
+------+-------+---------------+-----------+-------------------+-----------+------------+
| 1455 |    21 |             0 | 2.4       | talos-r4-snow-001 |         1 | 1317830270 |
| 6585 |    27 |             0 | 1.0       | w64-ix-slave31    |         1 | 1358380982 |
+------+-------+---------------+-----------+-------------------+-----------+------------+
2 rows in set (0.00 sec)

mysql> select * from machines WHERE name LIKE "b-linux%";
Empty set (0.01 sec)

mysql> select * from machines WHERE name LIKE "bld-%";
Empty set (0.00 sec)

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into machines values (null, 21, 0, "2.4", "t-snow-r4-0001", 1, unix_timestamp());
Query OK, 1 row affected (0.00 sec)

mysql> insert into machines values (null, 27, 0, "1.0", "b-2008-ix-0019", 1, unix_timestamp());
Query OK, 1 row affected (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.00 sec)
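
The inserts above copy os_id and cpu_speed from the old row for each renamed machine, and the two Linux builders get no new row because they had none to begin with. A minimal sketch of that pattern follows, using Python's sqlite3 rather than the real graph server database. The column names come from the query output above; the column types, the `make_graph_db` and `add_renamed_machine` helpers, and the use of `time.time()` in place of `unix_timestamp()` are assumptions for illustration.

```python
import sqlite3
import time


def make_graph_db():
    """Toy machines table seeded with the two rows shown in the output above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE machines (
            id INTEGER PRIMARY KEY,
            os_id INTEGER,
            is_throttling INTEGER,
            cpu_speed TEXT,
            name TEXT,
            is_active INTEGER,
            date_added INTEGER
        );
        INSERT INTO machines VALUES
            (1455, 21, 0, '2.4', 'talos-r4-snow-001', 1, 1317830270),
            (6585, 27, 0, '1.0', 'w64-ix-slave31', 1, 1358380982);
    """)
    return conn


def add_renamed_machine(conn, old_name, new_name):
    """Add a row for new_name that copies the hardware fields of old_name.

    Returns False (and inserts nothing) when old_name has no row.
    """
    row = conn.execute(
        "SELECT os_id, is_throttling, cpu_speed FROM machines WHERE name = ?",
        (old_name,),
    ).fetchone()
    if row is None:
        return False
    with conn:  # commit the insert, or roll back if it fails
        conn.execute(
            "INSERT INTO machines "
            "(os_id, is_throttling, cpu_speed, name, is_active, date_added) "
            "VALUES (?, ?, ?, ?, 1, ?)",
            (*row, new_name, int(time.time())),
        )
    return True


if __name__ == "__main__":
    db = make_graph_db()
    for old, new in [
        ("talos-r4-snow-001", "t-snow-r4-0001"),
        ("w64-ix-slave31", "b-2008-ix-0019"),
        ("bld-linux64-ix-027", "b-linux64-ix-0001"),
    ]:
        print(old, "->", new, "added:", add_renamed_machine(db, old, new))
```

The old rows are left in place, matching option (b) from the IRC discussion: a new row per new machine name rather than renaming the existing one.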
(Assignee)

Comment 17

5 years ago
This is done; all staged machines are taking jobs in production and working fine.

needinfo @ myself for c#11 touchups though.
Flags: needinfo?(bugspam.Callek)
(Assignee)

Comment 18

5 years ago
per c#11
Attachment #8398282 - Flags: review?(armenzg)
Flags: needinfo?(bugspam.Callek)

Updated

5 years ago
Attachment #8398282 - Flags: review?(armenzg) → review+
in production.
(Assignee)

Comment 21

5 years ago
The buildbot-configs part 2 patch broke the slave_health crons.

This patch should fix it.
Attachment #8399594 - Flags: review?(armenzg)
(Assignee)

Comment 22

5 years ago
Comment on attachment 8399594 [details] [diff] [review]
[slave_health] Part 2 - fix for buildbot-configs part 2

Review of attachment 8399594 [details] [diff] [review]:
-----------------------------------------------------------------

r+ = jhopkins over IRC
Attachment #8399594 - Flags: review?(armenzg)
Attachment #8399594 - Flags: review+
Attachment #8399594 - Flags: checked-in+
(Assignee)

Updated

5 years ago
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Note for future readers: we didn't specifically test a b2g build on a Linux box, so we missed a builder -> cruncher flow. This was discovered and fixed during move train A; see bug 1014221.
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations