Closed Bug 985625 Opened 12 years ago Closed 12 years ago

Releng Config/Automation support of staging run of scl1->scl3 move

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Callek, Assigned: Callek)

Attachments

(4 files, 1 obsolete file)

On 2014-03-18 12:43, Amy Rich wrote:
> Here are the machines I would like to move from scl1 -> scl3 this week:
>
> b-linux64-hp-001.build.scl1.mozilla.com -> b-linux64-hp-0001.try.releng.scl3.mozilla.com (scl3-releng-vlan264)
> bld-linux64-ix-027.build.scl1.mozilla.com -> b-linux64-ix-0001.build.releng.scl3.mozilla.com (scl3-releng-vlan252)
> w64-ix-slave03.winbuild.scl1.mozilla.com -> b-2008-ix-0018.wintry.releng.scl3.mozilla.com (scl3-releng-vlan244)
>
> Hal, can those be removed from the pool and new configs set up in slavealloc/buildbot to accommodate them on the other side? If so, I'll open up a bug to make this happen. If these machines are not feasible, please suggest machines from the same pools.
> ...

On 2014-03-18 13:25, Amy Rich wrote:
> I have one more to add, since I just ran a preliminary image of an r4 mini in scl3 and it seemed to succeed:
>
> talos-r4-snow-001.build.scl1.mozilla.com -> t-snow-r4-0001.test.releng.scl3.mozilla.com (scl3-releng-vlan256)

(In reply to Justin Wood (:Callek) from comment #0)
> > w64-ix-slave03.winbuild.scl1.mozilla.com -> b-2008-ix-0018.wintry.releng.scl3.mozilla.com (scl3-releng-vlan244)

We'll use the currently-in-production w64-ix-slave07 instead (since 03 is currently on loan), so that's:

w64-ix-slave07.winbuild.scl1.mozilla.com -> b-2008-ix-0066.winbuild.releng.scl3.mozilla.com (scl3-releng-vlan236)

If you really need to test the wintry VLAN rather than the winbuild one, let me know and I'll grab you a try one as well.

(In reply to Justin Wood (:Callek) from comment #1)
Ignore that whole comment. The mapping to use is:

w64-ix-slave31.winbuild.scl1.mozilla.com -> b-2008-ix-0019.wintry.releng.scl3.mozilla.com (scl3-releng-vlan244)
All hosts have now been disabled in slavealloc; still waiting for bld-linux64-ix-027 to drain its last job.
Armen, r? on the following SQL:

mysql> update slaves SET dcid=(select dcid from datacenters where name="scl3") AND name="b-2008-ix-0019" WHERE name="w64-ix-slave31";
update slaves SET dcid=(select dcid from datacenters where name="scl3") AND name="b-linux64-hp-0001" WHERE name="b-linux64-hp-001";
update slaves SET dcid=(select dcid from datacenters where name="scl3") AND name="b-linux64-ix-0001" WHERE name="bld-linux64-ix-027";
update slaves SET dcid=(select dcid from datacenters where name="scl3") AND name="t-snow-r4-0001" WHERE name="talos-r4-snow-001";
Flags: needinfo?(armenzg)
Armen gave me r+ in IRC, but then trying to apply it I got:

ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`buildslaves`.`slaves`, CONSTRAINT `slaves_ibfk_6` FOREIGN KEY (`dcid`) REFERENCES `datacenters` (`dcid`))

Popping into #db I got help from :cyborgshadow, which led me to use the following commands:

mysql> Update slaves s JOIN datacenters dc on dc.name='scl3' set s.dcid = dc.dcid, s.name='b-2008-ix-0019' where s.name = 'w64-ix-slave31';
Query OK, 1 row affected, 1 warning (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0

mysql> Update slaves s JOIN datacenters dc on dc.name='scl3' set s.dcid = dc.dcid, s.name='b-linux64-hp-0001' where s.name = 'b-linux64-hp-001';
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

mysql> Update slaves s JOIN datacenters dc on dc.name='scl3' set s.dcid = dc.dcid, s.name='b-linux64-ix-0001' where s.name = 'bld-linux64-ix-027';
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

mysql> Update slaves s JOIN datacenters dc on dc.name='scl3' set s.dcid = dc.dcid, s.name='t-snow-r4-0001' where s.name = 'talos-r4-snow-001';
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
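A likely reading of the ERROR 1452 above: in the first set of statements, AND sits where a comma was intended, so the SET clause is parsed as a single assignment, dcid = ((subquery) AND (name = "...")). That boolean expression evaluates to 0, and no datacenter has dcid 0, hence the foreign key failure. The sketch below reproduces the effect with Python's sqlite3 module rather than MySQL. The two-table schema, the sample dcid values, and the try_update helper are assumptions made for illustration; this is not the real slavealloc schema.

```python
import sqlite3

# Hypothetical, minimal stand-in for the slavealloc tables (not the real schema).
SCHEMA = """
CREATE TABLE datacenters (dcid INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE slaves (
    slaveid INTEGER PRIMARY KEY,
    name TEXT UNIQUE,
    dcid INTEGER REFERENCES datacenters(dcid)
);
INSERT INTO datacenters (dcid, name) VALUES (1, 'scl1'), (3, 'scl3');
INSERT INTO slaves (name, dcid) VALUES ('w64-ix-slave31', 1);
"""

# AND turns the SET clause into one boolean assignment to dcid.
BROKEN = """
UPDATE slaves
SET dcid = (SELECT dcid FROM datacenters WHERE name = 'scl3') AND name = 'b-2008-ix-0019'
WHERE name = 'w64-ix-slave31'
"""

# A comma separates two independent assignments.
FIXED = """
UPDATE slaves
SET dcid = (SELECT dcid FROM datacenters WHERE name = 'scl3'), name = 'b-2008-ix-0019'
WHERE name = 'w64-ix-slave31'
"""


def try_update(sql):
    """Run one UPDATE against a fresh in-memory database and report the outcome."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"
    return conn.execute("SELECT name, dcid FROM slaves").fetchall()


if __name__ == "__main__":
    print(try_update(BROKEN))
    print(try_update(FIXED))
```

The JOIN form suggested in #db expresses the same two assignments; the comma is the part that matters here.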
Flags: needinfo?(armenzg)
Attached patch [buildbot-configs] batch-0 (obsolete) — Splinter Review
This makes the config changes for the pools from which we're moving a single host this week. At a glance, it looks like some hosts are missing from this list. I based these values on https://docs.google.com/a/mozilla.com/spreadsheets/d/1vq3dcMGvwbW1sug0lipYj2-P_SRovf8q-_pEOxmvhFg/edit#gid=947240444, using the following sheets there: "talos-r4-snow", "linux build", "windows try", and "linux try - thunderbird and fuzzing". No other sheets were referenced for this patch. If any of those 4 sheets change, I will need to know. I also want to avoid changing the mappings at this stage.
Attachment #8393781 - Flags: review?(hwine)
needinfo to :arr on c#6: she says she'll look into the missing hosts tomorrow. I need to be aware if those sheets change, and especially if we're changing the mappings of any machines currently specified in those sheets.
Flags: needinfo?(arich)
See c#6 for patch details. I initially attached a patch for the wrong repo by accident.
Attachment #8393781 - Attachment is obsolete: true
Attachment #8393781 - Flags: review?(hwine)
Attachment #8393785 - Flags: review?(hwine)
Attachment #8393816 - Flags: review?(armenzg)
Attachment #8393816 - Flags: review?(armenzg) → review+
Attachment #8393785 - Flags: review?(hwine) → review+
To document a snippet of a convo I just had:

[12:04:00] Callek: jmaher: this is re: Bug 985482 (and Bug 985625). The physical machine is not changing, so choose 1: (a) update the graphserver DB to say the new slave name instead of the old slave name, or (b) insert a new row into graph server to accommodate.
....
[12:39:00] jmaher: Callek: I would vote for adding new machine names to the database

So that's the plan of record.

Changes to https://docs.google.com/a/mozilla.com/spreadsheets/d/1vq3dcMGvwbW1sug0lipYj2-P_SRovf8q-_pEOxmvhFg after data validation of the etherpad:

The second w64-ix-slave144 has been removed from the "windows build - esr and b2g18" tab, replaced with w64-ix-slave156, and put in the windows build tab.

The following have been added to the wintry pool (windows try tab) and renamed:

w64-ix-slave32.winbuild.scl1.mozilla.com
w64-ix-slave33.winbuild.scl1.mozilla.com
w64-ix-slave34.winbuild.scl1.mozilla.com
w64-ix-slave35.winbuild.scl1.mozilla.com
w64-ix-slave36.winbuild.scl1.mozilla.com
w64-ix-slave37.winbuild.scl1.mozilla.com
w64-ix-slave38.winbuild.scl1.mozilla.com
w64-ix-slave39.winbuild.scl1.mozilla.com
w64-ix-slave40.winbuild.scl1.mozilla.com
w64-ix-slave04.winbuild.scl1.mozilla.com
w64-ix-slave05.winbuild.scl1.mozilla.com
w64-ix-slave157.winbuild.scl1.mozilla.com

Added the following to "linux try - thunderbird and fuzzing":

b-linux64-ix-049.build.scl1.mozilla.com
b-linux64-ix-050.build.scl1.mozilla.com
Flags: needinfo?(arich)
You should be able to test these two machines now:

b-linux64-hp-0001.try.releng.scl3.mozilla.com
b-linux64-ix-0001.build.releng.scl3.mozilla.com

We're still working out some issues with the deployment for b-2008-ix-0019.wintry.releng.scl3.mozilla.com, and we're waiting for the attachment of an EDID for t-snow-r4-0001.test.releng.scl3.mozilla.com.
t-snow-r4-0001.test.releng.scl3.mozilla.com is now available for testing as well.
b-2008-ix-0019.wintry.releng.scl3.mozilla.com is now being installed.
something checked-in went into production :)
Done the graph server changes (for the machines in this bug only):

mysql> select * from machines WHERE name IN ("b-linux64-hp-001", "bld-linux64-ix-027", "w64-ix-slave31", "talos-r4-snow-001");
+------+-------+---------------+-----------+-------------------+-----------+------------+
| id   | os_id | is_throttling | cpu_speed | name              | is_active | date_added |
+------+-------+---------------+-----------+-------------------+-----------+------------+
| 1455 |    21 |             0 |       2.4 | talos-r4-snow-001 |         1 | 1317830270 |
| 6585 |    27 |             0 |       1.0 | w64-ix-slave31    |         1 | 1358380982 |
+------+-------+---------------+-----------+-------------------+-----------+------------+
2 rows in set (0.00 sec)

mysql> select * from machines WHERE name LIKE "b-linux%";
Empty set (0.01 sec)

mysql> select * from machines WHERE name LIKE "bld-%";
Empty set (0.00 sec)

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into machines values (null, 21, 0, "2.4", "t-snow-r4-0001", 1, unix_timestamp());
Query OK, 1 row affected (0.00 sec)

mysql> insert into machines values (null, 27, 0, "1.0", "b-2008-ix-0019", 1, unix_timestamp());
Query OK, 1 row affected (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.00 sec)
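The inserts above retype os_id and cpu_speed by hand from the old rows. One way to avoid transcription slips, sketched here with Python's sqlite3 module, is to copy those attributes from the old machine's row with an INSERT ... SELECT. The column names follow the query output above; the column types, the add_renamed_machine helper, and the use of SQLite instead of the graph server's MySQL database are assumptions for illustration only.

```python
import sqlite3
import time


def add_renamed_machine(conn, old_name, new_name, now=None):
    """Insert a row for new_name, copying os_id, is_throttling and cpu_speed
    from the existing old_name row. Returns the number of rows inserted."""
    now = int(time.time()) if now is None else now
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            """
            INSERT INTO machines
                (os_id, is_throttling, cpu_speed, name, is_active, date_added)
            SELECT os_id, is_throttling, cpu_speed, ?, 1, ?
            FROM machines
            WHERE name = ?
            """,
            (new_name, now, old_name),
        )
    return cur.rowcount


# Hypothetical table mirroring the columns shown in the output above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE machines (
        id INTEGER PRIMARY KEY,
        os_id INTEGER,
        is_throttling INTEGER,
        cpu_speed REAL,
        name TEXT UNIQUE,
        is_active INTEGER,
        date_added INTEGER
    )
    """
)
conn.execute(
    "INSERT INTO machines VALUES (1455, 21, 0, 2.4, 'talos-r4-snow-001', 1, 1317830270)"
)
conn.commit()

if __name__ == "__main__":
    add_renamed_machine(conn, "talos-r4-snow-001", "t-snow-r4-0001")
    print(conn.execute("SELECT name, os_id, cpu_speed FROM machines").fetchall())
```

If the old name has no row (as with the two linux builders in the queries above), the SELECT yields nothing and no row is inserted, which matches the decision to add rows only for the two machines that already existed.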
This is done; all staged machines are taking jobs in production and working fine. needinfo @ myself for the c#11 touch-ups, though.
Flags: needinfo?(bugspam.Callek)
per c#11
Attachment #8398282 - Flags: review?(armenzg)
Flags: needinfo?(bugspam.Callek)
Attachment #8398282 - Flags: review?(armenzg) → review+
in production.
The buildbot-configs part 2 patch broke the slave health crons. This patch should fix it.
Attachment #8399594 - Flags: review?(armenzg)
Comment on attachment 8399594
[slave_health] Part 2 - fix for buildbot-configs part 2

r+ = jhopkins over IRC
Attachment #8399594 - Flags: review?(armenzg)
Attachment #8399594 - Flags: review+
Attachment #8399594 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Note for future readers: we didn't specifically test a b2g build on a linux box, so we missed a builder -> cruncher flow. This was discovered and fixed during move train A; see bug 1014221.
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard