Closed Bug 933768 Opened 9 years ago Closed 8 years ago

Re-purpose mw32-ix-slave##, linux-ix-slave##, linux64-ix-slave##, bld-linux64-ix-05[1-3], mw32-ix-ref and linux-ix-ref as b-2008-ix-#### (rev2) machines

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P2)

x86
macOS

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: armenzg)

References

Details

(Whiteboard: summary-in-comment-35 - final list of machines in comment 19)

Attachments

(9 files, 1 obsolete file)

These machines were supposed to be running on preproduction; however, preproduction didn't even exist.

Let's give them some usage until we shut scl1 down.

Thank you very much!
Assignee: relops → arich
Depends on: 940513
Depends on: 942093
It seems that this bug is now completed from the relops side IIUC.

Can I take these machines and put them in the pool?
Flags: needinfo?(arich)
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(arich)
Resolution: --- → FIXED
I will re-use this bug for the releng side.

Thanks for your help!
Assignee: arich → armenzg
Status: RESOLVED → REOPENED
Component: RelOps → Buildduty
Product: Infrastructure & Operations → Release Engineering
QA Contact: arich → armenzg
Resolution: FIXED → ---
Attached patch add more w64 machines (obsolete) — Splinter Review
Attachment #8343987 - Flags: review?(jhopkins)
We're so close to esr17 being done that we might as well use this bug for the remaining machines.

linux-ix-slave01.build.scl1.mozilla.com has address 10.12.48.195
linux-ix-slave02.build.scl1.mozilla.com has address 10.12.48.196
linux-ix-slave03.build.scl1.mozilla.com has address 10.12.48.197
linux-ix-slave06.build.scl1.mozilla.com has address 10.12.48.200
linux64-ix-slave03.build.scl1.mozilla.com has address 10.12.49.46
linux64-ix-slave04.build.scl1.mozilla.com has address 10.12.49.47
linux64-ix-slave05.build.scl1.mozilla.com has address 10.12.49.48
linux64-ix-slave06.build.scl1.mozilla.com has address 10.12.49.49

I would like them to turn into:
w64-ix-slave163
w64-ix-slave164
w64-ix-slave165
w64-ix-slave166
w64-ix-slave167
w64-ix-slave168
w64-ix-slave169
w64-ix-slave170
Summary: Re-purpose linux-ix-slave0{4,5} and linux64-ix-slave0{1,2} as w64-ix-slave[159-162] (rev2) → Re-purpose linux-ix-slave## and linux64-ix-slave## as w64-ix-slave### (rev2)
Attachment #8343987 - Attachment is obsolete: true
Attachment #8343987 - Flags: review?(jhopkins)
Attachment #8343994 - Flags: review?(jhopkins)
Comment on attachment 8343994 [details] [diff] [review]
win64_machines.diff

So here's the full list:

[159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170]
Attachment #8343994 - Flags: review?(jhopkins) → review+
I've found more machines in bug 849022.
I will post another patch.
Depends on: 849022
Summary: Re-purpose linux-ix-slave## and linux64-ix-slave## as w64-ix-slave### (rev2) → Re-purpose mw32-ix-slave##, linux-ix-slave## and linux64-ix-slave## as w64-ix-slave### (rev2) machines
Attachment #8343994 - Attachment is obsolete: true
Depends on: 947947
Depends on: 947951
Duplicate of this bug: 784721
It seems that this will be our final list of machines: 
w64-ix-slave159.build.scl1.mozilla.com - bug 942093
w64-ix-slave160.build.scl1.mozilla.com - bug 942093
w64-ix-slave161.build.scl1.mozilla.com - bug 942093
w64-ix-slave162.build.scl1.mozilla.com - bug 942093
w64-ix-slave163.build.scl1.mozilla.com - bug 947947
w64-ix-slave164.build.scl1.mozilla.com - bug 947947
w64-ix-slave165.build.scl1.mozilla.com - bug 947947
w64-ix-slave166.build.scl1.mozilla.com - bug 947947
w64-ix-slave167.build.scl1.mozilla.com - bug 947947
w64-ix-slave168.build.scl1.mozilla.com - bug 947947
w64-ix-slave169.build.scl1.mozilla.com - bug 947947
w64-ix-slave170.build.scl1.mozilla.com - bug 947947
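
For reference, the list above can be regenerated from the two number ranges and their tracking bugs. This is only a minimal sketch, assuming plain JavaScript run under Node; the helper name `expandSlaves` is hypothetical and not part of any releng tool.

```javascript
// Hypothetical helper: expand an inclusive numeric range into fully
// qualified w64-ix-slave hostnames in scl1, tagged with a tracking bug.
function expandSlaves(first, last, bug) {
  const hosts = [];
  for (let n = first; n <= last; n++) {
    hosts.push({
      host: "w64-ix-slave" + n + ".build.scl1.mozilla.com",
      bug: bug,
    });
  }
  return hosts;
}

// Ranges and bug numbers are taken from the list in this comment.
const finalList = expandSlaves(159, 162, 942093).concat(
  expandSlaves(163, 170, 947947)
);

for (const entry of finalList) {
  console.log(entry.host + " - bug " + entry.bug);
}
```

Keeping the bug number next to each host makes it easy to see which dependency a given machine is waiting on.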
I used such dumb where clauses; whatever.
It seems like mw32-ix-slave09 and mw32-ix-slave10 had been deleted already from slavealloc.

mysql> select name  from slaves where notes like '%863236%';
+--------------------+
| name               |
+--------------------+
| linux-ix-slave01   |
| linux-ix-slave02   |
| linux-ix-slave06   |
| linux64-ix-slave03 |
| linux64-ix-slave04 |
| linux64-ix-slave05 |
| linux64-ix-slave06 |
+--------------------+
7 rows in set (0.08 sec)

mysql> delete from slaves where notes like '%863236%';
Query OK, 7 rows affected (0.08 sec)

mysql> select name  from slaves where notes like '%933768%';
+--------------------+
| name               |
+--------------------+
| linux-ix-slave04   |
| linux-ix-slave05   |
| linux64-ix-slave01 |
| linux64-ix-slave02 |
+--------------------+
4 rows in set (0.01 sec)

mysql> delete from slaves where notes like '%933768%';
Query OK, 4 rows affected (0.09 sec)

mysql> select name  from slaves where name like 'linux-ix-slave03';
+------------------+
| name             |
+------------------+
| linux-ix-slave03 |
+------------------+
1 row in set (0.31 sec)

mysql> delete from slaves where name like 'linux-ix-slave03';
Query OK, 1 row affected (0.51 sec)

mysql> select name from slaves where name like 'mw32-ix-slave%';
+-----------------+
| name            |
+-----------------+
| mw32-ix-slave01 |
| mw32-ix-slave02 |
| mw32-ix-slave03 |
| mw32-ix-slave04 |
| mw32-ix-slave05 |
| mw32-ix-slave06 |
| mw32-ix-slave07 |
| mw32-ix-slave08 |
| mw32-ix-slave11 |
| mw32-ix-slave12 |
+-----------------+
10 rows in set (0.00 sec)

mysql> delete from slaves where name like 'mw32-ix-slave%';
Query OK, 10 rows affected (0.00 sec)
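
The gap in the last select (no 09 or 10) can be double-checked mechanically. A minimal sketch, assuming the names are pasted in by hand from the mysql output; `missingSlaves` is a hypothetical helper, not a slavealloc feature.

```javascript
// Names returned by the last select above (name like 'mw32-ix-slave%').
const names = [
  "mw32-ix-slave01", "mw32-ix-slave02", "mw32-ix-slave03", "mw32-ix-slave04",
  "mw32-ix-slave05", "mw32-ix-slave06", "mw32-ix-slave07", "mw32-ix-slave08",
  "mw32-ix-slave11", "mw32-ix-slave12",
];

// Hypothetical helper: report which two-digit suffixes in [first, last]
// are absent from the given list of names.
function missingSlaves(prefix, first, last, present) {
  const seen = new Set(present);
  const missing = [];
  for (let n = first; n <= last; n++) {
    const name = prefix + String(n).padStart(2, "0");
    if (!seen.has(name)) {
      missing.push(name);
    }
  }
  return missing;
}

const missing = missingSlaves("mw32-ix-slave", 1, 12, names);
console.log(missing);
```

Running a check like this before a wildcard delete is a cheap way to confirm that the rows about to be removed are the ones expected.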
Comment on attachment 8343994 [details] [diff] [review]
win64_machines.diff

https://hg.mozilla.org/build/buildbot-configs/rev/5573505906dc

Adding 158 as well.
Attachment #8343994 - Attachment is obsolete: false
Attachment #8343994 - Flags: checked-in+
Patch is in production
I've put w64-ix-slave{159..170} into the try pool after deploying the try keys.

We have these machines left to be put into the pool once bug 947951 is completed in Q1.
mw32-ix-slave01.build.mtv1.mozilla.com
mw32-ix-slave02.build.mtv1.mozilla.com
mw32-ix-slave03.build.mtv1.mozilla.com
mw32-ix-slave04.build.mtv1.mozilla.com
mw32-ix-slave05.build.mtv1.mozilla.com
mw32-ix-slave06.build.mtv1.mozilla.com
mw32-ix-slave07.build.mtv1.mozilla.com
mw32-ix-slave08.build.mtv1.mozilla.com
mw32-ix-slave09.build.mtv1.mozilla.com
mw32-ix-slave10.build.mtv1.mozilla.com
mw32-ix-slave11.build.mtv1.mozilla.com
mw32-ix-slave12.build.mtv1.mozilla.com
win32-ix-ref.build.mtv1.mozilla.com
linux-ix-ref.build.mtv1.mozilla.com
Whiteboard: waiting on dep bug
I have rebooted the machines I put into production because I was looking at C:\slave\buildbot.tac instead of C:\builds\moz2_slave\buildbot.tac. Well done me!

I have not yet found out how to see those machines in here:
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=try&type=w64-ix
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) (gone Thu. 12/19/2013 until 1/2/2014) from comment #16)
> I have not yet found out how to see those machines in here:
> https://secure.pub.build.mozilla.org/builddata/reports/slave_health/
> slavetype.html?class=try&type=w64-ix

I can see them on that page now.
Time to add these:
b-2008-ix-0001.winbuild.releng.scl3.mozilla.com
b-2008-ix-0002.winbuild.releng.scl3.mozilla.com
b-2008-ix-0003.winbuild.releng.scl3.mozilla.com
b-2008-ix-0004.winbuild.releng.scl3.mozilla.com
b-2008-ix-0005.winbuild.releng.scl3.mozilla.com
b-2008-ix-0006.winbuild.releng.scl3.mozilla.com
b-2008-ix-0007.winbuild.releng.scl3.mozilla.com
b-2008-ix-0008.winbuild.releng.scl3.mozilla.com
b-2008-ix-0009.winbuild.releng.scl3.mozilla.com (waiting for a disk replacement)
b-2008-ix-0010.winbuild.releng.scl3.mozilla.com
b-2008-ix-0011.winbuild.releng.scl3.mozilla.com
b-2008-ix-0012.winbuild.releng.scl3.mozilla.com
b-2008-ix-0013.winbuild.releng.scl3.mozilla.com
b-2008-ix-0014.winbuild.releng.scl3.mozilla.com
b-2008-ix-0015.winbuild.releng.scl3.mozilla.com
b-2008-ix-0016.winbuild.releng.scl3.mozilla.com
b-2008-ix-0017.winbuild.releng.scl3.mozilla.com
Priority: -- → P2
Summary: Re-purpose mw32-ix-slave##, linux-ix-slave## and linux64-ix-slave## as w64-ix-slave### (rev2) machines → Re-purpose mw32-ix-slave##, linux-ix-slave## and linux64-ix-slave## as b-2008-ix-#### (rev2) machines
Whiteboard: waiting on dep bug
What a messy mess I have turned this bug into.

Summary:
[IN PRODUCTION] w64-ix-slave{159..170} (try pool in scl1) come from [1]:
* linux-ix-slave0[1-6] 
* linux64-ix-slave0[1-6]

[WIP] b-2008-ix-00[01-17] (build pool in scl3) come from [2]:
* mw32-ix-slave[01-12]
* bld-linux64-ix-05[1-3] [3][4]
* win32-ix-ref 
* linux-ix-ref 

[1] bug 947947
[2] bug 947951
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=948997#c0
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=948997#c7
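
As a sanity check on the [WIP] group, the source machines should add up to the 17 b-2008-ix names. A minimal sketch, assuming the bracket ranges in the summary are inclusive; `expandRange` is a hypothetical helper.

```javascript
// Hypothetical helper: expand a prefix plus an inclusive numeric range
// into zero-padded names, e.g. ("mw32-ix-slave", 1, 12, 2).
function expandRange(prefix, first, last, width) {
  const out = [];
  for (let n = first; n <= last; n++) {
    out.push(prefix + String(n).padStart(width, "0"));
  }
  return out;
}

// Source machines listed under [WIP] in the summary.
const sources = [].concat(
  expandRange("mw32-ix-slave", 1, 12, 2),
  expandRange("bld-linux64-ix-", 51, 53, 3),
  ["win32-ix-ref", "linux-ix-ref"]
);

// Target names b-2008-ix-0001 through b-2008-ix-0017.
const targets = expandRange("b-2008-ix-", 1, 17, 4);

console.log(sources.length, targets.length);
```

The counts line up as 12 + 3 + 2 = 17 sources for 17 targets.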
Summary: Re-purpose mw32-ix-slave##, linux-ix-slave## and linux64-ix-slave## as b-2008-ix-#### (rev2) machines → Re-purpose mw32-ix-slave##, linux-ix-slave##, linux64-ix-slave##, bld-linux64-ix-05[1-3], mw32-ix-ref and linux-ix-ref as b-2008-ix-#### (rev2) machines
Whiteboard: summary-in-comment-19
Attachment #8361893 - Flags: review?(jhopkins)
Attachment #8361931 - Flags: checked-in+
Attachment #8361932 - Flags: checked-in+
Attachment #8346703 - Flags: checked-in+
Attachment #8346704 - Flags: checked-in+
Depends on: 948997
Duplicate of this bug: 949120
Attachment #8361893 - Flags: review?(jhopkins) → review+
coop, how can I test if this is enough for slave_health?
Attachment #8362659 - Flags: feedback?(coop)
Comment on attachment 8362659 [details] [diff] [review]
w2008_slave_health.diff

Review of attachment 8362659 [details] [diff] [review]:
-----------------------------------------------------------------

You will also need to add the new slavetype here:

https://hg.mozilla.org/users/coop_mozilla.com/slave_health/file/0be8a81fe645/js/slave_health.js#l22

...and here:

https://hg.mozilla.org/users/coop_mozilla.com/slave_health/file/0be8a81fe645/js/slave_health.js#l76
Attachment #8362659 - Flags: feedback?(coop) → feedback+
coop, I'm looking at the code and I need further clarification.
You mention that a new slavetype should be added to "getSlavetypeForPendingJobs", however, we have an interesting case in here where a "w64-ix" machine can take the same jobs that a "b-2008-ix" can.

How should we deal with this case?
I'm thinking of adding it until the day we get rid of "w64-ix" machines at which point the clause will be reached.
Meanwhile, IIUC, the pending list for "b-2008-ix" will be 0 as we would be calculating it for "w64-ix" but not for "b-2008-ix".

Would this work for you?

function getSlavetypeForPendingJob(slaveclass, pending) {
    var slavetype = "";
    if (slaveclass == "try") {
        if (pending.match(/(OS X|macosx64)/)) {
            slavetype = "bld-lion-r5";
        } else if (pending.match(/(WINNT|win32|win64)/)) {
            slavetype = "w64-ix";
Attachment #8362986 - Flags: review?(coop)
in production
Comment on attachment 8362986 [details] [diff] [review]
[slavehealth] add b-2008 machines

Review of attachment 8362986 [details] [diff] [review]:
-----------------------------------------------------------------

HTML does not allow the same id to be repeated on a page, and duplicates can confuse many things (e.g. getElementById()). There are fallbacks specced out to make things *slightly* saner, but they won't work great.
Attachment #8362986 - Flags: feedback-
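
One way to spot the repeated ids before landing is a quick scan of the markup. This is a rough sketch under stated assumptions: `findDuplicateIds` is a hypothetical helper, the sample fragment is made up rather than copied from index.html, and a regex scan is not a real HTML parser.

```javascript
// Hypothetical helper: return id values that occur more than once in an
// HTML string. Only handles double-quoted id attributes.
function findDuplicateIds(html) {
  const counts = new Map();
  const re = /\bid="([^"]+)"/g;
  let m;
  while ((m = re.exec(html)) !== null) {
    counts.set(m[1], (counts.get(m[1]) || 0) + 1);
  }
  return Array.from(counts.keys()).filter((id) => counts.get(id) > 1);
}

// Made-up fragment: the same slavetype row appears in two tables.
const sample =
  '<table id="build"><tr id="w64-ix"></tr><tr id="b-2008-ix"></tr></table>' +
  '<table id="try"><tr id="w64-ix"></tr></table>';

const dupes = findDuplicateIds(sample);
console.log(dupes);
```

With duplicate ids in the page, a lookup by id can only ever hand back one of the rows, which is why the other row may silently never get updated.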
I deployed it as-is without review to quiet the cronjob.
I can adjust it once I receive the review.
http://hg.mozilla.org/users/coop_mozilla.com/slave_health/rev/128bcd9c0e58

WRT comment 29: we have repeated IDs on the HTML page. I was just propagating the existing pattern without knowing that it needed to be fixed.
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #27)
> coop, I'm looking at the code and I need further clarification.
> You mention that a new slavetype should be added to
> "getSlavetypeForPendingJobs", however, we have an interesting case in here
> where a "w64-ix" machine can take the same jobs that a "b-2008-ix" can.
> 
> How should we deal with this case?
> I'm thinking of adding it until the day we get rid of "w64-ix" machines at
> which point the clause will be reached.
> Meanwhile, IIUC, the pending list for "b-2008-ix" will be 0 as we would be
> calculating it for "w64-ix" but not for "b-2008-ix".

Where do we have the most capacity, i.e. which slaveclass is *likely* to run the job?

e.g. multiple machines can pick up linux jobs, but we display the pending count for spot only because that's where we shunt most of the work these days.
(In reply to Chris Cooper [:coop] from comment #31)
> (In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4)
> from comment #27)
> > coop, I'm looking at the code and I need further clarification.
> > You mention that a new slavetype should be added to
> > "getSlavetypeForPendingJobs", however, we have an interesting case in here
> > where a "w64-ix" machine can take the same jobs that a "b-2008-ix" can.
> > 
> > How should we deal with this case?
> > I'm thinking of adding it until the day we get rid of "w64-ix" machines at
> > which point the clause will be reached.
> > Meanwhile, IIUC, the pending list for "b-2008-ix" will be 0 as we would be
> > calculating it for "w64-ix" but not for "b-2008-ix".
> 
> Where do we have the most capacity, i.e. which slaveclass is *likely* to run
> the job?
> 
It is more likely to run on w64-ix machines. We only have 12 b-2008-ix machines.

> e.g. multiple machines can pick up linux jobs, but we display the pending
> count for spot only because that's where we shunt most of the work these
> days.
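
The rule of thumb discussed here (show the pending count against whichever slavetype is most likely to run the job) could be sketched as below. Assumptions: `slavetypeForPendingDisplay` is a hypothetical helper, not part of slave_health.js, and the w64-ix machine count is a placeholder; only the figure of 12 for b-2008-ix comes from this comment.

```javascript
// Hypothetical helper: among slavetypes that can run the same job, pick
// the one with the most machines as the one to show pending counts for.
function slavetypeForPendingDisplay(capacity) {
  let best = "";
  let bestCount = -1;
  for (const [slavetype, count] of Object.entries(capacity)) {
    if (count > bestCount) {
      best = slavetype;
      bestCount = count;
    }
  }
  return best;
}

// 12 comes from the comment above; the w64-ix figure is a placeholder.
const chosen = slavetypeForPendingDisplay({ "w64-ix": 100, "b-2008-ix": 12 });
console.log(chosen);
```

Under that rule the b-2008-ix row would keep showing 0 pending for as long as the w64-ix pool is the larger one.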
Comment on attachment 8362986 [details] [diff] [review]
[slavehealth] add b-2008 machines

Review of attachment 8362986 [details] [diff] [review]:
-----------------------------------------------------------------

No blockers, but you should remove the code that won't match anything before landing.

::: index.html
@@ +96,5 @@
>        </tr>
>        <tr id="w64-ix">
>          <td class="slavetype"><a href="./slavetype.html?class=try&type=w64-ix">w64-ix</td><td class="pending">0</td><td class="status"></td>
>        </tr>
> +      <tr id="b-2008-ix">

Callek is right about the duplicate ids, but since the other slaveclasses have the same problem, I won't block review on it.

::: js/slave_health.js
@@ +25,5 @@
>  	if (pending.match(/(OS X|macosx64)/)) {
>  	    slavetype = "bld-lion-r5";
>  	} else if (pending.match(/(WINNT|win32|win64)/)) {
>  	    slavetype = "w64-ix";
> +	} else if (pending.match(/(WINNT|win32|win64)/)) {

This will never match anything.

@@ +35,5 @@
>  	if (pending.match(/(OS X|macosx64)/)) {
>  	    slavetype = "bld-lion-r5";
>  	} else if (pending.match(/(WINNT|win32|win64|fuzzer-win64)/)) {
>  	    slavetype = "w64-ix";
> +	} else if (pending.match(/(WINNT|win32|win64|fuzzer-win64)/)) {

This will never match anything.
Attachment #8362986 - Flags: review?(coop) → review+
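
To illustrate why the added branches never match: in an if / else if chain, a second test with the same pattern is only reached when the first one has already failed. A simplified stand-in follows; the function name `classify`, the job strings, and the "b-2008-ix" assignment in the dead branch are assumptions, not the real slave_health.js code.

```javascript
// Simplified stand-in for the patched branch chain.
function classify(pending) {
  let slavetype = "";
  if (pending.match(/(OS X|macosx64)/)) {
    slavetype = "bld-lion-r5";
  } else if (pending.match(/(WINNT|win32|win64)/)) {
    slavetype = "w64-ix";
  } else if (pending.match(/(WINNT|win32|win64)/)) {
    // Unreachable: any string matching this pattern was already
    // consumed by the identical test in the branch above.
    slavetype = "b-2008-ix";
  }
  return slavetype;
}

console.log(classify("WINNT 6.1 fx-team build"));
console.log(classify("win64 try build"));
```

Both calls land in the w64-ix branch, so removing the dead branch changes nothing about the result.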
It seems that the b-2008-ix machines don't have the right keys.
Fixed the non-matching lines:
http://hg.mozilla.org/users/coop_mozilla.com/slave_health/rev/673b516d52b5

Waiting for the b-2008-ix to have their keys deployed before putting them back into production.
Whiteboard: summary-in-comment-19 → summary-in-comment-35 - final list of machines in comment 19
It seems that in bug 930595 we stopped deploying the keys to the Windows machines.
I will deploy them manually for now.
I put 0001 into production.

Followed instructions: https://wiki.mozilla.org/ReleaseEngineering/How_To/Adjust_SSH_keys_on_a_slave#Production
It seems I had missed removing them from the configs.
Attachment #8363714 - Flags: review?(coop)
Attachment #8363714 - Flags: review?(coop) → review+
Comment on attachment 8363714 [details] [diff] [review]
[configs] remove mw32-ix, linux-ix and linux64-ix machines

Review of attachment 8363714 [details] [diff] [review]:
-----------------------------------------------------------------

Make sure staging_config.py doesn't have any mw32 slaves too, please.
Comment on attachment 8363714 [details] [diff] [review]
[configs] remove mw32-ix, linux-ix and linux64-ix machines

Staging was updated as well.
https://hg.mozilla.org/build/buildbot-configs/rev/efff4f5862a4
Attachment #8363714 - Flags: checked-in+
in production.
Depends on: 963123
Disabled b-2008-ix-0001 due to bug 963123
I've added b-2008-ix-0001 back after fixing the basedir to start with a lower-case 'c' rather than 'C'.
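
For illustration only, the basedir fix amounts to lower-casing the drive letter. A minimal sketch, assuming the path is available as a string; `lowerDriveLetter` is a hypothetical helper and the actual change was made by hand on the machine.

```javascript
// Hypothetical helper: lower-case the drive letter of a Windows path,
// leaving the rest of the path untouched.
function lowerDriveLetter(path) {
  return path.replace(/^[A-Z]:/, (drive) => drive.toLowerCase());
}

console.log(lowerDriveLetter("C:\\builds\\moz2_slave"));
```

Paths that already start with a lower-case drive letter pass through unchanged.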
I see two leak builds that have failed:
https://tbpl.mozilla.org/php/getParsedLog.php?id=33723503&tree=Fx-Team&full=1 - 7 test failures on make check
https://tbpl.mozilla.org/php/getParsedLog.php?id=33715327&tree=Jamun&full=1 - 7 test failures on make check

It got starred to bug 830931 (intermittent orange).

I want to see more jobs run.

FAILURES:
TIMEOUTS:
    --ion-eager --ion-parallel-compile=off --ion-check-range-analysis --no-sse3 c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\js\src\jit-test\tests\basic\bug710947.js
    c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\js\src\jit-test\tests\basic\bug710947.js
    --ion-eager --ion-parallel-compile=off c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\js\src\jit-test\tests\basic\bug710947.js
    --baseline-eager c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\js\src\jit-test\tests\basic\bug710947.js
    --baseline-eager --no-ti --no-fpu c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\js\src\jit-test\tests\basic\bug710947.js
    --no-baseline --no-ion --no-ti c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\js\src\jit-test\tests\basic\bug710947.js
    --no-baseline --no-ion c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\js\src\jit-test\tests\basic\bug710947.js
Result summary:
Passed: 27524
Failed: 7
I rebooted the other 16 b-2008-ix machines into production.
I will check again on them.
I fixed them and put them back into the pool.
0009 should be ready now as well.
Depends on: 966771
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard