After we disable talos jobs (partially) on Fedora minis we should re-purpose some of them

RESOLVED FIXED

Status

Product: Infrastructure & Operations
Component: Buildduty
Priority: P2
Severity: normal
Status: RESOLVED FIXED
Reported: 5 years ago
Modified: 18 days ago

People

(Reporter: armenzg, Assigned: armenzg)

Tracking

Details

(Whiteboard: [re-js][re-nad-20130501])

Attachments

(10 attachments)

1.28 KB, patch (Callek: review+; armenzg: checked-in+)
1.85 KB, patch (Callek: review+; armenzg: checked-in+)
961 bytes, patch (rail: review+; armenzg: checked-in+)
867 bytes, patch (Callek: review+; armenzg: checked-in+)
578 bytes, patch (Callek: review+; armenzg: checked-in+)
544 bytes, patch (rail: review+; armenzg: checked-in+)
699 bytes, patch (rail: review+; armenzg: checked-in+)
787 bytes, patch (rail: review+; armenzg: checked-in+)
1.25 KB, patch (rail: review+; armenzg: checked-in+)
1.59 KB, patch (rail: review+; armenzg: checked-in+)
In bug 863903 we're disabling talos on FF23-based project branches.
After a couple of days I will be able to determine how many we can re-purpose.

This bug is to track the work after that bug.
Summary: After we disable talos jobs (partially) on Fedora minis we should re-purpose them → After we disable talos jobs (partially) on Fedora minis we should re-purpose some of them
Whiteboard: [re-js]
Assignee: nobody → armenzg
update: next action date: 2013-05-01
 - 04-25 - Fedora talos jobs are expected to be disabled today (Apr 25).
 - 05-01 - we'll be in a position to identify how many machines can be re-imaged, likely around 30.
Whiteboard: [re-js] → [re-js][re-nad-20130501]
We disabled talos jobs late yesterday (2013-04-25 18:56:31 PDT), so yesterday's wait-time numbers don't help. [1]

Nevertheless, by tweaking this URL I can see how we did over the last 15 hours [2]:
https://secure.pub.build.mozilla.org/buildapi/reports/waittimes/testpool?starttime=1366938000&endtime=1366992000
and compare it with the same window one day earlier [3]:
https://secure.pub.build.mozilla.org/buildapi/reports/waittimes/testpool?starttime=1366851600&endtime=1366905600
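For reference, the starttime/endtime parameters in these two URLs look like Unix epoch seconds. A minimal Python sketch, assuming that and a fixed UTC-7 (PDT) offset, converts both windows to local time; `describe_window` is a hypothetical helper, not part of buildapi. It shows that both windows span 15 hours and are exactly one day apart.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PDT as a fixed UTC-7 offset, which is correct for late April 2013.
PDT = timezone(timedelta(hours=-7), "PDT")

def describe_window(start: int, end: int) -> str:
    """Render a starttime/endtime pair (assumed Unix seconds) in PDT."""
    s = datetime.fromtimestamp(start, PDT)
    e = datetime.fromtimestamp(end, PDT)
    hours = (end - start) / 3600
    return f"{s:%Y-%m-%d %H:%M} -> {e:%Y-%m-%d %H:%M} PDT ({hours:g} h)"

after_window = (1366938000, 1366992000)   # parameters from URL [2]
before_window = (1366851600, 1366905600)  # parameters from URL [3]

print(describe_window(*after_window))
print(describe_window(*before_window))

# Same length, shifted by exactly one day (86400 s).
assert after_window[1] - after_window[0] == before_window[1] - before_window[0]
assert after_window[0] - before_window[0] == 86400
```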

I'm going to disable 15 of our fed64 machines and 15 of our fed32 machines, and on Monday I'll check how we did today.

talos-r3-fed-001
-> talos-r3-fed-002 is on loan (skipped)
talos-r3-fed-003
talos-r3-fed-004
talos-r3-fed-005
talos-r3-fed-006
talos-r3-fed-007
talos-r3-fed-008
talos-r3-fed-009
talos-r3-fed-010
talos-r3-fed-011
talos-r3-fed-012
talos-r3-fed-013
talos-r3-fed-014
talos-r3-fed-015
talos-r3-fed-016
talos-r3-fed64-031
talos-r3-fed64-032
talos-r3-fed64-033
talos-r3-fed64-034
-> talos-r3-fed64-035 does not exist (skipped)
talos-r3-fed64-036
talos-r3-fed64-037
talos-r3-fed64-038
talos-r3-fed64-039
talos-r3-fed64-040
talos-r3-fed64-041
talos-r3-fed64-042
talos-r3-fed64-043
talos-r3-fed64-044
talos-r3-fed64-045
talos-r3-fed64-046
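Both ranges above follow a plain prefix-plus-three-digit naming pattern with a hole in each. A small Python sketch regenerates the list and confirms each pool loses 15 machines; the `hosts` helper is hypothetical and not part of any buildbot or slavealloc tool.

```python
def hosts(prefix: str, first: int, last: int, skip: set[int]) -> list[str]:
    """Hostnames prefix-NNN for first..last inclusive, minus the skipped numbers."""
    return [f"{prefix}-{n:03d}" for n in range(first, last + 1) if n not in skip]

# Ranges taken from the list above: fed-002 is on loan, fed64-035 does not exist.
fed32 = hosts("talos-r3-fed", 1, 16, skip={2})
fed64 = hosts("talos-r3-fed64", 31, 46, skip={35})

assert len(fed32) == 15 and len(fed64) == 15
print("\n".join(fed32 + fed64))
```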


[1]
fedora: 3592
  0:     3464    96.44%
 15:       92     2.56%
 30:       35     0.97%
 45:        1     0.03%

fedora64: 874
  0:      857    98.05%
 15:       17     1.95%
#####################################
[2]
fedora: 1297
  0:     1297   100.00%

fedora64: 291
  0:      291   100.00%
#####################################
[3]
fedora: 1789
  0:     1789   100.00%

fedora64: 622
  0:      617    99.20%
 15:        4     0.64%
 30:        1     0.16%
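The percentage column in these reports is each bucket's share of the pool's total job count. Assuming the bucket keys (0, 15, 30, 45) are wait-time bins in minutes, a minimal Python sketch recomputes the shares from the raw counts; `bucket_percentages` is a hypothetical helper, not a buildapi function.

```python
def bucket_percentages(buckets: dict[int, int]) -> dict[int, float]:
    """Share of jobs (in %, rounded to 2 decimals) falling into each wait bucket."""
    total = sum(buckets.values())
    return {bucket: round(100 * jobs / total, 2) for bucket, jobs in sorted(buckets.items())}

# Raw counts copied from report [1] above (bucket -> number of jobs).
fedora = {0: 3464, 15: 92, 30: 35, 45: 1}
fedora64 = {0: 857, 15: 17}

for name, buckets in (("fedora", fedora), ("fedora64", fedora64)):
    print(f"{name}: {sum(buckets.values())}")
    for bucket, share in bucket_percentages(buckets).items():
        print(f"{bucket:3d}: {buckets[bucket]:8d} {share:9.2f}%")
```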
Priority: -- → P2
I removed too many from each pool :(
I will put 10 machines back for each pool.

Friday's wait times:
####################
fedora:   3814   86.37%
fedora64: 1001    54.65%

The machines kept out of the pools for now are these (I've added talos-r3-fed-002 to the list):
| talos-r3-fed-001   |
| talos-r3-fed-002   |
| talos-r3-fed-003   |
| talos-r3-fed-004   |
| talos-r3-fed-005   |
| talos-r3-fed64-031 |
| talos-r3-fed64-032 |
| talos-r3-fed64-033 |
| talos-r3-fed64-034 |
| talos-r3-fed64-036 |
Created attachment 743093
[buildbot-configs] remove Fed32/64 machines and add 10 WinXP machines
Attachment #743093 - Flags: review?(bugspam.Callek)
Created attachment 743094
[puppet] remove 10 Fedora machines
Attachment #743094 - Flags: review?(bugspam.Callek)
Created attachment 743095
[slavealloc] add 10 more xp machines
Attachment #743095 - Flags: review?(bugspam.Callek)
Created attachment 743096
[graphs] add 10 more xp machines
Attachment #743096 - Flags: review?(bugspam.Callek)

Updated

5 years ago
Attachment #743093 - Flags: review?(bugspam.Callek) → review+

Attachment #743094 - Flags: review?(bugspam.Callek) → review+
Attachment #743095 - Flags: review?(bugspam.Callek) → review+

Attachment #743096 - Flags: review?(bugspam.Callek) → review+
Created attachment 743166
[opsi] add winxp machines

I forgot this patch.
Attachment #743166 - Flags: review?(bugspam.Callek)
Comment on attachment 743166
[opsi] add winxp machines

Because I like you I'll let you land this, but I will never admit to reviewing an opsi patch, ever.
Attachment #743166 - Flags: review?(bugspam.Callek) → review+
Attachment #743093 - Flags: checked-in+
Attachment #743094 - Flags: checked-in+
Attachment #743095 - Flags: checked-in+
Attachment #743096 - Flags: checked-in+
Attachment #743166 - Flags: checked-in+
Today we had better wait times (though the load was lower: only 42K jobs).

I will disable another 5 from each and see how we do tomorrow.

mysql> select name from slaves where notes like '%864741%' and envid=2 order by name;
+--------------------+
| name               |
+--------------------+
| talos-r3-fed-006   |
| talos-r3-fed-007   |
| talos-r3-fed-008   |
| talos-r3-fed-009   |
| talos-r3-fed-010   |
| talos-r3-fed64-037 |
| talos-r3-fed64-038 |
| talos-r3-fed64-039 |
| talos-r3-fed64-040 |
| talos-r3-fed64-041 |
+--------------------+
10 rows in set (0.11 sec)
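The slavealloc query above picks out the slaves whose notes mention 864741 (presumably this bug's number). The same parameterized LIKE filter can be sketched against an in-memory SQLite table; the schema and sample rows are hypothetical stand-ins for the real slavealloc database (only the name, notes and envid columns used in the query are modeled), and what envid=2 denotes is not stated in this bug.

```python
import sqlite3

# Hypothetical, minimal stand-in for slavealloc's `slaves` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slaves (name TEXT, notes TEXT, envid INTEGER)")
conn.executemany(
    "INSERT INTO slaves VALUES (?, ?, ?)",
    [
        ("talos-r3-fed64-037", "disabled for bug 864741", 2),
        ("talos-r3-fed-006", "disabled for bug 864741", 2),
        ("talos-r3-fed-020", "", 2),              # no bug note: excluded
        ("talos-r3-fed-007", "bug 864741", 1),    # other envid: excluded
    ],
)

def tagged_with_bug(conn: sqlite3.Connection, bug: int, envid: int) -> list[str]:
    """Names of slaves in `envid` whose notes contain the bug number."""
    rows = conn.execute(
        "SELECT name FROM slaves WHERE notes LIKE ? AND envid = ? ORDER BY name",
        (f"%{bug}%", envid),  # pattern bound as a parameter, not spliced into SQL
    )
    return [name for (name,) in rows]

print(tagged_with_bug(conn, 864741, 2))
```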


Monday's wait times:
####################
fedora:   3346 - 99.13%
fedora64:  857 - 98.95%

Friday's wait times:
####################
fedora:   3814   86.37%
fedora64: 1001    54.65%
No longer blocks: 866194
Depends on: 866194
The following WinXP machines have been added to the production pool:
talos-r3-xp-128
talos-r3-xp-129
--
talos-r3-xp-131

Waiting on imaging for the remaining ones.
I'm putting 3 fed64 machines back so we don't score as poorly on high-load days.
We won't hit perfect wait times even with 3 more, but it's OK for XP wait times to be better than Fed64's, since end-to-end times are worse on Windows.

This means that I will request 7 more machines to be re-imaged:
| talos-r3-fed-006   |
| talos-r3-fed-007   |
| talos-r3-fed-008   |
| talos-r3-fed-009   |
| talos-r3-fed-010   |
| talos-r3-fed64-037 |
| talos-r3-fed64-038 |
| talos-r3-fed64-039 |


Tuesday's wait times (-10 machines):
####################################
Total jobs: 49k
fedora: 3883 - 93.79%
fedora64: 1112 - 76.35%

Monday's wait times (-5 machines):
##################################
Total jobs: 42k
fedora:   3346 - 99.13%
fedora64:  857 - 98.95%

Friday's wait times (-15 machines):
###################################
Total jobs: 39k
fedora:   3814   86.37%
fedora64: 1001    54.65%
Created attachment 745180
[opsi] add more winxp machines
Attachment #745180 - Flags: review?
Created attachment 745181
[graphs] add more xp machines
Attachment #745181 - Flags: review?
Created attachment 745182
[slavealloc] add more xp machines
Created attachment 745183
[configs] add more xp machines
Attachment #745183 - Flags: review?
Attachment #745182 - Flags: review?
Created attachment 745186
[puppet] remove some fed32/fed64 machines

The list of machines.
Attachment #745186 - Flags: review?(rail)
Attachment #745183 - Flags: review? → review+
Attachment #745182 - Flags: review? → review+
Attachment #745186 - Flags: review?(rail) → review+
Attachment #745181 - Flags: review? → review+
Attachment #745180 - Flags: review? → review+
Attachment #745180 - Flags: checked-in+
Attachment #745181 - Flags: checked-in+
Attachment #745182 - Flags: checked-in+
Attachment #745183 - Flags: checked-in+
Attachment #745186 - Flags: checked-in+
In production.
All the machines that have been set aside have been re-imaged.

WinXP wait times are above 90%, close to 95%.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering

Updated

18 days ago
Product: Release Engineering → Infrastructure & Operations