Closed Bug 681534 Opened 13 years ago Closed 13 years ago

Rebalance win32 slaves between buildpool and trybuildpool

Categories

(Release Engineering :: General, defect, P2)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Assigned: armenzg)

References

Details

(Whiteboard: [buildslaves])

Attachments

(2 files)

Now that all those win32 slaves are back online, wait times look much better. Very cool. However, looking at today's wait times report, it seems like we are not optimally balanced yet:

(buildpool) win2k3: 209
  0:      142    67.94%
 15:       15     7.18%
 30:       14     6.70%
 45:       12     5.74%
 60:       12     5.74%
 75:        0     0.00%
 90+:       14     6.70%


(trybuildpool) win2k3: 127
  0:      125    98.43%
 15:        2     1.57%


If this keeps happening, it may make sense to rebalance our slave pools by moving some win32 slaves back from trybuildpool to buildpool.

Let's watch this for another day or two to see if this pattern is consistent before doing anything. Filing to track.
There are several things that we need to do as well:
* see how many win32 slaves are ready for setup and set them up
* see how many win32 slaves have not been rebooted from the previous pass
On 24aug2011, 6:03am:

(buildpool) win2k3: 168
  0:      144    85.71%
 15:       10     5.95%
 30:       10     5.95%
 45:        4     2.38%

(trybuildpool) win2k3: 119
  0:      119   100.00%



On 25aug2011, 6:03am:

(buildpool) win2k3: 177
  0:      154    87.01%
 15:        4     2.26%
 30:        2     1.13%
 45:        8     4.52%
 60:        7     3.95%
 75:        2     1.13%

(trybuildpool) win2k3: 108
  0:      108   100.00%
Whiteboard: [buildslaves]
Based on comment #0 and comment #2, I think we have enough win32 slaves to keep up on trybuildpool, but we need more win32 slaves on buildpool.

We could improve buildpool by any/all of:
* migrating some machines from trybuildpool to buildpool
* finding and fixing any offline win32 machines, and adding them to buildpool.
I did #2.
I addressed most slaves in bug 680494.

I will add 2 more slaves in bug 682083 and probably move 3 more from the trypool.
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
Priority: -- → P2
I have added 7 slaves in bug 682408.

According to slavealloc we now have 34 build slaves and 29 try slaves:
http://slavealloc.build.mozilla.org/ui/#silos

Let's see tomorrow's wait times and close this.
I added the slaves midday. Let's see wait times for tomorrow.

jobs submitted between Tue, 06 Sep 2011 00:00:00 -0700 (PDT) and Wed, 07 Sep 2011 00:00:00 -0700 (PDT)

* From buildpool:
win2k3: 219
  0:      199    90.87%
 15:        7     3.20%
 30:        4     1.83%
 45:        9     4.11%

* From trypool:
win2k3: 89
  0:       89   100.00%
We also have wait times on Linux slaves on try.
I will do an analysis of which slaves are under maintenance tomorrow.

This is our current distribution (from silos page):
           buildpool trypool
win2ksp3        34       29
linux IX        54        7
linux VM        42       33

FTR we had 110 pushes on Tuesday, with a high of 12 and with 10 hours of the day at 6 or more pushes/hour:
https://build.mozilla.org/buildapi/reports/pushes?int_size=3600&endtime=1316588400&starttime=1316502000
Wait: 893/96.19% (buildpool)
Wait: 798/85.96% (trybuildpool) 

In contrast 2 weeks ago we had:
* Tuesday - 123 pushes. A high of 11. 9 hours of the day with 6 or more pushes/hour.
https://build.mozilla.org/buildapi/reports/pushes?int_size=3600&endtime=1315378800&starttime=1315292400
Wait: 1261/98.18% (buildpool)
Wait: 521/96.35% (trybuildpool)
* Wednesday - 112 pushes. A high of 15. 8 hours of the day with 6 or more pushes/hour.
https://build.mozilla.org/buildapi/reports/pushes?int_size=3600&endtime=1315465200&starttime=1315378800
Wait: 1275/96.31% (buildpool)
Wait: 574/71.60% (trybuildpool) <- linux and w32 had bad wait times that day
 
With this analysis I wanted to show that Tuesday Sep. 20th is similar to a normal day with respect to pushes.
The interesting thing about this week's Tuesday is the ratio of buildpool jobs vs trypool jobs:
09/20 - 1.11
09/07 - 2.42
09/06 - 2.22

This could mean more clobber time due to try jobs.
I think I need a view that shows me the # of build jobs vs try jobs broken down per platform over time, correlated with the number of active slaves and the # of pushes, besides the ratio & perhaps averages of each job type. Oh dear.
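In the meantime, a minimal sketch (a hypothetical helper, not an existing report) of how these ratios can be recomputed from the job counts quoted in this comment:

# Hypothetical sketch: recompute buildpool vs trypool job ratios from the
# wait-time counts quoted in this comment (Tuesday Sep 20).
build_total, try_total = 893, 798          # total buildpool / trypool jobs on 09/20
print("09/20 overall build/try ratio: %.2f"
      % (build_total / float(try_total)))  # ~1.12 (quoted above as 1.11)

# Per-platform, using the win2k3/linux counts listed below in this comment:
build_jobs = {"linux": 354, "win2k3": 153}
try_jobs   = {"linux": 349, "win2k3": 140}
for platform in sorted(build_jobs):
    print("%s build/try ratio: %.2f"
          % (platform, build_jobs[platform] / float(try_jobs[platform])))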

* From the try pool 798/85.96% (trybuildpool) 
linux: 349
  0:      269    77.08%
 15:       26     7.45%
 30:       27     7.74%
 45:       14     4.01%
 60:       13     3.72%

win2k3: 140
  0:      108    77.14%
 15:       19    13.57%
 30:        9     6.43%
 45:        4     2.86%


* From the build pool 893/96.19% (buildpool)
linux: 354
  0:      351    99.15%
 15:        3     0.85%

win2k3: 153
  0:      131    85.62%
 15:       14     9.15%
 30:        6     3.92%
 45:        2     1.31%
Summary: Rebalance win32 slaves in buildpool and trybuildpool? → Rebalance win32/linux slaves between buildpool and trybuildpool
Long story short: we now have another 6 win32 slaves that needed to be rebooted.
I hope we can analyze the wait times on Friday (yesterday there was a downtime).

> This is our current distribution (from silos page):
>            buildpool trypool
> win2ksp3        34       29   = 63 prod IXs
> linux IX        54        7   = 61 prod IXs
> linux VM        42       33   = 75 prod VMs

In slavealloc I see (just double checking the silos page info):
* 67 w32 slaves - 4 preprod & 63 prod (4 maintenance [1] & 6 MIA [2])
* 65 centos5 iX slaves - 4 preprod & 61 prod (1 maintenance & 0 MIA)

6 w32 slaves are out of the pool unnoticed & unhandled. If idleizer were in place and the buildbot-start check enabled, we would have caught all of these issues. Also, checking http://build.mozilla.org/builds/last-job-per-slave.txt would have shown it.
I have rebooted all these 6 slaves.
I have rebooted any slaves pointing to bm05 & bm02
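As a side note, catching slaves like these earlier could be as simple as scanning that report periodically. A minimal sketch (hypothetical, and assuming a simplified "slavename YYYY-MM-DD" line format rather than the report's real layout):

#!/usr/bin/env python
# Hypothetical sketch: flag slaves that have not taken a job in N days.
# Assumes one "slavename YYYY-MM-DD" pair per line on stdin; the real
# last-job-per-slave report is formatted differently.
import sys
from datetime import datetime, timedelta

IDLE_THRESHOLD = timedelta(days=4)

def flag_idle(lines, now=None):
    now = now or datetime.now()
    idle = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        name, last_job = parts[0], parts[1]
        last = datetime.strptime(last_job, "%Y-%m-%d")
        if now - last > IDLE_THRESHOLD:
            idle.append((name, (now - last).days))
    return idle

if __name__ == "__main__":
    for name, days in flag_idle(sys.stdin):
        print("%s idle for %d days" % (name, days))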

[1] bug 682574, bug 684349, bug 673972 & bug 684374

[2]
w32-ix-slave31 20 days - no note
* connected to bm5 (nonexistent master) - rebooted & back
w32-ix-slave34 15 days - no note
* connected to bm5 (nonexistent master)
w32-ix-slave26 14 days - no note
* undetermined reasons - see [3]
w32-ix-slave41 11 days - no note
* undetermined reasons - see [3]
w32-ix-slave28  7 days - no note
* undetermined reasons - see [3]
w32-ix-slave29  5 days - no note
* absolutely no idea from twistd.log

[3]
2011-09-10 10:47:30-0700 [Broker,client] Connected to buildbot-master13.build.scl1.mozilla.com:9001; slave is ready
2011-09-10 10:47:46-0700 [-] Received SIGBREAK, shutting down.
2011-09-10 10:47:46-0700 [Broker,client] lost remote
...
2011-09-10 10:47:46-0700 [Broker,client] lost remote
2011-09-10 10:47:46-0700 [Broker,client] Lost connection to buildbot-master13.build.scl1.mozilla.com:9001
2011-09-10 10:47:46-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x015BD260>
2011-09-10 10:47:46-0700 [-] Main loop terminated.
2011-09-10 10:47:46-0700 [-] Server Shut Down.

2011-09-14 12:34:21-0700 [-] Received SIGBREAK, shutting down.
2011-09-14 12:34:21-0700 [Broker,client] Lost connection to buildbot-master12.build.scl1.mozilla.com:9001
2011-09-14 12:34:21-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x0167C850>
2011-09-14 12:34:21-0700 [-] Main loop terminated.
2011-09-14 12:34:21-0700 [-] Server Shut Down.
2011-09-14 12:34:21-0700 [-] Server Shut Down.

2011-09-13 18:02:01-0700 [Broker,client] Connected to buildbot-master13.build.scl1.mozilla.com:9001; slave is ready
2011-09-14 01:02:01-0700 [-] I feel very idle and was thinking of rebooting as soon as the buildmaster says it's OK
2011-09-14 01:02:01-0700 [-] Telling the master we want to shutdown after any running builds are finished
2011-09-14 01:02:01-0700 [Broker,client] Master does not support slave initiated shutdown.  Upgrade master to 0.8.3 or later to use this feature.
2011-09-14 01:02:01-0700 [Broker,client] rebooting NOW, since the master won't talk to us
2011-09-14 01:02:01-0700 [Broker,client] Invoking platform-specific reboot command
2011-09-10 10:47:46-0700 [-] Server Shut Down.
FTR the w32 slaves on maintenance are these:
* w32-ix-slave03 - bug 682574
* w32-ix-slave05 - bug 684349
* w32-ix-slave06 - bug 673972
* w32-ix-slave35 - bug 684374

I had to use IPMI (RDP/ssh failed) to reach w32-ix-slave28 (which I rebooted yesterday) and I have rebooted it again today. Yesterday it didn't even manage to connect (see log below):
2011-09-14 12:34:21-0700 [-] Server Shut Down.                                  
2011-09-23 07:51:54-0700 [-] Log opened.

I had to reboot w32-ix-slave41 (again) with the same error as in comment 8 (bm13-build1). The slave had taken one job yesterday which finished at Sep 22 12:23:12 2011 and lost connection at 12:25:14-0700 (2 mins later). I have rebooted it again today.

Both are now taking jobs.
Long story-short:
* I had missed some slaves to reboot
* I have found an OPSI issue with buildbot-startup which I am debugging

## OPSI issue
I see a bunch of slaves, if not all of them, without the newer buildbot.bat [1].
This newer buildbot.bat has an extra 30-second wait so that a SIGBREAK sent too early won't break the rebooting process.

The package on OPSI is marked as installed and the version says 2.2, which is the right one, but with no effect. There is something very very funky in here.

I marked slave w32-ix-slave10 to "setup" the buildbot-startup package, but it did not deploy the newer buildbot.bat after a reboot.
After I attempted to install this package I see this:
> Failed to load application: No module named buildslave.bot

Differences between the machine and what is in the repos:
* C:\runslave.py - the same
* D:\mozilla-build\start-buildbot.bat - the same
* c:\documents and settings\cltbld\start menu\programs\startup\buildbot.bat - _different_
* D:\mozilla-build\start-buildbot.sh - not needed anymore - not existent on slave

Then I noticed that production-opsi had local changes (see ~/opsi-package-sources/buildbot-startup/CLIENT_DATA/buildbot.bat.old or ~/armenzg_buildbot.bat). I have done the following to bring us up-to-date:
* hg revert buildbot-startup/CLIENT_DATA/buildbot.bat
* ./sync-binaries
* su -c './regenerate-package buildbot-startup'

This would seem to work, right? Well, not really. sync-binaries copies the files from ~/opsi-binaries onto ~/opsi-package-sources, which overwrites the newer hg copy of buildbot.bat with the older one from CVS.
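A simple sanity check could catch this kind of clobbering before regenerating the package. A hedged sketch (hypothetical script, not an existing tool; paths taken from the description above):

# Hypothetical sanity check: warn before sync-binaries clobbers a newer
# hg-tracked buildbot.bat with the older CVS copy from ~/opsi-binaries.
import hashlib
import os

HG_COPY = os.path.expanduser(
    "~/opsi-package-sources/buildbot-startup/CLIENT_DATA/buildbot.bat")
CVS_COPY = os.path.expanduser("~/opsi-binaries/buildbot-startup/buildbot.bat")

def sha1(path):
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

if os.path.exists(CVS_COPY) and sha1(CVS_COPY) != sha1(HG_COPY):
    print("WARNING: sync-binaries would overwrite the hg copy of buildbot.bat")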

w32-ix-slave10 now has the newer version of buildbot.bat.

This doesn't seem to fix the buildslave.bot issue. I will keep investigating.

## Missed slaves
I also missed these MIA slaves because last-job-per-slave.html has a block of buildpool slaves and a block of trypool slaves, which I completely missed.
* w32-ix-slave02 - (7 days - bm14-try1)
* w32-ix-slave07 - [3] Can't stop reactor (21 days - bm05 which is non-existent)
**  No module named buildslave.bot 
* w32-ix-slave08 - lost connection like in comment 8 (7 days - bm14-try1)
**  No module named buildslave.bot 
* w32-ix-slave20 - twistd.log does not say much besides failing to reboot (15 days - bm03-trybuilder)
** after rebooting nothing new is written on twistd.log
* w32-ix-slave23 - lost connection like in comment 8 (13 days - bm14-try1)

[1] http://mxr.mozilla.org/build/source/opsi-package-sources/buildbot-startup/CLIENT_DATA/buildbot.bat
[2] http://hg.mozilla.org/build/opsi-package-sources/rev/fbb827acf9a0
[3] twisted.internet.error.ReactorNotRunning: Can't stop reactor that isn't running.
[4] 
cltbld@staging-opsi:~/opsi-binaries$ cvs remove buildbot-startup
cvs remove: Removing buildbot-startup
cvs remove: file `buildbot-startup/buildbot.bat' still in working directory
cvs remove: 1 file exists; remove it first
cltbld@staging-opsi:~/opsi-binaries$ cvs remove buildbot-startup/buildbot.bat 
cvs remove: file `buildbot-startup/buildbot.bat' still in working directory
cvs remove: 1 file exists; remove it first
cltbld@staging-opsi:~/opsi-binaries$ rm buildbot-startup/buildbot.bat 
cltbld@staging-opsi:~/opsi-binaries$ cvs remove buildbot-startup/buildbot.bat 
cvs remove: scheduling `buildbot-startup/buildbot.bat' for removal
cvs remove: use 'cvs commit' to remove this file permanently
cltbld@staging-opsi:~/opsi-binaries$ cvs commit -m "Bug 681534. Remove old copy of buildbot.bat which overwrites the newer buildbot.bat from opsi-package-sources. r=bustage"
Removing buildbot-startup/buildbot.bat;
/mofo/opsi-binaries/buildbot-startup/buildbot.bat,v  <--  buildbot.bat
new revision: delete; previous revision: 1.3
done
Long story-short: we need this patch to install the latest version of runslave.py since the current package does not do the job correctly.

I tried marking "buildbot-startup" on slaves 04, 07, 08 & 11, which had the issue, but it did not deploy the newer runslave.py. (To reproduce, check for buildbotve in C:\runslave.py and you will see that it is missing; also, running D:\mozilla-build\start-buildbot.bat would fail.)

This package fixes the issue.

To fix a slave locally all you have to do is run this through ssh:
runas /user:administrator "D:\mozilla-build\wget\wget -OC:\runslave.py http://hg.mozilla.org/build/puppet-manifests/raw-file/61c0b7a13f40/modules/buildslave/files/runslave.py"
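
For several slaves at once, a hedged sketch of wrapping that one-liner (hypothetical helper; assumes ssh access to the slaves, and the slave names below are just the ones mentioned in this comment):

# Hypothetical wrapper: push the runslave.py fix to several slaves over ssh.
import subprocess

FIX = ('runas /user:administrator "D:\\mozilla-build\\wget\\wget '
       '-OC:\\runslave.py http://hg.mozilla.org/build/puppet-manifests/'
       'raw-file/61c0b7a13f40/modules/buildslave/files/runslave.py"')

SLAVES = ["w32-ix-slave04", "w32-ix-slave07", "w32-ix-slave08", "w32-ix-slave11"]

for slave in SLAVES:
    print("Fixing %s" % slave)
    subprocess.call(["ssh", slave, FIX])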
Attachment #562144 - Flags: review?(coop)
Comment on attachment 562144 [details] [diff] [review]
[opsi] buildbot-startup - install the latest runslave.py correctly

Review of attachment 562144 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #562144 - Flags: review?(coop) → review+
I will deploy this on Monday to have all day ahead of me.
We will measure the wait times on Tuesday.
I am happy we have so many machines back and that we know the core issue (old runslave.py and old buildbot.bat) and we have a fix :D
Comment on attachment 562144 [details] [diff] [review]
[opsi] buildbot-startup - install the latest runslave.py correctly

>+++ b/buildbot-startup/CLIENT_DATA/runslave.py
>@@ -0,0 +1,483 @@
>+#!/usr/bin/python
>+
>+# MOZILLA DEPLOYMENT NOTES
>+# - This file is distributed to all buildslaves by Puppet, placed at
>+#   /usr/local/bin/runslave.py on POSIX systems (linux, darwin) and
>+#   C:\runslave.py on Windows systems
>+# - It lives in the 'buildslave' puppet module

Nit: this comment is no longer accurate. I suggest we modify it to say that OPSI does a static import of ($rev$) from puppet-manifests... And note (in puppet-manifests) that there is a copy of this file in OPSI as well. So that we don't get bitten by someone editing this expecting puppet to change, or vice-versa.
Comment on attachment 562144 [details] [diff] [review]
[opsi] buildbot-startup - install the latest runslave.py correctly

Good point Callek. I have adjusted the comment. I will follow up with the comment fix up on puppet-manifests.

checked-in as:
http://hg.mozilla.org/build/opsi-package-sources/rev/0bcbdf4f0466

I have marked w32-ix-slave[07-14] to get this newer version of runslave.py and buildbot.bat. I have verified that the slaves start properly. I will look at how the builds they do over the day go. We could deploy this system-wide in a few days.
Attachment #562144 - Flags: checked-in+
Attachment #562421 - Flags: review?(coop)
Attachment #562421 - Flags: review?(bugspam.Callek)
Comment on attachment 562421 [details] [diff] [review]
[puppet] comment adjustment for runslave.py

I'm happy with this, we should be sure to copy the comment over to puppet-manifests as well, imho.
Attachment #562421 - Flags: review?(bugspam.Callek) → review+
Long story-short:
* I have deployed the OPSI package to 8 slaves in comment 15
** these slaves should be less likely to fail to reboot and get hung
* I have added another 3 slaves into the pool
* We will check the wait times tomorrow

######## 

Besides the w32 slaves in maintenance from comment 9 I have rebooted these slaves:
* w32-ix-slave20 - 18 days - bm03-trybuilder (how did I miss this one? :S)
** missing twistd.py [1]
* w32-ix-slave30 - 3 days - bm12-build1
** missing twistd.py [1]
* w32-ix-slave34 - 3 days - bm12-build1

NOTE: I left alone slaves that have been idle for 2 days since we just had a weekend.

To fix the twistd.py issue I reinstalled buildbot on those two machines (since a simple reboot did not fix it). I don't know how slave30's setup would have got messed up if it had been taking jobs up to 3 days ago.

We will check the wait times tomorrow.

[1] This shows up on the CMD window:
D:\mozilla-build\buildbotve\scripts\python.exe: can't open file 'D:\mozilla-build\buildbotve\scripts\twistd.py': [Errno 2] No such file or directory
(In reply to Justin Wood (:Callek) from comment #17)
> Comment on attachment 562421 [details] [diff] [review] [diff] [details] [review]
> [puppet] comment adjustment for runslave.py
> 
> I'm happy with this, we should be sure to copy the comment over to
> puppet-manifests as well, imho.

You just reviewed the puppet-manifests one. I already deployed this change into the OPSI repo.
Attachment #562421 - Flags: review?(coop) → review+
I have deployed the newer buildbot.bat to all w32 slaves as I had not seen any fallout.

On Monday I deployed the newer buildbot.bat to w32-ix-slave08 and it got into a hung state. This is different from the previous problem: it is not that start-buildbot.bat died too early, but that after rebooting, the machine ran runslave.py and it exited immediately.

I am trying to figure out who sent that SIGBREAK (the master or the slave) and why.

2011-09-27 11:23:49-0700 - count_and_reboot.py is called
2011-09-27 11:24:02 - The last job finished (from the master's logs)
2011-09-27 11:25:42 - "About to run runslave.py" // This means the slave has come back from a reboot
2011-09-27 11:25:48-0700 [-] Log opened. 
2011-09-27 11:25:50-0700 [-] Received SIGBREAK, shutting down.
2011-09-27 11:25:50 - "runslave.py finished"

[1] Full log
2011-09-27 11:25:48-0700 [-] Log opened.                       
2011-09-27 11:25:48-0700 [-] twistd 10.2.0 (D:\mozilla-build\buildbotve\scripts\python.exe 2.5.0) starting up.
2011-09-27 11:25:48-0700 [-] reactor class: twisted.internet.selectreactor.SelectReactor.
2011-09-27 11:25:48-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x015BDEB8>
2011-09-27 11:25:48-0700 [-] Connecting to buildbot-master14.build.scl1.mozilla.com:9101
2011-09-27 11:25:48-0700 [Broker,client] message from master: attached                   
2011-09-27 11:25:48-0700 [Broker,client] SlaveBuilder.remote_print(WINNT 5.2 try leak test build): message from master: attached
2011-09-27 11:25:48-0700 [Broker,client] SlaveBuilder.remote_print(WINNT 5.2 Mobile Desktop try build): message from master: attached             
2011-09-27 11:25:48-0700 [Broker,client] SlaveBuilder.remote_print(WINNT 5.2 try build): message from master: attached     
2011-09-27 11:25:48-0700 [Broker,client] Connected to buildbot-master14.build.scl1.mozilla.com:9101; slave is ready
2011-09-27 11:25:50-0700 [-] Received SIGBREAK, shutting down.      
2011-09-27 11:25:50-0700 [Broker,client] lost remote                 
2011-09-27 11:25:50-0700 [Broker,client] lost remote
2011-09-27 11:25:50-0700 [Broker,client] lost remote
2011-09-27 11:25:50-0700 [Broker,client] Lost connection to buildbot-master14.build.scl1.mozilla.com:9101
2011-09-27 11:25:50-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x015BDEB8>
2011-09-27 11:25:50-0700 [-] Main loop terminated.                               
2011-09-27 11:25:50-0700 [-] Server Shut Down                            
2011-09-27 11:25:50-0700 [-] Server Shut Down.

[2] Full master's log:
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34] BuildSlave.detached(w32-ix-slave08)
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34] <Build WINNT 5.2 try leak test build>.lostRemote
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34]  stopping currentStep <buildbotcustom.steps.misc.DisconnectStep instance at 0x180276c8> 
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34] releaseLocks(<buildbotcustom.steps.misc.DisconnectStep instance at 0x180276c8>): []
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34]  step 'maybe_rebooting' complete: success
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34]  <Build WINNT 5.2 try leak test build>: build finished
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34]  setting expectations for next time
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34] new expectations: 12126.7029839 seconds
2011-09-27 11:24:02-0700 [-] Sorted 0 builders in 0.02s
2011-09-27 11:24:11-0700 [-] Pulse <0x16a60878>: Processed 3 events (0 heartbeats) in 0.00 seconds
2011-09-27 11:24:13-0700 [-] Pulse <0x16fdc9e0>: Processed 2 events (0 heartbeats) in 0.00 seconds
2011-09-27 11:24:25-0700 [-] Sorted 0 builders in 0.02s
2011-09-27 11:25:28-0700 [-] Sorted 0 builders in 2.87s
2011-09-27 11:26:01-0700 [Broker,1073,10.12.48.34] Got slaveinfo from 'w32-ix-slave08'
2011-09-27 11:26:01-0700 [Broker,1073,10.12.48.34] bot attached
2011-09-27 11:26:01-0700 [-] Sorted 1 builders in 0.02s
2011-09-27 11:26:01-0700 [-] expiring old connection
2011-09-27 11:26:01-0700 [-] adbapi closing: MySQLdb
...
## 2 seconds later
...
2011-09-27 11:26:03-0700 [Broker,1073,10.12.48.34] BuildSlave.detached(w32-ix-slave08)
I am going to close this bug as I can't really redistribute the slaves any further.
We need more slaves. The DL machines will eventually come in bug 675338.
I have filed bug 690775 to have a report to help determine the distribution of slaves between pools.


(This is a comment I composed on Monday but did not have time to review/publish until now).
We had a high load on Monday because the merge day happened on Tuesday.

I gathered some info/conclusions:
* we might have the correct distribution of slaves to obtain similar wait times on both pools (73.28% vs 76.42%) (not sure if this is what we want)
** ratio of w32 build/try jobs -> 2.13
** ratio of w32 slaves 34/29 -> 1.17
** ratio of build jobs per w32 slave -> 7.71
** ratio of try jobs per w32 slave -> 6.47
* the load on Monday was quite a bit higher than usual, but we should be able to handle this and more.

Monday Sept. 26th:
* 135 pushes - 71 pushes on try (53%) - 10-25% higher load than the days I used in comment 7
* High of 15 pushes
* 13 hours with 6 or more pushes
* Wait: 1541/92.34% (buildpool) - ~20% higher
** win2k3: 262 jobs - 73.28% - 219jbs-09/06, 220jbs-09/07, 153jbs-09/20 - ~20% higher
* Wait: 694/85.01% (trybuildpool) - ~20% higher
** win2k3: 123 jobs - 76.42% - 89jbs-09/06, 104jbs-09/07, 140jbs-09/20 - 38%,18%,-13% variance
* Buildpool/Trypool ratio = 2.22 (which is similar to normal days)
* https://build.mozilla.org/buildapi/reports/pushes?starttime=1317020400&endtime=1317106800&int_size=3600
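
A quick arithmetic check of the ratios above (a sketch only, using the figures quoted in this comment; not an official report):

# Quick check of the pool-balance figures quoted above.
build_jobs, try_jobs = 262, 123      # win2k3 jobs on Monday Sep 26
build_slaves, try_slaves = 34, 29    # w32 slave counts from slavealloc

print("build/try job ratio:   %.2f" % (build_jobs / float(try_jobs)))      # ~2.13
print("build/try slave ratio: %.2f" % (build_slaves / float(try_slaves)))  # ~1.17
print("build jobs per slave:  %.2f" % (build_jobs / float(build_slaves)))  # ~7.71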

(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #7)
> FTR we had 110 pushes on Tuesday with a high of 12 and with 10 hours of the
> day over 6 or more pushes/hour:
> https://build.mozilla.org/buildapi/reports/
> pushes?int_size=3600&endtime=1316588400&starttime=1316502000
> Wait: 893/96.19% (buildpool)
> Wait: 798/85.96% (trybuildpool) 
> 
> In contrast 2 weeks ago we had:
> * Tuesday - 123 pushes. A high of 11. 9 hours of the day with 6 or more
> pushes/hour.
> https://build.mozilla.org/buildapi/reports/
> pushes?int_size=3600&endtime=1315378800&starttime=1315292400
> Wait: 1261/98.18% (buildpool)
> Wait: 521/96.35% (trybuildpool)
> * Wednesday - 112 pushes. A high of 15. 8 hours of the day with 6 or more
> pushes/hour.
> https://build.mozilla.org/buildapi/reports/
> pushes?int_size=3600&endtime=1315465200&starttime=1315378800
> Wait: 1275/96.31% (buildpool)
> Wait: 574/71.60% (trybuildpool) <- linux and w32 had bad wait times that day
>  
> With this analysis I wanted to say that Tuesday Sep. 20th is similar to a
> normal day wrt to pushes.
> The interesting about this week's Tuesday is the ratio of buildpool jobs vs
> trypool jobs:
> 09/20 - 1.11
> 09/07 - 2.42
> 09/06 - 2.22
> 
> * From the try pool 798/85.96% (trybuildpool) 
> 
> win2k3: 140
>   0:      108    77.14%
>  15:       19    13.57%
>  30:        9     6.43%
>  45:        4     2.86%
> 
> 
> * From the build pool 893/96.19% (buildpool)
> 
> win2k3: 153
>   0:      131    85.62%
>  15:       14     9.15%
>  30:        6     3.92%
>  45:        2     1.31%
Blocks: 675338
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Summary: Rebalance win32/linux slaves between buildpool and trybuildpool → Rebalance win32 slaves between buildpool and trybuildpool
Product: mozilla.org → Release Engineering