Closed
Bug 681534
Opened 13 years ago
Closed 13 years ago
Rebalance win32 slaves between buildpool and trybuildpool
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: joduinn, Assigned: armenzg)
References
Details
(Whiteboard: [buildslaves])
Attachments
(2 files)
* patch, 19.95 KB - coop: review+; armenzg: checked-in+
* patch, 886 bytes - Callek: review+; coop: review+
Now that all those win32 slaves are back online, wait times look much better. Very cool. However, looking at today's wait times report, it seems like we are not optimally balanced yet:

(buildpool) win2k3: 209
  0:   142  67.94%
  15:   15   7.18%
  30:   14   6.70%
  45:   12   5.74%
  60:   12   5.74%
  75:    0   0.00%
  90+:  14   6.70%

(trybuildpool) win2k3: 127
  0:   125  98.43%
  15:    2   1.57%

If this keeps happening, it may make sense to rebalance our slave pools by moving some win32 slaves back from trybuildpool to buildpool. Let's watch this for another day or two to see if this pattern is consistent before doing anything. Filing to track.
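For the record, the percentage column in these wait-time reports is just each bucket's share of the pool's total jobs. A minimal sketch reproducing the buildpool column above from the raw bucket counts:

```python
# Reproduce the percentage column of a wait-times report from raw bucket counts.
# Bucket labels are minutes waited before a job started; counts are from the
# buildpool win2k3 report above.
buckets = {"0": 142, "15": 15, "30": 14, "45": 12, "60": 12, "75": 0, "90+": 14}
total = sum(buckets.values())  # 209 jobs in the buildpool win2k3 report

for label, count in buckets.items():
    print(f"{label:>4}: {count:>4} {100.0 * count / total:6.2f}%")

# Share of jobs that started immediately -- the headline health number.
healthy = 100.0 * buckets["0"] / total
print(f"jobs started immediately: {healthy:.2f}%")  # 67.94%
```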
Comment 1 (Assignee) • 13 years ago
There are several things that we need to do as well:
* see how many win32 slaves are ready for setup, and set them up
* see how many win32 slaves have not been rebooted from the previous pass
Comment 2 (Reporter) • 13 years ago
On 24aug2011, 6:03am:
(buildpool) win2k3: 168
  0:   144  85.71%
  15:   10   5.95%
  30:   10   5.95%
  45:    4   2.38%
(trybuildpool) win2k3: 119
  0:   119 100.00%

On 25aug2011, 6:03am:
(buildpool) win2k3: 177
  0:   154  87.01%
  15:    4   2.26%
  30:    2   1.13%
  45:    8   4.52%
  60:    7   3.95%
  75:    2   1.13%
(trybuildpool) win2k3: 108
  0:   108 100.00%
Whiteboard: [buildslaves]
Comment 3 (Reporter) • 13 years ago
Based on comment #0 and comment #2, I think we have enough win32 slaves to keep up on trybuildpool, but we need more win32 slaves on buildpool. We could improve buildpool by any/all of:
* migrating some machines from trybuildpool to buildpool
* finding and fixing any offline win32 machines, and adding them to buildpool
Comment 4 (Assignee) • 13 years ago
I did #2. I addressed most slaves in bug 680494. I will add 2 more slaves in bug 682083 and probably move 3 more from the trypool.
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
Priority: -- → P2
Comment 5 (Assignee) • 13 years ago
I have added 7 slaves in bug 682408. According to slavealloc we now have 34 build slaves and 29 try slaves: http://slavealloc.build.mozilla.org/ui/#silos Let's see tomorrow's wait times and close this.
Comment 6 (Assignee) • 13 years ago
I added the slaves midday. Let's see wait times for tomorrow.

Jobs submitted between Tue, 06 Sep 2011 00:00:00 -0700 (PDT) and Wed, 07 Sep 2011 00:00:00 -0700 (PDT):

* From buildpool:
  win2k3: 219
    0:   199  90.87%
    15:    7   3.20%
    30:    4   1.83%
    45:    9   4.11%
* From trypool:
  win2k3: 89
    0:    89 100.00%
Comment 7 (Assignee) • 13 years ago
We also have wait times for Linux slaves on try. I will do an analysis of which slaves are under maintenance tomorrow.

This is our current distribution (from the silos page):

            buildpool  trypool
  win2ksp3     34        29
  linux IX     54         7
  linux VM     42        33

FTR we had 110 pushes on Tuesday, with a high of 12 and with 10 hours of the day at 6 or more pushes/hour:
https://build.mozilla.org/buildapi/reports/pushes?int_size=3600&endtime=1316588400&starttime=1316502000
  Wait: 893/96.19% (buildpool)
  Wait: 798/85.96% (trybuildpool)

In contrast, 2 weeks ago we had:
* Tuesday - 123 pushes. A high of 11. 9 hours of the day with 6 or more pushes/hour.
  https://build.mozilla.org/buildapi/reports/pushes?int_size=3600&endtime=1315378800&starttime=1315292400
  Wait: 1261/98.18% (buildpool)
  Wait: 521/96.35% (trybuildpool)
* Wednesday - 112 pushes. A high of 15. 8 hours of the day with 6 or more pushes/hour.
  https://build.mozilla.org/buildapi/reports/pushes?int_size=3600&endtime=1315465200&starttime=1315378800
  Wait: 1275/96.31% (buildpool)
  Wait: 574/71.60% (trybuildpool) <- linux and w32 had bad wait times that day

With this analysis I wanted to show that Tuesday, Sep. 20th is similar to a normal day with respect to pushes. The interesting thing about this week's Tuesday is the ratio of buildpool jobs vs. trypool jobs:
  09/20 - 1.11
  09/07 - 2.42
  09/06 - 2.22

This could mean more clobber time due to try jobs. I think I need a view that shows me the number of build jobs vs. try jobs, broken down per platform over time, with a correlation to active slaves and the number of pushes, besides the ratio and perhaps averages of each job type. Oh dear.

* From the try pool - 798/85.96% (trybuildpool)

  linux: 349
    0:   269  77.08%
    15:   26   7.45%
    30:   27   7.74%
    45:   14   4.01%
    60:   13   3.72%

  win2k3: 140
    0:   108  77.14%
    15:   19  13.57%
    30:    9   6.43%
    45:    4   2.86%

* From the build pool - 893/96.19% (buildpool)

  linux: 354
    0:   351  99.15%
    15:    3   0.85%

  win2k3: 153
    0:   131  85.62%
    15:   14   9.15%
    30:    6   3.92%
    45:    2   1.31%
Summary: Rebalance win32 slaves in buildpool and trybuildpool? → Rebalance win32/linux slaves between buildpool and trybuildpool
Comment 8 (Assignee) • 13 years ago
Long story short: we now have another 6 win32 slaves that needed to be rebooted. I hope we can analyze the wait times on Friday (yesterday there was a downtime).

> This is our current distribution (from silos page):
>             buildpool  trypool
> win2ksp3       34        29    = 63 prod IXs
> linux IX       54         7    = 61 prod IXs
> linux VM       42        33    = 75 prod VMs

In slavealloc I see (just double-checking the silos page info):
* 67 w32 slaves - 4 preprod & 63 prod (4 in maintenance [1] & 6 MIA [2])
* 65 centos5 iX slaves - 4 preprod & 61 prod (1 in maintenance & 0 MIA)

6 w32 slaves were out of the pool, unnoticed and unhandled. If idleizer had been there and the buildbot-start check enabled, we would have caught all of these issues. Checking http://build.mozilla.org/builds/last-job-per-slave.txt would also have shown it.

I have rebooted all 6 of these slaves. I have also rebooted any slaves pointing to bm05 & bm02.

[1] bug 682574, bug 684349, bug 673972 & bug 684374

[2]
w32-ix-slave31 - 20 days - no note - connected to bm5 (nonexistent master) - rebooted & back
w32-ix-slave34 - 15 days - no note - connected to bm5 (nonexistent master)
w32-ix-slave26 - 14 days - no note - undetermined reasons - see [3]
w32-ix-slave41 - 11 days - no note - undetermined reasons - see [3]
w32-ix-slave28 -  7 days - no note - undetermined reasons - see [3]
w32-ix-slave29 -  5 days - no note - absolutely no idea from twistd.log

[3]
2011-09-10 10:47:30-0700 [Broker,client] Connected to buildbot-master13.build.scl1.mozilla.com:9001; slave is ready
2011-09-10 10:47:46-0700 [-] Received SIGBREAK, shutting down.
2011-09-10 10:47:46-0700 [Broker,client] lost remote
...
2011-09-10 10:47:46-0700 [Broker,client] lost remote
2011-09-10 10:47:46-0700 [Broker,client] Lost connection to buildbot-master13.build.scl1.mozilla.com:9001
2011-09-10 10:47:46-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x015BD260>
2011-09-10 10:47:46-0700 [-] Main loop terminated.
2011-09-10 10:47:46-0700 [-] Server Shut Down.

2011-09-14 12:34:21-0700 [-] Received SIGBREAK, shutting down.
2011-09-14 12:34:21-0700 [Broker,client] Lost connection to buildbot-master12.build.scl1.mozilla.com:9001
2011-09-14 12:34:21-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x0167C850>
2011-09-14 12:34:21-0700 [-] Main loop terminated.
2011-09-14 12:34:21-0700 [-] Server Shut Down.
2011-09-14 12:34:21-0700 [-] Server Shut Down.

2011-09-13 18:02:01-0700 [Broker,client] Connected to buildbot-master13.build.scl1.mozilla.com:9001; slave is ready
2011-09-14 01:02:01-0700 [-] I feel very idle and was thinking of rebooting as soon as the buildmaster says it's OK
2011-09-14 01:02:01-0700 [-] Telling the master we want to shutdown after any running builds are finished
2011-09-14 01:02:01-0700 [Broker,client] Master does not support slave initiated shutdown. Upgrade master to 0.8.3 or later to use this feature.
2011-09-14 01:02:01-0700 [Broker,client] rebooting NOW, since the master won't talk to us
2011-09-14 01:02:01-0700 [Broker,client] Invoking platform-specific reboot command
2011-09-10 10:47:46-0700 [-] Server Shut Down.
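The MIA slaves in [2] were only noticed by eyeballing how long each one had sat idle. A hedged sketch of the kind of check that would flag them automatically; the two-column report format here (slave name plus last-job date) is an assumption for illustration, not the actual layout of last-job-per-slave.txt:

```python
from datetime import datetime, timedelta

# Hypothetical report lines: "<slave> <date the last job finished>".
REPORT = """\
w32-ix-slave31 2011-09-03
w32-ix-slave34 2011-09-08
w32-ix-slave29 2011-09-18
w32-ix-slave99 2011-09-22
"""

def mia_slaves(report, now, max_idle_days=3):
    """Return (slave, idle_days) for slaves idle longer than max_idle_days."""
    mia = []
    for line in report.splitlines():
        name, date = line.split()
        last_job = datetime.strptime(date, "%Y-%m-%d")
        idle = now - last_job
        if idle > timedelta(days=max_idle_days):
            mia.append((name, idle.days))
    return mia

now = datetime(2011, 9, 23)
for name, days in mia_slaves(REPORT, now):
    print(f"{name}: idle for {days} days - needs a reboot or a bug filed")
```

With the sample data, slave31/34/29 are flagged (20, 15 and 5 days idle) while slave99, which worked yesterday, is not.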
Comment 9 (Assignee) • 13 years ago
FTR the w32 slaves under maintenance are these:
* w32-ix-slave03 - bug 682574
* w32-ix-slave05 - bug 684349
* w32-ix-slave06 - bug 673972
* w32-ix-slave35 - bug 684374

I had to use IPMI (RDP/ssh failed) to reach w32-ix-slave28 (which I rebooted yesterday), and I have rebooted it again today. Yesterday it didn't even manage to connect (see log below):

2011-09-14 12:34:21-0700 [-] Server Shut Down.
2011-09-23 07:51:54-0700 [-] Log opened.

I had to reboot w32-ix-slave41 (again) with the same error as in comment 8 (bm13-build1). The slave had taken one job yesterday, which finished at Sep 22 12:23:12 2011, and it lost the connection at 12:25:14-0700 (2 minutes later). I have rebooted it again today.

Both are now taking jobs.
Comment 10 (Assignee) • 13 years ago
Long story short:
* I had missed some slaves to reboot
* I have found an OPSI issue with buildbot-startup, which I am debugging

## OPSI issue

I see a bunch of slaves without the newer buildbot.bat [1], if not all of them. This newer buildbot.bat waits an extra 30 seconds so that a SIGBREAK sent too early won't break the rebooting process. The package on OPSI is marked as installed and the version says 2.2, which is the right one, but with no effect. There is something very, very funky in here.

I marked slave w32-ix-slave10 to "setup" the buildbot-startup package, but it did not deploy the newer buildbot.bat after a reboot. After I attempted to install this package I see this:
> Failed to load application: No module named buildslave.bot

Differences between the machine and what is in the repos:
* C:\runslave.py - the same
* D:\mozilla-build\start-buildbot.bat - the same
* c:\documents and settings\cltbld\start menu\programs\startup\buildbot.bat - _different_
* D:\mozilla-build\start-buildbot.sh - not needed anymore - nonexistent on the slave

Then I noticed that production-opsi had local changes (see ~/opsi-package-sources/buildbot-startup/CLIENT_DATA/buildbot.bat.old or ~/armenzg_buildbot.bat). I have done the following to bring us up to date:
* hg revert buildbot-startup/CLIENT_DATA/buildbot.bat
* ./sync-binaries
* su -c './regenerate-package buildbot-startup'

This would seem to work, right? Well, not really. sync-binaries copies the files from ~/opsi-binaries onto ~/opsi-package-sources, which overwrites the newer copy of buildbot.bat from hg with the older one from CVS [4].

w32-ix-slave10 now has the newer version of buildbot.bat. This doesn't fix the buildslave.bot error, though. I will keep investigating.

## Missed slaves

I also missed these MIA slaves, because last-job-per-slave.html has a block of buildpool slaves and a block of trypool slaves, which I completely missed:

* w32-ix-slave02 - (7 days - bm14-try1)
* w32-ix-slave07 - [3] Can't stop reactor (21 days - bm05, which is nonexistent)
** No module named buildslave.bot
* w32-ix-slave08 - lost connection like in comment 8 (7 days - bm14-try1)
** No module named buildslave.bot
* w32-ix-slave20 - twistd.log does not say much besides failing to reboot (15 days - bm03-trybuilder)
** after rebooting, nothing new is written to twistd.log
* w32-ix-slave23 - lost connection like in comment 8 (13 days - bm14-try1)

[1] http://mxr.mozilla.org/build/source/opsi-package-sources/buildbot-startup/CLIENT_DATA/buildbot.bat
[2] http://hg.mozilla.org/build/opsi-package-sources/rev/fbb827acf9a0
[3] twisted.internet.error.ReactorNotRunning: Can't stop reactor that isn't running.
[4]
cltbld@staging-opsi:~/opsi-binaries$ cvs remove buildbot-startup
cvs remove: Removing buildbot-startup
cvs remove: file `buildbot-startup/buildbot.bat' still in working directory
cvs remove: 1 file exists; remove it first
cltbld@staging-opsi:~/opsi-binaries$ cvs remove buildbot-startup/buildbot.bat
cvs remove: file `buildbot-startup/buildbot.bat' still in working directory
cvs remove: 1 file exists; remove it first
cltbld@staging-opsi:~/opsi-binaries$ rm buildbot-startup/buildbot.bat
cltbld@staging-opsi:~/opsi-binaries$ cvs remove buildbot-startup/buildbot.bat
cvs remove: scheduling `buildbot-startup/buildbot.bat' for removal
cvs remove: use 'cvs commit' to remove this file permanently
cltbld@staging-opsi:~/opsi-binaries$ cvs commit -m "Bug 681534. Remove old copy of buildbot.bat which overwrites the newer buildbot.bat from opsi-package-sources. r=bustage"
Removing buildbot-startup/buildbot.bat;
/mofo/opsi-binaries/buildbot-startup/buildbot.bat,v <-- buildbot.bat
new revision: delete; previous revision: 1.3
done
Comment 11 (Assignee) • 13 years ago
Long story short: we need this patch to install the latest version of runslave.py, since the current package does not do the job correctly.

I tried marking "buildbot-startup" for setup on slaves 04, 07, 08 & 11, which had the issue, but it did not deploy the newer runslave.py. (To reproduce, check for buildbotve in C:\runslave.py and you will see that it is missing. Also, running D:\mozilla-build\start-buildbot.bat would fail.) This package fixes the issue.

To fix a slave locally, all you have to do is run this through ssh:

runas /user:administrator "D:\mozilla-build\wget\wget -OC:\runslave.py http://hg.mozilla.org/build/puppet-manifests/raw-file/61c0b7a13f40/modules/buildslave/files/runslave.py"
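Applying that one-off fix to several slaves is easy to script. A hedged sketch of such a loop; the slave list and the cltbld ssh user are assumptions for illustration, and it prints the commands (dry run) rather than executing them by default:

```python
import subprocess

# Hypothetical list of slaves that still have the stale C:\runslave.py.
SLAVES = ["w32-ix-slave04", "w32-ix-slave07", "w32-ix-slave08", "w32-ix-slave11"]

# The fix command quoted in the comment above, run on the slave via ssh.
FIX_CMD = (
    'runas /user:administrator "D:\\mozilla-build\\wget\\wget -OC:\\runslave.py '
    'http://hg.mozilla.org/build/puppet-manifests/raw-file/61c0b7a13f40'
    '/modules/buildslave/files/runslave.py"'
)

def fix_slave(slave, dry_run=True):
    """Push the fixed runslave.py to one slave; dry_run only prints the command."""
    cmd = ["ssh", f"cltbld@{slave}", FIX_CMD]
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.call(cmd)

for slave in SLAVES:
    fix_slave(slave, dry_run=True)
```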
Attachment #562144 - Flags: review?(coop)
Comment 12 • 13 years ago
Comment on attachment 562144 [details] [diff] [review]
[opsi] buildbot-startup - install the latest runslave.py correctly

Review of attachment 562144 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #562144 - Flags: review?(coop) → review+
Comment 13 (Assignee) • 13 years ago
I will deploy this on Monday, so I have all day ahead of me. We will measure the wait times on Tuesday. I am happy that we have so many machines back, that we know the core issue (old runslave.py and old buildbot.bat), and that we have a fix :D
Comment 14 • 13 years ago
Comment on attachment 562144 [details] [diff] [review]
[opsi] buildbot-startup - install the latest runslave.py correctly

>+++ b/buildbot-startup/CLIENT_DATA/runslave.py
>@@ -0,0 +1,483 @@
>+#!/usr/bin/python
>+
>+# MOZILLA DEPLOYMENT NOTES
>+# - This file is distributed to all buildslaves by Puppet, placed at
>+#   /usr/local/bin/runslave.py on POSIX systems (linux, darwin) and
>+#   C:\runslave.py on Windows systems
>+# - It lives in the 'buildslave' puppet module

Nit: this comment is no longer accurate. I suggest we modify it to say that OPSI does a static import of ($rev$) from puppet-manifests... and note (in puppet-manifests) that there is a copy of this file in OPSI as well, so that we don't get bitten by someone editing this expecting puppet to change, or vice versa.
Comment 15 (Assignee) • 13 years ago
Comment on attachment 562144 [details] [diff] [review]
[opsi] buildbot-startup - install the latest runslave.py correctly

Good point, Callek. I have adjusted the comment. I will follow up with the comment fixup on puppet-manifests.

Checked in as: http://hg.mozilla.org/build/opsi-package-sources/rev/0bcbdf4f0466

I have marked w32-ix-slave[07-14] to get this newer version of runslave.py and buildbot.bat. I have verified that the slaves start properly. I will look at how the builds they run over the day go. We could deploy this system-wide in a few days.
Attachment #562144 - Flags: checked-in+
Comment 16 (Assignee) • 13 years ago
Attachment #562421 - Flags: review?(coop)
Attachment #562421 - Flags: review?(bugspam.Callek)
Comment 17 • 13 years ago
Comment on attachment 562421 [details] [diff] [review]
[puppet] comment adjustment for runslave.py

I'm happy with this; we should be sure to copy the comment over to puppet-manifests as well, imho.
Attachment #562421 - Flags: review?(bugspam.Callek) → review+
Comment 18 (Assignee) • 13 years ago
Long story short:
* I have deployed the OPSI package to the 8 slaves in comment 15
** these slaves should be less likely to fail to reboot and get hung
* I have added another 3 slaves into the pool
* We will check the wait times tomorrow

########

Besides the w32 slaves in maintenance from comment 9, I have rebooted these slaves:
* w32-ix-slave20 - 18 days - bm03-trybuilder (how did I miss this one? :S)
** missing twistd.py [1]
* w32-ix-slave30 - 3 days - bm12-build1
** missing twistd.py [1]
* w32-ix-slave34 - 3 days - bm12-build1

NOTE: I left alone slaves that have been idle for 2 days, since we just had a weekend.

To fix the twistd.py issue I reinstalled buildbot on those two machines (a simple reboot did not fix it). I don't know how slave30's setup could have gotten messed up if it had been taking jobs up to 3 days ago.

We will check the wait times tomorrow.

[1] This shows up in the CMD window:
D:\mozilla-build\buildbotve\scripts\python.exe: can't open file 'D:\mozilla-build\buildbotve\scripts\twistd.py': [Errno 2] No such file or directory
Comment 19 (Assignee) • 13 years ago
(In reply to Justin Wood (:Callek) from comment #17)
> Comment on attachment 562421 [details] [diff] [review]
> [puppet] comment adjustment for runslave.py
>
> I'm happy with this, we should be sure to copy the comment over to
> puppet-manifests as well, imho.

You just reviewed the puppet-manifests one. I already deployed this change into the OPSI repo.
Updated • 13 years ago
Attachment #562421 - Flags: review?(coop) → review+
Comment 20 (Assignee) • 13 years ago
I have deployed the newer buildbot.bat to all w32 slaves, as I had not seen any fallout.

I deployed the newer buildbot.bat to w32-ix-slave08 on Monday and it got into a hung state. This is different from the previous case: it is not that start-buildbot.bat died too early, but that after rebooting, the machine ran runslave.py and immediately exited. I am trying to figure out who sent that SIGBREAK (the master or the slave) and why.

2011-09-27 11:23:49-0700 - count_and_reboot.py is called
2011-09-27 11:24:02      - the last job finished (from the master's logs)
2011-09-27 11:25:42      - "About to run runslave.py" (the slave has come back from a reboot)
2011-09-27 11:25:48-0700 - Log opened.
2011-09-27 11:25:50-0700 - Received SIGBREAK, shutting down.
2011-09-27 11:25:50      - "runslave.py finished"

[1] Full log:
2011-09-27 11:25:48-0700 [-] Log opened.
2011-09-27 11:25:48-0700 [-] twistd 10.2.0 (D:\mozilla-build\buildbotve\scripts\python.exe 2.5.0) starting up.
2011-09-27 11:25:48-0700 [-] reactor class: twisted.internet.selectreactor.SelectReactor.
2011-09-27 11:25:48-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x015BDEB8>
2011-09-27 11:25:48-0700 [-] Connecting to buildbot-master14.build.scl1.mozilla.com:9101
2011-09-27 11:25:48-0700 [Broker,client] message from master: attached
2011-09-27 11:25:48-0700 [Broker,client] SlaveBuilder.remote_print(WINNT 5.2 try leak test build): message from master: attached
2011-09-27 11:25:48-0700 [Broker,client] SlaveBuilder.remote_print(WINNT 5.2 Mobile Desktop try build): message from master: attached
2011-09-27 11:25:48-0700 [Broker,client] SlaveBuilder.remote_print(WINNT 5.2 try build): message from master: attached
2011-09-27 11:25:48-0700 [Broker,client] Connected to buildbot-master14.build.scl1.mozilla.com:9101; slave is ready
2011-09-27 11:25:50-0700 [-] Received SIGBREAK, shutting down.
2011-09-27 11:25:50-0700 [Broker,client] lost remote
2011-09-27 11:25:50-0700 [Broker,client] lost remote
2011-09-27 11:25:50-0700 [Broker,client] lost remote
2011-09-27 11:25:50-0700 [Broker,client] Lost connection to buildbot-master14.build.scl1.mozilla.com:9101
2011-09-27 11:25:50-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x015BDEB8>
2011-09-27 11:25:50-0700 [-] Main loop terminated.
2011-09-27 11:25:50-0700 [-] Server Shut Down
2011-09-27 11:25:50-0700 [-] Server Shut Down.

[2] Full master's log:
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34] BuildSlave.detached(w32-ix-slave08)
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34] <Build WINNT 5.2 try leak test build>.lostRemote
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34] stopping currentStep <buildbotcustom.steps.misc.DisconnectStep instance at 0x180276c8>
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34] releaseLocks(<buildbotcustom.steps.misc.DisconnectStep instance at 0x180276c8>): []
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34] step 'maybe_rebooting' complete: success
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34] <Build WINNT 5.2 try leak test build>: build finished
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34] setting expectations for next time
2011-09-27 11:24:02-0700 [Broker,1029,10.12.48.34] new expectations: 12126.7029839 seconds
2011-09-27 11:24:02-0700 [-] Sorted 0 builders in 0.02s
2011-09-27 11:24:11-0700 [-] Pulse <0x16a60878>: Processed 3 events (0 heartbeats) in 0.00 seconds
2011-09-27 11:24:13-0700 [-] Pulse <0x16fdc9e0>: Processed 2 events (0 heartbeats) in 0.00 seconds
2011-09-27 11:24:25-0700 [-] Sorted 0 builders in 0.02s
2011-09-27 11:25:28-0700 [-] Sorted 0 builders in 2.87s
2011-09-27 11:26:01-0700 [Broker,1073,10.12.48.34] Got slaveinfo from 'w32-ix-slave08'
2011-09-27 11:26:01-0700 [Broker,1073,10.12.48.34] bot attached
2011-09-27 11:26:01-0700 [-] Sorted 1 builders in 0.02s
2011-09-27 11:26:01-0700 [-] expiring old connection
2011-09-27 11:26:01-0700 [-] adbapi closing: MySQLdb
...
## 2 seconds later ...
2011-09-27 11:26:03-0700 [Broker,1073,10.12.48.34] BuildSlave.detached(w32-ix-slave08)
Comment 21 (Assignee) • 13 years ago
I am going to close this bug, as I can't really redistribute the slaves any further. We need more slaves. The DL machines will eventually come in bug 675338. I have filed bug 690775 to have a report to help determine the distribution of slaves between pools.

(This is a comment I composed on Monday but did not have time to review/publish until now.)

We had a high load on Monday because the merge day happened on Tuesday. I gathered some info/conclusions:
* we might have the correct distribution of slaves to obtain similar wait times on both pools (73.28% vs 76.42%) (not sure if this is what we want)
** ratio of w32 build/try jobs -> 2.13
** ratio of w32 slaves 34/29 -> 1.17
** build jobs per w32 slave -> 7.71
** try jobs per w32 slave -> 6.47
* Monday's load was quite a bit higher than usual, but we should be able to handle this and more.

Monday, Sept. 26th:
* 135 pushes - 71 pushes on try (53%) - 10-25% higher load than the days I used in comment 7
* High of 15 pushes
* 13 hours with 6 or more pushes
* Wait: 1541/92.34% (buildpool) - ~20% higher
** win2k3: 262 jobs - 73.28% - vs. 219 jobs on 09/06, 220 jobs on 09/07, 153 jobs on 09/20 - ~20% higher
* Wait: 694/85.01% (trybuildpool) - ~20% higher
** win2k3: 123 jobs - 76.42% - vs. 89 jobs on 09/06, 104 jobs on 09/07, 140 jobs on 09/20 - 38%, 18%, -13% variance
* Buildpool/trypool ratio = 2.22 (which is similar to normal days)
* https://build.mozilla.org/buildapi/reports/pushes?starttime=1317020400&endtime=1317106800&int_size=3600

(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #7)
> FTR we had 110 pushes on Tuesday with a high of 12 and with 10 hours of the
> day over 6 or more pushes/hour:
> https://build.mozilla.org/buildapi/reports/pushes?int_size=3600&endtime=1316588400&starttime=1316502000
> Wait: 893/96.19% (buildpool)
> Wait: 798/85.96% (trybuildpool)
>
> In contrast 2 weeks ago we had:
> * Tuesday - 123 pushes. A high of 11. 9 hours of the day with 6 or more
> pushes/hour.
> https://build.mozilla.org/buildapi/reports/pushes?int_size=3600&endtime=1315378800&starttime=1315292400
> Wait: 1261/98.18% (buildpool)
> Wait: 521/96.35% (trybuildpool)
> * Wednesday - 112 pushes. A high of 15. 8 hours of the day with 6 or more
> pushes/hour.
> https://build.mozilla.org/buildapi/reports/pushes?int_size=3600&endtime=1315465200&starttime=1315378800
> Wait: 1275/96.31% (buildpool)
> Wait: 574/71.60% (trybuildpool) <- linux and w32 had bad wait times that day
>
> With this analysis I wanted to say that Tuesday Sep. 20th is similar to a
> normal day wrt to pushes.
> The interesting about this week's Tuesday is the ratio of buildpool jobs vs
> trypool jobs:
> 09/20 - 1.11
> 09/07 - 2.42
> 09/06 - 2.22
>
> * From the try pool 798/85.96% (trybuildpool)
>
> win2k3: 140
> 0: 108 77.14%
> 15: 19 13.57%
> 30: 9 6.43%
> 45: 4 2.86%
>
> * From the build pool 893/96.19% (buildpool)
>
> win2k3: 153
> 0: 131 85.62%
> 15: 14 9.15%
> 30: 6 3.92%
> 45: 2 1.31%
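Most of the ratios in this closing analysis are plain divisions over the Sept. 26 job and slave counts, and three of them can be reproduced directly. A small sketch of the arithmetic; note that the try jobs-per-slave figure quoted above (6.47) does not follow from 123/29, so the job total behind that one line was presumably different, and the sketch only shows the computation:

```python
# Sept. 26 numbers from this comment: 262 build jobs on 34 w32 build slaves,
# 123 try jobs on 29 w32 try slaves.
build_jobs, build_slaves = 262, 34
try_jobs, try_slaves = 123, 29

job_ratio = build_jobs / try_jobs            # w32 build/try job ratio
slave_ratio = build_slaves / try_slaves      # w32 slave ratio between pools
build_per_slave = build_jobs / build_slaves  # build jobs per w32 slave
try_per_slave = try_jobs / try_slaves        # try jobs per w32 slave

print(f"job ratio:        {job_ratio:.2f}")        # 2.13, matches the comment
print(f"slave ratio:      {slave_ratio:.2f}")      # 1.17, matches the comment
print(f"build jobs/slave: {build_per_slave:.2f}")  # 7.71, matches the comment
print(f"try jobs/slave:   {try_per_slave:.2f}")    # does not match the quoted 6.47
```

The takeaway is the same as the comment's: the job ratio (2.13) is well above the slave ratio (1.17), so the build pool carries more jobs per slave than the try pool.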
Blocks: 675338
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Summary: Rebalance win32/linux slaves between buildpool and trybuildpool → Rebalance win32 slaves between buildpool and trybuildpool
Updated • 11 years ago
Product: mozilla.org → Release Engineering