Closed Bug 696749 Opened 13 years ago Closed 13 years ago

Having bad wait times

Categories

(Release Engineering :: General, defect)

Hardware: x86
OS: macOS
Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Unassigned)

Details

Only win7, winxp and the tegras did well.
There is something fishy here for buildduty to look into.

-------- Original Message --------
Subject: Wait: 14324/59.08% (testpool)
Date: Sat, 22 Oct 2011 06:08:11 -0700
From: nobody@cruncher.build.mozilla.org
To: dev-tree-management@lists.mozilla.org
Newsgroups: mozilla.dev.tree-management

Wait time report for testpool for jobs submitted between Fri, 21 Oct 2011 00:00:00 -0700 (PDT) and Sat, 22 Oct 2011 00:00:00 -0700 (PDT)

Total Jobs: 14324

Wait Times
  0:     8462    59.08%
 15:     2208    15.41%
 30:      955     6.67%
 45:      627     4.38%
 60:      478     3.34%
 75:      245     1.71%
 90+:     1349     9.42%

Platform break down

fedora: 2191
  0:      863    39.39%
 15:      361    16.48%
 30:      179     8.17%
 45:      130     5.93%
 60:      114     5.20%
 75:       84     3.83%
 90+:      460    20.99%


fedora64: 1922
  0:      834    43.39%
 15:      271    14.10%
 30:      135     7.02%
 45:       98     5.10%
 60:       96     4.99%
 75:       56     2.91%
 90+:      432    22.48%


leopard: 1714
  0:      809    47.20%
 15:      418    24.39%
 30:      138     8.05%
 45:      108     6.30%
 60:       51     2.98%
 75:       33     1.93%
 90+:      157     9.16%


snowleopard: 1719
  0:      795    46.25%
 15:      408    23.73%
 30:      164     9.54%
 45:      114     6.63%
 60:       95     5.53%
 75:       21     1.22%
 90+:      122     7.10%


snowleopard-r4: 1702
  0:      654    38.43%
 15:      407    23.91%
 30:      192    11.28%
 45:      136     7.99%
 60:      110     6.46%
 75:       28     1.65%
 90+:      175    10.28%


tegra: 1208
  0:     1208   100.00%


win7: 1991
  0:     1670    83.88%
 15:      176     8.84%
 30:       97     4.87%
 45:       32     1.61%
 60:        6     0.30%
 75:        7     0.35%
 90+:        3     0.15%


xp: 1877
  0:     1629    86.79%
 15:      167     8.90%
 30:       50     2.66%
 45:        9     0.48%
 60:        6     0.32%
 75:       16     0.85%


The number on the left is how many minutes a build waited to start, rounded down.

Builds with no changes (usually nightly builds): 0.

Rebuilds and forced rebuilds were excluded from the statistics.


Current backlog: http://build.mozilla.org/builds/pending/index.html

Generated at Sat, 22 Oct 2011 06:08:11 -0700 (PDT). All times are Mountain View, CA (US/Pacific).
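
For reference, the binning in the report above is just each job's wait in minutes rounded down to the nearest 15, with anything at 90 minutes or more lumped into "90+". A minimal Python sketch of that bucketing (an illustration only, not the actual cruncher script) looks something like this:

from collections import Counter

# Wait-time bins used in the report: 0, 15, 30, 45, 60 and 75 minutes,
# plus a "90+" overflow bin. A job's bin is its wait rounded down to
# the nearest 15 minutes.
BIN_LABELS = ["0", "15", "30", "45", "60", "75", "90+"]

def bucket_wait_times(wait_minutes):
    """Count per-job wait times (in minutes) into the report's bins."""
    counts = Counter()
    for wait in wait_minutes:
        if wait >= 90:
            counts["90+"] += 1
        else:
            counts[str(15 * (int(wait) // 15))] += 1
    return counts

def print_report(counts):
    total = sum(counts.values())
    for label in BIN_LABELS:
        if counts[label]:
            print("%4s: %8d %9.2f%%" % (label, counts[label], 100.0 * counts[label] / total))

# e.g. three jobs that started right away, one after 20 min, one after 2 hours
print_report(bucket_wait_times([0, 3, 7, 20, 120]))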
I think this is related to having an extra 77 rev4 slaves on the non-windows test masters. They were pretty sluggish when I was interacting with them last week, so I suspect we're getting back into the regime where there is a significant delay between the end of a step and the slave getting the next one. This goes away when the r3-snow slaves are disabled.
(In reply to Nick Thomas [:nthomas] from comment #1)
> I think this is related to having an extra 77 rev4 slaves on the non-windows
> test masters. They were pretty sluggish when I was interacting with them
> last week, so I suspect we're getting back into the regime where there is
> a significant delay between the end of a step and the slave getting the
> next one. This goes away when the r3-snow slaves are disabled.

The plan is to turn them off on Wednesday. Let's see if this rebounds once the 10.6-r3 jobs finish going through.
Nevertheless, those rev3 machines will be repurposed later on and will add the problem back, no?

Shall we request adding a few more test masters?
(In reply to Armen Zambrano G. [:armenzg] - Gone Wed. 26th and back Mon. 31st from comment #3)
> Nevertheless, those rev3 machines will be repurposed later on and will add
> the problem back, no?
> 
> Shall we request adding a few more test masters?

Ugh, yes, I'll file the bug for that. We'll also be adding 80 10.7 machines.

Using some rough math, we will have 405 Unix testers. We were OK with 244 slaves running on 3 masters, which is roughly 80 slaves per master.

Since we are adding 160 more slaves, let's add 2 more masters.
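
Spelling out that rough math as a quick sketch (the 244 and 405 come from the grep counts below; treat it as back-of-the-envelope, not an exact capacity model):

import math

current_slaves  = 244   # existing talos-r3 unix testers (see grep below)
current_masters = 3     # masters that handled them comfortably
planned_slaves  = 405   # all talos-r* unix testers once the new machines land

per_master     = current_slaves / float(current_masters)      # ~81 slaves per master
masters_needed = int(math.ceil(planned_slaves / per_master))  # 5 masters total
extra_masters  = masters_needed - current_masters             # 2 more masters

print(per_master, masters_needed, extra_masters)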


~/puppet-manifests $ grep talos-r staging.pp *production.pp | cut -f2 -d " " | sort -u | wc -l
405

~/puppet-manifests $ grep talos-r3 staging.pp *production.pp | cut -f2 -d " " | sort -u | wc -l
244

(testing)~/mozilla/testing/buildbot-configs $ ./setup-master.py -l ../tools/buildfarm/maintenance/production-masters.json -R tests
bm04-tests1
bm06-tests1
bm11-tests1
bm15-tests1-windows
bm16-tests1-windows
test-master01
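
FWIW, the -R filter above just picks masters of a given role out of production-masters.json. A hypothetical sketch of the same filtering, assuming the file is a JSON list of entries with "name" and "role" keys (an assumption on my part, not necessarily the real schema):

import json

def masters_with_role(path, role):
    # Assumed schema: a JSON list of master entries, each carrying "name"
    # and "role" keys. This is a guess for illustration, not necessarily
    # the format production-masters.json actually uses.
    with open(path) as f:
        masters = json.load(f)
    return sorted(m["name"] for m in masters if m.get("role") == role)

for name in masters_with_role("production-masters.json", "tests"):
    print(name)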
The wait times are much better after adding 2 Linux-only test masters. Do you think this bug can be resolved?
The only thing left is to check which XP slaves need to be rebooted. Otherwise the wait times look good.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering