Closed
Bug 794987
Opened 12 years ago
Closed 12 years ago
WinXP test pending count is very high (and unusually, much greater than Win7)
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Unassigned)
References
Details
(Whiteboard: [buildduty])
Filing at request of #build.
Pending test(s) @ Sep 27 09:15:02
linux (17)
  13 mozilla-inbound
  4 mozilla-central
win7x64 (17)
  17 mozilla-central
winxp (16)
  10 mozilla-central
  4 mozilla-inbound
  2 fx-team

Pending test(s) @ Sep 27 09:15:02
linux (44)
  44 try
winxp (1038)
  1038 try
Whilst Windows is the platform on which we generally have the highest pending test counts, recently the pending counts have been more reasonable - and even when the rate of checkins is high, we normally see both Win7 and WinXP elevated, not just WinXP. This leads me to think there is something else going on.
Graph:
http://cl.ly/JlQP
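(For illustration only, not the actual reporting tool: a minimal Python sketch of how a per-platform, per-branch summary like the one above could be produced. The (platform, branch) tuple input shape is an assumption.)

from collections import Counter, defaultdict

def summarize(pending_jobs):
    # pending_jobs: iterable of (platform, branch) pairs -- hypothetical input shape
    per_platform = defaultdict(Counter)
    for platform, branch in pending_jobs:
        per_platform[platform][branch] += 1
    for platform, branches in sorted(per_platform.items()):
        print("%s (%d)" % (platform, sum(branches.values())))
        for branch, count in branches.most_common():
            print("  %d %s" % (count, branch))

# e.g. summarize([("winxp", "try")] * 1038 + [("linux", "try")] * 44)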
Comment 1•12 years ago
I currently see 63 jobs running.
According to slavealloc, out of 88 production slaves we have 77 that can take jobs (11 are out of action because of bug 794248).
talos-r3-xp-048 has a dead drive.
I rebooted talos-r3-xp-065 which had not taken jobs for 5 days (09-21).
I will also put talos-r3-xp-091 and 092 back to the pool. They were being used in bug 794248.
Whiteboard: [buildduty]
Comment 2•12 years ago
(In reply to Armen Zambrano G. [:armenzg] from comment #1)
> I rebooted talos-r3-xp-065 which had not taken jobs for 5 days (09-21).
> I will also put talos-r3-xp-091 and 092 back to the pool. They were being
> used in bug 794248.
>
This means we actually had 74 capable production slaves (3 fewer than I said above).
Adding those 3 back should bring us to 77 capable slaves.
I now see 66 jobs running. Are the remaining 11 slaves in the middle of rebooting?
talos-r3-xp-080 and talos-r3-xp-035 seem not to have taken any jobs, according to:
http://build.mozilla.org/builds/last-job-per-slave.html
Comment 3•12 years ago
I've rescued 2 slaves that were legitimately hung so far: 016 (OPSI) and 035 (shutdown hang). Still iterating through the rest.
Comment 4•12 years ago
talos-r3-xp-080 needs a reboot.
I don't see that the XP jobs are taking any longer than usual:
http://brasstacks.mozilla.com/gofaster/#/executiontime/test
FTR the masters seem to be taking forever to load.
cpu_wio is not looking good and the CPU load has spiked a few times:
http://cl.ly/Jkxd
My gut feeling is that we should double the number of Windows masters and spread the slaves across them.
88 XP slaves + 94 Win7 slaves + 5 win764 slaves = 187 production slaves
This means we have ~62 slaves per master.
We would need data to prove that jobs are not being scheduled as quickly as they should be.
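(A quick back-of-the-envelope check of the numbers above, as a Python sketch. The figure of 3 Windows test masters is my inference from 187 slaves at ~62 per master, not a number stated in this bug.)

total = 88 + 94 + 5            # 187 production slaves, as listed above
masters = 3                    # assumed: 187 slaves / ~62 per master implies ~3 masters
print(total // masters)        # ~62 slaves per master today
print(total // (masters * 2))  # ~31 slaves per master if we doubled the masters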
Depends on: 794965
Comment 5•12 years ago
From bug 794248 we managed to get 4 healthy slaves out of re-imaging.
I was trying to put them into production but I noticed that OPSI has lost its state.
It seems that the "talos-r3-xp-ref" key got removed from pckeys on production-opsi (it does get recreated), but its state was reset to blank.
I can't put those slaves back into the pool without adding the right packages.
talos-r3-xp-0{85,91,92,93}
Depends on: 794248
Comment 6•12 years ago
I have set those slaves up with OPSI as well as talos-r3-xp-094.
According to slavealloc we should have a max capacity of 88 slaves.
The following are sick (10 slaves):
* talos-r3-xp-048 - dead drive
* talos-r3-xp-063
* talos-r3-xp-079
* talos-r3-xp-080 - reboot needed
* talos-r3-xp-081
* talos-r3-xp-082
* talos-r3-xp-084
* talos-r3-xp-086
* talos-r3-xp-088
* talos-r3-xp-095 - reboot needed
arr will try to re-image slaves from bug 794248 and see if any come out right.
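(For the record, a tiny Python sketch of the capacity math implied by the list above; this is not output from slavealloc.)

max_capacity = 88                # per slavealloc
sick = [
    "talos-r3-xp-048", "talos-r3-xp-063", "talos-r3-xp-079", "talos-r3-xp-080",
    "talos-r3-xp-081", "talos-r3-xp-082", "talos-r3-xp-084", "talos-r3-xp-086",
    "talos-r3-xp-088", "talos-r3-xp-095",
]
print(max_capacity - len(sick))  # 78 slaves currently able to take jobs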
Comment 7•12 years ago
arr and I just put these last 3 slaves into the pool:
* talos-r3-xp-0{63,79,82}
Comment 8•12 years ago
I have put these slaves into the pool:
* talos-r3-xp-0{81,88}
Three more might come out of bug 794248, but for now there is nothing left to do besides moving away from the minis.
In any case, the pending count dropped back to normal levels around 10pm last night.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering