Bug 794987 (Closed) · Opened 12 years ago · Closed 12 years ago

WinXP test pending count is very high (and unusually very much greater than Win7)

Categories

(Release Engineering :: General, defect)

Platform: x86 Windows XP
Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

Status: RESOLVED FIXED

People

(Reporter: emorley, Unassigned)

References

Details

(Whiteboard: [buildduty])

Filing at request of #build.

Pending test(s) @ Sep 27 09:15:02
  linux (17): 13 mozilla-inbound, 4 mozilla-central
  win7x64 (17): 17 mozilla-central
  winxp (16): 10 mozilla-central, 4 mozilla-inbound, 2 fx-team

Pending test(s) @ Sep 27 09:15:02
  linux (44): 44 try
  winxp (1038): 1038 try

Whilst Windows is the platform on which we generally have the highest pending test counts, recently the pending counts have been more reasonable - and even when the rate of checkins is high, we normally see both Win7 and WinXP elevated, not just WinXP. This leads me to think there is something else going on. Graph: http://cl.ly/JlQP
I currently see 63 jobs running. According to slavealloc, out of 88 production slaves we have 77 that can take jobs (11 are out of action because of bug 794248). talos-r3-xp-048 has a dead drive. I rebooted talos-r3-xp-065, which had not taken jobs for 5 days (since 09-21). I will also put talos-r3-xp-091 and 092 back into the pool; they were being used in bug 794248.
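The gap between available and running slaves is worth spelling out; a minimal sanity check in Python, using only the counts quoted in this comment (the 63 running jobs are a point-in-time snapshot, not a live slavealloc query):

    # Counts taken from the comment above, not queried live.
    production = 88
    out_of_action = 11                       # down for bug 794248
    available = production - out_of_action   # 77 slaves able to take jobs
    running = 63                             # snapshot at time of comment
    unaccounted = available - running        # 14 slaves idle, hung, or rebooting
    print(available, unaccounted)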
Whiteboard: [buildduty]
(In reply to Armen Zambrano G. [:armenzg] from comment #1)
> I rebooted talos-r3-xp-065 which had not taken jobs for 5 days (09-21).
> I will also put talos-r3-xp-091 and 092 back to the pool. They were being
> used in bug 794248.

This means that the capable production slaves were actually 74 (3 fewer than I had said); adding those 3 brings us back to 77 capable slaves. I now see 66 running. Are 11 slaves in rebooting mode? talos-r3-xp-080 and talos-r3-xp-035 seem not to have taken any jobs according to: http://build.mozilla.org/builds/last-job-per-slave.html
I've rescued 2 slaves that were legitimately hung so far: 016 (OPSI) and 035 (shutdown hang). Still iterating through the rest.
talos-r3-xp-080 needs a reboot. I don't see that the XP jobs are taking any longer than usual: http://brasstacks.mozilla.com/gofaster/#/executiontime/test

FTR, the masters seem to be taking forever to load. cpu_wio is not looking good and the CPU load has spiked a few times: http://cl.ly/Jkxd

My gut feeling suggests doubling our number of Windows masters and spreading them around.

88 XP slaves + 94 Win7 slaves + 5 win764 slaves = 187 production slaves. This means we have ~62 slaves per master. We would need data to prove that jobs are not being scheduled as fast as they should be.
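The ~62 figure implies roughly three Windows test masters; a minimal sketch of the arithmetic behind the "double the masters" suggestion (the master count of 3 is inferred from the ratio above, not taken from slavealloc):

    # Capacity math from the comment above.
    xp, win7, win764 = 88, 94, 5
    total = xp + win7 + win764        # 187 production slaves
    masters = 3                       # ASSUMPTION: inferred from 187 / ~62
    print(total / masters)            # ~62.3 slaves per master today
    print(total / (masters * 2))     # ~31 slaves per master if doubled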
Depends on: 794965
From bug 794248 we managed to get 4 healthy slaves out of re-imaging. I was going to put them into production, but I noticed that OPSI has lost state: the "talos-r3-xp-ref" key was removed from pckeys on production-opsi (the key gets recreated, but the state was reset to blank). I can't put those slaves back into the pool without adding the right packages. Affected slaves: talos-r3-xp-0{85,91,92,93}
Depends on: 794248
I have set those slaves up with OPSI, as well as talos-r3-xp-094. According to slavealloc we should have a max capacity of 88 slaves. The following 10 slaves are sick:
* talos-r3-xp-048 - dead drive
* talos-r3-xp-063
* talos-r3-xp-079
* talos-r3-xp-080 - reboot needed
* talos-r3-xp-081
* talos-r3-xp-082
* talos-r3-xp-084
* talos-r3-xp-086
* talos-r3-xp-088
* talos-r3-xp-095 - reboot needed

arr will try to re-image the slaves from bug 794248 and see if any come out right.
arr and I just put these last 3 slaves into the pool:
* talos-r3-xp-0{63,79,82}
I have put these slaves into the pool:
* talos-r3-xp-0{81,88}

There might be 3 more coming out of bug 794248, but for now there is nothing left to do besides moving away from the minis. In any case, the pending count dropped back to normal levels around 10pm last night.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering