Closed
Bug 794987
Opened 12 years ago
Closed 12 years ago
WinXP test pending count is very high (and unusually, much greater than Win7)
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Unassigned)
References
Details
(Whiteboard: [buildduty])
Filing at request of #build.
Pending test(s) @ Sep 27 09:15:02
linux (17)
  13 mozilla-inbound
  4 mozilla-central
win7x64 (17)
  17 mozilla-central
winxp (16)
  10 mozilla-central
  4 mozilla-inbound
  2 fx-team

Pending test(s) @ Sep 27 09:15:02
linux (44)
  44 try
winxp (1038)
  1038 try
Whilst Windows is the platform on which we generally have the highest pending test counts, recently the pending counts have been more reasonable - and even when the rate of checkins is high, we normally see both Win7 and WinXP elevated, not just WinXP. This leads me to think there is something else going on.
Graph:
http://cl.ly/JlQP
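(For illustration only, not the actual reporting tool: a minimal Python sketch of how a per-platform, per-branch summary like the one above could be produced. The (platform, branch) tuple input shape is an assumption.)

from collections import Counter, defaultdict

def summarize(pending_jobs):
    # pending_jobs: iterable of (platform, branch) pairs -- hypothetical input shape
    per_platform = defaultdict(Counter)
    for platform, branch in pending_jobs:
        per_platform[platform][branch] += 1
    for platform, branches in sorted(per_platform.items()):
        print("%s (%d)" % (platform, sum(branches.values())))
        for branch, count in branches.most_common():
            print("  %d %s" % (count, branch))

# e.g. summarize([("winxp", "try")] * 1038 + [("linux", "try")] * 44)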
Comment 1•12 years ago
I currently see 63 jobs running.
According to slavealloc, out of 88 production slaves we have 77 that can take jobs (11 are out of action because of bug 794248).
talos-r3-xp-048 has a dead drive.
I rebooted talos-r3-xp-065 which had not taken jobs for 5 days (09-21).
I will also put talos-r3-xp-091 and 092 back to the pool. They were being used in bug 794248.
Whiteboard: [buildduty]
Comment 2•12 years ago
(In reply to Armen Zambrano G. [:armenzg] from comment #1)
> I rebooted talos-r3-xp-065 which had not taken jobs for 5 days (09-21).
> I will also put talos-r3-xp-091 and 092 back to the pool. They were being
> used in bug 794248.
>
This means we actually had 74 capable production slaves (3 fewer than I said above).
Adding those 3 back should bring us to 77 capable slaves.
I now see 66 jobs running. Are the remaining 11 slaves in the middle of rebooting?
talos-r3-xp-080 and talos-r3-xp-035 seem not to have taken any jobs, according to:
http://build.mozilla.org/builds/last-job-per-slave.html
Comment 3•12 years ago
I've rescued 2 slaves that were legitimately hung so far: 016 (OPSI) and 035 (shutdown hang). Still iterating through the rest.
Comment 4•12 years ago
talos-r3-xp-080 needs a reboot.
I don't see that the XP jobs are taking any longer than usual:
http://brasstacks.mozilla.com/gofaster/#/executiontime/test
FTR the masters seem to be taking forever to load.
cpu_wio is not looking good and the CPU load has spiked a few times:
http://cl.ly/Jkxd
My gut feeling is that we should double the number of Windows masters and spread the slaves across them.
88 XP slaves + 94 Win7 slaves + 5 win764 slaves = 187 production slaves
This means we have ~62 slaves per master.
We would need data to prove that jobs are not being scheduled as quickly as they should be.
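(A quick back-of-the-envelope check of the numbers above, as a Python sketch. The figure of 3 Windows test masters is my inference from 187 slaves at ~62 per master, not a number stated in this bug.)

total = 88 + 94 + 5            # 187 production slaves, as listed above
masters = 3                    # assumed: 187 slaves / ~62 per master implies ~3 masters
print(total // masters)        # ~62 slaves per master today
print(total // (masters * 2))  # ~31 slaves per master if we doubled the masters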
Depends on: 794965
Comment 5•12 years ago
From bug 794248 we managed to get 4 healthy slaves out of re-imaging.
I was trying to put them into production but I noticed that OPSI has lost its state.
It seems that the "talos-r3-xp-ref" key got removed from pckeys on production-opsi (it does get recreated), but its state was reset to blank.
I can't put those slaves back into the pool without adding the right packages.
talos-r3-xp-0{85,91,92,93}
Depends on: 794248
Comment 6•12 years ago
I have set those slaves up with OPSI as well as talos-r3-xp-094.
According to slavealloc we should have a max capacity of 88 slaves.
The following are sick (10 slaves):
* talos-r3-xp-048 - dead drive
* talos-r3-xp-063
* talos-r3-xp-079
* talos-r3-xp-080 - reboot needed
* talos-r3-xp-081
* talos-r3-xp-082
* talos-r3-xp-084
* talos-r3-xp-086
* talos-r3-xp-088
* talos-r3-xp-095 - reboot needed
arr will try to re-image slaves from bug 794248 and see if any come out right.
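(For the record, a tiny Python sketch of the capacity math implied by the list above; this is not output from slavealloc.)

max_capacity = 88                # per slavealloc
sick = [
    "talos-r3-xp-048", "talos-r3-xp-063", "talos-r3-xp-079", "talos-r3-xp-080",
    "talos-r3-xp-081", "talos-r3-xp-082", "talos-r3-xp-084", "talos-r3-xp-086",
    "talos-r3-xp-088", "talos-r3-xp-095",
]
print(max_capacity - len(sick))  # 78 slaves currently able to take jobs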
Comment 7•12 years ago
arr and I just put these last 3 slaves into the pool:
* talos-r3-xp-0{63,79,82}
Comment 8•12 years ago
I have put these slaves into the pool:
* talos-r3-xp-0{81,88}
Three more might come out of bug 794248, but for now there is nothing left to do besides moving away from the minis.
In any case, the pending count dropped back to normal levels around 10pm last night.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering