Bug 1027101 - Windows 8 test jobs pending on Try for coming up to 3 hours
Product: Infrastructure & Operations
Classification: Other
Component: CIDuty
: unspecified
: x86 Windows 8.1
-- normal
: ---
Assigned To: Nobody; OK to take it and work on it
: Justin Wood (:Callek)
: Jordan Lund (:jlund)
Depends on: 990157 1004813
Reported: 2014-06-18 08:14 PDT by Ed Morley [:emorley]
Modified: 2018-05-08 15:23 PDT


Description Ed Morley [:emorley] 2014-06-18 08:14:22 PDT
Filed after jandem asked on IRC.

Whilst the answer may just be "Windows tests can't run on IRC and we're at lower capacity due to datacenter moves and also losing 9 machines due to bug 1026870", I feel it's useful to track. CCing Tara since this affects developer productivity.

The pending Try win 8 test jobs go as far back as:

The builds for that push completed at ~13:27 UTC+1, and 95% of them are still pending ~170 mins later.
Comment 1 Ed Morley [:emorley] 2014-06-18 08:17:11 PDT
s/IRC/AWS yet/
Comment 2 Ed Morley [:emorley] 2014-06-18 08:24:15 PDT
16:13 <philor> 1026870 is build, not tests, but I think it's roughly the same number of missing slaves, because slaverebooter didn't get around to rebooting windows so I just did them an hour ago for things that had been idle-busted up to 18 hours
16:14 <philor> it's probably stuck pointlessly rebooting the 112 tegras most of which are actually busted and should be disabled
16:23 <philor> edmorley|sheriffduty: don't remember, but I think it was either 9 or 11 Win8s I rebooted, it'll probably start to catch up as long as the other trees don't pick up load too quickly today
Comment 3 Justin Wood (:Callek) 2014-06-23 16:35:29 PDT
haven't heard any complaints today and slave health has 0 pending
Comment 4 Phil Ringnalda (:philor) 2014-06-23 17:09:58 PDT
It has 749 pending.

5.25 hour backlog.

3 slaves out of action from driver upgrade bustage for 50+ days.

3 slaves out of action from resolution bustage which probably followed driver upgrade bustage, for 20+ days.

1 slave which was loaned for 219 days, and has sat unreclaimed despite a "reclaiming" claim for another 50+ days.

0 pending at this time of a weekday simply will not happen, that's not an option. However, when we don't have 8 slaves out of action for months at a time, having Win8 be no worse than the other Windows versions, with 300-400 pending and a 3 hour backlog, is a possibility.
Comment 5 Justin Wood (:Callek) 2014-06-23 17:20:08 PDT
I was going on which said build.

And a 3 hour backlog on Try is also not surprising.
Comment 6 Phil Ringnalda (:philor) 2014-06-23 17:31:13 PDT
A 3 hour backlog *now* would not be surprising, no. If we had a 3 hour backlog at the end of the US day, I'd say we were doing as well as could be expected.

A 3 hour backlog at 8am, when this was filed, is surprising, and was a sign of bustage. Between not rebooting slaves that don't have buildbot running, and doing hundreds of tegras first and then not getting to Win8, slaverebooter was not successfully rebooting Win8 slaves, and the abnormal backlog at the start of the day was the result: knock out another 8 or 10 from the already depleted pool, and we'll get overnight backlog.

I ignored manual rebooting over this past weekend, and there were only a couple busted this morning, so I assume the latter half of the rebooting problem is now solved (by manually disabling sixty or so tegras).
