Closed Bug 1027101 Opened 6 years ago Closed 6 years ago
Windows 8 test jobs pending on Try for coming up to 3 hours
Filed after jandem asked on IRC. Whilst the answer may just be "Windows tests can't run on AWS and we're at lower capacity due to datacenter moves and also losing 9 machines due to bug 1026870", I feel it's useful to track. CCing Tara since this affects developer productivity. The pending Try win 8 test jobs go as far back as: https://tbpl.mozilla.org/?tree=Try&rev=a01775c25cd2&jobname=winnt The builds for that push completed at ~13:27 UTC+1, and 95% of the test jobs are still pending ~170 mins later.
16:13 <philor> 1026870 is build, not tests, but I think it's roughly the same number of missing slaves, because slaverebooter didn't get around to rebooting windows so I just did them an hour ago for things that had been idle-busted up to 18 hours
16:14 <philor> it's probably stuck pointlessly rebooting the 112 tegras most of which are actually busted and should be disabled
16:23 <philor> edmorley|sheriffduty: don't remember, but I think it was either 9 or 11 Win8s I rebooted, it'll probably start to catch up as long as the other trees don't pick up load too quickly today
haven't heard any complaints today and slave health has 0 pending
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
It has 749 pending. 5.25 hour backlog. 3 slaves out of action from driver upgrade bustage for 50+ days. 3 slaves out of action from resolution bustage which probably followed driver upgrade bustage, for 20+ days. 1 slave which was loaned for 219 days, and has sat unreclaimed despite a "reclaiming" claim for another 50+ days. 0 pending at this time of a weekday simply will not happen, that's not an option. However, when we don't have 8 slaves out of action for months at a time, having Win8 be no worse than the other Windows versions, with 300-400 pending and a 3 hour backlog, is a possibility.
I was going on https://bugzilla.mozilla.org/show_bug.cgi?id=1027101#c2 which said build. And a 3 hour backlog on Try is also not surprising.
A 3 hour backlog *now* would not be surprising, no. If we had a 3 hour backlog at the end of the US day, I'd say we were doing as well as could be expected. A 3 hour backlog at 8am when this was filed is surprising, and was a sign of bustage: between not rebooting slaves that don't have buildbot running and doing hundreds of tegras first and then not getting to Win8, slaverebooter was not successfully rebooting Win8 slaves, and that abnormal backlog at the start of the day was the result of that - knock out another 8 or 10 from the already depleted pool, and we'll get overnight backlog. I ignored manual rebooting over this past weekend, and there were only a couple busted this morning, so I assume the latter half of the rebooting problem is now solved (by manually disabling sixty or so tegras).
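The starvation described above, where a rebooter works through one large pool first and never reaches a smaller one, can be illustrated with a minimal sketch. This is not slaverebooter's actual code: the pool names, the counts, the per-pass budget, and both helper functions are hypothetical, chosen only to show how interleaving pools gives each one some attention in every pass.

```python
from itertools import zip_longest


def sequential(pools, budget):
    """Process pools one after another until the per-pass budget runs out."""
    order = [slave for pool in pools.values() for slave in pool]
    return order[:budget]


def round_robin(pools, budget):
    """Take one slave from each pool in turn, so no pool is starved."""
    interleaved = [
        slave
        for group in zip_longest(*pools.values())
        for slave in group
        if slave is not None
    ]
    return interleaved[:budget]


# Hypothetical pools: many tegras queued ahead of a handful of Win8 slaves.
pools = {
    "tegra": [f"tegra-{i:03d}" for i in range(112)],
    "win8": [f"win8-{i:03d}" for i in range(10)],
}

budget = 40  # hypothetical number of reboots that fit in one pass

seq = sequential(pools, budget)
rr = round_robin(pools, budget)

# Count how many Win8 slaves each strategy reaches within the budget.
win8_seq = sum(name.startswith("win8") for name in seq)
win8_rr = sum(name.startswith("win8") for name in rr)
print(win8_seq, win8_rr)
```

With these made-up numbers the sequential pass spends its whole budget on tegras, while the interleaved pass reaches every Win8 slave. Disabling the busted tegras, as was done manually here, shrinks the large pool and has a similar effect.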
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard