Closed
Bug 1361603
Opened 7 years ago
Closed 7 years ago
Trees closed: most Win8 slaves not running jobs
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Unassigned)
Details
Nagios alerted about 6K pending Win8 jobs, which is far too many, so I looked at https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=t-w864-ix. It shows around 30 slaves with a reasonable time since their last job; the rest have been idle for between 3 and 5 hours. Closed all non-try Firefox trees.
Comment 1•7 years ago
Rebooting machines doesn't bring them back online, e.g. t-w864-ix-325. It's as if they never call runslave (i.e. nothing in c:\slave has a modified time after the reboot). In the Event Viewer I found these entries, all Level "Information", Source "GroupPolicy (Microsoft-Windows-GroupPolicy)", Task Category "None":

  5/2/2017 8:11:37 PM   Event ID 1501
  5/2/2017 7:27:40 PM   Event ID 1500
  5/2/2017 5:49:39 PM   Event ID 1500
  5/2/2017 4:11:39 PM   Event ID 1502
  5/2/2017 2:33:30 PM   Event ID 1502
  5/2/2017 2:33:13 PM   Event ID 1501
  5/2/2017 2:19:53 PM   Event ID 1502
  5/1/2017 6:36:33 AM   Event ID 1500

At 5/2/2017 2:19:53 PM, 2:33:30 PM, and 4:11:39 PM (PDT) the message was:

  "The Group Policy settings for the computer were processed successfully. New settings from 41 Group Policy objects were detected and applied."

Otherwise it was:

  "The Group Policy settings for the computer were processed successfully. There were no changes detected since the last successful processing of Group Policy."

The first message hadn't appeared since April 14th, so I think we had some GPO changes today (possibly related to bug 1358307).
Flags: needinfo?(q)
Comment 2•7 years ago
<Q> I had every machine recreate the scheduled task and close locks on the runslave log
<Q> I found a bunch of hosts that could open the log and stopped
<Q> they seem to come back after reboot now
<Q> Tried 2 and they both worked

I'm rebooting the hosts; it seems to take a couple of reboots to get them connected back to a buildbot master.
Flags: needinfo?(q)
Comment 3•7 years ago
Everything that was stuck on slavehealth as of an hour ago has had at least one reboot scheduled. I'll check back in an hour or so.
Comment 4•7 years ago
Rebooted another 6 which needed a second go, and t-w864-ix-322 manually because t-w864-ix-322.build.mozilla.org doesn't exist in DNS (Alin is going to file that separately). Backlog is clearing nicely with the pool running again, over to Tomcat.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 5•7 years ago
Trees reopened at 1am Pacific.
Nothing that I can find other than the runlogs being locked.
Flags: needinfo?(q)
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard