Trees closed: most Win8 slaves not running jobs

Status: RESOLVED FIXED
Product: Infrastructure & Operations
Component: CIDuty
Priority: --
Severity: blocker
Opened: a year ago
Last modified: 2 months ago

Reporter: philor
Assignee: Unassigned


Description (Reporter, a year ago)
Nagios alerted about 6K pending Win8 jobs, which is far too many, so I looked at https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=t-w864-ix which showed around 30 slaves with a reasonable time since their last job, and the rest idle for between 3 and 5 hours.

Closed all non-try Firefox trees.
Rebooting machines doesn't bring them back online, eg t-w864-ix-325. It's like they never call runslave (ie nothing in c:\slave has a modified time after the reboot). I looked in the event viewer and found:

Information	5/2/2017 8:11:37 PM	GroupPolicy (Microsoft-Windows-GroupPolicy)	1501	None
Information	5/2/2017 7:27:40 PM	GroupPolicy (Microsoft-Windows-GroupPolicy)	1500	None
Information	5/2/2017 5:49:39 PM	GroupPolicy (Microsoft-Windows-GroupPolicy)	1500	None
Information	5/2/2017 4:11:39 PM	GroupPolicy (Microsoft-Windows-GroupPolicy)	1502	None
Information	5/2/2017 2:33:30 PM	GroupPolicy (Microsoft-Windows-GroupPolicy)	1502	None
Information	5/2/2017 2:33:13 PM	GroupPolicy (Microsoft-Windows-GroupPolicy)	1501	None
Information	5/2/2017 2:19:53 PM	GroupPolicy (Microsoft-Windows-GroupPolicy)	1502	None
Information	5/1/2017 6:36:33 AM	GroupPolicy (Microsoft-Windows-GroupPolicy)	1500	None

At 5/2/2017 2:19:53 PM, 2:33:30 PM, and 4:11:39 PM (PDT) the message was:
The Group Policy settings for the computer were processed successfully. New settings from 41 Group Policy objects were detected and applied.

otherwise:
The Group Policy settings for the computer were processed successfully. There were no changes detected since the last successful processing of Group Policy.

The first message hasn't been seen since April 14th, so I think we had some GPO changes today (possibly related to bug 1358307).
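For reference, the "new settings applied" vs. "no changes" pattern above can be picked out mechanically. This is a hypothetical sketch (not tooling we actually ran) that filters the tab-separated event-viewer lines pasted above for event ID 1502, which in this log lined up with the "New settings from 41 Group Policy objects were detected and applied" message:

```python
# Sketch: find GroupPolicy events where new GPOs were applied.
# Fields are the tab-separated "Level / Time / Source / EventID / Task"
# layout from the event-viewer paste above.
EVENTS = """\
Information\t5/2/2017 8:11:37 PM\tGroupPolicy (Microsoft-Windows-GroupPolicy)\t1501\tNone
Information\t5/2/2017 7:27:40 PM\tGroupPolicy (Microsoft-Windows-GroupPolicy)\t1500\tNone
Information\t5/2/2017 5:49:39 PM\tGroupPolicy (Microsoft-Windows-GroupPolicy)\t1500\tNone
Information\t5/2/2017 4:11:39 PM\tGroupPolicy (Microsoft-Windows-GroupPolicy)\t1502\tNone
Information\t5/2/2017 2:33:30 PM\tGroupPolicy (Microsoft-Windows-GroupPolicy)\t1502\tNone
Information\t5/2/2017 2:33:13 PM\tGroupPolicy (Microsoft-Windows-GroupPolicy)\t1501\tNone
Information\t5/2/2017 2:19:53 PM\tGroupPolicy (Microsoft-Windows-GroupPolicy)\t1502\tNone
Information\t5/1/2017 6:36:33 AM\tGroupPolicy (Microsoft-Windows-GroupPolicy)\t1500\tNone
"""

def gpo_change_times(raw):
    """Return timestamps of events where new GPO settings were applied."""
    times = []
    for line in raw.splitlines():
        level, stamp, source, event_id, task = line.split("\t")
        if event_id == "1502":  # the ID that matched the "applied" message here
            times.append(stamp)
    return times

print(gpo_change_times(EVENTS))
```

This matches the three timestamps called out above (2:19:53 PM, 2:33:30 PM, 4:11:39 PM).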
Flags: needinfo?(q)
<Q> I had every machine recreate the scheduled task and close locks on the runslave log
<Q> I found a bunch of hosts that couldn't open the log and had stopped
<Q> they seem to come back after reboot now
<Q> Tried 2 and they both worked

I'm rebooting the hosts, it seems to take a couple of reboots to get them connected back to a buildbot master.
Flags: needinfo?(q)
Everything (which was stuck as of an hour ago on slavehealth) has had at least one reboot scheduled. I'll check back again in an hour or so.
Rebooted another 6 which needed a second go, and t-w864-ix-322 manually because t-w864-ix-322.build.mozilla.org doesn't exist in DNS (Alin is going to file that separately).
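The "stuck slave" symptom from comment 0 (nothing in c:\slave modified after the reboot) can be scripted as a health check. A minimal sketch, assuming only a directory path and a boot timestamp as inputs; on the real t-w864-ix hosts the root would be c:\slave and the timestamp the last boot time:

```python
import os

def touched_since(root, since_epoch):
    """Return True if any file under `root` was modified after `since_epoch`.

    Hypothetical sketch of the check from comment 0: if nothing under the
    slave directory has a modified time after the reboot, runslave probably
    never ran.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                if os.path.getmtime(os.path.join(dirpath, name)) > since_epoch:
                    return True
            except OSError:
                pass  # file vanished mid-walk; ignore it
    return False
```

A host where this returns False well after its reboot is a candidate for another reboot (or for the locked-runslave-log fix described above).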

Backlog is clearing nicely with the pool running again, over to Tomcat.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Trees reopen at 1am Pacific.
Q, do you have any thoughts about the root cause here?
Flags: needinfo?(q)

Comment 7

a year ago
Nothing that I can find other than the runlogs being locked.
Flags: needinfo?(q)

Updated

2 months ago
Product: Release Engineering → Infrastructure & Operations