Bug 701763 (Closed) - opened 13 years ago, closed 13 years ago

slaves that were PING UP but had not taken jobs

Categories: Release Engineering :: General (defect)
Platform: x86 / All
Priority: Not set
Severity: normal

Tracking: Not tracked
Status: RESOLVED FIXED

People: Reporter: armenzg; Assigned: armenzg

Details

Some of these slaves were attached to bm04 and bm06, which were converted to Mac-only masters, and therefore did not reboot after taking a job.
I have rebooted them and added a note to slave-alloc to check that they start taking jobs.

talos-r3-fed-004
talos-r3-fed-030
talos-r3-fed64-004
talos-r3-fed64-009
talos-r3-fed64-022
talos-r3-fed64-030
talos-r3-fed64-031
talos-r3-fed64-032
talos-r3-fed64-053
talos-r3-snow-018
talos-r3-snow-043
talos-r3-w7-045
talos-r3-w7-048
talos-r3-w7-053
talos-r3-xp-004
talos-r3-xp-005
talos-r3-xp-008
talos-r3-xp-015
talos-r3-xp-021
talos-r3-xp-022
talos-r3-xp-032
talos-r3-xp-043
talos-r3-xp-057
talos-r3-xp-058
talos-r3-xp-061

Any slaves that could have been rebooted but weren't were either on loan or already had a bug on file to set them up after being re-imaged.
Working slaves:

> talos-r3-fed64-004 - working
> talos-r3-fed64-009 - working
> talos-r3-fed64-022 - working
> talos-r3-fed64-032 - working
> talos-r3-snow-018 - working
> talos-r3-snow-043 - working
> talos-r3-xp-043 - working
> talos-r3-xp-058 - working

Further poking:

> talos-r3-fed-004 - never ever connected to any master
> talos-r3-fed-030 - never ever connected to any master
> talos-r3-fed64-031 - "No such resource"
> talos-r3-fed64-053 - "No such resource"
> talos-r3-w7-045 - not connected
> talos-r3-w7-048 - not connected
> talos-r3-w7-053 - not connected
> talos-r3-xp-004 - not connected
> talos-r3-xp-005 - not connected
> talos-r3-xp-008 - not connected
> talos-r3-xp-015 - not connected
> talos-r3-xp-021 - not connected
> talos-r3-xp-022 - not connected
> talos-r3-xp-032 - not connected
> talos-r3-xp-057 - not connected
> talos-r3-xp-061 - job with exception

The Windows slaves not connecting could be related to bm15 and bm16 showing high cpu_wio and taking minutes to load up a slave's page.

talos-r3-fed64-030
talos-r3-fed64-031
talos-r3-fed64-053

I'm working on these 3 slaves over in bug 695580.

I ssh'ed into the XP slaves and tried to reboot them, but they report that a shutdown is already in progress:
C:\>shutdown -f -r -t 0
A system shutdown is in progress.
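
For reference, this is roughly how I issued the reboots across the XP slaves in one go (a sketch only; the ssh user and the exact host list are assumptions, not a transcript of the actual session):

# run from a Linux/Mac admin host; issue the same forced reboot on each XP slave
for host in talos-r3-xp-004 talos-r3-xp-005 talos-r3-xp-008 talos-r3-xp-015 \
            talos-r3-xp-021 talos-r3-xp-022 talos-r3-xp-032 talos-r3-xp-057 \
            talos-r3-xp-061; do
  ssh cltbld@$host 'shutdown -f -r -t 0'   # cltbld user is an assumption
done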

When I RDP in, I can see an "End Program" message. When I click "End Now" it reboots immediately.

This could be related to bug 690232 (I don't have a twistd.log on any of the slaves to check).

I manually RDPed to talos-r3-xp-0{04,05,08,21,22,32,57,61} and hit "End Now" on all of them.
The XP slaves all recovered (slave-alloc notes removed) after helping them reboot through RDP.

I have now rebooted the win7 and fed slaves once more.

I will check each one individually when I have time.

I re-installed buildbot on all three win7 slaves. It seems they had been re-imaged but never dealt with afterwards:
* talos-r3-w7-045
** It was re-imaged in Oct. in bug 695580
** It requires further investigation as it has not yet been able to connect
* talos-r3-w7-048
** I have no reference to it in bugmail, but I know it was down for 76 days.
** It is now connected, but we need to check what it does when it grabs a job.
* talos-r3-w7-053
** It was down for 70 days
** I don't have any references to it
** It has picked up a job; check its results.

The Fedora slaves needed some puppet love [1]: remove certs, clean, sign, and reboot a few times until I got it right. They're both back in the pool.
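
For reference, the rough cert dance for the Fedora slaves (a sketch based on the wiki page in [1]; the FQDN and the exact puppetca/puppetd invocations here are assumptions from memory, not copied from these machines):

# on the puppet master: throw away the slave's old certificate
puppetca --clean talos-r3-fed-004.build.mozilla.org
# on the slave: remove the stale local SSL state so a fresh CSR gets generated
rm -rf /var/lib/puppet/ssl
puppetd --test
# back on the master: sign the new request
puppetca --sign talos-r3-fed-004.build.mozilla.org
# on the slave: reboot and let puppet finish setting it up
reboot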

* talos-r3-fed-004
** The last mention of it is bug 683814, filed to set it back up in August
* talos-r3-fed-030
** no reference in bugmail

NOTE: I fixed the buildbot issue on the w7-ref machine in bug 700729.

[1] https://wiki.mozilla.org/ReleaseEngineering/How_To/Set_Up_a_Freshly_Imaged_Slave#Linux.2FMac_.28puppet.29

talos-r3-w7-045 - it had the wrong computer name

We're done here.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering