Closed
Bug 1019523
Opened 11 years ago
Closed 11 years ago
Large set of t-snow-r4 slaves is disabled (broken in slave-health)
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: sbruno, Assigned: sbruno)
Many t-snow-r4 slaves are in "broken" status in slave_health and have not been taking jobs for a long time: https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=t-snow-r4
This does not seem to be related to the recent Train B move, which involved the following slaves (names shown are the new, post-move names; source: Callek in #releng, https://callek.pastebin.mozilla.org/5327644):
t-snow-r4-0041.test.releng.scl3.mozilla.com
t-snow-r4-0042.test.releng.scl3.mozilla.com
t-snow-r4-0043.test.releng.scl3.mozilla.com
t-snow-r4-0044.test.releng.scl3.mozilla.com
t-snow-r4-0045.test.releng.scl3.mozilla.com
t-snow-r4-0046.test.releng.scl3.mozilla.com
t-snow-r4-0047.test.releng.scl3.mozilla.com
t-snow-r4-0048.test.releng.scl3.mozilla.com
t-snow-r4-0049.test.releng.scl3.mozilla.com
t-snow-r4-0050.test.releng.scl3.mozilla.com
t-snow-r4-0051.test.releng.scl3.mozilla.com
t-snow-r4-0052.test.releng.scl3.mozilla.com
t-snow-r4-0053.test.releng.scl3.mozilla.com
t-snow-r4-0054.test.releng.scl3.mozilla.com
t-snow-r4-0055.test.releng.scl3.mozilla.com
t-snow-r4-0056.test.releng.scl3.mozilla.com
t-snow-r4-0057.test.releng.scl3.mozilla.com
t-snow-r4-0058.test.releng.scl3.mozilla.com
t-snow-r4-0059.test.releng.scl3.mozilla.com
t-snow-r4-0060.test.releng.scl3.mozilla.com
t-snow-r4-0061.test.releng.scl3.mozilla.com
t-snow-r4-0062.test.releng.scl3.mozilla.com
t-snow-r4-0063.test.releng.scl3.mozilla.com
t-snow-r4-0064.test.releng.scl3.mozilla.com
t-snow-r4-0065.test.releng.scl3.mozilla.com
t-snow-r4-0066.test.releng.scl3.mozilla.com
t-snow-r4-0067.test.releng.scl3.mozilla.com
t-snow-r4-0068.test.releng.scl3.mozilla.com
t-snow-r4-0069.test.releng.scl3.mozilla.com
t-snow-r4-0070.test.releng.scl3.mozilla.com
t-snow-r4-0071.test.releng.scl3.mozilla.com
t-snow-r4-0072.test.releng.scl3.mozilla.com
t-snow-r4-0073.test.releng.scl3.mozilla.com
t-snow-r4-0074.test.releng.scl3.mozilla.com
t-snow-r4-0075.test.releng.scl3.mozilla.com
t-snow-r4-0076.test.releng.scl3.mozilla.com
t-snow-r4-0077.test.releng.scl3.mozilla.com
t-snow-r4-0078.test.releng.scl3.mozilla.com
t-snow-r4-0079.test.releng.scl3.mozilla.com
t-snow-r4-0080.test.releng.scl3.mozilla.com
t-snow-r4-0081.test.releng.scl3.mozilla.com
t-snow-r4-0082.test.releng.scl3.mozilla.com
t-snow-r4-0083.test.releng.scl3.mozilla.com
t-snow-r4-0084.test.releng.scl3.mozilla.com
For example, t-snow-r4-0002 is not in this list, yet it has not taken any jobs for more than 20 hours (at the time of filing).
Theories to be verified (source: Callek in #releng):
(*) slaves disconnected mid-job by some network blip about 13 hours ago (at the time of writing), so they were never rebooted
(*) slaverebooter somehow not trying to reboot them, contrary to Callek's understanding of how it works
(*) slaveapi itself being wedged
(*) slaveapi not having network flows to the new machines or their PDUs
Assignee
Comment 1•11 years ago
These seem to be related to some puppet installation issues after the 3.6.1 upgrade; see https://bugzilla.mozilla.org/show_bug.cgi?id=986599#c34
Assignee
Comment 2•11 years ago
Now that bug 986599 is fixed, I rebooted t-snow-r4-0002, which is now taking jobs again:
http://buildbot-master108.srv.releng.scl3.mozilla.com:8201/buildslaves/t-snow-r4-0002
I will now reboot the slaves listed below; hopefully they will start taking jobs again as well.
t-snow-r4-0011
t-snow-r4-0040
t-snow-r4-0033
t-snow-r4-0025
t-snow-r4-0006
t-snow-r4-0013
t-snow-r4-0030
t-snow-r4-0031
t-snow-r4-0004
t-snow-r4-0018
t-snow-r4-0037
t-snow-r4-0015
t-snow-r4-0027
t-snow-r4-0020
t-snow-r4-0034
t-snow-r4-0012
t-snow-r4-0016
t-snow-r4-0036
t-snow-r4-0007
t-snow-r4-0035
t-snow-r4-0010
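The description argues that the broken slaves are not the ones moved in Train B. That claim can be checked mechanically against the two lists in this bug. A minimal Python sketch, assuming Train B covers exactly t-snow-r4-0041 through t-snow-r4-0084 as listed in the description; the variable and function names are illustrative only and are not part of any releng tool.

```python
# Sketch: check that none of the slaves rebooted in this comment (plus
# t-snow-r4-0002 from the description) were part of the Train B move.
# Assumption: Train B is exactly t-snow-r4-0041..0084, as listed above.

TRAIN_B_DOMAIN = "test.releng.scl3.mozilla.com"

# Fully qualified Train B host names, 0041..0084 inclusive.
train_b_fqdns = {f"t-snow-r4-{n:04d}.{TRAIN_B_DOMAIN}" for n in range(41, 85)}


def short_name(fqdn: str) -> str:
    """Strip the domain part, e.g. 't-snow-r4-0041.test...' -> 't-snow-r4-0041'."""
    return fqdn.split(".", 1)[0]


train_b_short = {short_name(h) for h in train_b_fqdns}

# Short names from this comment, plus t-snow-r4-0002 from the description.
rebooted = [
    "t-snow-r4-0002",
    "t-snow-r4-0011", "t-snow-r4-0040", "t-snow-r4-0033", "t-snow-r4-0025",
    "t-snow-r4-0006", "t-snow-r4-0013", "t-snow-r4-0030", "t-snow-r4-0031",
    "t-snow-r4-0004", "t-snow-r4-0018", "t-snow-r4-0037", "t-snow-r4-0015",
    "t-snow-r4-0027", "t-snow-r4-0020", "t-snow-r4-0034", "t-snow-r4-0012",
    "t-snow-r4-0016", "t-snow-r4-0036", "t-snow-r4-0007", "t-snow-r4-0035",
    "t-snow-r4-0010",
]

# Any slave in both sets would contradict the "not related to Train B" theory.
overlap = sorted(set(rebooted) & train_b_short)
print(f"{len(rebooted)} hosts checked, {len(overlap)} in Train B: {overlap}")
# prints "22 hosts checked, 0 in Train B: []"
```

Comparing short names rather than FQDNs is deliberate: the description lists Train B hosts with their full domain, while the reboot list uses bare host names.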
Assignee
Updated•11 years ago
Assignee: nobody → sbruno
Assignee
Comment 3•11 years ago
All boxes seem to be back to work.
(Thanks nthomas and )
Assignee
Comment 4•11 years ago
The previous comment should have ended with: "Thanks nthomas and dustin for your help here!"
Assignee
Updated•11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee
Updated•11 years ago
Summary: Large set of t-snow-r4 are disabled (broken in slave-health) → Large set of t-snow-r4 slaves is disabled (broken in slave-health)
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard