Closed
Bug 812533
Opened 12 years ago
Closed 12 years ago
talos-r3-fed64 slaves that are connected but are not given any jobs
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: armenzg, Unassigned)
References
Details
(Whiteboard: [buildduty])
talos-r3-fed64-033
talos-r3-fed64-037
talos-r3-fed64-069

2012-11-16 08:12:54-0800 [Broker,client] Connected to buildbot-master18.build.scl1.mozilla.com:9201; slave is ready
2012-11-16 07:51:49-0800 [Broker,client] Connected to buildbot-master17.build.scl1.mozilla.com:9201; slave is ready
2012-11-16 08:18:28-0800 [Broker,client] Connected to buildbot-master18.build.scl1.mozilla.com:9201; slave is ready

Their uptimes are recent. They seem to reboot 6 hours after being connected to the master.
The buildbot masters show them as connected.

On the slave side:
2012-11-16 01:07:47-0800 [Broker,client] Connected to buildbot-master18.build.scl1.mozilla.com:9201; slave is ready
2012-11-16 08:07:47-0800 [-] I feel very idle and was thinking of rebooting as soon as the buildmaster says it's OK
2012-11-16 08:07:47-0800 [-] Telling the master we want to shutdown after any running builds are finished
2012-11-16 08:08:26-0800 [Broker,client] Master does not support slave initiated shutdown. Upgrade master to 0.8.3 or later to use this feature.
2012-11-16 08:08:26-0800 [Broker,client] rebooting NOW, since the master won't talk to us
2012-11-16 08:08:26-0800 [Broker,client] Invoking platform-specific reboot command

On the master's side (10.12.49.213==talos-r3-fed64-033):
2012-11-16 08:07:53-0800 [Broker,55578,10.12.49.213] Peer will receive following PB traceback:
2012-11-16 08:07:53-0800 [Broker,55578,10.12.49.213] Unhandled Error
    Traceback (most recent call last):
      File "/builds/buildbot/tests1-linux/lib/python2.6/site-packages/twisted/spread/banana.py", line 153, in gotItem
        self.callExpressionReceived(item)
      File "/builds/buildbot/tests1-linux/lib/python2.6/site-packages/twisted/spread/banana.py", line 116, in callExpressionReceived
        self.expressionReceived(obj)
      File "/builds/buildbot/tests1-linux/lib/python2.6/site-packages/twisted/spread/pb.py", line 514, in expressionReceived
        method(*sexp[1:])
      File "/builds/buildbot/tests1-linux/lib/python2.6/site-packages/twisted/spread/pb.py", line 826, in proto_message
        self._recvMessage(self.localObjectForID, requestID, objectID, message, answerRequired, netArgs, netKw)
    --- <exception caught here> ---
      File "/builds/buildbot/tests1-linux/lib/python2.6/site-packages/twisted/spread/pb.py", line 840, in _recvMessage
        netResult = object.remoteMessageReceived(self, message, netArgs, netKw)
      File "/builds/buildbot/tests1-linux/lib/python2.6/site-packages/twisted/spread/pb.py", line 223, in perspectiveMessageReceived
        method = getattr(self, "perspective_%s" % message)
    exceptions.AttributeError: BuildSlave instance has no attribute 'perspective_shutdown'
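The AttributeError in the master traceback comes from the name-based lookup shown in its last frame: the remote message name is prefixed with `perspective_` and resolved with `getattr`. Below is a minimal sketch of that lookup in plain Python, with no Twisted dependency. The class names `OldMasterSlave` and `NewMasterSlave` and the helpers `dispatch` and `try_shutdown` are hypothetical stand-ins for illustration, not Buildbot or Twisted code.

```python
class OldMasterSlave:
    """Hypothetical master-side object that has no shutdown handler."""

    def perspective_keepalive(self):
        return "ok"


class NewMasterSlave(OldMasterSlave):
    """Hypothetical master-side object that does expose a shutdown handler."""

    def perspective_shutdown(self):
        return "shutdown scheduled"


def dispatch(perspective, message, *args):
    # Same lookup pattern as the last frame of the traceback:
    #   method = getattr(self, "perspective_%s" % message)
    method = getattr(perspective, "perspective_%s" % message)
    return method(*args)


def try_shutdown(perspective):
    """Ask for a shutdown and report whether the receiver supports it."""
    try:
        return dispatch(perspective, "shutdown")
    except AttributeError as exc:
        # Assumption: this is the failure the slave log summarizes as
        # "Master does not support slave initiated shutdown".
        return "unsupported: %s" % exc


if __name__ == "__main__":
    print(try_shutdown(OldMasterSlave()))
    print(try_shutdown(NewMasterSlave()))
```

If that reading is right, the slave's "Upgrade master to 0.8.3 or later" message and the master's `perspective_shutdown` traceback are two views of the same failed call, and the reboot that follows is the slave's fallback.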
Reporter
Comment 1•12 years ago
From the master's perspective the story goes like this:

2012-11-16 08:07:53-0800 [Broker,55578,10.12.49.213] Peer will receive following PB traceback:
2012-11-16 08:07:53-0800 [Broker,55578,10.12.49.213] Unhandled Error
    Traceback (most recent call last):
      File "/builds/buildbot/tests1-linux/lib/python2.6/site-packages/twisted/spread/banana.py", line 153, in gotItem
        self.callExpressionReceived(item)
      File "/builds/buildbot/tests1-linux/lib/python2.6/site-packages/twisted/spread/banana.py", line 116, in callExpressionReceived
        self.expressionReceived(obj)
      File "/builds/buildbot/tests1-linux/lib/python2.6/site-packages/twisted/spread/pb.py", line 514, in expressionReceived
        method(*sexp[1:])
      File "/builds/buildbot/tests1-linux/lib/python2.6/site-packages/twisted/spread/pb.py", line 826, in proto_message
        self._recvMessage(self.localObjectForID, requestID, objectID, message, answerRequired, netArgs, netKw)
    --- <exception caught here> ---
      File "/builds/buildbot/tests1-linux/lib/python2.6/site-packages/twisted/spread/pb.py", line 840, in _recvMessage
        netResult = object.remoteMessageReceived(self, message, netArgs, netKw)
      File "/builds/buildbot/tests1-linux/lib/python2.6/site-packages/twisted/spread/pb.py", line 223, in perspectiveMessageReceived
        method = getattr(self, "perspective_%s" % message)
    exceptions.AttributeError: BuildSlave instance has no attribute 'perspective_shutdown'
...
2012-11-16 08:08:25-0800 [Broker,55578,10.12.49.213] BuildSlave.detached(talos-r3-fed64-033)
2012-11-16 08:12:29-0800 [Broker,56308,10.12.49.213] Got slaveinfo from 'talos-r3-fed64-033'
2012-11-16 08:12:29-0800 [Broker,56308,10.12.49.213] bot attached
...
2012-11-16 01:01:56-0800 [Broker,55573,10.12.49.213] duplicate slave talos-r3-fed64-033; rejecting new slave and pinging old
2012-11-16 01:01:56-0800 [Broker,55573,10.12.49.213] old slave was connected from IPv4Address(TCP, '10.12.49.213', 58943)
2012-11-16 01:01:56-0800 [Broker,55573,10.12.49.213] new slave is from IPv4Address(TCP, '10.12.49.213', 46192)
...
2012-11-16 01:06:51-0800 [-] killing new slave on IPv4Address(TCP, '10.12.49.213', 46192)
2012-11-16 01:06:52-0800 [Broker,54874,10.12.49.213] BuildSlave.detached(talos-r3-fed64-033)
2012-11-16 01:06:52-0800 [Broker,54874,10.12.49.213] Unhandled error in Deferred:
2012-11-16 01:06:52-0800 [Broker,54874,10.12.49.213] Unhandled Error
    Traceback (most recent call last):
    Failure: twisted.spread.pb.PBConnectionLost: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
    ]
...
2012-11-16 01:07:10-0800 [Broker,55578,10.12.49.213] Got slaveinfo from 'talos-r3-fed64-033'
2012-11-16 01:07:10-0800 [Broker,55578,10.12.49.213] bot attached
...
2012-11-15 17:55:37-0800 [Broker,54874,10.12.49.213] Got slaveinfo from 'talos-r3-fed64-033'
2012-11-15 17:55:38-0800 [Broker,54874,10.12.49.213] bot attached
2012-11-15 17:31:44-0800 [Broker,54421,10.12.49.213] BuildSlave.sendBuilderList (<BuildSlave 'talos-r3-fed64-033'>) failed
2012-11-15 17:31:44-0800 [Broker,54421,10.12.49.213] Unhandled Error
2012-11-15 17:31:44-0800 [Broker,54421,10.12.49.213] Unhandled Error
    Traceback (most recent call last):
    Failure: twisted.spread.pb.PBConnectionLost: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
    ]
2012-11-15 17:31:44-0800 [Broker,54421,10.12.49.213] BuildSlave.detached(talos-r3-fed64-033)
...
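The master excerpts in this comment are pasted roughly newest-first, which makes the attach/detach sequence hard to follow. Here is a rough sketch of a helper that parses lines in this format and lists the events for one slave in chronological order. The function name `slave_events`, the event labels, and the regular expressions are assumptions written against this excerpt only. They are not part of Buildbot.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the excerpt, e.g. "2012-11-16 08:08:25-0800",
# followed by the bracketed logging context and the message text.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) \[[^\]]*\] (.*)$"
)

# Hypothetical labels for the message shapes that appear in this bug.
EVENT_PATTERNS = [
    ("attached", re.compile(r"Got slaveinfo from '([^']+)'")),
    ("detached", re.compile(r"BuildSlave\.detached\(([^)]+)\)")),
    ("duplicate", re.compile(r"duplicate slave ([^;]+);")),
]


def slave_events(lines, slave):
    """Return (timestamp, label) pairs for one slave, oldest first."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # traceback frames, "..." separators, blank lines
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S%z")
        for label, pattern in EVENT_PATTERNS:
            found = pattern.search(match.group(2))
            if found and found.group(1) == slave:
                events.append((when, label))
    return sorted(events)


if __name__ == "__main__":
    sample = [
        "2012-11-16 08:08:25-0800 [Broker,55578,10.12.49.213] BuildSlave.detached(talos-r3-fed64-033)",
        "2012-11-16 01:07:10-0800 [Broker,55578,10.12.49.213] Got slaveinfo from 'talos-r3-fed64-033'",
    ]
    for when, label in slave_events(sample, "talos-r3-fed64-033"):
        print(when.isoformat(), label)
```

Sorting by the parsed timestamp rather than by position in the file keeps the helper usable on excerpts stitched together from several places in the master log, as was done here.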
Reporter
Updated•12 years ago
Whiteboard: [buildduty]
Comment 2•12 years ago
Armen, it's unclear to me what the buildduty action item is here: reimage, netops conversation, etc.?
Flags: needinfo?(armenzg)
Comment 3•12 years ago
Armen, it's unclear to me what the buildduty action item is here: reimage, netops conversation, etc.?
Reporter
Comment 4•12 years ago
I don't know myself. Let's bring it to the Monday meeting and see if anyone has any suggestions.
Flags: needinfo?(armenzg)
Reporter
Comment 5•12 years ago
bhearsum said that we can perhaps fix this by bringing the master down and back up. Let's try that.
Comment 6•12 years ago
bm17/bm18/bm24 were restarted today.
Comment 7•12 years ago
talos-r3-fed64-033 talos-r3-fed64-037 talos-r3-fed64-069 are back now.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Assignee
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard