Closed
Bug 974493
Opened 11 years ago
Closed 11 years ago
some test machines unable to connect to their masters
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Unassigned)
Details
I've seen this on a couple of machines now:
2014-02-19 09:57:00-0800 [-] Log opened.
2014-02-19 09:57:00-0800 [-] twistd 10.2.0 (/tools/buildbot-0.8.4-pre-moz2/bin/python2.7 2.7.3) starting up.
2014-02-19 09:57:00-0800 [-] reactor class: twisted.internet.selectreactor.SelectReactor.
2014-02-19 09:57:00-0800 [-] Starting factory <buildslave.bot.BotFactory instance at 0x101427f38>
2014-02-19 09:57:00-0800 [-] Connecting to buildbot-master79.srv.releng.usw2.mozilla.com:9201
2014-02-19 09:57:00-0800 [-] Watching /builds/slave/talos-slave/shutdown.stamp's mtime to initiate shutdown
2014-02-19 09:57:00-0800 [Broker,client] ReconnectingPBClientFactory.failedToGetPerspective
2014-02-19 09:57:00-0800 [Broker,client] While trying to connect:
Traceback from remote host -- Traceback (most recent call last):
  File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/spread/pb.py", line 1346, in remote_respond
    d = self.portal.login(self, mind, IPerspective)
  File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/cred/portal.py", line 116, in login
    ).addCallback(self.realm.requestAvatar, mind, *interfaces
  File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/internet/defer.py", line 260, in addCallback
    callbackKeywords=kw)
  File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/internet/defer.py", line 249, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/buildbot-0.8.2_hg_f23f5672becd_production_0.8-py2.7.egg/buildbot/master.py", line 498, in requestAvatar
    p = self.botmaster.getPerspective(mind, avatarID)
  File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/buildbot-0.8.2_hg_f23f5672becd_production_0.8-py2.7.egg/buildbot/master.py", line 364, in getPerspective
    d = sl.slave.callRemote("print", "master got a duplicate connection; keeping this one")
  File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/spread/pb.py", line 328, in callRemote
    _name, args, kw)
  File "/builds/buildbot/tests1-macosx/lib/python2.7/site-packages/twisted/spread/pb.py", line 807, in _sendMessage
    raise DeadReferenceError("Calling Stale Broker")
twisted.spread.pb.DeadReferenceError: Calling Stale Broker
2014-02-19 09:57:00-0800 [Broker,client] Lost connection to buildbot-master79.srv.releng.usw2.mozilla.com:9201
2014-02-19 09:57:00-0800 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x101427f38>
2014-02-19 09:57:00-0800 [-] Main loop terminated.
2014-02-19 09:57:00-0800 [-] Server Shut Down.
Reporter
Updated•11 years ago
Blocks: t-snow-r4-0156
Reporter
Comment 1•11 years ago
Looks like these are caused by stale connections. I think I can fix them through the manhole. Here's what the master logs when one of them tries to reconnect:
2014-02-20 05:53:35-0800 [Broker,108345,10.12.49.154] duplicate slave talos-r3-fed-027; rejecting new slave and pinging old
2014-02-20 05:53:35-0800 [Broker,108345,10.12.49.154] old slave was connected from IPv4Address(TCP, '10.12.49.154', 56939)
2014-02-20 05:53:35-0800 [Broker,108345,10.12.49.154] new slave is from IPv4Address(TCP, '10.12.49.154', 58556)
2014-02-20 05:53:35-0800 [Broker,108345,10.12.49.154] Peer will receive following PB traceback:
2014-02-20 05:53:35-0800 [Broker,108345,10.12.49.154] Unhandled Error
Traceback (most recent call last):
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/twisted/spread/pb.py", line 1346, in remote_respond
    d = self.portal.login(self, mind, IPerspective)
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/twisted/cred/portal.py", line 116, in login
    ).addCallback(self.realm.requestAvatar, mind, *interfaces
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/twisted/internet/defer.py", line 260, in addCallback
    callbackKeywords=kw)
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/twisted/internet/defer.py", line 249, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_f23f5672becd_production_0.8-py2.7.egg/buildbot/master.py", line 498, in requestAvatar
    p = self.botmaster.getPerspective(mind, avatarID)
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_f23f5672becd_production_0.8-py2.7.egg/buildbot/master.py", line 364, in getPerspective
    d = sl.slave.callRemote("print", "master got a duplicate connection; keeping this one")
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/twisted/spread/pb.py", line 328, in callRemote
    _name, args, kw)
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/twisted/spread/pb.py", line 807, in _sendMessage
    raise DeadReferenceError("Calling Stale Broker")
twisted.spread.pb.DeadReferenceError: Calling Stale Broker
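To see which slaves are stuck this way, one option is to scan the master's log for the "duplicate slave ...; rejecting new slave" message in the master log above. This is a minimal sketch, assuming the master logs to a plain-text file (the `twistd.log` name below is an assumption) and that every rejection uses the same message format as above; `find_rejected_slaves` is a hypothetical helper, not part of buildbot.

```python
import re
from collections import Counter

# Matches the master-side message logged when a reconnecting slave is rejected, e.g.
# "... duplicate slave talos-r3-fed-027; rejecting new slave and pinging old"
DUPLICATE_RE = re.compile(r"duplicate slave ([^\s;]+); rejecting new slave")


def find_rejected_slaves(log_lines):
    """Count how many times each slave name was rejected as a duplicate."""
    counts = Counter()
    for line in log_lines:
        match = DUPLICATE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, `find_rejected_slaves(open("twistd.log")).most_common()` (log path assumed) lists the slave names that keep getting rejected; a name that shows up on every reconnect attempt is a candidate for unsticking through the manhole.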
Reporter
Comment 2•11 years ago
I tried following the instructions on https://wiki.mozilla.org/ReleaseEngineering/How_To/Unstick_a_Stuck_Slave_From_A_Master, but these slaves didn't have a hung TCP connection. I tried forcing the slave to drop with these two manhole statements:
master.botmaster.slaves['talos-r3-fed-027'].disconnect()
master.botmaster.slaves['talos-r3-fed-027'].slave.broker.transport.loseConnection()
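But that didn't work either. Then I noticed that the block that raises the error (the duplicate-connection ping in getPerspective) only runs when slave.isConnected() is true, and isConnected() just returns slave.slave, the stored reference to the old connection. So I set that to None: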
master.botmaster.slaves['talos-r3-fed-027'].slave = None
And then the slaves were able to connect.
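The failure mode can be reproduced with a simplified model of the duplicate-connection check. This is a sketch, not buildbot's code: `BuildSlave`, `StaleRemoteRef`, `DeadReferenceError` and `get_perspective` are hypothetical stand-ins that mirror only the behavior visible in the tracebacks above, namely that `isConnected()` returns the stored `slave` reference and that `callRemote` on a stale broker raises `DeadReferenceError` immediately.

```python
class DeadReferenceError(Exception):
    """Hypothetical stand-in for twisted.spread.pb.DeadReferenceError."""


class StaleRemoteRef:
    """Hypothetical remote reference whose broker has already disconnected."""

    def callRemote(self, name, *args):
        # Mirrors the traceback: the call fails right away instead of
        # returning a Deferred that fails later.
        raise DeadReferenceError("Calling Stale Broker")


class BuildSlave:
    """Hypothetical, stripped-down master-side record for one slave."""

    def __init__(self, slavename):
        self.slavename = slavename
        self.slave = None  # reference to the attached slave, if any

    def isConnected(self):
        # Returns the stored reference itself, so any non-None value,
        # stale or not, counts as "connected".
        return self.slave


def get_perspective(slaves, slavename, new_ref):
    """Simplified model of the duplicate-connection branch of getPerspective."""
    sl = slaves[slavename]
    if sl.isConnected():
        # Ping the old connection. With a stale reference this raises, the
        # new login fails, and sl.slave keeps pointing at the dead broker.
        sl.slave.callRemote("print", "master got a duplicate connection; keeping this one")
        return None  # a live old connection also gets the new one rejected
    sl.slave = new_ref
    return sl
```

In this model nothing clears the stale reference when the ping fails, so every reconnect attempt takes the same branch. Setting `slave` to `None` by hand, as in the manhole statement above, is what lets the next login through.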
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard