Closed
Bug 984578
Opened 11 years ago
Closed 9 years ago
Find solution to avoid slaves not being able to connect to a buildbot-master because it believes it is connected
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: armenzg, Unassigned)
Details
1. The master believes the slave is connected
2. runslave.py is executed and buildbot started
3. runslave.py finishes and the machine falls out of action
Bug 970075 is an example of this.
I wonder if this is the bug that causes us to see this message in our twistd.log emails:
> twisted.spread.pb.DeadReferenceError: Calling Stale Broker
This is what I'm seeing:
2014-03-17 13:10:06-0700 [-] Log opened.
2014-03-17 13:10:06-0700 [-] twistd 10.2.0 (C:\mozilla-build\buildbotve\scripts\python.exe 2.6.5) starting up.
2014-03-17 13:10:06-0700 [-] reactor class: twisted.internet.selectreactor.SelectReactor.
2014-03-17 13:10:06-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x02F577D8>
2014-03-17 13:10:06-0700 [-] Connecting to buildbot-master83.srv.releng.scl3.mozilla.com:9101
2014-03-17 13:10:06-0700 [-] Watching c:\builds\moz2_slave\shutdown.stamp's mtime to initiate shutdown
2014-03-17 13:10:06-0700 [Broker,client] ReconnectingPBClientFactory.failedToGetPerspective
2014-03-17 13:10:06-0700 [Broker,client] While trying to connect:
Traceback from remote host -- Traceback (most recent call last):
File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/spread/pb.py", line 1346, in remote_respond
d = self.portal.login(self, mind, IPerspective)
File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/cred/portal.py", line 116, in login
).addCallback(self.realm.requestAvatar, mind, *interfaces
File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/internet/defer.py", line 260, in addCallback
callbackKeywords=kw)
File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/internet/defer.py", line 249, in addCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
self.result = callback(self.result, *args, **kw)
File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbot-0.8.2_hg_f23f5672becd_production_0.8-py2.7.egg/buildbot/master.py", line 498, in requestAvatar
p = self.botmaster.getPerspective(mind, avatarID)
File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbot-0.8.2_hg_f23f5672becd_production_0.8-py2.7.egg/buildbot/master.py", line 364, in getPerspective
d = sl.slave.callRemote("print", "master got a duplicate connection; keeping this one")
File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/spread/pb.py", line 328, in callRemote
_name, args, kw)
File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/spread/pb.py", line 807, in _sendMessage
raise DeadReferenceError("Calling Stale Broker")
twisted.spread.pb.DeadReferenceError: Calling Stale Broker
2014-03-17 13:10:06-0700 [Broker,client] Lost connection to buildbot-master83.srv.releng.scl3.mozilla.com:9101
2014-03-17 13:10:06-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x02F577D8>
2014-03-17 13:10:06-0700 [-] Main loop terminated.
2014-03-17 13:10:06-0700 [-] Server Shut Down.
2014-03-17 13:10:06-0700 [-] Server Shut Down.
Comment 1•11 years ago
That looks like a bug in the master's duplicate slave arbitrator -- it should be catching that exception, determining that the old slave is disconnected, and allowing the new slave.
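To illustrate the behavior suggested above, here is a minimal, synchronous sketch of a duplicate-connection check that catches the stale-broker error and lets the new slave in. It is not the buildbot code: `SlaveArbitrator`, `StaleRemote`, `LiveRemote`, and `get_perspective` are hypothetical names, and `DeadReferenceError` is a local stand-in for `twisted.spread.pb.DeadReferenceError` so the example runs without Twisted. The real `callRemote` returns a Deferred, which this sketch ignores.

```python
class DeadReferenceError(Exception):
    """Local stand-in for twisted.spread.pb.DeadReferenceError (assumption)."""


class StaleRemote:
    """Hypothetical remote reference whose broker has already gone away."""

    def callRemote(self, name, *args):
        raise DeadReferenceError("Calling Stale Broker")


class LiveRemote:
    """Hypothetical remote reference that is still connected."""

    def __init__(self):
        self.messages = []

    def callRemote(self, name, *args):
        self.messages.append((name, args))


class SlaveArbitrator:
    """Hypothetical, simplified stand-in for the master's duplicate check."""

    def __init__(self):
        self.connected = {}  # slave name -> remote reference

    def get_perspective(self, name, new_remote):
        old = self.connected.get(name)
        if old is not None:
            try:
                # Probe the connection the master believes is still alive.
                old.callRemote(
                    "print",
                    "master got a duplicate connection; keeping this one",
                )
            except DeadReferenceError:
                # The old connection is stale: forget it and accept the new one.
                del self.connected[name]
            else:
                # The old connection answered, so the newcomer is the duplicate.
                raise RuntimeError("duplicate connection for %s" % name)
        self.connected[name] = new_remote
        return new_remote
```

With a stale entry recorded for a slave, a reconnect attempt replaces it instead of failing; a second attempt while the first is still live is rejected.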
Updated•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Updated•7 years ago
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard