Closed Bug 984578 Opened 11 years ago Closed 9 years ago

Find solution to avoid slaves not being able to connect to a buildbot-master because it believes it is connected

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: armenzg, Unassigned)

Details

1. The master believes the slave is connected
2. runslave.py is executed and buildbot started
3. runslave.py finishes and the machine falls out of action

Bug 970075 is an example of this.

I wonder if this is the bug that causes us to see this message in our twistd.log emails:
> twisted.spread.pb.DeadReferenceError: Calling Stale Broker

This is what I'm seeing:
2014-03-17 13:10:06-0700 [-] Log opened.
2014-03-17 13:10:06-0700 [-] twistd 10.2.0 (C:\mozilla-build\buildbotve\scripts\python.exe 2.6.5) starting up.
2014-03-17 13:10:06-0700 [-] reactor class: twisted.internet.selectreactor.SelectReactor.
2014-03-17 13:10:06-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x02F577D8>
2014-03-17 13:10:06-0700 [-] Connecting to buildbot-master83.srv.releng.scl3.mozilla.com:9101
2014-03-17 13:10:06-0700 [-] Watching c:\builds\moz2_slave\shutdown.stamp's mtime to initiate shutdown
2014-03-17 13:10:06-0700 [Broker,client] ReconnectingPBClientFactory.failedToGetPerspective
2014-03-17 13:10:06-0700 [Broker,client] While trying to connect:
    Traceback from remote host -- Traceback (most recent call last):
      File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/spread/pb.py", line 1346, in remote_respond
        d = self.portal.login(self, mind, IPerspective)
      File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/cred/portal.py", line 116, in login
        ).addCallback(self.realm.requestAvatar, mind, *interfaces
      File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/internet/defer.py", line 260, in addCallback
        callbackKeywords=kw)
      File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/internet/defer.py", line 249, in addCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
        self.result = callback(self.result, *args, **kw)
      File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbot-0.8.2_hg_f23f5672becd_production_0.8-py2.7.egg/buildbot/master.py", line 498, in requestAvatar
        p = self.botmaster.getPerspective(mind, avatarID)
      File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbot-0.8.2_hg_f23f5672becd_production_0.8-py2.7.egg/buildbot/master.py", line 364, in getPerspective
        d = sl.slave.callRemote("print", "master got a duplicate connection; keeping this one")
      File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/spread/pb.py", line 328, in callRemote
        _name, args, kw)
      File "/builds/buildbot/try1/lib/python2.7/site-packages/twisted/spread/pb.py", line 807, in _sendMessage
        raise DeadReferenceError("Calling Stale Broker")
    twisted.spread.pb.DeadReferenceError: Calling Stale Broker
2014-03-17 13:10:06-0700 [Broker,client] Lost connection to buildbot-master83.srv.releng.scl3.mozilla.com:9101
2014-03-17 13:10:06-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x02F577D8>
2014-03-17 13:10:06-0700 [-] Main loop terminated.
2014-03-17 13:10:06-0700 [-] Server Shut Down.
2014-03-17 13:10:06-0700 [-] Server Shut Down.
That looks like a bug in the master's duplicate slave arbitrator -- it should be catching that exception, determining that the old slave is disconnected, and allowing the new slave.
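A rough sketch of the behaviour described in that comment, not the actual buildbot code: when the call to the old connection raises DeadReferenceError, treat the old slave as gone and accept the new one. Everything here is an assumption made for illustration: `arbitrate`, `registry`, `StaleSlave`, `LiveSlave` and `DuplicateConnection` are hypothetical names, `DeadReferenceError` is a local stand-in for `twisted.spread.pb.DeadReferenceError` so the snippet runs without Twisted, and the real `callRemote` returns a Deferred, which this synchronous version ignores.

```python
class DeadReferenceError(Exception):
    """Local stand-in for twisted.spread.pb.DeadReferenceError."""


class DuplicateConnection(Exception):
    """Hypothetical: raised when the old connection is still alive."""


class StaleSlave:
    """Hypothetical old connection whose broker is already dead."""

    def callRemote(self, name, *args):
        raise DeadReferenceError("Calling Stale Broker")


class LiveSlave:
    """Hypothetical connection that is still reachable."""

    def __init__(self):
        self.messages = []

    def callRemote(self, name, *args):
        self.messages.append((name, args))


def arbitrate(registry, slave_name, new_slave):
    """Decide which connection the master keeps for slave_name.

    registry maps slave names to the connection the master believes
    is current. Returns the connection that is kept.
    """
    old_slave = registry.get(slave_name)
    if old_slave is None:
        registry[slave_name] = new_slave
        return new_slave
    try:
        # Same call as in the traceback; it raises when the broker is stale.
        old_slave.callRemote(
            "print", "master got a duplicate connection; keeping this one"
        )
    except DeadReferenceError:
        # The old connection is dead, so let the new slave take its place.
        registry[slave_name] = new_slave
        return new_slave
    # The old connection answered, so the new one is a real duplicate.
    raise DuplicateConnection(slave_name)


# Usage: the master still holds a stale entry, and a new connection arrives.
registry = {"slave-1": StaleSlave()}
kept = arbitrate(registry, "slave-1", LiveSlave())
```

With this shape the stale entry no longer blocks the reconnecting slave; a genuinely live duplicate is still rejected.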
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard