Closed Bug 995146 Opened 8 years ago Closed 8 years ago

Reconfig bustage - temporary fix for "exceptions.KeyError: 'tst-linux64-ec2-300'"

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pmoore, Unassigned)

Attachments

(1 file)

During reconfig, got:

    2014-04-11 02:02:35-0700 [-] Creating BuildMaster -- buildbot.version: 0.8.2-hg-ca197cab42a9-production-0.8
    2014-04-11 02:02:35-0700 [-] loading configuration from /builds/buildbot/tests1-linux64/master/master.cfg
    2014-04-11 02:02:35-0700 [-] JacuzziAllocator 64528144: created
    2014-04-11 02:02:35-0700 [-] nextAWSSlave: start
    2014-04-11 02:02:35-0700 [-] nextAWSSlave: start
    2014-04-11 02:02:46-0700 [-] finished loading config file
    2014-04-11 02:03:00-0700 [-] configuration update started
    2014-04-11 02:03:00-0700 [-] configuration update failed
    2014-04-11 02:03:00-0700 [-] Unhandled Error
            Traceback (most recent call last):
              File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/buildbot-0.8.2_hg_ca197cab42a9_production_0.8-py2.7.egg/buildbot/master.py", line 628, in loadTheConfigFile
                d = self.loadConfig(f)
              File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/buildbot-0.8.2_hg_ca197cab42a9_production_0.8-py2.7.egg/buildbot/master.py", line 939, in loadConfig
                d.addCallback(lambda res: self.loadConfig_Slaves(slaves))
              File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/twisted/internet/defer.py", line 260, in addCallback
                callbackKeywords=kw)
              File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/twisted/internet/defer.py", line 249, in addCallbacks
                self._runCallbacks()
            --- <exception caught here> ---
              File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
                self.result = callback(self.result, *args, **kw)
              File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/buildbot-0.8.2_hg_ca197cab42a9_production_0.8-py2.7.egg/buildbot/master.py", line 939, in <lambda>
                d.addCallback(lambda res: self.loadConfig_Slaves(slaves))
              File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/buildbot-0.8.2_hg_ca197cab42a9_production_0.8-py2.7.egg/buildbot/master.py", line 1065, in loadConfig_Slaves
                return self.botmaster.loadConfig_Slaves(new_slaves)
              File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/buildbot-0.8.2_hg_ca197cab42a9_production_0.8-py2.7.egg/buildbot/master.py", line 236, in loadConfig_Slaves
                dl.append(self.removeSlave(s))
              File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/buildbot-0.8.2_hg_ca197cab42a9_production_0.8-py2.7.egg/buildbot/master.py", line 254, in removeSlave
                d = self.slaves[s.slavename].disconnect()
            exceptions.KeyError: 'tst-linux64-ec2-300' 
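
The last frame of the traceback indexes `self.slaves` by `s.slavename`, so the reconfig fails as soon as the slave being removed is not a key in that dictionary. Below is a minimal sketch of that failure mode and of a lookup that tolerates it. This is not the patch attached to this bug and not Buildbot's actual code: `FakeSlave`, `ToyBotMaster`, both `remove_slave_*` methods and the slave name `tst-linux64-ec2-299` are hypothetical names made up for illustration.

```python
class FakeSlave:
    """Hypothetical stand-in for a build slave; only the name and disconnect() matter."""

    def __init__(self, slavename):
        self.slavename = slavename
        self.disconnected = False

    def disconnect(self):
        self.disconnected = True
        return self.slavename


class ToyBotMaster:
    """Hypothetical, simplified model of the slave registry seen in the traceback."""

    def __init__(self, slaves):
        self.slaves = dict((s.slavename, s) for s in slaves)

    def remove_slave_strict(self, slave):
        # Same shape as the failing line: indexing raises KeyError for an unknown name.
        result = self.slaves[slave.slavename].disconnect()
        del self.slaves[slave.slavename]
        return result

    def remove_slave_tolerant(self, slave):
        # Assumption: silently skipping an unknown name is acceptable during a reconfig.
        known = self.slaves.pop(slave.slavename, None)
        if known is None:
            return None
        return known.disconnect()


def demo():
    # 'tst-linux64-ec2-299' is an invented name used only to populate the registry.
    master = ToyBotMaster([FakeSlave("tst-linux64-ec2-299")])
    stale = FakeSlave("tst-linux64-ec2-300")  # never registered with this toy master
    try:
        master.remove_slave_strict(stale)
        strict_outcome = "removed"
    except KeyError as exc:
        strict_outcome = "KeyError: %s" % exc
    tolerant_outcome = master.remove_slave_tolerant(stale)
    return strict_outcome, tolerant_outcome
```

In this sketch the strict path reproduces the error text from the log, while the tolerant path returns `None` and leaves the registry untouched. Whether skipping an unknown slave is the right behaviour for a real master is a separate question that the sketch does not answer.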



http://hg.mozilla.org/build/buildbot-configs/rev/dbc0e8a93cb7#l3.33 vs http://hg.mozilla.org/build/buildbot-configs/rev/dbc0e8a93cb7#l3.112
Temporary fix to get masters working again
Attachment #8405297 - Flags: review?(nthomas)
Attachment #8405297 - Flags: review?(nthomas) → review+
Attachment #8405297 - Flags: checked-in+
Blocks: 995060
Running a new reconfig...
Reconfig successful.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
This is really weird. We've dropped slaves plenty of times and never hit this. Perhaps something was connected under that name, even though, AFAICT, it doesn't exist in AWS.
I had this for the following 9 masters:

reconfig_1397206657.log:[buildbot-master51.srv.releng.use1.mozilla.com] out: 	exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master52.srv.releng.use1.mozilla.com] out: 	exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master53.srv.releng.usw2.mozilla.com] out: 	exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master54.srv.releng.usw2.mozilla.com] out: 	exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master67.srv.releng.use1.mozilla.com] out: 	exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master68.srv.releng.usw2.mozilla.com] out: 	exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master103.srv.releng.scl3.mozilla.com] out: 	exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master104.srv.releng.scl3.mozilla.com] out: 	exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master105.srv.releng.scl3.mozilla.com] out: 	exceptions.KeyError: 'tst-linux64-ec2-300'
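
A list like the one above can be pulled out of the reconfig log output mechanically. This is a rough sketch that assumes every relevant line has the `<logfile>:[<host>] out: exceptions.KeyError: '<name>'` shape shown here; the function name `masters_by_missing_key`, the regular expression and the "unrelated" sample line are my own, not part of any existing reconfig tooling.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the excerpt above:
#   <logfile>:[<master host>] out: <whitespace>exceptions.KeyError: '<slave name>'
LINE_RE = re.compile(
    r"^(?P<log>[^:]+):\[(?P<host>[^\]]+)\] out:\s+"
    r"exceptions\.KeyError: '(?P<key>[^']+)'"
)


def masters_by_missing_key(lines):
    """Group master hostnames by the slave name reported in the KeyError."""
    affected = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            affected[match.group("key")].append(match.group("host"))
    return dict(affected)


SAMPLE = [
    "reconfig_1397206657.log:[buildbot-master51.srv.releng.use1.mozilla.com] out: "
    "\texceptions.KeyError: 'tst-linux64-ec2-300'",
    "reconfig_1397206657.log:[buildbot-master103.srv.releng.scl3.mozilla.com] out: "
    "\texceptions.KeyError: 'tst-linux64-ec2-300'",
    "an unrelated line, made up here to show that non-matching input is skipped",
]

if __name__ == "__main__":
    for key, hosts in masters_by_missing_key(SAMPLE).items():
        print(key, len(hosts), hosts)
```

Grouping by slave name rather than by master makes it easy to see whether a single stale name accounts for every failure, which is what the excerpt above suggests.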
Comment on attachment 8405297 [details] [diff] [review]
buildbot-configs patch

I still don't know why this happened, but I feel like it was a freak occurrence. I just checked a couple of masters and this key definitely exists:
>>> master.botmaster.slaves['tst-linux64-ec2-300']
<BuildSlave 'tst-linux64-ec2-300'>
>>> 

I've backed this out. If it causes issues with the next reconfig I'll deal with any fallout.
Attachment #8405297 - Flags: checked-in+ → checked-in-
Broke things again, wtf
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I dunno what to do about this right now, but it's pretty low priority. We can probably fix it the next time we clean up slave lists.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard