Closed Bug 995146 Opened 11 years ago Closed 10 years ago

Reconfig bustage - temporary fix for "exceptions.KeyError: 'tst-linux64-ec2-300'"

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pmoore, Unassigned)

References

Details

Attachments

(1 file)

During reconfig, got:
2014-04-11 02:02:35-0700 [-] Creating BuildMaster -- buildbot.version: 0.8.2-hg-ca197cab42a9-production-0.8
2014-04-11 02:02:35-0700 [-] loading configuration from /builds/buildbot/tests1-linux64/master/master.cfg
2014-04-11 02:02:35-0700 [-] JacuzziAllocator 64528144: created
2014-04-11 02:02:35-0700 [-] nextAWSSlave: start
2014-04-11 02:02:35-0700 [-] nextAWSSlave: start
2014-04-11 02:02:46-0700 [-] finished loading config file
2014-04-11 02:03:00-0700 [-] configuration update started
2014-04-11 02:03:00-0700 [-] configuration update failed
2014-04-11 02:03:00-0700 [-] Unhandled Error
    Traceback (most recent call last):
      File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/buildbot-0.8.2_hg_ca197cab42a9_production_0.8-py2.7.egg/buildbot/master.py", line 628, in loadTheConfigFile
        d = self.loadConfig(f)
      File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/buildbot-0.8.2_hg_ca197cab42a9_production_0.8-py2.7.egg/buildbot/master.py", line 939, in loadConfig
        d.addCallback(lambda res: self.loadConfig_Slaves(slaves))
      File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/twisted/internet/defer.py", line 260, in addCallback
        callbackKeywords=kw)
      File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/twisted/internet/defer.py", line 249, in addCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
        self.result = callback(self.result, *args, **kw)
      File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/buildbot-0.8.2_hg_ca197cab42a9_production_0.8-py2.7.egg/buildbot/master.py", line 939, in <lambda>
        d.addCallback(lambda res: self.loadConfig_Slaves(slaves))
      File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/buildbot-0.8.2_hg_ca197cab42a9_production_0.8-py2.7.egg/buildbot/master.py", line 1065, in loadConfig_Slaves
        return self.botmaster.loadConfig_Slaves(new_slaves)
      File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/buildbot-0.8.2_hg_ca197cab42a9_production_0.8-py2.7.egg/buildbot/master.py", line 236, in loadConfig_Slaves
        dl.append(self.removeSlave(s))
      File "/builds/buildbot/tests1-linux64/lib/python2.7/site-packages/buildbot-0.8.2_hg_ca197cab42a9_production_0.8-py2.7.egg/buildbot/master.py", line 254, in removeSlave
        d = self.slaves[s.slavename].disconnect()
    exceptions.KeyError: 'tst-linux64-ec2-300'
http://hg.mozilla.org/build/buildbot-configs/rev/dbc0e8a93cb7#l3.33 vs http://hg.mozilla.org/build/buildbot-configs/rev/dbc0e8a93cb7#l3.112
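The last frame of the traceback indexes a dict of slaves by name, so removing a slave whose name is not registered raises KeyError. A minimal sketch of that failure mode follows, with a tolerant variant for comparison. FakeSlave, remove_slave_strict and remove_slave_tolerant are hypothetical names made up for illustration; this is not buildbot's code and not the patch attached to this bug.

```python
class FakeSlave(object):
    """Hypothetical stand-in for a build slave; not a buildbot class."""

    def __init__(self, slavename):
        self.slavename = slavename
        self.disconnected = False

    def disconnect(self):
        # Record the disconnect and return the name so callers can check it.
        self.disconnected = True
        return self.slavename


def remove_slave_strict(slaves, slave):
    # Same shape as the lookup in the traceback: a plain dict index,
    # which raises KeyError when the name is not registered.
    return slaves[slave.slavename].disconnect()


def remove_slave_tolerant(slaves, slave):
    # Assumed alternative: treat an unregistered name as "nothing to do".
    registered = slaves.get(slave.slavename)
    if registered is None:
        return None
    return registered.disconnect()
```

The tolerant variant hides the mismatch rather than explaining it, so it would only suit a stopgap; the question of why the name was missing from the dict stays open.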
Temporary fix to get masters working again
Attachment #8405297 - Flags: review?(nthomas)
Attachment #8405297 - Flags: review?(nthomas) → review+
Attachment #8405297 - Flags: checked-in+
Blocks: 995060
Running a new reconfig...
Reconfig successful.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
This is really weird. We've dropped slaves plenty of times and never hit this. Perhaps something was connected under that name, even though AFAICT it doesn't exist in AWS.
I had this for the following 9 masters:
reconfig_1397206657.log:[buildbot-master51.srv.releng.use1.mozilla.com] out: exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master52.srv.releng.use1.mozilla.com] out: exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master53.srv.releng.usw2.mozilla.com] out: exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master54.srv.releng.usw2.mozilla.com] out: exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master67.srv.releng.use1.mozilla.com] out: exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master68.srv.releng.usw2.mozilla.com] out: exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master103.srv.releng.scl3.mozilla.com] out: exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master104.srv.releng.scl3.mozilla.com] out: exceptions.KeyError: 'tst-linux64-ec2-300'
reconfig_1397206657.log:[buildbot-master105.srv.releng.scl3.mozilla.com] out: exceptions.KeyError: 'tst-linux64-ec2-300'
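A list like the one above can be pulled out of grep-style reconfig log output with a small parser. This is only a sketch: the line layout is assumed from the lines quoted in this comment, and LINE_RE and masters_with_keyerror are hypothetical names, not part of any reconfig tooling.

```python
import re

# Assumed layout, taken from the quoted lines:
#   <logfile>:[<master hostname>] out: exceptions.KeyError: '<slave name>'
LINE_RE = re.compile(
    r"^(?P<log>[^:]+):\[(?P<master>[^\]]+)\] out: "
    r"exceptions\.KeyError: '(?P<key>[^']+)'$"
)


def masters_with_keyerror(lines):
    """Return (master, missing_key) pairs for every line that matches."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            hits.append((match.group("master"), match.group("key")))
    return hits
```

Lines that do not match the assumed layout are skipped silently, so an empty result does not prove that no master failed.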
Comment on attachment 8405297: buildbot-configs patch
I still don't know why this happened, but I feel like it was a freak occurrence. I just checked a couple of masters and this key definitely exists:
>>> master.botmaster.slaves['tst-linux64-ec2-300']
<BuildSlave 'tst-linux64-ec2-300'>
I've backed this out. If it causes issues with the next reconfig I'll deal with any fallout.
Attachment #8405297 - Flags: checked-in+ → checked-in-
Broke things again, wtf
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I dunno what to do about this right now, but it's pretty low priority. We can probably fix it the next time we clean up slave lists.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard