Closed Bug 1076623 Opened 11 years ago Closed 10 years ago

Aggregate db exceptions in emails from masters

Categories

(Release Engineering :: General, defect, P2)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: coop)

Details

Attachments

(1 file)

During a full buildbot db reboot we experienced a lot of tracebacks on buildbot masters, and the exception watcher hit this problem:

Traceback (most recent call last):
  File "/builds/buildbot/build1/tools/buildfarm/maintenance/watch_twistd_log.py", line 247, in <module>
    hostname, exceptions, options.name)
  File "/builds/buildbot/build1/tools/buildfarm/maintenance/watch_twistd_log.py", line 117, in send_msg
    s.sendmail(fromaddr, [addr], m.as_string())
  File "/tools/python27/lib/python2.7/smtplib.py", line 722, in sendmail
    raise SMTPSenderRefused(code, resp, from_addr)
smtplib.SMTPSenderRefused: (552, '5.3.4 Message size exceeds fixed limit', 'cltbld@buildbot-master86.srv.releng.scl3.mozilla.com')

The main culprit was:

2014-10-01 19:34:52-0700 [-] Unhandled Error
    Traceback (most recent call last):
      File "/builds/buildbot/build1/lib/python2.7/site-packages/twisted/internet/base.py", line 1165, in run
        self.mainLoop()
      File "/builds/buildbot/build1/lib/python2.7/site-packages/twisted/internet/base.py", line 1174, in mainLoop
        self.runUntilCurrent()
      File "/builds/buildbot/build1/lib/python2.7/site-packages/twisted/internet/base.py", line 796, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/util/loop.py", line 146, in _loop_start
        self._remaining = list(self.get_processors())
    --- <exception caught here> ---
      File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/master.py", line 153, in _get_processors
        builders = sorter(self.parent, builders)
      File "/builds/buildbot/build1/master/master_common.py", line 153, in prioritizeBuilders
        (time.time() - 3600, buildmaster.master_name, buildmaster.master_incarnation))
      File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/db/connector.py", line 182, in runQueryNow
        return self.runInteractionNow(self._runQuery, *args, **kwargs)
      File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/db/connector.py", line 212, in runInteractionNow
        return self._runInteractionNow(interaction, *args, **kwargs)
      File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/db/connector.py", line 234, in _runInteractionNow
        conn = self.get_sync_connection()
      File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/db/connector.py", line 228, in get_sync_connection
        self._nonpool = self._spec.get_sync_connection()
      File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/db/dbspec.py", line 250, in get_sync_connection
        conn = dbapi.connect(*self.connargs, **connkw)
      File "/builds/buildbot/build1/lib/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
        return Connection(*args, **kwargs)
      File "/builds/buildbot/build1/lib/python2.7/site-packages/MySQLdb/connections.py", line 187, in __init__
        super(Connection, self).__init__(*args, **kwargs2)
    _mysql_exceptions.OperationalError: (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")

Aggregation was originally implemented in bug 623594.
I wonder if we could hook these up to Sentry in some way? It might be too hard to do in Buildbot itself, but the twistd log watcher could probably send them...
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: -- → P2
This patch yields exception entries formatted like this:

--------------------------------------------------------------------------------
Count: 2, Exception: Failure: twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
First instance: 2015-02-24 05:07:47-0800, Most recent instance: 2015-02-24 05:07:47-0800
Example:
Exception in bm70/twistd.log.2:
2015-02-24 05:07:47-0800 [HTTPPageGetter,client] Unhandled Error
    Traceback (most recent call last):
    Failure: twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
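For readers who want to see the idea behind this kind of aggregation, here is a minimal sketch in Python. It is not the code from the attached patch. It assumes that each exception block in twistd.log starts with a timestamped line ending in "Unhandled Error", that the block runs until the next timestamped line, and that the last non-blank line of the block identifies the exception. The names `iter_exceptions`, `aggregate`, and `format_summary` are hypothetical.

```python
import re
from collections import OrderedDict

# Assumed twistd log prefix, e.g. "2015-02-24 05:07:47-0800 [-] ..."
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) \[")


def iter_exceptions(lines):
    """Yield (timestamp, block_text) for each 'Unhandled Error' block."""
    stamp, block = None, []
    for raw in lines:
        line = raw.rstrip()
        match = TIMESTAMP_RE.match(line)
        if match:
            # A new timestamped line closes any block in progress.
            if stamp is not None:
                yield stamp, "\n".join(block).rstrip()
            stamp, block = None, []
            if line.endswith("Unhandled Error"):
                stamp, block = match.group(1), [line]
        elif stamp is not None:
            # Untimestamped lines belong to the current traceback.
            block.append(line)
    if stamp is not None:
        yield stamp, "\n".join(block).rstrip()


def aggregate(lines):
    """Group exception blocks by their last non-blank line.

    Assumes the log is in chronological order, so the first block seen
    is the earliest instance and the latest block seen is the most recent.
    """
    summary = OrderedDict()
    for stamp, text in iter_exceptions(lines):
        key = text.splitlines()[-1].strip()
        entry = summary.setdefault(
            key, {"count": 0, "first": stamp, "last": stamp, "example": text}
        )
        entry["count"] += 1
        entry["last"] = stamp
    return summary


def format_summary(summary):
    """Render one short entry per distinct exception."""
    parts = []
    for key, entry in summary.items():
        parts.append("-" * 80)
        parts.append("Count: %d, Exception: %s" % (entry["count"], key))
        parts.append(
            "First instance: %s, Most recent instance: %s"
            % (entry["first"], entry["last"])
        )
        parts.append("Example:")
        parts.append(entry["example"])
    return "\n".join(parts)
```

Because only one example traceback is kept per distinct exception, the size of the email body grows with the number of distinct exceptions rather than with the number of occurrences, which is what matters when a db outage produces the same traceback many times.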
Attachment #8569363 - Flags: review?(bugspam.Callek)
Comment on attachment 8569363
Aggregate all exceptions in master twistd.logs

Review of attachment 8569363:
-----------------------------------------------------------------
stamp
Attachment #8569363 - Flags: review?(bugspam.Callek) → review+
Comment on attachment 8569363
Aggregate all exceptions in master twistd.logs

Review of attachment 8569363:
-----------------------------------------------------------------
https://hg.mozilla.org/build/tools/rev/aef4138c6baa
Attachment #8569363 - Flags: checked-in+
In production.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: General Automation → General