Aggregate db exceptions in emails from masters

RESOLVED FIXED

Status

Release Engineering
General Automation
P2
normal
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: nthomas, Assigned: coop)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

3 years ago
During a full buildbot db reboot we experienced a lot of tracebacks on buildbot masters, and the exception watcher hit this problem:

Traceback (most recent call last):
  File "/builds/buildbot/build1/tools/buildfarm/maintenance/watch_twistd_log.py", line 247, in <module>
    hostname, exceptions, options.name)
  File "/builds/buildbot/build1/tools/buildfarm/maintenance/watch_twistd_log.py", line 117, in send_msg
    s.sendmail(fromaddr, [addr], m.as_string())
  File "/tools/python27/lib/python2.7/smtplib.py", line 722, in sendmail
    raise SMTPSenderRefused(code, resp, from_addr)
smtplib.SMTPSenderRefused: (552, '5.3.4 Message size exceeds fixed limit', 'cltbld@buildbot-master86.srv.releng.scl3.mozilla.com')
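
As a stopgap for the 552 rejection above, the report body could be capped before it is handed to sendmail. This is only a minimal sketch and not what the eventual patch does: `cap_body`, the byte budget, and the notice text are hypothetical, and the real limit on the mail relay is not stated in this bug.

```python
def cap_body(body, max_bytes=1000000, notice="\n[truncated]\n"):
    """Return body unchanged if it fits, else a truncated copy plus a notice.

    Hypothetical helper; max_bytes is an assumed budget, not the relay's
    actual limit, and is expected to be larger than the notice itself.
    """
    data = body.encode("utf-8")
    if len(data) <= max_bytes:
        return body
    # Reserve room for the notice so the result stays within the budget.
    keep = max(0, max_bytes - len(notice.encode("utf-8")))
    # "ignore" drops a multi-byte character that the cut may have split.
    return data[:keep].decode("utf-8", "ignore") + notice
```

Truncation keeps the mail deliverable but loses detail, which is why aggregating repeated exceptions is the better long-term fix.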


The main culprit was:
2014-10-01 19:34:52-0700 [-] Unhandled Error
        Traceback (most recent call last):
          File "/builds/buildbot/build1/lib/python2.7/site-packages/twisted/internet/base.py", line 1165, in run
            self.mainLoop()
          File "/builds/buildbot/build1/lib/python2.7/site-packages/twisted/internet/base.py", line 1174, in mainLoop
            self.runUntilCurrent()
          File "/builds/buildbot/build1/lib/python2.7/site-packages/twisted/internet/base.py", line 796, in runUntilCurrent
            call.func(*call.args, **call.kw)
          File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/util/loop.py", line 146, in _loop_start
            self._remaining = list(self.get_processors())
        --- <exception caught here> ---
          File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/master.py", line 153, in _get_processors
            builders = sorter(self.parent, builders)
          File "/builds/buildbot/build1/master/master_common.py", line 153, in prioritizeBuilders
            (time.time() - 3600, buildmaster.master_name, buildmaster.master_incarnation))
          File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/db/connector.py", line 182, in runQueryNow
            return self.runInteractionNow(self._runQuery, *args, **kwargs)
          File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/db/connector.py", line 212, in runInteractionNow
            return self._runInteractionNow(interaction, *args, **kwargs)
          File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/db/connector.py", line 234, in _runInteractionNow
            conn = self.get_sync_connection()
          File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/db/connector.py", line 228, in get_sync_connection
            self._nonpool = self._spec.get_sync_connection()
          File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_a52601db35c3_production_0.8-py2.7.egg/buildbot/db/dbspec.py", line 250, in get_sync_connection
            conn = dbapi.connect(*self.connargs, **connkw)
          File "/builds/buildbot/build1/lib/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
            return Connection(*args, **kwargs)
          File "/builds/buildbot/build1/lib/python2.7/site-packages/MySQLdb/connections.py", line 187, in __init__
            super(Connection, self).__init__(*args, **kwargs2)
        _mysql_exceptions.OperationalError: (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")


Aggregation was originally implemented in bug 623594.
I wonder if we could hook these up to Sentry in some way? It might be too hard to do in Buildbot itself, but the twistd log watcher could probably send them...
(Assignee)

Updated

3 years ago
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: -- → P2
(Assignee)

Comment 2

2 years ago
Created attachment 8569363 [details] [diff] [review]
Aggregate all exceptions in master twistd.logs

This patch yields exception entries formatted like this:

--------------------------------------------------------------------------------
Count: 2, Exception: Failure: twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
First instance: 2015-02-24 05:07:47-0800, Most recent instance: 2015-02-24 05:07:47-0800
Example:
Exception in bm70/twistd.log.2:
2015-02-24 05:07:47-0800 [HTTPPageGetter,client] Unhandled Error
	Traceback (most recent call last):
	Failure: twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
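
For readers who want to see how entries in this format could be produced, here is a rough sketch. It is not the code from the attached patch: `aggregate_exceptions` and `format_summary` are hypothetical names, grouping by the final line of each traceback is an assumption, and the timestamp pattern is inferred from the log lines quoted in this bug.

```python
import re
from collections import OrderedDict

# Assumed timestamp layout, matching the twistd.log lines quoted in this bug.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) ")


def aggregate_exceptions(blocks):
    """Group traceback blocks by their final line (hypothetical helper).

    blocks: iterable of (source_name, text) pairs in log order, where text
    is one complete "Unhandled Error" entry.
    """
    summary = OrderedDict()
    for source, text in blocks:
        lines = [line for line in text.splitlines() if line.strip()]
        if not lines:
            continue
        key = lines[-1].strip()  # e.g. "Failure: ...ConnectionLost: ..."
        match = TIMESTAMP_RE.match(lines[0])
        when = match.group(1) if match else None
        entry = summary.get(key)
        if entry is None:
            summary[key] = {
                "count": 1,
                "first": when,
                "last": when,
                # Keep only the first occurrence as the example.
                "example": "Exception in %s:\n%s" % (source, text),
            }
        else:
            entry["count"] += 1
            if when is not None:
                entry["last"] = when  # relies on blocks arriving in log order
    return summary


def format_summary(summary):
    """Render one section per distinct exception."""
    parts = []
    for key, entry in summary.items():
        parts.append("-" * 80)
        parts.append("Count: %d, Exception: %s" % (entry["count"], key))
        parts.append("First instance: %s, Most recent instance: %s"
                     % (entry["first"], entry["last"]))
        parts.append("Example:")
        parts.append(entry["example"])
    return "\n".join(parts)
```

Because only one example is kept per distinct exception, the report grows with the number of distinct failures rather than the number of occurrences, which is what keeps the mail under the size limit during an event like the db reboot.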
Attachment #8569363 - Flags: review?(bugspam.Callek)
Comment on attachment 8569363 [details] [diff] [review]
Aggregate all exceptions in master twistd.logs

Review of attachment 8569363 [details] [diff] [review]:
-----------------------------------------------------------------

stamp
Attachment #8569363 - Flags: review?(bugspam.Callek) → review+
(Assignee)

Comment 4

2 years ago
Comment on attachment 8569363 [details] [diff] [review]
Aggregate all exceptions in master twistd.logs

Review of attachment 8569363 [details] [diff] [review]:
-----------------------------------------------------------------

https://hg.mozilla.org/build/tools/rev/aef4138c6baa
Attachment #8569363 - Flags: checked-in+
(Assignee)

Comment 5

2 years ago
In production.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED