Closed Bug 1164819 Opened 10 years ago Closed 8 years ago

buildbot bridge dies when it loses connection to mysql

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Unassigned)

References

Details

(Whiteboard: [bbb])

This is less frequent now that pool recycle is set, but I'm still seeing errors like: Traceback (most recent call last): File "/builds/bbb/bin/buildbot-bridge", line 9, in <module> load_entry_point('bbb==0.3', 'console_scripts', 'buildbot-bridge')() File "/builds/bbb/lib/python2.7/site-packages/bbb/runner.py", line 82, in main service.start() File "/builds/bbb/lib/python2.7/site-packages/bbb/services.py", line 180, in start self.reflectTasks() File "/builds/bbb/lib/python2.7/site-packages/bbb/services.py", line 186, in reflectTasks for t in self.bbb_db.tasks: File "/builds/bbb/lib/python2.7/site-packages/bbb/servicebase.py", line 40, in tasks tasks = self.tasks_table.select().execute().fetchall() File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/sql/base.py", line 386, in execute return e._execute_clauseelement(self, multiparams, params) File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1984, in _execute_clauseelement connection = self.contextual_connect(close_with_result=True) File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2026, in contextual_connect self._wrap_pool_connect(self.pool.connect, None), File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2065, in _wrap_pool_connect e, dialect, self) File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1395, in _handle_dbapi_exception_noconnection exc_info File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2061, in _wrap_pool_connect return fn() File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 337, in connect return _ConnectionFairy._checkout(self) File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 644, in _checkout fairy = _ConnectionRecord.checkout(pool) File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 444, in checkout rec.checkin() File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__ compat.reraise(exc_type, exc_value, exc_tb) File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 441, in checkout dbapi_connection = rec.get_connection() File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 525, in get_connection self.connection = self.__connect() File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 538, in __connect connection = self.__pool._creator() File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 89, in connect return dialect.connect(*cargs, **cparams) File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 377, in connect return self.dbapi.connect(*cargs, **cparams) File "/builds/bbb/lib/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect return Connection(*args, **kwargs) File "/builds/bbb/lib/python2.7/site-packages/MySQLdb/connections.py", line 193, in __init__ super(Connection, self).__init__(*args, **kwargs2) sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") from time to time. Maybe this isn't a big deal because supervisord should be restarting it....
No longer blocks: bbb
Whiteboard: [bbb]
(In reply to Ben Hearsum (:bhearsum) from comment #0) > This is less frequent now that pool recycle is set, but I'm still seeing > errors like: ... > from time to time. Maybe this isn't a big deal because supervisord should be > restarting it.... ftr: mysql disconnect neither detected nor fixed by supervisord on Friday Aug 7
Hit this from a different type of error over the weekend: 2015-08-21 12:54:27,937 - bbb.runner - Running reflector service 2015-08-21 12:54:27,937 - bbb.services - Starting reflector 2015-08-21 12:54:27,940 - bbb.services - Processing task: 4N-0yTkmQzuNmq3zjLf7cw 2015-08-21 12:54:27,947 - bbb.services - BuildRequest 77842006 is done. BBListener should process it soon, reclaiming in the meantime 2015-08-21 12:54:27,947 - bbb.services - Processing task: sKklWUdZS1mircngEJBBzw 2015-08-21 12:54:27,954 - bbb.services - BuildRequest 78070838 is done. BBListener should process it soon, reclaiming in the meantime 2015-08-21 12:54:27,954 - bbb.services - Processing task: _dgDktc4R0GytOXdydnIpQ 2015-08-21 12:54:27,961 - bbb.services - BuildRequest 78092858 is done. BBListener should process it soon, reclaiming in the meantime 2015-08-21 12:54:27,961 - bbb.services - Processing task: JJM9oHrARx6fsTffPT99aA 2015-08-21 12:54:27,967 - bbb.services - BuildRequest 78109284 is done. BBListener should process it soon, reclaiming in the meantime 2015-08-21 12:54:27,967 - bbb.services - Processing task: A18IB6cARBGyKrV7w5YAMA 2015-08-21 12:54:27,972 - bbb.services - BuildRequest disappeared before starting, cancelling task 2015-08-21 12:54:27,983 - requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): queue.taskcluster.net 2015-08-21 12:54:28,493 - sqlalchemy.pool.QueuePool - Exception during reset or similar Traceback (most recent call last): File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2519, in do_rollback dbapi_connection.rollback() ProgrammingError: (2014, "Commands out of sync; you can't run this command now") 2015-08-21 12:54:32,688 - bbb.runner - Running reflector service 2015-08-21 12:54:32,688 - bbb.services - Starting reflector 2015-08-21 12:54:32,690 - bbb.services - Processing task: 4N-0yTkmQzuNmq3zjLf7cw 2015-08-21 12:54:32,697 - bbb.services - BuildRequest 77842006 is done. BBListener should process it soon, reclaiming in the meantime 2015-08-21 12:54:32,698 - bbb.services - Processing task: sKklWUdZS1mircngEJBBzw 2015-08-21 12:54:32,704 - bbb.services - BuildRequest 78070838 is done. BBListener should process it soon, reclaiming in the meantime 2015-08-21 12:54:32,705 - bbb.services - Processing task: _dgDktc4R0GytOXdydnIpQ 2015-08-21 12:54:32,711 - bbb.services - BuildRequest 78092858 is done. BBListener should process it soon, reclaiming in the meantime 2015-08-21 12:54:32,711 - bbb.services - Processing task: JJM9oHrARx6fsTffPT99aA 2015-08-21 12:54:32,717 - bbb.services - BuildRequest 78109284 is done. BBListener should process it soon, reclaiming in the meantime 2015-08-21 12:54:32,717 - bbb.services - Processing task: A18IB6cARBGyKrV7w5YAMA 2015-08-21 12:54:32,723 - bbb.services - BuildRequest disappeared before starting, cancelling task 2015-08-21 12:54:32,734 - requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): queue.taskcluster.net 2015-08-21 12:54:33,245 - sqlalchemy.pool.QueuePool - Exception during reset or similar Traceback (most recent call last): File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 567, in _finalize_fairy fairy._reset(pool) File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 701, in _reset pool._dialect.do_rollback(self) File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2519, in do_rollback dbapi_connection.rollback() ProgrammingError: (2014, "Commands out of sync; you can't run this command now") Looksl ike supervisor tried to restart it at least once based on the timestamps.
(In reply to Ben Hearsum (:bhearsum) from comment #2) > Hit this from a different type of error over the weekend: > 2015-08-21 12:54:27,937 - bbb.runner - Running reflector service > 2015-08-21 12:54:27,937 - bbb.services - Starting reflector > 2015-08-21 12:54:27,940 - bbb.services - Processing task: > 4N-0yTkmQzuNmq3zjLf7cw > 2015-08-21 12:54:27,947 - bbb.services - BuildRequest 77842006 is done. > BBListener should process it soon, reclaiming in the meantime > 2015-08-21 12:54:27,947 - bbb.services - Processing task: > sKklWUdZS1mircngEJBBzw > 2015-08-21 12:54:27,954 - bbb.services - BuildRequest 78070838 is done. > BBListener should process it soon, reclaiming in the meantime > 2015-08-21 12:54:27,954 - bbb.services - Processing task: > _dgDktc4R0GytOXdydnIpQ > 2015-08-21 12:54:27,961 - bbb.services - BuildRequest 78092858 is done. > BBListener should process it soon, reclaiming in the meantime > 2015-08-21 12:54:27,961 - bbb.services - Processing task: > JJM9oHrARx6fsTffPT99aA > 2015-08-21 12:54:27,967 - bbb.services - BuildRequest 78109284 is done. > BBListener should process it soon, reclaiming in the meantime > 2015-08-21 12:54:27,967 - bbb.services - Processing task: > A18IB6cARBGyKrV7w5YAMA > 2015-08-21 12:54:27,972 - bbb.services - BuildRequest disappeared before > starting, cancelling task > 2015-08-21 12:54:27,983 - requests.packages.urllib3.connectionpool - > Starting new HTTPS connection (1): queue.taskcluster.net > 2015-08-21 12:54:28,493 - sqlalchemy.pool.QueuePool - Exception during reset > or similar > Traceback (most recent call last): > File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line > 567, in _finalize_fairy > fairy._reset(pool) > File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line > 701, in _reset > pool._dialect.do_rollback(self) > File > "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", > line 2519, in do_rollback > dbapi_connection.rollback() > ProgrammingError: (2014, "Commands out of sync; you can't run this command > now") > 2015-08-21 12:54:32,688 - bbb.runner - Running reflector service > 2015-08-21 12:54:32,688 - bbb.services - Starting reflector > 2015-08-21 12:54:32,690 - bbb.services - Processing task: > 4N-0yTkmQzuNmq3zjLf7cw > 2015-08-21 12:54:32,697 - bbb.services - BuildRequest 77842006 is done. > BBListener should process it soon, reclaiming in the meantime > 2015-08-21 12:54:32,698 - bbb.services - Processing task: > sKklWUdZS1mircngEJBBzw > 2015-08-21 12:54:32,704 - bbb.services - BuildRequest 78070838 is done. > BBListener should process it soon, reclaiming in the meantime > 2015-08-21 12:54:32,705 - bbb.services - Processing task: > _dgDktc4R0GytOXdydnIpQ > 2015-08-21 12:54:32,711 - bbb.services - BuildRequest 78092858 is done. > BBListener should process it soon, reclaiming in the meantime > 2015-08-21 12:54:32,711 - bbb.services - Processing task: > JJM9oHrARx6fsTffPT99aA > 2015-08-21 12:54:32,717 - bbb.services - BuildRequest 78109284 is done. > BBListener should process it soon, reclaiming in the meantime > 2015-08-21 12:54:32,717 - bbb.services - Processing task: > A18IB6cARBGyKrV7w5YAMA > 2015-08-21 12:54:32,723 - bbb.services - BuildRequest disappeared before > starting, cancelling task > 2015-08-21 12:54:32,734 - requests.packages.urllib3.connectionpool - > Starting new HTTPS connection (1): queue.taskcluster.net > 2015-08-21 12:54:33,245 - sqlalchemy.pool.QueuePool - Exception during reset > or similar > Traceback (most recent call last): > File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line > 567, in _finalize_fairy > fairy._reset(pool) > File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line > 701, in _reset > pool._dialect.do_rollback(self) > File > "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", > line 2519, in do_rollback > dbapi_connection.rollback() > ProgrammingError: (2014, "Commands out of sync; you can't run this command > now") > > > Looksl ike supervisor tried to restart it at least once based on the > timestamps. This might actually be something different...the service refuses to restart, with the same error every time. I filed bug 1197855 for it.
After loss of mysql during outage in bug 1241334, all 3 BBB hosts showed up in nagios as errors (only 1 process running). Restarting on all 3 showed the same symptoms: [root@buildbot-master72.bb.releng.usw2.mozilla.com ~]# for i in bblistener tclistener reflector; do supervisorctl restart buildbot_bridge_$i; done buildbot_bridge_bblistener: ERROR (not running) buildbot_bridge_bblistener: started buildbot_bridge_tclistener: stopped buildbot_bridge_tclistener: started buildbot_bridge_reflector: ERROR (not running) buildbot_bridge_reflector: started [root@buildbot-master72.bb.releng.usw2.mozilla.com ~]#
At about 05:40 this morning (2016-01-31) there was some mysql issue ('Too many connections' in nagios and buildbot logs), and the bblistener and reflector failed on bm72 and bm82. Supervisord tried to restart them but failed, possibly gave up too soon. Restarted all three services on both hosts, with the command from comment #4.
Ben: do we still need to build in some resiliency here?
Flags: needinfo?(bhearsum)
(In reply to Chris Cooper [:coop] from comment #6) > Ben: do we still need to build in some resiliency here? It would probably be good to have.
Flags: needinfo?(bhearsum)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.