Bug 1166250 (Closed), opened 9 years ago, closed 9 years ago

buildbot bridge hitting "too many connections" errors to mysql

Categories: Release Engineering :: General, defect
Type: defect
Priority: Not set
Severity: normal

Tracking: Not tracked
Status: RESOLVED FIXED

People: Reporter: bhearsum, Assigned: bhearsum

Details: Whiteboard: [bbb]

Attachments: 1 file

The traceback we get is:
Traceback (most recent call last):
  File "/builds/bbb/bin/buildbot-bridge", line 9, in <module>
    load_entry_point('bbb==0.3', 'console_scripts', 'buildbot-bridge')()
  File "/builds/bbb/lib/python2.7/site-packages/bbb/runner.py", line 48, in main
    **kwargs
  File "/builds/bbb/lib/python2.7/site-packages/bbb/services.py", line 42, in __init__
    super(BuildbotListener, self).__init__(*args, events=events, **kwargs)
  File "/builds/bbb/lib/python2.7/site-packages/bbb/servicebase.py", line 215, in __init__
    super(ListenerService, self).__init__(*args, **kwargs)
  File "/builds/bbb/lib/python2.7/site-packages/bbb/servicebase.py", line 200, in __init__
    self.bbb_db = BBBDb(bbb_db)
  File "/builds/bbb/lib/python2.7/site-packages/bbb/servicebase.py", line 35, in __init__
    metadata.create_all(self.db)
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 3614, in create_all
    tables=tables)
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1842, in _run_visitor
    with self._optional_conn_ctx_manager(connection) as conn:
  File "/tools/python27/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1835, in _optional_conn_ctx_manager
    with self.contextual_connect() as conn:
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2026, in contextual_connect
    self._wrap_pool_connect(self.pool.connect, None),
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2065, in _wrap_pool_connect
    e, dialect, self)
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1395, in _handle_dbapi_exception_noconnection
    exc_info
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2061, in _wrap_pool_connect
    return fn()
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 337, in connect
    return _ConnectionFairy._checkout(self)
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 644, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 439, in checkout
    rec = pool._do_get()
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 964, in _do_get
    self._dec_overflow()
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 961, in _do_get
    return self._create_connection()
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 284, in _create_connection
    return _ConnectionRecord(self)
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 410, in __init__
    self.connection = self.__connect()
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 538, in __connect
    connection = self.__pool._creator()
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 89, in connect
    return dialect.connect(*cargs, **cparams)
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 377, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/builds/bbb/lib/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/builds/bbb/lib/python2.7/site-packages/MySQLdb/connections.py", line 193, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1040, 'Too many connections')
Sheeri, how many connections are allowed according to the server? I should probably tweak the client to only use 1 connection per process (because the bridge isn't async in any way), but that will still leave us with 9 connections total, maybe more in the near future.
Flags: needinfo?(scabral)
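For a rough sense of scale, here is a sketch of the worst-case connection count. It assumes one SQLAlchemy engine per bridge process and QueuePool's default limits (pool_size=5, max_overflow=10); the helper name is hypothetical and the nine-process figure comes from the comment above.

```python
# Upper bound on simultaneous MySQL connections held by the bridge.
# Assumptions: one SQLAlchemy engine per process, QueuePool defaults.
DEFAULT_POOL_SIZE = 5      # QueuePool default pool_size
DEFAULT_MAX_OVERFLOW = 10  # QueuePool default max_overflow


def worst_case_connections(processes, pool_size=DEFAULT_POOL_SIZE,
                           max_overflow=DEFAULT_MAX_OVERFLOW):
    """Hypothetical helper: each engine may hold pool_size + max_overflow."""
    return processes * (pool_size + max_overflow)


print(worst_case_connections(9))                               # 135
print(worst_case_connections(9, pool_size=1, max_overflow=0))  # 9
```

QueuePool opens connections lazily, and a single-threaded process rarely checks out more than one at a time, so real usage sits well below the default bound. Capping the pool simply makes the bound equal to the number of processes.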
It was 1200 max, but on Sunday I changed it to 5000 - max_used_connections is 2713, so that's the max we hit on Sunday.
Flags: needinfo?(scabral)
(In reply to Sheeri Cabral [:sheeri] from comment #2)
> It was 1200 max, but on Sunday I changed it to 5000 - max_used_connections
> is 2713, so that's the max we hit on Sunday.

Thanks. For posterity: we talked briefly on IRC and confirmed that this is a server-wide limit, and the server as a whole was tipping past it. I suspect that the buildbot bridge combined with some daily or weekly reports tipped us over the limit.

I'm still going to shrink the pool size for the buildbot bridge to 1 per instance, but I won't be spending any extra time here given the above.
May as well reduce the pool size since we're single threaded anyways. Setting pool_timeout should get rid of the connection errors we've been getting if the bridge is idle for too long (AFAICT they are caused by the server timing us out and sqlalchemy not noticing).
Attachment #8610132 - Flags: review?(sdeckelmann)
Whiteboard: [bbb]
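A minimal sketch of the pool sizing described above, assuming SQLAlchemy's create_engine with the default QueuePool. The keyword names (pool_size, max_overflow, pool_timeout) are real create_engine parameters; the helper name and the 10-second timeout are illustrative assumptions, not taken from attachment 8610132.

```python
def single_threaded_pool_kwargs(pool_timeout=10):
    """Hypothetical helper: create_engine() keyword arguments for a process
    that only ever needs one database connection at a time.

    The 10-second default is an illustrative assumption.
    """
    return {
        "pool_size": 1,                # keep at most one pooled connection
        "max_overflow": 0,             # never open extra connections beyond it
        "pool_timeout": pool_timeout,  # seconds to wait for a free connection
                                       # before SQLAlchemy raises TimeoutError
    }


# Usage (needs SQLAlchemy and MySQLdb installed; the URL is a placeholder):
#   from sqlalchemy import create_engine
#   engine = create_engine("mysql://user:password@dbhost/bbb",
#                          **single_threaded_pool_kwargs())
kwargs = single_threaded_pool_kwargs()
```

With pool_size=1 and max_overflow=0, pool_timeout only matters if code tries to check out a second connection while still holding the first. In a single-threaded process that wait can never be satisfied, so the timeout turns an accidental nested checkout into an error; it does not by itself address connections the server closed while idle.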
(In reply to Ben Hearsum [:bhearsum] from comment #4)
> Created attachment 8610132
> improve db connections a bit more
> 
> May as well reduce the pool size since we're single threaded anyways.
> Setting pool_timeout should get rid of the connection errors we've been
> getting if the bridge is idle for too long (AFAICT they are caused by the
> server timing us out and sqlalchemy not noticing).

Eiw! If it's possible to create a test case, it may be the kind of thing that SQLAlchemy could be taught to properly handle.  Or is this already a known issue?
Comment on attachment 8610132
improve db connections a bit more

Review of attachment 8610132:
-----------------------------------------------------------------

Seems fine. Could you add a comment explaining the timeout setting? It's the kind of headscratcher I'm always wishing I knew more about in times of database-related contemplation. And sometimes in crises.
Attachment #8610132 - Flags: review?(sdeckelmann) → review+
(In reply to Selena Deckelmann :selenamarie :selena from comment #5)
> Eiw! If it's possible to create a test case, it may be the kind of thing
> that SQLAlchemy could be taught to properly handle.  Or is this already a
> known issue?

Based on http://docs.sqlalchemy.org/en/latest/dialects/mysql.html#connection-timeouts and https://groups.google.com/forum/#!topic/sqlalchemy/oDXjm-S9y2w (particularly the response from point #2), it's something that apps need to handle by design. I'll add some comments about this.
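One common way for an application to cope with server-side disconnects is a "pessimistic" check: test each pooled connection on checkout (typically with a trivial query such as SELECT 1) and replace it if the server has dropped it. The sketch below models that idea in plain Python; FakeConnection, checkout and new_connection are hypothetical stand-ins, not SQLAlchemy or MySQLdb APIs. In SQLAlchemy itself this is done with a pool "checkout" event listener or, in 1.2 and later, create_engine(..., pool_pre_ping=True).

```python
import itertools


class FakeConnection(object):
    """Hypothetical stand-in for a DB-API connection."""

    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.alive = True  # set to False to simulate the server closing it

    def ping(self):
        """Return True if the server would still accept a query."""
        return self.alive


def checkout(idle, connect):
    """Return a usable connection, discarding any the server has dropped.

    idle: list of pooled connections; connect: callable opening a new one.
    """
    while idle:
        conn = idle.pop()
        if conn.ping():
            return conn
        # Stale connection: discard it and try the next pooled one.
    return connect()


ids = itertools.count(1)


def new_connection():
    return FakeConnection(next(ids))


pool = [new_connection()]
pool[0].alive = False  # e.g. MySQL closed it after wait_timeout
conn = checkout(pool, new_connection)
print(conn.conn_id)  # 2
```

The cost of this approach is one extra round trip per checkout, which is negligible for a process as lightly loaded as the bridge.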
(In reply to Ben Hearsum [:bhearsum] from comment #7)
> Based on
> http://docs.sqlalchemy.org/en/latest/dialects/mysql.html#connection-timeouts
> https://groups.google.com/forum/#!topic/sqlalchemy/oDXjm-S9y2w (particularly
> the response from point #2), it's something that apps need to handle
> by-design. I'll add some comments about this.

Hm, rereading some of this, I might be wrong here. It seems like pool_recycle is the setting we actually care about; pool_timeout just makes the pool wait longer for a free connection before giving up.
In any case, I added some comments about everything in https://github.com/bhearsum/bbb/commit/b8f0f8d5cbdb89cf6749b1d4218447c4fa97b6ab. This patch is built on top of bug 1156301, so I'll hold off landing it until that is ready.
Attachment #8610132 - Flags: checked-in+
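A plain-Python sketch of why pool_recycle is the relevant setting (the function names and the one-hour margin are hypothetical, not values from the bbb patch). On checkout, SQLAlchemy closes and replaces a pooled connection whose age exceeds pool_recycle seconds. A connection's age is always at least its idle time, so a pool_recycle below MySQL's wait_timeout (28800 seconds, i.e. 8 hours, by default) means the bridge never reuses a connection the server has already timed out.

```python
MYSQL_DEFAULT_WAIT_TIMEOUT = 28800  # seconds (8 hours), MySQL's default


def choose_pool_recycle(wait_timeout=MYSQL_DEFAULT_WAIT_TIMEOUT, margin=3600):
    """Hypothetical helper: pick pool_recycle safely below wait_timeout.

    margin is an illustrative assumption, not a value from the bbb patch.
    """
    if wait_timeout <= margin:
        raise ValueError("margin must be smaller than wait_timeout")
    return wait_timeout - margin


def should_recycle(created_at, now, pool_recycle):
    """Model of the checkout-time test: recycle if the connection is too old.

    pool_recycle=-1 (SQLAlchemy's default) disables recycling.
    """
    return pool_recycle > -1 and now - created_at > pool_recycle


recycle = choose_pool_recycle()
print(recycle)                                                   # 25200
print(should_recycle(created_at=0, now=30000, pool_recycle=recycle))  # True
print(should_recycle(created_at=0, now=600, pool_recycle=recycle))    # False
# With SQLAlchemy: create_engine(url, pool_recycle=choose_pool_recycle())
```

If the server's wait_timeout has been changed from the default, the value passed to choose_pool_recycle should be read from the server rather than assumed.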
Deployed in production!
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: General Automation → General