mozpool fails on startup if the DB is not available

RESOLVED FIXED

Status

Product: Testing Graveyard
Component: Mozpool
Opened 5 years ago, updated 2 years ago

People

(Reporter: dividehex, Assigned: dustin)

(Reporter)

Description

5 years ago
During today's maintenance window, mozpool lost its connection to the DB and properly committed suicide, but supervisord gave up trying to restart it.  We should probably increase 'startretries' in supervisord and/or increase the suicide delay, to give mozpool a bigger window of time before it goes into the FATAL state.

==> supervisord.log-20130821 <==
2013-08-24 12:18:48,983 INFO exited: mozpool-server (exit status 1; not expected)
2013-08-24 12:18:49,985 INFO spawned: 'mozpool-server' with pid 30312
2013-08-24 12:18:50,323 INFO exited: mozpool-server (exit status 1; not expected)
2013-08-24 12:18:51,326 INFO spawned: 'mozpool-server' with pid 30321
2013-08-24 12:18:51,664 INFO exited: mozpool-server (exit status 1; not expected)
2013-08-24 12:18:53,668 INFO spawned: 'mozpool-server' with pid 30330
2013-08-24 12:18:54,008 INFO exited: mozpool-server (exit status 1; not expected)
2013-08-24 12:18:57,013 INFO spawned: 'mozpool-server' with pid 30339
2013-08-24 12:18:57,351 INFO exited: mozpool-server (exit status 1; not expected)
2013-08-24 12:18:58,353 INFO gave up: mozpool-server entered FATAL state, too many start retries too quickly
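
For reference, the timing in the supervisord excerpt above can be checked with a short script. This is only a sketch: the log lines are copied verbatim from the excerpt, while the helper names (`parse_events`, `gaps`) are made up for illustration and are not part of supervisord or mozpool.

```python
from datetime import datetime

# Log lines copied from the supervisord excerpt above.
LOG = """\
2013-08-24 12:18:48,983 INFO exited: mozpool-server (exit status 1; not expected)
2013-08-24 12:18:49,985 INFO spawned: 'mozpool-server' with pid 30312
2013-08-24 12:18:50,323 INFO exited: mozpool-server (exit status 1; not expected)
2013-08-24 12:18:51,326 INFO spawned: 'mozpool-server' with pid 30321
2013-08-24 12:18:51,664 INFO exited: mozpool-server (exit status 1; not expected)
2013-08-24 12:18:53,668 INFO spawned: 'mozpool-server' with pid 30330
2013-08-24 12:18:54,008 INFO exited: mozpool-server (exit status 1; not expected)
2013-08-24 12:18:57,013 INFO spawned: 'mozpool-server' with pid 30339
2013-08-24 12:18:57,351 INFO exited: mozpool-server (exit status 1; not expected)
2013-08-24 12:18:58,353 INFO gave up: mozpool-server entered FATAL state, too many start retries too quickly
"""


def parse_events(text):
    """Return (timestamp, kind) pairs; kind is 'exited', 'spawned' or 'gave up'."""
    events = []
    for line in text.splitlines():
        stamp, _, message = line.partition(" INFO ")
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        events.append((when, message.split(":", 1)[0]))
    return events


def gaps(events, first, second):
    """Seconds between each `first` event and a `second` event directly after it."""
    return [
        round((b_time - a_time).total_seconds(), 1)
        for (a_time, a_kind), (b_time, b_kind) in zip(events, events[1:])
        if a_kind == first and b_kind == second
    ]


events = parse_events(LOG)
restart_delays = gaps(events, "exited", "spawned")  # wait before each respawn
lifetimes = gaps(events, "spawned", "exited")       # how long each attempt lived
total = (events[-1][0] - events[0][0]).total_seconds()

print("restart delays (s):", restart_delays)
print("process lifetimes (s):", lifetimes)
print("first exit to FATAL (s):", round(total, 1))
```

Each attempt lived for roughly a third of a second, supervisord waited about 1, 1, 2 and 3 seconds between attempts, and the whole sequence from the first exit to the FATAL state took under ten seconds.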
Why was mozpool bailing out so fast?  Its watchdog timer is ten minutes. Looking at the logs,

root INFO - [2013-08-24 12:18:54,918] Mozpool-4.1.2 server starting
Traceback (most recent call last):
  File "/opt/mozpool/frontend/bin/mozpool-server", line 9, in <module>
    load_entry_point('mozpool==4.1.2', 'console_scripts', 'mozpool-server')()
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/mozpool/web/server.py", line 66, in main
    mozpool.lifeguard.driver = devicemachine.LifeguardDriver(db)
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/mozpool/lifeguard/devicemachine.py", line 62, in __init__
    self.imaging_server_id = self.db.imaging_servers.get_id(config.get('server', 'fqdn'))
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/mozpool/db/imaging_servers.py", line 17, in get_id
    whereclause=(model.imaging_servers.c.fqdn==fqdn)))
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/mozpool/db/pool.py", line 65, in execute
    conn = self.engine.connect()
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2471, in connect
    return self._connection_cls(self, **kwargs)
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 878, in __init__
    self.__connection = connection or engine.raw_connection()
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2557, in raw_connection
    return self.pool.unique_connection()
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/sqlalchemy/pool.py", line 184, in unique_connection
    return _ConnectionFairy(self).checkout()
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/sqlalchemy/pool.py", line 401, in __init__
    rec = self._connection_record = pool._do_get()
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/sqlalchemy/pool.py", line 746, in _do_get
    con = self._create_connection()
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/sqlalchemy/pool.py", line 189, in _create_connection
    return _ConnectionRecord(self)
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/sqlalchemy/pool.py", line 282, in __init__
    self.connection = self.__connect()
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/sqlalchemy/pool.py", line 344, in __connect
    connection = self.__pool._creator()
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 80, in connect
    return dialect.connect(*cargs, **cparams)
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 281, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/pymysql/__init__.py", line 93, in Connect
    return Connection(*args, **kwargs)
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/pymysql/connections.py", line 510, in __init__
    self._connect()
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/pymysql/connections.py", line 676, in _connect
    self._get_server_information()
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/pymysql/connections.py", line 803, in _get_server_information
    packet = MysqlPacket(self)
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/pymysql/connections.py", line 200, in __init__
    self.__recv_packet()
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/pymysql/connections.py", line 206, in __recv_packet
    raise OperationalError(2013, "Lost connection to MySQL server during query")
sqlalchemy.exc.OperationalError: (OperationalError) (2013, 'Lost connection to MySQL server during query') None None

it's running a DB query in the foreground during the startup process.  The unhandled exception causes the process to exit.

Supervisord really did the right thing by giving up rather than pounding the process indefinitely.  The bug is that mozpool's startup process requires a DB connection.
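
One general way to make a startup step tolerate a temporarily unavailable DB is to retry it with a growing delay instead of letting the first exception kill the process. The sketch below illustrates that idea only; it is not the attached patch, and `DatabaseUnavailable` and `call_with_retries` are hypothetical names (in mozpool the exception would presumably be `sqlalchemy.exc.OperationalError`, as in the traceback above).

```python
import time


class DatabaseUnavailable(Exception):
    """Hypothetical stand-in for the DB driver's connection error."""


def call_with_retries(func, attempts=5, initial_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call func(), retrying on DatabaseUnavailable with exponential backoff.

    Re-raises the last error once `attempts` calls have failed, so a
    supervisor can still notice a DB that never comes back.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except DatabaseUnavailable:
            if attempt == attempts:
                raise
            sleep(delay)
            # Double the wait each time, up to max_delay.
            delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    state = {"calls": 0}

    def get_imaging_server_id():
        # Hypothetical lookup that fails twice before the DB "comes back".
        state["calls"] += 1
        if state["calls"] < 3:
            raise DatabaseUnavailable("Lost connection to MySQL server")
        return 7

    # Pass a no-op sleep so the demo finishes immediately.
    print(call_with_retries(get_imaging_server_id, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the helper easy to test; a real startup path would use the default `time.sleep` and choose `attempts` and `max_delay` to cover the expected length of a maintenance window.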
Assignee: relops → nobody
Component: RelOps: Puppet → Mozpool
Product: Infrastructure & Operations → Testing
QA Contact: dustin
Version: other → unspecified
Summary: supervisord gives up on restarting mozpool too quickly → mozpool fails on startup if the DB is not available
Created attachment 8336983 [details] [diff] [review]
bug909045.patch
Assignee: nobody → dustin
Attachment #8336983 - Flags: review?(jwatkins)
(Reporter)

Comment 3

5 years ago
Comment on attachment 8336983 [details] [diff] [review]
bug909045.patch

I like it.  r+
Attachment #8336983 - Flags: review?(jwatkins) → review+
Comment on attachment 8336983 [details] [diff] [review]
bug909045.patch

landed, but not shipped yet
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: Testing → Testing Graveyard