Closed Bug 1034377 Opened 10 years ago Closed 10 years ago

"Lock wait timeout exceeded" produces a 500 error; should be 409

Categories

(Cloud Services Graveyard :: Server: Sync, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: rfkelly, Assigned: rfkelly)

Details

(Whiteboard: [qa+])

Attachments

(1 file)

Traceback from loadtest:

Traceback (most recent call last): File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/SyncStorage-1.5.6-py2.6.egg/syncstorage/storage/sql/dbconnect.py", line 422, in report_backend_errors_wrapper return func(self, *args, **kwds) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/SyncStorage-1.5.6-py2.6.egg/syncstorage/storage/sql/dbconnect.py", line 530, in execute return self._exec_with_cleanup(connection, query_str, **params) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/mozsvc/metrics.py", line 183, in timed_func return func(*args, **kwds) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/SyncStorage-1.5.6-py2.6.egg/syncstorage/storage/sql/dbconnect.py", line 550, in _exec_with_cleanup return connection.execute(sqltext(query_str), **params) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 720, in execute return meth(self, multiparams, params) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/sql/elements.py", line 317, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 817, in _execute_clauseelement compiled_sql, distilled_params File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 947, in _execute_context context) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 1108, in _handle_dbapi_exception exc_info File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/util/compat.py", line 185, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 940, in _execute_context context) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/engine/default.py", line 435, in do_execute cursor.execute(statement, parameters) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/cursors.py", line 132, in execute result = self._query(query) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/cursors.py", line 271, in _query conn.query(q) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/connections.py", line 726, in query self._affected_rows = self._read_query_result(unbuffered=unbuffered) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/connections.py", line 861, in _read_query_result result.read() File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/connections.py", line 1064, in read first_packet = self.connection._read_packet() File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/connections.py", line 826, in _read_packet packet.check_error() File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/connections.py", line 370, in check_error raise_mysql_exception(self._data) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/err.py", line 116, in raise_mysql_exception _check_mysql_exception(errinfo) File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/err.py", line 112, in _check_mysql_exception raise InternalError(errno, errorvalue) InternalError: (InternalError) (1205, u'Lock wait timeout exceeded; try restarting transaction') '/* [queryName=LOCK_COLLECTION_WRITE, retry=1] */ SELECT last_modified FROM user_collections WHERE userid=%s AND collection=%s FOR UPDATE' (567217, 6)


This should have been caught and turned into a "409 Conflict" but it doesn't look like it was.  Probably not catching the right error subclass.
Assignee: nobody → rfkelly
Summary: "Lock wait timeout exceeded" prouces a 500 error; should be 409 → "Lock wait timeout exceeded" produces a 500 error; should be 409
Whiteboard: [qa+]
AFAICT, the fact that this traceback starts in "report_backend_errors_wrapper" rather than in the top-level "catch_backend_errors_tween" means that it's logging the error before (correctly) turning it into a BackendError instance.  So maybe this is actually being caught and reported properly, and it's just the logged traceback that is confusing.

I'll dig a little more into the behaviour here though.
Here are some improvements to our "409 Conflict" generation based on my invetigation for this traceback:

  * Add an explicit message when logging such tracebacks, so that we can more easily distinguish "uncaught error" from "caught and logged".

  * Treat "1213: deadlock found when trying to get lock" as a retryable locking-related error

  * Move the "db locking error" => "ConflictError" mapping down onto the session object, so we correctly detect these if they show up on any query rather than just on the lock-taking query

These changes helped the "409 Conflict" response show up more reliably in my local testing on both sqlite and mysql (as opposed to these errors being reported as a generic "503 Service Unavailable").

Pinging :callahad for review in the interests of disseminating knowledge, and :bobm for feedback to sanity-check adding "1213" to the list of retryable errors.
Attachment #8457839 - Flags: review?(dan.callahan)
Attachment #8457839 - Flags: feedback?(bobm)
(In reply to Ryan Kelly [:rfkelly] from comment #2)

> Pinging :callahad for review in the interests of disseminating knowledge,
> and :bobm for feedback to sanity-check adding "1213" to the list of
> retryable errors.

Sounds good to me.  Is there / will there be a retry limit?
> Sounds good to me.  Is there / will there be a retry limit?

Yes, this flows into the existing retry logic which will retry exactly once, then fail back to the client with a 503 or 409, depending on the error.
Comment on attachment 8457839 [details] [diff] [review]
syncstorage-better-conflict-errors.diff

Sounds like a good strategy to me.
Attachment #8457839 - Flags: feedback?(bobm) → feedback+
Comment on attachment 8457839 [details] [diff] [review]
syncstorage-better-conflict-errors.diff

Bumping review to :telliott, let's get this landed before the next round of deployments.
Attachment #8457839 - Flags: review?(dan.callahan) → review?(telliott)
Attachment #8457839 - Flags: review?(telliott) → review+
Bear in mind that desktop currently doesn't support 409 Conflict: Bug 959034.
> Bear in mind that desktop currently doesn't support 409 Conflict: Bug 959034.

Ugh, I had forgotten about the error-bar aspect of that.  I'll ponder the implications before committing.
I committed this because it gives better internal retry handling, and because we already generate 409s under some circumstances anyway:

https://github.com/mozilla-services/server-syncstorage/commit/ee8349145405285bb0f4b78b28320c272fec156f

There's some risk that this might surface an error bar in situations where it wouldn't previously, but it seems acceptably small to me.  We should separately consider the best path forward there.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
OK. I checked my notes. I do not see any specific deployments coming up, so I will just hold this one.
(In reply to Ryan Kelly [:rfkelly] from comment #9)
> I committed this because it gives better internal retry handling, and
> because we already generate 409s under some circumstances anyway:

Should we add a graph with a 409 + UA break down?
> Should we add a graph with a 409 + UA break down?

Nah.  See Bug 1051739 where I just disabled them entirely until we get proper client support.
server-syncstorage 1.5.9 is now in Production
Status: RESOLVED → VERIFIED
Product: Cloud Services → Cloud Services Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: