[prod] ISE - OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')



7 years ago


(Reporter: mbrandt, Assigned: wenzel)




(Whiteboard: [prod], URL)



7 years ago
I ran a JMeter script against prod that submitted 150 complex marks (15 threads, each submitting 10 marks). 5 failed due to db deadlocks.

I should point out that this is a big improvement from the previous behavior.

Traceback (most recent call last):

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/core/handlers/base.py", line 100, in get_response
    response = callback(request, *callback_args, **callback_kwargs)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/views/decorators/http.py", line 37, in inner
    return func(request, *args, **kwargs)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/markup/requests.py", line 128, in save_mark
    new_mark_reference = common.save_new_mark_with_data(mark_data, request.META['REMOTE_ADDR'])

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/../ffdemo/markup/common.py", line 71, in save_new_mark_with_data
    reference = create_save_mark(hash(stripped_points_obj_full), obscurred_ip, stripped_points_obj_simplified, data)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/transaction.py", line 299, in _commit_on_success
    res = func(*args, **kw)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/../ffdemo/markup/common.py", line 83, in create_save_mark
    new_mark = Mark.objects.create()

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/manager.py", line 138, in create
    return self.get_query_set().create(**kwargs)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/query.py", line 352, in create
    obj.save(force_insert=True, using=self.db)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/base.py", line 434, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/base.py", line 527, in save_base
    result = manager._insert(values, return_id=update_pk, using=using)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/manager.py", line 195, in _insert
    return insert_query(self.model, values, **kwargs)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/query.py", line 1479, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/sql/compiler.py", line 783, in execute_sql
    cursor = super(SQLInsertCompiler, self).execute_sql(None)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/sql/compiler.py", line 727, in execute_sql
    cursor.execute(sql, params)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/backends/mysql/base.py", line 86, in execute
    return self.cursor.execute(query, args)

  File "/usr/lib/python2.6/site-packages/MySQLdb/cursors.py", line 173, in execute
    self.errorhandler(self, exc, value)

  File "/usr/lib/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue

OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')

Comment 1

7 years ago
Hm, we could restart the transaction once or twice but this might aggravate the situation :-/ Or we live with the fact that somewhat massive concurrent submissions lead to about a 3% failure rate. Hmmmmm.
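
For reference, a minimal sketch of what "restart the transaction once or twice" could look like. This is not code from the markup repo: the decorator name, the max_retries and delay parameters, and the stand-in OperationalError class are all assumptions made for illustration. In the real app the exception would be MySQLdb's OperationalError, and the retry would have to wrap the whole transaction (outside the commit_on_success decorator seen in the traceback) so that each attempt starts fresh.

```python
import functools
import time

DEADLOCK_ERRNO = 1213  # MySQL error code reported in the traceback


class OperationalError(Exception):
    """Stand-in for MySQLdb.OperationalError so the sketch runs without a database."""


def retry_on_deadlock(exc_class, max_retries=2, delay=0.0):
    """Re-run the wrapped function when exc_class is raised with errno 1213.

    Hypothetical helper: any other error, or a deadlock after max_retries
    retries, is re-raised unchanged.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except exc_class as exc:
                    code = exc.args[0] if exc.args else None
                    if code != DEADLOCK_ERRNO or attempt >= max_retries:
                        raise
                    attempt += 1
                    # Back off a little longer on each retry to spread out contenders.
                    time.sleep(delay * attempt)
        return wrapper
    return decorator
```

With max_retries=2 a submission is attempted at most three times. Whether the extra attempts relieve or aggravate lock contention under load is exactly the open question here, so this would need to be measured with the same JMeter run.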

Comment 2

7 years ago
Yeah ... my thoughts exactly (comment 1). The question to talk about is what we consider "massive" concurrency. I'm not sure what would constitute that.

> we could restart the transaction once or twice but this might aggravate the situation :-/

Would this be simple/quick to implement and deploy to prod? If so, perhaps we should experiment. If not, perhaps we should focus our efforts on testing for and fixing regressions.

Comment 3

7 years ago
I'll take a look.
Blocks: 628811

Comment 4

7 years ago
I added collision-free sequence generation (as borrowed from zamboni's translation magic) in place of the INSERT/UPDATE transaction we used before. If it still fails after this fix, then it will be a different error message ;)

Cynicism aside, I believe this to have way less overhead (transactions are somewhat expensive), and with a little bit of luck, the database agrees.
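
A rough sketch of the general idea, for anyone reading along. It is not the patch that landed: the table name markup_mark_seq and the function name are hypothetical, and it assumes a one-row MySQL sequence table bumped with LAST_INSERT_ID(expr), which is how I understand the zamboni approach. Because LAST_INSERT_ID() is tracked per connection, each caller reads back the value it generated itself, so two concurrent requests should not collide and no explicit INSERT/UPDATE transaction is needed.

```python
# Hypothetical one-row sequence table: CREATE TABLE markup_mark_seq (id INT NOT NULL)
SEQ_TABLE = "markup_mark_seq"


def next_sequence_id(cursor):
    """Return a fresh id from the sequence table using any DB-API cursor.

    The UPDATE bumps the counter atomically and stores the new value as this
    connection's LAST_INSERT_ID(); the SELECT then reads it back.
    """
    cursor.execute(
        "UPDATE %s SET id = LAST_INSERT_ID(id + 1)" % SEQ_TABLE
    )
    cursor.execute("SELECT LAST_INSERT_ID()")
    row = cursor.fetchone()
    return row[0]
```

The id returned here would then be used as the reference for the new mark, so the create step no longer has to insert a row first and update it afterwards.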



7 years ago
Assignee: nobody → fwenzel

Comment 5

7 years ago
Aaand here's another index (this time on country_code). That column is searched on, and without an index it caused yet another full-table walk, resulting in slowness and database headaches here:


7 years ago
Depends on: 660931

Comment 6

7 years ago
This has landed on stage and prod. Marking fixed.

mbrandt: Feel free to run another one of the tests in comment 0 and report back with the results. Thanks!
Last Resolved: 7 years ago
Resolution: --- → FIXED

Comment 7

7 years ago
Wow! I think that fixed us right up. JMeter successfully posted 3000 marks (30 threads each posting 100 marks) without a single failure.

QA verified. Thanks!