Closed Bug 660876 Opened 13 years ago Closed 13 years ago

[prod] ISE - OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')

Categories

(Websites Graveyard :: markup.mozilla.org, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: mbrandt, Assigned: wenzel)

Details

(Whiteboard: [prod])

I ran a JMeter script against prod that submitted 150 complex marks (15 threads each submitting 10 marks). 5 failed due to db deadlocks.

I should point out that this is a big improvement from the previous behavior.


Traceback (most recent call last):

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/core/handlers/base.py", line 100, in get_response
    response = callback(request, *callback_args, **callback_kwargs)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/views/decorators/http.py", line 37, in inner
    return func(request, *args, **kwargs)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/markup/requests.py", line 128, in save_mark
    new_mark_reference = common.save_new_mark_with_data(mark_data, request.META['REMOTE_ADDR'])

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/../ffdemo/markup/common.py", line 71, in save_new_mark_with_data
    reference = create_save_mark(hash(stripped_points_obj_full), obscurred_ip, stripped_points_obj_simplified, data)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/transaction.py", line 299, in _commit_on_success
    res = func(*args, **kw)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/../ffdemo/markup/common.py", line 83, in create_save_mark
    new_mark = Mark.objects.create()

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/manager.py", line 138, in create
    return self.get_query_set().create(**kwargs)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/query.py", line 352, in create
    obj.save(force_insert=True, using=self.db)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/base.py", line 434, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/base.py", line 527, in save_base
    result = manager._insert(values, return_id=update_pk, using=using)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/manager.py", line 195, in _insert
    return insert_query(self.model, values, **kwargs)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/query.py", line 1479, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/sql/compiler.py", line 783, in execute_sql
    cursor = super(SQLInsertCompiler, self).execute_sql(None)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/models/sql/compiler.py", line 727, in execute_sql
    cursor.execute(sql, params)

  File "/data/www/python/markup.mozilla.org/markup/ffdemo/vendor/src/django/django/db/backends/mysql/base.py", line 86, in execute
    return self.cursor.execute(query, args)

  File "/usr/lib/python2.6/site-packages/MySQLdb/cursors.py", line 173, in execute
    self.errorhandler(self, exc, value)

  File "/usr/lib/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue

OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
Hm, we could restart the transaction once or twice but this might aggravate the situation :-/ Or we live with the fact that somewhat massive concurrent submissions lead to about a 3% failure rate. Hmmmmm.
Yeah ... my thoughts exactly (comment 1). The question is what we consider "massive" concurrency. I'm not sure where that threshold would be.

> we could restart the transaction once or twice but this might aggravate the situation :-/

Would this be simple/quick to implement and deploy to prod? If so, perhaps we should experiment. If not, perhaps we should focus our efforts on testing for and fixing regressions.
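For reference, the "restart the transaction once or twice" idea could look roughly like the sketch below. It is not code from the markup repository: `retry_on_deadlock`, its parameters, and the stand-in `OperationalError` class are hypothetical, and real code would catch `MySQLdb.OperationalError` around the transactional function instead. The randomized backoff is one way to reduce the risk of retries aggravating the contention.

```python
import random
import time

DEADLOCK_ERRNO = 1213  # MySQL error code seen in the traceback above


class OperationalError(Exception):
    """Stand-in for MySQLdb.OperationalError so the sketch runs without a database."""


def retry_on_deadlock(func, max_retries=2, base_delay=0.05):
    """Call func(); if it fails with a deadlock error, back off and try again.

    Hypothetical helper: func is expected to run one complete transaction,
    so that a retry starts from a clean state.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except OperationalError as exc:
            is_deadlock = bool(exc.args) and exc.args[0] == DEADLOCK_ERRNO
            if not is_deadlock or attempt == max_retries:
                raise  # other errors, or out of retries: let the caller see it
            # Randomized, growing delay so colliding requests do not retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) * random.random())
```

With `max_retries=2` a request is attempted at most three times, so the worst case triples the load from the failing requests only, not from all traffic.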
I'll take a look.
I added collision-free sequence generation (as borrowed from zamboni's translation magic) in place of the INSERT/UPDATE transaction we used before. If it still fails after this fix, then it will be a different error message ;)

Cynicism aside, I believe this to have way less overhead (transactions are somewhat expensive), and with a little bit of luck, the database agrees.

https://github.com/mozilla/markup/commit/6ca2e83
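
A rough sketch of what sequence generation without an INSERT/UPDATE transaction can look like on MySQL follows. This is the general one-row sequence-table pattern built on `LAST_INSERT_ID(expr)`, not necessarily what the commit above does; the table name `markup_mark_seq` and the function `next_sequence_id` are made up for illustration, and the sketch assumes the UPDATE is committed immediately (autocommit) and that both statements run on the same connection.

```python
# Hypothetical one-row table, e.g. CREATE TABLE markup_mark_seq (id BIGINT NOT NULL)
SEQ_TABLE = "markup_mark_seq"


def next_sequence_id(cursor, table=SEQ_TABLE):
    """Reserve and return the next id from a one-row MySQL sequence table.

    LAST_INSERT_ID(expr) stores expr as this connection's "last insert id",
    so the UPDATE bumps the counter in a single statement and the follow-up
    SELECT reads back the value this connection reserved.
    """
    # The table name is interpolated because SQL identifiers cannot be bound
    # as parameters; it must come from trusted code, never from user input.
    cursor.execute("UPDATE %s SET id = LAST_INSERT_ID(id + 1)" % table)
    cursor.execute("SELECT LAST_INSERT_ID()")
    return cursor.fetchone()[0]
```

Because `LAST_INSERT_ID()` is tracked per connection, two concurrent callers each read back their own reserved value rather than each other's.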
Assignee: nobody → fwenzel
Status: NEW → ASSIGNED
Aaand here's another index (this time on country_code). That column is searched on, and without an index each lookup caused yet another full-table walk, resulting in slowness and database headaches:
https://github.com/mozilla/markup/commit/44f9b50
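
To illustrate why the missing index matters, here is a small self-contained sketch. It uses Python's built-in SQLite rather than the MySQL database from this bug, and the table layout, index name, and `query_plan` helper are invented for the example. Only the principle carries over: without an index on the searched column the planner walks the whole table, with one it can seek directly.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the marks table.
conn.execute(
    "CREATE TABLE mark (id INTEGER PRIMARY KEY, country_code TEXT, points TEXT)"
)
conn.executemany(
    "INSERT INTO mark (country_code, points) VALUES (?, ?)",
    [(code, "[]") for code in ("us", "de", "fr") * 100],
)

lookup = "SELECT id, points FROM mark WHERE country_code = ?"

plan_before = query_plan(conn, lookup, ("de",))  # full scan of the table
conn.execute("CREATE INDEX mark_country_code_idx ON mark (country_code)")
plan_after = query_plan(conn, lookup, ("de",))   # search via the new index

print(plan_before)
print(plan_after)
```

In a Django model, the equivalent change is typically `db_index=True` on the field plus a schema migration for the existing table.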
Depends on: 660931
This has landed on stage and prod. Marking fixed.

mbrandt: Feel free to run another one of the tests in comment 0 and report back with the results. Thanks!
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Wow! I think that fixed us right up. JMeter successfully posted 3000 marks (30 threads each posting 100 marks) without a single failure.

QA verified. Thanks!
Status: RESOLVED → VERIFIED
Product: Websites → Websites Graveyard