Intermittent deployment "Deadlock found when trying to get lock" during fixture loading

RESOLVED FIXED

Status

P1
normal
RESOLVED FIXED
11 months ago
18 days ago

People

(Reporter: emorley, Assigned: emorley)

Tracking

Details

Attachments

(2 attachments)

(Assignee)

Description

11 months ago
Twice in a row whilst trying to deploy to production (the third deploy succeeded), the deploy failed during the release phase step, in the `manage.py loaddata` management command, with:

`model.FailureClassification(pk=1): (1213, 'Deadlock found when trying to get lock; try restarting transaction')`

...whilst trying to load `fixtures/failure_classification.json`.

This is presumably due to contention with other queries running at the time.

The annoying thing about this is that the fixture file hadn't actually changed, so nothing needed updating. Perhaps we should refactor our use of fixtures to only update things that have changed?


Log excerpt:

-----> PRE-DEPLOY: Loading initial data...
...
Traceback (most recent call last):
  ...
  File "/app/treeherder/model/management/commands/load_initial_data.py", line 14, in handle
    'performance_bug_templates')
  File "/app/.heroku/python/lib/python2.7/site-packages/django/core/management/__init__.py", line 131, in call_command
    return command.execute(*args, **defaults)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/core/management/base.py", line 330, in execute
    output = self.handle(*args, **options)
  File "/app/.heroku/python/lib/python2.7/site-packages/newrelic/api/function_trace.py", line 107, in literal_wrapper
    return wrapped(*args, **kwargs)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/core/management/commands/loaddata.py", line 69, in handle
    self.loaddata(fixture_labels)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/core/management/commands/loaddata.py", line 109, in loaddata
    self.load_label(fixture_label)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/core/management/commands/loaddata.py", line 175, in load_label
    obj.save(using=self.using)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/core/serializers/base.py", line 205, in save
    models.Model.save_base(self.object, using=using, raw=True, **kwargs)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/base.py", line 838, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/base.py", line 905, in _save_table
    forced_update)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/base.py", line 955, in _do_update
    return filtered._update(values) > 0
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/query.py", line 664, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 1204, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
    raise original_exception
django.db.utils.OperationalError: Problem installing fixture '/app/treeherder/model/fixtures/failure_classification.json': Could not load model.FailureClassification(pk=1): (1213, 'Deadlock found when trying to get lock; try restarting transaction')

Original logs:
https://dashboard.heroku.com/apps/treeherder-prod/activity/releases/63347985-98cb-4191-a383-570702583a81
https://dashboard.heroku.com/apps/treeherder-prod/activity/releases/62e50b90-d23b-4c50-bd22-f8d760669bc2
(Assignee)

Comment 1

10 months ago
And again, but on stage:
https://dashboard.heroku.com/apps/treeherder-stage/activity/releases/1534
(Assignee)

Comment 10

9 months ago
This issue is starting to get annoying.

https://dashboard.heroku.com/apps/treeherder-prod/activity/releases/281
(Assignee)

Updated

9 months ago
Priority: P2 → P1
(Assignee)

Updated

9 months ago
Assignee: nobody → emorley
(Assignee)

Comment 18

8 months ago
So I had an initial look into this and we seem to just be using the native Django fixtures, so not sure what we can do to fix this? It seems like this is something that Django should handle better, and we should file upstream?
(Assignee)

Updated

8 months ago
Assignee: emorley → nobody

Comment 19

8 months ago
Django describes[1] fixtures like so on their topic page:

> It’s sometimes useful to pre-populate your database with hard-coded data when you’re first setting up an app.

and your comment earlier Ed about our use of them:

> the fixture file hadn't actually changed, so nothing needed updating

makes a really good point that we're misusing them.

Having looked at all the fixtures we're loading I think most are candidates for conversion to Field.choices [2].  There are a couple that need some more thought given to them (although there's one that's templating so should probably be in a template) but at least reducing the number we load will hopefull reduce the frequency of deadlocks as a start.

[1]: https://docs.djangoproject.com/en/2.0/howto/initial-data/
[2]: https://docs.djangoproject.com/en/2.0/ref/models/fields/#choices
(Assignee)

Comment 20

7 months ago
Yeah good ideas :-)

Happened again today (the frequency has dropped though):
https://dashboard.heroku.com/apps/treeherder-stage/activity/releases/1849
(Assignee)

Comment 24

4 months ago
This is still happening multiple times a week (and/or day), eg:
https://dashboard.heroku.com/apps/treeherder-stage/activity/releases/2080
Yeah, I just hit it again on stage yesterday, fwiw.

Comment 27

21 days ago
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/165ef157470eb877d3191bcedb1529a8e95d2d29
Bug 1428031 - Heroku: Retry load_initial_data up to 5 times (#4186)

To work around the intermittent MySQL deadlocks encountered during
a small percentage of deployments. This will ensure deployments don't
cause email spam to app collaborators and Heroku admins, as well as
saving us from having to manually retry the release each time it
occurs.

The `IGNORE_PREDEPLOY_ERRORS` option has also been removed, since
it's mostly redundant with `SKIP_PREDEPLOY`.
(Assignee)

Updated

21 days ago
Assignee: nobody → emorley
Status: NEW → RESOLVED
Last Resolved: 21 days ago
Resolution: --- → FIXED

Comment 30

18 days ago
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/47419fde6c841a440f85cd341c612ce5495afc75
Bug 1428031 - Heroku: Increase max load_initial_data retries to 10 (#4203)

Since there was still a deploy failure after five attempts here:
https://dashboard.heroku.com/apps/treeherder-prototype/activity/releases/2847
You need to log in before you can comment on or make changes to this bug.