Write some documentation on mysql gotchas with database migrations

NEW
Unassigned

Status

Tree Management
Treeherder: Docs & Development
a year ago
a year ago

People

(Reporter: wlach, Unassigned)

Tracking

Details

Attachments

(1 attachment)

:jgraham requested some more detailed documentation on the procedures I use to apply schema changes to the Treeherder database without running into problems. This bug will track adding my notes to the Treeherder documentation.
Created attachment 8823369 [details] [review]
[treeherder] wlach:mysql_gotchas > mozilla:master
Comment on attachment 8823369 [details] [review]
[treeherder] wlach:mysql_gotchas > mozilla:master

Let me know if these notes make sense, or if additional clarification is needed.
Attachment #8823369 - Flags: review?(james)
Attachment #8823369 - Flags: review?(emorley)
Comment on attachment 8823369 [details] [review]
[treeherder] wlach:mysql_gotchas > mozilla:master

Thank you for doing this!

Some thoughts:

I still think we should be using the Django migrations system even if we don't use the auto-generated SQL. ie:
* Use RunSQL combined with the `state_operations` parameter [1] so Django knows what the SQL statement was doing and doesn't add it's own migration on top. 
* If ingestion must be paused, just pause it prior to pushing to the production branch.

Advantages:
* No chance of deviation between what runs on people's Vagrant instance (the auto-generated migration) vs on stage/prod (the manual SQL)
* The review can happen in one place (the PR) and not split across the PR and a separate SQL attachment
* No need to manually update the Django migrations table

[1] https://github.com/mozilla/treeherder/blob/78f66bc7c5f831acd675ef0f7224ac07f3ec8889/treeherder/model/migrations/0015_auto_20160402_1345.py#L18
Attachment #8823369 - Flags: review?(emorley)
An example workflow would be:

1) Make changes to model files
2) Run `./manage.py makemigrations --name <name>`
3) Run `./manage.py sqlmigrate <model> <name>`
4) Check whether the generated SQL is likely to be problematic
5) If so, modify the generated migrations file, turning it from:

```
    migrations.AlterIndexTogether(...),
```

...to:

```
    # The auto-generated SQL was not performant:
    # <paste of auto-generated SQL for reference and easy review>
    migrations.RunSQL(
        """<as many multi-line optimised SQL statements as desired>""",
        """<the reverse of those statements>""",
        state_operations=[migrations.AlterIndexTogether(...)],
    )
```

As a side note, I've found a few open Django tickets for combining multiple ALTER TABLE statements, though no one has worked on them yet:
https://code.djangoproject.com/ticket/24203
https://code.djangoproject.com/ticket/24363
(In reply to Ed Morley [:emorley] from comment #3)
> Comment on attachment 8823369 [details] [review]
> [treeherder] wlach:mysql_gotchas > mozilla:master
> 
> Thank you for doing this!
> 
> Some thoughts:
> 
> I still think we should be using the Django migrations system even if we
> don't use the auto-generated SQL. ie:
> ...

I personally don't have the inclination to validate whether or not this proposed procedure would work right now (I just have one more large-scale database change to apply, after which I will hopefully not have to touch it for a while), but I am open to someone else trying it for their change and updating the documentation accordingly. 

I agree it could have the advantages you mention, but I just can't stomach the thought of spending any more cycles on working in this area right now: almost inevitably these things turn out to be more complicated than they look at first glance.
It will work, it's already worked for:
https://github.com/mozilla/treeherder/blob/master/treeherder/model/migrations/0015_auto_20160402_1345.py

I'm happy to take over this PR if needed, I'd just like to avoid adding suboptimal practices to our docs, and I think we should also r- any future PR that intends to run manual SQL outside of RunSQL().
(In reply to Ed Morley [:emorley] from comment #6)
> It will work, it's already worked for:
> https://github.com/mozilla/treeherder/blob/master/treeherder/model/
> migrations/0015_auto_20160402_1345.py

I believe that RunSQL works. But I wouldn't trust any documented procedure for actually changing the *structure* of the larger tables (adding an index doesn't count) until it's actually been tested.
(In reply to William Lachance (:wlach) from comment #7)
> (In reply to Ed Morley [:emorley] from comment #6)
> > It will work, it's already worked for:
> > https://github.com/mozilla/treeherder/blob/master/treeherder/model/
> > migrations/0015_auto_20160402_1345.py
> 
> I believe that RunSQL works. But I wouldn't trust any documented procedure
> for actually changing the *structure* of the larger tables (adding an index
> doesn't count) until it's actually been tested.

(but if you want to do that, and take over this bug, then please go ahead)
I agree with emorley that seems like a better approach; I can try it with my unlanded migrations.
Comment on attachment 8823369 [details] [review]
[treeherder] wlach:mysql_gotchas > mozilla:master

Cancelling review request for now if you want to experiment with the migrations stuff before landing this.
Flags: needinfo?(james)
Attachment #8823369 - Flags: review?(james)
Flags: needinfo?(james)
I think this is pretty close, but someone else will need to finish it off as I'm not sure how to describe the preferred procedure w/ django migrations.
Assignee: wlachance → nobody
You need to log in before you can comment on or make changes to this bug.