Closed Bug 1293718 Opened 8 years ago Closed 8 years ago

Synchronize production schema with Django's view of schema.

Categories

(developer.mozilla.org Graveyard :: General, defect)

All
Other
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jwhitlock, Assigned: jwhitlock)

References

Details

(Keywords: in-triage, Whiteboard: [specification][type:bug])

Attachments

(7 files)

What did you do? ================ 1. Create a fresh local database 2. Run mysqldump --no-data to get the database schema 3. Do the same against production 4. Compare the results What happened? ============== There are indexes, including unique indexes, in the fresh local database that are not in the production database. There are small differences in the data columns between the two databases. What should have happened? ========================== The indexes may have different names, but they should be present in both databases. Columns should be identical. Is there anything else we should know? ====================================== The effect will be subtle difference in behavior between a freshly created database (like during tests) and the production database, leading to annoying, unreproducible bugs. Differences are determined by visually examining a diff of a fresh development schema versus the production schema, both generated with "mysqldump --no-data". Schema differences in indexes (not counting index names): account_emailaddress.user_id - Prod has a KEY index on user_id, not in dev database auth_group_permissions - Prod has a KEY index on group_id, not in dev database. - Prod does not have dev's CONSTRAINT on permission_id to auth_permission.id - Prod does not have dev's CONSTRAINT on group_id to auth_group.id auth_permission - Prod has a KEY index on content_type_id not in dev - Prod does not have dev's CONSTRAINT on content_type_id to django_content_type.id auth_user_groups - Prod has a KEY index on group_id not in dev - Prod does not have dev's CONSTRAINT on group_id to auth_group.id - Prod does not have dev's CONSTRAINT on user_id to auth_user.id auth_user_user_permissions - Prod has a KEY index on permission_id not in dev - Prod does not have dev's CONSTRAINT on permission_id to auth_permission.id - Prod does not have dev's CONSTRAINT on user_id to auth_user.id celery_taskmeta - Prod does not have KEY index on hidden like dev celery_tasksetmeta - Prod does not have KEY index on hidden like dev constance_config - Prod has an (extra) UNIQUE KEY on key(100) django_admin_log - Prod lacks dev's CONSTRAINT of content_type_id to django_content_type.id - Prod lacks dev's CONSTRAINT of user_id to auth_user.id feeder_bundle_feeds - Prod has a KEY index on bundle_id - Prod does not have dev's CONSTRAINT on bundle_id to feeder_bundle.id - Prod does not have dev's CONSTRAINT on feed_id to feeder_feed.id feeder_entry - Prod does not have dev's CONSTRAINT on feeder_feed.id socialaccount_socialapp_sites - Prod has KEY index on socialapp_id, not on dev socialaccount_socialtoken - Prod has KEY index on app_id, not on dev taggit_tag - Prod does not have UNIQUE KEY index on name like dev taggit_taggeditem - Prod has KEY index on content_type_id, not on dev tidings_watch - Prod does not have dev's KEY index on event_type - Prod does not have dev's KEY index on object_id - Prod does not have dev's KEY index on email - Prod does not have dev's KEY index on is_active waffle_flag_groups - Prod has KEY index on flag_id - Prod does not have CONSTRAINT on flag_id to waffle_flag.id - Prod does not have CONSTRAINT on group_id to auth_group.id waffle_flag_users - Prod has KEY index on flag_id - Prod does not have CONSTRAINT on flag_id to waffle_flag.id - Prod does not have CONSTRAINT on group_id to auth_group.id wiki_document - Prod has KEY index on (locale, id). dev has KEY index on locale - Prod does not have CONSTRAINT on current_revision_id to wiki_revision.id - Prod does not have CONSTRAINT on parent_topic_id to wiki_document.id - Prod does not have CONSTRAINT on parent_id to wiki_document.id wiki_documenttag - Prod does not have UNIQUE KEY on name wiki_documentzone - Prod has KEY index on document_id (as well as UNIQUE KEY index) - Prod does not have KEY index on url_root wiki_editortoolbar - Prod does not have CONSTRAINT on creator_id to auth_user.id wiki_localizationtag - Prod does not have UNIQUE KEY on name wiki_reviewtag - Prod does not have UNIQUE KEY on name wiki_reviewtaggedrevision - Prod does not have CONSTRAINT on content_object_id to wiki_revision.id - Prod does not have CONSTRAINT on tag_id to wiki_reviewtag.id wiki_revision - Prod does not have KEY index on document_id - Prod does not have CONSTRAINT on based_on_id to wiki_revision.id - Prod does not have CONSTRAINT on creator_id to auth_user.id - Prod does not have CONSTRAINT on document_id to wiki_document.id wiki_taggeddocument - Prod does not have CONSTRAINT on content_object_id to wiki_document.id - Prod does not have CONSTRAINT on tag_id to wiki_documenttag.id There are some schema differences in columns: attachments_attachment.current_revision_id: Prod allows NULL, dev does not djcelery_taskstate: Prod has DEFAULT NULL, dev has no default search_filter.shortcut: Prod has no default, dev has DEFAULT NULL search_filtergroup.slug: Prod has no default, dev has DEFAULT NULL tidings_watch.email: varchar(75) in prod, varchar(254) in dev wiki_document.render_max_age: dev has DEFAULT NULL, prod does not wiki_document.render_expires: dev has DEFAULT NULL, prod does not wiki_document.current_revision_id: prod has DEFAULT NULL, dev does not wiki_document.parent_id: prod has DEFAULT NULL, dev does not wiki_documentspamattempt.reviewed: prod has DEFAULT NULL, dev does not wiki_documentspamattempt.reviewer_id: prod has DEFAULT NULL, dev does not wiki_documentzone.url_root: dev has DEFAULT NULL, prod does not Production has many NOT NULL columns that take defaults: attachments_attachmentrevision.created: 2012-07-02 attachments_attachmentrevision.is_approved: '1' attachments_attachmentrevision.is_mindtouch_migration: '0' djcelery_crontabsschedule.minute: '*' djcelery_crontabsschedule.hour: '*' djcelery_crontabsschedule.day_of_week: '*' djcelery_periodictask.enabled: '1' djcelery_periodictask.total_run_count: '0' djcelery_periodictasks.ident: '1' djcelery_taskstate.hidden: '0' userban.date: '2013-05-28' userban.is_active: '1' waffle_flag.superusers: '1' waffle_flag.staff: '0' waffle_flag.authenticated: '0' waffle_flag.rollout: '0' waffle_switch.active: '0' wiki_document.is_template: '0' wiki_document.is_localizable: '0' wiki_document.locale: 'en-US' wiki_editortoolbar.default: '0' wiki_revision.created: '2011-12-21' wiki_revision.is_approved: '0' There are several columns where production has the type as datetime, and development as datetime(6). The difference [1] is that datetime(6) has a fractional second precision of 6 decimal places (microsecond precision), and omitting it gives a precision of no fractional seconds: account_emailconfirmation.created account_emailconfirmation.sent attachments_attachment.modified authkeys_key.created authkeys_keyaction.created celery_taskmeta.date_done celery_tasksetmeta.date_done core_ipban.created core_ipban.deleted django_admin_log.action_time django_migratons.applied django_session.expire_date djcelery_periodictask.expires djcelery_periodictask.last_run_at djcelery_periodictask.date_changed djcelery_periodictasks.last_update djcelery_taskstate.tstamp djcelery_taskstate.eta djcelery_taskstate.expires djcelery_workerstate.last_heartbeat feeder_entry.last_published feeder_entry.created feeder_entry.updated feeder_feed.last_modified feeder_feed.created feeder_feed.updated search_index.created_at search_outdatedobject.created_at socialaccount_socialaccount.last_login socialaccount_socialaccount.date_joined socialaccount_socialtoken.expires_at waffle_flag.created waffle_flag.modified waffle_sample.created waffle_sample.modified waffle_switch.created waffle_switch.modified wiki_document.render_scheduled_at wiki_document.render_started_at wiki_document.last_rendered_at wiki_document.render_expires wiki_document.modified wiki_documentdeletionlog.timestamp wiki_documentspamattempt.created wiki_documentspamattempt.reviewed wiki_revision.created So, why are there difference? Some of this is probably left over from the mindtouch migration, which probably involved some manual steps. Some of this may be bugs in South migrations. Some of this probably happened during the mega-upgrade for Django 1.7 (PR 3073 [1]), which included switching from South for migrations to Django 1.7 native migrations. Some of the indexes were added in 3rd party packages like django-taggit, and appeared in the Django 1.7 initial migration, without actually being applied to the production database. Some of this may have been manual attempts to fix production issues that were not reflected in code. There's enough issues that it should probably be fixed with some hand-written migrations targeted at production. [1] https://dev.mysql.com/doc/refman/5.7/en/fractional-seconds.html [2] https://github.com/mozilla/kuma/pull/3073
The production database schema
A fresh developer database schema. Change Django settings for a fresh database, run SQL "CREATE DATABASE <dbname> CHARACTER SET utf8;", and then run "./manage.py migrate" to create the schema.
Depends on: 1293749
Another source of differences: MyISAM doesn't support FOREIGN KEY and REFERENCES clauses (foreign key constraints), so these would have been omitted if the tables were initially MyISAM and later converted to InnoDB. MySQL allows "upgrading" column types, such as datetime to datetime(6). Django does not do this automatically, it must be done manually: https://docs.djangoproject.com/en/1.10/ref/databases/#fractional-seconds-support-for-time-and-datetime-fields The production database is MySQL 5.6, which does not support renaming keys / indexes. On this engine, a ADD / DROP (or DROP / ADD) must be used to rename. I'm working on a SQL file for a one-time fix of 90% of these issues. Others, like adding UNIQUE INDEX for taggit names, will require data cleanup as well as SQL.
Assignee: nobody → jwhitlock
Query the database to find records that violate foreign key constraints, and select * the records found in the production database.
Delete the records that violate foreign key constraints (or would, if they were present)
Update the schema for production, including: 1) Dropping defaults no longer used 2) Upgrading datetime to datetime(6) (date + time + microseconds) 3) Rearranging columns to match the order in a fresh database 4) Adding "NOT NULL" where missing 5) Renaming indexes to match a fresh database 6) Adding missing indexes and foreign key constraints 7) Other schema changes to match production to a fresh database constructed by Django. This only include schema changes that took less than 10 seconds to execute locally. It does not include adding a unique index to the tagging name fields.
Same as fix03_*.sql, but works on the wiki_document and wiki_revision tables, which take several minutes to update.
Attached file test_schema.sh
A test script used to: 1) Load a production database dump into a test database 2) Run fix01_show_constraint_violations.sql to find constraint violations 3) Run fix02_delete_constraint_violations.sql to delete constraint violations 4) Run fix01_show_constraint_violations.sql again to show no remaining violations 5) Run fix03_schema_fast_tables.sql to execute the quick schema changes 6) Run fix04_schema_slow_tables.sql to execute the slow schema changes This is not what should be run against the production database, but does show how to run the SQL scripts and capture the output for debugging.
I've created and uploaded some SQL scripts that deal with most of the schema differences. They are also on GitHub: https://gist.github.com/jwhitlock/ae48e7ca4ad418c5549e778c41cfe9bd * fix01_show_constraint_violations.sql queries for records that would violate foreign key constraints, due to manually deleted data * fix02_delete_constraint_violations.sql deletes the violators * fix03_schema_fast_tables.sql updates the schema, and the queries run in less than 10 seconds each locally * fix04_schema_slow_tables.sql finishes updating the schema, but the queries can take minutes (15 - 25 minutes total) It is relatively safe to run fix02 and fix03 against the live database. They may lock the table during the update, but it should execute quickly enough for a user to only see a delay in their request. The second query will cause many, if not all, requests to time out, and should be done during scheduled maintenance. This does not address the missing unique indexes, which will require more intense data cleanup to remove duplicates.
Depends on: 1299227
After the schema updates in bug 1299227, the only significant schema differences are the unique indexes on the name columns of tag tables. These four are tracked in bug 1293749.
Verified that production schema is now the same as the development schema, with the exception of tidings_watch.email, which has the (correct) type of varchar(254) in production, and the (incorrect) type of varchar(75) in development.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: developer.mozilla.org → developer.mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: