Closed Bug 1303767 Opened 8 years ago Closed 7 years ago

Use consistent MySQL collation/charset across all environments (utf8_bin)

Categories

(Tree Management :: Treeherder, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

Details

Attachments

(1 file)

The collation/charset varies between environments (see bug 1303763), so we should try and make them consistent.

Django doesn't let us set a collation:
https://docs.djangoproject.com/en/1.8/ref/databases/#collation-settings

...so the best solution instead is presumably to use a server default collation.

I think a default is set for SCL3, but not Vagrant/Heroku.

Cameron - I believe you've looked into this in the past, could you suggest the best way forwards, looking at the diffs in the parent bug? Some of the differences may in fact be from the last round of experimentation with charset/collation changes. Thanks!
Flags: needinfo?(cdawson)
Actually, some of the collation variations are on the per-project tables.

Django's lack of support is therefore not a problem for those - we should just add collation to the in-repo SQL.

Could you take care of that? :-)
Chatted on vidyo.  We will discuss more in the Treeherder meeting tomorrow.
Flags: needinfo?(cdawson)
Diff between Heroku stage -> SCL3 prod, found using 'SELECT @@character_set_database, @@collation_database;':

--- charset-collation-heroku-stage.txt  2016-09-20 23:12:58.653561000 +0100
+++ charset-collation-scl3-prod.txt     2016-09-20 23:13:22.278010700 +0100
@@ -1,88 +1,87 @@
 accessibility:  'utf8', 'utf8_general_ci'
 addon-sdk:  'utf8', 'utf8_general_ci'
 alder:  'utf8', 'utf8_general_ci'
 ash:  'utf8', 'utf8_general_ci'
 autoland:  'utf8', 'utf8_bin'
-autopush:  'latin1', 'latin1_swedish_ci'
+autopush:  'utf8', 'utf8_bin'
 b2g-inbound:  'utf8', 'utf8_general_ci'
 b2g-ota:  'utf8', 'utf8_bin'
 birch:  'utf8', 'utf8_general_ci'
 bmo-development:  'utf8', 'utf8_general_ci'
 bmo-master:  'utf8', 'utf8_general_ci'
 bmo-upstream-merge:  'utf8', 'utf8_bin'
 bmo:  'utf8', 'utf8_general_ci'
 bugzilla-4_2:  'utf8', 'utf8_general_ci'
 bugzilla-4_4:  'utf8', 'utf8_general_ci'
 bugzilla-5_0:  'utf8', 'utf8_general_ci'
 bugzilla-master:  'utf8', 'utf8_general_ci'
 bugzilla:  'utf8', 'utf8_general_ci'
 build-system:  'utf8', 'utf8_general_ci'
-camdtest:  'latin1', 'latin1_swedish_ci'
 cedar:  'utf8', 'utf8_general_ci'
 comm-aurora:  'utf8', 'utf8_general_ci'
 comm-beta:  'utf8', 'utf8_general_ci'
 comm-central:  'utf8', 'utf8_general_ci'
 comm-esr24:  'utf8', 'utf8_general_ci'
 comm-esr31:  'utf8', 'utf8_general_ci'
 comm-esr38:  'utf8', 'utf8_general_ci'
 comm-esr45:  'utf8', 'utf8_bin'
 cypress:  'utf8', 'utf8_general_ci'
 date:  'utf8', 'utf8_general_ci'
 elm:  'utf8', 'utf8_general_ci'
-example-addon:  'latin1', 'latin1_swedish_ci'
+example-addon:  'utf8', 'utf8_bin'
 fig:  'utf8', 'utf8_general_ci'
 fx-team:  'utf8', 'utf8_general_ci'
 gaia-master:  'utf8', 'utf8_general_ci'
 gaia-try:  'utf8', 'utf8_general_ci'
 gaia-v1_4:  'utf8', 'utf8_general_ci'
 gaia:  'utf8', 'utf8_general_ci'
 graphics:  'utf8', 'utf8_general_ci'
 gum:  'utf8', 'utf8_general_ci'
 holly:  'utf8', 'utf8_general_ci'
 jamun:  'utf8', 'utf8_general_ci'
 larch:  'utf8', 'utf8_general_ci'
 maple:  'utf8', 'utf8_general_ci'
 mozilla-aurora:  'utf8', 'utf8_general_ci'
 mozilla-b2g18:  'utf8', 'utf8_general_ci'
 mozilla-b2g18_v1_1_0_hd:  'utf8', 'utf8_general_ci'
 mozilla-b2g26_v1_2:  'utf8', 'utf8_general_ci'
 mozilla-b2g28_v1_3:  'utf8', 'utf8_general_ci'
 mozilla-b2g28_v1_3t:  'utf8', 'utf8_general_ci'
 mozilla-b2g30_v1_4:  'utf8', 'utf8_general_ci'
 mozilla-b2g32_v2_0:  'utf8', 'utf8_general_ci'
 mozilla-b2g34_v2_1:  'utf8', 'utf8_general_ci'
 mozilla-b2g34_v2_1s:  'utf8', 'utf8_general_ci'
 mozilla-b2g37_v2_2:  'utf8', 'utf8_general_ci'
 mozilla-b2g37_v2_2r:  'utf8', 'utf8_general_ci'
 mozilla-b2g44_v2_5:  'utf8', 'utf8_bin'
 mozilla-beta:  'utf8', 'utf8_general_ci'
 mozilla-central:  'utf8', 'utf8_general_ci'
 mozilla-esr17:  'utf8', 'utf8_general_ci'
 mozilla-esr24:  'utf8', 'utf8_general_ci'
 mozilla-esr31:  'utf8', 'utf8_general_ci'
 mozilla-esr38:  'utf8', 'utf8_general_ci'
 mozilla-esr45:  'utf8', 'utf8_bin'
 mozilla-inbound:  'utf8', 'utf8_general_ci'
 mozilla-release:  'utf8', 'utf8_general_ci'
-nss-try:  'latin1', 'latin1_swedish_ci'
-nss:  'latin1', 'latin1_swedish_ci'
+nss-try:  'utf8', 'utf8_bin'
+nss:  'utf8', 'utf8_bin'
 oak:  'utf8', 'utf8_general_ci'
 pine:  'utf8', 'utf8_general_ci'
 qa-try:  'utf8', 'utf8_general_ci'
 services-central:  'utf8', 'utf8_general_ci'
 servo:  'utf8', 'utf8_bin'
 staging-gaia-try:  'utf8', 'utf8_general_ci'
-stylo-try:  'latin1', 'latin1_swedish_ci'
-stylo:  'latin1', 'latin1_swedish_ci'
+stylo-try:  'utf8', 'utf8_bin'
+stylo:  'utf8', 'utf8_bin'
 taskcluster-integration:  'utf8', 'utf8_general_ci'
 thunderbird-aurora:  'utf8', 'utf8_general_ci'
 thunderbird-beta:  'utf8', 'utf8_general_ci'
 thunderbird-esr24:  'utf8', 'utf8_general_ci'
 thunderbird-trunk:  'utf8', 'utf8_general_ci'
 thunderbird-try:  'utf8', 'utf8_general_ci'
 try-comm-central:  'utf8', 'utf8_general_ci'
 try-taskcluster:  'utf8', 'utf8_general_ci'
 try:  'utf8', 'utf8_general_ci'
 unknown:  'utf8', 'utf8_general_ci'
 ux:  'utf8', 'utf8_general_ci'
 version-control-tools:  'utf8', 'utf8_bin'
For prior art, see bug 1194221 / bug 1201087.

Mauro: was the decision to change the plan from utf8_general_ci to utf8_bin (bug 1194221 comment 9) just because switching to the former was giving unique constraint errors (due to existing data)?

Everyone: the old data aside, if we had free choice, should we use utf8_general_ci or utf8_bin? The former is case-insensitive for searching, the latter uses more space plus means strings are returned as bytestrings not unicode strings. For more info, see:
https://docs.djangoproject.com/en/1.8/ref/databases/#collation-settings
http://dev.mysql.com/doc/refman/5.6/en/charset.html

Note that MySQL's default (and thus what we're seeing for new tables on Heroku/RDS, where a server default hasn't been set) is latin1 / latin1_swedish_ci, so whatever we do, we need to set a default one way or another (and fix up the inconsistent DBs).
Depends on: 1194221
Flags: needinfo?(mdoglio)
:emorley correct, the reference data table has data that would break unique key constraints using ut8_general_ci. At least that was true at the time I worked on that bug.
Flags: needinfo?(mdoglio)
Depends on: 1306926
Depends on: 1313236
Let's just use utf8_bin rather than utf8_general_ci, since:
* Most of the tables already use it
* Some tables will contain dupes if we try to switch in the other direction, which will be a pain to deal with

Tables currently using utf8_general_ci, that will need switching to utf_8:
treeherder.auth_group
treeherder.auth_group_permissions
treeherder.auth_permission
treeherder.auth_user
treeherder.auth_user_groups
treeherder.auth_user_user_permissions
treeherder.django_admin_log
treeherder.django_content_type
treeherder.django_migrations
treeherder.django_session

(eg via "SELECT table_name FROM information_schema.TABLES where table_schema = 'treeherder' and table_collation != 'utf8_bin';")

The SQL in this attachment both changes the table default collation, and migrates existing columns to it.

None of those are large tables, so converting won't take long.
Assignee: nobody → emorley
Status: NEW → ASSIGNED
Attachment #8832955 - Flags: review?(james)
Summary: Use same collation/charset on Vagrant/SCL3/Heroku → Use consistent MySQL collation/charset across all environments (utf8_bin)
Attachment #8832955 - Flags: review?(james) → review+
SQL run against dev/stage/prod.

The server default, the `treeherder` schema default, all table defaults and all existing columns, are now set to charset `utf8` and collation `utf8_bin`.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: