Closed
Bug 1303767
Opened 8 years ago
Closed 8 years ago
Use consistent MySQL collation/charset across all environments (utf8_bin)
Categories
(Tree Management :: Treeherder, defect, P1)
Tree Management
Treeherder
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: emorley)
References
Details
Attachments
(1 file)
The collation/charset varies between environments (see bug 1303763), so we should try and make them consistent.
Django doesn't let us set a collation:
https://docs.djangoproject.com/en/1.8/ref/databases/#collation-settings
...so the best solution instead is presumably to use a server default collation.
I think a default is set for SCL3, but not Vagrant/Heroku.
Cameron - I believe you've looked into this in the past, could you suggest the best way forwards, looking at the diffs in the parent bug? Some of the differences may in fact be from the last round of experimentation with charset/collation changes. Thanks!
Flags: needinfo?(cdawson)
Assignee | ||
Comment 1•8 years ago
|
||
Actually, some of the collation variations are on the per-project tables.
Django's lack of support is therefore not a problem for those - we should just add collation to the in-repo SQL.
Could you take care of that? :-)
Comment 2•8 years ago
|
||
Chatted on vidyo. We will discuss more in the Treeherder meeting tomorrow.
Flags: needinfo?(cdawson)
Assignee | ||
Comment 3•8 years ago
|
||
Diff between Heroku stage -> SCL3 prod, found using 'SELECT @@character_set_database, @@collation_database;':
--- charset-collation-heroku-stage.txt 2016-09-20 23:12:58.653561000 +0100
+++ charset-collation-scl3-prod.txt 2016-09-20 23:13:22.278010700 +0100
@@ -1,88 +1,87 @@
accessibility: 'utf8', 'utf8_general_ci'
addon-sdk: 'utf8', 'utf8_general_ci'
alder: 'utf8', 'utf8_general_ci'
ash: 'utf8', 'utf8_general_ci'
autoland: 'utf8', 'utf8_bin'
-autopush: 'latin1', 'latin1_swedish_ci'
+autopush: 'utf8', 'utf8_bin'
b2g-inbound: 'utf8', 'utf8_general_ci'
b2g-ota: 'utf8', 'utf8_bin'
birch: 'utf8', 'utf8_general_ci'
bmo-development: 'utf8', 'utf8_general_ci'
bmo-master: 'utf8', 'utf8_general_ci'
bmo-upstream-merge: 'utf8', 'utf8_bin'
bmo: 'utf8', 'utf8_general_ci'
bugzilla-4_2: 'utf8', 'utf8_general_ci'
bugzilla-4_4: 'utf8', 'utf8_general_ci'
bugzilla-5_0: 'utf8', 'utf8_general_ci'
bugzilla-master: 'utf8', 'utf8_general_ci'
bugzilla: 'utf8', 'utf8_general_ci'
build-system: 'utf8', 'utf8_general_ci'
-camdtest: 'latin1', 'latin1_swedish_ci'
cedar: 'utf8', 'utf8_general_ci'
comm-aurora: 'utf8', 'utf8_general_ci'
comm-beta: 'utf8', 'utf8_general_ci'
comm-central: 'utf8', 'utf8_general_ci'
comm-esr24: 'utf8', 'utf8_general_ci'
comm-esr31: 'utf8', 'utf8_general_ci'
comm-esr38: 'utf8', 'utf8_general_ci'
comm-esr45: 'utf8', 'utf8_bin'
cypress: 'utf8', 'utf8_general_ci'
date: 'utf8', 'utf8_general_ci'
elm: 'utf8', 'utf8_general_ci'
-example-addon: 'latin1', 'latin1_swedish_ci'
+example-addon: 'utf8', 'utf8_bin'
fig: 'utf8', 'utf8_general_ci'
fx-team: 'utf8', 'utf8_general_ci'
gaia-master: 'utf8', 'utf8_general_ci'
gaia-try: 'utf8', 'utf8_general_ci'
gaia-v1_4: 'utf8', 'utf8_general_ci'
gaia: 'utf8', 'utf8_general_ci'
graphics: 'utf8', 'utf8_general_ci'
gum: 'utf8', 'utf8_general_ci'
holly: 'utf8', 'utf8_general_ci'
jamun: 'utf8', 'utf8_general_ci'
larch: 'utf8', 'utf8_general_ci'
maple: 'utf8', 'utf8_general_ci'
mozilla-aurora: 'utf8', 'utf8_general_ci'
mozilla-b2g18: 'utf8', 'utf8_general_ci'
mozilla-b2g18_v1_1_0_hd: 'utf8', 'utf8_general_ci'
mozilla-b2g26_v1_2: 'utf8', 'utf8_general_ci'
mozilla-b2g28_v1_3: 'utf8', 'utf8_general_ci'
mozilla-b2g28_v1_3t: 'utf8', 'utf8_general_ci'
mozilla-b2g30_v1_4: 'utf8', 'utf8_general_ci'
mozilla-b2g32_v2_0: 'utf8', 'utf8_general_ci'
mozilla-b2g34_v2_1: 'utf8', 'utf8_general_ci'
mozilla-b2g34_v2_1s: 'utf8', 'utf8_general_ci'
mozilla-b2g37_v2_2: 'utf8', 'utf8_general_ci'
mozilla-b2g37_v2_2r: 'utf8', 'utf8_general_ci'
mozilla-b2g44_v2_5: 'utf8', 'utf8_bin'
mozilla-beta: 'utf8', 'utf8_general_ci'
mozilla-central: 'utf8', 'utf8_general_ci'
mozilla-esr17: 'utf8', 'utf8_general_ci'
mozilla-esr24: 'utf8', 'utf8_general_ci'
mozilla-esr31: 'utf8', 'utf8_general_ci'
mozilla-esr38: 'utf8', 'utf8_general_ci'
mozilla-esr45: 'utf8', 'utf8_bin'
mozilla-inbound: 'utf8', 'utf8_general_ci'
mozilla-release: 'utf8', 'utf8_general_ci'
-nss-try: 'latin1', 'latin1_swedish_ci'
-nss: 'latin1', 'latin1_swedish_ci'
+nss-try: 'utf8', 'utf8_bin'
+nss: 'utf8', 'utf8_bin'
oak: 'utf8', 'utf8_general_ci'
pine: 'utf8', 'utf8_general_ci'
qa-try: 'utf8', 'utf8_general_ci'
services-central: 'utf8', 'utf8_general_ci'
servo: 'utf8', 'utf8_bin'
staging-gaia-try: 'utf8', 'utf8_general_ci'
-stylo-try: 'latin1', 'latin1_swedish_ci'
-stylo: 'latin1', 'latin1_swedish_ci'
+stylo-try: 'utf8', 'utf8_bin'
+stylo: 'utf8', 'utf8_bin'
taskcluster-integration: 'utf8', 'utf8_general_ci'
thunderbird-aurora: 'utf8', 'utf8_general_ci'
thunderbird-beta: 'utf8', 'utf8_general_ci'
thunderbird-esr24: 'utf8', 'utf8_general_ci'
thunderbird-trunk: 'utf8', 'utf8_general_ci'
thunderbird-try: 'utf8', 'utf8_general_ci'
try-comm-central: 'utf8', 'utf8_general_ci'
try-taskcluster: 'utf8', 'utf8_general_ci'
try: 'utf8', 'utf8_general_ci'
unknown: 'utf8', 'utf8_general_ci'
ux: 'utf8', 'utf8_general_ci'
version-control-tools: 'utf8', 'utf8_bin'
Assignee | ||
Comment 4•8 years ago
|
||
For prior art, see bug 1194221 / bug 1201087.
Mauro: was the decision to change the plan from utf8_general_ci to utf8_bin (bug 1194221 comment 9) just because switching to the former was giving unique constraint errors (due to existing data)?
Everyone: the old data aside, if we had free choice, should we use utf8_general_ci or utf8_bin? The former is case-insensitive for searching, the latter uses more space plus means strings are returned as bytestrings not unicode strings. For more info, see:
https://docs.djangoproject.com/en/1.8/ref/databases/#collation-settings
http://dev.mysql.com/doc/refman/5.6/en/charset.html
Note that MySQL's default (and thus what we're seeing for new tables on Heroku/RDS, where a server default hasn't been set) is latin1 / latin1_swedish_ci, so whatever we do, we need to set a default one way or another (and fix up the inconsistent DBs).
Depends on: 1194221
Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(mdoglio)
Comment 5•8 years ago
|
||
:emorley correct, the reference data table has data that would break unique key constraints using ut8_general_ci. At least that was true at the time I worked on that bug.
Flags: needinfo?(mdoglio)
Assignee | ||
Comment 6•8 years ago
|
||
Let's just use utf8_bin rather than utf8_general_ci, since:
* Most of the tables already use it
* Some tables will contain dupes if we try to switch in the other direction, which will be a pain to deal with
Tables currently using utf8_general_ci, that will need switching to utf_8:
treeherder.auth_group
treeherder.auth_group_permissions
treeherder.auth_permission
treeherder.auth_user
treeherder.auth_user_groups
treeherder.auth_user_user_permissions
treeherder.django_admin_log
treeherder.django_content_type
treeherder.django_migrations
treeherder.django_session
(eg via "SELECT table_name FROM information_schema.TABLES where table_schema = 'treeherder' and table_collation != 'utf8_bin';")
The SQL in this attachment both changes the table default collation, and migrates existing columns to it.
None of those are large tables, so converting won't take long.
Assignee | ||
Updated•8 years ago
|
Summary: Use same collation/charset on Vagrant/SCL3/Heroku → Use consistent MySQL collation/charset across all environments (utf8_bin)
Updated•8 years ago
|
Attachment #8832955 -
Flags: review?(james) → review+
Assignee | ||
Comment 7•8 years ago
|
||
SQL run against dev/stage/prod.
The server default, the `treeherder` schema default, all table defaults and all existing columns, are now set to charset `utf8` and collation `utf8_bin`.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•