Closed
Bug 1194221
Opened 10 years ago
Closed 9 years ago
Change the reference data varchar column collations to utf8_bin instead of utf8_ci
Categories
(Tree Management :: Treeherder, defect)
Tree Management
Treeherder
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mdoglio, Assigned: mdoglio)
References
Details
Attachments
(1 file)
9.28 KB,
text/plain
|
Details |
We don't need case sensitive filtering/sorting and using utf8_ci gives you better performance in query execution.
Assignee | ||
Comment 1•10 years ago
|
||
Also, the django orm doesn't support non-default collation type on varchar/text fields.
Assignee | ||
Comment 2•10 years ago
|
||
Given that the target tables are quite small we probably don't need to do anything fancy other than an alter table. :sheeri what do you think about it? The tables I'm talking about are those ones defined here https://github.com/mozilla/treeherder/blob/cc52bf7f35077ec529e35e0236948fc2a1f89c29/treeherder/model/sql/template_schema/treeherder_reference_1.sql.tmpl
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → mdoglio
Flags: needinfo?(scabral)
Comment 3•10 years ago
|
||
Ed Morley specifically used utf8_bin in https://bugzilla.mozilla.org/show_bug.cgi?id=1185030#c2
If the tables are small, does the performance difference matter? I ask b/c I assume in the bug Ed had good reason to change to a case-sensitive collation.
But yeah, we can change utf8_bin in the tables other than the datasource table (and anything else that uses tokens).
Flags: needinfo?(scabral)
Comment 4•10 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #3)
> Ed Morley specifically used utf8_bin in
> https://bugzilla.mozilla.org/show_bug.cgi?id=1185030#c2
That was not changing the collation, but setting a unique key, using the same collation as was there previously - so don't let it affect the decision here :-)
Assignee | ||
Comment 5•10 years ago
|
||
ok I'll go ahead and change the sql template files we have in our repository. I guess I can produce a sql script to apply the change on staging and I'll coordinate with sheeri to do the same on production if that goes well.
Comment 6•10 years ago
|
||
OK, sounds like a plan.
Assignee | ||
Comment 7•10 years ago
|
||
I tried to run this on staging. I put some notes in the file indicating the failures I got.
Assignee | ||
Comment 8•10 years ago
|
||
I'm putting this work on hold until we have a way to clean the reference data tables.
Assignee | ||
Comment 9•10 years ago
|
||
Since the final goal is to have a sql schema consistent with what the django orm would produce, I'm going to do it the other way around: utf8_bin everywhere! It should be sufficient to:
1- change the default db collation to utf8_bin
2- change the collation of those few utf8_unicode_ci tables to utf8_bin
Comment 10•10 years ago
|
||
Is this something we will undo in the future?
Comment 11•9 years ago
|
||
(In reply to Mauro Doglio [:mdoglio] from comment #9)
> Since the final goal is to have a sql schema consistent with what the django
> orm would produce, I'm going to do it the other way around: utf8_bin
> everywhere! It should be sufficient to:
> 1- change the default db collation to utf8_bin
> 2- change the collation of those few utf8_unicode_ci tables to utf8_bin
Updating the summary to reflect that.
Summary: Change all the reference data varchar columns to use utf8_ci instead of utf8_bin → Change the reference data varchar column collations to utf8_bin instead of utf8_ci
Comment 12•9 years ago
|
||
So, the one note I see that specifically talk about an error is:
/*altering the reference_data_signatures table raise errors for unique key constraints*/
alter table `reference_data_signatures` modify `name` varchar(255) character set utf8 collate utf8_general_ci;
...
When moving from utf8_bin to utf8_general_ci, ci = case insensitive, so if one value is "foo" and another "FOO" that can cause this type of error.
Comment 11 shows that you want to change the case insensitive stuff to binary (which is case sensitive) so this shouldn't be a problem.
Let me know if there's anything else I can help with. default charset is set to utf8 but there's no collation set, which defaults to utf8_general_ci. Do you want me to change the default collation on treeherder_stage and treeherder? Default collation is what new tables created will get.
Assignee | ||
Comment 13•9 years ago
|
||
:sheeri yes, that would be great.
Comment 14•9 years ago
|
||
OK, changing server default in bug 1201087. If you want me to change db defaults, let me know.
Assignee | ||
Updated•9 years ago
|
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•