Closed Bug 703286 Opened 13 years ago Closed 13 years ago

Set the default charset on all MySQL servers to UTF-8

Categories

(Data & BI Services Team :: DB: MySQL, task)

All
Other
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wenzel, Assigned: scabral)

Details

For context, take a look at:
https://bugzilla.mozilla.org/show_bug.cgi?id=702291#c10

We've had this problem on virtually every project we have ever done: by default, MySQL creates databases with the latin1 charset and a Swedish collation (latin1_swedish_ci).

Please set the MySQL server settings to UTF-8 as described under "Specify character settings at server startup" here:
http://dev.mysql.com/doc/refman/5.1/en/charset-applications.html

This goes for all of webdev's database servers. UTF-8 should require explicit opt-out, not opt-in.
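
To illustrate why a latin1 default bites, here is a minimal sketch that uses Python's codecs as a stand-in for the database. This is an assumption for illustration only: MySQL's latin1 is not identical to Python's "latin-1" codec, and the server's exact behavior on unrepresentable characters depends on its configuration. The sample strings are made up.

```python
# Illustration only: Python codecs standing in for a latin1-default database.
text = "Grüße, 日本語"

# UTF-8 can represent every character in the string and round-trips cleanly.
utf8_bytes = text.encode("utf-8")
round_trip_ok = utf8_bytes.decode("utf-8") == text

# Python's latin-1 codec has no code points for the Japanese characters,
# so encoding the same string fails.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Reading UTF-8 bytes as if they were latin1 produces mojibake.
mojibake = "é".encode("utf-8").decode("latin-1")

print(round_trip_ok, latin1_ok, mojibake)  # True False Ã©
```

The last line is the classic symptom: text written as UTF-8 but interpreted as latin1 somewhere along the way.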

Thanks!
Should/can we specify a default collation? utf8_unicode_ci or utf8_general_ci? I admit I don't know much about why you'd pick one or the other.
The general rule of thumb is if you don't know for sure what collation you need, you should leave it at the default. In the case of UTF-8, that's utf8_general_ci.

I can't see it making a huge difference for us one way or the other, so my preference would be to leave it at the default. It basically affects how strings sort in ORDER BY queries and how they compare for equality, and I think nothing else.
(In reply to James Socol [:jsocol, :james] from comment #1)
> Should/can we specify a default collation? utf8_unicode_ci or
> utf8_general_ci? I admit I don't know much about why you'd pick one or the
> other.

According to [1], utf8_general_ci is just a faster version of utf8_unicode_ci, but slightly less correct when doing comparisons. Not exactly sure what that means, but most people seem to use utf8_general_ci.

[1] http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
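
As a rough idea of what "less correct" means: per the MySQL manual, utf8_general_ci compares one character at a time and does not support expansions, so "ß" compares equal to "s", while utf8_unicode_ci treats "ß" as equal to "ss". The sketch below is only a loose analogy using Python's own string methods, which are not MySQL collations; it shows the difference between a comparison that keeps "ß" as one character and one that expands it.

```python
# Loose analogy only: Python string methods, not MySQL collations.
a, b = "STRASSE", "straße"

# lower() leaves "ß" as a single character, so the strings differ.
simple_equal = a.lower() == b.lower()

# casefold() applies full case folding, which expands "ß" to "ss".
full_equal = a.casefold() == b.casefold()

print(simple_equal, full_equal)  # False True
```

For data that is mostly ASCII the two collations behave the same; the difference shows up with characters like "ß" that need such expansions.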
Assignee: server-ops → server-ops-database
Component: Server Operations → Server Operations: Database
Assignee: server-ops-database → mpressman
I have added this into the puppet configs:

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci

Is that it for the scope of this bug, or do we also want to have this bug be scoped for all current dbs?
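
For anyone who wants to check a config fragment for these two settings, here is a minimal sketch using Python's configparser. The function name and the EXPECTED table are hypothetical and not part of the puppet configs. It also assumes a simple my.cnf: real files may contain `!include` directives that configparser cannot parse, and MySQL accepts underscores in place of dashes in option names, which this sketch does not handle.

```python
import configparser

# The two settings from the [mysqld] fragment above.
EXPECTED = {
    "character-set-server": "utf8",
    "collation-server": "utf8_general_ci",
}


def missing_charset_settings(cnf_text):
    """Return the expected [mysqld] settings that are absent or different."""
    # allow_no_value lets bare options (no "=value") parse without error.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return dict(EXPECTED)
    section = parser["mysqld"]
    return {key: value for key, value in EXPECTED.items()
            if section.get(key) != value}


fragment = """
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
"""

print(missing_charset_settings(fragment))  # {}
```

An empty result means both settings are present with the expected values. Note that these server defaults only apply to databases created afterwards; existing databases keep the charset they were created with.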
Assignee: mpressman → scabral
(In reply to Sheeri Cabral [:sheeri] from comment #4)
> I have added this into the puppet configs:
> 
> [mysqld]
> character-set-server=utf8
> collation-server=utf8_general_ci

Great, thanks Sheeri!

> Is that it for the scope of this bug, or do we also want to have this bug be
> scoped for all current dbs?

I would, of course, like that, but it doesn't seem feasible without thoroughly testing each of those apps to make sure nothing breaks and no Unicode characters get garbled. So, for the sake of our sanity, let's not increase the scope; anyone who cares about a current product can file a bug if they run into problems with its charset.
OK, great, I'm marking this fixed then.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Data & BI Services Team