Closed
Bug 702291
Opened 14 years ago
Closed 14 years ago
Collation error on Basket
Categories
(Data & BI Services Team :: DB: MySQL, task, P1)
Data & BI Services Team
DB: MySQL
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: wenzel, Assigned: nmaul)
References
Details
Basket's database seems to contain some latin1-encoded fields, which should never happen[tm]. Here's a traceback we've been getting:
-----
OperationalError at /subscriptions/subscribe/
(1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")
/data/www/django/basket.mozilla.com/basket/apps/subscriptions/handlers.py in create
def delete(self, request):
return rc.NOT_IMPLEMENTED
def create(self, request):
try:
email = request.POST.get('email', '')
>>> s = Subscriber.objects.get(email=email) ...
except Subscriber.DoesNotExist:
try:
s = Subscriber(email=email)
s.full_clean()
s.save()
except ValidationError, e:
![]() |
||
Comment 1•14 years ago
|
||
Jake, did you make a new database when we launched the recent version of basket?
This is old code, so it's pointing to a database issue. Can you check that the database collation is utf8?
Reporter | ||
Comment 2•14 years ago
|
||
(In reply to James Long (:jlongster) from comment #1)
> Can you check that the database collation is utf8?
.. and field collation too. Somewhat strangely, both entire MySQL databases as well as the individual fields inside them have collations associated with them.
Comment 3•14 years ago
|
||
db: tm-c01-master01
mysql> use basket_mozilla_com;
Database changed
mysql> show variables like "collation_database";
+--------------------+-------------------+
| Variable_name | Value |
+--------------------+-------------------+
| collation_database | latin1_swedish_ci |
+--------------------+-------------------+
1 row in set (0.00 sec)
@mpressman: would we be able to change/alter the collation to utf8 easily?
![]() |
||
Comment 4•14 years ago
|
||
fwiw, the email on these errors contains Unicode, such as: teshikata2000\uff20yahoo.co.jp
![]() |
Assignee | |
Comment 6•14 years ago
|
||
I'm afraid to touch this myself.
A database has a default charset for new tables. This is what Jason posted in comment 3, I think.
A table has a default charset (for new rows? or rows that don't specify? not sure). This can be seen in "show create table <table>".
A column has a charset, which is the same as the table default unless specified otherwise. This is also shown in "show create table <table>", except if it's the same as the default then it's not specifically listed.
I'm not at all certain of what to do here: http://dev.mysql.com/doc/refman/5.1/en/alter-table.html
There are choices here, and it's not obvious to me how to proceed. Do we need to actually *convert* the data in the table to a different character set? I think probably not, but I'm not certain. Do we just need to tell MySQL that the existing data is already utf8, and not actually alter the data, just the column/table definitions? Maybe, but I don't know what "teshikata2000\uff20yahoo.co.jp" will end up looking like... Maybe it would still be "teshikata2000\uff20yahoo.co.jp", but in Unicode (that is, not actually translate the code in the middle).
If anyone has more/better info on this, please share. :)
![]() |
Assignee | |
Comment 7•14 years ago
|
||
Another note / clarification:
If you've been feeding the database UTF-8 data from the start (regardless of what MySQL thought it was), it should be possible to do something like this:
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
That is, convert the affected column to an intermediary binary type, then to utf8. If the affected column is TEXT, then use BLOB as the intermediary. If it's VARCHAR, then use VARBINARY. If it's CHAR, then use BINARY.
Once all the affected columns are changed, it would be a good idea to change the table default as well, so any new columns get made properly:
ALTER TABLE tbl_name DEFAULT CHARACTER SET utf8;
Once all the tables are done, it might be good to do the same for the DB as a whole, so new tables have the proper default:
ALTER DATABASE db_name DEFAULT CHARACTER SET utf8;
![]() |
||
Comment 8•14 years ago
|
||
According to this django bug: https://code.djangoproject.com/ticket/11950
"... Django always uses utf-8 to talk to the database."
I would assume that all data in the database is UTF-8 compatible. If we've ever entered non-latin-1 data, it should be utf-8 encoded. All other data in latin-1 is represented the same way in UTF-8, right?
So you're suggestion in comment 7 seems like the way to go. Without access to the database it's hard to verify it though. Can you do a quick select on the Subscriber table and see what the data looks like? I don't understand how there could be any UTF-8 data in there currently if the encoding is latin-1. Has this been a problem from the start or can you think of something that happened when we updated the code (did we recreate the database)?
Note that the error is happening when trying to *insert* new data, so we don't need to focus on recovering any current data that's not encoded correctly.
![]() |
||
Comment 9•14 years ago
|
||
Here's a snippet for converting mysql tables to utf-8:
http://michaelangela.wordpress.com/2008/05/07/snippet-to-update-django-tables-to-utf8/
Seems like the way to go.
Reporter | ||
Comment 10•14 years ago
|
||
It's mildly out of scope here, but if there's *any* way we can set up our database server configs so that by default, we never, ever, create a database that's not UTF-8, that would be very much appreciated. It's the umpteenth time we had to deal with this issue in production databases :( It is almost always caused by the person setting up the DB not picking any charset (I can't blame them), so MySQL shrugs and uses latin1-swedish. Broken-by-default is frustrating and stupid.
Matt Pressman, do you know if MySQL has such a default setting?
Comment 11•14 years ago
|
||
Seems that you can start up MySQL with such an option (the sanest place for config!):
http://dev.mysql.com/doc/refman/5.1/en/charset-applications.html
Reporter | ||
Comment 12•14 years ago
|
||
I filed bug 703286 to get the default settings fixed, so I stop scope-creeping this bug.
![]() |
||
Comment 13•14 years ago
|
||
Assigning this to IT because we need to run the commands in comment 9 (or something like them) to convert basket's database to UTF-8.
Assignee: jlong → server-ops-database
Component: Webdev → Server Operations: Database
QA Contact: webdev → cshields
![]() |
Assignee | |
Comment 14•14 years ago
|
||
I'm good with this... very nice to see an example specifically mentioning Django apps, so it feels like someone already did the homework. I'll do this in just a minute.
Assignee: server-ops-database → nmaul
![]() |
Assignee | |
Comment 15•14 years ago
|
||
This is completed.
[root@mradm02 basket]# python26 manage.py shell
Python 2.6.5 (r265:79063, Feb 28 2011, 21:55:45)
Type "copyright", "credits" or "license" for more information.
IPython 0.10 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.
In [1]: from django.db import connection
In [2]: cursor = connection.cursor()
In [3]: cursor.execute('SHOW TABLES')
Out[3]: 21L
In [4]: results=[]
In [5]: for row in cursor.fetchall(): results.append(row)
...:
In [6]: for row in results: cursor.execute('ALTER TABLE %s CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;' % (row[0]))
...:
Out[6]: 0L
Out[6]: 0L
Out[6]: 0L
Out[6]: 48L
Out[6]: 3L
Out[6]: 0L
Out[6]: 48L
Out[6]: 25L
Out[6]: 16L
Out[6]: 31L
Out[6]: 1L
Out[6]: 2L
Out[6]: 1890201L
Out[6]: 36L
Out[6]: 0L
Out[6]: 0L
Out[6]: 0L
Out[6]: 1L
Out[6]: 1530320L
Out[6]: 1219243L
Out[6]: 1L
In [7]: quit()
Do you really want to exit ([y]/n)? y
I this converts all the columns to utf8, then also changes the default for each of the tables to utf8. Afterwards I also did this:
mysql> alter database basket_mozilla_com default character set utf8;
Query OK, 1 row affected (0.00 sec)
mysql> show variables like "c%_database";
+------------------------+-----------------+
| Variable_name | Value |
+------------------------+-----------------+
| character_set_database | utf8 |
| collation_database | utf8_general_ci |
+------------------------+-----------------+
2 rows in set (0.00 sec)
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Data & BI Services Team
You need to log in
before you can comment on or make changes to this bug.
Description
•