Closed Bug 503523 Opened 15 years ago Closed 15 years ago

UTF8 character set isn't used when connecting to MySQL

Categories

(addons.mozilla.org Graveyard :: Public Pages, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: clouserw, Assigned: davedash)

Attachments

(1 file, 1 obsolete file)

Bug 503502 found that we aren't setting the correct character set when we connect to MySQL.  By default MySQL is latin1 (that's probably what we're doing) and we should be running "SET NAMES 'utf8'" when our connection fires up.  An example:

mysql> show variables like 'character_set_%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     | 
| character_set_connection | latin1                     | 
| character_set_database   | latin1                     | 
| character_set_filesystem | binary                     | 
| character_set_results    | latin1                     | 
| character_set_server     | latin1                     | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+

mysql> select tags from text_search_summary where match(tags) against('海');
+----------+
| tags     |
+----------+
| á,v,海 | 
+----------+
1 row in set (0.00 sec)

mysql> SET NAMES 'UTF8';
Query OK, 0 rows affected (0.00 sec)

mysql> select tags from text_search_summary where match(tags) against('海');
Empty set (0.00 sec)

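The second query returns nothing because our ft_min_word_len is 2. Under latin1, the three UTF-8 bytes of 海 are counted as three characters, so the word is long enough to be matched. Once the connection uses utf8, 海 is correctly treated as a single character, which is shorter than ft_min_word_len, so full-text search ignores it.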
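
The length mismatch can be reproduced outside MySQL. A minimal Python sketch, assuming Python's latin-1 codec as a stand-in for MySQL's latin1 (the two differ for a few byte values, none of which occur here):

```python
# The UTF-8 encoding of 海 (U+6D77) is three bytes.
raw = "海".encode("utf-8")

# Read through a single-byte charset, each byte counts as one character.
as_latin1 = raw.decode("latin-1")

# Read as UTF-8, the same bytes form a single character.
as_utf8 = raw.decode("utf-8")

print(len(raw), len(as_latin1), len(as_utf8))  # prints: 3 3 1
```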

Before this bug gets fixed, we need serious testing and careful consideration of how the change affects both existing data and future data. Completely untested, but Sergey's last comment on http://bugs.mysql.com/bug.php?id=28581 could help migrate existing data.
mysql> SELECT count(localized_string) FROM translations WHERE char_length(localized_string) <> length(localized_string)
    -> ;
+-------------------------+
| count(localized_string) |
+-------------------------+
|                   36208 | 
+-------------------------+
1 row in set (3.31 sec)

SELECT tag_text FROM tags WHERE char_length(tag_text) <> length(tag_text)
can get us the affected tags.
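
For reference, char_length() counts characters while length() counts bytes, so the two differ only for values that contain multi-byte characters. A rough Python equivalent of that filter, using made-up sample values (the real rows would come from the tags table):

```python
def has_multibyte(text: str) -> bool:
    """Return True when the UTF-8 byte length differs from the character count."""
    return len(text) != len(text.encode("utf-8"))


# Hypothetical sample tags, not taken from the real table.
sample_tags = ["firefox", "海", "café", "rss"]

# Keep only the values that a char_length() <> length() filter would select.
affected = [tag for tag in sample_tags if has_multibyte(tag)]
print(affected)
```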
Assignee: nobody → dd
Attached patch: utf8 awesome (obsolete)
This patch does the following:
* Forces all Cake connections to use the utf8 charset
* Adds a migration script (utf8.sql, which will be renamed on commit to include a commit number)
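
As an illustration of the first point only (the patch itself targets Cake, and this is not its code), a connection hook in Python might look like the sketch below. It assumes a DB-API 2.0 style connection object, and force_utf8 is a hypothetical helper name.

```python
def force_utf8(conn):
    """Run SET NAMES on a freshly opened DB-API style connection.

    `conn` is assumed to expose cursor(), as DB-API 2.0 drivers do.
    """
    cur = conn.cursor()
    try:
        # Sets the client, connection, and results character sets together.
        cur.execute("SET NAMES 'utf8'")
    finally:
        cur.close()
    return conn
```

Calling this once right after connecting, before any other query, keeps the client, connection, and results character sets in agreement.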
Attachment #406345 - Flags: review?(clouserw)
Attached patch: utf8 awesome!
This covers the rest of the tables. Anything missed can be handled later. We should make sure the data is backed up and that we log when we run this script, just to cover our bases.

Whether this lands in 5.2 depends on QA. QA will need fairly thorough coverage of the site, checking every text string that comes from the DB for weird entities.
Attachment #406345 - Attachment is obsolete: true
Attachment #406381 - Flags: review?(clouserw)
Attachment #406345 - Flags: review?(clouserw)
Attachment #406381 - Flags: review?(clouserw) → review+
QA will take this patch for 5.2, and we'll run our Selenium test cases (search.html, search2.html, searchapi.html), plus our Litmus test suite and ad-hoc testing.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Summary: UTF character set isn't used when connecting to MySQL → UTF8 character set isn't used when connecting to MySQL
Verified FIXED; we ran the above, and didn't notice anything amiss.
Status: RESOLVED → VERIFIED
Blocks: 525827
Product: addons.mozilla.org → addons.mozilla.org Graveyard