Closed
Bug 525201
Opened 16 years ago
Closed 14 years ago
Tags Unicode Problem
Categories
(addons.mozilla.org Graveyard :: Localization, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
4.x (triaged)
People
(Reporter: barisderin, Assigned: davedash)
Details
(Whiteboard: [z])
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4
Tag strings on the add-on page are shown as raw UTF-8 encoded bytes, but they need to be decoded and shown as Unicode.
Reproducible: Always
Steps to Reproduce:
1. Visit https://addons.mozilla.org/en-US/firefox/addon/11448/
2. On the right of the page, below the Tags section, you will notice the UTF-8 encoded characters in the tags.
3. They need to be shown as Unicode instead.
Actual Results:
Tags contain UTF-8 encoded characters.
Expected Results:
Tags need to be shown in Unicode.
Comment 1•16 years ago
My guess is those strings are double-encoded somewhere along the line, but I don't think it's a bug on AMO. There are plenty of examples of UTF-8 strings working properly:
https://addons.mozilla.org/en-US/firefox/tag/搜索
https://addons.mozilla.org/en-US/firefox/tag/शब्दकोश
https://addons.mozilla.org/en-US/firefox/tag/فارسي
Are these tags that you added? If so, can you tell me what you were trying to add?
Reporter
Comment 2•16 years ago
Those tags were working fine about a week ago; they were in Unicode. The second and third tags were:
# Ekşi
# ekşi sözlük
but suddenly they were converted into the UTF-8 encoded style shown below:
# EkÅŸi
# ekşi sözlük
So the problem seems specific to these tags, but something on the server or in the database caused the change from Unicode to the encoded form. Just wanted to report it.
Comment 3•16 years ago
davedash: fallout from the db conversion?
Assignee
Comment 4•16 years ago
Interesting... I'll dig into this... the mysql changes we made would be a likely candidate for this not working.
Assignee
Comment 5•16 years ago
Wil - So I ran,
UPDATE tags SET tag_text = CONVERT(CONVERT(CONVERT(tag_text USING latin1) using binary) using utf8) WHERE LENGTH(tag_text) > CHAR_LENGTH(tag_text) AND id = 969;
ERROR 1062 (23000): Duplicate entry 'ekşi sözlük' for key 2
Which means we already have a tag where the encoding is correct.
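For readers unfamiliar with the CONVERT chain above, here is a minimal Python sketch of the same idea: text whose UTF-8 bytes were misread as Latin-1 is turned back into bytes and reread as UTF-8. The function names are hypothetical, and Python's "latin-1" codec (ISO-8859-1) is assumed to be a close enough stand-in for MySQL's latin1, which is not an exact match.

```python
def garble(text):
    """Simulate the damage: UTF-8 bytes misread as Latin-1 characters."""
    return text.encode("utf-8").decode("latin-1")


def looks_multibyte(text):
    """Rough mirror of LENGTH(tag_text) > CHAR_LENGTH(tag_text):
    true when the text needs more UTF-8 bytes than it has characters."""
    return len(text.encode("utf-8")) > len(text)


def repair(text):
    """Rough mirror of the CONVERT chain: back to Latin-1 bytes,
    then reread those bytes as UTF-8."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double-encoded (or not cleanly so); leave it untouched.
        return text


original = "ekşi sözlük"
broken = garble(original)   # the mojibake form stored in the table
fixed = repair(broken)      # round-trips back to the original text
```

Note that a repaired value can collide with a correctly encoded row that already exists, which is the situation the duplicate-entry error above reports.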
We've got 3 options:
* Remove all of these from the tags table (and from user_addons_tags)
* Fix them case by case as they come up, like in this case.
* Write a script to attempt to fix them, but I think that might be time-consuming
I'm leaning toward the second option, as this is something that anybody can quickly do if they see the tags on their addon - or if they tagged an addon, they can just remove it and retype the tag again.
Baris, feel free to do that now, and we can see if it's worth it for the other tags.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter
Comment 6•16 years ago
Well, I removed them and added the tags:
# ekşi sözlük
# ekşi
but they were converted to
# Eksi Sozluk
# Eksi
Has UTF-8 support been removed from tags?
Assignee
Comment 7•16 years ago
UTF-8 support is there. My suspicion is that it's matching an existing tag "eksi sozluk" in the database.
Our tagging system is relatively primitive, so here's what happens:
Someone somewhere tags an addon as:
YAHoö
In the future, anybody who tags something as:
yahoo
Yahoo
yAhoo
will get their tag written as "YAHoö".
Which is actually quite frustrating, since tag casing and diacritics such as umlauts are important to some.
Unfortunately, fixing that is a big task and involves some rearchitecting of the tags system... I'll file a bug regarding that.
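The first-writer-wins behaviour described in this comment can be illustrated with a small Python sketch. It is not AMO's code: `fold` and `TagRegistry` are hypothetical names, and stripping combining marks plus case folding is only an approximation of how a case- and accent-insensitive MySQL collation compares strings.

```python
import unicodedata


def fold(text):
    """Approximate an accent- and case-insensitive comparison key:
    decompose, drop combining marks, then fold case."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


class TagRegistry:
    """Hypothetical tag store where lookups use the folded key,
    so whichever spelling arrives first is the one everybody gets."""

    def __init__(self):
        self._by_key = {}

    def tag(self, text):
        # setdefault keeps the existing spelling if the key is already taken.
        return self._by_key.setdefault(fold(text), text)


registry = TagRegistry()
first = registry.tag("YAHoö")
later = [registry.tag(t) for t in ("yahoo", "Yahoo", "yAhoo")]
```

Under the same assumption, a registry that already holds "Eksi Sozluk" would hand that spelling back to someone typing "ekşi sözlük", which matches what the reporter saw.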
Assignee
Comment 8•16 years ago
This issue is tracked in bug 525271
Comment 9•16 years ago
Er, that shouldn't happen. That's a mysql problem.
SELECT * FROM tags WHERE tag_text = 'YAHoö'
is returning the row with 'yahoo' in it.
Assignee
Comment 10•16 years ago
That's expected:
http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html
You'd need to define the table differently for mysql to treat casing differently.
Comment 11•16 years ago
-> davedash for ideas. It'd be nice to figure out an answer in 5.5/5.6 at least, even if we don't get it fixed in that timeframe.
Assignee: nobody → dd
Priority: -- → P3
Target Milestone: --- → 5.5
Comment 13•15 years ago
Alright, the plan is to create a second column in the db to hold clean values (after any substitution or normalization, similar to the translations table).
In addition (and more to the heart of this problem) we'll need to ALTER the mysql table to use a binary collation so it doesn't do ridiculous things like the match in comment 9.
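As a rough illustration of that plan, the sketch below uses Python's built-in sqlite3 module rather than MySQL, so it shows the shape of the idea and not the actual migration. The `tag_clean` column, the `clean` function, and the helper names are all assumptions made for the example; SQLite's BINARY collation stands in for a binary MySQL column.

```python
import sqlite3
import unicodedata


def clean(text):
    """Hypothetical normalization for the second column:
    strip combining marks and fold case."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


conn = sqlite3.connect(":memory:")
# tag_text is compared byte for byte; tag_clean holds the normalized value.
conn.execute(
    "CREATE TABLE tags ("
    " id INTEGER PRIMARY KEY,"
    " tag_text TEXT NOT NULL UNIQUE COLLATE BINARY,"
    " tag_clean TEXT NOT NULL)"
)
conn.execute("CREATE INDEX tags_clean_idx ON tags (tag_clean)")


def add_tag(text):
    conn.execute(
        "INSERT INTO tags (tag_text, tag_clean) VALUES (?, ?)",
        (text, clean(text)),
    )


def exact(text):
    """Exact lookup: only the identical spelling matches."""
    rows = conn.execute("SELECT tag_text FROM tags WHERE tag_text = ?", (text,))
    return [r[0] for r in rows]


def similar(text):
    """Loose lookup through the normalized column, in insertion order."""
    rows = conn.execute(
        "SELECT tag_text FROM tags WHERE tag_clean = ? ORDER BY id",
        (clean(text),),
    )
    return [r[0] for r in rows]


for tag in ("yahoo", "eksi sozluk", "ekşi sözlük"):
    add_tag(tag)
```

With this layout both "eksi sozluk" and "ekşi sözlük" can coexist as distinct tags, while the normalized column still lets related spellings be grouped when that is wanted.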
Priority: P3 → P2
Whiteboard: [z]
Target Milestone: 5.6 → 4.x (triaged)
Comment 14•14 years ago
I think we fixed this elsewhere:
mysql> SELECT * FROM tags WHERE tag_text = 'yahoo';
+-------+----------+-------------+---------------------+---------------------+------------+
| id | tag_text | blacklisted | created | modified | restricted |
+-------+----------+-------------+---------------------+---------------------+------------+
| 25318 | yahoo | 0 | 2010-09-21 06:36:30 | 2010-09-21 06:36:30 | 0 |
+-------+----------+-------------+---------------------+---------------------+------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM tags WHERE tag_text = 'YAHoö';
Empty set (0.00 sec)
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: addons.mozilla.org → addons.mozilla.org Graveyard