User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Firefox/1.0.7 (Debian package 1.0.7-1)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Firefox/1.0.7 (Debian package 1.0.7-1)

Strings in Bugzilla are processed as byte strings, not as Unicode strings, even when the utf-8 parameter is set. The connection to the database does not use a Unicode character set either. As a result, Unicode characters get broken in some places in Bugzilla, and case-insensitive search is not possible.

This bug is derived from bug #126266#c275 and IMHO blocks bug #126266.

Reproducible: Always

Steps to Reproduce:
1. Install Bugzilla 2.22.
2. Enter a new bug in Russian, with comment and summary lines longer than 60 characters. For example, "рфывиплрфывимдлофыивмдлофывимдлофывимдлыфвоимдл оиывфмдофы вилдм выфа фыва выфа вфы а выф авфы авфы выаф оифыв"
3. Click "My Bugs".
4. Click the bug you just entered.

Actual Results:
1. The summary in the bug list is trimmed to "рфывиплрфывимдлофыивмдлофыви�...".
2. The comment is wrapped at 40 characters per line.

Expected Results:
1. The summary in the bug list is trimmed to "рфывиплрфывимдлофыивмдлофывимдлофывимдлыфвоимдл оиывфмдо..." (60 characters, with no broken Unicode symbol before the "...").
2. The comment is wrapped at 80 characters per line.

See:
$ perldoc perlunicode
$ perldoc Encode

All strings that come into a Perl script must be decoded from the input character set to native Perl strings with the Encode::decode function. All native Perl strings must be encoded to the output encoding with the Encode::encode function.

Example:
Bugzilla::set_db_connection_character_set("UTF-8");
$octets = Bugzilla::get_octets_from_db(); # or Bugzilla::get_octets_from_url()
$string = Encode::decode("UTF-8", $octets); # if we use UTF-8 encoding in db
do_something_with_string($string);
$octets = Encode::encode("UTF-8", $string);
Bugzilla::insert_octets_into_db($octets); # or Bugzilla::send_octets_to_client($octets);
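The truncation symptom in the report can be reproduced outside Bugzilla. The following is a minimal sketch, assuming only the core Encode module; truncate_summary is a hypothetical helper written for illustration, not a Bugzilla function. It contrasts cutting a UTF-8 byte string directly with decoding first, cutting by characters, and encoding again on the way out.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 octets for the four Cyrillic letters "рфыв", written as escaped
# bytes so the script does not depend on the encoding of the source file.
my $octets = "\xD1\x80\xD1\x84\xD1\x8B\xD0\xB2";

# Hypothetical helper (not part of Bugzilla): trim to $max characters and
# append "...", decoding on input and encoding on output.
sub truncate_summary {
    my ($bytes, $max) = @_;
    my $string = decode('UTF-8', $bytes);    # octets -> Perl character string
    $string = substr($string, 0, $max) . '...' if length($string) > $max;
    return encode('UTF-8', $string);         # character string -> octets
}

# Cutting the raw octets counts bytes, so it can stop in the middle of a
# multi-byte character and leave an incomplete sequence behind.
my $byte_cut = substr($octets, 0, 3);

# Cutting the decoded string counts characters, so the result stays valid.
my $char_cut = truncate_summary($octets, 3);

printf "length as octets:     %d\n", length($octets);
printf "length as characters: %d\n", length(decode('UTF-8', $octets));
printf "byte cut (hex):       %s\n", unpack('H*', $byte_cut);
printf "character cut (hex):  %s\n", unpack('H*', $char_cut);
```

Each Cyrillic letter here occupies two bytes in UTF-8, which is consistent with the reported comment wrapping at 40 characters when the limit of 80 is applied to bytes.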
Created attachment 216709
unicode strings

This patch is only for illustration. Strings should be decoded not in Bugzilla's inner functions but right when they come in from outside. The patch does no encoding to the output character set, because Perl uses UTF-8 as its native string encoding.

$ perl -v
This is perl, v5.8.8 built for i486-linux-gnu-thread-multi
Copyright 1987-2006, Larry Wall

mysql-server 5.0.18-9
Created attachment 216710
unicode strings v2

Uses Encode:: instead of utf8::. Strings are encoded to UTF-8 after processing is done.
Attachment #216710 - Flags: review? → review?(mkanat)
Confirming. This is what we should do as soon as possible. What happens if you have been bitten by bug 331583? Do we need to fix such databases here, or in bug 304550, or not at all?
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: PC → All
Summary: Bugzilla do not proccess strings as Unicode strings when UTF-8 parameter is set → Bugzilla does not proccess strings as Unicode strings when UTF-8 parameter is set
Target Milestone: --- → Bugzilla 2.24
Version: unspecified → 2.22
*** This bug has been marked as a duplicate of 331691 ***
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → DUPLICATE
This request for review doesn't make sense on a dupe. Marc, shouldn't this patch be moved elsewhere? Otherwise, please remove this request.
Comment on attachment 216710
unicode strings v2

You're right, Frédéric. Renat, are you interested in proposing your patch at bug 331691? I can upload it there, if you don't want to do it yourself.
Attachment #216710 - Flags: review?(mkanat)