Bugzilla does not process strings as Unicode strings when UTF-8 parameter is set

RESOLVED DUPLICATE of bug 331691

Status

Bugzilla
Bugzilla-General
--
major
12 years ago
10 years ago

People

(Reporter: Renat Sabitov, Unassigned)

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

12 years ago
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Firefox/1.0.7 (Debian package 1.0.7-1)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Firefox/1.0.7 (Debian package 1.0.7-1)

Strings in Bugzilla are processed as byte strings, not Unicode strings, even if the UTF-8 parameter is set. The connection to the database also does not use a Unicode character set. As a result, Unicode characters get broken in some places in Bugzilla, and case-insensitive search is not possible.
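
To illustrate the case-insensitivity part, here is a minimal Perl sketch. It is not taken from Bugzilla; the helper name same_ignoring_case is made up for this example, and it assumes Perl's default byte semantics (no "use locale", no unicode_strings feature). It only shows the Perl side of the problem; how the database compares strings is a separate matter.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 octets for the Cyrillic words "БАГ" (upper case) and "баг" (lower case),
# written as \x escapes so the script does not depend on its own file encoding.
my $upper_octets = "\xD0\x91\xD0\x90\xD0\x93";
my $lower_octets = "\xD0\xB1\xD0\xB0\xD0\xB3";

# Hypothetical helper: case-insensitive equality via lc().
sub same_ignoring_case {
    my ($left, $right) = @_;
    return lc($left) eq lc($right) ? 1 : 0;
}

# On byte strings lc() only knows ASCII letters, so the two words stay different.
my $bytes_match = same_ignoring_case($upper_octets, $lower_octets);

# After decoding, lc() works on characters and folds the Cyrillic letters.
my $chars_match = same_ignoring_case(
    decode('UTF-8', $upper_octets),
    decode('UTF-8', $lower_octets),
);

print "byte strings match: $bytes_match\n";
print "character strings match: $chars_match\n";
```

The same pair of words compares as different while it is still raw octets and as equal once decoded, which is the behaviour a case-insensitive search needs.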

This bug is derived from bug #126266#c275 and IMHO blocks bug #126266.

Reproducible: Always

Steps to Reproduce:
1. Install Bugzilla 2.22
2. Enter a new bug whose summary and comment are in Russian and longer than 60 characters, for example: "рфывиплрфывимдлофыивмдлофывимдлофывимдлыфвоимдл оиывфмдофы вилдм выфа фыва выфа вфы а выф авфы авфы выаф оифыв"
3. Click "My Bugs"
4. Click the bug you just entered
Actual Results:  
1. The summary in the bug list is trimmed to "рфывиплрфывимдлофыивмдлофыви�...".
2. The comment is wrapped at 40 characters per line.

Expected Results:  
1. The summary in the bug list is trimmed to "рфывиплрфывимдлофыивмдлофывимдлофывимдлыфвоимдл оиывфмдо..." (60 characters, with no broken Unicode symbol before the "...").
2. The comment is wrapped at 80 characters per line.
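
The difference between the actual and expected results can be reproduced outside Bugzilla. The sketch below is only an illustration: ellipsize is a hypothetical helper, not Bugzilla's real trimming code, and a limit of 8 is used instead of 60 to keep the data short. It shows that the same limit cuts a byte string too early and in the middle of a letter, while the decoded string is handled per character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not Bugzilla code): shorten $text to at most $max
# units, ending in "...". A "unit" is whatever length() counts for $text.
sub ellipsize {
    my ($text, $max) = @_;
    return $text if length($text) <= $max;
    return substr($text, 0, $max - 3) . '...';
}

# UTF-8 octets for the five Cyrillic letters "рфыви", as \x escapes.
my $octets = "\xD1\x80\xD1\x84\xD1\x8B\xD0\xB2\xD0\xB8";
my $string = decode('UTF-8', $octets);

printf "length as octets: %d, as characters: %d\n",
    length($octets), length($string);

# Byte string: 10 units > 8, so it is cut after 5 bytes, in the middle of a letter.
my $cut_octets = ellipsize($octets, 8);

# Decoding the damaged bytes leniently yields U+FFFD where the half letter was.
my $damaged = decode('UTF-8', $cut_octets) =~ /\x{FFFD}/ ? 'yes' : 'no';
print "byte-wise cut contains a broken character: $damaged\n";

# Character string: 5 units <= 8, so nothing is cut at all.
my $kept = ellipsize($string, 8) eq $string ? 'yes' : 'no';
print "character-wise result left intact: $kept\n";
```

Each Cyrillic letter takes two bytes in UTF-8, which would also explain why the comment wraps at roughly half the intended width when lengths are counted in bytes.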

$perldoc perlunicode
$perldoc Encode

All strings that come into a Perl script must be decoded from the input character set into native Perl strings with the Encode::decode function. All native Perl strings must be encoded to the output encoding with the Encode::encode function.

Example:

Bugzilla::set_db_connection_character_set("UTF-8");

$octets = Bugzilla::get_octets_from_db (); # or Bugzilla::get_octets_from_url ()

$string = Encode::decode("UTF-8", $octets); # if we use UTF-8 encoding in db

do_something_with_string ($string);

$octets = Encode::encode("UTF-8", $string); # encode the processed string, not the raw octets

Bugzilla::insert_octets_into_db ($octets); # or Bugzilla::send_octets_to_client ($octets);
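
For anyone who wants to try the round trip without a Bugzilla installation, here is a self-contained sketch of the same decode, process, encode pattern. The %fake_db hash and process_summary are stand-ins invented for this example and do not correspond to any real Bugzilla function; ucfirst is used only as a sample character-aware operation.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for a database row: the summary is stored as raw
# UTF-8 octets ("баг" written as \x escapes).
my %fake_db = (summary => "\xD0\xB1\xD0\xB0\xD0\xB3");

# Decode at the input boundary, work on characters, encode at the output boundary.
sub process_summary {
    my ($octets) = @_;
    my $string = decode('UTF-8', $octets);   # octets -> Perl character string
    $string = ucfirst($string);              # character-aware processing step
    return encode('UTF-8', $string);         # character string -> octets
}

$fake_db{summary} = process_summary($fake_db{summary});

# Show the stored octets as hex so the output stays plain ASCII.
print join(' ', map { sprintf '%02X', ord } split //, $fake_db{summary}), "\n";
```

Keeping the decode and encode calls at the edges means everything in between can rely on length, substr, and case functions counting characters rather than bytes.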
(Reporter)

Comment 1

12 years ago
Created attachment 216709 [details] [diff] [review]
unicode strings

This patch is only for illustration. Strings should not be decoded in Bugzilla's inner functions, but right where they come in from outside. The patch does no encoding to the output character set, because Perl uses UTF-8 as its native string encoding.

$ perl -v

This is perl, v5.8.8 built for i486-linux-gnu-thread-multi

Copyright 1987-2006, Larry Wall

mysql-server              5.0.18-9
(Reporter)

Comment 2

12 years ago
Created attachment 216710 [details] [diff] [review]
unicode strings v2

Use Encode:: instead of utf8::.

Strings are encoded to UTF-8 after processing is done.
Attachment #216709 - Attachment is obsolete: true
Attachment #216710 - Flags: review?
(Reporter)

Updated

12 years ago
Blocks: 331691
(Reporter)

Updated

12 years ago
Blocks: 324413
(Reporter)

Updated

12 years ago
Depends on: 304550
(Reporter)

Updated

12 years ago
Attachment #216710 - Flags: review? → review?(mkanat)
Confirming. This is what we should do as soon as possible. What happens if you have been bitten by bug 331583? Do we need to fix such databases here, or in bug 304550, or not at all?
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: PC → All
Summary: Bugzilla do not proccess strings as Unicode strings when UTF-8 parameter is set → Bugzilla does not proccess strings as Unicode strings when UTF-8 parameter is set
Target Milestone: --- → Bugzilla 2.24
Version: unspecified → 2.22

*** This bug has been marked as a duplicate of 331691 ***
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → DUPLICATE
No longer blocks: 331691

Comment 5

12 years ago
This request for review doesn't make sense on a dupe. Marc, shouldn't this patch be moved elsewhere? Otherwise, please remove this request.
Comment on attachment 216710 [details] [diff] [review]
unicode strings v2

You're right, Frédéric.

Renat, are you interested in proposing your patch at bug 331691? I can upload it there, if you don't want to do it yourself.
Attachment #216710 - Flags: review?(mkanat)

Updated

12 years ago
Target Milestone: Bugzilla 2.24 → ---