Closed
Bug 304550
Opened 20 years ago
Closed 18 years ago
Bugzilla should always store data in MySQL as UTF-8
Categories
(Bugzilla :: Database, defect, P2)
Tracking
()
RESOLVED
FIXED
Bugzilla 3.0
People
(Reporter: yanzhenjiao, Assigned: mkanat)
References
()
Details
Attachments
(1 file, 3 obsolete files)
13.92 KB,
patch
|
LpSolit
:
review+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
Build Identifier:
i want to tanslate the bugzilla into chinese,but it cann't display chinese.
so i modify the following file:
C:\bugzilla\Bugzilla\DB\mysql.pm
in the new subfunction,i add this line
#!!! i add this line!!!
$self->do("set names gb2312");
the code is:
sub new {
my ($class, $user, $pass, $host, $dbname, $port, $sock) = @_;
# construct the DSN from the parameters we got
my $dsn = "DBI:mysql:host=$host;database=$dbname";
$dsn .= ";port=$port" if $port;
$dsn .= ";mysql_socket=$sock" if $sock;
my $self = $class->db_new($dsn, $user, $pass);
#!!! i add this line!!!
$self->do("set names gb2312");
# all class local variables stored in DBI derived class needs to have
# a prefix 'private_'. See DBI documentation.
$self->{private_bz_tables_locked} = "";
bless ($self, $class);
return $self;
}
Reproducible: Always
![]() |
||
Comment 1•20 years ago
|
||
a dupe of bug 288346?
Assignee | ||
Comment 2•20 years ago
|
||
Not exactly. This bug has to do with a slightly larger problem that we may run
into once we enable UTF-8 support.
What the reporter has asked for is actually either INVALID or just a docs note,
but what we do actually need is the ability to possibly set the database to be
interpreted and stored as UTF-8 for new installations. So I'm going to take over
this bug for that purpose.
This has various extreme complications because of how perl deals with UTF-8, how
DBI deals (or doesn't deal) with UTF-8, and how the various database backends
deal (or don't deal) with UTF-8.
Somebody needs to investigate what the problems are if we *don't* set the
database to use UTF-8.
Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: PC → All
Summary: it need to set database chatset to display chinese correctly. → Need to be able to set database charset (to UTF-8?)
Version: unspecified → 2.21
Assignee | ||
Comment 3•20 years ago
|
||
Oh, and if somebody does test, make sure that you also test Unicode characters
with a Unicode value greater than 10,000 -- I've had problems with those before,
on DBD::Pg, at least.
Comment 4•20 years ago
|
||
This requires a reinitdb for postgres, doesn't it? Or at least recreating the db.
Assignee | ||
Comment 5•20 years ago
|
||
(In reply to comment #4)
> This requires a reinitdb for postgres, doesn't it? Or at least recreating the
> db.
Yeah. As far as I know it would require a pg_dump and then a re-import of the
DB for Pg.
my bug report is a suggest for bugzilla's function.
now,bugzilla is widely used all around the world.
so,the multi-language support should be appreciated.there are some people
who have tanslate the bugzilla from english to the other languages,but,
when Bugzilla connecting the database,it hasn't deal with the charset.
if this feature is added,we can use it with our own language.
Assignee | ||
Updated•19 years ago
|
Assignee: database → mkanat
Summary: Need to be able to set database charset (to UTF-8?) → Bugzilla should use UTF-8 to connect to the database when the utf8 parameter is on
Target Milestone: --- → Bugzilla 2.24
Assignee | ||
Comment 7•19 years ago
|
||
*** Bug 328013 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 8•19 years ago
|
||
*** Bug 331583 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 9•19 years ago
|
||
This is a bug now, because bug 331583 reports that it causes data to get encoded twice.
Severity: enhancement → major
Priority: -- → P2
Assignee | ||
Updated•19 years ago
|
Target Milestone: Bugzilla 2.24 → Bugzilla 2.22
Assignee | ||
Comment 10•19 years ago
|
||
Okay, for MySQL I think what we need to do is SET CHARACTER SET 'utf8' if the UTF-8 param is on. That way it will correctly convert between our input UTF-8 and whatever encoding the database is. If the database is also UTF-8, it still works.
Comment 11•19 years ago
|
||
If you do it, you also need add comments in rel_notes for users of 2.22rc1. Because after upgrading users can't access non ASCII characters in Bugzilla.
They must repair bugs database.
I make it by next commands:
mysqldump.exe --default-character-set=latin1 -ubugs -p<yourpass> bugs >bugs.sql
!!! need manual correction of bugs.sql
mysql --default-character-set=utf8 -ubugs -p<yourpass> bugs <bugs.sql
Unfortunately current current version of mysqldump has bug - "mysqldump does not dumps tinytext and mediumtext default" - (we reported it as http://bugs.mysql.com/?id=18482), but after manual correction of 'bugs' table definition ('status_whiteboard', 'keywords' columns) and 'groups' table definition ('userregexp' column) this work right
Comment 12•19 years ago
|
||
*** Bug 344342 has been marked as a duplicate of this bug. ***
Comment 13•19 years ago
|
||
Max, are you working on this?
Comment 14•18 years ago
|
||
Apparently not. Taking for now; I feel that I may need some help soon, though.
Assignee: mkanat → wurblzap
Assignee | ||
Comment 15•18 years ago
|
||
Okay, I'm going to take this, now, and actually work on it. I suspect we'll only be able to fix it in 3.0, because we'll have to convert the whole database's character set to UTF-8. However, I'm not sure this is even possible in MySQL 4.0.
As such, in order to actually repair this bug, we will have to require MySQL 4.1. It's a significant-enough bug that all users should have it fixed. However, fixing it programatically in checksetup.pl would be impossible to do without MySQL 4.1.
See also:
http://dev.mysql.com/doc/refman/4.1/en/charset.html
FWIW, my goal is to eventually be able to eliminate the "utf8" parameter and have Bugzilla always use UTF-8, and this would be a critical part of it. (The other part is converting old databases, but I have that pretty well in-hand in bug 280633.)
I'm pretty sure we can fix this bug safely without that, though. We just will have to require MySQL 4.1.
Assignee: wurblzap → mkanat
Assignee | ||
Updated•18 years ago
|
Status: NEW → ASSIGNED
Summary: Bugzilla should use UTF-8 to connect to the database when the utf8 parameter is on → Bugzilla should always store data in MySQL as UTF-8
Target Milestone: Bugzilla 2.22 → Bugzilla 3.0
Assignee | ||
Comment 16•18 years ago
|
||
I think this is a blocker, because our UTF-8 support just can't really work without this patch. The reason that I'm nominating it instead of granting it directly is that I need justdave's approval to bump the MySQL version requirement.
I think it's entirely reasonable to bump the requirement. RHEL4 has MySQL 4.1, and so have most other distros for years.
Flags: blocking3.0?
Assignee | ||
Comment 17•18 years ago
|
||
Okay, I can vouch for this patch. It should work just fine, since all it's doing is telling MySQL that it should collate things as though they were UTF-8. It doesn't actually do any real character set *conversion*, it just changes the storage format in the MySQL backend to UTF-8.
Still, I'd like some testing of this from somebody who's more familiar with localization than I am. I want to know more about how it behaves with the UTF-8 parameter both on and off.
Attachment #239881 -
Flags: review?
Assignee | ||
Updated•18 years ago
|
Attachment #239881 -
Flags: review? → review?(wurblzap)
Assignee | ||
Comment 18•18 years ago
|
||
Okay, the previous version was destroying multi-byte encodings, but this version works fine.
HOWEVER, we seem to have a problem with MySQL *destroying* ISO-8859-X data, where X != 1. It absolutely shouldn't be doing this, but it is. So I'm looking into it.
I tested it with ISO-8859-7 (Greek) and it ate the characters.
Attachment #239881 -
Attachment is obsolete: true
Attachment #240066 -
Flags: review?(wurblzap)
Attachment #239881 -
Flags: review?(wurblzap)
Assignee | ||
Comment 19•18 years ago
|
||
I've reported my problem to MySQL as: http://bugs.mysql.com/22719
We'll see what they say.
Assignee | ||
Comment 20•18 years ago
|
||
Okay, as far as I can tell, MySQL's response is "MySQL is supposed to eat your data that way."
So, we have a problem:
If people have invalid UTF-8 in their database, it may be physically deleted. :-(
However, without this patch, MySQL definitely can't correctly sort non-English data. It may not even be able to store UTF-8 data correctly, always.
We do have recode.pl going in, from bug 280633.
So, my suggestion is this:
1) We do this, but at this point in checksetup we put a BIG WARNING that you
must now run recode.pl before you continue. Because this is 3.0, I think
it would be acceptable to require this.
2) After people have run recode.pl (We can perhaps keep track of it by a field
in bz_schema), then they can run checksetup again to upgrade their tables.
Of course, I need justdave's approval for this. :-)
Assignee | ||
Comment 21•18 years ago
|
||
Okay, I have a better solution:
1) If the installation has the utf-8 flag on (meaning its new or its all in utf-8), we do the table conversion, and permanently set a column in bz_schema to note that the DB is utf-8. However, before we do the conversion we print a big warning that it will destroy data and that people should run recode.pl if they ever had any non-utf8 data in their DB.
2) Old DBs continue running just the same until they turn on utf8.
3) When people turn on the utf8 param, we note that they should run checksetup.pl to convert their database.
Assignee | ||
Comment 22•18 years ago
|
||
Note that the only problem with the above solution is that we have to maintain the utf8-conversion code, and it will have to work on all future versions of Bugzilla. Right now, it only has to work once, during the 3.0 upgrade.
But the above solution is safer. Perhaps we can do it this way for 3.0, and then for 3.2 we can make it mandatory, and switch on the utf8 parameter always.
Assignee | ||
Comment 23•18 years ago
|
||
Okay, this version implements what I described above. I'm signing off, myself, on the DB part of the code, so you don't have to review that at all.
However, this patch also touches editparams.cgi and some params-related code, so I need review for that. LpSolit, could you just look at that part of the code? I've tested it all.
Attachment #240066 -
Attachment is obsolete: true
Attachment #241154 -
Flags: review?(LpSolit)
Attachment #240066 -
Flags: review?(wurblzap)
Assignee | ||
Comment 24•18 years ago
|
||
Oh, and justdave allowed in our last meeting to bump the requirement to 4.1, so that makes this OK to be a blocker.
Depends on: bz-recode
Flags: blocking3.0? → blocking3.0+
![]() |
||
Comment 25•18 years ago
|
||
Comment on attachment 241154 [details] [diff] [review]
v3
>Index: Bugzilla/Config/Common.pm
>+sub check_utf8 {
>+ return "You cannot disable UTF-8 support, because your MySQL database"
>+ . " is encoded in UTF-8.";
Don't write the final period. There is already one automatically added. Funny, isn't it? :)
>Index: Bugzilla/DB/Schema.pm
>+sub convert_type {
>+=cut
>+ my ($self, $type) = @_;
You *must* add a newline after =cut, else 011pod.t throws an error.
>Index: template/en/default/admin/params/core.html.tmpl
> "problems. Existing databases should set this to true only after " _
> "the data has been converted from existing legacy character " _
>- "encodings to UTF-8.",
Please make this string UTLRA-BOLD! I just tried turning this parameter on and running checksetup.pl again, and all my ISO-8859-1 data has been deleted. :( So please make it clear here that non-UTF-8 will be deleted.
I was going to r+ it, with my comments above fixed, but justdave just told me on IRC that the correct behavior was to reject the conversion and force the user to run recode.pl first. So that's the reason of my r-.
<justdave> mkanat was talking about that the other day...
<justdave> I thought we had decided to force users to run the recode script first or something
<justdave> in other words, yes, it's supposed to reject the conversion if there's non-utf data still present, and tell you to run recode.pl
<justdave> perhaps he hasn't redone the patch yet since we had that conversation
Attachment #241154 -
Flags: review?(LpSolit) → review-
Assignee | ||
Comment 26•18 years ago
|
||
(In reply to comment #25)
> <justdave> in other words, yes, it's supposed to reject the conversion if
> there's non-utf data still present, and tell you to run recode.pl
> <justdave> perhaps he hasn't redone the patch yet since we had that
> conversation
No, I have redone the patch. It's just that now it only does the conversion if you've turned on the "utf8" parameter.
I can't tell if there's any ISO-8859-X data in the database without going through all of the database in a loop.
I don't want to *always* stop checksetup.pl at this point.
I just assume that if somebody has the "utf8" parameter on, their data is in utf8.
checksetup printed out a large warning to you that it would delete things, when you ran it, and it told you to run recode.pl. Did you see the warning?
Assignee | ||
Comment 27•18 years ago
|
||
Okay, here's a version that fixes your comments.
I agree that it's a dangerous thing for checksetup.pl to do, but because it's 3.0 I think it's reasonable to expect people to have to do something manually in the middle of the installation.
We should definitely mention this in the release notes, though.
Attachment #241154 -
Attachment is obsolete: true
Attachment #241368 -
Flags: review?(LpSolit)
![]() |
||
Comment 28•18 years ago
|
||
Comment on attachment 241368 [details] [diff] [review]
v4
r=LpSolit
Attachment #241368 -
Flags: review?(LpSolit) → review+
![]() |
||
Updated•18 years ago
|
Flags: approval?
Assignee | ||
Comment 29•18 years ago
|
||
*** Bug 355994 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 30•18 years ago
|
||
*** Bug 355995 has been marked as a duplicate of this bug. ***
Updated•18 years ago
|
Flags: approval? → approval+
Comment 31•18 years ago
|
||
leaving this in the approval queue so I continue to see it, but I'm going to sit on this until I play with the recode thing (this week, I promise). I think it'll be a VERY good idea to make people run recode before running this, so recode needs to be a prerequisite (hmm, dependency is already set - let's wait for it)
Flags: approval+ → approval?
Assignee | ||
Comment 32•18 years ago
|
||
*** Bug 330430 has been marked as a duplicate of this bug. ***
Updated•18 years ago
|
Flags: approval? → approval+
Assignee | ||
Comment 33•18 years ago
|
||
Checking in Bugzilla/Constants.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Constants.pm,v <-- Constants.pm
new revision: 1.62; previous revision: 1.61
done
Checking in Bugzilla/DB.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB.pm,v <-- DB.pm
new revision: 1.91; previous revision: 1.90
done
Checking in Bugzilla/Config/Common.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Config/Common.pm,v <-- Common.pm
new revision: 1.13; previous revision: 1.12
done
Checking in Bugzilla/Config/Core.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Config/Core.pm,v <-- Core.pm
new revision: 1.7; previous revision: 1.6
done
Checking in Bugzilla/DB/Mysql.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Mysql.pm,v <-- Mysql.pm
new revision: 1.45; previous revision: 1.44
done
Checking in Bugzilla/DB/Schema.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v <-- Schema.pm
new revision: 1.76; previous revision: 1.75
done
Checking in Bugzilla/DB/Schema/Mysql.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Mysql.pm,v <-- Mysql.pm
new revision: 1.14; previous revision: 1.13
done
Checking in template/en/default/admin/params/core.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/admin/params/core.html.tmpl,v <-- core.html.tmpl
new revision: 1.5; previous revision: 1.4
done
Checking in template/en/default/global/messages.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/global/messages.html.tmpl,v <-- messages.html.tmpl
new revision: 1.47; previous revision: 1.46
done
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•18 years ago
|
Whiteboard: [relnote comment 27]
You need to log in
before you can comment on or make changes to this bug.
Description
•