Open Bug 784371 Opened 12 years ago Updated 12 years ago

If the default encoding of the DB is not UTF8, new DB tables are created first and later converted to UTF8

Categories

(Bugzilla :: Installation & Upgrading, defect, P4)

People

(Reporter: gerv, Unassigned)

Details

Scenario: a 3.6 installation. The "you need to upgrade everything to UTF-8" message was seen some time ago, and dealt with. checksetup.pl runs cleanly.

Now, upgrade to 4.2. You get this from checksetup.pl:

    Adding new table audit_log...
    Adding new table bug_tag...
    Adding new table field_visibility...
    Adding new table tag...

    WARNING: We are about to convert your table storage format to UTF-8. This allows Bugzilla to correctly store and sort international characters. However, if you have any non-UTF-8 data in your database, it ***WILL BE DELETED*** by this process. So, before you continue with checksetup.pl, if you have any non-UTF-8 data (or even if you're not sure) you should press Ctrl-C now to interrupt checksetup.pl, and run contrib/recode.pl to make all the data in your database into UTF-8. You should also back up your database before continuing. This will affect every single table in the database, even non-Bugzilla tables.

    If you ever used a version of Bugzilla before 2.22, we STRONGLY recommend that you stop checksetup.pl NOW and run contrib/recode.pl.

    Press Enter to continue or Ctrl-C to exit...

    Converting table storage format to UTF-8. This may take a while.
    audit_log.class needs to be converted to UTF-8...
    audit_log.field needs to be converted to UTF-8...
    audit_log.removed needs to be converted to UTF-8...
    audit_log.added needs to be converted to UTF-8...
    Converting the audit_log table to UTF-8...
    tag.name needs to be converted to UTF-8...
    Converting the tag table to UTF-8...
    Removing existing compiled templates...

It looks like it is creating new tables not-as-UTF-8, and then that's triggering the UTF-8 conversion process on them. This is a bit perverse. Bugzilla should create new tables as UTF-8.

Gerv
Wow, I never saw this scenario. Is it reproducible? And which DB server do you use? MySQL?
It's definitely reproducible; I am happy to help debug. I am using MySQL, the version which comes with Ubuntu 12.04. I am guessing there is some DB-wide "default charset" setting which is not UTF-8 and should be. Please let me know what info you need to debug this and I will get it. Gerv
in mysql the default charset for new tables is governed at either the database or server level. iirc both default to latin1 not utf8.

we should probably set a database's default charset if it isn't UTF8 before we start creating tables:

    ALTER DATABASE bugs DEFAULT CHARACTER SET utf8;

to determine the current default:

    SHOW VARIABLES LIKE 'character_set_database';

to set it at the server level, add the following lines to your [mysqld] section of my.cnf:

    character_set_server=utf8
    collation_server=utf8_unicode_ci
(In reply to Byron Jones ‹:glob› from comment #3)
> we should probably set a database's default charset if it isn't UTF8 before
> we start creating tables..
>
> ALTER DATABASE bugs DEFAULT CHARACTER SET utf8;

That's already what we are doing, in Bugzilla::DB::Mysql:

    sub bz_setup_database {
        ...
        if (   !$self->bz_db_is_utf8
            && !@tables
            && (Bugzilla->params->{'utf8'} || !scalar keys %{Bugzilla->params}) )
        {
            $self->_alter_db_charset_to_utf8();
        }
    }

    sub _alter_db_charset_to_utf8 {
        my $self = shift;
        my $db_name = Bugzilla->localconfig->{db_name};
        $self->do("ALTER DATABASE $db_name CHARACTER SET utf8");
    }

So we do it only if there are no tables at all in the DB yet. If you already have tables, then you must convert them to UTF8 first. This job is done by contrib/recode.pl. So the right place to fix the encoding is at the end of this script.
OS: Linux → All
Hardware: x86 → All
Summary: Bugzilla does not create new tables as UTF-8 even if it's already converted the old ones → On MySQL, recode.pl should set the default encoding of the database to UTF8 once all tables are converted
Target Milestone: --- → Bugzilla 4.2
I see this problem with new databases which have never been pre-2.22, so I don't think this is a full diagnosis of the problem.

It could be that I am using the software in an unsupported way; if so, perhaps we should consider supporting my use case. I often roll back databases to a certain state, e.g. for testing. I do this by doing a mysqldump of the current state, doing the testing and perhaps messing up the DB, and then doing:

    mysql> drop database bugs_foo;
    mysql> create database bugs_foo;
    # mysql -u root -p bugs_foo < backup.db

Now, it could be that when I create the database from the MySQL command line, the wrong charset is set for the database default, and backup.db does not contain commands which correct it. But I'm not sure how else to restore from backup. If I run checksetup.pl after dropping the database, it will recreate the database - but also a load of empty tables, and then the import won't work. (I assume.)

This problem could perhaps be fixed by calling _alter_db_charset_to_utf8 early in checksetup.pl (before creating any new tables) if we find that all the existing tables are UTF-8 but the database itself is not.

Gerv
I'm marking this bug as blocker to fix recode.pl to correctly change the charset of the DB to UTF8.
Flags: blocking4.4+
Flags: blocking4.2.3+
I investigated a bit more, and this bug is definitely not a blocker. The scenario is the following, with the utf8 parameter turned on:

1) If the DB encoding is not UTF8 and there are no tables yet, first set the default encoding of the DB to UTF8, then create all tables. This scenario is fine as all tables are automatically created with the UTF8 encoding.

2) If the DB encoding is not UTF8 and there are tables already, first update the DB schema and create missing tables (with the current default encoding of the DB), and *then* convert non-UTF8 tables to UTF8. This is the scenario gerv is talking about here.

In order to alter the encoding of a DB table to UTF8, you must first remove indexes and foreign keys related to this table. Bugzilla uses what bz_schema contains as data, and so converting the old schema or the new one shouldn't matter as long as bz_schema exists in the old DB, which is true only if the old DB is from 2.20 or newer. Bugzilla 2.18 and older had this information hardcoded in checksetup.pl, which is totally impossible to parse to extract required information. So we would be unable to convert such old DB without breaking everything. This explains why the DB schema is updated first, to make sure bz_schema is populated.

It is by far too risky to change this code for the reasons I give above. With the current scenario, all your tables will be correctly converted to UTF8, and the default encoding of the DB will be set to UTF8 too, so the next time you run checksetup.pl, you won't get this message again. So nothing is broken currently; this is just a minor annoyance when importing stuff using mysqldump.

FYI, http://dev.mysql.com/doc/refman/5.5/en/mysqldump.html#option_mysqldump_default-character-set states that:

    --default-character-set=charset_name
    Use charset_name as the default character set. If no character set is specified, mysqldump uses utf8, and earlier versions use latin1.

So you should be fine by default.
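The ordering described in this comment can be modelled as a small function. This is only an illustrative sketch, assuming the utf8 parameter is turned on: `checksetup_charset_steps` and the step labels are hypothetical names, not Bugzilla functions, and the position of the final database-default step in scenario 2 is an assumption based on the statement that the default encoding ends up as UTF8.

```perl
use strict;
use warnings;

# Hypothetical model of the ordering described in the comment above.
# The step names are illustrative labels, not Bugzilla function names.
sub checksetup_charset_steps {
    my (%state) = @_;

    # Database default is already UTF-8: new tables are created as UTF-8,
    # so only the normal schema update is needed.
    return ('update_schema') if $state{db_is_utf8};

    # Scenario 1: no tables yet, so the default is fixed before any table
    # is created.
    return ('alter_db_charset', 'create_tables') unless $state{has_tables};

    # Scenario 2: the schema is updated first so that bz_schema is
    # populated, and only then are non-UTF-8 tables converted.
    return ('update_schema', 'convert_tables', 'alter_db_charset');
}

# Scenario 2 is the one reported in this bug.
my @steps = checksetup_charset_steps(db_is_utf8 => 0, has_tables => 1);
print join(' -> ', @steps), "\n";
```

In this model, the tables added during the schema update of scenario 2 are created with the old default encoding, which is why they show up in the conversion step afterwards.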
Severity: normal → minor
Flags: blocking4.4+
Flags: blocking4.2.3+
Priority: -- → P4
Summary: On MySQL, recode.pl should set the default encoding of the database to UTF8 once all tables are converted → If the default encoding of the DB is not UTF8, new DB tables are created first and later converted to UTF8
Target Milestone: Bugzilla 4.2 → ---
(In reply to Frédéric Buclin from comment #7)
> It is by far too risky to change this code for the reasons I give above.

To make sure I understand correctly: this risk is that if you are upgrading from 2.18 or earlier, there is nothing in the bz_schema table, and so things will break horribly?

If so, surely that's easy to detect. If bz_schema is present and populated, and if all tables individually are UTF-8, but the database is not, then surely it's perfectly safe to switch the database to UTF-8 before creating any new tables?

> next time you run checksetup.pl, you won't get this message again. So
> nothing is broken currently; this is just a minor annoyance when importing
> stuff using mysqldump.

It's useful to know that there is no harm from seeing this message. It still takes 20-30 seconds, though, which is a pain :-)

Gerv
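The check proposed in this comment could look roughly like the predicate below. It is a minimal sketch, not a patch: `can_alter_db_charset_early` and its named arguments are hypothetical, and how each value would actually be obtained inside checksetup.pl (for example via the existing `bz_db_is_utf8` method) is left open.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bugzilla: decides whether the database
# default charset could be switched to UTF-8 before new tables are created.
sub can_alter_db_charset_early {
    my (%state) = @_;

    # Nothing to do if the database default is already UTF-8.
    return 0 if $state{db_is_utf8};

    # Upgrades from 2.18 or earlier have no usable bz_schema, so the
    # existing update-then-convert order has to be kept.
    return 0 unless $state{bz_schema_populated};

    # Every existing table must already be UTF-8.
    my @pending = @{ $state{non_utf8_tables} || [] };
    return @pending ? 0 : 1;
}

# Example: a restored dump whose tables are UTF-8 but whose database
# default is not.
print can_alter_db_charset_early(
    db_is_utf8          => 0,
    bz_schema_populated => 1,
    non_utf8_tables     => [],
) ? "alter database first\n" : "keep current order\n";
```

If the predicate is true, the existing `_alter_db_charset_to_utf8` call could run before any new table is added; in every other case the current order would stay unchanged.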