Closed Bug 783786 Opened 12 years ago Closed 12 years ago

PostgreSQL databases can be created with the wrong encoding

Categories

(Bugzilla :: Installation & Upgrading, defect)

4.3.2
defect
Not set
major

Tracking

()

RESOLVED FIXED
Bugzilla 4.2

People

(Reporter: LpSolit, Assigned: LpSolit)

Details

Attachments

(1 file)

While reviewing bug 187753, I wondered why PostgreSQL was crashing when inserting what was supposed to be short-enough strings. To make sure the patch wasn't the culprit, I tried to create a new product with a 64 characters long name, which is the max allowed for product names. PostgreSQL failed with: DBD::Pg::db do failed: ERROR: value too long for type character varying(64) [for Statement "INSERT INTO products(...)] When looking closer, I realized that all my DBs have the wrong encoding: List of Databases DB name | Owner | Encoding | Collation | Char Type | Access -----------+--------+-----------+--- -------+-----------+----------- bugs_36 | bugs | SQL_ASCII | C | C | bugs_cvs | bugs | SQL_ASCII | C | C | Per http://www.postgresql.org/docs/9.1/static/app-initdb.html, about the encoding: "The default is derived from the locale, or SQL_ASCII if that does not work." So for some reason, the locale couldn't be determine, and Pg falled back to SQL_ASCII. But http://www.postgresql.org/docs/9.1/static/sql-createdatabase.html says: "CREATE DATABASE will allow superusers to specify SQL_ASCII encoding regardless of the locale settings, but this choice is deprecated and may result in misbehavior of character-string functions" This is exactly what happened to me: bugs_cvs=> select quipid, length(quip), quip from quips where length(quip) < 50; quipid | length | quip --------+--------+---------------------- 5 | 16 | moi je suis mini 6 | 21 | moi aussi, très mini 8 | 6 | ààà Here, only the first length is correct, the other two are totally wrong due to letters outside the US-ASCII range. To fix this problem, databases must be created with ENCODING = 'UTF8': List of Databases DB name | Owner | Encoding | Collation | Char Type | Access -----------+--------+-----------+--- -------+-----------+----------- bugs_36 | bugs | SQL_ASCII | C | C | bugs_40 | bugs | UTF8 | C | C | bugs_cvs | bugs | SQL_ASCII | C | C | And in that case, string functions are working correctly again: bugs_40=> select length(name), char_length(name), name from products; length | char_length | name --------+-------------+------------------------------------------------------------------ 11 | 11 | TestProduct 64 | 64 | àààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààà 16 | 16 | télèphöne!@#¼¼½€ http://www.postgresql.org/docs/9.1/static/multibyte.html says that SQL_ASCII stores 1 byte per character, which is why it sees e.g. "à" as being 2 characters, as it needs two bytes to store it. So we shouldn't let checksetup.pl assume that the default encoding is UTF8 as this causes a lot of trouble.
Flags: blocking4.4+
Attached patch patch, v1Splinter Review
Till now, Pg was using the default "CREATE DATABASE $name" with no arguments.
Attachment #653077 - Flags: review?(dkl)
Flags: blocking4.2.3+
Keywords: relnote
Target Milestone: Bugzilla 4.4 → Bugzilla 4.2
Comment on attachment 653077 [details] [diff] [review] patch, v1 Review of attachment 653077 [details] [diff] [review]: ----------------------------------------------------------------- Looks good. Do not have PostgreSQL installed myself, so if you have tested it, r=dkl
Attachment #653077 - Flags: review?(dkl) → review+
Flags: approval4.2+
Flags: approval+
Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/trunk/ modified Bugzilla/DB/Schema/Pg.pm Committed revision 8355. Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.2/ modified Bugzilla/DB/Schema/Pg.pm Committed revision 8121.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
I forgot to add it to the release notes for 4.2.4. This is now done: Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.2/ modified template/en/default/pages/release-notes.html.tmpl Committed revision 8163.
Keywords: relnote
Actually, it was already in the relnotes for 4.2.3. I simply forgot to remove the keyword. I reverted this last checkin.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: