Closed
Bug 783786
Opened 12 years ago
Closed 12 years ago
PostgreSQL databases can be created with the wrong encoding
Categories
(Bugzilla :: Installation & Upgrading, defect)
Tracking
()
RESOLVED
FIXED
Bugzilla 4.2
People
(Reporter: LpSolit, Assigned: LpSolit)
Details
Attachments
(1 file)
|
825 bytes,
patch
|
dkl
:
review+
|
Details | Diff | Splinter Review |
While reviewing bug 187753, I wondered why PostgreSQL was crashing when inserting what was supposed to be short-enough strings. To make sure the patch wasn't the culprit, I tried to create a new product with a 64 characters long name, which is the max allowed for product names. PostgreSQL failed with:
DBD::Pg::db do failed: ERROR: value too long for type character varying(64) [for Statement "INSERT INTO products(...)]
When looking closer, I realized that all my DBs have the wrong encoding:
List of Databases
DB name | Owner | Encoding | Collation | Char Type | Access
-----------+--------+-----------+--- -------+-----------+-----------
bugs_36 | bugs | SQL_ASCII | C | C |
bugs_cvs | bugs | SQL_ASCII | C | C |
Per http://www.postgresql.org/docs/9.1/static/app-initdb.html, about the encoding:
"The default is derived from the locale, or SQL_ASCII if that does not work."
So for some reason, the locale couldn't be determine, and Pg falled back to SQL_ASCII. But http://www.postgresql.org/docs/9.1/static/sql-createdatabase.html says:
"CREATE DATABASE will allow superusers to specify SQL_ASCII encoding regardless of the locale settings, but this choice is deprecated and may result in misbehavior of character-string functions"
This is exactly what happened to me:
bugs_cvs=> select quipid, length(quip), quip from quips where length(quip) < 50;
quipid | length | quip
--------+--------+----------------------
5 | 16 | moi je suis mini
6 | 21 | moi aussi, très mini
8 | 6 | ààà
Here, only the first length is correct, the other two are totally wrong due to letters outside the US-ASCII range.
To fix this problem, databases must be created with ENCODING = 'UTF8':
List of Databases
DB name | Owner | Encoding | Collation | Char Type | Access
-----------+--------+-----------+--- -------+-----------+-----------
bugs_36 | bugs | SQL_ASCII | C | C |
bugs_40 | bugs | UTF8 | C | C |
bugs_cvs | bugs | SQL_ASCII | C | C |
And in that case, string functions are working correctly again:
bugs_40=> select length(name), char_length(name), name from products;
length | char_length | name
--------+-------------+------------------------------------------------------------------
11 | 11 | TestProduct
64 | 64 | àààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààà
16 | 16 | télèphöne!@#¼¼½€
http://www.postgresql.org/docs/9.1/static/multibyte.html says that SQL_ASCII stores 1 byte per character, which is why it sees e.g. "à" as being 2 characters, as it needs two bytes to store it.
So we shouldn't let checksetup.pl assume that the default encoding is UTF8 as this causes a lot of trouble.
Flags: blocking4.4+
| Assignee | ||
Comment 1•12 years ago
|
||
Till now, Pg was using the default "CREATE DATABASE $name" with no arguments.
Attachment #653077 -
Flags: review?(dkl)
| Assignee | ||
Updated•12 years ago
|
Comment 2•12 years ago
|
||
Comment on attachment 653077 [details] [diff] [review]
patch, v1
Review of attachment 653077 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good. Do not have PostgreSQL installed myself, so if you have tested it, r=dkl
Attachment #653077 -
Flags: review?(dkl) → review+
| Assignee | ||
Updated•12 years ago
|
Flags: approval4.2+
Flags: approval+
| Assignee | ||
Comment 3•12 years ago
|
||
Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/trunk/
modified Bugzilla/DB/Schema/Pg.pm
Committed revision 8355.
Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.2/
modified Bugzilla/DB/Schema/Pg.pm
Committed revision 8121.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
| Assignee | ||
Comment 4•12 years ago
|
||
I forgot to add it to the release notes for 4.2.4. This is now done:
Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.2/
modified template/en/default/pages/release-notes.html.tmpl
Committed revision 8163.
Keywords: relnote
| Assignee | ||
Comment 5•12 years ago
|
||
Actually, it was already in the relnotes for 4.2.3. I simply forgot to remove the keyword. I reverted this last checkin.
You need to log in
before you can comment on or make changes to this bug.
Description
•