Closed Bug 783786 Opened 12 years ago Closed 12 years ago

PostgreSQL databases can be created with the wrong encoding

Categories

(Bugzilla :: Installation & Upgrading, defect)

4.3.2
defect
Not set
major

Tracking

()

RESOLVED FIXED
Bugzilla 4.2

People

(Reporter: LpSolit, Assigned: LpSolit)

Details

Attachments

(1 file)

While reviewing bug 187753, I wondered why PostgreSQL was crashing when inserting what was supposed to be short-enough strings. To make sure the patch wasn't the culprit, I tried to create a new product with a 64 characters long name, which is the max allowed for product names. PostgreSQL failed with:

DBD::Pg::db do failed: ERROR:  value too long for type character varying(64) [for Statement "INSERT INTO products(...)]

When looking closer, I realized that all my DBs have the wrong encoding:

                            List of Databases
 DB name   | Owner  | Encoding  | Collation | Char Type |   Access
-----------+--------+-----------+--- -------+-----------+-----------
 bugs_36   | bugs   | SQL_ASCII | C         | C         | 
 bugs_cvs  | bugs   | SQL_ASCII | C         | C         | 


Per http://www.postgresql.org/docs/9.1/static/app-initdb.html, about the encoding:

"The default is derived from the locale, or SQL_ASCII if that does not work."

So for some reason, the locale couldn't be determine, and Pg falled back to SQL_ASCII. But http://www.postgresql.org/docs/9.1/static/sql-createdatabase.html says:

"CREATE DATABASE will allow superusers to specify SQL_ASCII encoding regardless of the locale settings, but this choice is deprecated and may result in misbehavior of character-string functions"

This is exactly what happened to me:

bugs_cvs=> select quipid, length(quip), quip from quips where length(quip) < 50;

 quipid | length |         quip         
--------+--------+----------------------
      5 |     16 | moi je suis mini
      6 |     21 | moi aussi, très mini
      8 |      6 | ààà

Here, only the first length is correct, the other two are totally wrong due to letters outside the US-ASCII range.

To fix this problem, databases must be created with ENCODING = 'UTF8':

                            List of Databases
 DB name   | Owner  | Encoding  | Collation | Char Type |   Access
-----------+--------+-----------+--- -------+-----------+-----------
 bugs_36   | bugs   | SQL_ASCII | C         | C         | 
 bugs_40   | bugs   | UTF8      | C         | C         |
 bugs_cvs  | bugs   | SQL_ASCII | C         | C         | 

And in that case, string functions are working correctly again:

bugs_40=> select length(name), char_length(name), name from products;
 length | char_length |                               name                               
--------+-------------+------------------------------------------------------------------
     11 |          11 | TestProduct
     64 |          64 | àààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààà
     16 |          16 | télèphöne!@#¼¼½€


http://www.postgresql.org/docs/9.1/static/multibyte.html says that SQL_ASCII stores 1 byte per character, which is why it sees e.g. "à" as being 2 characters, as it needs two bytes to store it.

So we shouldn't let checksetup.pl assume that the default encoding is UTF8 as this causes a lot of trouble.
Flags: blocking4.4+
Attached patch patch, v1Splinter Review
Till now, Pg was using the default "CREATE DATABASE $name" with no arguments.
Attachment #653077 - Flags: review?(dkl)
Flags: blocking4.2.3+
Keywords: relnote
Target Milestone: Bugzilla 4.4 → Bugzilla 4.2
Comment on attachment 653077 [details] [diff] [review]
patch, v1

Review of attachment 653077 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Do not have PostgreSQL installed myself, so if you have tested it, r=dkl
Attachment #653077 - Flags: review?(dkl) → review+
Flags: approval4.2+
Flags: approval+
Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/trunk/
modified Bugzilla/DB/Schema/Pg.pm
Committed revision 8355.

Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.2/
modified Bugzilla/DB/Schema/Pg.pm
Committed revision 8121.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
I forgot to add it to the release notes for 4.2.4. This is now done:

Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.2/
modified template/en/default/pages/release-notes.html.tmpl
Committed revision 8163.
Keywords: relnote
Actually, it was already in the relnotes for 4.2.3. I simply forgot to remove the keyword. I reverted this last checkin.
You need to log in before you can comment on or make changes to this bug.