Last Comment Bug 783786 - PostgreSQL databases can be created with the wrong encoding
: PostgreSQL databases can be created with the wrong encoding
Status: RESOLVED FIXED
:
Product: Bugzilla
Classification: Server Software
Component: Installation & Upgrading (show other bugs)
: 4.3.2
: All All
: -- major (vote)
: Bugzilla 4.2
Assigned To: Frédéric Buclin
: default-qa
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-08-18 08:46 PDT by Frédéric Buclin
Modified: 2012-11-03 12:02 PDT (History)
2 users (show)
LpSolit: approval+
LpSolit: blocking4.4+
LpSolit: approval4.2+
LpSolit: blocking4.2.3+
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
patch, v1 (825 bytes, patch)
2012-08-18 08:49 PDT, Frédéric Buclin
dkl: review+
Details | Diff | Splinter Review

Description Frédéric Buclin 2012-08-18 08:46:57 PDT
While reviewing bug 187753, I wondered why PostgreSQL was crashing when inserting what was supposed to be short-enough strings. To make sure the patch wasn't the culprit, I tried to create a new product with a 64 characters long name, which is the max allowed for product names. PostgreSQL failed with:

DBD::Pg::db do failed: ERROR:  value too long for type character varying(64) [for Statement "INSERT INTO products(...)]

When looking closer, I realized that all my DBs have the wrong encoding:

                            List of Databases
 DB name   | Owner  | Encoding  | Collation | Char Type |   Access
-----------+--------+-----------+--- -------+-----------+-----------
 bugs_36   | bugs   | SQL_ASCII | C         | C         | 
 bugs_cvs  | bugs   | SQL_ASCII | C         | C         | 


Per http://www.postgresql.org/docs/9.1/static/app-initdb.html, about the encoding:

"The default is derived from the locale, or SQL_ASCII if that does not work."

So for some reason, the locale couldn't be determine, and Pg falled back to SQL_ASCII. But http://www.postgresql.org/docs/9.1/static/sql-createdatabase.html says:

"CREATE DATABASE will allow superusers to specify SQL_ASCII encoding regardless of the locale settings, but this choice is deprecated and may result in misbehavior of character-string functions"

This is exactly what happened to me:

bugs_cvs=> select quipid, length(quip), quip from quips where length(quip) < 50;

 quipid | length |         quip         
--------+--------+----------------------
      5 |     16 | moi je suis mini
      6 |     21 | moi aussi, très mini
      8 |      6 | ààà

Here, only the first length is correct, the other two are totally wrong due to letters outside the US-ASCII range.

To fix this problem, databases must be created with ENCODING = 'UTF8':

                            List of Databases
 DB name   | Owner  | Encoding  | Collation | Char Type |   Access
-----------+--------+-----------+--- -------+-----------+-----------
 bugs_36   | bugs   | SQL_ASCII | C         | C         | 
 bugs_40   | bugs   | UTF8      | C         | C         |
 bugs_cvs  | bugs   | SQL_ASCII | C         | C         | 

And in that case, string functions are working correctly again:

bugs_40=> select length(name), char_length(name), name from products;
 length | char_length |                               name                               
--------+-------------+------------------------------------------------------------------
     11 |          11 | TestProduct
     64 |          64 | àààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààà
     16 |          16 | télèphöne!@#¼¼½€


http://www.postgresql.org/docs/9.1/static/multibyte.html says that SQL_ASCII stores 1 byte per character, which is why it sees e.g. "à" as being 2 characters, as it needs two bytes to store it.

So we shouldn't let checksetup.pl assume that the default encoding is UTF8 as this causes a lot of trouble.
Comment 1 Frédéric Buclin 2012-08-18 08:49:51 PDT
Created attachment 653077 [details] [diff] [review]
patch, v1

Till now, Pg was using the default "CREATE DATABASE $name" with no arguments.
Comment 2 David Lawrence [:dkl] 2012-08-20 13:13:21 PDT
Comment on attachment 653077 [details] [diff] [review]
patch, v1

Review of attachment 653077 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Do not have PostgreSQL installed myself, so if you have tested it, r=dkl
Comment 3 Frédéric Buclin 2012-08-20 15:44:38 PDT
Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/trunk/
modified Bugzilla/DB/Schema/Pg.pm
Committed revision 8355.

Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.2/
modified Bugzilla/DB/Schema/Pg.pm
Committed revision 8121.
Comment 4 Frédéric Buclin 2012-11-03 10:55:00 PDT
I forgot to add it to the release notes for 4.2.4. This is now done:

Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.2/
modified template/en/default/pages/release-notes.html.tmpl
Committed revision 8163.
Comment 5 Frédéric Buclin 2012-11-03 12:02:54 PDT
Actually, it was already in the relnotes for 4.2.3. I simply forgot to remove the keyword. I reverted this last checkin.

Note You need to log in before you can comment on or make changes to this bug.