Closed Bug 948344 Opened 11 years ago Closed 8 years ago

Non-UTF8 database error: Unicode escape for code points higher than U+007F not permitted in non-UTF8 encoding

Categories

(Socorro :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: pancallo, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0 (Beta/Release)
Build ID: 20131112160018

Steps to reproduce:

I have setup socorro like this :

http://planeshift.top-ix.org/pswiki/index.php?title=Socorro_Installation



Actual results:

2013-12-10 09:50:01,510 INFO - MainThread - LUCADEBUG About to call update_tcbs_build [datetime.date(2013, 12, 7)]
2013-12-10 09:50:01,609 DEBUG - MainThread - error when running <class 'socorro.cron.jobs.matviews.TCBSBuildCronApp'> on None
Traceback (most recent call last):
  File "/data/socorro/application/socorro/cron/crontabber.py", line 703, in _run_one
    for last_success in self._run_job(job_class, config, info):
  File "/data/socorro/application/socorro/cron/base.py", line 174, in main
    function(when)
  File "/data/socorro/application/socorro/cron/base.py", line 213, in _run_proxy
    self.run(connection, date)
  File "/data/socorro/application/socorro/cron/jobs/matviews.py", line 52, in run
    self.run_proc(connection, [target_date])
  File "/data/socorro/application/socorro/cron/jobs/matviews.py", line 22, in run_proc
    cursor.callproc(self.get_proc_name(), signature)
DataError: invalid input syntax for type json
DETAIL:  Unicode escape for code points higher than U+007F not permitted in non-UTF8 encoding
CONTEXT:  JSON data, line 1: ..."20131117000000", "Version": "0.6.0", "Renderer":...
SQL statement "INSERT INTO tcbs_build (
        signature_id, build_date,
        report_date, product_version_id,
        process_type, release_channel,
        report_count, win_count, mac_count, lin_count, hang_count,
        startup_count
        , is_gc_count
)
WITH raw_crash_filtered AS (
    SELECT
          uuid
        , json_object_field_text(r.raw_crash, 'IsGarbageCollecting') as is_garbage_collecting
    FROM
        raw_crashes r
    WHERE
        date_processed::date = updateday
)
SELECT signature_id, build_date(build),
        updateday, product_version_id,
        process_type, release_channel,
        count(*),
        sum(case when os_name = 'Windows' THEN 1 else 0 END),
        sum(case when os_name = 'Mac OS X' THEN 1 else 0 END),
        sum(case when os_name = 'Linux' THEN 1 else 0 END),
    count(hang_id),
    sum(case when uptime < INTERVAL '1 minute' THEN 1 else 0 END)
    , sum(CASE WHEN r.is_garbage_collecting = '1' THEN 1 ELSE 0 END) as gc_count
FROM reports_clean
        JOIN product_versions USING (product_version_id)
        JOIN products USING ( product_name )
    LEFT JOIN raw_crash_filtered r ON r.uuid::text = reports_clean.uuid
WHERE utc_day_is(date_processed, updateday)
                AND tstz_between(date_processed, build_date, sunset_date)
        -- 7 days of builds only
        --AND updateday <= ( build_date(build) + 6 )
        -- only nightly, aurora, and rapid beta
        AND reports_clean.release_channel IN ( 'nightly','aurora','release' )
        AND version_matches_channel(version_string, release_channel)
GROUP BY signature_id, build_date(build), product_version_id,
        process_type, release_channel"
PL/pgSQL function update_tcbs_build(date,boolean,interval) line 32 at SQL statement



Expected results:

No error! :)
This is what I did to convert the database to UTF8. It worked so far.

You can see the encoding with this command:

  > psql -l

If you see SQL_ASCII then it's wrong.

You then have to convert your 'breakpad' database to UTF-8 (others can stay ascii):

 > /etc/init.d/supervisor stop
 > su - postgres
 > pg_dump breakpad -U planeshift > db.sql
 > psql

 # ALTER DATABASE breakpad RENAME TO breakpad_ascii
 # CREATE DATABASE breakpad WITH OWNER = breakpad_rw TEMPLATE = template0 ENCODING = 'UTF-8';
 # GRANT ALL ON DATABASE breakpad to socorro;
 # GRANT ALL ON DATABASE breakpad to breakpad_rw; (unsure about this one, but I did it anyway)
 # GRANT ALL ON DATABASE breakpad to planeshift; (unsure about this one, but I did it anyway)
 
 > iconv --from-code ISO8859-1 --to-code UTF8 db.sql > db_utf8.sql

 Edit db_utf8.sql and change this line to UTF-8:
 SET client_encoding = 'UTF-8';

 > psql -U planeshift -d breakpad
 # \i db_utf8.sql

 >  /etc/init.d/supervisor start
(In reply to pancallo from comment #1)
> This is what I did to convert the database to UTF8. It worked so far.

Thanks so much for the detailed fix. 

Sorry about the issue with UTF8. I will update our docs to specify "-E UTF8" for all database creation.
Assignee: nobody → sdeckelmann
Summary: Unicode escape for code points higher than U+007F not permitted in non-UTF8 encoding → Non-UTF8 database error: Unicode escape for code points higher than U+007F not permitted in non-UTF8 encoding
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Assignee: sdeckelmann → nobody
Status: ASSIGNED → NEW
We're no longer doing Top Crashers with SQL.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.