Closed Bug 948344 Opened 11 years ago Closed 8 years ago

Non-UTF8 database error: Unicode escape for code points higher than U+007F not permitted in non-UTF8 encoding

Categories

(Socorro :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: pancallo, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0 (Beta/Release) Build ID: 20131112160018 Steps to reproduce: I have setup socorro like this : http://planeshift.top-ix.org/pswiki/index.php?title=Socorro_Installation Actual results: 2013-12-10 09:50:01,510 INFO - MainThread - LUCADEBUG About to call update_tcbs_build [datetime.date(2013, 12, 7)] 2013-12-10 09:50:01,609 DEBUG - MainThread - error when running <class 'socorro.cron.jobs.matviews.TCBSBuildCronApp'> on None Traceback (most recent call last): File "/data/socorro/application/socorro/cron/crontabber.py", line 703, in _run_one for last_success in self._run_job(job_class, config, info): File "/data/socorro/application/socorro/cron/base.py", line 174, in main function(when) File "/data/socorro/application/socorro/cron/base.py", line 213, in _run_proxy self.run(connection, date) File "/data/socorro/application/socorro/cron/jobs/matviews.py", line 52, in run self.run_proc(connection, [target_date]) File "/data/socorro/application/socorro/cron/jobs/matviews.py", line 22, in run_proc cursor.callproc(self.get_proc_name(), signature) DataError: invalid input syntax for type json DETAIL: Unicode escape for code points higher than U+007F not permitted in non-UTF8 encoding CONTEXT: JSON data, line 1: ..."20131117000000", "Version": "0.6.0", "Renderer":... SQL statement "INSERT INTO tcbs_build ( signature_id, build_date, report_date, product_version_id, process_type, release_channel, report_count, win_count, mac_count, lin_count, hang_count, startup_count , is_gc_count ) WITH raw_crash_filtered AS ( SELECT uuid , json_object_field_text(r.raw_crash, 'IsGarbageCollecting') as is_garbage_collecting FROM raw_crashes r WHERE date_processed::date = updateday ) SELECT signature_id, build_date(build), updateday, product_version_id, process_type, release_channel, count(*), sum(case when os_name = 'Windows' THEN 1 else 0 END), sum(case when os_name = 'Mac OS X' THEN 1 else 0 END), sum(case when os_name = 'Linux' THEN 1 else 0 END), count(hang_id), sum(case when uptime < INTERVAL '1 minute' THEN 1 else 0 END) , sum(CASE WHEN r.is_garbage_collecting = '1' THEN 1 ELSE 0 END) as gc_count FROM reports_clean JOIN product_versions USING (product_version_id) JOIN products USING ( product_name ) LEFT JOIN raw_crash_filtered r ON r.uuid::text = reports_clean.uuid WHERE utc_day_is(date_processed, updateday) AND tstz_between(date_processed, build_date, sunset_date) -- 7 days of builds only --AND updateday <= ( build_date(build) + 6 ) -- only nightly, aurora, and rapid beta AND reports_clean.release_channel IN ( 'nightly','aurora','release' ) AND version_matches_channel(version_string, release_channel) GROUP BY signature_id, build_date(build), product_version_id, process_type, release_channel" PL/pgSQL function update_tcbs_build(date,boolean,interval) line 32 at SQL statement Expected results: No error! :)
This is what I did to convert the database to UTF8. It worked so far. You can see the encoding with this command: > psql -l If you see SQL_ASCII then it's wrong. You then have to convert your 'breakpad' database to UTF-8 (others can stay ascii): > /etc/init.d/supervisor stop > su - postgres > pg_dump breakpad -U planeshift > db.sql > psql # ALTER DATABASE breakpad RENAME TO breakpad_ascii # CREATE DATABASE breakpad WITH OWNER = breakpad_rw TEMPLATE = template0 ENCODING = 'UTF-8'; # GRANT ALL ON DATABASE breakpad to socorro; # GRANT ALL ON DATABASE breakpad to breakpad_rw; (unsure about this one, but I did it anyway) # GRANT ALL ON DATABASE breakpad to planeshift; (unsure about this one, but I did it anyway) > iconv --from-code ISO8859-1 --to-code UTF8 db.sql > db_utf8.sql Edit db_utf8.sql and change this line to UTF-8: SET client_encoding = 'UTF-8'; > psql -U planeshift -d breakpad # \i db_utf8.sql > /etc/init.d/supervisor start
(In reply to pancallo from comment #1) > This is what I did to convert the database to UTF8. It worked so far. Thanks so much for the detailed fix. Sorry about the issue with UTF8. I will update our docs to specify "-E UTF8" for all database creation.
Assignee: nobody → sdeckelmann
Summary: Unicode escape for code points higher than U+007F not permitted in non-UTF8 encoding → Non-UTF8 database error: Unicode escape for code points higher than U+007F not permitted in non-UTF8 encoding
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Assignee: sdeckelmann → nobody
Status: ASSIGNED → NEW
We're no longer doing Top Crashers with SQL.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.