Invalid UTF8 data being inserted into JSON in Postgres, raw_crashes in particular is affected

RESOLVED FIXED

Status

Socorro
Backend
RESOLVED FIXED
5 years ago
9 months ago

People

(Reporter: Robert Kaiser, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

5 years ago
For my custom report on Android devices and their crashes, I'm trying to get that data out of the raw_crashes table now instead of doing weird app_notes parsing experiments. But on some crashes, I run into issues:

breakpad=> SELECT raw_crash->>'Android_Manufacturer' as manufacturer FROM raw_crashes WHERE uuid='5e8d5e28-f5b0-444d-a619-a1ed32130724';
ERROR:  invalid byte sequence for encoding "UTF8": 0xef


but this one works (giving an awfully huge JSON output with all the logcat):

breakpad=> SELECT raw_crash FROM raw_crashes WHERE uuid='5e8d5e28-f5b0-444d-a619-a1ed32130724';

and I can parse that huge JSON fine in PHP for my custom reports - Would be nicer to have the DB only give me the values I want, though.


Also, if that's not an error in how I'm using this field, then it might even be an error in what we store in the DB and we might run into that when starting to use this table more.
This issue is an upstream JSON library issue. I'm poking the maintainer of the library to fix it.
(Reporter)

Comment 2

5 years ago
This still happens:

[rkaiser@crashanalysis.dmz.phx1 ~]$ psql -h socorro-reporting1.db.phx1.mozilla.com -p 6432 -U crashkill breakpad
Password for user crashkill: 
psql (9.0.14, server 9.2.4)
WARNING: psql version 9.0, server version 9.2.
         Some psql features might not work.
Type "help" for help.

breakpad=> SELECT raw_crash->>'Android_Manufacturer' as manufacturer FROM raw_crashes WHERE uuid='5e8d5e28-f5b0-444d-a619-a1ed32130724';
ERROR:  invalid byte sequence for encoding "UTF8": 0xef
So, that encoding sequence is invalid. To fix the problem we need to ensure that Python does not try to insert invalid UTF8.
Assignee: sdeckelmann → nobody
Component: Database → Backend
Summary: Using JSON operators on raw_crashes gives encoding errors → Invalid UTF8 data being inserted into JSON in Postgres, raw_crashes in particular is affected
BTW, the query from comment 2 seems to WFM right now on stage and production:

$ psql -U $databaseUserName -h $databaseHost $databaseName
psql (9.2.4, server 9.2.6)
Type "help" for help.

breakpad=# SELECT raw_crash->>'Android_Manufacturer' as manufacturer FROM raw_crashes WHERE uuid='5e8d5e28-f5b0-444d-a619-a1ed32130724';
 manufacturer 
--------------
 samsung
(1 row)

breakpad=#
(Reporter)

Comment 5

4 years ago
OK, we still have invalid crashes in there, though. I just tried this one trying to find GMP crashes in the light of not having bug 1060083 yet:

breakpad=> SELECT uuid FROM raw_crashes WHERE raw_crash->>'GMPPlugin'='1';
ERROR:  invalid byte sequence for encoding "UTF8": 0xe6

I of course have no clue which report it tripped over there.
(Reporter)

Comment 6

3 years ago
I was told that bug 1060113 would resolve this but it didn't:

[rkaiser@crashanalysis.dmz.phx1 ~]$ psql -h socorro-reporting1... -p XXXX -U crashkill breakpad
Password for user crashkill: 
psql (9.2.9, server 9.2.6)
Type "help" for help.

breakpad=> SELECT uuid FROM raw_crashes WHERE raw_crash->>'GMPPlugin'='1';
ERROR:  invalid byte sequence for encoding "UTF8": 0xc3
(Reporter)

Comment 7

3 years ago
After the recent server migration I get the following:

psql (9.2.9)
Type "help" for help.

breakpad=> SELECT uuid FROM raw_crashes WHERE raw_crash->>'GMPPlugin'='1';
ERROR:  canceling statement due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #7)
> After the recent server migration I get the following:
> 
> psql (9.2.9)
> Type "help" for help.
> 
> breakpad=> SELECT uuid FROM raw_crashes WHERE raw_crash->>'GMPPlugin'='1';
> ERROR:  canceling statement due to conflict with recovery
> DETAIL:  User query might have needed to see row versions that must be
> removed.

Please open a ticket with dbops. Sounds like the config from reporting1 didn't make it over to reporting2.
(Reporter)

Comment 9

3 years ago
Thanks, filed bug 1124668, I'll re-try once that's resolved.
I think the issues in here are either fixed or not relevant anymore. I'm going to mark it as FIXED. If that's not correct, please re-open.
Status: NEW → RESOLVED
Last Resolved: 9 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.