Closed
Bug 1168511
Opened 10 years ago
Closed 10 years ago
crontabber failing due to invalid unicode in JSON
Categories
(Socorro :: Backend, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rhelmer, Assigned: lars)
GraphicsDeviceCronApp is failing with:
Traceback (most recent call last):
  File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/crontabber/transaction_executor.py", line 46, in __call__
    result = function(connection, *args, **kwargs)
  File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/cron/jobs/matviews.py", line 52, in run
    self.run_proc(connection, [target_date])
  File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/cron/jobs/matviews.py", line 28, in run_proc
    cursor.callproc(self.get_proc_name(), signature)
DataError: invalid input syntax for type json
DETAIL: low order surrogate must follow a high order surrogate.
CONTEXT: JSON data, line 1: ...gress": "xpcom-shutdown", "TelemetryEnvironment":...
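For readers unfamiliar with the DETAIL line: in JSON, a character outside the Basic Multilingual Plane is escaped as a high surrogate (\uD800 to \uDBFF) followed by a low surrogate (\uDC00 to \uDFFF), and a low surrogate on its own is invalid. A minimal Python 3 sketch of that rule follows; the helper name has_lone_surrogate is hypothetical and this is not Socorro code. It relies on Python's json module accepting lone surrogate escapes and leaving them in the decoded string, whereas a valid pair is combined into a single character.

```python
import json


def has_lone_surrogate(text):
    """Return True if the decoded text still contains a surrogate code point.

    After json.loads, a valid high+low pair has been merged into one
    character, so any code point left in U+D800..U+DFFF is unpaired.
    """
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


# High surrogate followed by low surrogate: a single valid character.
paired = json.loads('"\\ud83d\\ude00"')
# Low surrogate with no high surrogate in front of it: the invalid case.
lone = json.loads('"\\ude00"')

paired_is_bad = has_lone_surrogate(paired)
lone_is_bad = has_lone_surrogate(lone)
```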
Updated • 10 years ago (Reporter)
Assignee: nobody → rhelmer
Status: NEW → ASSIGNED
Comment 1 • 10 years ago (Reporter)
OK, this has been failing for several days. I can reproduce it with:
SELECT
    uuid
    , json_object_field_text(r.raw_crash, 'IsGarbageCollecting') as is_garbage_collecting
FROM
    raw_crashes r
WHERE
    date_processed BETWEEN '2015-05-22'::timestamptz
    AND '2015-05-22'::timestamptz + '1 day'::interval;
Comment 2 • 10 years ago (Reporter)
OK, here's the uuid of an example crash containing the data that Postgres is unhappy with:
37b76c55-1dc6-4c65-9159-3f04d2150522
Comment 3 • 10 years ago (Reporter)
I archived this crash (on sp-admin01) and deleted it from raw_crashes to see whether we can get reporting to continue for now:
breakpad=# delete from raw_crashes where uuid = '37b76c55-1dc6-4c65-9159-3f04d2150522';
DELETE 1
Comment 4 • 10 years ago (Reporter)
Lars, I've emailed you the raw JSON for the crash from comment 3. Judging from the PG error message, I believe the data in "TelemetryEnvironment" contains the invalid Unicode character.
Comment 5 • 10 years ago (Reporter)
The actual problematic character in this case seems to be the Unicode-escaped null byte (PG's representation of this character is '\u0000'). I've been finding and removing the affected rows like this:
DELETE FROM raw_crashes WHERE raw_crash::text LIKE '%\u0000%'::text AND date_processed >= '2015-07-21';
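The same check can be made on a saved crash outside the database. The following is a minimal Python 3 sketch, not Socorro code: the helper name contains_null_byte and the sample field value are hypothetical. It decodes the JSON first and then looks for the actual U+0000 character, which avoids matching text that merely resembles the escape sequence.

```python
import json


def contains_null_byte(value):
    """Return True if any string key or value in decoded JSON holds U+0000."""
    if isinstance(value, str):
        return "\x00" in value
    if isinstance(value, dict):
        return any(
            contains_null_byte(key) or contains_null_byte(item)
            for key, item in value.items()
        )
    if isinstance(value, list):
        return any(contains_null_byte(item) for item in value)
    return False


# Hypothetical sample: the escaped null byte sits inside one field value.
raw = '{"ShutdownProgress": "xpcom-shutdown", "TelemetryEnvironment": "abc\\u0000def"}'
flagged = contains_null_byte(json.loads(raw))
clean = contains_null_byte(json.loads('{"ShutdownProgress": "xpcom-shutdown"}'))
```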
Updated • 10 years ago (Assignee)
Assignee: rhelmer → lars
Comment 6 • 10 years ago
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/1c8cd57f6702e4f837d32a077b84c1bf78fd57ad
Merge pull request #2915 from twobraids/remove-null-byte-from-raw-crashes
fixes Bug 1168511 - filter all strings in raw crash input to remove \x00
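The idea named in the commit message, filtering \x00 out of every string in the raw crash, might look roughly like the sketch below. This is an illustration in Python 3 under assumptions, not the code from the pull request: the function name strip_null_bytes and the sample crash are hypothetical.

```python
def strip_null_bytes(value):
    """Return a copy of value with U+0000 removed from every string value.

    Dicts and lists are walked recursively; other types pass through as is.
    Keys are deliberately left untouched in this sketch.
    """
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, dict):
        return {key: strip_null_bytes(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_null_bytes(item) for item in value]
    return value


# Hypothetical raw crash with null bytes at two nesting levels.
raw_crash = {
    "ShutdownProgress": "xpcom-shutdown",
    "TelemetryEnvironment": "abc\x00def",
    "Nested": ["x\x00y", 3],
}
cleaned = strip_null_bytes(raw_crash)
```

Returning a new structure rather than mutating in place keeps the original input available for archiving.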
Updated • 10 years ago
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment 7 • 10 years ago (Reporter)
This shipped, but it looks like we're still seeing the problem in raw_crashes.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 8 • 10 years ago (Reporter)
This happened again today. Lars, I saved the crash and emailed it to you this time.
Comment 9 • 10 years ago
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/89d3f394d8b083a061f8800b9dad472d7268b6cf
Merge pull request #2950 from twobraids/zero-theorem
Fixes Bug 1168511 (again) - add ability to test keys for null bytes
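Since a null byte can appear in a field name as well as in a field value, a key check could be sketched as follows. This is a Python 3 illustration under assumptions, not the code from the pull request: both helper names and the sample crash are hypothetical.

```python
def keys_with_null_bytes(mapping):
    """Return the top-level string keys of mapping that contain U+0000."""
    return [key for key in mapping if isinstance(key, str) and "\x00" in key]


def strip_null_bytes_from_keys(mapping):
    """Return a copy of mapping with U+0000 removed from its string keys.

    If two keys become equal after stripping, the later one wins; a real
    implementation would need to decide how to handle that collision.
    """
    return {
        (key.replace("\x00", "") if isinstance(key, str) else key): item
        for key, item in mapping.items()
    }


# Hypothetical raw crash whose second field name carries a null byte.
raw_crash = {"ProductName": "Firefox", "Bad\x00Key": "value"}
offending = keys_with_null_bytes(raw_crash)
cleaned = strip_null_bytes_from_keys(raw_crash)
```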
Updated • 10 years ago
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Comment 10 • 10 years ago
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/d38cffb48fa8a46792e12473f38530145c9af088
Merge pull request #2957 from twobraids/more-unicode-crap
more Fixes Bug 1168511 - separate unicode & str cases in cleaning of null bytes.
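The distinction in the commit message is between Python 2's unicode and str types. The closest Python 3 analogue is str versus bytes, and the sketch below uses that analogue to show why the two cases are handled separately: each type only accepts replacement arguments of its own kind. The function name remove_null_bytes is hypothetical, and this is an assumption about the shape of the fix rather than the code from the pull request.

```python
def remove_null_bytes(value):
    """Remove null bytes from text or byte strings, each with matching literals."""
    if isinstance(value, bytes):
        # Byte strings need bytes arguments; passing str here raises TypeError.
        return value.replace(b"\x00", b"")
    if isinstance(value, str):
        # Text strings need str arguments.
        return value.replace("\x00", "")
    # Numbers, None, and other types pass through unchanged.
    return value


text_result = remove_null_bytes("a\x00b")
bytes_result = remove_null_bytes(b"a\x00b")
other_result = remove_null_bytes(42)
```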