Bug 1168511
Opened 9 years ago
Closed 9 years ago
crontabber failing due to invalid unicode in JSON
Categories: Socorro :: Backend, task
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: rhelmer, Assigned: lars
GraphicsDeviceCronApp is failing with:

Traceback (most recent call last):
  File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/crontabber/transaction_executor.py", line 46, in __call__
    result = function(connection, *args, **kwargs)
  File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/cron/jobs/matviews.py", line 52, in run
    self.run_proc(connection, [target_date])
  File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/cron/jobs/matviews.py", line 28, in run_proc
    cursor.callproc(self.get_proc_name(), signature)
DataError: invalid input syntax for type json
DETAIL: low order surrogate must follow a high order surrogate.
CONTEXT: JSON data, line 1: ...gress": "xpcom-shutdown", "TelemetryEnvironment":...
Updated (Reporter) • 9 years ago
Assignee: nobody → rhelmer
Status: NEW → ASSIGNED
Comment 1 (Reporter) • 9 years ago
OK, this has been failing for several days, and I can reproduce it:

SELECT uuid,
       json_object_field_text(r.raw_crash, 'IsGarbageCollecting') AS is_garbage_collecting
FROM raw_crashes r
WHERE date_processed BETWEEN '2015-05-22'::timestamptz
                         AND '2015-05-22'::timestamptz + '1 day'::interval
Comment 2 (Reporter) • 9 years ago
OK, here's an example crash containing the data that Postgres is unhappy with: 37b76c55-1dc6-4c65-9159-3f04d2150522
Comment 3 (Reporter) • 9 years ago
I archived this crash (on sp-admin01) and removed it from raw_crashes to see if we can get reporting to continue for now:

breakpad=# delete from raw_crashes where uuid = '37b76c55-1dc6-4c65-9159-3f04d2150522';
DELETE 1
Comment 4 (Reporter) • 9 years ago
Lars, I've emailed you the raw JSON for the crash from comment 3. Judging from the PG error message, I believe the data in "TelemetryEnvironment" contains the problematic invalid Unicode character.
Comment 5 (Reporter) • 9 years ago
The actual problematic character in this case seems to be the Unicode-escaped null byte (PG's representation of this character is '\u0000'). I've been finding and removing these like this:

DELETE FROM raw_crashes
WHERE raw_crash::text LIKE '%\u0000%'::text
  AND date_processed >= '2015-07-21';
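For illustration only, the same check could be made on the application side before a crash ever reaches raw_crashes. This is a minimal sketch, not code from Socorro: the function names `has_null_char` and `json_text_has_null_char` are hypothetical, and it is written for Python 3 rather than the Python 2.6 shown in the traceback. It parses the JSON first, so an escaped backslash followed by "u0000" is not mistaken for a null character.

```python
import json


def has_null_char(value):
    """Return True if any string key or value nested in `value` contains U+0000.

    Hypothetical helper for illustration; not part of Socorro.
    """
    if isinstance(value, str):
        return "\x00" in value
    if isinstance(value, dict):
        return any(has_null_char(k) or has_null_char(v) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return any(has_null_char(item) for item in value)
    # Numbers, booleans and None cannot carry a null character.
    return False


def json_text_has_null_char(raw_json_text):
    """Parse the JSON text, then look for U+0000 in the decoded strings."""
    return has_null_char(json.loads(raw_json_text))
```

A crash that fails this check could be set aside for inspection instead of being deleted after the fact.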
Updated (Assignee) • 9 years ago
Assignee: rhelmer → lars
Comment 6 • 9 years ago
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/1c8cd57f6702e4f837d32a077b84c1bf78fd57ad
Merge pull request #2915 from twobraids/remove-null-byte-from-raw-crashes
fixes Bug 1168511 - filter all strings in raw crash input to remove \x00
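As a rough idea of what "filter all strings in raw crash input to remove \x00" can look like, here is a minimal sketch. It is not the code from the pull request: `strip_null_bytes` is a hypothetical name, the raw crash is assumed to be a plain dict of JSON-like values, and the code targets Python 3. Note that this version only cleans values and leaves keys untouched.

```python
def strip_null_bytes(value):
    """Return a copy of `value` with U+0000 removed from every string value.

    Hypothetical helper for illustration; dict keys are left as-is.
    """
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, dict):
        return {key: strip_null_bytes(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_null_bytes(item) for item in value]
    # Non-string scalars pass through unchanged.
    return value
```

Building a new structure rather than mutating in place keeps the original submission available for archiving.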
Updated • 9 years ago
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 7 (Reporter) • 9 years ago
This shipped, but it looks like we're still seeing the problem in raw_crashes.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 8 (Reporter) • 9 years ago
This happened again today. Lars, I saved the crash and emailed it to you this time.
Comment 9 • 9 years ago
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/89d3f394d8b083a061f8800b9dad472d7268b6cf
Merge pull request #2950 from twobraids/zero-theorem
Fixes Bug 1168511 (again) - add ability to test keys for null bytes
Updated • 9 years ago
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Comment 10 • 9 years ago
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/d38cffb48fa8a46792e12473f38530145c9af088
Merge pull request #2957 from twobraids/more-unicode-crap
more Fixes Bug 1168511 - separate unicode & str cases in cleaning of null bytes.
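The idea of handling text and byte strings as separate cases, and of cleaning keys as well as values, can be sketched as follows. This is an assumption-laden illustration, not the code from the pull request: the commit refers to Python 2's unicode and str types, while this sketch uses their closest Python 3 counterparts (str and bytes), and the names `remove_null_chars` and `clean_mapping` are hypothetical.

```python
def remove_null_chars(value):
    """Strip null characters, treating text and byte strings separately.

    Hypothetical helper for illustration; not part of Socorro.
    """
    if isinstance(value, str):
        # Text string: remove the U+0000 code point.
        return value.replace("\x00", "")
    if isinstance(value, bytes):
        # Byte string: remove the 0x00 byte.
        return value.replace(b"\x00", b"")
    return value


def clean_mapping(raw_crash):
    """Return a new flat dict with null characters removed from keys and values."""
    return {
        remove_null_chars(key): remove_null_chars(item)
        for key, item in raw_crash.items()
    }
```

Keeping the two cases apart avoids mixing text and byte operands in a single `replace` call, which Python 3 rejects with a TypeError.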