Closed Bug 1168511 Opened 10 years ago Closed 10 years ago

crontabber failing due to invalid unicode in JSON

Categories

(Socorro :: Backend, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rhelmer, Assigned: lars)

GraphicsDeviceCronApp is failing with:
Traceback (most recent call last):
  File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/crontabber/transaction_executor.py", line 46, in __call__
    result = function(connection, *args, **kwargs)
  File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/cron/jobs/matviews.py", line 52, in run
    self.run_proc(connection, [target_date])
  File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/cron/jobs/matviews.py", line 28, in run_proc
    cursor.callproc(self.get_proc_name(), signature)
DataError: invalid input syntax for type json
DETAIL: low order surrogate must follow a high order surrogate.
CONTEXT: JSON data, line 1: ...gress": "xpcom-shutdown", "TelemetryEnvironment":...
Assignee: nobody → rhelmer
Status: NEW → ASSIGNED
OK, this has been failing for several days, and I can repro it:
    SELECT uuid,
           json_object_field_text(r.raw_crash, 'IsGarbageCollecting') AS is_garbage_collecting
    FROM raw_crashes r
    WHERE date_processed BETWEEN '2015-05-22'::timestamptz
                             AND '2015-05-22'::timestamptz + '1 day'::interval
OK here's an example containing the data that Postgres is unhappy with: 37b76c55-1dc6-4c65-9159-3f04d2150522
I archived (on sp-admin01) this crash, and removed it to see if we can get reporting to continue for now:
    breakpad=# delete from raw_crashes where uuid = '37b76c55-1dc6-4c65-9159-3f04d2150522';
    DELETE 1
Lars - I've emailed you the raw JSON from comment 3. From the PG error message, I believe the data in "TelemetryEnvironment" contains the problematic invalid unicode char.
The actual problematic character in this case seems to be the unicode-encoded null byte (PG's representation of this char is '\u0000'). I've been finding and removing these like this:
    DELETE FROM raw_crashes
    WHERE raw_crash::text LIKE '%\u0000%'::text
      AND date_processed >= '2015-07-21';
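For illustration, here is a minimal Python sketch of what that character looks like on each side of JSON parsing. The sample document and the helper name `contains_escaped_null` are made up for this example and are not taken from the crash in this bug. The substring check is deliberately rough: it would also match an escaped backslash followed by the text u0000.

```python
import json


def contains_escaped_null(raw_json_text):
    # Rough check: look for the six-character escape sequence \u0000
    # in the raw (still serialized) JSON text.
    return "\\u0000" in raw_json_text


# Hypothetical sample, not the crash from this bug.
sample = '{"TelemetryEnvironment": "abc\\u0000def"}'

# After parsing, the escape becomes a real NUL character in the string.
decoded = json.loads(sample)

print(contains_escaped_null(sample))              # True
print("\x00" in decoded["TelemetryEnvironment"])  # True
```

In the serialized text the null byte appears as an escape sequence, which is why a plain substring match on the text form of the column can find the affected rows.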
Assignee: rhelmer → lars
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/1c8cd57f6702e4f837d32a077b84c1bf78fd57ad
Merge pull request #2915 from twobraids/remove-null-byte-from-raw-crashes
fixes Bug 1168511 - filter all strings in raw crash input to remove \x00
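The commit itself is not reproduced in this bug, so the following is only a sketch of the general idea the commit message describes, written for Python 3. The function name `strip_null_bytes` and the assumption that a raw crash is a nested structure of dicts, lists and strings are illustrative, not Socorro's actual code.

```python
def strip_null_bytes(value):
    """Return a copy of value with NUL characters removed from every string value.

    Hypothetical helper: dict keys are left untouched here, only values are cleaned.
    """
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, dict):
        return {key: strip_null_bytes(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_null_bytes(item) for item in value]
    # Numbers, booleans and None pass through unchanged.
    return value


# Made-up raw crash for illustration.
raw_crash = {"ProductName": "Fire\x00fox", "Notes": ["a\x00", 1]}
print(strip_null_bytes(raw_crash))  # {'ProductName': 'Firefox', 'Notes': ['a', 1]}
```

Cleaning the input before it is stored means the bad character never reaches the JSON column, rather than having to delete rows afterwards.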
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
This shipped, but looks like we're still seeing the problem in raw_crashes.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This happened again today - Lars I saved it and emailed it to you this time.
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/89d3f394d8b083a061f8800b9dad472d7268b6cf
Merge pull request #2950 from twobraids/zero-theorem
Fixes Bug 1168511 (again) - add ability to test keys for null bytes
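Null bytes can hide in dictionary keys as well as in values, which a value-only filter would miss. A minimal Python 3 sketch of such a key check follows; `keys_with_null_bytes` is a hypothetical name, and this is not the code from the pull request.

```python
def keys_with_null_bytes(mapping, path=()):
    """Return the key paths (as tuples) whose final key contains a NUL character.

    Hypothetical helper: only nested dicts are descended into.
    """
    found = []
    for key, value in mapping.items():
        here = path + (key,)
        if isinstance(key, str) and "\x00" in key:
            found.append(here)
        if isinstance(value, dict):
            found.extend(keys_with_null_bytes(value, here))
    return found


# Made-up raw crash for illustration.
raw_crash = {"ok": 1, "bad\x00": {"in\x00ner": 2}}
print(keys_with_null_bytes(raw_crash))
# [('bad\x00',), ('bad\x00', 'in\x00ner')]
```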
Status: REOPENED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/d38cffb48fa8a46792e12473f38530145c9af088
Merge pull request #2957 from twobraids/more-unicode-crap
more Fixes Bug 1168511 - separate unicode & str cases in cleaning of null bytes.
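Socorro ran on Python 2.6 at the time, where `unicode` and `str` are distinct types. As a rough Python 3 analogue, assuming `unicode` maps to `str` and the Python 2 `str` maps to `bytes`, handling the two cases separately might look like the sketch below. The name `remove_null_bytes` is hypothetical and this is not the committed code.

```python
def remove_null_bytes(value):
    """Strip NUL from text and byte strings, each with a replacement of its own type."""
    if isinstance(value, str):
        # Text case (Python 2 `unicode`).
        return value.replace("\x00", "")
    if isinstance(value, bytes):
        # Byte-string case (Python 2 `str`); the arguments must be bytes too.
        return value.replace(b"\x00", b"")
    # Anything else is returned as-is.
    return value


print(remove_null_bytes("a\x00b"))   # ab
print(remove_null_bytes(b"a\x00b"))  # b'ab'
```

Keeping the branches separate avoids mixing text and byte arguments in a single `replace` call, which Python 3 rejects with a `TypeError`.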