raw_crashes data appears to have incorrect encoding on stage

RESOLVED WONTFIX

Status

Socorro
General
3 years ago
a year ago

People

(Reporter: selenamarie, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Both of the following queries were run on Socorro's stage database.

This data was originally from production and part of the weekly 'stage refresh':

breakpad=#     select rc.uuid, json(rc.raw_crash->'dump_checksums'), date_processed from raw_crashes rc where date_processed > '2015-03-15' limit 10 ;
                 uuid                 |                                                       json                                                        |        date_processed         
--------------------------------------+-------------------------------------------------------------------------------------------------------------------+-------------------------------
 ed14476e-21d3-4536-a92c-9d3bf2150315 | {"upload_file_minidump": "6c5a6b5921880c321288133961c9667e"}                                                      | 2015-03-15 00:00:00.292512+00
 fc14a725-b36e-413c-8151-57fb92150315 | {"upload_file_minidump": "22df911851ade1460117f70624f1a3d0", "memory_report": "f860f3d6fc569d4acf297125f061d4e3"} | 2015-03-15 00:00:00.343784+00
 b7617932-4b0d-4ee8-bcba-67ed52150315 | {"upload_file_minidump": "e8ed78662ce75fc72ce3d1d6dc8fc108"}                                                      | 2015-03-15 00:00:01.061145+00
 3601d825-f82a-4044-8cc5-c47272150315 | {"upload_file_minidump": "b6fd7d62f3411d3e992e7077576b93bf"}                                                      | 2015-03-15 00:00:01.300166+00
 998067d6-95ce-401f-854f-5b3eb2150315 | {"upload_file_minidump": "2b215113b6bdb287f04a5c383e8a0e9a"}                                                      | 2015-03-15 00:00:01.473124+00
 0e116b54-c7d0-48b8-859e-dc4762150315 | {"upload_file_minidump": "d14a301fdb7a2871b464ece1af9d32f3"}                                                      | 2015-03-15 00:00:02.598883+00
 ce3f540c-8195-402e-b088-8aeff2150315 | {"upload_file_minidump": "2f85ddfe44c5c601089e7a9fd4837219"}                                                      | 2015-03-15 00:00:02.994641+00
 defe047e-e61f-4434-8547-131e32150315 | {"upload_file_minidump": "6e449b6f41603c809162c0305dadc1d8"}                                                      | 2015-03-15 00:00:03.48649+00
 8f011485-fa47-4c24-adf7-10eaf2150315 | {"upload_file_minidump": "dbe69780f60e9c84e0c00441196762be"}                                                      | 2015-03-15 00:00:03.927942+00
 2fa3bba8-87de-4077-9483-719482150315 | {"upload_file_minidump": "1a117984fe51bcced6d8345bd62c30c2", "memory_report": "40af22b61086b0fcf52d32e65f40a4d1"} | 2015-03-15 00:00:04.012292+00
(10 rows)


This data was inserted by the stage processors, from data submitted with our submitter_app.py: 


breakpad=#     select rc.uuid, json(rc.raw_crash->'dump_checksums'), date_processed from raw_crashes rc where date_processed > '2015-03-18' limit 10 ;
                 uuid                 |                               json                               |        date_processed         
--------------------------------------+------------------------------------------------------------------+-------------------------------
 e923ccbe-b7f0-4210-88cd-cff3e2150317 | "{u'upload_file_minidump': u'b2caa96d06bc06bcad4ed42cce336831'}" | 2015-03-18 00:00:03.369622+00
 40d5ccde-819e-4541-8a6b-04de52150317 | "{u'upload_file_minidump': u'd41d8cd98f00b204e9800998ecf8427e'}" | 2015-03-18 00:00:03.731346+00
 b42a5384-8750-4cb6-baa0-7d6232150317 | "{u'upload_file_minidump': u'c06b6ac885b38f8025bbc43927aba6dd'}" | 2015-03-18 00:00:04.577034+00
 ed79dda7-ebf3-4fae-9f93-d58da2150317 | "{u'upload_file_minidump': u'dce8cc7d8857b5a9de8bd74c7839396f'}" | 2015-03-18 00:00:04.909234+00
 2aef1c25-d0f8-4e6b-864a-f13a32150317 | "{u'upload_file_minidump': u'd7b5952271e57d1c09ec16c01f9278fa'}" | 2015-03-18 00:00:05.038086+00
 f5554683-fb43-4043-9ee6-df9192150317 | "{u'upload_file_minidump': u'9c9127b1c547a10ef7ffd20eb355c56e'}" | 2015-03-18 00:00:05.264665+00
 22661c49-b443-4481-b686-a2cf72150317 | "{u'upload_file_minidump': u'3a06794e2a3745d0e6cfe26a1e544369'}" | 2015-03-18 00:00:05.521126+00
 dc19de4e-22e0-4ab2-9c6d-fdf872150317 | "{u'upload_file_minidump': u'3f852628536cae9f4f505e683f12e2de'}" | 2015-03-18 00:00:05.808538+00
 1fa0082b-610a-46cb-b8fd-035c62150317 | "{u'upload_file_minidump': u'a0f25b0b536c0dd558b0c9d2cd54b923'}" | 2015-03-18 00:00:05.966864+00
 60fa8823-e8ab-4c1a-a316-b96e12150317 | "{u'upload_file_minidump': u'b7727fa3aeb17a5106c4799300be0622'}" | 2015-03-18 00:00:06.037547+00
(10 rows)

breakpad=#


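You can see that the second set of data is not a JSON object at all: each value is a JSON string (note the surrounding double quotes) that contains the Python repr of a dict, complete with u'...' unicode literals.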
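The difference between the two result sets can be reproduced in a few lines. This is a minimal sketch, not Socorro code: it assumes that somewhere in the pipeline the nested mapping was converted to text with str() before the crash was serialized. The checksum is copied from the output above. Under Python 2 the str() form would also carry the u'...' prefixes seen on stage; Python 3 omits them.

```python
import json

checksums = {"upload_file_minidump": "b2caa96d06bc06bcad4ed42cce336831"}

# Shape of the production rows: the nested mapping is serialized as a JSON object.
good = json.dumps({"dump_checksums": checksums})

# Shape of the stage rows (assumed cause): the nested mapping was turned into
# text first, so JSON stores it as a quoted string holding a Python repr.
bad = json.dumps({"dump_checksums": str(checksums)})

good_value = json.loads(good)["dump_checksums"]
bad_value = json.loads(bad)["dump_checksums"]

print(type(good_value).__name__)  # dict
print(type(bad_value).__name__)   # str
```

Both documents are valid JSON, which is why neither psycopg2 nor Postgres would reject the second form.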

Both prod/stage databases appear to be configured the same (9.2.9 postgres, locale is UTF8) and the stage processors have the correct locale setting.  We haven't updated psycopg2 in a while.

My guess is that the submitter is doing something that's causing the crash to be interpreted as text instead of json by psycopg2 or Postgres.
submitter_app sends data in the same manner as the clients do: individual fields via HTTP POST.  The JSON is not sent as a whole piece.  The collector is the process that assembles the raw crash into JSON form.
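One plausible mechanism, which this thread does not confirm, is that sending each field as a separate POST value flattens any nested mapping to text. The sketch below uses only the standard library to imitate that round trip; the raw_crash contents and the reassembly step are illustrative assumptions, not the actual submitter or collector code.

```python
import json
from urllib.parse import parse_qs, urlencode

# Illustrative raw crash with one nested field (values are assumptions).
raw_crash = {
    "ProductName": "Firefox",
    "dump_checksums": {"upload_file_minidump": "b2caa96d06bc06bcad4ed42cce336831"},
}

# urlencode() calls str() on non-string values, so the nested dict is
# transmitted as its Python repr rather than as JSON.
body = urlencode(raw_crash)

# Receiving side: every form field arrives as plain text.
received = {key: values[0] for key, values in parse_qs(body).items()}

# Reassembling the fields into JSON keeps the repr as a quoted string.
reassembled = json.dumps(received)
print(reassembled)
```

If this is what happens, the fix would belong on the sending side (serialize nested fields with json.dumps before posting) or on the collecting side (decode known JSON fields), but that is a guess under the stated assumption.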

It is the processor that saves the raw crash to postgres.  See socorro/external/postgresql/crashstorage.py:148.  On that line the raw_crash mapping is serialized into a JSON string to be sent to the 'raw_crashes' table.  One would think that they'd always be represented as strings in the 'raw_crashes' table.
:lars, is the issue maybe somewhere in this difference:

socorro/collector/submitter_app.py:61 

return DotDict(json.load(raw_crash_fp)) 

vs socorro/external/boto/crashstorage.py:243

return json.loads(raw_crash_as_string, object_hook=DotDict)

Maybe there's a difference between the string we're storing in S3 and what is/was on disk?

Although that should be functionally the same?
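To check the "functionally the same" question, the two loading styles can be compared directly. In this sketch DotDict is a hypothetical stand-in (a bare dict subclass), not Socorro's real class, and the JSON text is built from a checksum in the output above. It only shows how the two calls differ in the types they produce.

```python
import io
import json


class DotDict(dict):
    """Hypothetical stand-in for Socorro's DotDict; only the type matters here."""


text = '{"dump_checksums": {"upload_file_minidump": "6c5a6b5921880c321288133961c9667e"}}'

# Style of submitter_app.py:61: only the top-level result is wrapped.
from_file = DotDict(json.load(io.StringIO(text)))

# Style of boto/crashstorage.py:243: object_hook converts every nested object.
from_string = json.loads(text, object_hook=DotDict)

same_data = from_file == from_string
nested_types = (
    type(from_file["dump_checksums"]).__name__,
    type(from_string["dump_checksums"]).__name__,
)

print(same_data)     # True
print(nested_types)  # ('dict', 'DotDict')
```

With this stand-in the data compares equal and only the nested container type differs. Whether that type difference matters depends on how the real DotDict and the downstream code stringify or serialize nested values, which the sketch does not model.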
No longer blocks: 1118468

Comment 3

a year ago
no longer a concern
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → WONTFIX