Closed Bug 1144861 Opened 9 years ago Closed 7 years ago

raw_crashes data appears to have incorrect encoding on stage

Categories

(Socorro :: General, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: selenamarie, Unassigned)

Details

Both of the following queries were run on Socorro's stage database.

This data was originally from production and part of the weekly 'stage refresh':

breakpad=#     select rc.uuid, json(rc.raw_crash->'dump_checksums'), date_processed from raw_crashes rc where date_processed > '2015-03-15' limit 10 ;
                 uuid                 |                                                       json                                                        |        date_processed         
--------------------------------------+-------------------------------------------------------------------------------------------------------------------+-------------------------------
 ed14476e-21d3-4536-a92c-9d3bf2150315 | {"upload_file_minidump": "6c5a6b5921880c321288133961c9667e"}                                                      | 2015-03-15 00:00:00.292512+00
 fc14a725-b36e-413c-8151-57fb92150315 | {"upload_file_minidump": "22df911851ade1460117f70624f1a3d0", "memory_report": "f860f3d6fc569d4acf297125f061d4e3"} | 2015-03-15 00:00:00.343784+00
 b7617932-4b0d-4ee8-bcba-67ed52150315 | {"upload_file_minidump": "e8ed78662ce75fc72ce3d1d6dc8fc108"}                                                      | 2015-03-15 00:00:01.061145+00
 3601d825-f82a-4044-8cc5-c47272150315 | {"upload_file_minidump": "b6fd7d62f3411d3e992e7077576b93bf"}                                                      | 2015-03-15 00:00:01.300166+00
 998067d6-95ce-401f-854f-5b3eb2150315 | {"upload_file_minidump": "2b215113b6bdb287f04a5c383e8a0e9a"}                                                      | 2015-03-15 00:00:01.473124+00
 0e116b54-c7d0-48b8-859e-dc4762150315 | {"upload_file_minidump": "d14a301fdb7a2871b464ece1af9d32f3"}                                                      | 2015-03-15 00:00:02.598883+00
 ce3f540c-8195-402e-b088-8aeff2150315 | {"upload_file_minidump": "2f85ddfe44c5c601089e7a9fd4837219"}                                                      | 2015-03-15 00:00:02.994641+00
 defe047e-e61f-4434-8547-131e32150315 | {"upload_file_minidump": "6e449b6f41603c809162c0305dadc1d8"}                                                      | 2015-03-15 00:00:03.48649+00
 8f011485-fa47-4c24-adf7-10eaf2150315 | {"upload_file_minidump": "dbe69780f60e9c84e0c00441196762be"}                                                      | 2015-03-15 00:00:03.927942+00
 2fa3bba8-87de-4077-9483-719482150315 | {"upload_file_minidump": "1a117984fe51bcced6d8345bd62c30c2", "memory_report": "40af22b61086b0fcf52d32e65f40a4d1"} | 2015-03-15 00:00:04.012292+00
(10 rows)


This data was inserted by the stage processors, from data submitted with our submitter_app.py: 


breakpad=#     select rc.uuid, json(rc.raw_crash->'dump_checksums'), date_processed from raw_crashes rc where date_processed > '2015-03-18' limit 10 ;
                 uuid                 |                               json                               |        date_processed         
--------------------------------------+------------------------------------------------------------------+-------------------------------
 e923ccbe-b7f0-4210-88cd-cff3e2150317 | "{u'upload_file_minidump': u'b2caa96d06bc06bcad4ed42cce336831'}" | 2015-03-18 00:00:03.369622+00
 40d5ccde-819e-4541-8a6b-04de52150317 | "{u'upload_file_minidump': u'd41d8cd98f00b204e9800998ecf8427e'}" | 2015-03-18 00:00:03.731346+00
 b42a5384-8750-4cb6-baa0-7d6232150317 | "{u'upload_file_minidump': u'c06b6ac885b38f8025bbc43927aba6dd'}" | 2015-03-18 00:00:04.577034+00
 ed79dda7-ebf3-4fae-9f93-d58da2150317 | "{u'upload_file_minidump': u'dce8cc7d8857b5a9de8bd74c7839396f'}" | 2015-03-18 00:00:04.909234+00
 2aef1c25-d0f8-4e6b-864a-f13a32150317 | "{u'upload_file_minidump': u'd7b5952271e57d1c09ec16c01f9278fa'}" | 2015-03-18 00:00:05.038086+00
 f5554683-fb43-4043-9ee6-df9192150317 | "{u'upload_file_minidump': u'9c9127b1c547a10ef7ffd20eb355c56e'}" | 2015-03-18 00:00:05.264665+00
 22661c49-b443-4481-b686-a2cf72150317 | "{u'upload_file_minidump': u'3a06794e2a3745d0e6cfe26a1e544369'}" | 2015-03-18 00:00:05.521126+00
 dc19de4e-22e0-4ab2-9c6d-fdf872150317 | "{u'upload_file_minidump': u'3f852628536cae9f4f505e683f12e2de'}" | 2015-03-18 00:00:05.808538+00
 1fa0082b-610a-46cb-b8fd-035c62150317 | "{u'upload_file_minidump': u'a0f25b0b536c0dd558b0c9d2cd54b923'}" | 2015-03-18 00:00:05.966864+00
 60fa8823-e8ab-4c1a-a316-b96e12150317 | "{u'upload_file_minidump': u'b7727fa3aeb17a5106c4799300be0622'}" | 2015-03-18 00:00:06.037547+00
(10 rows)

breakpad=#


You can see that the second set of data is not a JSON object: each value is a JSON string holding the Python repr of a dict (note the u'...' unicode literals).

Both prod/stage databases appear to be configured the same (9.2.9 postgres, locale is UTF8) and the stage processors have the correct locale setting.  We haven't updated psycopg2 in a while.

My guess is that the submitter is doing something that's causing the crash to be interpreted as text instead of json by psycopg2 or Postgres.
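
The difference between the two result sets can be reproduced in a few lines. This is only an illustrative sketch, not the Socorro code path: it assumes that somewhere upstream the dump_checksums mapping was converted with str() before being JSON-encoded. Under Python 2 that conversion yields the u'...' form shown above; Python 3 omits the u prefix, but the effect on the stored type is the same.

```python
import ast
import json

checksums = {"upload_file_minidump": "b2caa96d06bc06bcad4ed42cce336831"}

# Encoding the mapping directly yields a JSON object, as in the production rows.
as_object = json.dumps(checksums)

# Encoding the *string form* of the mapping yields a JSON string, as in the
# rows written by the stage processors. (Assumption: str() happens upstream.)
as_string = json.dumps(str(checksums))

print(type(json.loads(as_object)).__name__)
print(type(json.loads(as_string)).__name__)

# The value stored on stage is a Python literal, so the mapping is recoverable.
stored = "{u'upload_file_minidump': u'b2caa96d06bc06bcad4ed42cce336831'}"
recovered = ast.literal_eval(stored)
```

If this is what happened, the affected rows are not lost: ast.literal_eval can turn the stored text back into a mapping without evaluating arbitrary code.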
submitter_app sends data in the same manner as the clients do: as individual fields via HTTP POST.  The JSON is not sent as a whole piece.  The collector is the process that assembles the raw crash into JSON form.
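
One way the field-by-field POST could produce the stringified value is sketched below. It is a guess, not a confirmed cause: it assumes dump_checksums is present as a nested mapping in the crash being resubmitted and is form-encoded like any other field. The example uses the Python 3 standard library (urllib.parse) rather than whatever HTTP client submitter_app actually uses, and the ProductName value is illustrative.

```python
from urllib.parse import parse_qs, urlencode

# A raw crash with one flat field and one nested field (illustrative values).
raw_crash = {
    "ProductName": "Firefox",
    "dump_checksums": {"upload_file_minidump": "d41d8cd98f00b204e9800998ecf8427e"},
}

# Form encoding has no notion of nesting: each value is passed through str().
body = urlencode(raw_crash)

# What a receiver sees after decoding the form body.
received = {name: values[0] for name, values in parse_qs(body).items()}

print(received["dump_checksums"])
```

The receiver gets the repr of the mapping as plain text, so anything that later JSON-encodes the assembled crash would store it as a string rather than as an object.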

It is the processor that saves the raw crash to Postgres.  See socorro/external/postgresql/crashstorage.py:148.  On that line the raw_crash mapping is serialized into a JSON string to be sent to the 'raw_crashes' table.  One would think that they'd always be represented as strings in the 'raw_crashes' table.
:lars, maybe the issue is somewhere in this:

socorro/collector/submitter_app.py:61 

return DotDict(json.load(raw_crash_fp)) 

vs socorro/external/boto/crashstorage.py:243

return json.loads(raw_crash_as_string, object_hook=DotDict)

Maybe there's a difference between the string we're storing in S3 and what is/was on disk?

Although that should be functionally the same?
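
To make the comparison concrete, here is a small sketch of the two loading styles. The DotDict below is a hypothetical stand-in (a plain dict subclass with attribute access), not Socorro's class, so the result only shows what differs in principle.

```python
import io
import json


class DotDict(dict):
    """Hypothetical stand-in for Socorro's DotDict: a dict with attribute access."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


text = '{"dump_checksums": {"upload_file_minidump": "abc"}}'

# Style used in submitter_app.py: only the top-level mapping is wrapped.
wrapped = DotDict(json.load(io.StringIO(text)))

# Style used in boto/crashstorage.py: every nested JSON object is wrapped too.
hooked = json.loads(text, object_hook=DotDict)

print(type(wrapped["dump_checksums"]).__name__)
print(type(hooked["dump_checksums"]).__name__)
```

With a dict subclass the two results compare equal and serialize identically through json.dumps, which fits the "functionally the same" reading. Whether that also holds for Socorro's DotDict depends on its implementation, which is not shown in this bug.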
No longer blocks: 1118468
No longer a concern.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX